The Assumption Made by HMMs
The assumption is that for every random variable in the (conditional) probability chain, the conditioning is made only on the previous n variables in the sequence. Put differently, each variable is conditionally independent of all variables other than the previous n.
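Writing the sequence variables as y_1, ..., y_T, the order-n Markov assumption replaces the full chain-rule conditioning with conditioning on only the previous n variables:

```latex
P(y_t \mid y_{t-1}, y_{t-2}, \dots, y_1) \approx P(y_t \mid y_{t-1}, \dots, y_{t-n})
```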
HMMs in NLP
The tagging problem can be abstracted as modeling the joint probability of two sequences: the sentence (word) sequence and the tag sequence. In the HMM approach to this joint probability, the tag sequence is modeled (approximated) as a Markov chain, and each word is modeled as an independently occurring event conditioned only on the tag at the corresponding position.
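Writing the word sequence as x_{1:T} and the tag sequence as y_{1:T}, a first-order HMM factors the joint probability into the two pieces described above (Markov chain over tags, per-position word emissions), with y_0 taken to be a start symbol:

```latex
P(x_{1:T}, y_{1:T}) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)
```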
Generative or Discriminative
An HMM is by definition a generative model because it models the sequences with a joint probability rather than a conditional probability.
Interpretation from ML Perspective
The training objective of the HMM is a probabilistic model that cannot output the target labeling directly. Instead, a labeling (decoding) function has to be defined on top of the HMM probabilistic model. The training set of the HMM consists of samples, each a pair of a word sequence and a POS-tag sequence.
The training process is essentially a counting process in which the statistical properties of the label sequences (POS tags) are estimated, along with the conditional probability of each word/tag pair. These estimates then become the parameters of the HMM (transition probabilities and emission probabilities).
In prediction, the labeling (decoding) function uses the parameters of the HMM to predict the labeling of new word sequences, typically via the Viterbi algorithm.
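The counting-based training described above can be sketched as follows. This is a minimal sketch, not a full implementation: the class name, the tiny corpus, and the tag strings are hypothetical, and smoothing for unseen words is omitted.

```java
import java.util.*;

// Sketch: estimating HMM parameters by counting over a tagged corpus.
public class HmmCounts {
    // transition counts: previous tag -> next tag -> count
    static Map<String, Map<String, Integer>> trans = new HashMap<>();
    // emission counts: tag -> word -> count
    static Map<String, Map<String, Integer>> emit = new HashMap<>();
    // how many times each tag was seen
    static Map<String, Integer> tagCount = new HashMap<>();

    // Accumulate counts from one training sentence (word/tag pairs).
    static void observe(String[] words, String[] tags) {
        for (int i = 0; i < words.length; i++) {
            tagCount.merge(tags[i], 1, Integer::sum);
            emit.computeIfAbsent(tags[i], k -> new HashMap<>())
                .merge(words[i], 1, Integer::sum);
            if (i > 0) {
                trans.computeIfAbsent(tags[i - 1], k -> new HashMap<>())
                     .merge(tags[i], 1, Integer::sum);
            }
        }
    }

    // Emission probability P(word | tag) as a relative frequency.
    // Assumes the tag has been seen at least once; no smoothing.
    static double emission(String word, String tag) {
        int c = emit.getOrDefault(tag, Map.of()).getOrDefault(word, 0);
        return (double) c / tagCount.get(tag);
    }

    public static void main(String[] args) {
        observe(new String[]{"the", "dog", "barks"},
                new String[]{"DT", "NN", "VBZ"});
        System.out.println(emission("dog", "NN"));
    }
}
```

Transition probabilities P(tag_t | tag_{t-1}) would be computed from `trans` the same way; a Viterbi decoder would then combine both tables to label new sentences.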
Origin of Name: Probability Distribution
"Distribution" indicates that the total probability mass of 1 is divided and distributed across the possible values of the random variable.
Human Learning Process:
Watch -> Practice -> Fail -> Learn -> Improve/Learn from outside knowledge -> Practice -> Loop
How to Write Multi-threading Code?
Multi-threaded programming in Java is achieved through the use of the Thread class. There are two ways to supply the code a thread runs:
1. Declare a class as a subclass of Thread and override its run method.
2. Declare a class that implements the Runnable interface, then implement the run method. (Recommended)
Creating a new thread requires constructing a Thread object with a Runnable passed as the first argument. To start running the new thread, call its start() method. The join() method is provided to wait for a thread to finish, which synchronizes state between threads.
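The Runnable-based approach above can be sketched as follows; the class and thread names are illustrative.

```java
// Minimal sketch: create a thread from a Runnable, start it,
// and join to wait for its completion.
public class RunnableDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                "running in: " + Thread.currentThread().getName());
        // Runnable passed as the first argument; second is the thread name
        Thread worker = new Thread(task, "worker-1");
        worker.start();  // begins executing run() in the new thread
        worker.join();   // blocks the calling thread until worker finishes
        System.out.println("worker finished");
    }
}
```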
Static class members are shared by all threads, but their reads and writes are not automatically atomic or visible across threads. The synchronized keyword is provided to ensure that threads execute a guarded method (or block) one at a time.
Local variables declared inside a method are independent for each thread, since each thread has its own stack.
Challenges of Parsing CFG Correctly
- Part-of-speech Ambiguity
: The grammar can assign multiple part-of-speech tags (word categories) to a lexical item
- Structural Ambiguity
: The grammar can assign more than one possible parse to a sentence
- Attachment Ambiguity
: A particular constituent can be attached to the parse tree at more than one place
- Coordination Ambiguity
: Different sets of constituents can be joined by a conjunction such as "and" (e.g., "old men and women")
- Local Ambiguity
: Some part of the sentence has more than one parse even though the whole sentence is not ambiguous
- A phrasal node is a non-terminal node
CFG Tree vs. Dependency Grammar Tree
Structurally, a CFG tree differs little from a dependency (DP) tree.
1. In a CFG tree, the internal nodes are phrasal nodes (non-lexicalized). In a DP tree, the nodes are by definition lexicalized.
2. Both CFG and DP trees have POS tags as the nodes next to the leaves. For a DP tree, these POS tags can be replaced by syntactic functions (subject, object, predicate, etc.).
Provide a unified framework (mostly data structures to represent knowledge/model entities) for representation and parsing.