Hidden Markov Model

The assumption made by HMM

The assumption is that for all random variables in the (conditional) probability chain, the conditions is only made on the previous n variables in the sequence. In another way, the conditional probability of the variable is independent of the variables other than the previous n variables.


The tagging problem can be abstracted as to model the joint probability of two sequences: sentence sequence and tag sequence. In a HMM approach to solve this joint probability. Tag sequence is modeled (approximated) as a Markov sequence. Sentence sequence is modeled as a independent occurring of events that are only conditioned on the tagging of the corresponding position.

Generative or Discriminative

HMM is by definition is generative model because it models the sequence with joint probability rather than conditional probability.

Interpretation from ML Perspective

The training objective of the HMM is a probabilistic model that can not output target labeling directly. Instead, a labeling function has to be defined in addition to the HMM probabilistic model. The training set of the HMM model consists of training samples made by a pair of word sequence and pos tagging sequence.

The training process is essentially a counting process in which the statistical property of the labeling sequences (pos tagging) is estimated. Also, the conditionally probability of word/tagging pair is estimated. These estimates then are used to generate the parameters of the HMM model (transitional probability and emission probability).

In prediction, the labeling function (output function) acquire parameters in the HMM to make predictions of the labeling of the new word sequences.

Origin of Name: Probability Distribution

"Distribution" indicates that the sum of probability "1" is divided and distributed into the probability of each random variable.

Grammar Study Note, Part One

  1. Pos tagging indicates the property of the word itself.
  2. Dependency tagging indicates the function (or relative information) of the word.
  3. Both N-gram model and PCFG model assumes the a Markov property and then applies the conditional probability.

Dependency Grammar Notions

  1. Subject-Predicate relationship
    subject: The person or thing about whom the statement is made
    predicate: The purpose of the predicate is to complete an idea about the subject

Part of Speech Tagging

  1. Ambiguity:
    Local Preference -> The probability of a part-of-speech tag for a specific word in the vocabulary
    Contextual Preference -> The probability of a part-of-speech tag for a specific word in given context.

Java Multi-threading

How to Write Multi-threading Code ?

The multi-threading programming of Java is achieved through the use of Thread object.
1. Declare a class to be the subclass of Thread class and overrides its run method.
2. Declare a class and implement the Runnable interface, then implement the run method. (Recommended)

The create of a new thread requires that a Thread object to be created with a Runnable object given as the first argument. To start running a new thread, a thread.start() method is provided. Also, thread.join() method is provided to synchronize states between different threads.


Static class members are shared by all threads (atomic w/r). Keyword: Synchronized is provided to ensure the all threads execute certain methods sequentially.

Class method members (as well as local variables in the method definition) is independent for each threads.

Context-Free Grammar Study Note, Part Two

Challenges of Parsing CFG Correctly

Syntactic Ambiguity

  1. Part-of-speech Ambiguity
    : The grammar can assign multiple part-of-speech tags (word categories) to a lexicon
  2. Structural Ambiguity
    : The grammar can assign more than one possible parse to a sentence

    • Attachment Ambiguity
      : A particular constituent can be attached to the parse tree at more than one place
    • Coordination Ambiguity
      : The reference of a lexicon has multiple choices
    • Local Ambiguity
      : Some part of the sentence has more that one parse while the whole sentence is not ambiguous
    • etc
      : ...

Phrasal Nodes

  1. a phrasal node is a non-terminal node

CFG Tree v.s. Dependency Grammar Tree

Strucutrally cfg tree has little difference with dp tree.
1. In cfg tree, the internal nodes are phrasal nodes (non-lexicalized). In dp tree, the internal nodes are by definition lexicalized.
2. Both cfg and dp tree has pos tags as the next-to-leaf node. For dp tree this pos tag can be replaced by synthetic parts-of-speech (subject, object, predicate, etc).

The Meaning of Arithmetic Operations and Mathematical Terms

  1. Addition
    The amount of adder and addee combined

  2. Multiplication
    A repeated addition

  3. Exponentiation
    A repeated multiplication

  4. Multinomial
    An algebraic expression constructed by the sum of multiple mathematical terms (could be exponential with a degree of 1, 2, 3, etc with any base).

  5. Polynomial
    A special case of Multinomial expression. It is constructed by the sum of powers of one or more variables multiplied by coefficients.

  6. Algebraic Expression
    An expression in which a finite number of symbols is combined using only the operations of addition, subtraction, multiplication, division and exponentiation with constant rational exponent.

  7. Term
    The components in an expression connected by the addition or subtraction.

  8. Factor
    The elements in a term, which is connected by multiplication

  9. Coefficient
    The constant in a term, which is also a factor.