Hidden Markov Model

The Assumption Made by HMM

The assumption is that for every random variable in the (conditional) probability chain, the conditioning is only on the previous n variables in the sequence. Put another way, the conditional probability of each variable is independent of all variables other than the previous n.
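Written out, with x_1, ..., x_T as the chain of random variables (notation mine), the n-th order Markov assumption is:

P(x_t | x_1, ..., x_{t-1}) = P(x_t | x_{t-n}, ..., x_{t-1})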

HMM in NLP

The tagging problem can be abstracted as modeling the joint probability of two sequences: the sentence (word) sequence and the tag sequence. In an HMM approach to this joint probability, the tag sequence is modeled (approximated) as a Markov chain, and each word in the sentence is modeled as an independently occurring event conditioned only on the tag at the corresponding position.
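Concretely, writing the word sequence as x_1, ..., x_T and the tag sequence as y_1, ..., y_T (notation mine, with q for the transition distribution, e for the emission distribution, and y_0 a special start tag), a first-order HMM factorizes the joint probability as:

p(x_1, ..., x_T, y_1, ..., y_T) = ∏_{t=1}^{T} q(y_t | y_{t-1}) · ∏_{t=1}^{T} e(x_t | y_t)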

Generative or Discriminative

HMM is by definition a generative model, because it models the sequences with a joint probability rather than a conditional probability.
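In symbols: a generative model estimates the joint probability p(x, y), while a discriminative model estimates the conditional probability p(y | x) directly. The conditional can still be recovered from the joint via p(y | x) = p(x, y) / p(x).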

Interpretation from ML Perspective

Training an HMM yields a probabilistic model that cannot output the target labeling directly. Instead, a labeling function has to be defined in addition to the HMM probabilistic model. The training set of the HMM consists of samples, each a pair of a word sequence and a POS tag sequence.
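A minimal sketch of what such a training set could look like (the sentences, tags, and the variable name `training_data` are made-up illustrations, not from the original):

```python
# Each training sample pairs a word sequence with its POS tag sequence.
training_data = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"],  ["DET", "NOUN", "VERB"]),
]
```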

The training process is essentially a counting process in which the statistical properties of the label sequences (POS tags) are estimated; the conditional probability of each word given its tag is estimated as well. These estimates then become the parameters of the HMM (the transition probabilities and the emission probabilities).
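A minimal sketch of this counting process, assuming the `training_data` format above and plain maximum-likelihood estimates (a real implementation would add smoothing for unseen words and transitions):

```python
from collections import defaultdict

def train_hmm(training_data):
    """Estimate transition and emission probabilities by counting."""
    transition_counts = defaultdict(lambda: defaultdict(int))
    emission_counts = defaultdict(lambda: defaultdict(int))
    tag_counts = defaultdict(int)

    for words, tags in training_data:
        prev_tag = "<s>"  # special start tag
        for word, tag in zip(words, tags):
            transition_counts[prev_tag][tag] += 1
            emission_counts[tag][word] += 1
            tag_counts[tag] += 1
            prev_tag = tag

    # Normalize the counts into conditional probabilities.
    transition = {
        prev: {t: c / sum(nxt.values()) for t, c in nxt.items()}
        for prev, nxt in transition_counts.items()
    }
    emission = {
        tag: {w: c / tag_counts[tag] for w, c in word_counts.items()}
        for tag, word_counts in emission_counts.items()
    }
    return transition, emission
```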

In prediction, the labeling function (output function) uses the parameters of the HMM to predict the labeling of new word sequences.
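A minimal sketch of such a labeling function using Viterbi decoding, assuming the `transition` and `emission` dictionaries produced above (unseen words and transitions get probability 0 here; real implementations would smooth):

```python
def viterbi(words, transition, emission):
    """Return the most probable tag sequence for `words`."""
    tags = list(emission.keys())
    # best[t][tag] = (probability, backpointer) of the best path
    # ending in `tag` at position t.
    best = [{}]
    for tag in tags:
        p = transition.get("<s>", {}).get(tag, 0.0) * \
            emission[tag].get(words[0], 0.0)
        best[0][tag] = (p, None)

    for t in range(1, len(words)):
        best.append({})
        for tag in tags:
            p, prev = max(
                (best[t - 1][pt][0]
                 * transition.get(pt, {}).get(tag, 0.0)
                 * emission[tag].get(words[t], 0.0), pt)
                for pt in tags
            )
            best[t][tag] = (p, prev)

    # Trace the best path back from the final position.
    tag = max(best[-1], key=lambda k: best[-1][k][0])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return list(reversed(path))
```

With the toy training data above, `viterbi(["the", "dog", "barks"], transition, emission)` returns `["DET", "NOUN", "VERB"]`.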

Origin of Name: Probability Distribution

"Distribution" indicates that the sum of probability "1" is divided and distributed into the probability of each random variable.