Feature Embedding of Categorical Values

Word Embedding

  1. The embedding of a word is the hidden-layer output (a vector) obtained by feeding the word's one-hot vector into the network as input.
  2. The target value in the training data carries sequential information (the order of words in a sentence).
  3. The hidden-layer weight matrix serves as the word-vector lookup table.
  4. Word embedding is a by-product of language modelling.
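
As a rough illustration of point 2, the sketch below builds (input, target) training pairs for a next-word language model from a toy sentence. The sentence, the `vocab` mapping, the `one_hot` helper, and the next-word objective are all assumptions chosen for illustration; other language-modelling objectives would produce different pairs.

```python
import numpy as np

# Toy corpus (hypothetical); any tokenized sentence would work.
sentence = "the cat sat on the mat".split()

# Assign each distinct word an integer index, in order of first appearance.
vocab = {}
for word in sentence:
    vocab.setdefault(word, len(vocab))

def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec

# Each training pair is (current word, next word): the target encodes word order.
pairs = [(vocab[a], vocab[b]) for a, b in zip(sentence, sentence[1:])]
inputs = np.stack([one_hot(i, len(vocab)) for i, _ in pairs])
targets = np.array([t for _, t in pairs])
```

The one-hot rows in `inputs` are what the network sees, while `targets` holds the index of the word that follows, which is where the sequential information enters training.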

Embedding Layer

  • An embedding layer is just a linear layer that maps a one-hot input to its embedding vector; the layer's weight matrix is the embedding matrix (look-up table)
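
A minimal NumPy sketch of this equivalence, assuming a bias-free linear layer: multiplying a one-hot vector by the weight matrix selects one row of that matrix, which is the same as indexing the look-up table directly. The sizes and the names `W`, `via_matmul`, and `via_lookup` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 3

# Weight matrix of the linear (embedding) layer: one row per category.
W = rng.normal(size=(vocab_size, embed_dim))

index = 2
one_hot = np.zeros(vocab_size)
one_hot[index] = 1.0

via_matmul = one_hot @ W   # linear layer without bias
via_lookup = W[index]      # direct row look-up

same = np.allclose(via_matmul, via_lookup)
```

Because the two results agree, practical implementations can skip the matrix product and index the table by integer ID.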

Categorical Embedding

  1. The embedding of categorical data is learned during model training, just like a word embedding.
  2. The target value carries prediction information (the value to be predicted), as opposed to the sequential information used for word embedding.
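
The two points above can be sketched with a tiny model trained by plain gradient descent: an embedding table `E` is looked up by category ID, passed through a linear output `w`, and fitted to a prediction target. The data, dimensions, learning rate, and squared-error loss are assumptions made for illustration, not a recommended setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 categories, each with a fixed target value to predict.
categories = np.array([0, 1, 2, 0, 1, 2])
y = np.array([1.0, -1.0, 0.5, 1.0, -1.0, 0.5])

n_categories, embed_dim = 3, 2
E = rng.normal(scale=0.1, size=(n_categories, embed_dim))  # embedding table
w = rng.normal(scale=0.1, size=embed_dim)                   # output weights

def mse(E, w):
    """Mean squared error of the predictions E[categories] @ w against y."""
    return float(np.mean((E[categories] @ w - y) ** 2))

initial_loss = mse(E, w)
lr = 0.1
for _ in range(2000):
    emb = E[categories]                      # look up embeddings, shape (N, embed_dim)
    err = emb @ w - y                        # prediction error, shape (N,)
    grad_w = 2 * emb.T @ err / len(y)        # gradient w.r.t. output weights
    grad_emb = 2 * np.outer(err, w) / len(y)  # gradient w.r.t. each looked-up row
    grad_E = np.zeros_like(E)
    np.add.at(grad_E, categories, grad_emb)  # accumulate per-category gradients
    w -= lr * grad_w
    E -= lr * grad_E
final_loss = mse(E, w)
```

After training, the rows of `E` are the learned category embeddings; they are shaped only by the prediction target, not by any ordering of the inputs.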

Pre-trained Embedding

  • A pre-trained embedding can be reused when training a new model to improve performance (both accuracy and training speed)
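
One common way to reuse a pre-trained embedding is to initialize the new model's embedding table from it, sketched below. The `pretrained` dictionary, its vectors, and the `vocab` list are hypothetical; in practice the vectors would be loaded from an earlier model or a published set.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 3

# Hypothetical pre-trained vectors, e.g. loaded from an earlier model.
pretrained = {
    "red": np.array([0.9, 0.1, 0.0]),
    "green": np.array([0.1, 0.9, 0.0]),
}

# Vocabulary of the new task; "blue" has no pre-trained vector.
vocab = ["red", "green", "blue"]

# Start from small random values, then overwrite rows that have pre-trained vectors.
E = rng.normal(scale=0.01, size=(len(vocab), embed_dim))
for row, token in enumerate(vocab):
    if token in pretrained:
        E[row] = pretrained[token]
```

Whether `E` is then kept frozen or fine-tuned along with the rest of the model is a separate design choice that depends on the task and the amount of data.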

Ordinal Values

  1. Ordinal values, which carry order information in discrete (by definition) values, should be normalized to a 0 to 1 (or -1 to 1) scale before being used as DNN input
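
A short sketch of this normalization, assuming the levels are evenly spaced in rank: each level is mapped to its integer rank and then min-max scaled. The level names and sample values are made up for illustration.

```python
import numpy as np

# Hypothetical ordinal feature with a known order of levels.
levels = ["low", "medium", "high", "very high"]
rank = {name: i for i, name in enumerate(levels)}

values = ["medium", "low", "very high", "high"]
ranks = np.array([rank[v] for v in values], dtype=float)

# Min-max scaling to [0, 1], using the full range of defined levels.
unit = ranks / (len(levels) - 1)

# Alternative: rescale to [-1, 1].
symmetric = 2.0 * unit - 1.0
```

Scaling by the number of defined levels rather than by the observed minimum and maximum keeps the mapping stable even when a batch does not contain every level.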