Python Programming Notes

Language Features

  1. Context
    The variables defined within a context can be accessed outside of the cotext.
    The context is used to apply automatic operations.

Escape Depth

  1. Python Interpreter will escape special characters only once !
    \n ---> r"\n"

  2. Unicode Escape Characters
    Python will treat \x0343 as unicode string when printing to console or write to text file.

Python will treat \x0343 as ASCII characters when printing to console or write to text file (!!! Due to The Escape Only Once Principle)

  1. Unnecessary Escapes can be removed by line = line.replace(r"\", "\")

Python Json String "()"

(1,2,3) ---> String Repr: "(1, 2, 3)"

Tensorflow Study Note, Part Three


  1. Interactive Session

tf.InteractiveSession() creates a default session that can be used without explictly called in a IPython environment


a = tf.constant(1)

  1. Regular Session

Regular session needs to be run within a python context or through session object

Example 1:

a = tf.constant(1)
sess = tf.Session()

Example 2:

a = tf.constant(1)
with tf.Session():


This function is used to add bias to the input tensor (element-wise addition between "bias" vector and the feature vector)

It should be noted that the bias added here is completed different from the normal concept of adding a bias to the hidden unit (summed weight before activation)


The run function of tf.Session provides a interface to execute provided TF operations and evaluate Tensors.

Feature Preprocessing

Continuous features can be feed into the first hidden layer of the neural network directly. Discrete features are recommended to go through an embedding layer.


This API accepts a [batch_size, doc_length] tensor of type "int32" or "int64"


This API is used to loopup an embedding using ID.

Categorial to Embedding Pipeline

Categorial Label ---> Ordinal ID ---> One-hot Embedding ----> Dense Embedding

About Immigration

The Cause of Immigration

The life quality of western countries is far better than that of China in terms of economical, environmental and social conditions. I found myself increasingly believe in western cultures and values and that is the primary driving force of my immigration thoughts. I need to improve my language skill as well as professional experiences to get ready for the oppertunity. Meanwhile, I need to take it very carefully to examine this thought to confirm it is a good fit for me and my family.

I need a carefully planning to merge my immigration effort with the long-term paradigm shift of my family earning, that is, to shift from work-based earning to asset-based earning. (Asset here is something that can generate revenue without extra mental/physical labor)

Feature Embedding of Categorial Values

Word Embedding

  1. The embedding of a word is the hidden layer ouput (a vector) got when feeding the one-hot vecotr (input) of the word.
  2. The target value in the training data carries sequential information (word sequence in a sentence)
  3. The hidden layer weight matrix is the word vector lookup table
  4. Word embedding is a by-product of language modelling

Embedding Layer

  • Embedding layer is just a linear layer that transforms one-hot input into embedding matrix (look-up table)

Categorial Embedding

  1. The embedding of categorial data is obtained in model training, just as that of word embedding.
  2. The target value carries prediction information (predicted value), which is opposed to the sequential information in word embedding

Pre-trained Embedding

  • Pre-trained embedding can be used in new model training to improve performance (both accuracy and speed)

Ordinal Values

  1. Ordinal Values that carries information but in discrete (by definition) values should be normalized to 0-1 (or -1 - 1) scale as the input for DNN

Two Sigma Competition


The dataset of Two Sigma Competition is a h5 file. It can be read into a 1,710,756 rows x 111 columns pandas dataframe.

All datapoints are identified by two attributes combined: id + timestamp, both of which are not unique. Id is a financial security and timestamp indicates the time of quote (Y).

In the dataset there are 1424 different ids and 1813 different timestamps. In general, the data point is sampled in a fixed timestamp interval of 750.