Math Notation Conventions

Superscript and Subscript

A superscript refers to the index of a training example.
A subscript refers to the index of a vector element.

Sequence Notation

A sequence may be named or referred to by an upper-case letter such as "A" or "S". The terms of a sequence are usually named something like "a_i" or "a_n", with the subscripted letter "i" or "n" being the "index" or counter. So the second term of a sequence might be named "a_2" (pronounced "ay-sub-two"), and "a_12" would designate the twelfth term.

The sequence can also be written in terms of its terms. For instance, the sequence of terms a_i, with the index running from i = 1 to i = n, can be written as:

    {a_i}, i = 1, ..., n, that is, a_1, a_2, ..., a_n

Hat Operator

In statistics, the hat is used to denote an estimator or an estimated value, as opposed to its theoretical counterpart. For example, in the context of errors and residuals, the "hat" over the letter ε indicates an observable estimate (the residuals) of an unobservable quantity called ε (the statistical errors).
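
To make the distinction concrete, here is a minimal Python sketch. It assumes a made-up linear model y = 1 + 2x + ε (the model, the sample size, and the seed are illustrative choices, not from these notes). In a simulation the errors ε are known; in practice only the residuals (ε with a hat) can be computed.

```python
import random

rng = random.Random(0)

# Assumed "true" model, unknown in practice: y = 1 + 2x + error.
xs = [float(i) for i in range(20)]
errors = [rng.gauss(0.0, 1.0) for _ in xs]          # statistical errors (unobservable in practice)
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, errors)]

# Ordinary least squares estimates: the "hatted" slope and intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope_hat = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
intercept_hat = mean_y - slope_hat * mean_x

# Residuals: observable estimates of the errors, computed from the fitted line.
residuals = [y - (intercept_hat + slope_hat * x) for x, y in zip(xs, ys)]

print(slope_hat, intercept_hat)
print(errors[0], residuals[0])  # close to each other, but not identical
```

The residuals differ from the errors because the fitted line uses estimated parameters rather than the true ones.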

Prime Notation

Derivative Operator

Linux Tool Note

  1. TCP/UDP Connection Test
    nc -zv {host} 25331

  2. Show all processes
    ps aux | less

  3. Extract a gzip-compressed tar archive
    tar -zxvf {file.tar.gz}

  4. List all hardware information

  5. Grep Multiple Patterns
    grep -E '123|abc' filename

  6. Show CPU Information
    cat /proc/cpuinfo

  7. Check if a Process Exists
    ps -ef | grep deplearning

  8. Server Benchmark
    ab -c 1000 -n 50000 http://localhost:8080/

  9. View the system log
    tail -f /var/log/syslog

  10. Show directory size
    du -hs /path/to/directory

Word Embedding

Two ways of modeling sentences

  1. s = [x, y, z];
    x, y, and z represent three slots (positions) in the sentence.
    The vector has three dimensions.

  2. s = [x0, x1, x2, ... xn]
    Each xi corresponds to one word in the vocabulary; its value indicates whether that word occurs in the sentence.
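
A minimal Python sketch of the two representations. The toy vocabulary and the three-word sentence are made up for illustration and are not part of these notes.

```python
# Hypothetical toy vocabulary and sentence, chosen only for illustration.
vocabulary = ["cat", "dog", "eats", "fish", "sleeps"]
sentence = ["cat", "eats", "fish"]

# Way 1: one slot per position in the sentence (here three slots x, y, z).
slots = list(sentence)

# Way 2: one entry per vocabulary word; 1 if the word occurs in the sentence, else 0.
presence = [1 if word in sentence else 0 for word in vocabulary]

print(slots)     # ['cat', 'eats', 'fish']
print(presence)  # [1, 0, 1, 1, 0]
```

The first form keeps word order but its length depends on the sentence; the second has a fixed length (the vocabulary size) but discards order.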

Probability and Statistics Reviews, Part Two

"Or" Rule

The probability that Event A or Event B happens is P(A) plus P(B), minus the probability that A and B happen at the same time.

P(A or B) = P(A) + P(B) - P(A and B)
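
A small check of the rule in Python, assuming one roll of a fair six-sided die (the events A and B below are illustrative choices):

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

lhs = prob(A | B)                       # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)   # right-hand side of the rule
print(lhs, rhs)  # 2/3 2/3
```

Without the subtraction, the outcomes 4 and 6 would be counted twice.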

"Multiplication Rule"

The probability that Event A and Event B both happen is the product of the conditional probability of A given B and the probability of B.
P(A and B) = P(A | B) * P(B)
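
As a worked example of the multiplication rule, the sketch below assumes two cards drawn without replacement from a standard 52-card deck, and cross-checks the product against a brute-force count. The card scenario is an illustration, not from these notes.

```python
from fractions import Fraction
from itertools import permutations

# Assumed example: draw two cards without replacement from a 52-card deck.
# B = "first card is an ace", A = "second card is an ace".
deck = ["ace"] * 4 + ["other"] * 48

p_b = Fraction(4, 52)          # P(B)
p_a_given_b = Fraction(3, 51)  # P(A | B): 3 aces left among 51 cards
p_a_and_b = p_a_given_b * p_b  # multiplication rule

# Cross-check by enumerating every ordered pair of distinct cards.
pairs = list(permutations(range(52), 2))
both_aces = sum(1 for i, j in pairs if deck[i] == "ace" and deck[j] == "ace")
print(p_a_and_b, Fraction(both_aces, len(pairs)))  # 1/221 1/221
```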

The Law of Total Probability

The law of total probability states that if {Bn: n = 1, 2, 3, ...} is a finite or countably infinite partition of a sample space and each event Bn is measurable, then for any event A of the same probability space:
P(A) = SUM_n P(A and Bn) = SUM_n P(A | Bn) * P(Bn), where n runs over the whole partition

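The law of total probability combines the "Or" Rule and the "Multiplication Rule": the events (A and Bn) are mutually exclusive, so the "Or" Rule reduces to a plain sum, and the "Multiplication Rule" expands each term into P(A | Bn) * P(Bn).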
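
A minimal numeric sketch, assuming a two-event partition (two hypothetical machines, B1 and B2, that together produce every item) and made-up defect rates:

```python
from fractions import Fraction

# Assumed example: two machines B1 and B2 partition all produced items.
p_b = {"B1": Fraction(3, 5), "B2": Fraction(2, 5)}             # P(Bn)
p_a_given_b = {"B1": Fraction(1, 100), "B2": Fraction(1, 20)}  # P(A | Bn), A = "item is defective"

assert sum(p_b.values()) == 1  # the Bn must cover the whole sample space

# Sum P(A | Bn) * P(Bn) over the partition.
p_a = sum(p_a_given_b[n] * p_b[n] for n in p_b)
print(p_a)  # 13/500
```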

Independence Test

Two events A and B are independent if P(A and B) = P(A) * P(B).
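
The test can be applied mechanically. The sketch below assumes a fair six-sided die and three illustrative events; exact fractions avoid floating-point comparison issues.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

def independent(a, b):
    """Independence test: P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
greater_than_3 = {4, 5, 6}

print(independent(even, at_most_4))       # True:  1/3 == 1/2 * 2/3
print(independent(even, greater_than_3))  # False: 1/3 != 1/2 * 1/2
```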

EM Algorithm

In the E-step, the missing data are estimated by conditional expectation. In the M-step, the non-hidden parameters are estimated by maximum likelihood estimation (MLE).

  1. E-step
    P(W0 | xi) = a, P(W1 | xi) = b
    E(W) = aW0 + bW1
    if |E(W) - W0| < |E(W) - W1|, then W = W0; otherwise W = W1.
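
The E-step rule above can be sketched in Python as follows. The function name and the numbers are hypothetical. Note that this mirrors the hard assignment described in these notes; many presentations of EM keep the soft weights a and b instead of rounding to W0 or W1.

```python
def e_step_assign(a, b, w0, w1):
    """Return (E(W), assigned value) for one observation xi.

    a = P(W0 | xi), b = P(W1 | xi); w0 and w1 are the two candidate values.
    """
    expected_w = a * w0 + b * w1
    # Assign the candidate value that lies closer to the expectation.
    assigned = w0 if abs(expected_w - w0) < abs(expected_w - w1) else w1
    return expected_w, assigned

# Hypothetical numbers: P(W0 | xi) = 0.75, P(W1 | xi) = 0.25, W0 = 0.0, W1 = 1.0.
expected_w, assigned = e_step_assign(0.75, 0.25, 0.0, 1.0)
print(expected_w, assigned)  # 0.25 0.0
```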

Conditional Expectation

E(Y | X = x) = SUM over y of y * P(Y = y | X = x)
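
A small sketch of this formula for discrete variables, assuming a made-up joint distribution stored as a dictionary (the table values and the function name are illustrative):

```python
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y), keyed by (x, y).
joint = {
    (0, 1): Fraction(1, 4),
    (0, 2): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (1, 3): Fraction(3, 8),
}

def conditional_expectation(joint, x):
    """E(Y | X = x) = sum over y of y * P(Y = y | X = x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # P(X = x)
    # P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)
    return sum(y * p / p_x for (xi, y), p in joint.items() if xi == x)

print(conditional_expectation(joint, 0))  # 3/2
print(conditional_expectation(joint, 1))  # 5/2
```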

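Covariance vs. Correlation

Similar to Variance vs. Standard Deviation: correlation is covariance rescaled by the standard deviations, Corr(X, Y) = Cov(X, Y) / (SD(X) * SD(Y)), just as standard deviation is variance brought back to the original scale.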
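
One way to see the difference is that covariance changes with the units of the data while correlation does not. The sketch below uses population formulas and made-up data; the helper names are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Population covariance of two equally long lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
ys_scaled = [10.0 * y for y in ys]  # same data in different units

print(covariance(xs, ys), correlation(xs, ys))                # 2.5 1.0
print(covariance(xs, ys_scaled), correlation(xs, ys_scaled))  # 25.0 1.0
```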

How to Estimate the Parameters of a Statistical Model

A statistical model can take the form of an explicit algebraic expression with parameters.
Alternatively, a model can contain no algebraic expression at all, only conditional, joint, or other probability values (called free parameters). These probability values can be thought of as the result of sampling the subject population.

The true probability of event A, PA(X=a), can be estimated by repeating the random experiment (the random process) many times: PA(X=a) ~= (number of times A occurs) / (number of trials). The limit of this ratio as the number of trials grows is the true probability of event A.
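
A simulation sketch of this frequency estimate, assuming a fair die and the event "the roll is a six" (the function names, the seed, and the trial counts are illustrative choices):

```python
import random

def roll_die(rng):
    """One run of the random experiment: roll a fair six-sided die."""
    return rng.randint(1, 6)

def is_six(outcome):
    """The event A: the roll shows a six."""
    return outcome == 6

def estimate_probability(event, experiment, trials, seed=0):
    """Estimate P(event) as (times the event occurs) / (number of trials)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(experiment(rng)))
    return hits / trials

for trials in (100, 10_000, 200_000):
    print(trials, estimate_probability(is_six, roll_die, trials))
```

With more trials the printed ratio should settle near 1/6, although any single run only approximates it.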

Distribution and Set

A binomial random experiment consists of several independent Bernoulli random experiments with the same success probability.
A random subset of a sample drawn from a distribution follows the same distribution.
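
The relation between the two experiments can be sketched by building a binomial draw out of Bernoulli draws. The parameters n = 10 and p = 0.3 are arbitrary illustrative values.

```python
import random

def bernoulli(p, rng):
    """One Bernoulli experiment: 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0

def binomial(n, p, rng):
    """One binomial experiment: the number of successes in n Bernoulli experiments."""
    return sum(bernoulli(p, rng) for _ in range(n))

rng = random.Random(0)
samples = [binomial(10, 0.3, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be close to n * p = 3
```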

Random Variable, Probability and Distribution

A random variable, and the probability that it takes a certain value (an event), refer to a specific random experiment.

The distribution, in contrast, describes the overall trend of the subject (the population, the sample set).

P(X) and P(X=x0)

P(X) is the PDF or PMF of a distribution. P(X=x0) is the probability that the random variable X takes the value x0.
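
In code, the distinction is between a function and one of its values. The sketch below assumes a binomial distribution with n = 4 and p = 0.5 as the example; `binomial_pmf` is an illustrative helper, not a library function.

```python
from math import comb

def binomial_pmf(n, p):
    """Return the PMF of a Binomial(n, p) variable as a function of k."""
    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)
    return pmf

pmf = binomial_pmf(4, 0.5)            # P(X): the whole distribution
print(pmf(2))                         # P(X = 2): a single number, 0.375
print(sum(pmf(k) for k in range(5)))  # the PMF sums to 1.0 over all values
```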