Context-Free Grammar Study Note, Part One

Preface

Although all discussions here focus on modern English, the grammar formalisms apply to many other natural languages.

  1. Dependency Grammar (Subject, Predicate, Object)
  2. Constituency Grammar (NP, VP)
  3. Others - subcategorization, etc. (e.g., a verb followed by an infinitive)

Among them, constituency grammar is universally modeled by context-free grammar (CFG), also called phrase-structure grammar. The CFG formalism is equivalent to Backus-Naur Form (BNF).

Relationships that a Parser Models

  1. Constituency Relationships (Constituent Relation)
  2. Grammatical Relationships (Lexical Relation)

Parsing With CFG (Context-Free Grammars)

  1. Top-down (goal-directed search, rationalist tradition)
  2. Bottom-up (data-directed search, empiricist tradition)

Data Structure of Parse Tree (CFG tree)

Implemented as a multi-way tree via an OOP mechanism. Each internal node represents a non-terminal of the CFG; each leaf node represents a terminal. The value of each node is the symbol of the non-terminal/terminal it represents (NP, VP, prefer, etc.), and the pointers of each node point to its descendants, down to the leaf nodes (non-terminal/terminal pairs). A leaf node can be viewed as a 2-tuple that represents the production rule from a non-terminal to a terminal.
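
A minimal sketch of such a multi-way tree node in Python; the class and attribute names are illustrative assumptions, not taken from any particular parser:

    class ParseTreeNode:
        """A node of a CFG parse tree (illustrative sketch)."""

        def __init__(self, symbol, children=None):
            self.symbol = symbol            # non-terminal (NP, VP) or terminal (prefer)
            self.children = children or []  # pointers to descendant nodes

        def is_leaf(self):
            # Terminals have no children; a pre-terminal with a single
            # terminal child encodes the (non-terminal, terminal) production.
            return not self.children

    # (NP (PRP I))
    tree = ParseTreeNode("NP", [ParseTreeNode("PRP", [ParseTreeNode("I")])])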

Serialization of a Parse Tree

parse tree = (ROOT (S (NP (PRP I)) (VP (VBP love) (NP (PRP you)))))

The format resembles a JSON string that contains only nested lists, with all commas replaced by whitespace, double quotation marks removed, and (optionally) square brackets replaced by parentheses.

In any internal node (within a bracket), only one symbol (a non-terminal symbol, not enclosed by any bracket) can be present, and it is the value of this node. The children of an internal node are all the bracketed blocks that lie at the same level inside it. Every bracketed block is itself a parse tree (possibly not originating from the root).
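
A minimal sketch of deserializing such a string back into nested lists, assuming the whitespace-separated, parenthesized format above:

    def parse_sexpr(s):
        """Parse '(ROOT (S (NP (PRP I)) ...))' into nested Python lists."""
        tokens = s.replace("(", " ( ").replace(")", " ) ").split()

        def read(pos):
            if tokens[pos] == "(":
                node = []
                pos += 1
                while tokens[pos] != ")":
                    child, pos = read(pos)
                    node.append(child)
                return node, pos + 1  # skip the closing ")"
            return tokens[pos], pos + 1  # a bare symbol

        tree, _ = read(0)
        return tree

    parse_sexpr("(ROOT (S (NP (PRP I)) (VP (VBP love) (NP (PRP you)))))")
    # ['ROOT', ['S', ['NP', ['PRP', 'I']], ['VP', ['VBP', 'love'], ['NP', ['PRP', 'you']]]]]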

Parse Tree and Part-of-Speech Tagging

The non-terminal in the leaf node is the part-of-speech tag of the terminal.

Why Not Context-sensitive Grammar In NLP Application?

Context-sensitive grammar is non-deterministic, which means the production process may not converge.

CFG Data Structure

  1. Search Space Forest
    Search Space -> Search Space Forest (Search Tree in Manning's book)

The search space forest is constructed from all possible partial parse trees and complete parse trees; the partial parse trees act as the internal nodes and the complete parse trees act as the leaf nodes (of the tree data structure, i.e., the search space forest).

Note: the search space forest is actually an AND-OR tree

  2. Parse Tree
    Parse Tree -> Syntax Parse Tree (Complete Syntax Tree)
    Partial Parse Tree (Incomplete Syntax Tree)

The Function of Part-of-Speech Tagging

PoS tags carry both syntactic and semantic meaning. Syntactically, a noun is usually followed by a verb; semantically, a noun typically refers to people, animals, concepts, and things.

Probability and Statistics Reviews, Part One

Classic Probability Problem - Standard Procedures

  1. Define simple events (basic events) based on the given information
  2. Abstract the problem to be solved into a compound event
  3. Solve the problem based on classic probability equations (see the sketch below)
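
A minimal sketch of this three-step procedure in Python; the two-dice scenario is an illustrative example, not from the original notes:

    from fractions import Fraction
    from itertools import product

    # Step 1: simple events -- every ordered outcome of tossing two dice
    sample_space = list(product(range(1, 7), repeat=2))

    # Step 2: the compound event -- "the two dice sum to 7"
    event = [o for o in sample_space if sum(o) == 7]

    # Step 3: classic (equally likely) probability: |A| / |S|
    print(Fraction(len(event), len(sample_space)))  # 1/6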

Random Experiment

The concept of a random experiment arises from the assumption that the event to be observed is described on an occurring-or-not basis. Many real-world probability problems, however, do not describe themselves in this way. Such problems can be conceptually transformed into the occurring-and-observing pattern.

Random Variable

A random variable is a mapping from the outcome of a random experiment to a real value.
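
For instance, a trivial Python sketch (the coin-toss scenario and the 0/1 coding are illustrative assumptions):

    import random

    # The random experiment: toss a fair coin.
    outcome = random.choice(["head", "tail"])

    # The random variable X maps each outcome to a real value.
    X = {"head": 1.0, "tail": 0.0}[outcome]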

Statistics and Probability

Statistics is the necessary procedure one takes to reach a probability (when the population is known) or a probability estimate (when only a sample is available).

Relative Frequency

Phrase "Relative" here refers to the fact that in any experiment that is infinitely repeatable, the absolute frequency (proportion ratio) cannot be obtained.

Simple Event (Elementary Event)

A random experiment together with one outcome (e.g., a coin toss with a head on top).

In contrast, compound events have multiple outcomes. An elementary event can be a process with outcomes that are "complex"; for example, toss three dice and get 1, 3, 2 on top. In this case, the elementary event A is constructed from three elementary events in a related probability space (tossing only one die and observing the outcome).

Coin toss with a head on top and coin toss with a tail on top are two simple events.
Simple and compound events are relative concepts: when no lower-level events are defined, a complex event may be treated as a simple event when analyzing certain problems.

Sample Space (One Element of Probability Space)

A sample space is a set, denoted as S, which enumerates each and every possible outcome or simple event (the event condition together with its outcomes).

Any event (to be observed) covers part of the sample space; that is, the event is a subset of the sample space.

The conditional probability equation P(A|B) = P(AB)/P(B) should only be applied when P(A), P(B), and P(AB) reflect the probabilities of events under the same sample space.

Distribution Mixture

N1 = the population size of distribution 1
N2 = the population size of distribution 2
P3(x) = (P1(x)·N1 + P2(x)·N2) / (N1 + N2)
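
A minimal numeric sketch of this formula in Python; the two component distributions and population sizes are made-up examples:

    def mixture_pmf(p1, p2, n1, n2):
        """Pointwise mixture P3(x) = (P1(x)*N1 + P2(x)*N2) / (N1 + N2)."""
        return lambda x: (p1(x) * n1 + p2(x) * n2) / (n1 + n2)

    p1 = {0: 0.5, 1: 0.5}.get  # distribution 1 over {0, 1}
    p2 = {0: 0.9, 1: 0.1}.get  # distribution 2 over {0, 1}
    p3 = mixture_pmf(p1, p2, n1=100, n2=300)
    print(p3(0))  # (0.5*100 + 0.9*300) / 400 = 0.8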

Tensorflow Study Note, Part Two

The creation of tensor objects under the tensorflow framework.

  • Tensors used as the input of a computation graph are generated through the tf.constant() and tf.Variable() constructors.
  • Basic arithmetic between tensors in a given computation node (op) must be done through specialized operations such as tf.add, tf.mul, etc.
  • The operations on tensors (and the entire graph) are executed on a specified device (CPU or GPU).
  • Executing the computation graph returns a numpy ndarray (see the sketch after this list).
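
A minimal sketch of these steps, assuming the TF 1.x-era graph-and-session API that these notes describe:

    import tensorflow as tf

    a = tf.constant([[1.0, 2.0]])    # input tensor via tf.constant()
    b = tf.Variable([[3.0, 4.0]])    # stateful tensor via tf.Variable()
    y = tf.add(a, b)                 # arithmetic through a specialized op; this only builds the graph

    with tf.Session() as sess:       # execution phase on a chosen device
        sess.run(tf.global_variables_initializer())
        result = sess.run(y)         # evaluating the graph returns a numpy ndarray
        print(type(result), result)  # <class 'numpy.ndarray'> [[4. 6.]]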

The programming style of the tensorflow framework.

A complete computation process is defined by a series of computation steps; the output of one step becomes the input of the next step.
In short, TF adopts a functional programming style.

TF does not require a "graph" object to be explicitly constructed; rather, the graph is built through a series of function calls. All functions (ops) defined in the graph are carried implicitly by the tensors they return. The session's run method, in the execution phase, will evaluate all the function calls carried by the tensor passed to it.

Matrix Representation

In tensorflow, a matrix is represented by a two-dimensional list or a numpy array. A row vector is represented by a nested list; a column vector is represented by the transpose of a row vector. A vector must be represented in matrix form ([[1, 2]]) instead of as a plain list.

Under the notation of TF, a matrix (tensor) is a column vector composed of several row vectors.

The number of rows is determined by the length of the outer list; the number of columns is determined by the length of the inner lists.
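
A minimal sketch of these conventions with numpy (which tf.constant accepts directly):

    import numpy as np

    row = np.array([[1, 2]])     # 1 x 2: a row vector, not a plain list
    col = row.T                  # 2 x 1: column vector as the transpose
    m = np.array([[1, 2, 3],
                  [4, 5, 6]])    # outer length 2 -> rows, inner length 3 -> columns
    print(m.shape)               # (2, 3)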

Input/Output of TF Operations

The input of a TF operation is typically a tensor (matrix), and the output is typically also a tensor (matrix). Even scalars should be wrapped in tensors (matrices). This design reflects the flow of data through the graph.

The output of tf.reduce_mean() can be a scalar (a rank-0 tensor) or a higher-rank tensor (ndarray); the output of tf.matmul() is always a tensor.
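
A minimal sketch, under the same TF 1.x session assumptions as above:

    import tensorflow as tf

    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    mean_all = tf.reduce_mean(x)          # rank-0 tensor (scalar)
    mean_col = tf.reduce_mean(x, axis=0)  # rank-1 tensor, one mean per column

    with tf.Session() as sess:
        print(sess.run(mean_all))  # 2.5
        print(sess.run(mean_col))  # [2. 3.]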

Indexing and Item Assignment on Tensor Objects

Indexing on a tensor object is allowed while item assignment is not supported.
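
For example (TF 1.x style, as above):

    import tensorflow as tf

    x = tf.constant([[1, 2], [3, 4]])
    row = x[0]     # indexing/slicing is supported
    # x[0, 0] = 9  # raises TypeError: Tensor does not support item assignment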

Tensorflow Study Note, Part One

Tensor vs. Matrix vs. Vector

Hierarchy: tensor > matrix > vector

A matrix is a tensor with rank 2; a vector is a tensor with rank 1.
Vectors are the basic building blocks of a multi-dimensional tensor.

Overloaded Terminologies

Nowadays, the exact definitions of terms such as shape, rank, dimension, and degree differ greatly across the general CS community. In other words, "dimension" as a term is excessively overloaded.

  1. When referring to a point in a Euclidean space, people may use the terms 3-D, 4-D to specify the number of dimensions, which is the number of coordinates needed to specify the position of a point in such a space. 3-D is a geometric property of a point with three elements (coordinates).

Example: 1-D => (1) 2-D => (1,2) 3-D => (1,2,1) 4-D => (1,2,4,1)

The term "dimension" here has to be interpreted together with the underlying assumption that the subject described is point in a Euclidean space. Therefore, the mathematic description of this point studied is a tuple (or list, array as in the context of different programming languages) with a length of x (which is the value of dimension).

A point is a mathematical object.

  2. When referring to an array, dimension is used to describe the least number of indices needed to specify an element in the array. 3-D is a structural property of the array (a multi-dimensional array).

Example: 1-D => (1,1,2) 2-D => ((1,2),(2,3))

An array is a data structure.

The dimension of a mathematical object and the dimension of a data structure share common characteristics but reside in different domains. They are thus not equivalent.

  3. A tensor in tensorflow may not directly represent a geometric object (or another abstract object), such as a point or a plane. In most cases, a tensor is the representation of a set of such objects. The dimension of each object is thus different from the dimension of the tensor as a whole.

When referring to a tensor (in the context of tensorflow), dimension (1-D, 2-D, 3-D) is the depth of the nested array (also called rank). Each dimension has its own length (indicating the number of elements allowed along that dimension/direction). Dimension represents the level of an object in a tensor, implying that the members/elements of an object can themselves be other objects. In this context, the length refers to the number of elements in a certain dimension.

Shape of a Tensor

In tensorflow, shape is a compact description of the structure of a tensor. It describes how many layers there are in the multi-dimensional (nested) array and how many elements there are in each layer. The rank of a tensor is the depth of its layers.

Example: shape = [dim(4), dim(3)] ==> [[2,2,2],[1,1,1],[1,1,1],[2,2,2]]
rank = len(shape) = 2
dim(4): index of this dim = 0, len of this dim = 4
dim(3): index of this dim = 1, len of this dim = 3

In the above case, the length of the first dim is 4, which represents the size of our data set (we have 4 data points). The length of the second dim is 3, which indicates that each data point in the data set has 3 attributes.
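
The same example checked in code (TF 1.x style, matching the notes above):

    import tensorflow as tf

    t = tf.constant([[2, 2, 2],
                     [1, 1, 1],
                     [1, 1, 1],
                     [2, 2, 2]])
    print(t.shape)       # (4, 3): dim 0 has length 4, dim 1 has length 3
    print(len(t.shape))  # rank = 2, the depth of nesting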