The grammar formalism applies to many other natural languages despite the fact that all discussions are focused on modern English.
- Dependency Grammar (Subject, Predicate, Object)
- Constituency Grammar (NP, VP)
- Others - sub-categorization, etc (Verb followed by infinitive)
Among them, constituency grammar is universally modeled by context-free grammar (CFG), which is also called phrase-structure grammars. The formalism of CFG is equivalent to Backus-Naur Form (BNF).
Relationships that a Parser Models
- Constituency Relationships (Constituent Relation)
- Grammatically Relationships (Lexical Relation)
Parsing With CFG (Context-Free Grammars)
- Top-down (goal-directed search, rationalist tradition)
- Bottom-up (data directed search, empiricist tradition)
Data Structure of Parse Tree (CFG tree)
Implemented as a multi-way tree via OOP mechanism. Each internal node represents a non-terminal of CFG;each leaf node represents a terminal of CFG;The value of each node is the symbol of the non-terminal/terminal it represented. The value of each node is a symbol (NP, VP, prefer, etc), the pointers of each node points to its descendants until the reach of leaf nodes (non-terminal terminal pair). The leaf node is a 2-tuple that represents the production rule of a non-terminal to a terminal.
Serialization of a Parse Tree
parse tree = (ROOT (S (NP (PRP I)) (VP (VBP love) (NP (PRP you)))))
A JSON string that contains only list structure with all comma replaced by white space and double quotation replaced by empty string, and optionally squared bracket replaced by parentheses.
In any internal node (within a bracket), only one symbol ("a non-terminal symbol, without any bracket enclosed") can be presented and it is the value of this node. The child node of any internal node is defined to be all the bracketed blocks that lays in the same level with this node. All bracketed blocks is itself a parse tree (may not origin from root)
Parse Tree and Part-of-Speech Tagging
The non-terminal in the leaf node is the part-of-speech tagging of the terminal.
Why Not Context-sensitive Grammar In NLP Application?
Context-sensitive grammar is non-deterministic, which means the production process may not converge.
CFG Data Structure
- Search Space Forest
Search Space -> Search Space Forest (Search Tree in Manning's book)
Constructed by all possible partial parse trees and complete parse trees, among them, partial parse trees act as the internal node and the complete parse trees act as the leaf node (of the tree data structure -> search space forest).
Note: Search Space Forest is actually a and-or tree
- Parse Tree
Parse Tree -> Syntax Parse Tree (Complete Syntax Tree)
Partial Parse Tree (Incomplete Sytax Tree)
The function of Part-of-Speech Tagging
PoS has both syntactic and semantic meanings. Syntactically, a noun is usually followed by a verb; semantically, a noun typically refer to people, animals, concepts and things.