## The nature of unsupervised learning

Unsupervised learning is the induction of results from data. The induction process is described as a computer algorithm and implemented as a computer program. It differs from a discriminative learning process (such as supervised learning) in that no labels are provided to distinguish correct results from incorrect ones, so there is no positive/negative feedback that can be used to adjust the decision process.

## Debugging A Learning Algorithm

1. Get more training examples -> fix high variance
2. Try a smaller set of features -> fix high variance
3. Try getting additional features -> fix high bias
4. Try adding polynomial features -> fix high bias
5. Try decreasing lambda -> fix high bias
6. Try increasing lambda -> fix high variance
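
The checklist above can be tied to a rough diagnosis step. The sketch below is only a heuristic under stated assumptions: the function name `diagnose`, the `target_error` argument, and the `tolerance` threshold are all hypothetical and not part of the original notes. It treats a training error far above the target as high bias, and a large gap between validation and training error as high variance.

```python
# Fixes from the checklist, grouped by the problem they address.
FIXES = {
    "high bias": [
        "get additional features",
        "add polynomial features",
        "decrease lambda",
    ],
    "high variance": [
        "get more training examples",
        "use a smaller set of features",
        "increase lambda",
    ],
}


def diagnose(train_error, val_error, target_error, tolerance=0.05):
    """Return a list of suspected problems (heuristic, assumed thresholds)."""
    problems = []
    # Training error well above the acceptable level: the model underfits.
    if train_error - target_error > tolerance:
        problems.append("high bias")
    # Validation error well above training error: the model overfits.
    if val_error - train_error > tolerance:
        problems.append("high variance")
    return problems


for problem in diagnose(train_error=0.02, val_error=0.30, target_error=0.01):
    print(problem, "->", FIXES[problem])
```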

## Parameter and Hyper-parameter Set

Machine learning problems can often be cast as an optimization problem (frequently a non-convex one), which is typically solved with gradient descent to find a global or local optimum of the parameter set. Improper hyper-parameter values or improper initial parameter values can prevent gradient descent from converging. For example, a large learning rate can cause dramatic parameter changes that increase the value of the cost function, because the algorithm takes too large a step at each parameter update.

An effective and efficient learning algorithm requires smooth coordination between all parameter values (initial parameter values and hyper-parameter values).

1. Large learning rates should be avoided.
2. Initial parameter values should be kept consistent (constant) between experiments for the convenience of debugging.
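
A minimal sketch of the learning-rate effect, assuming a toy cost function J(w) = w^2 that is not from the original notes. With a small learning rate the cost shrinks at every update. With a large one each step overshoots the minimum and the cost grows. The initial value `w0` is held constant across both runs so the two experiments are comparable.

```python
def gradient_descent(learning_rate, w0=1.0, steps=20):
    """Minimize J(w) = w**2 and return the cost after every update."""
    w = w0
    costs = [w ** 2]
    for _ in range(steps):
        grad = 2 * w                   # dJ/dw
        w = w - learning_rate * grad   # one gradient descent update
        costs.append(w ** 2)
    return costs


small = gradient_descent(learning_rate=0.1)  # w shrinks by a factor 0.8 per step
large = gradient_descent(learning_rate=1.1)  # w is multiplied by -1.2 per step

print("small learning rate, final cost:", small[-1])
print("large learning rate, final cost:", large[-1])
```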

## Learning Rate and Parameter Scale

A large parameter scale (weight scale) tends to require a smaller learning rate.

## Generalization Error Estimation

Training error, validation error, and test error can each serve as an estimate of the generalization error of the model/hypothesis on the entire population. However, not every such estimate is a fair (unbiased) one: the training error in particular is optimistically biased, because the model was fitted to that same data. Also, we usually don't care about estimating the generalization error of a model that fits the training set poorly.

## Hypothesis Evaluation

1. Plot the hypothesis function together with all the data points and judge the quality of the fit by eye.
2. Evaluate the hypothesis function by the training/validation/test error.

To evaluate the degree of fitness of the current model/hypothesis on our training set, we train (minimize the objective function) the model on the training set and evaluate the performance by the training error.

To evaluate the degree of fitness of the current model/hypothesis on our validation set, we train (minimize the objective function) the model on the training set and evaluate the performance by the evaluated error on the validation set.

To evaluate the degree of fitness of the current model/hypothesis on our test set, we train (minimize the objective function) the model on the training set and evaluate the performance by the evaluated error on the test set.

To select the best model/hypothesis on our training set, we train the model on the training set with different parameter sets and select the model with the minimum training error.

To select the best model/hypothesis on our validation set, we train the model on the training set and select the model (parameter set) with the minimum evaluated error on the validation set.

To evaluate the generalization error of the best model/hypothesis on the validation set, we train the model on the training set, optimize (select) the model (parameter set) with the minimum evaluated error on the validation set, and then report the performance by the evaluated error on the test set.

In the above process, we ultimately want the model with the minimum validation error to also have a low evaluated error on the test set. We should optimize the model only on the feedback from the evaluated error on the validation set, and use the test set only to report the performance of the selected model. The ultimate goal is a low evaluated error on both the validation set and the test set.
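
The train/select/report procedure can be sketched as follows. Everything concrete here is an assumption made for illustration: the synthetic data, the one-parameter model y = w * x, and the use of a ridge penalty lambda as the hyper-parameter being selected. The point is only the flow: fit on the training set, choose by validation error, report on the test set.

```python
import random

random.seed(0)  # fixed seed so the experiment is repeatable


def make_data(n):
    """Draw n points from one (made-up) population: y = 2x + noise."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [2.0 * x + random.gauss(0, 0.3) for x in xs]
    return xs, ys


# All three sets are drawn from the same distribution.
train, val, test = make_data(30), make_data(30), make_data(30)


def fit(data, lam):
    """Closed-form ridge solution for the model y = w * x."""
    xs, ys = data
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, data):
    xs, ys = data
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
models = {lam: fit(train, lam) for lam in lambdas}           # train on training set
val_errors = {lam: mse(w, val) for lam, w in models.items()}  # evaluate on validation set
best_lam = min(val_errors, key=val_errors.get)                # select by validation error
test_error = mse(models[best_lam], test)                      # report on test set only

print("selected lambda:", best_lam)
print("validation error:", val_errors[best_lam])
print("test error:", test_error)
```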

In the absolute sense, the generalization error is the evaluated error of the model/hypothesis on the entire population. In other words, it is the fitness of the model on the entire population (as compared to the test set error, which is the fitness of the model on the test set).
The generalization error is the true objective of hypothesis/model optimization, but it cannot be measured directly. The validation set is only a subset of the population, and once we select the model that minimizes the validation error, that error is no longer a fair estimate of the generalization error. We therefore use the validation error as the optimization target, and the test error of the model selected on the validation set as a fair estimate of the generalization error.

## Model Selection

Model selection refers to both the form of the hypothesis function and the parameter set of the hypothesis function. In fact, the choice of the form of the hypothesis function can be considered as a separate parameter of our hypothesis.

## Generalization Error

Generalization error = error caused by bias + error caused by variance + irreducible error
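
For squared-error loss this decomposition can be written out explicitly. The notation is an assumption introduced here, not taken from the notes: the data follow y = f(x) + ε with noise of mean 0 and variance σ², the fitted model is written \hat{f}, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

Note that the bias enters as a squared term.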

## Feature Scaling

The purpose of feature scaling is to speed up the convergence of gradient descent. Mean normalization is commonly used together with feature scaling.

Rule of thumb: scale all features into approximately the range [-1, 1].
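
A minimal sketch of mean normalization combined with range scaling, assuming the common form (x - mean) / (max - min). The helper name and the sample values are made up for illustration.

```python
def mean_normalize(values):
    """Return (x - mean) / (max - min) for every x in values."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mean) / spread for x in values]


sizes = [2104.0, 1416.0, 1534.0, 852.0]  # made-up feature column
scaled = mean_normalize(sizes)
print(scaled)
```

After this transformation the feature has mean 0 and every value lies within [-1, 1].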

## Learning Rate

Start from 0.0001 and increase by a factor of 10 each time: 0.0001, 0.001, 0.01, 0.1, 1.
If that does not work, increase by a factor of roughly 3 instead: 0.0001, 0.0003, 0.001, 0.003, ...
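
The candidate values can be generated instead of typed by hand. This is a sketch under assumptions: the function name and its parameters are hypothetical, and the finer grid inserts a 3x value between consecutive powers of ten, which is one common reading of "increase by 3 times".

```python
def learning_rate_candidates(min_exp=-4, max_exp=0, fine=False):
    """Return learning rates from 10**min_exp to 10**max_exp.

    With fine=True, a 3x step is inserted between consecutive powers of ten.
    """
    rates = []
    for exp in range(min_exp, max_exp + 1):
        rates.append(10.0 ** exp)
        if fine and exp < max_exp:
            rates.append(3 * 10.0 ** exp)
    return rates


coarse = learning_rate_candidates()
finer = learning_rate_candidates(fine=True)
print(coarse)
print(finer)
```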

## Sampling Error

To reduce sampling error, the dataset should be drawn from the same distribution as the entire population.

1. Categorical
2. Continuous
3. Ordinal

## Feature Engineering for DNN

1. Rescale bounded continuous features: Rescale every bounded continuous input to [-1, 1] through x = (2x - max - min)/(max - min).
2. Standardize all continuous features: All continuous inputs should be standardized. By this I mean, for every continuous feature, compute its mean (u) and standard deviation (s) and do x = (x - u)/s.
3. Binarize categorical/discrete features: Represent every categorical feature as multiple boolean features. For example, instead of having one feature called marriage_status, have 3 boolean features (marriage_status_single, marriage_status_married, marriage_status_divorced) and set each of them to 1 or -1 as appropriate. As you can see, for every categorical feature you add k binary features, where k is the number of values that the categorical feature can take.
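
The three steps above can be sketched with plain Python. The helper names are hypothetical, and `standardize` assumes the population standard deviation (dividing by n), which the notes do not specify.

```python
def rescale_bounded(x, lo, hi):
    """Map x from the known range [lo, hi] to [-1, 1]."""
    return (2 * x - hi - lo) / (hi - lo)


def standardize(values):
    """Return (x - u) / s for every x, using the population std (assumption)."""
    u = sum(values) / len(values)
    s = (sum((v - u) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - u) / s for v in values]


def binarize(value, categories):
    """One feature per category: 1 for the matching category, -1 otherwise."""
    return [1 if value == category else -1 for category in categories]


print(rescale_bounded(5, lo=0, hi=10))
print(standardize([1.0, 2.0, 3.0]))
print(binarize("married", ["single", "married", "divorced"]))
```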

## Package

A golang program (executable) must have the package name "main". A golang library can have an arbitrary package name. The "main" package can be split across several files, but exactly one of them may declare the main function.

## Function and Variable Declaration Style

Golang follows the Scala style in that a type is declared after its identifier and return types are declared after the parameter list.

## Method Declaration

A method has a receiver parameter placed before the method identifier.

## Exported Package Members

An exported package member must have an identifier that begins with a capital letter.
Outside its package, this member is referenced through the package name.

## Variable

There are two kinds of named values: constants (declared with const) and variables (declared with var).

## Array Type

The length of an array is part of its type.

## Multiple Return

Any of the multiple return values of a function can be discarded on the receiving side by assigning it to the blank identifier "_".
In a "for ... range" statement, the second value can simply be omitted instead of being assigned to "_".

## mime/multipart.File

The Read method of a value of type mime/multipart.File can only be used effectively once. After the content has been read to the end, a second read on the same object returns no data unless you first Seek back to the start.

## Relative File Path

For io operations, relative file paths are resolved against the current working directory.

## nil value

nil is the zero value for pointers, interfaces, maps, slices, channels and function types, representing an uninitialized value. A nil error means no error.

device=gpu
device=gpu01
device=gpu02
..

## GPU Multi-task Running Model

Every time new GPU program instructions are submitted to the GPU device, the device suspends the ongoing task and executes the new instructions (program) immediately.

## Purpose

The purpose of this benchmark is to show that parallel computing on the GPU significantly improves program performance in terms of speed, and to give an estimate of the size of that improvement.

## Experiment 1

### Environment

Alienware 14R2 (i5-6GB-GT650M)
Ubuntu14.04 CUDA7.5 GCC4.8.2

### Time Elapsed:

C++11: 5418.464355 ms
CUDA: 229.8250 ms

### Performance Evaluation

C++11 time / CUDA time = 23.6
CUDA Time Percentage = 4%

## Experiment 2

### Environment

Alienware 14R2 (i5-6GB-GT650M)
Ubuntu14.04 CUDA7.5 GCC4.8.2
Tensorflow 0.9
Tensorflow 0.8

### Time Elapsed:

CPU: 298.434136 s
GPU: 144.865046 s

### Performance Evaluation

CPU time / GPU time = 2.06
GPU Time Percentage = 48.5%

## Basic Concepts

1. Pipe
Represented by the symbol "|", a pipe directs the stdout of the previous command to the stdin of the next command.
Example (directs the result of the find command to the grep command):
find ./src | grep '\.java$'

2. Environment Variable
Set with export. Command substitution, $(...), captures the output of a command as the value:
export ABC=$(find ./src | grep '\.java$')

## Arguments in Bash Commands

Arguments in bash commands are all treated as strings. Double quotes are optional and often omitted, although they are needed when a value contains spaces or other special characters. Bash variables are essentially character strings, but arithmetic operations are possible through the "let" builtin.

stat -x abc.txt

## Productivity vs. Complexity

Over-engineering happens when the productivity gained is not worth the effort of the work and the complexity added to the system.

## Memory Safety

Memory safety is a concern in software development that aims to avoid bugs and security vulnerabilities related to memory access, such as buffer overflows and dangling pointers.

## Type Safety

Type safety is the extent to which a programming language discourages or prevents type errors.

## Static Variable is Evil

A static variable is essentially a global variable (in the Java sense). All methods in the program can potentially change the state of a static variable, which makes it hard to reason about and control. The predictability of the overall program is then impaired.

## Optional Type

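The Optional type was created to avoid returning null from a method call. If a method returns null and the caller dereferences the result without checking it, a NullPointerException is thrown, which crashes the program unless it is caught. Optional makes the possible absence of a value explicit in the return type.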

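## ArrayList vs. LinkedList

| Operation | ArrayList | LinkedList |
| --- | --- | --- |
| get(index) | O(1) | O(n) |
| add(E) | O(1) amortized, O(n) worst case | O(1) |
| add(index, E) | O(n) | O(n) |
| remove(index) | O(n) | O(n) |
| Iterator.remove() | O(n) | O(1) |
| ListIterator.add(E) | O(n) | O(1) |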

## Threading Overhead

Time cost for creating a new thread: 1 ms - 8 ms

## Parallel Effectiveness Index:

Tc = Thread Creation Cost
Tr = Single Threading Runtime
N = Number of Threads

Equilibrium (the point at which threading breaks even):
Tr = (N^2 / (N - 1)) Tc

For an Intel i7 processor (8 hardware threads), threading becomes beneficial once a program's single-threaded running time exceeds about 18 ms.
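
The equilibrium can be checked numerically. The sketch below reads the formula as Tr = N^2 / (N - 1) * Tc, which follows if the threaded runtime is modeled as Tr / N + N * Tc (work split evenly, threads created one after another) and set equal to Tr. That cost model, the function name, and the value Tc = 2 ms are assumptions. 2 ms is within the 1 ms to 8 ms range quoted for thread creation and gives a break-even close to the 18 ms figure.

```python
def break_even_runtime(n_threads, creation_cost_ms):
    """Single-threaded runtime (ms) at which threading starts to pay off.

    Assumed model: threaded runtime = Tr / N + N * Tc.
    Setting it equal to Tr gives Tr = N**2 / (N - 1) * Tc.
    """
    if n_threads < 2:
        raise ValueError("need at least 2 threads for a break-even point")
    return n_threads ** 2 / (n_threads - 1) * creation_cost_ms


print(break_even_runtime(n_threads=8, creation_cost_ms=2.0))
```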

## Static Member and Static Context

1. A non-static field cannot be referenced from a static context.
2. A non-static method cannot be referenced from a static context.
3. A static field can be referenced from a non-static context.
4. A static method can be referenced from a non-static context.
5. Accessing a static member via an instance reference is not recommended.

## Java Classpath

The Java classpath is the search path for Java classes, jar files, and properties files. Only .jar file paths and directory paths can be used as classpath entries (similar to the header search path in C/C++). Multiple entries are separated by a colon (:) on Unix-like systems and by a semicolon (;) on Windows. For jar files, a wildcard (*) after a directory path stands for all the jar files in that directory.

## Java Null Pointer

A Java null reference belongs to the null type, which has no name. null in Java is a literal that represents the null reference. It can be assigned or cast to any reference type, but "null instanceof T" is always false.

## Agent Command

1. The method host.sendCommand() signals the beginning of a continuous action. In most cases, stopping a continuous action requires an explicit terminating command (such as move 0).

## Absolute vs. Continuous Commands

MissionSpec.allowAllAbsoluteMovementCommands()

ContinuousMovementCommand
DiscreteMovementCommand

## NLP Terminologies

1. Lexical Information
Information relating to the word itself (big, small, large, etc) in addition to the word classes (part-of-speech tags).

2. Endocentric and Exocentric
A grammatical construction (e.g. a phrase or compound word) is said to be endocentric if it fulfills the same linguistic function as one of its parts, and exocentric if it does not.

3. Isomorphic
Corresponding or similar in form and relations.

4. Context-Free Grammar
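A phrase-structure grammar in which every production rule rewrites a single nonterminal symbol, regardless of the context in which that symbol appears.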