Week 10 - Feature transformation

OMSCS

Feature transformation is the problem of pre-processing a set of features to create a new, more compact feature set while retaining as much relevant information as possible. It is a map $p: \mathbb{F}^N \rightarrow \mathbb{F}^M$, where usually $M < N$.

In this course we will focus on linear feature reduction, where $p$ is a linear map.
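A minimal sketch of what such a linear map looks like in code (numpy assumed; the projection matrix here is random purely for illustration, it is not how any particular algorithm would choose it):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 10, 3                       # original and reduced numbers of features (M < N)
X = rng.normal(size=(100, N))      # 100 examples, each with N features

# A linear feature transform is just an M x N matrix P applied to every example.
# P here is random purely for illustration; PCA, ICA, RCA and LDA differ only
# in how they choose P.
P = rng.normal(size=(M, N))
X_new = X @ P.T                    # 100 examples, each with M features

print(X.shape, "->", X_new.shape)  # (100, 10) -> (100, 3)
```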

Note

Feature selection is a special case of feature transformation

Problems to overcome

If you think of features by analogy to words in a language, there are two problems with using features to label data.

Polysemy: one word (or feature) can carry many meanings, so matching on it can give false positives.

Synonymy: many different words (or features) can carry the same meaning, so a relevant match can be missed (a false negative).

Principal component analysis
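As a rough sketch of the standard recipe (not necessarily the exact derivation used in the lectures, and assuming numpy), PCA centres the data and projects it onto the top-$M$ eigenvectors of its covariance matrix, i.e. the orthogonal directions of maximum variance:

```python
import numpy as np

def pca(X, M):
    """Project X (n_samples x N) onto its top-M principal components."""
    X_centred = X - X.mean(axis=0)
    cov = np.cov(X_centred, rowvar=False)        # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :M]                # top-M directions of maximum variance
    return X_centred @ top                       # n_samples x M transformed data

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated features
print(pca(X, 2).shape)                           # (200, 2)
```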

Independent component analysis

Cocktail party problem
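A small sketch of the cocktail party setup, assuming scikit-learn is available and using its FastICA implementation (one particular ICA algorithm, chosen here only for convenience): two independent sources are linearly mixed into two "microphone" signals, and ICA recovers the sources up to ordering and scale.

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian "speakers".
s1 = np.sin(3 * t)                    # speaker 1: sinusoid
s2 = np.sign(np.sin(5 * t))           # speaker 2: square wave
S = np.c_[s1, s2]

# Each "microphone" hears a different linear mixture of the speakers.
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])            # unknown mixing matrix
X = S @ A.T                           # observed microphone recordings

# ICA recovers the independent sources, up to ordering and scale.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)          # estimated sources, shape (2000, 2)
print(S_hat.shape)
```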

Comparison of ICA and PCA

These do different things. Notice that if we have a large enough set of i.i.d. random variables, the central limit theorem tells us their sum will look normally distributed, and that sum is a direction of high variance. Therefore PCA tends to cut along the direction of their sum, whereas ICA wants to separate them back into the individual sources.
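A quick numerical illustration of that central-limit-theorem point (numpy assumed): each individual source is clearly non-Gaussian, but the sum of many of them is close to Gaussian, and that high-variance, Gaussian-looking direction is what PCA locks onto, while ICA deliberately looks for the non-Gaussian, independent directions.

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(x):
    """Excess kurtosis: roughly 0 for a Gaussian, clearly non-zero otherwise."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

# 20 independent, identically distributed, clearly non-Gaussian sources.
sources = rng.uniform(-1, 1, size=(20, 100_000))

print(excess_kurtosis(sources[0]))           # about -1.2: far from Gaussian
print(excess_kurtosis(sources.sum(axis=0)))  # about 0: the sum looks Gaussian (CLT)
```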

Whilst ICA solves the cocktail party problem very well, PCA is very poor at it. PCA's goal is to find the directions the data most shares (maximum variance), whereas ICA finds the features that split the data into independent parts. For example, on faces ICA finds local parts such as noses, eyes and chins, whereas PCA finds brightness and then the average face first.

We can use ICA to understand what separates our data best; however, ICA is not the most efficient algorithm. The understanding it gives you about your data can then be used to choose or build more efficient algorithms. For example, on documents ICA picks out topics, and on natural images it picks out edges, both of which have faster, specialised algorithms to find them.

Alternatives

Random component analysis: project the data onto randomly chosen directions. It ignores the data entirely and is very cheap, yet the projected data often still works well for a downstream classifier.

Linear discriminant analysis: unlike the methods above, it uses the labels, finding the projections that best separate the classes (both alternatives are sketched below).
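A brief sketch of both alternatives, assuming scikit-learn is available; GaussianRandomProjection stands in for random component analysis here, and the dataset is synthetic and purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.random_projection import GaussianRandomProjection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# A purely illustrative labelled dataset with 20 features and 3 classes.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Random component analysis / random projection: project onto random directions.
# Cheap and label-agnostic, yet often preserves enough structure for a learner.
rca = GaussianRandomProjection(n_components=5, random_state=0)
X_rca = rca.fit_transform(X)

# Linear discriminant analysis: uses the labels y and finds at most
# (n_classes - 1) directions that best separate the classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_rca.shape, X_lda.shape)   # (300, 5) (300, 2)
```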