Week 10 - Feature transformation
Feature transformation is the problem of pre-processing a set of features to create a new (smaller/compact) feature set, while retaining as much information as possible. It is a map $p: \mathbb{F}^N \rightarrow \mathbb{F}^M$ where you usually want $M < N$.
In this course we will focus on linear feature reduction, where $p$ is a linear map.
Feature selection is a special case of feature transformation, where the new features are simply a subset of the original ones.
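As a concrete sketch (the data and matrices below are made up for illustration, using NumPy): a linear feature transformation is just a matrix applied to each sample, and feature selection corresponds to a matrix whose columns each pick out one original feature.

```python
import numpy as np

# Illustrative sizes: N original features mapped down to M new ones.
N, M = 5, 2
rng = np.random.default_rng(0)

X = rng.normal(size=(100, N))   # 100 samples, N original features
P = rng.normal(size=(N, M))     # an arbitrary linear map F^N -> F^M

X_new = X @ P                   # transformed data: 100 samples, M features
print(X_new.shape)              # (100, 2)

# Feature selection is the special case where each column of P picks a
# single original feature (a 0/1 selection matrix), e.g. keeping
# features 0 and 3:
S = np.zeros((N, M))
S[0, 0] = 1
S[3, 1] = 1
X_selected = X @ S              # identical to X[:, [0, 3]]
```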
Problems to overcome
If you think of features in analogy to language, there are two problems when using a feature to label data.
Principal component analysis
Independent component analysis
Comparison of ICA and PCA
These do different things. By the central limit theorem, if we have a large enough set of i.i.d. random variables, their sum will look approximately normally distributed, and the direction of that sum provides an axis that maximises variance. Therefore PCA tends to pick out the direction along which the variables add up, whereas ICA will want to separate them back into the individual components.
Whilst ICA solves the cocktail party problem very well, PCA is very poor at it. PCA's goal is to find the most shared directions of variation, whereas ICA finds the features that split the data apart. For example, on faces ICA finds noses, eyes and chins, whereas PCA finds brightness or the average face first.
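A minimal sketch of this comparison, assuming scikit-learn's FastICA and PCA and an invented two-source mixing setup: ICA recovers something close to the two independent source signals from the mixtures, while PCA returns orthogonal max-variance directions that are still blends of both speakers.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Cocktail-party sketch: two independent sources, linearly mixed into
# two "microphone" recordings. Signal shapes and the mixing matrix A
# are made up for illustration.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

s1 = np.sin(2 * t)                         # "speaker" 1: a sine wave
s2 = np.sign(np.sin(3 * t))                # "speaker" 2: a square wave
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(2000, 2))

A = np.array([[1.0, 0.5],                  # each microphone hears a
              [0.5, 1.0]])                 # blend of both speakers
X = S @ A.T                                # observed mixed signals

# ICA looks for statistically independent components, so it can pull
# the two speakers back apart.
ica = FastICA(n_components=2, random_state=0)
S_ica = ica.fit_transform(X)

# PCA instead finds orthogonal directions of maximum variance; on this
# data those directions are still mixtures of the two speakers.
pca = PCA(n_components=2)
S_pca = pca.fit_transform(X)
```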
We can use ICA to understand what separates the points in our data best; however, ICA is not the most efficient algorithm. The understanding of your data it provides can then be used to implement more efficient algorithms. For example, on documents ICA picks out topics, and on natural images ICA picks out edges, both of which there are better specialised algorithms to find.