Kernel trick
machine-learning
maths
Suppose we are in the modelling framework with training data $T \subset A \times B$. When using SVMs we want to find a hyperplane that linearly separates the data, though this might not be possible for the current embedding of $T$ in $A$. However, it might become possible after applying a map
$$\Phi: A \rightarrow A'.$$The kernel trick is to define a similarity kernel
$$K: A \times A \rightarrow \mathbb{R}, \quad K(x_1, x_2) = \Phi(x_1) \cdot \Phi(x_2),$$where the dot product is taken in $A'$. While the formula for $K$ may look complicated, it is usually much simpler to evaluate than the embedding $\Phi$ itself. Normally you do not construct $\Phi$ explicitly; instead you define $K$ directly and check Mercer's condition, which guarantees that such a $\Phi$ exists.
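A minimal sketch of this (assuming NumPy and the standard degree-2 polynomial kernel on $\mathbb{R}^2$, neither of which is fixed by the note above): it compares the two routes, embedding first and taking the dot product in $A'$ versus evaluating $K$ directly in $A$, and then checks that a sample Gram matrix is positive semidefinite, a necessary consequence of Mercer's condition.

```python
import numpy as np

def phi(x):
    """Explicit embedding Phi: R^2 -> R^3 for the degree-2
    polynomial kernel, Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, y):
    """Kernel evaluated directly in the original space A = R^2,
    without ever constructing Phi: K(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])

# Both routes give the same similarity score (121.0 here).
print(np.dot(phi(x1), phi(x2)))  # via the embedding Phi
print(K(x1, x2))                 # via the kernel trick

# Sanity check tied to Mercer's condition: the Gram matrix of a
# valid kernel on any finite sample must be positive semidefinite.
X = np.random.default_rng(0).normal(size=(5, 2))
gram = np.array([[K(a, b) for b in X] for a in X])
print(np.all(np.linalg.eigvalsh(gram) >= -1e-9))  # True
```

Here $\Phi$ only has three coordinates, but for higher-degree polynomial kernels the embedding space grows combinatorially, and for the RBF kernel it is infinite-dimensional, which is exactly why evaluating $K$ directly in $A$ is the simpler route.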