Support Vector Machines operate using the modelling framework to try to linearly separate data points. Suppose we have some training data $T \subset A \times B$, where each $t \in T$ is a pair $(x^t, y^t)$ with input $x^t \in A$ and label $y^t \in B = \{-1, +1\}$. The method uses the kernel trick to implicitly map the data into a different feature space, where it may be linearly separable, whilst still keeping the computation relatively simple. Let $K: A \times A \rightarrow \mathbb{R}$ represent such a kernel. Then we solve the following optimisation problem

$$ \max_{\alpha} \sum_{t \in T} \alpha_t - \frac{1}{2} \sum_{t,s \in T} \alpha_t \alpha_s y^t y^s K(x^t, x^s)$$

such that

$$ \alpha_t \geq 0 \mbox{ for all } t \in T, \mbox{ and } \sum_{t \in T} \alpha_ty^t = 0.$$
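To make the objective and its constraints concrete, here is a minimal sketch in plain Python. The names `linear_kernel`, `dual_objective` and `is_feasible`, and the two-point data set, are illustrative assumptions, not part of any library. The sketch only evaluates the objective for a given $\alpha$; it does not solve the optimisation problem.

```python
def linear_kernel(u, v):
    """Plain dot product, used here as the simplest possible kernel K."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, xs, ys, kernel):
    """Value of sum_t alpha_t - 1/2 sum_{t,s} alpha_t alpha_s y^t y^s K(x^t, x^s)."""
    n = len(xs)
    quadratic = sum(
        alpha[t] * alpha[s] * ys[t] * ys[s] * kernel(xs[t], xs[s])
        for t in range(n)
        for s in range(n)
    )
    return sum(alpha) - 0.5 * quadratic


def is_feasible(alpha, ys, tol=1e-9):
    """Check alpha_t >= 0 for all t and sum_t alpha_t y^t = 0 (up to tol)."""
    non_negative = all(a >= 0 for a in alpha)
    balanced = abs(sum(a * y for a, y in zip(alpha, ys))) <= tol
    return non_negative and balanced


# Hypothetical toy data: one point per class on the real line.
xs = [(1.0,), (-1.0,)]
ys = [1, -1]

for candidate in ([0.25, 0.25], [0.5, 0.5], [1.0, 1.0]):
    print(candidate, is_feasible(candidate, ys),
          dual_objective(candidate, xs, ys, linear_kernel))
```

On this toy data the constraint forces both multipliers to be equal, and working through the algebra by hand gives the maximum at $\alpha = (0.5, 0.5)$; the loop above simply compares a few feasible candidates.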

We then turn the solution into a classifier by setting:

$$ \hat{f}(x) = \mbox{sgn}\left ( \sum_{t \in T} \alpha_t y^t K(x^t, x) + b^s \right )$$

where

$$ b^s = y^s - \sum_{t \in T} \alpha_t y^t K(x^t, x^s), \mbox{ for any } s \in T \mbox{ such that } \alpha_s \not = 0. $$
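The classifier and the offset can be sketched as follows, assuming the multipliers $\alpha$ have already been found. The function names and the toy data are hypothetical, and the values $\alpha = (0.5, 0.5)$ were worked out by hand for this two-point example with a linear kernel. Mapping a score of exactly zero to $+1$ is a convention chosen for this sketch.

```python
def linear_kernel(u, v):
    """Plain dot product, standing in for the kernel K."""
    return sum(a * b for a, b in zip(u, v))


def offset(alpha, xs, ys, kernel, tol=1e-12):
    """b^s = y^s - sum_t alpha_t y^t K(x^t, x^s) for the first s with alpha_s != 0."""
    s = next(i for i, a in enumerate(alpha) if abs(a) > tol)
    return ys[s] - sum(a * y * kernel(xt, xs[s])
                       for a, y, xt in zip(alpha, ys, xs))


def predict(x, alpha, xs, ys, kernel):
    """Sign of sum_t alpha_t y^t K(x^t, x) + b^s (a zero score is mapped to +1)."""
    b = offset(alpha, xs, ys, kernel)
    score = sum(a * y * kernel(xt, x) for a, y, xt in zip(alpha, ys, xs)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data with hand-derived multipliers.
xs = [(1.0,), (-1.0,)]
ys = [1, -1]
alpha = [0.5, 0.5]

print(offset(alpha, xs, ys, linear_kernel))
print(predict((2.0,), alpha, xs, ys, linear_kernel))
print(predict((-3.0,), alpha, xs, ys, linear_kernel))
```

Only training points with a non-zero multiplier contribute to the sum, so in practice the prediction cost scales with the number of such points rather than with the size of $T$.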

Note that $K$ needs to obey Mercer’s condition for the underlying mapping of the feature space to exist.
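One practical consequence of Mercer's condition is that the matrix of kernel values on any finite set of points must be positive semi-definite. The sketch below probes that property with random vectors; the function names, sample points and tolerance are assumptions made for illustration. It can only reveal a violation: passing the probe does not prove that a kernel satisfies the condition.

```python
import random


def linear_kernel(u, v):
    """Dot product: a kernel that does satisfy Mercer's condition."""
    return sum(a * b for a, b in zip(u, v))


def negated_kernel(u, v):
    """Negated dot product: a deliberately invalid kernel for comparison."""
    return -linear_kernel(u, v)


def looks_positive_semidefinite(kernel, xs, trials=200, tol=1e-9, seed=0):
    """Return False if some random z gives z^T G z < -tol, where G_ij = K(x_i, x_j)."""
    rng = random.Random(seed)
    n = len(xs)
    gram = [[kernel(u, v) for v in xs] for u in xs]
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        quadratic_form = sum(z[i] * gram[i][j] * z[j]
                             for i in range(n) for j in range(n))
        if quadratic_form < -tol:
            return False  # found a direction of negative curvature
    return True  # no violation found (not a proof)


# Hypothetical sample points in the plane.
points = [(1.0, 0.0), (0.5, -2.0), (-1.5, 1.0)]

print(looks_positive_semidefinite(linear_kernel, points))
print(looks_positive_semidefinite(negated_kernel, points))
```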

# Run time

Training evaluates $K$ on pairs of training points, as in the double sum of the objective, so an expensive kernel function can add a large overhead to the run time for training this model.
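As a rough illustration of where the kernel cost enters, the sketch below counts kernel evaluations when all pairwise values are precomputed. The helper names are hypothetical, and real solvers may cache or skip entries, so the counts describe only this naive precomputation.

```python
def linear_kernel(u, v):
    """Dot product, standing in for a possibly expensive kernel K."""
    return sum(a * b for a, b in zip(u, v))


def counting(kernel):
    """Wrap a kernel so that every call is counted."""
    calls = {"n": 0}

    def wrapped(u, v):
        calls["n"] += 1
        return kernel(u, v)

    return wrapped, calls


def gram_full(xs, kernel):
    """Evaluate K on every ordered pair: n * n kernel calls."""
    return [[kernel(u, v) for v in xs] for u in xs]


def gram_symmetric(xs, kernel):
    """Use K(u, v) = K(v, u) to evaluate only n * (n + 1) / 2 pairs."""
    n = len(xs)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            gram[i][j] = gram[j][i] = kernel(xs[i], xs[j])
    return gram


# Hypothetical sample of four training inputs.
xs = [(1.0, 2.0), (0.0, -1.0), (3.0, 0.5), (-2.0, 2.0)]

full_kernel, full_calls = counting(linear_kernel)
sym_kernel, sym_calls = counting(linear_kernel)
full = gram_full(xs, full_kernel)
sym = gram_symmetric(xs, sym_kernel)
print(full_calls["n"], sym_calls["n"])
```

Exploiting symmetry roughly halves the work, but the number of evaluations still grows quadratically with the number of training points, and each evaluation pays the full cost of $K$.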

# Correctness

The accuracy of this model depends heavily on the choice of the kernel function, because the kernel defines the notion of similarity between two vectors.
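A small example of how much the kernel matters is the XOR pattern, which no linear boundary can separate. The sketch below scores the four XOR points with a linear kernel and with a quadratic kernel $K(u, v) = (u \cdot v)^2$. The data, the kernel choices and the uniform multipliers $\alpha_t = 1/8$ are assumptions picked by hand for illustration (with the offset taken as zero), not the output of a solver.

```python
def linear_kernel(u, v):
    """Similarity as a plain dot product."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(u, v):
    """Similarity as a squared dot product."""
    return linear_kernel(u, v) ** 2


# XOR-style data: the label is the product of the two coordinates.
xs = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
ys = [1, 1, -1, -1]
alpha = [0.125] * 4  # hand-picked multipliers; they satisfy sum_t alpha_t y^t = 0


def score(kernel, x):
    """sum_t alpha_t y^t K(x^t, x), with the offset taken as zero."""
    return sum(a * y * kernel(xt, x) for a, y, xt in zip(alpha, ys, xs))


linear_scores = [score(linear_kernel, x) for x in xs]
quadratic_scores = [score(quadratic_kernel, x) for x in xs]
print(linear_scores)
print(quadratic_scores)
```

With the linear kernel every score is zero, so the classifier carries no information about the labels, whereas the quadratic kernel gives each point a score whose sign matches its label.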
