# Bagging
Last edited: 2024-01-24
This is one of the simplest Ensemble learning methods, yet it outperforms classical single-model methods on certain problems.
Suppose we are in the modelling framework. Bagging is the following process:
- We choose a set of modelling algorithms $A_i$ for $1 \leq i \leq k$ that could fit the problem - these could all be the same.
- We draw random samples $T_i$ for $1 \leq i \leq k$ from the training data $T$, with replacement - so a single $T_i$ can contain the same data point more than once, and the different $T_i$ need not be disjoint.
- We then train $\hat{f}_i$ using algorithm $A_i$ with training data $T_i$ for $1 \leq i \leq k$.
- Then we use some method of averaging these models over our problem space to produce our final model $\hat{f}$ (a sketch of the whole process follows this list).
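
A minimal sketch of these steps in Python - the `fit`-function interface, the function names, and the use of NumPy are assumptions made for illustration, not part of the note:

```python
import numpy as np

def bag(algorithms, T, rng=None):
    """Train one model per algorithm A_i on a bootstrap sample T_i of T.

    `algorithms` is a list of k fit functions, each mapping a training
    set to a fitted model (a callable) - a hypothetical interface.
    """
    rng = np.random.default_rng(rng)
    n = len(T)
    models = []
    for fit in algorithms:
        idx = rng.integers(0, n, size=n)      # sample n indices with replacement
        models.append(fit([T[j] for j in idx]))  # train f_hat_i on T_i
    return models

def f_hat(models, x):
    """Average the k fitted models' predictions at x."""
    return np.mean([m(x) for m in models])
```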
# Example
Suppose we want to use polynomial regression on a simple function $f: \mathbb{R} \rightarrow \mathbb{R}$ with training data $T$.
Instead of running it once on all of $T$, we could draw each $T_i$ from $T$ with replacement and train $\hat{f}_i$ using polynomial regression on $T_i$.
Then we set our final $\hat{f} = \frac{1}{k} \sum_{i=1}^k \hat{f}_i$.
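
A sketch of this example in NumPy - the target function, noise level, polynomial degree, and number of fits are made-up choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed): a simple f with noisy training data T.
f = lambda x: np.sin(2 * x)
x_train = rng.uniform(-2, 2, size=40)
y_train = f(x_train) + rng.normal(scale=0.3, size=len(x_train))

k, degree = 25, 6  # k bootstrap fits of a degree-6 polynomial
fits = []
for _ in range(k):
    idx = rng.integers(0, len(x_train), size=len(x_train))  # T_i drawn from T with replacement
    fits.append(np.poly1d(np.polyfit(x_train[idx], y_train[idx], degree)))  # f_hat_i

# Final model: the pointwise average f_hat = (1/k) * sum_i f_hat_i.
f_hat = lambda x: np.mean([p(x) for p in fits], axis=0)

print(f_hat(0.5), f(0.5))  # bagged prediction vs. true value
```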
# Correctness
Bagging tends to reduce overfitting - it lowers the variance of the fitted model while leaving its bias largely unchanged - so it can help algorithms that are particularly prone to overfitting, like high-degree polynomial regression.
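One standard way to see this (assuming the $\hat{f}_i(x)$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$): the averaged model satisfies
$$\operatorname{Var}\big(\hat{f}(x)\big) = \rho\sigma^2 + \frac{1-\rho}{k}\sigma^2,$$
so increasing $k$ drives the variance down towards $\rho\sigma^2$, while the bias of the average is the same as the bias of each $\hat{f}_i$.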