1. Introduction

lecture

Introduction to Statistical Learning in Python

Book: statlearning
GitHub: https://github.com/bomtall/islp

Set up environment

Sort out the Python environment

Difference between Prediction and Inference

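Consider a setting where $Y = f(X) + \epsilon$. Here $f$ is a fixed but unknown function of the predictors $X$, and $\epsilon$ is a random error term, independent of $X$, with mean zero. There are two main reasons to estimate $f$: prediction and inference.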

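Prediction - Building a black box $\hat{f}$ such that $\hat{f}(x)$ is close to $f(x)$ on a subdomain $\hat{X} \subset X$ of inputs we care about. We then predict $Y$ with $\hat{Y} = \hat{f}(X)$. The exact form of $\hat{f}$ does not matter, as long as its predictions are accurate.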

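Inference - Understanding the form of $f$ itself: which predictors are associated with $Y$, and how $Y$ changes as each predictor changes. Here $\hat{f}$ cannot be treated as a black box.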
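A minimal sketch of the two goals. It assumes a simulated setting where we pick $f(x) = 2 + 3x$ and Gaussian noise ourselves. Both are hypothetical and chosen only for illustration. A least-squares line serves as $\hat{f}$. For prediction we use only its outputs $\hat{Y}$. For inference we inspect its fitted coefficients to learn about the form of $f$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true relationship (unknown in practice): f(x) = 2 + 3x
def f(x):
    return 2.0 + 3.0 * x

# Simulate Y = f(X) + epsilon with mean-zero noise
n = 200
X = rng.uniform(0.0, 1.0, size=n)
eps = rng.normal(0.0, 0.5, size=n)
Y = f(X) + eps

# Fit f_hat(x) = b0 + b1 * x by least squares
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b0, b1 = coef

def f_hat(x):
    return b0 + b1 * x

# Prediction: only the outputs Y_hat = f_hat(x) matter
x_new = np.array([0.25, 0.5, 0.75])
y_hat = f_hat(x_new)
print("predictions:", y_hat)

# Inference: the coefficients describe how Y changes as X changes
print(f"intercept ~ {b0:.2f}, slope ~ {b1:.2f}")
```

The same fitted model serves both purposes. Only the question asked of it changes.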

Accuracy

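Reducible error - The error $\hat{f}(x) - f(x)$ that arises because $\hat{f}$ is an imperfect estimate of $f$. A better statistical learning method can reduce it.

Irreducible error - The error $\epsilon$, coming from measurement noise, unmeasured variables, innate randomness, …. It cannot be predicted from $X$, so even a perfect estimate $\hat{f} = f$ does not remove it.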

Why is irreducible error larger than zero?

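$$\mathbb{E}[(Y - \hat{Y})^2] = \mathbb{E}[(f(X) + \epsilon - \hat{f}(X))^2] = \underbrace{[f(X) - \hat{f}(X)]^2}_{\text{reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible}}$$

To see this, expand the square. The cross term $2[f(X) - \hat{f}(X)]\,\mathbb{E}[\epsilon]$ vanishes because $\mathbb{E}[\epsilon] = 0$. The remaining noise term is $\mathbb{E}[\epsilon^2] = \mathrm{Var}(\epsilon)$. So the expected squared error can never fall below $\mathrm{Var}(\epsilon)$, however good $\hat{f}$ is.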

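This relies on the following assumptions and notation:

  • $Y = f(X) + \epsilon$
  • $\hat{Y} = \hat{f}(X)$
  • $\hat{f}$ and $X$ are held fixed, so the only randomness is in $\epsilon$
  • $\mathbb{E}[(Y - \hat{Y})^2]$ is the expected squared difference between the predicted and actual value of $Y$
  • $\mathbb{E}[\epsilon] = 0$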
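The decomposition can be checked numerically. This sketch makes several hypothetical choices: a fixed point $x_0 = 1$, a true $f(x) = \sin(x)$, a deliberately imperfect $\hat{f}(x) = x$, and Gaussian noise with $\sigma = 0.3$. It averages $(Y - \hat{Y})^2$ over many draws of $\epsilon$ and compares the result with $[f(x_0) - \hat{f}(x_0)]^2 + \mathrm{Var}(\epsilon)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true f and an imperfect estimate f_hat (both held fixed)
def f(x):
    return np.sin(x)

def f_hat(x):
    return x  # small-angle approximation, deliberately imperfect

x0 = 1.0     # X held fixed
sigma = 0.3  # noise standard deviation, so Var(eps) = 0.09

# Draw many realizations of Y = f(x0) + eps at the same x0
eps = rng.normal(0.0, sigma, size=1_000_000)
y = f(x0) + eps
y_hat = f_hat(x0)

# Left-hand side: Monte Carlo estimate of E[(Y - Y_hat)^2]
expected_sq_error = np.mean((y - y_hat) ** 2)

# Right-hand side: reducible part + irreducible part
reducible = (f(x0) - f_hat(x0)) ** 2
irreducible = sigma ** 2

print(f"E[(Y - Y_hat)^2] ~ {expected_sq_error:.4f}")
print(f"reducible + irreducible = {reducible + irreducible:.4f}")

# Even a perfect estimate (f_hat = f) leaves the irreducible error
perfect_error = np.mean((y - f(x0)) ** 2)
print(f"error with f_hat = f ~ {perfect_error:.4f}")
```

The last line shows the irreducible error directly. With $\hat{f} = f$ the estimate stays close to $\sigma^2 = 0.09$ rather than zero.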

The mean squared error of $\hat{f}$ on $n$ observations $(x_1, y_1), \dots, (x_n, y_n)$ is defined as:

$$MSE(y, \hat{f}(x)) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{f}(x_i))^2.$$
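A direct translation of the formula into NumPy. This is a minimal sketch. The helper name `mse` is our own, and scikit-learn's `sklearn.metrics.mean_squared_error` computes the same quantity.

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error: (1/n) * sum_i (y_i - f_hat(x_i))^2."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y.shape != y_pred.shape:
        raise ValueError("y and y_pred must have the same shape")
    return np.mean((y - y_pred) ** 2)

# Residuals are 0.5, -0.5, 0.0, 1.0, so MSE = (0.25 + 0.25 + 0 + 1) / 4
y = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 6.0]
print(mse(y, y_pred))  # 0.375
```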

What is bias?

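Bias is the error introduced by approximating a real-life relationship, which may be very complicated, with a model of a restricted form. It limits how good the model can be. For example, a linear model fitted to a non-linear $f$ keeps a systematic error, however much training data it sees.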
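A sketch of bias, assuming a hypothetical quadratic $f(x) = x^2$ on $[-1, 1]$ with small Gaussian noise. A straight line (a degree-1 fit with `np.polyfit`) cannot represent $f$. Its gap to $f$ therefore stays roughly constant as $n$ grows. A quadratic fit has the right form, so its gap shrinks toward zero. The comparison uses the noise-free $f$, so only the reducible error is measured.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return x ** 2  # hypothetical true relationship

x_grid = np.linspace(-1.0, 1.0, 201)  # points where f_hat is compared with f
results = {}

for n in [20, 200, 20_000]:
    x = rng.uniform(-1.0, 1.0, size=n)
    y = f(x) + rng.normal(0.0, 0.1, size=n)
    for degree in [1, 2]:
        coeffs = np.polyfit(x, y, deg=degree)
        f_hat = np.polyval(coeffs, x_grid)
        # Mean squared gap between f_hat and the true f (reducible error only)
        gap = np.mean((f_hat - f(x_grid)) ** 2)
        results[(n, degree)] = gap
        print(f"n={n:>6}, degree={degree}: mean (f_hat - f)^2 = {gap:.4f}")
```

For the linear model the gap levels off near the variance of $x^2$ (about $4/45 \approx 0.09$), and more data does not help. This is the bias of the model's form.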