Bias-Variance Tradeoff

The fundamental tension in machine learning: models that are too simple miss patterns (bias), while models that are too complex memorize noise (variance). Every modeling decision is a point on this spectrum.

Definitions

Bias = systematic error: how far the model's average prediction (averaged over possible training sets) is from the true value.
- High bias = underfitting (model too simple, misses patterns)
- Example: fitting linear regression to quadratic data

Variance = how much predictions for the same input fluctuate across different training sets.
- High variance = overfitting (model captures noise, not signal)
- Example: degree-10 polynomial on 20 data points

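Total error = Bias^2 + Variance + Irreducible noise. More precisely, for squared-error loss at an input x, with true function f, noise variance sigma^2, and a model f_hat fit on a randomly drawn training set (expectations over training sets and noise):

E[(y - f_hat(x))^2] = (E[f_hat(x)] - f(x))^2 + E[(f_hat(x) - E[f_hat(x)])^2] + sigma^2

The first term is Bias^2, the second is Variance, and sigma^2 is the noise floor that no model can get below.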
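The decomposition can be checked numerically. This is a minimal Monte Carlo sketch under a made-up setup that mirrors the examples in the definitions: true function f(x) = x^2, Gaussian noise with sigma = 0.3, and 20 training inputs held fixed on a grid while the noise is redrawn for each simulated training set (a fixed-design simplification). `true_f`, `decompose`, and the constants are illustrative names, not from any library; the fitting uses NumPy's `polyfit`/`polyval`. The degree-1 fit should show mostly bias, the degree-10 fit mostly variance, and `test_mse` should land close to bias_sq + variance + noise.

```python
import numpy as np

# Hypothetical setup: f(x) = x**2, Gaussian noise with std SIGMA,
# and 20 training inputs held fixed on a grid (only the noise is redrawn).
SIGMA = 0.3
X_TRAIN = np.linspace(-1, 1, 20)
X_TEST = np.linspace(-1, 1, 50)


def true_f(x):
    return x ** 2


def decompose(degree, n_sets=500, seed=0):
    """Monte Carlo estimate of bias^2, variance, noise and test MSE for a
    least-squares polynomial of the given degree, averaged over X_TEST."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_sets, X_TEST.size))
    sq_err = np.empty((n_sets, X_TEST.size))
    for i in range(n_sets):
        # One simulated training set: same inputs, fresh noise.
        y = true_f(X_TRAIN) + rng.normal(0, SIGMA, X_TRAIN.size)
        preds[i] = np.polyval(np.polyfit(X_TRAIN, y, degree), X_TEST)
        # Squared error against fresh noisy targets at the test inputs.
        y_test = true_f(X_TEST) + rng.normal(0, SIGMA, X_TEST.size)
        sq_err[i] = (preds[i] - y_test) ** 2
    mean_pred = preds.mean(axis=0)  # estimate of E[f_hat(x)] at each test input
    return {
        "bias_sq": float(np.mean((mean_pred - true_f(X_TEST)) ** 2)),
        "variance": float(np.mean(preds.var(axis=0))),
        "noise": SIGMA ** 2,
        "test_mse": float(sq_err.mean()),
    }


for degree in (1, 2, 10):
    parts = decompose(degree)
    print(degree, {k: round(v, 4) for k, v in parts.items()})
```

Because the training inputs are fixed here, the variance term only reflects label noise; redrawing the inputs as well would add variance from which points happened to be sampled.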

Diagnosis

Symptom	Diagnosis	Solution
High train error, high val error	Underfitting (high bias)	More complex model, more features
Low train error, high val error	Overfitting (high variance)	Regularization, more data, simpler model
Low train error, low val error	Good fit	Ship it
High train error, low val error	Suspicious (data leakage, a small or unrepresentative val set, or train-only effects such as dropout or augmentation)	Check for bugs

Learning curves: plot train/val error vs training set size.
- Curves converge at high error -> bias problem (more data won't help; you need a more flexible model or better features)
- Large, persistent gap -> variance problem (more data is likely to help)
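Learning curves can be sketched with NumPy alone (scikit-learn's `learning_curve` is the usual tool; it is swapped out here to keep the example dependency-light). The data are hypothetical: y = x^2 + Gaussian noise (sigma = 0.3) with x uniform on [-1, 1], and `sample` and `poly_learning_curve` are illustrative helpers. A degree-1 fit should plateau with train and val error close together at a high level, while a degree-10 fit should start with a wide gap that closes as the training set grows.

```python
import numpy as np

SIGMA = 0.3  # assumed noise level


def sample(n, rng):
    # Hypothetical data: y = x^2 + Gaussian noise, x uniform on [-1, 1].
    x = rng.uniform(-1, 1, n)
    return x, x ** 2 + rng.normal(0, SIGMA, n)


def poly_learning_curve(degree, sizes, n_repeats=50, seed=0):
    """Average train and validation MSE of a polynomial fit for each training size."""
    rng = np.random.default_rng(seed)
    x_val, y_val = sample(2000, rng)  # one large, fixed validation set
    train_mse, val_mse = [], []
    for n in sizes:
        tr, va = [], []
        for _ in range(n_repeats):
            x, y = sample(n, rng)
            coefs = np.polyfit(x, y, degree)
            tr.append(np.mean((np.polyval(coefs, x) - y) ** 2))
            va.append(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))
        train_mse.append(np.mean(tr))
        val_mse.append(np.mean(va))
    return np.array(train_mse), np.array(val_mse)


sizes = [20, 40, 80, 160, 320, 640]
for degree in (1, 10):
    tr, va = poly_learning_curve(degree, sizes)
    print(f"degree {degree}: train {np.round(tr, 3)}  val {np.round(va, 3)}")
```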

Model Complexity Spectrum

Low complexity <---------> High complexity
Linear regression --- Polynomial --- Decision tree --- Deep forest --- Neural net

High bias, low variance <--> Low bias, high variance
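A sketch of walking along the complexity spectrum, using polynomial degree as the complexity knob (an assumption: the trees and networks in the diagram have their own knobs, such as depth or layer count). Averaged over 30 hypothetical training sets of 30 points from y = x^2 + noise, training MSE can only fall as the degree rises, because each model nests the previous one and is fit to the same data; validation MSE should bottom out at a low degree and climb again. `complexity_sweep` is an illustrative name.

```python
import numpy as np


def complexity_sweep(degrees, n_train=30, n_sets=30, sigma=0.3, seed=0):
    """Average train/validation MSE of polynomial fits of increasing degree.

    Hypothetical data: y = x**2 + N(0, sigma^2) noise, x uniform on [-1, 1].
    """
    rng = np.random.default_rng(seed)
    x_val = rng.uniform(-1, 1, 1000)
    y_val = x_val ** 2 + rng.normal(0, sigma, x_val.size)
    # Draw the training sets once so every degree is fit to the same data.
    sets = []
    for _ in range(n_sets):
        x = rng.uniform(-1, 1, n_train)
        sets.append((x, x ** 2 + rng.normal(0, sigma, n_train)))
    train_mse, val_mse = [], []
    for d in degrees:
        tr, va = [], []
        for x, y in sets:
            coefs = np.polyfit(x, y, d)
            tr.append(np.mean((np.polyval(coefs, x) - y) ** 2))
            va.append(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))
        train_mse.append(np.mean(tr))
        val_mse.append(np.mean(va))
    return np.array(train_mse), np.array(val_mse)


degrees = list(range(1, 13))
train, val = complexity_sweep(degrees)
print("best degree by validation MSE:", degrees[int(np.argmin(val))])
```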

Managing the Tradeoff

Reduce Variance (fight overfitting)

Regularization: L1/L2 penalties on model weights
Dropout: randomly zero out neurons during training
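Early stopping: stop training once validation loss stops improving
Cross-validation: more robust performance estimate for choosing complexity and regularization strength (it detects overfitting rather than preventing it)
More training data: often the most effective remedy for overfitting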
Ensemble methods: averaging reduces variance (Random Forest, Bagging)
Feature selection: remove noisy/irrelevant features
Simpler model: fewer parameters, shallower trees
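The regularization lever can be illustrated with closed-form ridge (L2) regression on degree-10 polynomial features. This sketch assumes a hypothetical fixed-design setup: f(x) = x^2, noise sigma = 0.3, 20 training inputs on a grid with fresh noise per simulated training set. The helper names are illustrative, and leaving the intercept unpenalized is a common but not universal choice. As lambda grows, variance should shrink while bias^2 grows.

```python
import numpy as np


def poly_features(x, degree):
    # Columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)


def ridge_fit(X, y, lam):
    """Closed-form ridge: minimize ||y - Xw||^2 + lam * ||w[1:]||^2 (intercept unpenalized)."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def ridge_bias_variance(lam, degree=10, n_sets=300, sigma=0.3, seed=0):
    """Monte Carlo bias^2 and variance of ridge predictions, averaged over a test grid."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-1, 1, 20)
    x_test = np.linspace(-1, 1, 50)
    X, X_test = poly_features(x_train, degree), poly_features(x_test, degree)
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        y = x_train ** 2 + rng.normal(0, sigma, x_train.size)
        preds[i] = X_test @ ridge_fit(X, y, lam)
    bias_sq = np.mean((preds.mean(axis=0) - x_test ** 2) ** 2)
    variance = np.mean(preds.var(axis=0))
    return float(bias_sq), float(variance)


for lam in (0.0, 1e-3, 1e-1, 10.0):
    b, v = ridge_bias_variance(lam)
    print(f"lambda={lam:g}: bias^2={b:.4f} variance={v:.4f}")
```

In practice a library estimator such as scikit-learn's `Ridge` (penalty set via `alpha`) would replace the hand-written solve; the closed form is used here only to keep the mechanics visible.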

Reduce Bias (fight underfitting)

More complex model: deeper trees, more layers
More/better features: feature engineering
Less regularization: reduce penalty
Ensemble methods: boosting reduces bias (Gradient Boosting, AdaBoost)
Train longer: more epochs/iterations
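A sketch of the "more/better features" remedy, assuming a hypothetical target driven by an interaction, y = 3 * x1 * x2 + noise (noise std 0.1). A linear model on the raw inputs cannot represent the product, so its training error stays high however many rows it sees; adding the engineered feature x1 * x2 should bring training MSE down to roughly the noise level (0.1^2 = 0.01). Only `numpy.linalg.lstsq` is used; `fit_mse` is an illustrative helper.

```python
import numpy as np


def fit_mse(X, y):
    """Ordinary least-squares fit; returns the training MSE."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = 3.0 * x1 * x2 + rng.normal(0, 0.1, n)  # hypothetical interaction target

ones = np.ones(n)
X_raw = np.column_stack([ones, x1, x2])           # linear in the raw inputs
X_eng = np.column_stack([ones, x1, x2, x1 * x2])  # plus an engineered interaction feature

print("raw features MSE:      ", round(fit_mse(X_raw, y), 4))
print("with x1*x2 feature MSE:", round(fit_mse(X_eng, y), 4))
```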

Ensemble Methods and the Tradeoff

Bagging (Random Forest): trains multiple models independently on bootstrap samples of the data and averages their predictions. Reduces VARIANCE
Boosting (Gradient Boosting): trains models sequentially, each one fitting the errors left by the ensemble so far. Reduces BIAS
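A rough sketch of why bagging reduces variance. It swaps in 1-nearest-neighbour regression as the unstable base learner instead of the decision trees Random Forest uses (and skips Random Forest's random feature selection), purely to keep the code short. All names and the data setup (y = x^2 + noise, 50 points per training set) are hypothetical. Across many independent training sets, the bagged predictions should vary noticeably less than a single 1-NN model's.

```python
import numpy as np


def one_nn_predict(x_train, y_train, x_query):
    """1-nearest-neighbour regression on a single input feature (a high-variance learner)."""
    nearest = np.argmin(np.abs(x_query[:, None] - x_train[None, :]), axis=1)
    return y_train[nearest]


def bagged_predict(x_train, y_train, x_query, n_models, rng):
    """Average the predictions of 1-NN models fit to bootstrap resamples."""
    n = x_train.size
    total = np.zeros(x_query.size)
    for _ in range(n_models):
        idx = rng.integers(0, n, n)  # n draws with replacement
        total += one_nn_predict(x_train[idx], y_train[idx], x_query)
    return total / n_models


def prediction_variance(use_bagging, n_sets=200, n_train=50, sigma=0.3, seed=0):
    """Variance of predictions across independent training sets, averaged over a query grid."""
    rng = np.random.default_rng(seed)
    x_query = np.linspace(-1, 1, 41)
    preds = np.empty((n_sets, x_query.size))
    for i in range(n_sets):
        x = rng.uniform(-1, 1, n_train)
        y = x ** 2 + rng.normal(0, sigma, n_train)
        if use_bagging:
            preds[i] = bagged_predict(x, y, x_query, n_models=50, rng=rng)
        else:
            preds[i] = one_nn_predict(x, y, x_query)
    return float(np.mean(preds.var(axis=0)))


single = prediction_variance(use_bagging=False)
bagged = prediction_variance(use_bagging=True)
print(f"single 1-NN variance: {single:.4f}, bagged: {bagged:.4f}")
```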

This is why gradient boosting (reduces bias) combined with regularization such as shrinkage, row subsampling, and shallow trees (controls variance) is so powerful.
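A minimal gradient-boosting sketch for squared error, using one-split regression stumps on a single input. Each round fits a stump to the current residuals (the negative gradient of squared error), and the learning rate shrinks each step; learning rate and number of rounds act as the regularization knobs here. `fit_stump`, `boost`, and `boost_predict` are hypothetical helpers, not a library API, and the target sin(3x) + noise is made up. Training MSE should fall as rounds increase (bias reduction), while validation MSE shows when extra rounds stop paying off.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump on 1-D input x for targets r.

    Returns (threshold, left_value, right_value); predicts left_value where x <= threshold.
    """
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    n = xs.size
    sizes = np.arange(1, n)              # points on the left side of each candidate split
    left_sum = np.cumsum(rs)[:-1]
    right_sum = rs.sum() - left_sum
    # Reduction in squared error (up to a constant) for each candidate split.
    gain = left_sum ** 2 / sizes + right_sum ** 2 / (n - sizes)
    gain[xs[1:] == xs[:-1]] = -np.inf    # cannot split between tied x values
    k = int(np.argmax(gain))
    threshold = (xs[k] + xs[k + 1]) / 2
    return threshold, left_sum[k] / (k + 1), right_sum[k] / (n - k - 1)


def boost(x, y, n_rounds, learning_rate):
    """Gradient boosting with stumps for squared error."""
    base = float(y.mean())
    pred = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of 0.5 * (y - pred)^2
        thr, lv, rv = fit_stump(x, residual)
        pred += learning_rate * np.where(x <= thr, lv, rv)
        stumps.append((thr, lv, rv))
    return base, learning_rate, stumps


def boost_predict(model, x):
    base, learning_rate, stumps = model
    pred = np.full(x.shape, base)
    for thr, lv, rv in stumps:
        pred += learning_rate * np.where(x <= thr, lv, rv)
    return pred


rng = np.random.default_rng(0)
x_tr, x_va = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 1000)
y_tr = np.sin(3 * x_tr) + rng.normal(0, 0.1, x_tr.size)
y_va = np.sin(3 * x_va) + rng.normal(0, 0.1, x_va.size)
for rounds in (1, 10, 100, 1000):
    model = boost(x_tr, y_tr, rounds, learning_rate=0.1)
    tr = np.mean((boost_predict(model, x_tr) - y_tr) ** 2)
    va = np.mean((boost_predict(model, x_va) - y_va) ** 2)
    print(f"{rounds:5d} rounds: train MSE {tr:.4f}, val MSE {va:.4f}")
```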

Gotchas

"More data" helps variance but NOT bias - if model is too simple, more data won't fix it
Regularization is NOT free - too much regularization increases bias
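Validation set overfitting is real - heavy hyperparameter tuning on the same val set makes its score optimistic; confirm on an untouched test set or use nested cross-validation
Neural networks challenge the classical tradeoff - very large, overparameterized models can generalize well, with test error falling again after the point where the model fits the training data exactly (double descent phenomenon)
Cross-validation gives a more reliable estimate than a single split but is slower: k-fold CV trains k models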
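A sketch of k-fold cross-validation for choosing polynomial degree, written with NumPy rather than scikit-learn's `cross_val_score`. The data (y = x^2 + noise, 60 points) and the helper `kfold_mse` are hypothetical. It makes k fits per candidate, which is the extra cost noted above, and the spread of the per-fold errors hints at how much a single train/val split could have varied.

```python
import numpy as np


def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean and std of per-fold validation MSE for a polynomial fit (k fits in total)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        errors.append(np.mean((np.polyval(coefs, x[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(errors)), float(np.std(errors))


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = x ** 2 + rng.normal(0, 0.3, x.size)  # hypothetical data
for degree in (1, 2, 8):
    mean_mse, spread = kfold_mse(x, y, degree)
    print(f"degree {degree}: CV MSE {mean_mse:.3f} (fold std {spread:.3f})")
```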