Bagging
| Type |
|---|
| Ensemble Technique |
Bagging, short for Bootstrap Aggregating, is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms. It is primarily used to reduce variance and help avoid overfitting, making it particularly effective for “unstable” models like deep Decision Trees.
The Bagging process consists of two primary stages: Bootstrapping and Aggregating.
1. Bootstrapping (Sampling)
The algorithm creates multiple subsets of the original training data.
- Each subset is generated by random sampling with replacement.
- This means some observations may appear multiple times in a single subset, while others (known as “Out-of-Bag” samples) may not appear at all.
- Each subset is typically the same size as the original dataset (a minimal sampling sketch follows this list).
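As a minimal sketch of this sampling step (assuming NumPy and a toy array of ten observation indices, used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy training set: 10 observations, represented here by their indices 0..9.
X = np.arange(10)

# One bootstrap sample: same size as the original data, drawn with replacement,
# so some indices repeat while others are missed entirely.
bootstrap_idx = rng.choice(len(X), size=len(X), replace=True)
bootstrap_sample = X[bootstrap_idx]

# Out-of-bag observations: points never drawn into this particular sample
# (on average roughly a third of the data).
oob_mask = ~np.isin(np.arange(len(X)), bootstrap_idx)

print("Bootstrap sample:", bootstrap_sample)
print("Out-of-bag points:", X[oob_mask])
```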
2. Aggregating (Combining)
A base learner (typically a deep, unpruned Decision Tree) is trained independently and in parallel on each bootstrap sample, and the individual predictions are then combined (see the sketch after this list).
- For Classification: The final prediction is determined by a majority vote across all models.
- For Regression: The final prediction is the average of all the individual model outputs.
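A hand-rolled sketch of both stages, assuming scikit-learn's DecisionTreeClassifier and a synthetic binary dataset chosen only for illustration (for regression, the majority vote would simply be replaced by an average of the predictions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(seed=0)

# Stage 1: train each tree on its own bootstrap sample, independently of the others.
trees = []
for _ in range(25):
    idx = rng.choice(len(X), size=len(X), replace=True)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Stage 2: aggregate by majority vote across the individual predictions
# (the mean-then-threshold trick works because the labels are 0/1).
all_preds = np.stack([tree.predict(X) for tree in trees])   # shape (25, n_samples)
majority_vote = (all_preds.mean(axis=0) >= 0.5).astype(int)

print("Training accuracy of the ensemble vote:", (majority_vote == y).mean())
```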
Key Advantages
- Reduces Variance: By averaging the predictions of many independently trained models, Bagging smooths out the “noise” and the high sensitivity to the particular training data that individual models exhibit.
- Parallelization: Unlike Boosting (which is sequential), Bagging models can be trained simultaneously, making it computationally efficient on multi-core systems.
- Stability: It prevents a single outlier or unusual observation from dominating the final prediction, since any individual observation appears in only some of the bootstrap samples.
- Out-of-Bag (OOB) Error: Because some data is left out of each bootstrap sample, this “out-of-bag” data can be used as a built-in validation set to estimate model performance (see the example after this list).
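For instance, scikit-learn's BaggingClassifier reports this estimate when oob_score=True: each point is scored only by the trees whose bootstrap samples did not contain it. The sketch below assumes a synthetic dataset and a recent scikit-learn release (older versions name the estimator parameter base_estimator):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# oob_score=True evaluates every observation using only the trees that
# never saw it during training, giving a built-in validation estimate.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True,
    random_state=0,
).fit(X, y)

print("OOB accuracy estimate:", bagging.oob_score_)
```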
Famous Example: Random Forest
The most well-known application of Bagging is the Random Forest. It enhances the standard Bagging process by adding feature randomness. In a Random Forest, not only is the data sampled with replacement, but at each split in the Decision Tree, the algorithm only considers a random subset of features. This ensures that the individual trees are even more decorrelated, leading to a more robust final ensemble.
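As a minimal sketch with scikit-learn's RandomForestClassifier, where max_features controls the per-split feature subsampling (the dataset and hyperparameters here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# max_features="sqrt" adds the extra layer of randomness: at every split,
# each tree considers only a random subset of the 20 features (about 4 here),
# which further decorrelates the individual trees.
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",
    oob_score=True,
    random_state=0,
).fit(X, y)

print("OOB accuracy estimate:", forest.oob_score_)
```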
Bagging vs. Boosting
| Feature | Bagging | Boosting |
|---|---|---|
| Goal | Reduce Variance | Reduce Bias |
| Model Training | Parallel (Independent) | Sequential (Dependent) |
| Sample Weighting | Equal weight for all samples | Misclassified samples get higher weight |
| Best Used For | High-variance, overfit models | High-bias, underfit models |
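To make the contrast concrete, the sketch below compares scikit-learn's BaggingClassifier (deep trees trained independently) with AdaBoostClassifier (shallow stumps trained sequentially, reweighting misclassified samples); the dataset, hyperparameters, and cross-validation setup are illustrative assumptions, and older scikit-learn releases name the estimator parameter base_estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Bagging: high-variance base learners (unrestricted depth), trained in parallel.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),
    n_estimators=100,
    random_state=0,
)

# Boosting: high-bias base learners (depth-1 stumps), trained sequentially,
# each new stump focusing on the samples the previous ones got wrong.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    random_state=0,
)

for name, model in [("Bagging", bagging), ("Boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```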