Bagging
| Type |
|---|
| Ensemble Technique |
Bagging, short for Bootstrap Aggregating, is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms. It is primarily used to reduce variance and help avoid overfitting, making it particularly effective for “unstable” models like deep Decision Trees.
The Bagging process consists of two primary stages: Bootstrapping and Aggregating.
1. Bootstrapping (Sampling)
The algorithm creates multiple subsets of the original training data.
- Each subset is generated by random sampling with replacement.
- This means some observations may appear multiple times in a single subset, while others (known as “Out-of-Bag” samples) may not appear at all.
- Each subset is typically the same size as the original dataset (a minimal sampling sketch follows this list).
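As a minimal sketch of this sampling step (assuming NumPy and a toy array of ten observation indices, used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy training set: 10 observations, represented here by their indices 0..9.
X = np.arange(10)

# One bootstrap sample: same size as the original data, drawn with replacement,
# so some indices repeat while others are missed entirely.
bootstrap_idx = rng.choice(len(X), size=len(X), replace=True)
bootstrap_sample = X[bootstrap_idx]

# Out-of-bag observations: points never drawn into this particular sample
# (on average roughly a third of the data).
oob_mask = ~np.isin(np.arange(len(X)), bootstrap_idx)

print("Bootstrap sample:", bootstrap_sample)
print("Out-of-bag points:", X[oob_mask])
```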
2. Aggregating (Combining)
A base learner (typically a deep, unpruned Decision Tree) is trained independently and in parallel on each bootstrap sample, and the individual predictions are then combined (see the sketch after this list).
- For Classification: The final prediction is determined by a majority vote across all models.
- For Regression: The final prediction is the average of all the individual model outputs.
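A hand-rolled sketch of both stages, assuming scikit-learn's DecisionTreeClassifier and a synthetic binary dataset chosen only for illustration (for regression, the majority vote would simply be replaced by an average of the predictions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(seed=0)

# Stage 1: train each tree on its own bootstrap sample, independently of the others.
trees = []
for _ in range(25):
    idx = rng.choice(len(X), size=len(X), replace=True)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Stage 2: aggregate by majority vote across the individual predictions
# (the mean-then-threshold trick works because the labels are 0/1).
all_preds = np.stack([tree.predict(X) for tree in trees])   # shape (25, n_samples)
majority_vote = (all_preds.mean(axis=0) >= 0.5).astype(int)

print("Training accuracy of the ensemble vote:", (majority_vote == y).mean())
```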
Key Advantages
- Reduces Variance: By averaging the predictions of many independently trained models, Bagging smooths out the “noise” and the high sensitivity to the particular training data that individual models exhibit.
- Parallelization: Unlike Boosting (which is sequential), Bagging models can be trained simultaneously, making it computationally efficient on multi-core systems.
- Stability: It prevents a single outlier or unusual observation from dominating the final prediction, since any individual observation appears in only some of the bootstrap samples.
- Out-of-Bag (OOB) Error: Because some data is left out of each bootstrap sample, this “out-of-bag” data can be used as a built-in validation set to estimate model performance (see the example after this list).
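For instance, scikit-learn's BaggingClassifier reports this estimate when oob_score=True: each point is scored only by the trees whose bootstrap samples did not contain it. The sketch below assumes a synthetic dataset and a recent scikit-learn release (older versions name the estimator parameter base_estimator):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# oob_score=True evaluates every observation using only the trees that
# never saw it during training, giving a built-in validation estimate.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True,
    random_state=0,
).fit(X, y)

print("OOB accuracy estimate:", bagging.oob_score_)
```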
Famous Example: Random Forest
The most well-known application of Bagging is the Random Forest. It enhances the standard Bagging process by adding feature randomness. In a Random Forest, not only is the data sampled with replacement, but at each split in the Decision Tree, the algorithm only considers a random subset of features. This ensures that the individual trees are even more decorrelated, leading to a more robust final ensemble.
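As a minimal sketch with scikit-learn's RandomForestClassifier, where max_features controls the per-split feature subsampling (the dataset and hyperparameters here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# max_features="sqrt" adds the extra layer of randomness: at every split,
# each tree considers only a random subset of the 20 features (about 4 here),
# which further decorrelates the individual trees.
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",
    oob_score=True,
    random_state=0,
).fit(X, y)

print("OOB accuracy estimate:", forest.oob_score_)
```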
Bagging vs. Boosting
| Feature | Bagging | Boosting |
|---|---|---|
| Goal | Reduce Variance | Reduce Bias |
| Model Training | Parallel (Independent) | Sequential (Dependent) |
| Sample Weighting | Equal weight for all samples | Misclassified samples get higher weight |
| Best Used For | High-variance, overfit models | High-bias, underfit models |
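To make the contrast concrete, the sketch below compares scikit-learn's BaggingClassifier (deep trees trained independently) with AdaBoostClassifier (shallow stumps trained sequentially, reweighting misclassified samples); the dataset, hyperparameters, and cross-validation setup are illustrative assumptions, and older scikit-learn releases name the estimator parameter base_estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Bagging: high-variance base learners (unrestricted depth), trained in parallel.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),
    n_estimators=100,
    random_state=0,
)

# Boosting: high-bias base learners (depth-1 stumps), trained sequentially,
# each new stump focusing on the samples the previous ones got wrong.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    random_state=0,
)

for name, model in [("Bagging", bagging), ("Boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```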