# Decision Trees
| Type |
|---|
| Classification, Regression |
Decision Trees are non-parametric supervised learning methods used for classification and regression. They use a tree-like graph of decisions to predict the value of a target variable.
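As a minimal sketch of both uses, the snippet below fits one tree for classification and one for regression with scikit-learn; the datasets and variable names are illustrative assumptions, not part of the notes above.

```python
# Minimal sketch: a decision tree for classification and one for regression.
# Assumes scikit-learn; the datasets and names here are illustrative.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict a discrete class label.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: predict a continuous value.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("regression R^2:", reg.score(X_test, y_test))
```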
## Decision Tree Algorithms
- CART (Classification and Regression Trees)
- C4.5 (successor to ID3)
- C5.0 (an improved, faster version of C4.5)
## Key Components
- Internal nodes: Represent a test on an attribute (feature).
- Branches: Represent the outcomes of the test.
- Leaves: Represent the final class label or predicted value. (A small printed tree is sketched below.)
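To see these components concretely, a shallow tree can be printed as text. This sketch assumes scikit-learn's `export_text` helper and the bundled iris data; the depth limit is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each "feature <= threshold" line is an internal node (a test on a feature),
# the indented lines under it are the branches (the test outcomes),
# and each "class: ..." line is a leaf holding the final prediction.
print(export_text(tree, feature_names=iris.feature_names))
```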
## Building the Tree (Recursive Partitioning)
The tree is built by recursively partitioning the data into subsets that are as “pure” as possible, i.e., dominated by a single class or a narrow range of target values.
- Information Gain: Measures the reduction in entropy (disorder) achieved by a split. Higher gain is better.
- Gini Impurity: Measures how often a randomly chosen element would be mislabeled if labeled according to the node’s class distribution. Used by the CART algorithm for classification.
- Splitting Criteria: At each node, the algorithm selects the feature and split point that most reduce impurity in the resulting child nodes (a worked sketch follows this list).
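As a worked sketch of these criteria, the helpers below compute entropy, Gini impurity, and information gain for a toy binary split. The function names and data are illustrative, not from any particular library.

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i): the disorder of a label set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """G(S) = 1 - sum_i p_i^2: chance a random element is mislabeled."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Toy example: a split that separates the two classes fairly well.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])
print("gain:", information_gain(parent, [left, right]))  # > 0: the split helps
print("gini(parent):", gini(parent))                     # 0.5 for a 50/50 mix
```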
## Overfitting and Pruning
Decision trees can easily overfit by becoming too deep and capturing noise.
- Pre-pruning: Stop growing the tree early (e.g., limit the maximum depth or the minimum samples per leaf).
- Post-pruning: Grow the full tree, then remove subtrees that provide little predictive power (both approaches are sketched below).
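Both strategies can be sketched with scikit-learn: pre-pruning via constructor limits, post-pruning via cost-complexity pruning (`ccp_alpha`). The hyperparameter values and dataset below are arbitrary assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with depth / leaf-size limits (values arbitrary).
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune with cost-complexity pruning.
# Larger ccp_alpha removes more low-value subtrees.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # mid-range alpha, for the sketch
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

print("pre-pruned test accuracy: ", pre.score(X_test, y_test))
print("post-pruned test accuracy:", post.score(X_test, y_test))
```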
## Decision Trees vs. Boosting

| Feature | Decision Trees | Boosting |
|---|---|---|
| Model Type | Single base learner | Ensemble of learners |
| Training | Fast, recursive | Sequential, slower |
| Complexity | Easy to interpret | High predictive power, “black box” |
| Bias/Variance | High variance (prone to overfitting) | Primarily reduces bias; regularized via shrinkage and early stopping |
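For a quick empirical check of the table, the sketch below cross-validates a single tree against scikit-learn's `GradientBoostingClassifier` as the boosting baseline; the dataset choice is an assumption and exact scores will vary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Single base learner: fast to train and easy to inspect, but high variance.
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

# Ensemble of sequentially fitted learners: slower and harder to interpret,
# but typically more accurate.
boost_scores = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=5)

print(f"decision tree: {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"boosting:      {boost_scores.mean():.3f} +/- {boost_scores.std():.3f}")
```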