Normalization/Scaling Techniques
- z-score normalization (standardization): x' = (x - μ) / σ, giving each feature mean 0 and standard deviation 1
- Min-max normalization: x' = (x - min) / (max - min), rescaling each feature into the range [0, 1]
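A minimal NumPy sketch of both transforms applied column-wise to a made-up age/income matrix (scikit-learn's StandardScaler and MinMaxScaler perform the same arithmetic):

```python
import numpy as np

# Toy data: columns are age (years) and income (dollars); values are invented.
X = np.array([[25.0,  50_000.0],
              [40.0,  80_000.0],
              [65.0, 120_000.0]])

# z-score normalization (standardization): x' = (x - mean) / std
# -> each column ends up with mean 0 and standard deviation 1
z_scored = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max normalization: x' = (x - min) / (max - min)
# -> each column is rescaled into [0, 1]
min_maxed = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(z_scored)
print(min_maxed)
```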
Cases where normalization/scaling is important
- Gradient-based methods (where optimization depends on scale): Linear/Logistic Regression (with gradient descent), Neural Networks / Deep Learning
- Distance- and margin-based methods (where similarity depends on scale): k-Nearest Neighbors (kNN), Support Vector Machines (SVMs)
- Variance-based decompositions: Principal Component Analysis (PCA), LDA
Reason: If features are on very different scales (e.g., “age in years” vs. “income in dollars”), the algorithm may:
- converge more slowly or get stuck in poor local minima (a gradient-descent sketch follows this list),
- assign disproportionate importance to larger-scaled features.
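As a rough illustration of the convergence point, here is a sketch with made-up data and arbitrarily chosen learning rates, comparing plain batch gradient descent on raw vs. standardized features. With raw features the learning rate has to be tiny to avoid divergence, and progress along the small-scale feature then stalls:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 70, n)                 # scale: tens
income = rng.uniform(20_000, 150_000, n)     # scale: tens of thousands
X = np.column_stack([age, income])
y = 0.5 * age + 0.0001 * income + rng.normal(0, 1, n)

def gd_iterations(X, y, lr, tol=1e-4, max_iter=50_000):
    """Batch gradient descent on mean squared error; returns the number of
    iterations until the gradient norm drops below tol."""
    w = np.zeros(X.shape[1])
    for i in range(1, max_iter + 1):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter  # did not converge within the budget

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print("standardized:", gd_iterations(X_std, y, lr=0.1))    # converges in a few dozen steps
print("raw features:", gd_iterations(X, y, lr=5e-11))      # lr must be billions of times smaller to stay stable; still hits the cap
```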
Example: In kNN, distance metrics (Euclidean, Manhattan) are directly scale-sensitive.
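A small illustration of that sensitivity, using invented patient records and scikit-learn's StandardScaler (any equivalent standardization would do):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented records: [age in years, income in dollars]
X = np.array([
    [25.0,  50_000.0],   # query point
    [65.0,  51_000.0],   # very different age, nearly identical income
    [26.0,  80_000.0],   # nearly identical age, very different income
])

def dist_from_query(A):
    """Euclidean distance from the first row to the other rows."""
    return np.linalg.norm(A[1:] - A[0], axis=1)

# Raw features: the income axis dominates, so the 40-years-older record
# looks like the nearest neighbor.
print("raw:   ", dist_from_query(X))

# After standardization, both features contribute on comparable scales.
print("scaled:", dist_from_query(StandardScaler().fit_transform(X)))
```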
Cases where normalization is not strictly necessary
- Tree-based methods (scale-invariant algorithms):
- Decision Trees
- Random Forests
- Gradient Boosted Trees (XGBoost, LightGBM, CatBoost)
Reason: Trees split data by thresholds (“is feature > X?”). Scaling is a monotonic transformation, so the threshold simply moves with the data; the resulting partitions, and therefore the predictions, are the same whether the feature is in cm or m (a short sketch follows this list).
- Naïve Bayes (with categorical data): Works with probability distributions, not distances.
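A sketch of the scale-invariance claim for trees on synthetic data (scikit-learn assumed; the feature scales and tree depth are arbitrary): the tree trained on min-maxed features makes the same predictions as the one trained on raw features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) * [1.0, 10_000.0]      # two features on wildly different scales
y = (X[:, 0] + X[:, 1] / 10_000 > 0).astype(int)     # label depends on both features equally

scaler = MinMaxScaler().fit(X)
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(scaler.transform(X), y)

# Min-max scaling is monotonic, so every "feature > threshold" split has an
# equivalent threshold on the rescaled feature: both trees carve out the same
# partitions and agree on unseen points.
X_new = rng.normal(size=(100, 2)) * [1.0, 10_000.0]
print((tree_raw.predict(X_new) == tree_scaled.predict(scaler.transform(X_new))).all())
```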
Borderline cases
- Naïve Bayes with continuous features (Gaussian NB) → The model fits a per-class mean and variance for each feature, so plain rescaling rarely changes its predictions; transforming skewed features toward normality matters more, since the algorithm assumes normally distributed features.
- Linear models with regularization (Lasso, Ridge, ElasticNet) → Scaling is strongly recommended, because the penalty acts on raw coefficient magnitudes and is therefore scale-dependent: features on larger scales need smaller coefficients and are effectively penalized less.
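A sketch of that scale dependence for the L1 penalty, with made-up data and an arbitrary alpha; the same idea applies to Ridge and ElasticNet. Wrapping the scaler and the model in a Pipeline also keeps the scaling inside cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(size=n),             # unit-scale feature
    rng.normal(size=n) * 10_000,    # equally informative feature on a much larger scale
])
y = X[:, 0] + X[:, 1] / 10_000 + rng.normal(0, 0.1, n)

# Without scaling: the large-scale feature needs only a tiny coefficient,
# so the L1 penalty barely touches it, while the unit-scale feature is
# shrunk heavily; the regularization is applied unevenly.
print(Lasso(alpha=0.5).fit(X, y).coef_)

# With scaling inside a pipeline, both features are penalized on equal footing
# and end up with roughly the same coefficient.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
print(pipe.fit(X, y).named_steps["lasso"].coef_)
```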
Rule of thumb
- If the model uses distances, gradients, or variance-based decomposition, normalize or standardize.
- If it uses rules, counts, or thresholds, scaling usually doesn’t matter.