Normalization/Scaling Techniques
- z-score normalization (standardization): x' = (x - μ) / σ, giving each feature mean 0 and standard deviation 1
- Min-max normalization: x' = (x - min) / (max - min), rescaling each feature into the range [0, 1]
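A minimal NumPy sketch of both transforms applied column-wise to a made-up age/income matrix (scikit-learn's StandardScaler and MinMaxScaler perform the same arithmetic):

```python
import numpy as np

# Toy data: columns are age (years) and income (dollars); values are invented.
X = np.array([[25.0,  50_000.0],
              [40.0,  80_000.0],
              [65.0, 120_000.0]])

# z-score normalization (standardization): x' = (x - mean) / std
# -> each column ends up with mean 0 and standard deviation 1
z_scored = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max normalization: x' = (x - min) / (max - min)
# -> each column is rescaled into [0, 1]
min_maxed = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(z_scored)
print(min_maxed)
```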
Cases where normalization/scaling is important
- Gradient-based methods (where optimization depends on scale): Linear/Logistic Regression (with gradient descent), Neural Networks / Deep Learning
- Distance- and margin-based methods (where similarity depends on scale): k-Nearest Neighbors (kNN), Support Vector Machines (SVMs)
- Variance-based decompositions: Principal Component Analysis (PCA), LDA
Reason: If features are on very different scales (e.g., “age in years” vs. “income in dollars”), the algorithm may:
- converge more slowly or get stuck in poor local minima (a gradient-descent sketch follows this list),
- assign disproportionate importance to larger-scaled features.
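As a rough illustration of the convergence point, here is a sketch with made-up data and arbitrarily chosen learning rates, comparing plain batch gradient descent on raw vs. standardized features. With raw features the learning rate has to be tiny to avoid divergence, and progress along the small-scale feature then stalls:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 70, n)                 # scale: tens
income = rng.uniform(20_000, 150_000, n)     # scale: tens of thousands
X = np.column_stack([age, income])
y = 0.5 * age + 0.0001 * income + rng.normal(0, 1, n)

def gd_iterations(X, y, lr, tol=1e-4, max_iter=50_000):
    """Batch gradient descent on mean squared error; returns the number of
    iterations until the gradient norm drops below tol."""
    w = np.zeros(X.shape[1])
    for i in range(1, max_iter + 1):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter  # did not converge within the budget

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print("standardized:", gd_iterations(X_std, y, lr=0.1))    # converges in a few dozen steps
print("raw features:", gd_iterations(X, y, lr=5e-11))      # lr must be billions of times smaller to stay stable; still hits the cap
```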
Example: In kNN, distance metrics (Euclidean, Manhattan) are directly scale-sensitive.
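A small illustration of that sensitivity, using invented patient records and scikit-learn's StandardScaler (any equivalent standardization would do):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented records: [age in years, income in dollars]
X = np.array([
    [25.0,  50_000.0],   # query point
    [65.0,  51_000.0],   # very different age, nearly identical income
    [26.0,  80_000.0],   # nearly identical age, very different income
])

def dist_from_query(A):
    """Euclidean distance from the first row to the other rows."""
    return np.linalg.norm(A[1:] - A[0], axis=1)

# Raw features: the income axis dominates, so the 40-years-older record
# looks like the nearest neighbor.
print("raw:   ", dist_from_query(X))

# After standardization, both features contribute on comparable scales.
print("scaled:", dist_from_query(StandardScaler().fit_transform(X)))
```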
Cases where normalization is not strictly necessary
- Tree-based methods (scale-invariant algorithms):
- Decision Trees
- Random Forests
- Gradient Boosted Trees (XGBoost, LightGBM, CatBoost)
Reason: Trees split data by thresholds (“is feature > X?”). Scaling is a monotonic transformation, so the threshold simply moves with the data; the resulting partitions, and therefore the predictions, are the same whether the feature is in cm or m (a short sketch follows this list).
- Naïve Bayes (with categorical data): Works with probability distributions, not distances.
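A sketch of the scale-invariance claim for trees on synthetic data (scikit-learn assumed; the feature scales and tree depth are arbitrary): the tree trained on min-maxed features makes the same predictions as the one trained on raw features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) * [1.0, 10_000.0]      # two features on wildly different scales
y = (X[:, 0] + X[:, 1] / 10_000 > 0).astype(int)     # label depends on both features equally

scaler = MinMaxScaler().fit(X)
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(scaler.transform(X), y)

# Min-max scaling is monotonic, so every "feature > threshold" split has an
# equivalent threshold on the rescaled feature: both trees carve out the same
# partitions and agree on unseen points.
X_new = rng.normal(size=(100, 2)) * [1.0, 10_000.0]
print((tree_raw.predict(X_new) == tree_scaled.predict(scaler.transform(X_new))).all())
```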
Borderline cases
- Naïve Bayes with continuous features (Gaussian NB) → The model fits a per-class mean and variance for each feature, so plain rescaling rarely changes its predictions; transforming skewed features toward normality matters more, since the algorithm assumes normally distributed features.
- Linear models with regularization (Lasso, Ridge, ElasticNet) → Scaling is strongly recommended, because the penalty acts on raw coefficient magnitudes and is therefore scale-dependent: features on larger scales need smaller coefficients and are effectively penalized less.
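A sketch of that scale dependence for the L1 penalty, with made-up data and an arbitrary alpha; the same idea applies to Ridge and ElasticNet. Wrapping the scaler and the model in a Pipeline also keeps the scaling inside cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(size=n),             # unit-scale feature
    rng.normal(size=n) * 10_000,    # equally informative feature on a much larger scale
])
y = X[:, 0] + X[:, 1] / 10_000 + rng.normal(0, 0.1, n)

# Without scaling: the large-scale feature needs only a tiny coefficient,
# so the L1 penalty barely touches it, while the unit-scale feature is
# shrunk heavily; the regularization is applied unevenly.
print(Lasso(alpha=0.5).fit(X, y).coef_)

# With scaling inside a pipeline, both features are penalized on equal footing
# and end up with roughly the same coefficient.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
print(pipe.fit(X, y).named_steps["lasso"].coef_)
```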
Rule of thumb
- If the model uses distances, gradients, or variance-based decomposition, normalize or standardize.
- If it uses rules, counts, or thresholds, scaling usually doesn’t matter.