Information: Any datum (or collection of data) that changes the probability distribution of a relevant outcome.
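A toy Bayes update, with invented numbers, makes this concrete: a datum carries information exactly when it moves the probability of the outcome you care about.

```python
# Toy Bayes update (numbers invented): do dark clouds carry information
# about rain? They do if and only if they change P(rain).
p_rain = 0.30                  # prior probability of rain
p_clouds_given_rain = 0.90     # likelihood of clouds if it will rain
p_clouds_given_dry = 0.30      # likelihood of clouds if it stays dry

p_clouds = p_clouds_given_rain * p_rain + p_clouds_given_dry * (1 - p_rain)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(f"P(rain) moved from {p_rain:.2f} to {p_rain_given_clouds:.2f}")  # 0.30 -> 0.56
```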
Bias: Error from incorrect or overly simplistic modeling assumptions; high bias causes underfitting.
Variance: Error from sensitivity to small fluctuations in the training data; high variance causes overfitting.
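A quick way to see both failure modes is to fit models of different complexity to the same noisy data. A minimal sketch, assuming scikit-learn and NumPy; the dataset and polynomial degrees are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)  # noisy sine
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()                  # noise-free truth

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(f"degree={degree:2d}  "
          f"train MSE={mean_squared_error(y, model.predict(X)):.3f}  "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.3f}")
```

The degree-1 fit is bad on both sets (high bias, underfitting); the degree-15 fit is near-perfect on the training set but much worse on the test set (high variance, overfitting).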
Confusion Matrix: A table that summarizes the performance of a classifier. It shows the number of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
Classification Report: A convenient summary that shows the precision, recall, and F1-score for each class, along with the overall accuracy.
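Both entries above map directly to scikit-learn helpers; a minimal sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix, classification_report

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model predictions

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))

# Precision, recall, and F1 per class, plus overall accuracy.
print(classification_report(y_true, y_pred))
```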
Non-stationarity: When the process you're modeling changes over time.
Covariate shift: The distribution of inputs changes between training and deployment, while the input-to-output relationship stays the same.
Concept drift: The correct output for a given input changes over time.
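A practical check for covariate shift is to compare a feature's training distribution against what the deployed model currently sees, for instance with a two-sample Kolmogorov-Smirnov test. A minimal sketch assuming SciPy, with synthetic data; note this only detects shifted inputs, while concept drift needs fresh labels to spot:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # seen at training time
live_feature = rng.normal(loc=0.5, scale=1.0, size=1000)   # seen in production (shifted)

stat, p_value = ks_2samp(train_feature, live_feature)
print(f"KS statistic={stat:.3f}, p-value={p_value:.2e}")
if p_value < 0.01:
    print("Input distribution likely shifted; investigate before trusting predictions.")
```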
Non-monotonicity: When the relationship between an input feature and the model's output is not consistently increasing or decreasing; as the feature's value changes, the prediction may not move in a single direction.
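A toy example with invented numbers: the response improves with the feature up to a point, then degrades, so no single direction describes the relationship.

```python
import numpy as np

dose = np.arange(0, 11)            # hypothetical feature, e.g. hours of exercise
response = -(dose - 5) ** 2 + 25   # rises, peaks at 5, then falls
for d, r in zip(dose, response):
    print(f"dose={d:2d} -> response={r:3d}")
```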
Saturation: When an input's marginal effect on the output diminishes at larger values, so the relationship flattens rather than staying linear (e.g., are 1,000 people really 10 times more relevant than 100 people?).
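A common remedy is a concave transform such as a logarithm, which compresses large values so each tenfold increase adds roughly the same amount. A minimal sketch assuming NumPy:

```python
import numpy as np

counts = np.array([10, 100, 1_000, 10_000])
print(np.log1p(counts))  # ~[2.4, 4.6, 6.9, 9.2]: each 10x step adds ~2.3
```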
Multicollinearity (or collinearity): When predictors in a regression model are highly correlated (nearly linearly dependent), which makes individual coefficient estimates unstable.
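A standard diagnostic is the variance inflation factor (VIF), where values above roughly 5-10 are commonly read as problematic. A minimal sketch assuming statsmodels, with synthetic data in which one predictor is nearly a copy of another:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent predictor
X = np.column_stack([x1, x2, x3])

for i, name in enumerate(["x1", "x2", "x3"]):
    print(f"{name}: VIF = {variance_inflation_factor(X, i):.1f}")  # x1, x2 huge; x3 ~1
```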
Curse of Dimensionality: The negative impact of having too many features (dimensions) in a dataset. Data becomes sparse, distances between points lose meaning, computational cost rises, and the risk of overfitting grows.
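The "distances lose meaning" part can be demonstrated directly: as dimensionality grows, a random point's nearest and farthest neighbors become almost equally far away. A minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.uniform(size=(1000, d))
    dists = np.linalg.norm(points - points[0], axis=1)[1:]  # drop distance to self
    print(f"d={d:4d}  nearest/farthest ratio = {dists.min() / dists.max():.3f}")
```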