Bayesian Decision Theory

Bayesian Decision Theory is a fundamental statistical approach to pattern classification that quantifies the trade-offs between various decisions using probability and the costs associated with those decisions. It provides a mathematical framework for updating existing beliefs based on new evidence. By combining prior knowledge with observed data (likelihood), the theory enables the calculation of posterior probabilities to make decisions that minimize the probability of error or overall risk.

Core Concepts and Components

  • Prior Probability ($P(y)$): Reflects knowledge of how likely an outcome is before any data is observed.
  • Class-Conditional Density ($p(x\vert y)$): Also known as Likelihood, this is the probability density of observing feature $x$ given a specific state of nature $y$.
  • Evidence ($p(x)$): The total probability of observing feature $x$ across all possible categories; it acts as a normalization constant and is often unimportant for the final decision.
  • Posterior Probability ($P(y\vert x)$): The probability of a state of nature $y$ occurring given the observed feature $x$. It is calculated using Bayes’ Rule:

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
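As a minimal numerical sketch of this rule (the two-class priors and likelihood values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical two-class problem: priors P(y) and likelihoods p(x | y)
# evaluated at a single observed feature value x.
priors = np.array([0.7, 0.3])        # P(y1), P(y2)
likelihoods = np.array([0.2, 0.6])   # p(x | y1), p(x | y2)

evidence = np.sum(likelihoods * priors)        # p(x): the normalization constant
posteriors = likelihoods * priors / evidence   # P(y | x) via Bayes' rule

print(posteriors)   # [0.4375 0.5625] -- sums to 1
```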

Decision Rules

  • Prior-Only Rule: If no data is available, the best strategy is to always choose the class with the highest prior probability.
  • Posterior Decision Rule: Given an observation $x$, decide $y_1$ if $P(y_1\vert x) > P(y_2\vert x)$; otherwise, decide $y_2$ (a sketch of this rule follows the list).
  • Minimizing Error: Deciding according to the largest posterior achieves the minimum possible probability of error.
  • Special Cases:
    • If priors are uniform (equal), the decision relies entirely on the likelihood.
    • If likelihoods are equal, the decision relies entirely on the prior.
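A minimal sketch of the posterior decision rule, reusing the illustrative priors and likelihoods from above; the two special cases fall out by making the priors or the likelihoods equal:

```python
import numpy as np

def decide(priors, likelihoods):
    """Return the index of the class with the largest posterior P(y | x)."""
    # The evidence p(x) scales every class equally, so the argmax ignores it.
    return int(np.argmax(likelihoods * priors))

priors = np.array([0.7, 0.3])
likelihoods = np.array([0.2, 0.6])

print(decide(priors, likelihoods))                # 1: the posterior favors y2
print(decide(priors, np.array([0.5, 0.5])))       # 0: equal likelihoods, the prior decides
print(decide(np.array([0.5, 0.5]), likelihoods))  # 1: equal priors, the likelihood decides
```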

Generalized Theory: Risk and Loss

  • Loss Function ($\lambda(\alpha_i\vert y_j)$): Quantifies the cost incurred by taking action $\alpha_i$ when the true state is $y_j$. This allows for scenarios where some mistakes are more “expensive” than others.
  • Conditional Risk ($R(\alpha_i\vert x)$): The expected loss of taking action $\alpha_i$ given an observation $x$: $R(\alpha_i\vert x) = \sum_j \lambda(\alpha_i\vert y_j)\,P(y_j\vert x)$.
  • Bayes Risk ($R^*$): The minimum possible overall risk, representing the best performance achievable.
  • Minimizing Risk: To minimize overall loss, the agent should always select the action that minimizes the conditional risk.
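A minimal sketch of risk minimization under an asymmetric loss matrix; the loss values and posteriors below are illustrative, with action $\alpha_1$ = treat and $\alpha_2$ = dismiss:

```python
import numpy as np

# Hypothetical loss matrix: loss[i, j] = lambda(a_i | y_j), the cost of taking
# action a_i when the true state is y_j (y1 = disease, y2 = healthy).
loss = np.array([[0.0,  1.0],    # a1 = treat:   free if diseased, small cost if healthy
                 [10.0, 0.0]])   # a2 = dismiss: large cost if diseased, free if healthy

posteriors = np.array([0.2, 0.8])   # illustrative P(y1 | x), P(y2 | x)

cond_risk = loss @ posteriors       # R(a_i | x) = sum_j lambda(a_i | y_j) * P(y_j | x)
print(cond_risk)                    # [0.8 2.0]
print(int(np.argmin(cond_risk)))    # 0: treating minimizes risk even though P(disease | x) = 0.2
```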

Three Approaches to Classification

  1. Generative Models: Model the class-conditional densities ($p(x\vert y_k)$) and priors ($P(y_k)$) for each class, then use Bayes’ theorem to find the posterior.
  2. Discriminative Models: Model the posterior probabilities ($P(y_k\vert x)$) directly without modeling the underlying distribution of the data.
  3. Discriminant Functions: Find a function $g(x)$ that maps inputs directly to a class label without calculating probabilities.

Note on Generative vs. Discriminative Models

While generative models (like Naive Bayes) provide a full picture of the data distribution, they often require more data and computational power. Discriminative models (like Logistic Regression) are often more efficient when the goal is strictly to find the boundary between classes.
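A hedged side-by-side sketch of the two styles on synthetic data (the dataset and model choices are assumptions made for illustration): scikit-learn's GaussianNB plays the generative role, LogisticRegression the discriminative one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative: models p(x | y) and P(y), then applies Bayes' rule for P(y | x).
generative = GaussianNB().fit(X_tr, y_tr)

# Discriminative: models P(y | x) directly.
discriminative = LogisticRegression().fit(X_tr, y_tr)

print(generative.score(X_te, y_te), discriminative.score(X_te, y_te))
```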

The Role of the Loss Function

In real-world applications like medical diagnosis, a “False Negative” (missing a disease) is usually much more costly than a “False Positive” (extra testing). By adjusting the loss function $\lambda$, Bayesian Decision Theory allows the model to become “cautious” and favor one class over another to minimize total cost, even if it slightly increases the raw error rate.
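For a two-action version of this example, with zero loss for correct decisions and losses $\lambda_{FN}$ for a miss and $\lambda_{FP}$ for a false alarm (symbols introduced here only for illustration), comparing the two conditional risks shows exactly how the threshold shifts:

$$R(\text{treat}\vert x) = \lambda_{FP}\,P(\text{healthy}\vert x), \qquad R(\text{dismiss}\vert x) = \lambda_{FN}\,P(\text{disease}\vert x)$$

so the risk-minimizing rule is to treat whenever

$$P(\text{disease}\vert x) > \frac{\lambda_{FP}}{\lambda_{FP} + \lambda_{FN}},$$

a threshold that falls well below $0.5$ when $\lambda_{FN} \gg \lambda_{FP}$.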