Definition

Basic Concept

Bayes' theorem: relates conditional probabilities, providing a rule for revising beliefs in light of evidence. Allows calculation of the probability of a hypothesis given observed data.

Historical Context

Formulated by Thomas Bayes and published posthumously in 1763. Later formalized and generalized by Pierre-Simon Laplace. Foundation of Bayesian inference and probabilistic reasoning.

Core Idea

Update a prior belief with new data to obtain a posterior probability. Key for decision making under uncertainty.

Mathematical Formulation

Basic Formula

P(A|B) = (P(B|A) * P(A)) / P(B)

Terms Explanation

  • P(A|B): Posterior probability, probability of A given B
  • P(B|A): Likelihood, probability of B given A
  • P(A): Prior probability of A
  • P(B): Marginal probability of B
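
A minimal Python sketch of the formula, purely as illustration; the input values are arbitrary assumptions chosen to exercise the arithmetic:

    def bayes_posterior(prior_a, likelihood_b_given_a, marginal_b):
        # P(A|B) = P(B|A) * P(A) / P(B)
        return likelihood_b_given_a * prior_a / marginal_b

    # Assumed values: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.5.
    print(bayes_posterior(0.3, 0.8, 0.5))  # prints 0.48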

Alternative Form

P(H|E) = (P(E|H) * P(H)) / Σ_i P(E|H_i) * P(H_i), where the hypotheses H_i are mutually exclusive and exhaustive and H is one of them.

Interpretations

Frequentist Interpretation

Probability as the long-run frequency of an event's occurrence. Bayes' theorem as a tool for relating conditional frequencies.

Bayesian Interpretation

Probability as a degree of belief. Bayes' theorem as a formal method for belief revision.

Decision-Theoretic Interpretation

Supports rational decision making by updating probability estimates with evidence.

Applications

Medical Diagnosis

Calculates disease probability given test results. Improves diagnostic accuracy by incorporating prior disease prevalence.

Machine Learning

Used in Naive Bayes classifiers, spam filters, and probabilistic models.

Forensic Science

Evaluates the strength of evidence and updates the probability of guilt or innocence accordingly.

Risk Assessment

Incorporates new data to refine risk estimates in finance, engineering, and safety.

Examples

Example 1: Disease Testing

Test sensitivity: 99%; specificity: 95%; disease prevalence: 1%. Calculate the probability that a patient has the disease given a positive test.

Parameter                 Value
P(Disease)                0.01
P(Positive|Disease)       0.99
P(Positive|No Disease)    0.05

Calculation:

P(Disease|Positive) = (0.99 * 0.01) / ((0.99 * 0.01) + (0.05 * 0.99)) = 0.0099 / 0.0594 ≈ 0.17. Note that the 0.99 in the denominator's second term is P(No Disease) = 1 - 0.01, not the sensitivity.
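
The same arithmetic as a short Python check, with the values taken from the table above:

    # Values from the table above.
    p_disease = 0.01                # prevalence, P(Disease)
    p_pos_given_disease = 0.99      # sensitivity, P(Positive|Disease)
    p_pos_given_healthy = 0.05      # 1 - specificity, P(Positive|No Disease)

    # Denominator via the law of total probability: P(Positive).
    p_positive = (p_pos_given_disease * p_disease
                  + p_pos_given_healthy * (1 - p_disease))

    # Bayes' theorem.
    print(p_pos_given_disease * p_disease / p_positive)  # ~0.1667

Despite the 99% sensitivity, the posterior is only about 17%: true positives from the rare disease are outnumbered by false positives from the healthy majority.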

Example 2: Spam Filtering

Classify an email as spam based on word-occurrence probabilities and the prior spam rate, as in the sketch below.
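
A minimal naive Bayes sketch of this idea. The word probabilities and prior spam rate below are invented for illustration; a real filter estimates them from labeled training data:

    import math

    # Invented training estimates.
    p_spam = 0.4                                     # prior P(Spam)
    p_word_spam = {"free": 0.30, "meeting": 0.02}    # P(word|Spam)
    p_word_ham = {"free": 0.01, "meeting": 0.20}     # P(word|Ham)

    def spam_posterior(words):
        # Naive Bayes: treat words as conditionally independent,
        # accumulate log-likelihoods, then normalize the two hypotheses.
        log_spam = math.log(p_spam)
        log_ham = math.log(1 - p_spam)
        for w in words:
            log_spam += math.log(p_word_spam[w])
            log_ham += math.log(p_word_ham[w])
        m = max(log_spam, log_ham)
        spam = math.exp(log_spam - m)
        ham = math.exp(log_ham - m)
        return spam / (spam + ham)

    print(spam_posterior(["free"]))     # ~0.95, likely spam
    print(spam_posterior(["meeting"]))  # ~0.06, likely ham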

Derivation

Starting Point

Definition of conditional probability: P(A|B) = P(A ∩ B) / P(B), for P(B) > 0.

Symmetry of Joint Probability

P(A ∩ B) = P(B ∩ A) = P(B|A) * P(A)

Combining Equations

P(A|B) = (P(B|A) * P(A)) / P(B)
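
The derivation can be verified numerically on any joint distribution; the joint table below is an arbitrary assumption:

    # Arbitrary joint distribution over A in {0,1}, B in {0,1}.
    joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

    p_a = joint[(1, 0)] + joint[(1, 1)]   # P(A)
    p_b = joint[(0, 1)] + joint[(1, 1)]   # P(B)
    p_a_given_b = joint[(1, 1)] / p_b     # definition of conditional probability
    p_b_given_a = joint[(1, 1)] / p_a

    # Both sides of Bayes' theorem agree.
    assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12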

Prior and Posterior Probabilities

Prior Probability

Initial belief about event before new data. Expresses baseline uncertainty.

Posterior Probability

Updated belief after accounting for evidence. The main output of Bayes' theorem.

Influence of Evidence

Strength of evidence modulates the difference between prior and posterior: strong evidence produces large updates, weak evidence small ones, as the sketch below shows.
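
A short numeric sketch using the odds form of Bayes' theorem (posterior odds = prior odds * likelihood ratio); the numbers are illustrative assumptions:

    def update(prior, likelihood_ratio):
        # Posterior odds = prior odds * likelihood ratio.
        odds = prior / (1 - prior) * likelihood_ratio
        return odds / (1 + odds)

    prior = 0.2
    print(update(prior, 2))    # weak evidence:   posterior ~0.33
    print(update(prior, 100))  # strong evidence: posterior ~0.96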

Conditional Probability

Definition

Probability of event A given that event B has occurred: P(A|B) = P(A ∩ B) / P(B), for P(B) > 0.

Role in Bayes' Theorem

Bayes' theorem uses conditional probabilities to invert the direction of conditioning, expressing P(A|B) in terms of P(B|A).

Properties

Non-negativity, normalization, and applicability of the chain rule; these are checked numerically below.
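
These properties can be checked on a small joint table (the same arbitrary numbers as in the derivation sketch above):

    # P(A|B=1) derived from an arbitrary joint distribution.
    joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}
    p_b = joint[(0, 1)] + joint[(1, 1)]
    cond = {a: joint[(a, 1)] / p_b for a in (0, 1)}    # P(A=a|B=1)

    assert all(v >= 0 for v in cond.values())          # non-negativity
    assert abs(sum(cond.values()) - 1) < 1e-12         # normalization
    # Chain rule: P(A=1, B=1) = P(B=1) * P(A=1|B=1).
    assert abs(joint[(1, 1)] - p_b * cond[1]) < 1e-12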

Law of Total Probability

Definition

Calculates the marginal probability of an event by summing over a partition of the sample space.

P(B) = Σ_i P(B|A_i) * P(A_i)

Application in Bayes Theorem

The denominator P(B) is computed via the law of total probability, which normalizes the posterior.

Partitioning Events

The events A_i must be mutually exclusive and exhaustive; a numeric sketch follows.
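
A numeric sketch with a three-event partition; the priors and likelihoods are assumed values:

    # Partition A_1, A_2, A_3: mutually exclusive and exhaustive.
    priors = [0.5, 0.3, 0.2]        # P(A_i), sums to 1
    likelihoods = [0.1, 0.4, 0.9]   # P(B|A_i)

    # Law of total probability: P(B) = Σ_i P(B|A_i) * P(A_i).
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    print(p_b)  # 0.35

    # The same quantity normalizes the posteriors P(A_i|B).
    posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]
    print(sum(posteriors))  # 1.0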

Limitations and Assumptions

Requirement of Prior

Quality of prior affects posterior accuracy. Subjective priors can bias results.

Computational Complexity

High-dimensional problems may be intractable without approximations.

Assumption of Known Likelihoods

Requires the likelihood functions to be correctly specified; misspecified likelihoods propagate directly into the posterior.

Extensions and Generalizations

Bayesian Networks

Directed graphical models that combine Bayes' theorem with conditional-independence assumptions to represent complex dependencies; a two-node sketch follows.
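
A minimal two-node sketch (Cause -> Effect) with assumed conditional probability tables, inferring the cause from an observed effect:

    # Two-node network Cause -> Effect; CPT values are assumptions.
    p_cause = 0.1                              # P(Cause = true)
    p_effect_given = {True: 0.8, False: 0.1}   # P(Effect = true | Cause)

    # Observe Effect = true; infer P(Cause = true | Effect = true).
    num = p_effect_given[True] * p_cause
    den = num + p_effect_given[False] * (1 - p_cause)
    print(num / den)  # ~0.47

In larger networks the same computation is organized by the graph structure, so only local conditional tables are needed.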

Hierarchical Bayes

Models with multiple levels of priors and parameters.

Bayesian Updating in Dynamic Systems

Sequential updating algorithms such as Kalman filters and particle filters, which apply Bayes' theorem repeatedly as observations arrive; a conjugate-update sketch follows.
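
A conjugate beta-binomial sketch of sequential updating, the same recursive idea that filtering algorithms apply to dynamic state; the observation stream is invented:

    # Sequentially update a Beta prior on a coin's bias with
    # Bernoulli observations; Beta is conjugate, so each update
    # has closed form: a += successes, b += failures.
    a, b = 1.0, 1.0                 # uniform prior, Beta(1, 1)
    stream = [1, 0, 1, 1, 1, 0, 1]  # invented observations

    for x in stream:
        a, b = a + x, b + (1 - x)   # posterior after this observation
        print(f"posterior mean = {a / (a + b):.3f}")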

Computational Aspects

Exact Computation

Feasible for discrete, low-dimensional problems.

Approximate Methods

Monte Carlo methods, Markov chain Monte Carlo (MCMC), and variational inference; a minimal Metropolis sketch follows.
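
A minimal random-walk Metropolis sketch, drawing samples from a posterior known only up to a normalizing constant; the target (proportional to a Beta(3, 2) density) is an arbitrary choice:

    import random

    def unnorm_posterior(theta):
        # Proportional to Beta(3, 2) on (0, 1); arbitrary example target.
        if not 0 < theta < 1:
            return 0.0
        return theta**2 * (1 - theta)

    theta, samples = 0.5, []
    for _ in range(50_000):
        proposal = theta + random.gauss(0, 0.2)   # random-walk proposal
        # Accept with probability min(1, target ratio).
        if random.random() < unnorm_posterior(proposal) / unnorm_posterior(theta):
            theta = proposal
        samples.append(theta)

    print(sum(samples) / len(samples))  # ~0.6, the Beta(3, 2) mean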

Software Tools

Packages such as BUGS, Stan, PyMC3, and TensorFlow Probability support Bayesian computation.

Method                  Description                            Use Case
MCMC                    Sampling from posterior distribution   Complex models, high dimensions
Variational Inference   Optimization-based approximation       Large-scale data, speed-critical

References

  • Bayes, T., "An Essay towards solving a Problem in the Doctrine of Chances," Philosophical Transactions of the Royal Society of London, vol. 53, 1763, pp. 370–418.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B., "Bayesian Data Analysis," Chapman and Hall/CRC, 3rd ed., 2013, pp. 1–668.
  • Jaynes, E. T., "Probability Theory: The Logic of Science," Cambridge University Press, 2003, pp. 1–726.
  • Koller, D., Friedman, N., "Probabilistic Graphical Models: Principles and Techniques," MIT Press, 2009, pp. 1–1100.
  • Robert, C. P., "The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation," Springer, 2nd ed., 2007, pp. 1–560.