Definition
Basic Concept
Bayes Theorem: relates conditional probabilities to revise beliefs based on evidence. Allows calculation of probability of hypothesis given observed data.
Historical Context
Formulated by Thomas Bayes; published posthumously in 1763. Formalized and generalized by Pierre-Simon Laplace. Foundation for Bayesian inference and probabilistic reasoning.
Core Idea
Update prior information with new data to obtain posterior probability. Key for decision making under uncertainty.
Mathematical Formulation
Basic Formula
P(A|B) = (P(B|A) * P(A)) / P(B)
Terms Explanation
- P(A|B): Posterior probability, probability of A given B
- P(B|A): Likelihood, probability of B given A
- P(A): Prior probability of A
- P(B): Marginal probability of B
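The basic formula can be sketched directly in Python for a binary hypothesis; the function name and the example numbers below are illustrative, not from the source.

```python
def posterior(prior, likelihood, likelihood_complement):
    """Compute P(A|B) via Bayes theorem for a binary hypothesis A.

    prior                 -- P(A)
    likelihood            -- P(B|A)
    likelihood_complement -- P(B|not A)
    """
    # P(B) by the law of total probability over the partition {A, not A}
    marginal = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / marginal

# Illustrative numbers: rare event (1% prior), strong but imperfect evidence
print(posterior(0.01, 0.99, 0.05))  # ≈ 0.1667
```

Note how the marginal P(B) in the denominator is what normalizes the posterior to a valid probability.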
Alternative Form
P(H_j|E) = (P(E|H_j) * P(H_j)) / Σ_i P(E|H_i) * P(H_i)
Interpretations
Frequentist Interpretation
Probability as long-run frequency of event occurrence. Bayes theorem as tool for conditional frequencies.
Bayesian Interpretation
Probability as degree of belief. Bayes theorem as formal method for belief revision.
Decision-Theoretic Interpretation
Supports rational decision making by updating probability estimates with evidence.
Applications
Medical Diagnosis
Calculates disease probability given test results. Improves diagnostic accuracy by integrating prior prevalence.
Machine Learning
Used in Naive Bayes classifiers, spam filters, and probabilistic models.
Forensic Science
Evaluates evidence strength, updates probability of guilt or innocence.
Risk Assessment
Incorporates new data to refine risk estimates in finance, engineering, and safety.
Examples
Example 1: Disease Testing
Test sensitivity: 99%, specificity: 95%, disease prevalence: 1%. Calculate probability patient has disease given positive test.
| Parameter | Value |
|---|---|
| P(Disease) | 0.01 |
| P(Positive|Disease) | 0.99 |
| P(Positive|No Disease) | 0.05 |
Calculation:
P(Disease|Positive) = (0.99 * 0.01) / ((0.99 * 0.01) + (0.05 * 0.99)) ≈ 0.17
Despite the 99% sensitivity, the posterior is only about 17%: the disease is rare, so false positives outnumber true positives.
Example 2: Spam Filtering
Classify email as spam based on word occurrence probabilities and prior spam rate.
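A minimal sketch of this idea, assuming toy word probabilities and a hypothetical prior spam rate (not from a real corpus); the naive Bayes assumption treats word occurrences as independent given the class, and log-probabilities avoid underflow. No smoothing is applied, so words are assumed present in both tables.

```python
import math

# Toy parameters (illustrative only)
P_SPAM = 0.4                                    # prior spam rate
P_WORD_GIVEN_SPAM = {"free": 0.30, "meeting": 0.02}
P_WORD_GIVEN_HAM  = {"free": 0.01, "meeting": 0.20}

def spam_posterior(words):
    """Naive Bayes: P(spam|words), assuming conditional independence of words."""
    log_spam = math.log(P_SPAM)
    log_ham = math.log(1 - P_SPAM)
    for w in words:
        log_spam += math.log(P_WORD_GIVEN_SPAM[w])
        log_ham += math.log(P_WORD_GIVEN_HAM[w])
    # Normalize in log space: P(spam|words) = e^ls / (e^ls + e^lh)
    m = max(log_spam, log_ham)
    num = math.exp(log_spam - m)
    return num / (num + math.exp(log_ham - m))

print(spam_posterior(["free"]))     # "free" is far more likely in spam
print(spam_posterior(["meeting"]))  # "meeting" is far more likely in ham
```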
Derivation
Starting Point
Definition of conditional probability: P(A|B) = P(A ∩ B) / P(B)
Symmetry of Joint Probability
P(A ∩ B) = P(B ∩ A) = P(B|A) * P(A)
Combining Equations
P(A|B) = (P(B|A) * P(A)) / P(B)
Prior and Posterior Probabilities
Prior Probability
Initial belief about event before new data. Expresses baseline uncertainty.
Posterior Probability
Updated belief after accounting for evidence. Main output of Bayes theorem.
Influence of Evidence
Strength of evidence modulates difference between prior and posterior.
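This can be made concrete with the odds form of Bayes theorem (a sketch; the prior and likelihood ratios are illustrative): the likelihood ratio P(E|H) / P(E|¬H) measures evidence strength, and a larger ratio moves the posterior further from the prior.

```python
def posterior(prior, likelihood_ratio):
    """Posterior via the odds form: posterior odds = likelihood ratio * prior odds."""
    odds = likelihood_ratio * prior / (1 - prior)
    return odds / (1 + odds)

prior = 0.01
for lr in (1, 2, 10, 100):  # weak -> strong evidence
    # lr = 1 is uninformative and leaves the prior unchanged
    print(lr, round(posterior(prior, lr), 4))
```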
Conditional Probability
Definition
Probability of event A given event B has occurred: P(A|B) = P(A ∩ B) / P(B).
Role in Bayes Theorem
Bayes theorem uses conditional probabilities to invert the direction of conditioning, obtaining P(A|B) from P(B|A).
Properties
Non-negativity, normalization, chain rule applicability.
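These properties can be checked by brute force on a small sample space; the two-dice example below is illustrative.

```python
from itertools import product

# Sample space: two fair six-sided dice, uniform measure
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (given as a predicate) under the uniform measure."""
    return sum(1 for w in omega if event(w)) / len(omega)

def cond(a, b):
    """Conditional probability P(A|B) = P(A and B) / P(B)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

A = lambda w: w[0] + w[1] == 7   # the sum is 7
B = lambda w: w[0] == 3          # the first die shows 3
print(cond(A, B))                # 1/6: exactly one value of the second die works
# Chain rule check: P(A and B) = P(A|B) * P(B)
print(abs(prob(lambda w: A(w) and B(w)) - cond(A, B) * prob(B)) < 1e-12)
```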
Law of Total Probability
Definition
Calculates marginal probability of event by summing over partitions.
P(B) = Σ_i P(B|A_i) * P(A_i)
Application in Bayes Theorem
Denominator P(B) computed via law of total probability to normalize posterior.
Partitioning Events
Events A_i are mutually exclusive and exhaustive.
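A short sketch of the law in use, assuming a hypothetical three-supplier example: the suppliers partition the sample space, the marginal defect rate is the weighted sum, and dividing by it normalizes the posterior over the partition.

```python
# Partition {A_i}: three mutually exclusive, exhaustive suppliers of a part
priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}          # P(A_i), sums to 1
likelihoods = {"A1": 0.02, "A2": 0.05, "A3": 0.10}  # P(defect | A_i)

# Law of total probability: P(B) = Σ_i P(B|A_i) * P(A_i)
p_b = sum(likelihoods[a] * priors[a] for a in priors)

# Posterior over the partition, normalized by P(B)
posterior = {a: likelihoods[a] * priors[a] / p_b for a in priors}
print(p_b)        # ≈ 0.045
print(posterior)  # sums to 1 by construction
```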
Limitations and Assumptions
Requirement of Prior
Quality of prior affects posterior accuracy. Subjective priors can bias results.
Computational Complexity
High-dimensional problems may be intractable without approximations.
Assumption of Known Likelihoods
Requires likelihood functions to be specified and accurate.
Extensions and Generalizations
Bayesian Networks
Graphical models using Bayes theorem to represent complex dependencies.
Hierarchical Bayes
Models with multiple levels of priors and parameters.
Bayesian Updating in Dynamic Systems
Sequential updating algorithms like Kalman filters, particle filters.
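A one-dimensional Kalman filter illustrates sequential Bayesian updating (a minimal sketch, assuming a random-walk state and Gaussian noise; the noise parameters and readings are illustrative): at each step the predicted Gaussian acts as the prior, the measurement likelihood is Gaussian, and the filtered Gaussian is the posterior.

```python
def kalman_1d(zs, x0=0.0, p0=100.0, q=0.01, r=0.5):
    """Filter noisy scalar measurements zs of a random-walk state.

    x0, p0 -- initial mean and (diffuse) variance of the state prior
    q, r   -- process noise and measurement noise variances
    """
    x, p = x0, p0
    estimates = []
    for z in zs:
        # Predict: the state persists, uncertainty grows by process noise q
        p += q
        # Update: the Kalman gain weighs prior confidence against measurement noise
        k = p / (p + r)
        x += k * (z - x)
        p *= (1 - k)
        estimates.append(x)
    return estimates

# Noisy readings of a constant true value 5.0 (illustrative)
est = kalman_1d([5.2, 4.8, 5.1, 4.9, 5.0])
print(est[-1])  # converges close to 5.0
```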
Computational Aspects
Exact Computation
Feasible with discrete, low-dimensional problems.
Approximate Methods
Monte Carlo methods, Markov Chain Monte Carlo (MCMC), Variational Inference.
Software Tools
Packages such as BUGS, Stan, PyMC (formerly PyMC3), and TensorFlow Probability support Bayesian computation.
| Method | Description | Use Case |
|---|---|---|
| MCMC | Sampling from posterior distribution | Complex models, high dimensions |
| Variational Inference | Optimization-based approximation | Large-scale data, speed-critical |
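A minimal Metropolis sketch shows the MCMC idea on a conjugate toy problem where the answer is known analytically; the model (beta-binomial with a uniform prior), step size, and data are all illustrative assumptions.

```python
import math
import random

def metropolis_beta_binomial(k, n, steps=50_000, seed=0):
    """Metropolis sampler for P(theta | k successes in n trials), uniform prior.

    The true posterior is Beta(k+1, n-k+1), so the sample mean
    should approach (k+1)/(n+2).
    """
    rng = random.Random(seed)

    def log_post(theta):
        # Un-normalized log posterior: theta^k * (1 - theta)^(n - k)
        if not 0.0 < theta < 1.0:
            return float("-inf")
        return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

    theta, samples = 0.5, []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, 0.1)   # symmetric random-walk proposal
        # Accept with probability min(1, posterior ratio)
        if math.log(rng.random()) < log_post(proposal) - log_post(theta):
            theta = proposal
        samples.append(theta)
    return samples[steps // 5:]                   # drop the first 20% as burn-in

samples = metropolis_beta_binomial(k=14, n=20)
print(sum(samples) / len(samples))  # should approach 15/22 ≈ 0.682
```

Only posterior ratios appear in the acceptance step, which is why the intractable normalizing constant P(B) never needs to be computed.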
References
- Bayes, T., "An Essay towards solving a Problem in the Doctrine of Chances," Philosophical Transactions of the Royal Society of London, vol. 53, 1763, pp. 370–418.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B., "Bayesian Data Analysis," Chapman and Hall/CRC, 3rd ed., 2013, pp. 1–668.
- Jaynes, E. T., "Probability Theory: The Logic of Science," Cambridge University Press, 2003, pp. 1–726.
- Koller, D., Friedman, N., "Probabilistic Graphical Models: Principles and Techniques," MIT Press, 2009, pp. 1–1100.
- Robert, C. P., "The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation," Springer, 2nd ed., 2007, pp. 1–560.