Introduction
Real-world reasoning rarely operates with certainty. Sensors are noisy, information incomplete, expert knowledge imprecise. Uncertainty reasoning addresses how intelligent systems make decisions and draw conclusions despite incomplete, imperfect information.
Expert systems must handle uncertainty to be practical. A medical diagnosis system might conclude "diabetes" with confidence 0.8 and "thyroid disease" with confidence 0.3, recommending further testing rather than definitive diagnosis. Pure logical systems (true/false) cannot capture this.
Multiple approaches model uncertainty: probability theory, confidence factors, fuzzy logic, belief functions. Each has strengths and limitations. Modern systems often combine approaches: Bayesian networks for probabilistic reasoning, fuzzy logic for vague concepts, confidence factors for rule-based reasoning.
"Dealing with uncertainty is not a weakness of intelligent systems; it's a necessity. The world is uncertain, and systems must reason about it effectively." -- Judea Pearl, UCLA
Sources of Uncertainty
Incomplete Information
Missing data. Example: patient hasn't taken specific lab test yet. System must reason with available information, adjust conclusions as more data arrives.
Imperfect Observations
Sensor noise, measurement error. Temperature sensor reads 39.2°C but actual temperature uncertain (±0.5°C). Must account for measurement uncertainty.
Probabilistic Events
Event outcome genuinely random or unpredictable. Patient recovers with unknown probability. System models via probability distributions.
Conflicting Evidence
Multiple sources provide contradictory information. One expert says "infection likely," another says "unlikely." System must weigh evidence, combine conclusions.
Imprecise Terminology
Vague concepts without crisp boundaries. "Fever" is fuzzy (continuous spectrum, not binary). Fuzzy logic handles this; Bayesian approaches discretize (fever present/absent).
Approximations and Assumptions
Models simplify reality. Rule "if A then B" assumes unchanging relationship, ignores exceptions. Confidence factors model residual uncertainty.
Confidence Factors and Certainty
Concept
Associate each fact with confidence factor (certainty measure): number between -1 and 1, or 0 and 1. High value = high confidence; low value = low confidence.
```
Fact: patient_has_disease   Confidence: 0.75  (fairly sure, not certain)
Fact: patient_recovered     Confidence: 0.2   (doubtful, weak evidence)
```
Combining Confidence
AND operation: product of confidences. Both conjuncts must be confident.
```
CF(A AND B) = CF(A) * CF(B)
CF(fever=0.8 AND infection=0.9) = 0.8 * 0.9 = 0.72
```
OR operation: maximum of confidences (at least one must hold).
```
CF(A OR B) = max(CF(A), CF(B))
CF(fever=0.8 OR headache=0.6) = 0.8
```
Rule Chaining
```
Rule: IF symptoms=fever THEN disease=infection  (CF=0.9)
Fact: symptoms=fever  (CF=0.8)
Conclusion: disease=infection with CF = 0.8 * 0.9 = 0.72
```
Confidence propagates through the rule chain; compound uncertainty accumulates.
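The combination and chaining rules above can be sketched in a few lines of Python; the function names are illustrative, not from any particular expert-system shell.

```python
# Confidence-factor combination rules, as described in the text.
def cf_and(*cfs):
    """AND: product of confidences (all conjuncts must be confident)."""
    result = 1.0
    for c in cfs:
        result *= c
    return result

def cf_or(*cfs):
    """OR: maximum of confidences (at least one must hold)."""
    return max(cfs)

def cf_chain(fact_cf, rule_cf):
    """Rule chaining: attenuate the rule's CF by the fact's CF."""
    return fact_cf * rule_cf

# Worked examples from the text:
print(round(cf_and(0.8, 0.9), 2))    # fever AND infection -> 0.72
print(cf_or(0.8, 0.6))               # fever OR headache   -> 0.8
print(round(cf_chain(0.8, 0.9), 2))  # rule chaining       -> 0.72
```

Note how chaining two steps with CF 0.8 and 0.9 already drops the conclusion to 0.72; long chains attenuate quickly.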
Advantages and Limitations
Advantages: Simple, intuitive, requires no probability distributions, efficient computation.
Limitations: No formal justification; CF combinations ad-hoc; cannot model dependencies; lacks flexibility for complex scenarios.
Probability Theory Foundations
Basic Concepts
Probability P(A): Number between 0 and 1; P(A)=0 means impossible, P(A)=1 means certain.
```
P(patient_has_infection) = 0.7   (70% chance patient has infection)
```
Joint Probability
```
P(A, B) = probability that both A and B are true
P(fever, infection) = P(fever) * P(infection | fever)
```
Conditional Probability
P(A|B): probability of A given that B is known to be true.
```
P(infection | fever) = 0.8   (80% of patients with fever have infection)
```
Bayes' Theorem
```
P(A|B) = P(B|A) * P(A) / P(B)
```
Enables reasoning backward, from effects to causes:
```
P(infection | fever) = P(fever | infection) * P(infection) / P(fever)
```
Law of Total Probability
```
P(B) = sum over all A: P(B|A) * P(A)
```
Marginalizes over all possible states of A.
Bayesian Networks
Structure
Directed acyclic graph representing probabilistic dependencies. Nodes = variables; edges = conditional dependencies. Encodes joint probability distribution compactly.
Example: medical diagnosis
```
Disease  -> Symptom1
Disease  -> Symptom2
Symptom1 -> Test_Result
```
Disease is the parent; symptoms depend on the disease, and a symptom causes the test result.
Conditional Probability Tables (CPT)
Each node has CPT specifying P(node | parents). For root nodes (no parents), table is prior probability.
```
P(infection):
  P(infection=true)  = 0.01   (1% prior probability)
  P(infection=false) = 0.99

P(fever | infection):
  P(fever | infection=true)  = 0.9   (90% with infection)
  P(fever | infection=false) = 0.1   (10% without infection)
```
Inference
Given evidence (observed variables), compute probability of unobserved variables. Example: given fever observed, probability of infection?
```
P(fever) = P(fever | infection) * P(infection)
         + P(fever | no infection) * P(no infection)
         = 0.9 * 0.01 + 0.1 * 0.99 = 0.108

P(infection | fever) = P(fever | infection) * P(infection) / P(fever)
                     = 0.9 * 0.01 / 0.108 ≈ 0.083
```
Even a highly indicative symptom yields a small posterior here, because the 1% prior dominates.
Exact Inference Algorithms
- Variable Elimination: Sum out variables in order, reducing computation
- Belief Propagation: Message passing in trees for efficient inference
- Junction Tree Algorithm: Converts general graphs to trees, applies belief propagation
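For a network this small, exact inference reduces to enumeration. The sketch below, a minimal Python version using the CPT values given earlier, computes P(infection | fever=true) by summing joint probabilities and normalizing.

```python
# Inference by enumeration on the two-node network infection -> fever,
# using the CPT values from the text.
P_infection = {True: 0.01, False: 0.99}   # prior P(infection)
P_fever_given = {True: 0.9, False: 0.1}   # P(fever=true | infection)

def posterior_infection(fever_observed=True):
    """P(infection=true | fever) via enumeration and normalization."""
    joint = {}
    for inf in (True, False):
        likelihood = P_fever_given[inf] if fever_observed else 1 - P_fever_given[inf]
        joint[inf] = P_infection[inf] * likelihood
    z = sum(joint.values())               # P(fever) by total probability
    return joint[True] / z

print(round(posterior_infection(), 3))    # ≈ 0.083
```

This reproduces the worked calculation above: the normalizer is P(fever) = 0.108, giving a posterior of roughly 0.083.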
Approximate Inference
Exact inference intractable for large networks. Approximate methods:
- Markov Chain Monte Carlo (MCMC): Sample from posterior distribution, estimate probabilities from samples
- Gibbs Sampling: MCMC variant; iteratively sample each variable
- Variational Inference: Approximate posterior with simpler distribution
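A minimal Gibbs sampler can be sketched by extending the medical example with an assumed second symptom, fatigue, whose CPT values below are illustrative rather than taken from the text. With fever observed, the chain alternately resamples disease and fatigue from their conditionals; the fraction of post-burn-in samples with disease=true estimates the posterior.

```python
import random
random.seed(0)

# Network: disease -> fever, disease -> fatigue; fever=true is observed.
# CPT values are illustrative assumptions.
P_D = 0.01                            # prior P(disease=true)
P_fever = {True: 0.9, False: 0.1}     # P(fever=true | disease)
P_fatigue = {True: 0.7, False: 0.2}   # P(fatigue=true | disease)

def sample_disease(fatigue):
    # P(disease | fever=true, fatigue) ∝ P(disease) * P(fever|disease) * P(fatigue|disease)
    w_true = P_D * P_fever[True] * (P_fatigue[True] if fatigue else 1 - P_fatigue[True])
    w_false = (1 - P_D) * P_fever[False] * (P_fatigue[False] if fatigue else 1 - P_fatigue[False])
    return random.random() < w_true / (w_true + w_false)

def sample_fatigue(disease):
    return random.random() < P_fatigue[disease]

def gibbs(n=20000, burn=1000):
    disease, fatigue, hits = False, False, 0
    for i in range(n):
        disease = sample_disease(fatigue)   # resample each unobserved variable
        fatigue = sample_fatigue(disease)   # given the current state of the others
        if i >= burn and disease:
            hits += 1
    return hits / (n - burn)

print(round(gibbs(), 3))   # estimate of P(disease | fever=true), exact value ≈ 0.083
```

Because fatigue is an unobserved child, it marginalizes out, so the estimate should approach the exact posterior 0.009/0.108 ≈ 0.083 as the chain runs longer.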
Dempster-Shafer Theory
Distinction from Probability
Represents uncertainty via belief and plausibility functions instead of probability. Distinguishes between lack of belief and disbelief.
```
Probability: P(A) = 0.5                  (could mean no knowledge or perfectly balanced evidence)
Belief:      Bel(A) = 0.3, Pl(A) = 0.8   (some evidence for A, but much uncertainty)
```
The interval [0.3, 0.8] captures the evidential uncertainty.
Basic Concepts
Mass Function m(A): Probability mass assigned to set A. Not probability of A, but degree of support for A.
```
m({infection})                = 0.3   (30% of evidence supports infection)
m({not_infection})            = 0.2   (20% supports non-infection)
m({infection, not_infection}) = 0.5   (50% is ambiguous, supports either)
```
Belief: Bel(A) = sum of m(B) over all subsets B of A
Plausibility: Pl(A) = 1 - Bel(NOT A)
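These two definitions can be sketched directly in Python, using the example masses above with focal sets represented as frozensets:

```python
# Belief and plausibility from a mass function over the frame
# {infection, not_infection}, using the example masses from the text.
m = {
    frozenset({"infection"}): 0.3,
    frozenset({"not_infection"}): 0.2,
    frozenset({"infection", "not_infection"}): 0.5,
}

def bel(A, m):
    """Bel(A): total mass of focal sets entirely contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def pl(A, m):
    """Pl(A): total mass of focal sets that intersect A (equals 1 - Bel(not A))."""
    return sum(v for B, v in m.items() if B & A)

A = frozenset({"infection"})
print(round(bel(A, m), 2), round(pl(A, m), 2))  # 0.3 0.8
```

This recovers the interval [0.3, 0.8] from the earlier example: 0.3 of the mass definitely supports infection, and a further 0.5 is consistent with it.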
Combining Evidence
Dempster's combination rule merges evidence from multiple sources. Resolves conflicts by renormalizing.
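A minimal sketch of Dempster's rule in Python, with mass functions over frozenset focal elements (the helper name is illustrative):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions by Dempster's rule, renormalizing conflict."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b   # mass assigned to the empty set
    # Renormalize by 1 - K; assumes the sources are not in total conflict.
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

I = frozenset({"infection"})
N = frozenset({"not_infection"})
m1 = {I: 0.8, N: 0.2}   # Expert 1
m2 = {I: 0.6, N: 0.4}   # Expert 2
result = dempster_combine(m1, m2)
print(round(result[I], 2), round(result[N], 2))  # 0.86 0.14
```

The conflict mass K = 0.8*0.4 + 0.2*0.6 = 0.44 is discarded and the rest renormalized, which is why both experts leaning toward infection pushes the combined mass above either individual estimate.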
```
Expert 1: m1(infection) = 0.8, m1(not_infection) = 0.2
Expert 2: m2(infection) = 0.6, m2(not_infection) = 0.4

Conflict: K = m1(infection)*m2(not_infection) + m1(not_infection)*m2(infection)
            = 0.8*0.4 + 0.2*0.6 = 0.44

Combined: m(infection)     = 0.8*0.6 / (1 - K) = 0.48 / 0.56 ≈ 0.86
          m(not_infection) = 0.2*0.4 / (1 - K) = 0.08 / 0.56 ≈ 0.14
```
Both experts lean toward infection, so the combined mass for infection rises above either individual estimate.
Advantages/Disadvantages
Advantages: Models ignorance/uncertainty distinction better than probability, handles conflicting evidence, generalizes probability theory.
Disadvantages: Complex math, computationally expensive for many hypotheses, less intuitive than probability, less research/tools available.
Belief Propagation and Inference
Message Passing
In tree-structured networks, send messages (conditional probabilities) between nodes. Each node combines received messages to compute posterior probability.
```
Message from parent to child: conditional probabilities given parent values
Message from child to parent: likelihood of observing the child's evidence
Node combines all messages:   P(node | all_evidence) ∝ P(evidence_below) * P(evidence_above)
```
Sum-Product Algorithm
General algorithm for exact inference on trees. Messages computed as sum and product operations; combined at nodes to compute marginal probabilities.
Loopy Belief Propagation
Extends message passing to networks with cycles. Messages are iteratively updated; the algorithm often converges to good approximations of the true posteriors, though convergence is not guaranteed. Useful for many realistic networks.
Approximate Reasoning Methods
Stochastic Simulation
Sample from posterior distribution. Each sample represents hypothesis; distribution of samples approximates true posterior.
Forward Sampling
Sample variables in topological order. Forward sample reflects prior distribution; weight samples by evidence likelihood to approximate posterior.
Likelihood Weighting
Forward sample with evidence variables fixed to observed values. Weight each sample by likelihood of evidence given sample values.
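The procedure can be sketched on the infection/fever network from earlier: infection is forward-sampled from its prior, fever is clamped to the observed value, and each sample is weighted by the evidence likelihood.

```python
import random
random.seed(0)

# Likelihood weighting on the network infection -> fever,
# with fever=true fixed as evidence.
P_infection = 0.01                  # prior P(infection=true)
P_fever = {True: 0.9, False: 0.1}   # P(fever=true | infection)

def likelihood_weighting(n=100000):
    num = den = 0.0
    for _ in range(n):
        infection = random.random() < P_infection   # forward-sample the prior
        w = P_fever[infection]                      # weight by P(evidence | sample)
        den += w
        if infection:
            num += w
    return num / den   # weighted estimate of P(infection | fever=true)

print(round(likelihood_weighting(), 3))  # ≈ 0.083
```

Because infection is rare under the prior, the estimate is noisy at small sample sizes; this is the usual weakness of likelihood weighting when evidence is strongly at odds with the prior.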
Markov Chain Monte Carlo
Construct Markov chain whose stationary distribution is posterior. Sample from chain; samples approximate posterior. Converges to true distribution asymptotically.
Handling Incomplete and Missing Data
Strategies
- Imputation: Fill missing values with estimates (mean, median, or learned values)
- Expectation-Maximization (EM): Iteratively estimate missing values, learn parameters
- Marginalization: Sum over possible values of missing variables
- Default Values: Assume default state unless evidence suggests otherwise
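The first strategy, imputation, is the simplest to sketch; here missing readings (represented as None) are filled with the mean of the observed values:

```python
# Mean imputation: replace missing values (None) with the observed mean.
def impute_mean(values):
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Temperature readings with two missing entries; observed mean is 38.0.
print(impute_mean([37.0, None, 39.0, 38.0, None]))
# [37.0, 38.0, 39.0, 38.0, 38.0]
```

Mean imputation preserves the sample mean but shrinks the variance, which is one reason the model-based alternatives (EM, marginalization) are often preferred.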
EM Algorithm
Iterative algorithm for learning with missing data:
```
E-step: Estimate missing data using the current model
M-step: Update model parameters using the completed data (including estimates)
Repeat until convergence
```
The likelihood never decreases between iterations; the algorithm converges to a local maximum.
Applications in Expert Systems
Medical Diagnosis
Symptom → diagnosis reasoning under uncertainty. Patient symptoms are imperfect indicators; multiple diseases explain same symptoms. Bayesian networks model symptom-disease relationships; inference computes posterior disease probabilities.
Troubleshooting and Fault Diagnosis
Observable faults may have multiple causes; multiple symptoms observed. Inference determines most likely cause(s) for observed symptoms.
Financial Risk Assessment
Loan approval: multiple factors (credit score, income, debt ratio) predict default probability. Models incorporate uncertainty in feature values and prediction relationships.
Sensor Data Fusion
Combine noisy sensor readings from multiple sources. Bayesian fusion accounts for sensor reliability, correlations, computes best estimate of true state.
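For independent Gaussian sensors, Bayesian fusion reduces to inverse-variance weighting; the sketch below uses illustrative temperature readings and variances.

```python
# Bayesian fusion of independent Gaussian sensor readings via
# inverse-variance weighting. Readings and variances are illustrative.
def fuse(readings):
    """readings: list of (value, variance) pairs -> (fused estimate, fused variance)."""
    precision = sum(1.0 / var for _, var in readings)   # precisions add
    estimate = sum(val / var for val, var in readings) / precision
    return estimate, 1.0 / precision

# Two equally reliable thermometers (sigma = 0.5 C, variance 0.25):
est, var = fuse([(39.2, 0.25), (38.8, 0.25)])
print(round(est, 2), round(var, 3))  # 39.0 0.125
```

Note that the fused variance (0.125) is smaller than either sensor's variance (0.25): combining sources reduces uncertainty. An unreliable sensor (large variance) contributes proportionally less to the estimate.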
Comparison of Approaches
| Approach | Strength | Weakness |
|---|---|---|
| Confidence Factors | Simple, fast, intuitive | Ad-hoc, no formal justification, inflexible |
| Probability Theory | Well-founded, flexible, powerful | Requires probability estimates, computationally expensive |
| Fuzzy Logic | Models vagueness, interpretable | Not formal probability, limited learning |
| Dempster-Shafer | Models ignorance, combines conflicting evidence | Complex, less intuitive, computationally expensive |
No single best approach; choice depends on problem domain, available knowledge, required transparency.
References
- Pearl, J. "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference." Morgan Kaufmann, 1988.
- Shafer, G. "A Mathematical Theory of Evidence." Princeton University Press, 1976.
- Russell, S. J., and Norvig, P. "Artificial Intelligence: A Modern Approach." Prentice Hall, 3rd edition, 2009.
- Dempster, A. P. "Upper and Lower Probabilities Induced by a Multivalued Mapping." Annals of Mathematical Statistics, vol. 38, 1967, pp. 325-339.
- Giarratano, J., and Riley, G. "Expert Systems: Principles and Programming." Course Technology, 4th edition, 2004.