Introduction

Real-world reasoning rarely operates with certainty. Sensors are noisy, information incomplete, expert knowledge imprecise. Uncertainty reasoning addresses how intelligent systems make decisions and draw conclusions despite incomplete, imperfect information.

Expert systems must handle uncertainty to be practical. A medical diagnosis system might conclude "diabetes" with confidence 0.8 and "thyroid disease" with confidence 0.3, recommending further testing rather than definitive diagnosis. Pure logical systems (true/false) cannot capture this.

Multiple approaches model uncertainty: probability theory, confidence factors, fuzzy logic, belief functions. Each has strengths and limitations. Modern systems often combine approaches: Bayesian networks for probabilistic reasoning, fuzzy logic for vague concepts, confidence factors for rule-based reasoning.

"Dealing with uncertainty is not a weakness of intelligent systems, it's a necessity. The world is uncertain, and systems must reason about it effectively." -- Judea Pearl, UCLA

Sources of Uncertainty

Incomplete Information

Missing data. Example: patient hasn't taken specific lab test yet. System must reason with available information, adjust conclusions as more data arrives.

Imperfect Observations

Sensor noise, measurement error. Temperature sensor reads 39.2°C but actual temperature uncertain (±0.5°C). Must account for measurement uncertainty.

Probabilistic Events

Event outcome genuinely random or unpredictable. Patient recovers with unknown probability. System models via probability distributions.

Conflicting Evidence

Multiple sources provide contradictory information. One expert says "infection likely," another says "unlikely." System must weigh evidence, combine conclusions.

Imprecise Terminology

Vague concepts without crisp boundaries. "Fever" is fuzzy (continuous spectrum, not binary). Fuzzy logic handles this; Bayesian approaches discretize (fever present/absent).
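The fuzzy view of "fever" can be sketched as a membership function that maps temperature to a degree of fever-ness in [0, 1]. The breakpoints below (37.0°C and 38.5°C) are illustrative choices, not clinical thresholds:

```python
def fever_membership(temp_c: float) -> float:
    """Degree to which a temperature counts as 'fever', in [0, 1].

    Below 37.0 C: not fever at all; above 38.5 C: fully fever;
    in between: a linear ramp. Breakpoints are illustrative only.
    """
    if temp_c <= 37.0:
        return 0.0
    if temp_c >= 38.5:
        return 1.0
    return (temp_c - 37.0) / (38.5 - 37.0)

print(fever_membership(36.5))   # 0.0 (clearly no fever)
print(fever_membership(37.75))  # 0.5 (borderline)
print(fever_membership(39.0))   # 1.0 (clearly fever)
```

A Bayesian treatment would instead discretize: fever present if membership crosses some cutoff.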

Approximations and Assumptions

Models simplify reality. Rule "if A then B" assumes unchanging relationship, ignores exceptions. Confidence factors model residual uncertainty.

Confidence Factors and Certainty

Concept

Associate each fact with a confidence factor (certainty measure): a number between -1 and 1, or between 0 and 1. A high value means high confidence; a low value means low confidence.

Fact: patient_has_disease    Confidence: 0.75 (fairly sure, not certain)
Fact: patient_recovered      Confidence: 0.2 (doubtful, weak evidence)

Combining Confidence

AND operation: Product of confidences. Both must be confident.

CF(A AND B) = CF(A) * CF(B)
CF(fever=0.8 AND infection=0.9) = 0.8 * 0.9 = 0.72

OR operation: Maximum of confidences (at least one true).

CF(A OR B) = max(CF(A), CF(B))
CF(fever=0.8 OR headache=0.6) = 0.8

Rule Chaining

Rule: IF symptoms=fever THEN disease=infection (CF=0.9)
Fact: symptoms=fever (CF=0.8)
Conclusion: disease=infection with CF = 0.8 * 0.9 = 0.72

Confidence propagates through rule chain; compound uncertainty accumulates.
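The combination and chaining rules above can be sketched in a few lines; this is a minimal illustration of the scheme described here (product for AND, maximum for OR), not of any particular shell's semantics:

```python
def cf_and(*cfs):
    """AND: product of confidences -- every conjunct must be confident."""
    result = 1.0
    for cf in cfs:
        result *= cf
    return result

def cf_or(*cfs):
    """OR: maximum confidence -- at least one disjunct suffices."""
    return max(cfs)

def cf_chain(premise_cf, rule_cf):
    """Rule chaining: attenuate the rule's CF by the premise's CF."""
    return premise_cf * rule_cf

fever, infection, headache = 0.8, 0.9, 0.6
print(round(cf_and(fever, infection), 2))  # 0.72
print(cf_or(fever, headache))              # 0.8
print(round(cf_chain(0.8, 0.9), 2))        # 0.72 -- fever (0.8) through rule (0.9)
```

Note how chaining two already-uncertain steps drives confidence down: this is the compound uncertainty accumulating.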

Advantages and Limitations

Advantages: Simple, intuitive, requires no probability distributions, efficient computation.

Limitations: No formal justification; CF combinations ad-hoc; cannot model dependencies; lacks flexibility for complex scenarios.

Probability Theory Foundations

Basic Concepts

Probability P(A): Number between 0 and 1; P(A)=0 means impossible, P(A)=1 means certain.

P(patient_has_infection) = 0.7
(70% chance the patient has an infection)

Joint Probability

P(A, B) = probability that both A and B are true
P(fever, infection) = P(fever) * P(infection | fever)

Conditional Probability

P(A|B) = probability of A given B known true.

P(infection | fever) = 0.8
(80% of patients with fever have infection)

Bayes' Theorem

P(A|B) = P(B|A) * P(A) / P(B)
Enables reasoning backward, from effects to causes:
P(infection | fever) = P(fever | infection) * P(infection) / P(fever)

Law of Total Probability

P(B) = sum over all A of P(B|A) * P(A)
Marginalizes over all possible states of A.

Bayesian Networks

Structure

Directed acyclic graph representing probabilistic dependencies. Nodes = variables; edges = conditional dependencies. Encodes joint probability distribution compactly.

Example: medical diagnosis

Disease -> Symptom1
Disease -> Symptom2
Symptom1 -> Test_Result

Disease is the parent; symptoms depend on the disease, and symptoms cause test results.

Conditional Probability Tables (CPT)

Each node has CPT specifying P(node | parents). For root nodes (no parents), table is prior probability.

P(infection):
  P(infection=true) = 0.01 (1% prior probability)
  P(infection=false) = 0.99

P(fever | infection):
  P(fever | infection=true) = 0.9 (90% with infection)
  P(fever | infection=false) = 0.1 (10% without infection)

Inference

Given evidence (observed variables), compute probability of unobserved variables. Example: given fever observed, probability of infection?

P(infection | fever_observed) = P(fever | infection) * P(infection) / P(fever)
P(fever) = 0.9 * 0.01 + 0.1 * 0.99 = 0.108
P(infection | fever_observed) = 0.9 * 0.01 / 0.108 ≈ 0.083

Observing fever raises the prior of 1% to roughly 8%; the posterior stays modest because fever is common even without infection.
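This posterior can be checked numerically; the helper below applies Bayes' theorem with P(evidence) expanded by the law of total probability (note P(fever) = 0.9 * 0.01 + 0.1 * 0.99 = 0.108):

```python
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    """Bayes' theorem with P(evidence) expanded by total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    return p_e_given_h * p_h / p_e

# CPT values from the section above:
# P(infection)=0.01, P(fever|infection)=0.9, P(fever|no infection)=0.1
p = posterior(0.9, 0.01, 0.1)
print(round(p, 3))  # 0.083 -- fever raises a 1% prior to about 8%
```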

Exact Inference Algorithms

  • Variable Elimination: Sum out variables in order, reducing computation
  • Belief Propagation: Message passing in trees for efficient inference
  • Junction Tree Algorithm: Converts general graphs to trees, applies belief propagation

Approximate Inference

Exact inference intractable for large networks. Approximate methods:

  • Markov Chain Monte Carlo (MCMC): Sample from posterior distribution, estimate probabilities from samples
  • Gibbs Sampling: MCMC variant; iteratively sample each variable
  • Variational Inference: Approximate posterior with simpler distribution

Dempster-Shafer Theory

Distinction from Probability

Represents uncertainty via belief and plausibility functions instead of probability. Distinguishes between lack of belief and disbelief.

Probability: P(A) = 0.5 (could mean no knowledge or perfectly balanced evidence)
Belief: Bel(A) = 0.3, Pl(A) = 0.8 (some evidence for A, but much uncertainty)
The interval [0.3, 0.8] captures the evidential uncertainty.

Basic Concepts

Mass Function m(A): Probability mass assigned to set A. Not probability of A, but degree of support for A.

m(infection) = 0.3 (30% of evidence supports infection)
m(not_infection) = 0.2 (20% supports non-infection)
m({infection, not_infection}) = 0.5 (50% is ambiguous, supports either)

Belief: Bel(A) = sum of m(B) over all subsets B ⊆ A

Plausibility: Pl(A) = 1 - Bel(NOT A)
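These two definitions can be computed directly from a mass function; the sketch below uses the mass assignment from the infection example, with focal sets represented as frozensets:

```python
FRAME = frozenset({"infection", "not_infection"})

# Mass function: degree of support for each focal set (masses sum to 1)
m = {
    frozenset({"infection"}): 0.3,
    frozenset({"not_infection"}): 0.2,
    FRAME: 0.5,  # ambiguous evidence, supports either hypothesis
}

def belief(m, a):
    """Bel(A): total mass of all focal sets contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Pl(A) = 1 - Bel(complement of A): mass not committed against A."""
    return 1.0 - belief(m, FRAME - a)

A = frozenset({"infection"})
print(belief(m, A), plausibility(m, A))  # 0.3 0.8 -> interval [0.3, 0.8]
```

The ambiguous mass on the whole frame is what widens the [Bel, Pl] interval; a probability distribution would be forced to split it.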

Combining Evidence

Dempster's combination rule merges evidence from multiple sources. Resolves conflicts by renormalizing.

Expert 1: m(infection) = 0.8, m(not_infection) = 0.2
Expert 2: m(infection) = 0.6, m(not_infection) = 0.4
Conflict: K = 0.8 * 0.4 + 0.2 * 0.6 = 0.44
Combined: m(infection) = 0.8 * 0.6 / (1 - K) = 0.48 / 0.56 ≈ 0.86
Result: higher certainty for infection (both experts agree)
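A minimal implementation of Dempster's rule for this two-expert example: intersect focal sets pairwise, collect the mass landing on the empty set as conflict, and renormalize by 1 - K:

```python
def combine(m1, m2):
    """Dempster's rule: intersect focal sets, drop conflict, renormalize."""
    combined = {}
    conflict = 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb  # mass on contradictory combinations
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are irreconcilable")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

INF, NOT = frozenset({"infection"}), frozenset({"not_infection"})
expert1 = {INF: 0.8, NOT: 0.2}
expert2 = {INF: 0.6, NOT: 0.4}
result = combine(expert1, expert2)
print(round(result[INF], 3))  # 0.857 -- agreement sharpens the belief
```

Note the nested loop: cost grows with the number of focal sets per source, which is why the approach becomes expensive for large hypothesis spaces.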

Advantages/Disadvantages

Advantages: Models ignorance/uncertainty distinction better than probability, handles conflicting evidence, generalizes probability theory.

Disadvantages: Complex math, computationally expensive for many hypotheses, less intuitive than probability, less research/tools available.

Belief Propagation and Inference

Message Passing

In tree-structured networks, send messages (conditional probabilities) between nodes. Each node combines received messages to compute posterior probability.

Message from parent to child: conditional probabilities given parent values.
Message from child to parent: likelihood of observing the child's evidence.
Each node combines its messages: P(node | all_evidence) ∝ P(evidence_below | node) * P(node | evidence_above)

Sum-Product Algorithm

General algorithm for exact inference on trees. Messages computed as sum and product operations; combined at nodes to compute marginal probabilities.

Loopy Belief Propagation

Extends message passing to networks with cycles. Messages are iteratively updated; convergence is not guaranteed, but when the algorithm does converge it often approximates the true posteriors well. Useful for many realistic networks.

Approximate Reasoning Methods

Stochastic Simulation

Sample from posterior distribution. Each sample represents hypothesis; distribution of samples approximates true posterior.

Forward Sampling

Sample variables in topological order. Forward sample reflects prior distribution; weight samples by evidence likelihood to approximate posterior.

Likelihood Weighting

Forward sample with evidence variables fixed to observed values. Weight each sample by likelihood of evidence given sample values.
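As a sketch, likelihood weighting on the two-node Infection -> Fever network from earlier: the evidence (fever = true) is clamped, the parent is forward-sampled from its prior, and each sample is weighted by the likelihood of the evidence given that sample. The sample count and seed are arbitrary choices:

```python
import random

# Tiny network: Infection -> Fever, with the CPTs from earlier sections
P_INFECTION = 0.01
P_FEVER = {True: 0.9, False: 0.1}  # P(fever | infection)

def estimate_p_infection_given_fever(n_samples=200_000, seed=0):
    """Likelihood weighting: fix the evidence (fever=true), sample the
    rest from the prior, weight each sample by the evidence likelihood."""
    rng = random.Random(seed)
    weights = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        infection = rng.random() < P_INFECTION    # forward-sample the parent
        weights[infection] += P_FEVER[infection]  # weight by P(fever | sample)
    return weights[True] / (weights[True] + weights[False])

print(round(estimate_p_infection_given_fever(), 2))  # close to the exact 0.083
```

On a network this small, exact inference is trivial; the payoff comes when the network has too many variables to enumerate.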

Markov Chain Monte Carlo

Construct Markov chain whose stationary distribution is posterior. Sample from chain; samples approximate posterior. Converges to true distribution asymptotically.

Handling Incomplete and Missing Data

Strategies

  • Imputation: Fill missing values with estimates (mean, median, or learned values)
  • Expectation-Maximization (EM): Iteratively estimate missing values, learn parameters
  • Marginalization: Sum over possible values of missing variables
  • Default Values: Assume default state unless evidence suggests otherwise

EM Algorithm

Iterative algorithm for learning with missing data:

E-step: Estimate missing data using the current model.
M-step: Update model parameters using the completed data (including estimates).
Repeat until convergence.
Guaranteed to improve the likelihood each iteration; converges to a local maximum.
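The loop can be illustrated with the classic two-coin problem: each sequence of flips came from one of two coins of unknown bias, and the coin identity is the missing data. The starting guesses, data, and iteration count below are illustrative:

```python
def em_two_coins(sequences, iters=50):
    """EM for two coins with unknown biases. Each flip sequence came
    from one coin, but we don't observe which (the missing data)."""
    theta_a, theta_b = 0.6, 0.5  # arbitrary distinct starting guesses
    for _ in range(iters):
        # E-step: responsibility of coin A for each sequence,
        # then accumulate expected head/tail counts per coin
        heads_a = tails_a = heads_b = tails_b = 0.0
        for seq in sequences:
            h, t = seq.count("H"), seq.count("T")
            like_a = theta_a**h * (1 - theta_a)**t
            like_b = theta_b**h * (1 - theta_b)**t
            r_a = like_a / (like_a + like_b)
            heads_a += r_a * h; tails_a += r_a * t
            heads_b += (1 - r_a) * h; tails_b += (1 - r_a) * t
        # M-step: re-estimate biases from the expected counts
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b

data = ["HHHHHHHHHT", "TTTTHTTTTT", "HHHHHTHHHH", "TTHTTTTTHT"]
ta, tb = em_two_coins(data)
print(round(ta, 2), round(tb, 2))  # one coin near 0.9, the other near 0.15
```

Because EM only climbs to a local maximum, the result depends on the starting guesses; symmetric starts (e.g. 0.5 and 0.5) would leave the two coins indistinguishable.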

Applications in Expert Systems

Medical Diagnosis

Symptom → diagnosis reasoning under uncertainty. Patient symptoms are imperfect indicators; multiple diseases explain same symptoms. Bayesian networks model symptom-disease relationships; inference computes posterior disease probabilities.

Troubleshooting and Fault Diagnosis

Observable faults may have multiple causes; multiple symptoms observed. Inference determines most likely cause(s) for observed symptoms.

Financial Risk Assessment

Loan approval: multiple factors (credit score, income, debt ratio) predict default probability. Models incorporate uncertainty in feature values and prediction relationships.

Sensor Data Fusion

Combine noisy sensor readings from multiple sources. Bayesian fusion accounts for sensor reliability, correlations, computes best estimate of true state.
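For independent Gaussian sensors, Bayesian fusion reduces to inverse-variance weighting; the readings and standard deviations below are illustrative:

```python
def fuse_gaussian(readings):
    """Bayesian fusion of independent Gaussian sensor readings via
    inverse-variance weighting. Each reading is (value, std_dev)."""
    weights = [1.0 / (sd * sd) for _, sd in readings]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return mean, (1.0 / total) ** 0.5  # fused estimate and its std dev

# Two thermometers: a precise one and a noisier one (values illustrative)
est, sd = fuse_gaussian([(39.2, 0.5), (38.6, 1.0)])
print(round(est, 2), round(sd, 2))  # 39.08 0.45 -- precise sensor dominates
```

The fused standard deviation (0.45) is smaller than either sensor's alone: combining independent readings always reduces uncertainty under this model. Correlated sensors require the full covariance treatment.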

Comparison of Approaches

Approach             Strength                                         Weakness
Confidence Factors   Simple, fast, intuitive                          Ad-hoc, no formal justification, inflexible
Probability Theory   Well-founded, flexible, powerful                 Requires probability estimates, computationally expensive
Fuzzy Logic          Models vagueness, interpretable                  Not formal probability, limited learning
Dempster-Shafer      Models ignorance, combines conflicting evidence  Complex, less intuitive, computationally expensive

No single best approach; choice depends on problem domain, available knowledge, required transparency.

References

  • Pearl, J. "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference." Morgan Kaufmann, 1988.
  • Shafer, G. "A Mathematical Theory of Evidence." Princeton University Press, 1976.
  • Russell, S. J., and Norvig, P. "Artificial Intelligence: A Modern Approach." Prentice Hall, 3rd edition, 2009.
  • Dempster, A. P. "Upper and Lower Probabilities Induced by a Multivalued Mapping." Annals of Mathematical Statistics, vol. 38, 1967, pp. 325-339.
  • Giarratano, J., and Riley, G. "Expert Systems: Principles and Programming." Course Technology, 4th edition, 2004.