Definition
Precision
Precision: ratio of true positive predictions to all positive predictions. Measures correctness of positive identifications. Also called positive predictive value.
Recall
Recall: ratio of true positive predictions to all actual positives. Measures completeness of positive identifications. Also called sensitivity or true positive rate.
Context
Used primarily in classification tasks, especially when class imbalance exists or costs of false positives and false negatives differ.
Confusion Matrix
Components
Four outcomes: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN). Basis for precision and recall calculation.
Structure
Table showing actual vs predicted classifications in binary or multi-class settings.
Example Table
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
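A minimal Python sketch of these four counts, assuming binary 0/1 labels; `y_true` and `y_pred` are illustrative names, not taken from any particular library:

```python
# Count the four confusion-matrix outcomes for a binary problem.
# y_true holds ground-truth labels, y_pred the model's predictions (both made up).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, fp, tn, fn)  # 3 1 3 1
```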
Precision
Definition
Precision = TP / (TP + FP). Indicates proportion of positive identifications that were actually correct.
Significance
High precision: fewer false positives. Critical in spam detection and disease diagnosis, where false alarms carry a high cost.
Calculation
Requires true positive and false positive counts from predictions.
Recall
Definition
Recall = TP / (TP + FN). Indicates proportion of actual positives correctly identified.
Significance
High recall: fewer false negatives. Important in fraud detection and medical screening, where missed positives are costly.
Calculation
Requires true positive and false negative counts, obtained by comparing predictions against ground-truth labels.
Calculation Formulas
Precision Formula
Precision = TP / (TP + FP)
Recall Formula
Recall = TP / (TP + FN)
Additional Metrics
Specificity = TN / (TN + FP). Accuracy = (TP + TN) / (TP + FP + TN + FN).
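The four formulas applied to illustrative counts (the tp, fp, tn, fn values below are assumed, not taken from a real model):

```python
# Derive the four metrics from illustrative confusion-matrix counts.
tp, fp, tn, fn = 8, 2, 85, 5

precision = tp / (tp + fp)                   # 8/10   = 0.80
recall = tp / (tp + fn)                      # 8/13   ≈ 0.62
specificity = tn / (tn + fp)                 # 85/87  ≈ 0.98
accuracy = (tp + tn) / (tp + fp + tn + fn)   # 93/100 = 0.93
print(precision, recall, specificity, accuracy)
```

Note how accuracy (0.93) flatters this imbalanced example while recall (0.62) exposes the missed positives.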
Interpretation
High Precision
Model rarely labels negatives as positives. Conservative positive prediction strategy.
High Recall
Model captures most positives. Aggressive positive prediction strategy.
Balanced View
Precision and recall together provide a holistic view of positive prediction performance.
Precision-Recall Trade-off
Inverse Relationship
Increasing precision often reduces recall and vice versa. The trade-off arises from the decision threshold applied to prediction scores.
Threshold Adjustment
Changing the classification threshold alters the balance: higher thresholds increase precision; lower thresholds increase recall.
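A short sketch of this effect, sweeping three thresholds over made-up scores and labels (all values illustrative):

```python
# Sweep the decision threshold and watch precision rise as recall falls.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]                        # illustrative labels
scores = [0.95, 0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.3, 0.2]  # illustrative scores

for threshold in (0.3, 0.5, 0.7):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 1)
    print(f"t={threshold}: precision={tp / (tp + fp):.2f} recall={tp / (tp + fn):.2f}")
# t=0.3: precision=0.56 recall=1.00
# t=0.5: precision=0.57 recall=0.80
# t=0.7: precision=0.75 recall=0.60
```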
Use Case Dependency
Optimal balance depends on problem context and cost of false positives vs false negatives.
Precision-Recall Curve
Definition
Graph plotting precision vs recall at different classification thresholds.
Interpretation
Curve shape indicates model performance across thresholds. Area under curve (AUC-PR) quantifies overall effectiveness.
Comparison
Useful for imbalanced datasets where ROC curves may be misleading.
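A minimal sketch using scikit-learn (assumed available), reusing the illustrative labels and scores from the threshold example above:

```python
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.3, 0.2]

# One (precision, recall) point per distinct threshold in the scores.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
# Average precision is a common single-number summary of AUC-PR.
print(average_precision_score(y_true, scores))
```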
F1 Score
Definition
Harmonic mean of precision and recall. Balances both metrics into a single value.
Formula
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Use Cases
Preferred when balance between precision and recall is needed or classes are imbalanced.
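The formula as a two-line sketch; the precision and recall inputs are illustrative:

```python
# F1 as the harmonic mean of precision and recall (inputs made up).
precision, recall = 0.80, 0.62
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # 0.699
```

Because the harmonic mean is dominated by the smaller input, F1 stays low whenever either precision or recall is poor.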
Applications
Information Retrieval
Evaluating relevance of retrieved documents. Precision: fraction of retrieved documents that are relevant. Recall: fraction of relevant documents that are retrieved.
Medical Diagnosis
Detecting disease presence. High recall reduces missed diagnoses; high precision avoids false alarms.
Spam Detection
Filtering unwanted emails. High precision avoids misclassifying legitimate emails; high recall captures most spam.
Fraud Detection
Flagging fraudulent transactions. Recall critical to catch fraud; precision limits false accusations.
Limitations
Class Imbalance Sensitivity
Precision and recall can be misleading on heavily imbalanced datasets if reported without context such as class prevalence.
Ignores True Negatives
Metrics focus on positive-class performance; true negatives are excluded from both calculations.
Threshold Dependency
Values depend on the classification threshold; a single-point value may not reflect full model behavior.
Multi-class Complexity
Extending precision/recall to multi-class requires averaging methods (macro, micro, weighted), complicating interpretation.
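A sketch of the three averaging modes with scikit-learn (assumed available); the three-class labels below are made up:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # illustrative three-class ground truth
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]   # illustrative predictions

# macro: unweighted mean of per-class scores; micro: global TP/FP/FN counts;
# weighted: per-class scores weighted by class support.
for avg in ("macro", "micro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    print(f"{avg}: precision={p:.2f} recall={r:.2f}")
```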
Best Practices
Use Alongside Other Metrics
Complement with accuracy, specificity, ROC-AUC for comprehensive evaluation.
Threshold Tuning
Optimize classification threshold based on domain-specific costs of errors.
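One way to operationalize this, sketched with assumed costs (the 5:1 false-negative penalty and all labels and scores are purely illustrative):

```python
# Pick the threshold that minimises an assumed asymmetric error cost.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.3, 0.2]
FP_COST, FN_COST = 1.0, 5.0  # assumption: a miss costs 5x a false alarm

def total_cost(threshold):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 1)
    return FP_COST * fp + FN_COST * fn

best = min(scores, key=total_cost)  # candidate thresholds: the observed scores
print(best, total_cost(best))       # 0.4 3.0
```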
Report Precision-Recall Curve
Visualize trade-offs across thresholds instead of single-point estimates.
Consider Dataset Balance
Apply resampling or weighting to address class imbalance before metric computation.
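A minimal sketch of the weighting route via scikit-learn's class_weight option (assumed available); the tiny single-feature dataset is synthetic:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic imbalanced data: ten negatives, two positives, one feature.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [2.0],
     [0.15], [0.25], [0.35], [0.45], [0.55], [2.1]]
y = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]

# "balanced" reweights each class inversely to its frequency.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
pred = clf.predict(X)
print(precision_score(y, pred), recall_score(y, pred))
```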