Definition

Precision

Precision: ratio of true positive predictions to all positive predictions. Measures correctness of positive identifications. Also called positive predictive value.

Recall

Recall: ratio of true positive predictions to all actual positives. Measures completeness of positive identifications. Also called sensitivity or true positive rate.

Context

Used primarily in classification tasks, especially when class imbalance exists or costs of false positives and false negatives differ.

Confusion Matrix

Components

Four outcomes: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN). Basis for precision and recall calculation.

Structure

Table showing actual vs predicted classifications in binary or multi-class settings.

Example Table

Actual \ Predicted | Positive | Negative
Positive           | TP       | FN
Negative           | FP       | TN
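
A minimal sketch of tallying the four outcomes from paired ground-truth labels and predictions (plain Python; the labels and predictions below are illustrative):

    # Toy ground-truth labels and model predictions (1 = positive, 0 = negative).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

    # Tally each confusion-matrix outcome.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    print(tp, fp, fn, tn)  # 3 1 1 3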

Precision

Definition

Precision = TP / (TP + FP). Indicates proportion of positive identifications that were actually correct.

Significance

High precision: fewer false positives. Critical in spam detection and disease diagnosis, where false alarms carry a high cost.

Calculation

Requires true positive and false positive counts from predictions.
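
One way to compute it defensively, guarding against the degenerate case of no positive predictions (a plain-Python sketch; the counts are the illustrative ones above):

    def precision(tp, fp):
        # No positive predictions: precision is conventionally reported as 0 (or left undefined).
        return tp / (tp + fp) if (tp + fp) > 0 else 0.0

    print(precision(3, 1))  # 0.75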

Recall

Definition

Recall = TP / (TP + FN). Indicates proportion of actual positives correctly identified.

Significance

High recall: fewer false negatives. Important in fraud detection and medical screening, where missed positives are costly.

Calculation

Requires true positive and false negative counts from dataset ground truth.
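
The analogous sketch for recall, guarding against the case of no actual positives (plain Python, illustrative counts):

    def recall(tp, fn):
        # No actual positives: recall is conventionally reported as 0 (or left undefined).
        return tp / (tp + fn) if (tp + fn) > 0 else 0.0

    print(recall(3, 1))  # 0.75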

Calculation Formulas

Precision Formula

Precision = TP / (TP + FP)

Recall Formula

Recall = TP / (TP + FN)

Additional Metrics

Specificity = TN / (TN + FP). Accuracy = (TP + TN) / (TP + FP + TN + FN).
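
A short sketch of the additional metrics in the same count-based style (illustrative counts):

    def specificity(tn, fp):
        # Proportion of actual negatives correctly identified.
        return tn / (tn + fp) if (tn + fp) > 0 else 0.0

    def accuracy(tp, fp, tn, fn):
        # Proportion of all predictions that are correct.
        total = tp + fp + tn + fn
        return (tp + tn) / total if total > 0 else 0.0

    print(specificity(3, 1))     # 0.75
    print(accuracy(3, 1, 3, 1))  # 0.75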

Interpretation

High Precision

Model rarely labels negatives as positives. Conservative positive prediction strategy.

High Recall

Model captures most positives. Aggressive positive prediction strategy.

Balanced View

Precision and recall together provide holistic view of positive prediction performance.

Precision-Recall Trade-off

Inverse Relationship

Increasing precision often reduces recall, and vice versa. The trade-off arises from the threshold applied to prediction scores.

Threshold Adjustment

Changing classification threshold alters balance: higher thresholds increase precision, lower thresholds increase recall.
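
A small sketch of the effect, assuming a model that outputs scores in [0, 1]; the scores and labels are made up for illustration (plain Python):

    y_true = [1, 1, 1, 0, 0, 1, 0, 0]
    scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10]

    def prec_rec(threshold):
        # Binarize the scores at the given threshold, then compute both metrics.
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    print(prec_rec(0.7))  # (1.0, 0.5): stricter threshold, higher precision, lower recall
    print(prec_rec(0.3))  # (~0.67, 1.0): looser threshold, lower precision, higher recall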

Use Case Dependency

Optimal balance depends on problem context and cost of false positives vs false negatives.

Precision-Recall Curve

Definition

Graph plotting precision vs recall at different classification thresholds.

Interpretation

Curve shape indicates model performance across thresholds. Area under curve (AUC-PR) quantifies overall effectiveness.

Comparison

Useful for imbalanced datasets where ROC curves may be misleading.
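
A hedged sketch of computing the curve with scikit-learn, assuming it is installed; the labels and scores are illustrative:

    from sklearn.metrics import average_precision_score, precision_recall_curve

    y_true = [1, 1, 1, 0, 0, 1, 0, 0]
    scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10]

    # Precision and recall evaluated at each distinct score threshold.
    precision, recall, thresholds = precision_recall_curve(y_true, scores)

    # Average precision summarizes the curve in one number
    # (closely related to, though not identical with, trapezoidal AUC-PR).
    print(average_precision_score(y_true, scores))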

F1 Score

Definition

Harmonic mean of precision and recall. Balances both metrics into single value.

Formula

F1 = 2 * (Precision * Recall) / (Precision + Recall)
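
A minimal helper for the formula, with the conventional zero value when both inputs are zero (plain Python):

    def f1(precision, recall):
        # Harmonic mean; reported as 0 when precision and recall are both 0.
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    print(f1(0.75, 0.75))  # 0.75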

Use Cases

Preferred when balance between precision and recall is needed or classes are imbalanced.

Applications

Information Retrieval

Evaluating relevance of retrieved documents. Precision: fraction of retrieved documents that are relevant. Recall: fraction of relevant documents that are retrieved.

Medical Diagnosis

Detecting disease presence. High recall reduces missed diagnoses; high precision avoids false alarms.

Spam Detection

Filtering unwanted emails. High precision avoids misclassifying legitimate emails; high recall captures most spam.

Fraud Detection

Flagging fraudulent transactions. Recall critical to catch fraud; precision limits false accusations.

Limitations

Class Imbalance Sensitivity

Precision and recall can be misleading if the dataset is heavily imbalanced and values are reported without context such as class prevalence.

Ignores True Negatives

Metrics focus on positive class performance; true negatives excluded from calculations.

Threshold Dependency

Values depend on classification threshold; single values may not reflect full model behavior.

Multi-class Complexity

Extending precision/recall to multi-class requires averaging methods (macro, micro, weighted), complicating interpretation.
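
A brief sketch of the averaging modes using scikit-learn (assuming it is installed; the three-class labels are illustrative):

    from sklearn.metrics import precision_score, recall_score

    y_true = [0, 1, 2, 2, 1, 0, 2, 1]
    y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

    # Macro averages per-class scores equally; micro pools counts across classes;
    # weighted averages per-class scores by class support.
    for avg in ("macro", "micro", "weighted"):
        p = precision_score(y_true, y_pred, average=avg)
        r = recall_score(y_true, y_pred, average=avg)
        print(avg, round(p, 3), round(r, 3))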

Best Practices

Use Alongside Other Metrics

Complement with accuracy, specificity, ROC-AUC for comprehensive evaluation.

Threshold Tuning

Optimize classification threshold based on domain-specific costs of errors.
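
One hedged way to do this is to sweep candidate thresholds and minimize an expected error cost; the 5:1 false-negative-to-false-positive cost ratio below is purely illustrative:

    y_true = [1, 1, 1, 0, 0, 1, 0, 0]
    scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10]
    COST_FN, COST_FP = 5.0, 1.0  # hypothetical domain-specific error costs

    def expected_cost(threshold):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        return COST_FP * fp + COST_FN * fn

    # Use the observed scores as candidate thresholds and keep the cheapest one.
    best = min(scores, key=expected_cost)
    print(best, expected_cost(best))  # 0.35 2.0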

Report Precision-Recall Curve

Visualize trade-offs across thresholds instead of single-point estimates.

Consider Dataset Balance

Apply resampling or weighting to address class imbalance before metric computation.
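
A minimal sketch of one option, random oversampling of the minority class in the training split (plain Python, illustrative data; dedicated libraries such as imbalanced-learn provide more principled resamplers):

    import random

    # Illustrative imbalanced training set: 10 positives vs. 50 negatives.
    positives = [({"feature": i}, 1) for i in range(10)]
    negatives = [({"feature": i}, 0) for i in range(50)]

    # Duplicate minority examples at random until the classes are balanced.
    balanced = negatives + random.choices(positives, k=len(negatives))
    random.shuffle(balanced)
    print(len(balanced))  # 100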
