Positive Predictive Value (PPV) & Negative Predictive Value (NPV)
Positive predictive value (precision) and negative predictive value summarize how often predicted positives and negatives are correct.
Overview
PPV and NPV are complementary thresholded metrics derived from the confusion matrix. PPV answers: “when the model predicts positive, how often is it right?” NPV answers: “when the model predicts negative, how often is it right?” PPV is identical to precision; NPV is its counterpart for negative predictions.
These metrics depend on the decision threshold and the base rate of positives in the dataset. PPV prioritizes minimizing false positives among flagged cases, while NPV prioritizes minimizing false negatives among unflagged cases. They are commonly used in clinical screening, triage, and risk flagging where the cost of false alarms or missed cases can differ.
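As a minimal sketch of the computation (plain Python; binary 0/1 labels with 1 as the positive class are assumed, and the helper name is illustrative):

def ppv_npv(predictions, labels):
    # Tally confusion-matrix counts from thresholded predictions.
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    # Each ratio is undefined when its denominator is zero
    # (no predicted positives / no predicted negatives).
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv

print(ppv_npv([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 0, 1]))  # (0.5, 0.75)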
Input Format
- predictions: array of thresholded predicted labels
- labels: array of ground-truth labels
Example:
{
"predictions": [1, 1, 1, 0, 0, 0, 1, 0],
"labels": [1, 0, 1, 0, 0, 0, 0, 1]
}
Output Format
Numeric PPV and NPV aggregated over the dataset. Optional outputs may include confusion-matrix counts and the threshold used.
{
"ppv": 0.75,
"npv": 0.83
}
Metrics
- PPV (precision): proportion of predicted positives that are true positives.
- NPV: proportion of predicted negatives that are true negatives.
- Optional: report prevalence and the decision threshold to contextualize PPV/NPV (see the formulas below).
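For reference, the standard confusion-matrix formulas (TP, FP, TN, FN are the usual counts):

PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
prevalence = (TP + FN) / (TP + FP + TN + FN)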
Known Limitations
- Highly sensitive to class prevalence; values can shift when the base rate changes (see the worked example after this list).
- Threshold-dependent. Optimizing PPV typically reduces NPV and vice versa.
- PPV/NPV do not summarize ranking quality across thresholds; consider AUROC or AUPRC for that.
- Can obscure subgroup disparities without stratified reporting.
- Published PPV/NPV are only valid for populations with similar prevalence and case mix; they may not transport to other hospitals, regions, or time periods without recalibration.
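To illustrate prevalence sensitivity, here is a standard Bayes-rule calculation (the specific sensitivity, specificity, and prevalence values are illustrative, not from the source): a test with 90% sensitivity and 95% specificity has very different PPV at 10% versus 1% prevalence.

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    # Bayes' rule: P(condition present | positive prediction).
    tp_rate = sensitivity * prevalence
    fp_rate = (1 - specificity) * (1 - prevalence)
    return tp_rate / (tp_rate + fp_rate)

print(round(ppv_at_prevalence(0.90, 0.95, 0.10), 2))  # 0.67
print(round(ppv_at_prevalence(0.90, 0.95, 0.01), 2))  # 0.15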
Versioning and Provenance
PPV/NPV depend on label encoding, positive-class definition, and how ambiguous predictions are thresholded. Monitor PPV/NPV prospectively in deployment to detect drift in prevalence, case mix, or model performance, and recalibrate thresholds or models as needed. For reproducibility, document the label mapping, threshold choice, and implementation (e.g., scikit-learn's precision_score for PPV and confusion_matrix to compute NPV from TN/FN).
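A sketch of the scikit-learn route mentioned above (assumes binary 0/1 labels with 1 as the positive class; data reuses the Input Format example):

from sklearn.metrics import confusion_matrix, precision_score

y_true = [1, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# PPV is precision_score on the positive class.
ppv = precision_score(y_true, y_pred)

# For binary 0/1 labels, confusion_matrix(...).ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
npv = tn / (tn + fn)

print(round(ppv, 2), round(npv, 2))  # 0.5 0.75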
References
Altman & Bland, 1994. Diagnostic tests 2: predictive values.
Paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC2540558/
Implementation: https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
Related Metrics
Sensitivity & Specificity
Companion metrics measuring true positive rate (sensitivity) and true negative rate (specificity).
F1 Score
Balanced metric that summarizes precision and recall into one harmonic-mean score for classification performance.
MCC
Correlation-based metric that accounts for true/false positives and negatives, robust to class imbalance.