Positive Predictive Value (PPV) & Negative Predictive Value (NPV)
Positive predictive value (precision) and negative predictive value summarize how often predicted positives and negatives are correct.
Overview
PPV and NPV are complementary thresholded metrics derived from the confusion matrix. PPV answers: “when the model predicts positive, how often is it right?” NPV answers: “when the model predicts negative, how often is it right?” PPV is identical to precision; NPV is its counterpart for negative predictions.
These metrics depend on the decision threshold and the base rate of positives in the dataset. PPV prioritizes minimizing false positives among flagged cases, while NPV prioritizes minimizing false negatives among unflagged cases. They are commonly used in clinical screening, triage, and risk flagging where the cost of false alarms or missed cases can differ.
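As a minimal sketch of the computation (plain Python; binary 0/1 labels with 1 as the positive class are assumed, and the helper name is illustrative):

def ppv_npv(predictions, labels):
    # Tally confusion-matrix counts from thresholded predictions.
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    # Each ratio is undefined when its denominator is zero
    # (no predicted positives / no predicted negatives).
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv

print(ppv_npv([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 0, 1]))  # (0.5, 0.75)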
Input Format
- predictions: array of thresholded predicted labels
- labels: array of ground-truth labels
Example:
{
"predictions": [1, 1, 1, 0, 0, 0, 1, 0],
"labels": [1, 0, 1, 0, 0, 0, 0, 1]
}
Output Format
Numeric PPV and NPV aggregated over the dataset. Optional outputs may include confusion-matrix counts and the threshold used.
{
"ppv": 0.75,
"npv": 0.83
}
Metrics
- PPV (precision): proportion of predicted positives that are true positives.
- NPV: proportion of predicted negatives that are true negatives.
- Optional: report prevalence and the decision threshold to contextualize PPV/NPV (see the formulas below).
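For reference, the standard confusion-matrix formulas (TP, FP, TN, FN are the usual counts):

PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
prevalence = (TP + FN) / (TP + FP + TN + FN)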
Known Limitations
- Highly sensitive to class prevalence; values can shift when the base rate changes (see the worked example after this list).
- Threshold-dependent. Optimizing PPV typically reduces NPV and vice versa.
- PPV/NPV do not summarize ranking quality across thresholds; consider AUROC or AUPRC for that.
- Can obscure subgroup disparities without stratified reporting.
- Published PPV/NPV are only valid for populations with similar prevalence and case mix; they may not transport to other hospitals, regions, or time periods without recalibration.
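To illustrate prevalence sensitivity, here is a standard Bayes-rule calculation (the specific sensitivity, specificity, and prevalence values are illustrative, not from the source): a test with 90% sensitivity and 95% specificity has very different PPV at 10% versus 1% prevalence.

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    # Bayes' rule: P(condition present | positive prediction).
    tp_rate = sensitivity * prevalence
    fp_rate = (1 - specificity) * (1 - prevalence)
    return tp_rate / (tp_rate + fp_rate)

print(round(ppv_at_prevalence(0.90, 0.95, 0.10), 2))  # 0.67
print(round(ppv_at_prevalence(0.90, 0.95, 0.01), 2))  # 0.15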
Versioning and Provenance
PPV/NPV depend on label encoding, positive-class definition, and how ambiguous predictions are thresholded. Monitor PPV/NPV prospectively in deployment to detect drift in prevalence, case mix, or model performance, and recalibrate thresholds or models as needed. For reproducibility, document the label mapping, threshold choice, and implementation (e.g., scikit-learn's precision_score for PPV and confusion_matrix to compute NPV from TN/FN).
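A sketch of the scikit-learn route mentioned above (assumes binary 0/1 labels with 1 as the positive class; data reuses the Input Format example):

from sklearn.metrics import confusion_matrix, precision_score

y_true = [1, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# PPV is precision_score on the positive class.
ppv = precision_score(y_true, y_pred)

# For binary 0/1 labels, confusion_matrix(...).ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
npv = tn / (tn + fn)

print(round(ppv, 2), round(npv, 2))  # 0.5 0.75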
References
Altman & Bland, 1994. Diagnostic tests 2: predictive values.
Paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC2540558/
Implementation: https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
Related Metrics
Sensitivity & Specificity
Companion metrics measuring true positive rate (sensitivity) and true negative rate (specificity).
F1 Score
Balanced metric that summarizes precision and recall into one harmonic-mean score for classification performance.
MCC
Correlation-based metric that accounts for true/false positives and negatives, robust to class imbalance.