
Area Under the Precision-Recall Curve (AUPRC)

Summarizes the precision-recall (PR) tradeoff across thresholds; especially informative for imbalanced datasets.

Overview

AUPRC measures how well a model balances precision and recall across thresholds by summarizing the PR curve. It is often more informative than AUROC when the positive class is rare and performance on the positive class is critical.

AUPRC is a metric rather than a benchmark and requires probabilistic predictions (or scores) paired with ground-truth labels. Higher is better; the baseline, which an uninformative classifier approximately attains, equals the prevalence of the positive class.
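As a quick sanity check of that baseline, a minimal sketch (assuming scikit-learn and NumPy are available) shows a random classifier scoring close to the positive-class prevalence:

# Minimal sketch: an uninformative classifier's AUPRC sits near the
# positive-class prevalence (here ~5%).
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
labels = rng.binomial(1, 0.05, size=10_000)  # ~5% positives
random_scores = rng.random(10_000)           # scores carry no signal

print("prevalence:  ", labels.mean())
print("random AUPRC:", average_precision_score(labels, random_scores))
# Both printed values should be close to 0.05.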

Input Format

  • predictions: array of numbers (model scores or probabilities, higher values indicate stronger confidence in the positive class)
  • labels: array of binary ground-truth labels (0 or 1)

Example:

{
  "predictions": [0.64, 0.23, 0.89, ...],
  "labels": [0, 1, 1, ...]
}
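Given inputs in this shape, one minimal scoring sketch (assuming scikit-learn's average_precision_score as the implementation; the five values below are illustrative) is:

# Minimal sketch: compute AUPRC from the input format above.
from sklearn.metrics import average_precision_score

data = {
    "predictions": [0.64, 0.23, 0.89, 0.41, 0.77],  # illustrative scores
    "labels": [0, 1, 1, 0, 1],                      # illustrative labels
}

auprc = average_precision_score(data["labels"], data["predictions"])
print({"auprc": round(auprc, 2)})  # matches the output format below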

Output Format

A single numeric AUPRC value computed over the dataset.

{
  "auprc": 0.73
}

Metrics

  • AUPRC: area under the PR curve summarizing the tradeoff between precision and recall across thresholds. The baseline equals the prevalence of the positive class, so scores should be interpreted relative to that rate.
  • Optional: PR curve points, precision/recall at selected thresholds (see the sketch below).
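A minimal sketch for the optional outputs (assuming scikit-learn's precision_recall_curve; the data and the 0.5 operating threshold are illustrative):

# Minimal sketch: recover PR curve points and precision/recall
# at a selected threshold.
import numpy as np
from sklearn.metrics import precision_recall_curve

labels = np.array([0, 1, 1, 0, 1])
scores = np.array([0.64, 0.23, 0.89, 0.41, 0.77])

# PR curve points: one (precision, recall) pair per threshold.
precision, recall, thresholds = precision_recall_curve(labels, scores)

# Precision/recall at an illustrative operating threshold of 0.5.
preds = (scores >= 0.5).astype(int)
tp = int(np.sum((preds == 1) & (labels == 1)))
fp = int(np.sum((preds == 1) & (labels == 0)))
fn = int(np.sum((preds == 0) & (labels == 1)))
print("precision@0.5:", tp / (tp + fp))
print("recall@0.5:   ", tp / (tp + fn))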

Known Limitations

  • Sensitive to class prevalence, making scores not directly comparable across datasets with different base rates (illustrated in the sketch after this list).
  • Does not capture calibration or probability quality.
  • Reported values depend on PR curve construction and integration strategy.
  • Aggregate AUPRC can obscure performance at clinically or operationally relevant thresholds.
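The first limitation can be made concrete with a small simulation (a sketch assuming scikit-learn and NumPy; the Gaussian score model is illustrative):

# Minimal sketch: identical ranking quality yields different AUPRC
# under different base rates, so scores are not comparable across datasets.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

def auprc_at_prevalence(prevalence, n=20_000):
    labels = rng.binomial(1, prevalence, size=n)
    # Same separability at every prevalence: positives shifted up by 1.5.
    scores = rng.normal(0.0, 1.0, size=n) + 1.5 * labels
    return average_precision_score(labels, scores)

print("AUPRC at 50% prevalence:", auprc_at_prevalence(0.50))
print("AUPRC at  2% prevalence:", auprc_at_prevalence(0.02))
# Same score distributions per class, noticeably lower AUPRC at the rarer base rate.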

Versioning and Provenance

AUPRC implementations vary in interpolation and averaging strategies. For reproducibility, document the implementation (e.g., scikit-learn's average_precision_score), label encoding, and score type used.
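For example, two constructions that are both loosely called "area under the PR curve" can disagree on the same data. The following sketch (assuming scikit-learn; the data are illustrative) contrasts the step-wise average-precision sum with trapezoidal integration of the curve:

# Minimal sketch: step-wise average precision vs. trapezoidal integration.
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

labels = [0, 1, 1, 0, 1, 0, 0, 1]                     # illustrative
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.55, 0.9]  # illustrative

step_wise = average_precision_score(labels, scores)   # step-wise sum
precision, recall, _ = precision_recall_curve(labels, scores)
trapezoidal = auc(recall, precision)                  # linear interpolation

print("step-wise AP:    ", step_wise)
print("trapezoidal area:", trapezoidal)
# Linear interpolation in PR space can be optimistic (Davis and Goadrich, 2006).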

References

Davis, J. and Goadrich, M. (2006). The Relationship Between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML).

Paper: https://dl.acm.org/doi/10.1145/1143844.1143874

Implementation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html
