Metric
Area Under the Receiver Operating Characteristic Curve (AUROC)
Threshold-free ranking metric that summarizes the tradeoff between true positive rate and false positive rate.
Overview
AUROC measures how well a model ranks positive examples above negative ones across all possible thresholds. It is widely used for binary classification tasks and summarizes the ROC curve, which plots true positive rate (sensitivity) against false positive rate (1 - specificity).
AUROC is a metric rather than a benchmark and requires probabilistic predictions (or scores) paired with ground-truth labels.
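Equivalently, AUROC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of this pairwise interpretation in plain Python (the function name pairwise_auroc is our own, and the O(n_pos * n_neg) loop is for illustration, not production use):

from itertools import product

def pairwise_auroc(predictions, labels):
    """AUROC via the pairwise ranking interpretation: the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half."""
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

print(pairwise_auroc([0.72, 0.31, 0.89, 0.45], [1, 0, 1, 0]))  # 1.0: all pairs ranked correctly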
Input Format
- predictions: array of numbers (model scores or probabilities; higher values indicate stronger confidence in the positive class)
- labels: array of binary ground-truth labels (0 or 1)
Example:
{
"predictions": [0.72, 0.31, 0.89, ...],
"labels": [1, 0, 1, ...]
}
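Given arrays in this format, a minimal scoring sketch with scikit-learn's roc_auc_score (the array values below are illustrative stand-ins, not the truncated arrays above):

from sklearn.metrics import roc_auc_score

predictions = [0.72, 0.31, 0.89, 0.45]  # model scores; illustrative values
labels = [1, 0, 1, 0]                   # binary ground truth

auroc = roc_auc_score(labels, predictions)  # note the order: labels first, scores second
print(round(auroc, 2))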
Output Format
A single numeric AUROC value aggregated over the dataset. Optional outputs may include the ROC curve points for plotting.
{
"auroc": 0.91
}
Metrics
- AUROC: area under the ROC curve; measures how well a model ranks positive examples above negative ones across thresholds. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation (values below 0.5 indicate worse-than-random ranking).
- Optional: ROC curve points, TPR/FPR at selected thresholds.
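Where ROC curve points are reported, they can be derived with scikit-learn's roc_curve; a small sketch (data values are illustrative):

from sklearn.metrics import roc_curve

labels = [1, 0, 1, 0, 1, 0]
predictions = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]

# One (FPR, TPR) point per distinct threshold, suitable for plotting.
fpr, tpr, thresholds = roc_curve(labels, predictions)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")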
Known Limitations
- Can appear overly optimistic on highly imbalanced datasets, because the large number of negatives keeps the false positive rate low even when many predicted positives are wrong (see the sketch after this list).
- Does not reflect calibration or absolute probability quality.
- Performance at clinically relevant thresholds can differ from the aggregate AUROC.
- Not directly comparable across datasets with different base rates without context.
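To illustrate the imbalance caveat, a hedged sketch contrasting AUROC with average precision on a skewed synthetic dataset (the make_classification settings are arbitrary choices for demonstration, not a recommendation):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Roughly 1% positives: AUROC typically stays high while average precision
# reveals how many top-ranked predictions are still false positives.
X, y = make_classification(n_samples=20_000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("AUROC:", round(roc_auc_score(y_te, scores), 3))
print("Average precision:", round(average_precision_score(y_te, scores), 3))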
Versioning and Provenance
AUROC implementations vary in interpolation strategy and handling of tied scores. For reproducibility, document the implementation (e.g., scikit-learn's roc_auc_score), label encoding, and the score type used.
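One way to record that provenance alongside the score (the record layout below is our own assumption, not a standard):

import sklearn
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 0]
predictions = [0.72, 0.31, 0.89, 0.45]

result = {
    "auroc": float(roc_auc_score(labels, predictions)),
    # Provenance fields: implementation, version, label encoding, score type.
    "implementation": "sklearn.metrics.roc_auc_score",
    "sklearn_version": sklearn.__version__,
    "label_encoding": "binary {0, 1}, 1 = positive class",
    "score_type": "predicted probability of the positive class",
}
print(result)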
References
Hanley and McNeil, 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve.
Paper: https://doi.org/10.1148/radiology.143.1.7063747
Implementation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html