Sensitivity & Specificity
Complementary classification metrics that measure true positive rate (sensitivity) and true negative rate (specificity).
Overview
Sensitivity (also called recall or true positive rate) measures the fraction of actual positives that are correctly identified. Specificity (true negative rate) measures the fraction of actual negatives that are correctly identified. Together, they summarize different failure modes and are often used in clinical evaluation to balance missed cases vs. false alarms.
Sensitivity and specificity are metrics computed from thresholded (discrete) predictions paired with ground-truth labels. The threshold used to convert probabilities into labels has a direct impact on both.
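As a minimal sketch (assuming positives are encoded as 1 and negatives as 0; the probability values and the 0.5 cutoff are illustrative, chosen so that thresholding reproduces the example predictions in the next section), both rates can be computed directly from the confusion-matrix counts:

def sensitivity_specificity(y_true, y_pred):
    # Count the four confusion-matrix cells, treating 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    return sensitivity, specificity

# Illustrative probabilities; thresholding at 0.5 yields the example predictions below.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.2, 0.1, 0.4, 0.6, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]   # [1, 1, 1, 0, 0, 0, 1, 0]
print(sensitivity_specificity(y_true, y_pred))    # (0.75, 0.75)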
Input Format
- predictions: array of thresholded predicted labels
- labels: array of ground-truth labels
Example:
{
"predictions": [1, 1, 1, 0, 0, 0, 1, 0],
"labels": [1, 1, 1, 0, 0, 0, 0, 1]
}
Output Format
Numeric sensitivity and specificity aggregated over the dataset. Optional outputs may include confusion-matrix counts.
{
"sensitivity": 0.88,
"specificity": 0.8
}
Metrics
- Sensitivity: true positive rate, TP / (TP + FN).
- Specificity: true negative rate, TN / (TN + FP).
- Optional: report the full confusion matrix and the threshold used (see the scikit-learn sketch below).
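A sketch of the scikit-learn route mentioned under Versioning and Provenance, applied to the example arrays from the Input Format section (it assumes the 0/1 label encoding shown there):

from sklearn.metrics import confusion_matrix, recall_score

labels      = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 1, 0, 0, 0, 1, 0]

# confusion_matrix with labels=[0, 1] returns [[tn, fp], [fn, tp]].
tn, fp, fn, tp = confusion_matrix(labels, predictions, labels=[0, 1]).ravel()
sensitivity = recall_score(labels, predictions)  # tp / (tp + fn) -> 0.75
specificity = tn / (tn + fp)                     # -> 0.75
print({"sensitivity": sensitivity, "specificity": specificity,
       "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)})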
Known Limitations
- Sensitive to the decision threshold: changing the cutoff shifts both metrics (see the sweep sketch after this list).
- Does not summarize performance across thresholds (use AUROC or AUPRC when ranking matters).
- Does not reflect probability calibration or confidence quality.
- Can hide subgroup performance differences without stratified reporting.
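To make the first limitation concrete, a small sweep over cutoffs (the probability values are invented for illustration) shows sensitivity and specificity moving in opposite directions:

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.2, 0.1, 0.4, 0.6, 0.3]  # illustrative model outputs

for threshold in (0.3, 0.5, 0.7):
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    print(f"threshold={threshold}: sensitivity={tp / (tp + fn):.2f}, specificity={tn / (tn + fp):.2f}")
# threshold=0.3: sensitivity=1.00, specificity=0.50
# threshold=0.5: sensitivity=0.75, specificity=0.75
# threshold=0.7: sensitivity=0.75, specificity=1.00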
Versioning and Provenance
Implementations vary by label encoding, threshold selection, and how ties or uncertain outputs are handled. For reproducibility, document the threshold, label mapping, and implementation (e.g., scikit-learn's recall_score for sensitivity and confusion_matrix for specificity).
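For instance, a small provenance record could be stored alongside the reported scores (the field names here are an assumption, not a required schema):

# Illustrative provenance record; field names are an assumption, not a required schema.
provenance = {
    "threshold": 0.5,
    "label_mapping": {"negative": 0, "positive": 1},
    "implementation": "scikit-learn (recall_score, confusion_matrix)",
}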
References
Fawcett, 2006. An introduction to ROC analysis.
Paper: https://doi.org/10.1016/j.patrec.2005.10.010
Implementation (scikit-learn): https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
Related Metrics
PPV & NPV
Positive predictive value (precision) and negative predictive value, measuring correctness for predicted positives and negatives.
F1 Score
Balanced metric that summarizes precision and recall into one harmonic-mean score for classification performance.
MCC
Correlation-based metric that accounts for true/false positives and negatives, robust to class imbalance.