Benchmark Hub

The Quantiles Benchmark Hub is a library of evaluations and metrics designed to reveal how AI models behave, especially in healthcare contexts. Rather than reducing performance to a single score, these benchmarks are task-focused and probe distinct dimensions of model behavior, including reasoning, factual accuracy, hallucination, calibration, robustness, and clinical safety.

What the Hub is for

The Benchmark Hub serves as a centralized reference for:

  • Curated descriptions of widely used and emerging AI evaluation benchmarks
  • Clear explanations of what each benchmark measures and how it is typically used
  • The strengths and limitations of commonly cited benchmarks
  • Shared terminology for technical and clinical stakeholders