Observability infrastructure for healthcare-grade AI

Version, diff, and trace every dataset, model, and evaluation across your development lifecycle, down to the individual sample.

Purpose-built for trustworthy AI

Quantiles provides immutable lineage, versioned datasets, and transparent metrics so you can trust every result, from baseline to benchmark.


Data Lineage

Every benchmark, dataset, and config is versioned. Trace any evaluation result back to the exact data, model, and parameters that produced it.
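Tracing a result back to its inputs amounts to walking a graph of versioned artifacts. The sketch below assumes each artifact records the IDs of the artifacts it was derived from. The `Artifact` class, the `trace` function, and the artifact IDs are hypothetical illustrations, not Quantiles' internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A versioned item (dataset, config, model, or result) in a hypothetical lineage graph."""
    artifact_id: str
    kind: str
    parents: tuple[str, ...] = ()


def trace(artifact_id: str, registry: dict[str, Artifact]) -> list[Artifact]:
    """Return every upstream artifact of `artifact_id` (depth-first, no duplicates)."""
    seen: set[str] = set()
    ordered: list[Artifact] = []
    stack = list(registry[artifact_id].parents)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared ancestors are reported only once
        seen.add(current)
        node = registry[current]
        ordered.append(node)
        stack.extend(node.parents)
    return ordered


# Hypothetical artifacts: an evaluation result derived from a model and a dataset.
registry = {
    a.artifact_id: a
    for a in [
        Artifact("dataset:v1.3", "dataset"),
        Artifact("config:9a2bdf7", "config"),
        Artifact("model:ckpt-01", "model", parents=("dataset:v1.3", "config:9a2bdf7")),
        Artifact("result:run_42", "evaluation", parents=("model:ckpt-01", "dataset:v1.3")),
    ]
}

print([a.artifact_id for a in trace("result:run_42", registry)])
```

Storing only parent IDs keeps each record small and immutable; the full lineage is reconstructed on demand.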


Model Run Tracking

Capture hyperparameters, dependencies, and environment details per run. Compare variants with structured diffs.
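A minimal sketch of per-run capture and a structured diff, using only the Python standard library. The `capture_environment` and `diff_runs` helpers and the example hyperparameters are hypothetical and do not reflect the Quantiles SDK.

```python
import platform
import sys
from importlib import metadata


def capture_environment(packages: list[str]) -> dict[str, str]:
    """Record interpreter, OS, and selected package versions for a run."""
    env = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            env[f"pkg:{name}"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[f"pkg:{name}"] = "not installed"
    return env


def diff_runs(a: dict, b: dict) -> dict[str, tuple]:
    """Map each changed key to its (old, new) value; a key missing from one run shows as None."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical runs that differ only in learning rate and prompt variant.
run_a = {"lr": 3e-4, "batch_size": 32, "prompt": "A", **capture_environment(["numpy"])}
run_b = {**run_a, "lr": 1e-4, "prompt": "B"}
print(diff_runs(run_a, run_b))
```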


Benchmark Reproducibility

Every run records complete provenance—data sources, config, code, and outputs—so results are easy to reproduce and inspect.


Drift & Bias Detection

Monitors flag distribution shifts, cohort skew, and fairness issues across time, code, configuration, and datasets.
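A distribution-shift monitor often reduces to comparing a feature's histogram across two windows. One widely used statistic is the Population Stability Index (PSI), sketched below in plain Python. It is an illustration only, not necessarily what Quantiles computes; the `psi` function, its equal-width binning, and the `eps` floor for empty bins are assumptions.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(reference), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


reference = [i / 100 for i in range(100)]
shifted = [x + 0.3 for x in reference]
print(psi(reference, reference), psi(reference, shifted))
```

A commonly cited rule of thumb treats PSI above about 0.25 as a significant shift, but thresholds are best calibrated per feature and cohort.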


Governance & Audit Logs

Immutable run records for compliance. Export benchmark, evaluation, or per-sample lineage to JSON or PDF.
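A common way to make an audit log tamper-evident is hash-chaining: each entry stores the hash of its predecessor, so any edit to a past entry is detected on verification. The sketch below applies that technique with SHA-256 and a JSON export. The `append_entry` and `verify` helpers and the event fields are hypothetical, and the sketch does not describe how Quantiles stores its records.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> list[dict]:
    """Return a new log with `event` appended and chained to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    return [*log, {"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)}]


def verify(log: list[dict]) -> bool:
    """Recompute every hash; an edited, dropped, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
log = append_entry(log, {"run": "run_42", "action": "evaluation_completed"})
log = append_entry(log, {"run": "run_42", "action": "exported", "format": "json"})
print(verify(log))
print(json.dumps(log, indent=2))  # JSON export of the chained records
```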


API & SDK Access

Instrument evaluations from code, notebooks, or CI/CD. Query artifacts and lineage with a modern Python API.

Model: CodeBlue

Evaluation completed Dec 3, 2025 (status: COMPLETE)

Benchmark    Prompt A    Prompt B
Hash         7f82d90d    b9e05a4c
Accuracy     0.86        0.93
F1           0.82        0.91
Inference    45 ms       32 ms


Understand model behavior

Quantify performance deltas across models, versions, datasets, and more to guide model selection, optimization, and tuning.

  • Measure the effect of hyperparameters and prompts on model performance
  • Correlate changes in metrics with model, data, or pipeline updates
  • Benchmark models across time and environments
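Quantifying a delta is simple arithmetic once runs are recorded. The sketch below computes absolute and relative changes from the Prompt A and Prompt B figures on the CodeBlue evaluation card above. The `metric_deltas` helper and the metric key names are hypothetical.

```python
def metric_deltas(baseline: dict[str, float], variant: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Absolute and relative change for each metric present in both runs.

    Assumes baseline values are non-zero (relative change is undefined otherwise).
    """
    return {
        name: (variant[name] - baseline[name], (variant[name] - baseline[name]) / baseline[name])
        for name in baseline.keys() & variant.keys()
    }


# Figures from the CodeBlue evaluation card (Prompt A vs. Prompt B).
prompt_a = {"accuracy": 0.86, "f1": 0.82, "inference_ms": 45.0}
prompt_b = {"accuracy": 0.93, "f1": 0.91, "inference_ms": 32.0}

for name, (abs_delta, rel_delta) in sorted(metric_deltas(prompt_a, prompt_b).items()):
    print(f"{name:>13}: {abs_delta:+.2f} ({rel_delta:+.1%})")
```

For latency, a negative delta is an improvement. Reading the sign per metric matters when comparing variants.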
Traceability

Keep a transparent, complete record of data and model changes for auditing and documentation workflows.

Versioned inputs

Datamixes, configs, hyperparameters, code, and prompt templates are all hashed and timestamped.

  1. Dataset: datamix v1.3
  2. Config & Hyperparams: config.yaml
  3. Hashed & Timestamped: sha 9a2bdf7, 2025-01-14 12:42
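Hashing a config only identifies it reliably if equivalent configs always produce the same bytes. The sketch below hashes a canonical JSON encoding (sorted keys, fixed separators) and attaches a short 7-character hash and a UTC timestamp, matching the shape of the example above. It assumes the YAML config has already been parsed into a dict. The `fingerprint` and `version_record` helpers and the config values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def version_record(name: str, obj: dict) -> dict:
    """Attach a short hash and a UTC timestamp to a versioned input."""
    return {
        "name": name,
        "sha": fingerprint(obj)[:7],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M"),
    }


# Hypothetical parsed contents of config.yaml.
config = {"model": "CodeBlue", "temperature": 0.2, "datamix": "v1.3"}
print(version_record("config.yaml", config))
```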

Immutable outputs

Metrics, charts, and summaries stored with fingerprints and unique run IDs.
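At the application level, write-once outputs can be modeled with a frozen dataclass and a read-only mapping, plus a content fingerprint and a unique run ID. This is a minimal sketch under those assumptions: `RunOutput`, `save_output`, and the `run_` ID format are hypothetical. Python-level immutability is only a guard against accidental edits, and durable immutability would still depend on write-once storage.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class RunOutput:
    """Evaluation output whose fields cannot be reassigned and whose metrics are read-only."""
    metrics: MappingProxyType
    run_id: str = field(default_factory=lambda: f"run_{uuid.uuid4().hex[:12]}")

    @property
    def fingerprint(self) -> str:
        """SHA-256 over the metrics, independent of key order."""
        payload = json.dumps(dict(self.metrics), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def save_output(metrics: dict[str, float]) -> RunOutput:
    # Copy into a read-only view so later edits to `metrics` cannot leak in.
    return RunOutput(metrics=MappingProxyType(dict(metrics)))


out = save_output({"accuracy": 0.93, "f1": 0.91})
print(out.run_id, out.fingerprint[:12])
# out.run_id = "x"              -> raises dataclasses.FrozenInstanceError
# out.metrics["accuracy"] = 1.0 -> raises TypeError
```

Identical metrics share a fingerprint while each run keeps its own ID, so duplicate results remain distinguishable.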


Reproducible pipelines

One command re-creates and re-runs any evaluation.

$ quantiles eval --reproduce run_42
eval v1.7.2 · datamix_2024Q3 · prompt=eval_v2