data
Data Lineage
Every benchmark, dataset, and config is versioned and tracked. Trace any evaluation result back to the data, model, and parameters that produced it.
Version, diff, and trace every dataset, model, and evaluation across your development lifecycle, down to each individual sample.
Quantiles provides immutable lineage, versioned datasets, and transparent metrics so you can trust every result, from baseline to benchmark.
evaluation
Capture hyperparameters, dependencies, and environment details per run. Compare variants with structured diffs.
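As a rough illustration of what a structured diff between two runs could look like, the sketch below flattens nested run configs into dotted keys and reports added, removed, and changed fields. The function names (`flatten`, `diff_runs`) and the sample configs are hypothetical and are not part of the Quantiles API.

```python
from typing import Any


def flatten(d: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_runs(a: dict[str, Any], b: dict[str, Any]) -> dict[str, dict]:
    """Return the fields added, removed, or changed when going from run a to run b."""
    fa, fb = flatten(a), flatten(b)
    return {
        "added": {k: fb[k] for k in fb.keys() - fa.keys()},
        "removed": {k: fa[k] for k in fa.keys() - fb.keys()},
        "changed": {k: (fa[k], fb[k]) for k in fa.keys() & fb.keys() if fa[k] != fb[k]},
    }


# Hypothetical run configs, for illustration only.
run_a = {"model": {"temperature": 0.2, "max_tokens": 256}, "deps": {"numpy": "1.26.4"}}
run_b = {"model": {"temperature": 0.7, "max_tokens": 256}, "deps": {"numpy": "1.26.4"}, "seed": 13}

delta = diff_runs(run_a, run_b)
print(delta)
```

Flattening to dotted keys keeps the diff readable for deeply nested configs, since each change is reported against a single path such as `model.temperature`.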
evaluation
Every run records complete provenance—data sources, config, code, and outputs—so results are easy to reproduce and inspect.
evaluation
Monitors flag distribution shifts, cohort skew, and fairness issues across time, code, configuration, and datasets.
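One common way to flag a distribution shift is the Population Stability Index (PSI). The sketch below computes it for a categorical field across two dataset versions. This illustrates the general technique and is not a description of how Quantiles monitors work; the `psi` helper, the sample data, and the 0.2 alert threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from collections import Counter


def psi(baseline: list[str], current: list[str], eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of a categorical field."""
    categories = set(baseline) | set(current)
    base_counts, cur_counts = Counter(baseline), Counter(current)
    total = 0.0
    for cat in categories:
        # Clamp proportions to eps so categories missing on one side do not hit log(0).
        p = max(base_counts[cat] / len(baseline), eps)
        q = max(cur_counts[cat] / len(current), eps)
        total += (q - p) * math.log(q / p)
    return total


# Hypothetical language mix in two dataset versions.
v1 = ["en"] * 80 + ["es"] * 20
v2 = ["en"] * 50 + ["es"] * 50

score = psi(v1, v2)
ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, not a Quantiles default
if score > ALERT_THRESHOLD:
    print(f"distribution shift flagged: PSI={score:.3f}")
```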
security
Immutable run records for compliance. Export benchmark, evaluation, or per-sample lineage to JSON or PDF.
integration
Instrument evaluations from code, notebooks, or CI/CD. Query artifacts and lineage with a modern Python API.
Evaluation completed Dec 3, 2025
| Benchmark | Prompt A | Prompt B |
|---|---|---|
| Hash | 7f82d90d | b9e05a4c |
| Accuracy | 0.86 | 0.93 |
| F1 | 0.82 | 0.91 |
| Inference latency | 45 ms | 32 ms |
Quantify performance deltas across models, versions, datasets, and more to guide model selection, optimization, and tuning.
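To make "performance deltas" concrete, here is a minimal sketch that computes absolute and relative changes between the Prompt A and Prompt B results from the sample evaluation shown on this page. The `compare` helper and the metric key names are hypothetical; only the numbers come from the table.

```python
def compare(baseline: dict[str, float], candidate: dict[str, float]) -> dict:
    """Absolute and percentage change per metric, for metrics present in both runs."""
    report = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        report[name] = {
            "abs": round(cand - base, 4),
            # Relative change is undefined when the baseline value is zero.
            "rel_pct": round((cand - base) / base * 100, 2) if base else None,
        }
    return report


# Values taken from the sample evaluation table (Prompt A vs. Prompt B).
prompt_a = {"accuracy": 0.86, "f1": 0.82, "latency_ms": 45.0}
prompt_b = {"accuracy": 0.93, "f1": 0.91, "latency_ms": 32.0}

deltas = compare(prompt_a, prompt_b)
for metric, change in deltas.items():
    print(metric, change)
```

Note that the sign carries different meaning per metric: a positive delta is an improvement for accuracy and F1, while a negative delta is the improvement for latency.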
Keep a transparent, complete record of data and model changes for auditing and documentation workflows.
Datamixes, configs, hyperparameters, code, and prompt templates are all hashed and timestamped.
Metrics, charts, and summaries are stored with fingerprints and unique run IDs.
One command re-creates and re-runs any evaluation.
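The hashing, timestamping, and run-ID ideas above can be sketched with the Python standard library. This is one plausible approach under stated assumptions (SHA-256 over canonical JSON, UUID4 run IDs, UTC ISO 8601 timestamps), not a description of Quantiles internals; `fingerprint` and `make_record` are hypothetical names.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Any


def fingerprint(artifact: Any) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_record(artifact: Any) -> dict[str, Any]:
    """Wrap an artifact with a content hash, a unique run ID, and a UTC timestamp."""
    return {
        "run_id": str(uuid.uuid4()),
        "hash": fingerprint(artifact),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "artifact": artifact,
    }


# Hypothetical config and prompt template, for illustration only.
config = {"prompt_template": "Answer concisely: {question}", "temperature": 0.2}
record = make_record(config)
print(json.dumps(record, indent=2))
```

Hashing the content rather than the file path means two runs with identical configs share a fingerprint, while any edit to a template or hyperparameter produces a new one.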