The Quantiles configuration file
Quantiles looks for a quantiles.toml or .quantiles.toml file in the current working directory to configure how benchmarks and custom evaluations are executed.
Only one of the two filenames can exist in the same directory. If both exist, the CLI will exit with an error.
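The lookup rule can be sketched in Python as follows. This only illustrates the behavior described above and is not the CLI's actual implementation; the function name `find_config` and the error message are hypothetical.

```python
from pathlib import Path
from typing import Optional

# The two filenames the documentation says Quantiles looks for.
CONFIG_NAMES = ("quantiles.toml", ".quantiles.toml")


def find_config(directory: Path) -> Optional[Path]:
    """Return the config file in `directory`, or None if there is none.

    Hypothetical helper: mirrors the documented rule that only one of
    the two filenames may exist in the same directory.
    """
    found = [directory / name for name in CONFIG_NAMES if (directory / name).is_file()]
    if len(found) > 1:
        # The real CLI exits with an error here; its message may differ.
        raise RuntimeError("found both quantiles.toml and .quantiles.toml; keep only one")
    return found[0] if found else None
```

Calling `find_config(Path.cwd())` returns `None` when no config file is present, which corresponds to running built-in benchmarks with their defaults.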
When to use a config file
You need a config file when you want to do any of the following:
- Override defaults for a built-in benchmark, such as model or sample limit
- Define a custom evaluation that runs your own Python code
- Resume custom evaluations later with qt resume
You do not need a config file to run built-in benchmarks out of the box, such as qt run pubmedqa and qt run simpleqa-verified.
Config file structure
Every benchmark lives under its own [benchmarks.<eval_name>] section. To run that benchmark, pass <eval_name> to the qt run command:
qt run <eval_name>
Override built-in benchmark defaults
By default, built-in benchmarks use the demo model, which simply generates random text and should only be used for basic prototyping. You can customize how a built-in benchmark runs, including dataset sampling and model selection. For example, you can limit the number of samples evaluated and specify the provider and model used for inference (such as OpenAI or Anthropic).
# This config block customizes the built-in PubMedQA benchmark.
#
# You could customize other built-in benchmarks, such as
# simpleqa-verified, with a similar block.
[benchmarks.pubmedqa]
# `type` defaults to "builtin", so you can omit this line
# if you want.
type = "builtin"
# Restrict the benchmark to run only the first 50 of the
# 1000 samples in the PubMedQA benchmark
samples = 50
# Use OpenAI's GPT 5.4-nano model instead of the built-in
# demo model. This requires an OPENAI_API_KEY environment
# variable, and OpenAI will charge you for usage.
model = "openai:gpt-5.4-nano"
When you run qt run pubmedqa with this block in your config file, the CLI detects it and applies these customizations to the built-in PubMedQA benchmark.
Customizing the model
If you set the model key in your config file, its value must be a provider prefix followed by the provider’s name for the model you’d like to use. You’ll also need to set the environment variable that holds the API key for that provider. Supported prefixes and their API key environment variables are listed below:
| Prefix | API Key Environment Variables | Example |
|---|---|---|
| openai: | OPENAI_API_KEY | OpenAI quantiles.toml |
| anthropic: | ANTHROPIC_API_KEY | Anthropic quantiles.toml |
| gemini: | GEMINI_API_KEY | Gemini quantiles.toml |
| cloudflare_ai_gateway: | CLOUDFLARE_API_KEY, CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_GATEWAY_ID | Cloudflare AI Gateway quantiles.toml |
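Before starting a paid run, it can help to confirm that the variables from the table are set. The sketch below is one way to do that in Python; the function `missing_env_vars` is a hypothetical pre-flight check, not part of Quantiles, and how the CLI itself reports a missing key is not covered here.

```python
import os

# Provider prefixes and the environment variables listed in the table above.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "cloudflare_ai_gateway": [
        "CLOUDFLARE_API_KEY",
        "CLOUDFLARE_ACCOUNT_ID",
        "CLOUDFLARE_GATEWAY_ID",
    ],
}


def missing_env_vars(model: str, env=None) -> list:
    """Hypothetical pre-flight check: list unset variables for `model`."""
    env = os.environ if env is None else env
    prefix, sep, _ = model.partition(":")
    if not sep or prefix not in REQUIRED_ENV_VARS:
        raise ValueError(f"unsupported provider prefix in model {model!r}")
    # Treat empty strings the same as unset variables.
    return [name for name in REQUIRED_ENV_VARS[prefix] if not env.get(name)]
```

For example, `missing_env_vars("openai:gpt-5.4-nano")` returns an empty list once OPENAI_API_KEY is exported in the current shell.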
Cloudflare AI Gateway models
The Cloudflare AI Gateway integration allows Quantiles to access a broad catalog of open-source models.
See the Cloudflare Workers AI model catalog for the complete list and latest changes.
Define a Custom Evaluation
To create a custom code evaluation, specify in your quantiles.toml or .quantiles.toml configuration file how qt run my-eval executes your code:
# This config block specifies the "my-eval" custom code
# evaluation.
[benchmarks.my-eval]
# `type` must be set to "custom_code" here
type = "custom_code"
# You must specify the command, in a list, that `qt run ...` will
# execute to run your custom code evaluation. For this command,
# my_eval.py should be built with the Quantiles Python SDK.
command = ["uv", "run", "my_eval.py"]
Using the above configuration, the qt run my-eval command executes uv run my_eval.py and records the results of the run in its local database.
Pass Input to a Custom Evaluation
You can also pass structured input to your custom eval code. You can specify the input in your quantiles.toml configuration file, a qt run --input flag, or both. The contents are merged, parsed as JSON, and passed to your eval code. Below is how you’d extend your quantiles.toml file to pass input data to your eval:
# We specified
[benchmarks.my-eval]
type = "custom_code"
command = ["uv", "run", "my_eval.py"]
input = { dataset = "my_data.jsonl", max_samples = 100 }
And below is how you’d extend configured inputs by passing JSON data in the qt run --input flag:
qt run my-eval --input '{"other_data": "other_data_val"}'
If you pass the same key in the --input flag as in the config file’s input key, the value from the --input flag wins and a warning is printed.
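The merge rule can be sketched in Python as follows. This is a minimal illustration that assumes a shallow, top-level merge; the function name `merge_inputs` and the warning text are hypothetical, and the CLI's own implementation may differ.

```python
import json
import warnings


def merge_inputs(config_input: dict, flag_json: str = "") -> dict:
    """Hypothetical sketch of the documented merge: --input wins on conflicts."""
    flag_input = json.loads(flag_json) if flag_json else {}
    for key in flag_input:
        if key in config_input:
            # The real CLI prints its own warning; this text is illustrative.
            warnings.warn(f"--input overrides config value for {key!r}")
    # Later keys win, so values from the flag replace values from the config.
    return {**config_input, **flag_input}


merged = merge_inputs(
    {"dataset": "my_data.jsonl", "max_samples": 100},
    '{"other_data": "other_data_val"}',
)
print(json.dumps(merged))
```

With the config and flag values used in this section, the merged result carries all three keys: dataset, max_samples, and other_data.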
After configuration file and command-line inputs are merged and parsed, they are passed to your custom eval code as JSON in the QUANTILES_INPUT environment variable. The Python SDK automatically parses the contents of this environment variable and passes it to your workflow as a dictionary:
{
  "dataset": "my_data.jsonl",
  "max_samples": 100
}
qt resume and the Config File
When you run qt resume "$RUN_ID", the CLI reuses the same configuration from that run, including input, from the database. It also re-reads the command from the config file. This setup means the following:
- You do not need to re-submit input when resuming a previous run.
- If you update the command between qt run and qt resume, the resumed run uses the updated command, but retains the input values used in the original run.
- If a custom_code evaluation’s config section is removed after a qt run, resuming that run will fail with a clear error.
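The resume rules above can be sketched in Python as follows. The dictionaries stand in for the CLI's database record and the parsed config file; their field names, the function `plan_resume`, and the error type are all hypothetical and chosen only for illustration.

```python
def plan_resume(stored_run: dict, current_config: dict) -> dict:
    """Hypothetical sketch: combine a stored run with the current config.

    `stored_run` stands in for what the CLI saved in its database;
    `current_config` stands in for the parsed config file.
    """
    name = stored_run["eval_name"]
    section = current_config.get("benchmarks", {}).get(name)
    if section is None:
        # The real CLI reports its own error message in this case.
        raise LookupError(f"no [benchmarks.{name}] section in the config file")
    return {
        "command": section["command"],  # re-read from the config file
        "input": stored_run["input"],   # reused from the original run
    }
```

Note that any input key in the current config section is ignored in this sketch, which reflects the statement that a resumed run keeps the input values from the original run.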
See Restart and Resume Runs for the full recovery workflow.
Full Reference
For detailed information on every config field, model naming, validation rules, and complete examples, see the Quantiles Configuration reference on GitHub.