The Quantiles configuration file

Quantiles looks for a quantiles.toml or .quantiles.toml file in the current working directory to configure how benchmarks and custom evaluations are executed.

Only one of the two filenames can exist in the same directory. If both exist, the CLI will exit with an error.
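The lookup rule above can be sketched in Python. This is an illustration of the documented behavior only, not the CLI's actual implementation; the function name `find_config` and the exception type are hypothetical.

```python
from pathlib import Path
from typing import Optional

# The two filenames the documentation says Quantiles looks for.
CONFIG_NAMES = ("quantiles.toml", ".quantiles.toml")


def find_config(directory: Path) -> Optional[Path]:
    """Return the config file in `directory`, or None if neither exists.

    Hypothetical helper: raises RuntimeError when both files are present,
    mirroring the documented "exit with an error" behavior.
    """
    found = [directory / name for name in CONFIG_NAMES if (directory / name).is_file()]
    if len(found) > 1:
        names = " and ".join(path.name for path in found)
        raise RuntimeError(f"Found both {names}; keep only one config file.")
    return found[0] if found else None
```

Calling `find_config(Path.cwd())` before `qt run` is one way to catch a duplicated config file early, for example in a CI check.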

When to use a config file

You need a config file when you want to do any of the following:

Override defaults for a built-in benchmark, such as model or sample limit
Define a custom evaluation that runs your own Python code
Resume custom evaluations later with qt resume

You do not need a config file to run built-in benchmarks out of the box, for example with qt run pubmedqa or qt run simpleqa-verified.

Config file structure

Every benchmark lives under its own [benchmarks.<eval_name>] section. To run that benchmark, pass <eval_name> to the qt run command:


qt run <eval_name>

Override built-in benchmark defaults

By default, built-in benchmarks use a demo model that simply generates random text, so the default is suitable only for basic prototyping. You can customize a built-in benchmark's execution parameters, including dataset sampling and model selection. For example, you can limit the number of samples evaluated and specify the provider and model used for inference (such as OpenAI or Anthropic).


# This config block customizes the built-in PubMedQA benchmark.
#
# You could customize other built-in benchmarks, such as
# simpleqa-verified, with a similar block.
[benchmarks.pubmedqa]
# `type` defaults to "builtin", so you can omit this line
# if you want.
type = "builtin"
# Restrict the benchmark to run only the first 50 of the
# 1000 samples in the PubMedQA benchmark
samples = 50
# Use OpenAI's GPT 5.4-nano model instead of the built-in
# demo model. This requires an OPENAI_API_KEY environment
# variable, and OpenAI will charge you for usage.
model = "openai:gpt-5.4-nano"

With this block in your config file, qt run pubmedqa detects it and applies the customizations above to the built-in PubMedQA benchmark.

Customizing the model

The value of the model key is a provider prefix followed by that provider's model name, as in openai:gpt-5.4-nano. If you set model in your config file, you must also set the environment variables that hold your credentials for that provider. Supported prefixes and their required environment variables are listed below:

Prefix	API Key Environment Variables	Example
`openai:`	`OPENAI_API_KEY`	OpenAI `quantiles.toml`
`anthropic:`	`ANTHROPIC_API_KEY`	Anthropic `quantiles.toml`
`gemini:`	`GEMINI_API_KEY`	Gemini `quantiles.toml`
`cloudflare_ai_gateway:`	`CLOUDFLARE_API_KEY`, `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_GATEWAY_ID`	Cloudflare AI Gateway `quantiles.toml`
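The table above can be turned into a small pre-flight check. The following is a sketch only: the mapping is copied from the table, while the function name `missing_env_vars` and its error handling are assumptions rather than part of the Quantiles CLI or SDK.

```python
import os

# Provider prefix -> required environment variables, copied from the table.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "cloudflare_ai_gateway": [
        "CLOUDFLARE_API_KEY",
        "CLOUDFLARE_ACCOUNT_ID",
        "CLOUDFLARE_GATEWAY_ID",
    ],
}


def missing_env_vars(model: str, env=None) -> list:
    """Return the required variables that are unset or empty for `model`.

    `model` is a "<prefix>:<model name>" string such as "openai:gpt-5.4-nano".
    """
    env = os.environ if env is None else env
    prefix, sep, name = model.partition(":")
    if not sep or not name:
        raise ValueError(f"Expected '<prefix>:<model name>', got {model!r}")
    if prefix not in REQUIRED_ENV:
        raise ValueError(f"Unsupported provider prefix: {prefix!r}")
    return [var for var in REQUIRED_ENV[prefix] if not env.get(var)]
```

An empty list from `missing_env_vars("openai:gpt-5.4-nano")` means the variables for that provider are set in the current environment; it does not verify that the key itself is valid.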

Cloudflare AI Gateway models

The Cloudflare AI Gateway integration allows Quantiles to access a broad catalog of open-source models. Some of the popular LLMs offered by this provider include the following:

GPT-OSS (OpenAI)
Llama (Meta)
Mistral
Gemma (Google)
DeepSeek
Qwen
GLM 5.2 (Z.ai)

See the Cloudflare Workers AI model catalog for the complete list and latest changes.

Define a Custom Evaluation

To create a custom code evaluation, add a section to your quantiles.toml or .quantiles.toml file that tells qt run how to execute your code. The example below defines an evaluation named my-eval:


# This config block defines the "my-eval" custom code
# evaluation.
[benchmarks.my-eval]
# `type` must be set to "custom_code" here
type = "custom_code"
# You must specify the command, in a list, that `qt run ...` will
# execute to run your custom code evaluation. For this command,
# my_eval.py should be built with the Quantiles Python SDK.
command = ["uv", "run", "my_eval.py"]

Using the above configuration, the qt run my-eval command executes uv run my_eval.py and records the results of the run in its local database.

Pass Input to a Custom Evaluation

You can also pass structured input to your custom eval code. You can specify the input in your quantiles.toml configuration file, with the qt run --input flag, or both. The two sources are merged and passed to your eval code as JSON. Below is how you’d extend your quantiles.toml file to pass input data to your eval:


# The my-eval block defined earlier, extended with an `input` key.
[benchmarks.my-eval]
type = "custom_code"
command = ["uv", "run", "my_eval.py"]
input = { dataset = "my_data.jsonl", max_samples = 100 }

And below is how you’d extend configured inputs by passing JSON data in the qt run --input flag:


qt run my-eval --input '{"other_data": "other_data_val"}'

If you pass the same key in the --input flag as in the config file’s input key, the value from the --input flag wins and a warning is printed.
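The merge rule described above can be sketched as follows. This assumes a shallow, key-by-key merge, which this page does not state explicitly, and the function `merge_inputs` is hypothetical; it only illustrates that --input values win and that a warning is emitted for each overridden key.

```python
import json
import warnings
from typing import Optional


def merge_inputs(config_input: dict, cli_json: Optional[str] = None) -> dict:
    """Merge config-file input with a JSON string passed via --input.

    Assumption: the merge is shallow, so nested objects are replaced whole.
    """
    cli_input = json.loads(cli_json) if cli_json else {}
    if not isinstance(cli_input, dict):
        raise ValueError("--input must be a JSON object")
    # Warn once per key that appears in both sources.
    for key in sorted(config_input.keys() & cli_input.keys()):
        warnings.warn(f"--input overrides the config file value for {key!r}")
    # Later entries win, so command-line values take precedence.
    return {**config_input, **cli_input}
```

With the config input from this page and `--input '{"other_data": "other_data_val"}'`, the result contains all three keys and no warning is raised, because no key is shared.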

After configuration file and command-line inputs are merged and parsed, they are passed to your custom eval code as JSON in the QUANTILES_INPUT environment variable. The Python SDK automatically parses the contents of this environment variable and passes it to your workflow as a dictionary:


{
  "dataset": "my_data.jsonl",
  "max_samples": 100
}
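If you want to inspect the input without the SDK, for example while debugging, you can read the environment variable directly. The sketch below relies only on the documented fact that QUANTILES_INPUT holds JSON; the function name and the choice to return an empty dictionary when the variable is unset are assumptions.

```python
import json
import os


def read_quantiles_input(env=None) -> dict:
    """Parse the QUANTILES_INPUT environment variable into a dictionary.

    Assumption: an unset or empty variable is treated as "no input".
    """
    env = os.environ if env is None else env
    raw = env.get("QUANTILES_INPUT", "")
    if not raw:
        return {}
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("QUANTILES_INPUT must contain a JSON object")
    return data
```

Inside my_eval.py, `read_quantiles_input().get("dataset")` would then give the dataset path configured for the run. In normal use the Python SDK does this parsing for you.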

`qt resume` and the Config File

When you run qt resume "$RUN_ID", the CLI loads that run's stored configuration, including its input, from the database, but re-reads the command from the current config file. This means the following:

You do not need to re-submit input when resuming a previous run.
If you update the command between qt run and qt resume, the resumed run uses the updated command, but retains the input values used in the original run.
If a custom_code benchmark's config section is removed after qt run, resuming that run fails with a clear error.
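The three rules above can be modeled in a few lines. This is a conceptual sketch, not the CLI's code: the shape of the stored run record (`eval_name`, `input`) and the function `resolve_resume` are hypothetical and exist only to show which values come from the database and which from the config file.

```python
def resolve_resume(stored_run: dict, current_config: dict) -> dict:
    """Combine a stored run with the current config file for a resume.

    `stored_run` stands in for the database record of the original run;
    `current_config` stands in for the parsed config file as it is now.
    """
    name = stored_run["eval_name"]
    section = current_config.get("benchmarks", {}).get(name)
    if section is None:
        # Rule 3: the section was removed after the original run.
        raise LookupError(f"No [benchmarks.{name}] section in the config file")
    return {
        # Rule 2: the command is re-read from the current config file.
        "command": section["command"],
        # Rule 1: the input is reused from the original run.
        "input": stored_run["input"],
    }
```

In this model, editing `command` between qt run and qt resume changes what is executed, while editing `input` in the config file has no effect on the resumed run.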

See Restart and Resume Runs for the full recovery workflow.

Full Reference

For detailed information on every config field, model naming, validation rules, and complete examples, see the Quantiles Configuration reference on GitHub.