
Agent Quickstart

Use the Quantiles agent skill to help coding agents run repository-based evaluations with the qt CLI. It supports Codex, Claude Code, Cursor, GitHub Copilot, Gemini CLI, OpenCode, and other agents that use reusable skills or instruction files.

The skill standardizes local setup, benchmark and custom eval runs, sample-level inspection, run comparison, resume behavior, and regression summaries.

Install the skill

Use the prompt below to set up your coding agent with the Quantiles CLI and agent skill:

Please install the Quantiles skill at github.com/quantiles-evals/skill

Alternatively, copy SKILL.md into your agent’s skill directory.
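If you prefer to script the manual copy, the step can be sketched in Python as below. This is a minimal sketch, not part of the Quantiles tooling: the function name `install_skill`, the `quantiles` subdirectory name, and the `<skills_root>/<skill_name>/SKILL.md` layout are all assumptions for illustration. Check your agent's documentation for where it actually looks for skills.

```python
from pathlib import Path
import shutil


def install_skill(skill_file: Path, skills_root: Path, skill_name: str = "quantiles") -> Path:
    """Copy a skill file to <skills_root>/<skill_name>/SKILL.md and return the new path.

    The directory layout is an assumption; agents differ in where they read skills from.
    """
    if not skill_file.is_file():
        raise FileNotFoundError(f"skill file not found: {skill_file}")

    # Create the per-skill directory if it does not exist yet.
    target_dir = skills_root / skill_name
    target_dir.mkdir(parents=True, exist_ok=True)

    # Copy the file contents under the conventional SKILL.md name.
    target = target_dir / "SKILL.md"
    shutil.copyfile(skill_file, target)
    return target
```

For example, `install_skill(Path("SKILL.md"), Path.home() / ".my-agent" / "skills")` would place the file under a hypothetical `~/.my-agent/skills/quantiles/` directory.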

Run your first benchmark

After your agent completes the install, have it run its first benchmark using the following prompt:

Run the SimpleQA Verified benchmark and summarize the results.

This prompt uses a demo model that generates random text. It does not call any hosted LLM provider and incurs no inference cost. The demo run is useful for validating the workflow, but do not treat it as a real model-quality benchmark.
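To illustrate why a random-text run checks the plumbing rather than model quality, here is a small stand-in sketch. It is not the Quantiles demo model; `random_text_model`, `exact_match_accuracy`, and the sample questions are hypothetical names and data made up for this example.

```python
import random
import string
from typing import Callable, List, Tuple


def random_text_model(prompt: str, rng: random.Random, length: int = 12) -> str:
    """Ignore the prompt and return `length` random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def exact_match_accuracy(model: Callable[[str], str], samples: List[Tuple[str, str]]) -> float:
    """Fraction of samples where the model output equals the reference (case-insensitive)."""
    hits = sum(
        1 for prompt, reference in samples
        if model(prompt).strip().lower() == reference.strip().lower()
    )
    return hits / len(samples)


# Hypothetical question/answer pairs, for illustration only.
samples = [("Capital of France?", "Paris"), ("Largest planet?", "Jupiter")]

rng = random.Random(0)  # seeded so the run is repeatable
score = exact_match_accuracy(lambda p: random_text_model(p, rng), samples)
```

Every sample is still processed end to end, which is what makes such a run useful as a smoke test, while the score itself carries no information about a real model.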

Customize the benchmark

To customize the simpleqa-verified benchmark from the previous step, ask your coding agent to run it against a hosted LLM provider of your choice and on a subset of the benchmark's samples, using the following prompt:

Configure the `simpleqa-verified` benchmark in a Quantiles config file to use 10 samples and the <your model here> model, then run the benchmark and summarize the results.

See CONFIG.md for more details.

Custom evaluations

For evaluations that use your own datasets, models, and measurement techniques, you can build custom evaluations with the Quantiles Python SDK while still benefiting from the resilience, efficiency, and observability features built into the Quantiles platform.

To have your coding agent build and run a custom evaluation, adapt the prompt template below to your needs:

Write a Quantiles custom code evaluation using the Python SDK that uses the <your dataset> dataset, runs samples through the <your model> model, and measures the output using the following metrics: <list your metrics here>. Call the evaluation <name>, and make sure to include it in the `quantiles.toml` config file. When you're done, run the new eval and summarize the results.

See the custom evaluations documentation for details on how to write custom evaluations with your own code, using the Quantiles SDKs and tooling.
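The three pieces named in the prompt template (a dataset, a model, and a set of metrics) can be pictured with the library-agnostic sketch below. It deliberately does not use the Quantiles Python SDK, whose actual API is described in the custom evaluations documentation; every name here (`Sample`, `run_eval`, `stub_model`, the two metrics) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    prompt: str
    reference: str


Model = Callable[[str], str]           # prompt -> model output
Metric = Callable[[str, str], float]   # (output, reference) -> score in [0, 1]


def exact_match(output: str, reference: str) -> float:
    """1.0 if output and reference are equal after trimming and lowercasing."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def contains_reference(output: str, reference: str) -> float:
    """1.0 if the reference appears anywhere in the output (case-insensitive)."""
    return 1.0 if reference.strip().lower() in output.lower() else 0.0


def run_eval(dataset: List[Sample], model: Model, metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Run every sample through the model and return the mean score per metric."""
    if not dataset:
        raise ValueError("dataset must contain at least one sample")
    totals = {name: 0.0 for name in metrics}
    for sample in dataset:
        output = model(sample.prompt)
        for name, metric in metrics.items():
            totals[name] += metric(output, sample.reference)
    return {name: total / len(dataset) for name, total in totals.items()}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned answers."""
    canned = {"Capital of France?": "Paris", "2 + 2 = ?": "The answer is 4"}
    return canned.get(prompt, "")
```

Calling `run_eval(dataset, stub_model, {"exact_match": exact_match, "contains": contains_reference})` returns one averaged score per metric. Keeping the model and metrics as plain callables means each can be swapped independently, which mirrors the placeholders in the prompt template.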

Agent documentation

The following resources provide more detailed guidance on running Quantiles evaluations using coding agents:
