CLI Reference

The qt CLI runs local evaluation workflows. It stores run metadata in the local workspace, starts a local HTTP server when needed, records workflow steps and metrics, and compares runs from the command line.

Commands

Command	Description	Example
`qt init`	Initialize a local Quantiles workspace.	`qt init`
`qt run <eval_name> [--json]`	Run a built-in or custom-code eval.	`qt run support-triage`
`qt run <eval_name> --input <json> [--json]`	Run an eval with structured JSON input.	`qt run support-triage --input '{"promptVersion":"A"}'`
`qt show <run_id> [--json]`	Show details of a given eval run.	`qt show 1 --json`
`qt resume <run_id> [--json]`	Resume a previously failed or incomplete run.	`qt resume 1`
`qt list [--json]`	List all eval runs.	`qt list --json`
`qt compare <run_a> <run_b> [--json]`	Compare two eval runs.	`qt compare 7 8 --json`
`qt serve [--addr <address>]`	Start the local HTTP server.	`qt serve --addr 127.0.0.1:9000`

`qt init`

Initialize a local Quantiles workspace explicitly:


qt init

This creates a .quantiles/ directory under the current project directory, including:

.quantiles/quantiles.sqlite for workflow metadata
.quantiles/metrics/ for metrics and benchmark analytics data

qt run creates the workspace automatically when needed, so you’ll rarely need to run qt init.

Example output:


Initialized Quantiles workspace at .quantiles/quantiles.sqlite

`qt run`

Run a built-in benchmark or a configured custom-code eval:


qt run "$EVAL_NAME"

qt run does the following:

Creates a workflow run
Starts the local Quantiles REST API server if one is not already reachable
Runs the configured command with Quantiles environment variables
Records step inputs, outputs, and metrics
Saves the workflow output
Records whether the workflow succeeded or failed
Stops the temporary server, if qt run started it

For custom-code evals, define the command in quantiles.toml or .quantiles.toml:


[benchmarks.eval-name]
type = "custom_code"
command = ["bun", "run", "eval-name.ts"]

Structured input

Pass JSON input with --input:


qt run eval-name --input '{"promptVersion":"A","model":"gpt-5-nano"}'

SDK workflows receive this JSON through the environment. In the current implementation, qt run injects:

Variable	Meaning
`QUANTILES_RUN_ID`	Numeric run ID for the subprocess
`QUANTILES_WORKFLOW_NAME`	Workflow name passed to `qt run`
`QUANTILES_BASE_URL`	Local server URL
`QUANTILES_INPUT`	JSON input string, or `{}` when no input is provided

The default local server URL is http://127.0.0.1:8765. You can override it by setting QUANTILES_BASE_URL.
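For a custom-code eval written in TypeScript, such as the eval-name.ts command configured above, the script can read these variables at startup. The following is a minimal sketch, not part of any Quantiles SDK: `QuantilesContext` and `readQuantilesContext` are hypothetical names, and the sketch assumes the input is a JSON object like the --input examples.

```typescript
// Sketch: read the variables that `qt run` sets for a custom-code eval.
// QuantilesContext and readQuantilesContext are hypothetical, not SDK APIs.
interface QuantilesContext {
  runId: number;
  workflowName: string;
  baseUrl: string;
  input: Record<string, unknown>;
}

// Documented default URL of the local server.
const DEFAULT_BASE_URL = "http://127.0.0.1:8765";

function readQuantilesContext(
  env: Record<string, string | undefined>,
): QuantilesContext {
  const rawId = env.QUANTILES_RUN_ID;
  const runId = rawId ? Number(rawId) : NaN;
  if (!Number.isInteger(runId)) {
    throw new Error("QUANTILES_RUN_ID is missing or not numeric; start this script with `qt run`");
  }

  // qt run passes "{}" when no --input is given; use the same default here.
  const parsed: unknown = JSON.parse(env.QUANTILES_INPUT ?? "{}");
  // Assumption: the eval expects a JSON object, as in the --input examples.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("QUANTILES_INPUT is not a JSON object");
  }

  return {
    runId,
    workflowName: env.QUANTILES_WORKFLOW_NAME ?? "",
    baseUrl: env.QUANTILES_BASE_URL ?? DEFAULT_BASE_URL,
    input: parsed as Record<string, unknown>,
  };
}
```

In the eval script you would call `readQuantilesContext(process.env)` once and pass `ctx.input` to your workflow. Taking the environment as a parameter keeps the function testable without running qt run.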

JSON output

Use --json when calling qt run from scripts:


qt run eval-name --json

In JSON mode, qt run prints machine-readable output.

For built-in evals, the output includes:

run_id
aggregate_metrics
warning, when applicable

For custom-code evals, qt run captures the child process's stdout and stderr and returns them inside a single machine-readable JSON object. The output includes:

Run metadata

run_id
workflow_name
input
command
base_url

Execution state

server_started_by_us
status
success
exit_code
duration_seconds

Process output

stdout
stderr
error

Quantiles marks a run as completed when the child process exits successfully and as failed when it does not. qt run records the child process result but intentionally does not propagate it as its own exit code.

For automation, inspect success, exit_code, or the saved run status instead of the qt run exit code.
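That check can be sketched as a small TypeScript function that parses the JSON printed by qt run --json for a custom-code eval. The `QtRunJson` type is an assumption inferred from the field names listed above (for example, that exit_code is a number), and `runSucceeded` is a hypothetical helper.

```typescript
// Sketch: decide whether a custom-code eval run succeeded, given the stdout of
// `qt run <eval_name> --json`. Field types are assumptions based on the field
// names listed above; check them against your own output.
interface QtRunJson {
  run_id?: number;
  status?: string;
  success?: boolean;
  exit_code?: number | null;
}

function runSucceeded(stdout: string): boolean {
  const result = JSON.parse(stdout) as QtRunJson;
  // Prefer the explicit flag, then the child exit code, then the saved status.
  if (typeof result.success === "boolean") return result.success;
  if (typeof result.exit_code === "number") return result.exit_code === 0;
  return result.status === "completed";
}
```

A CI script could pass the stdout of `qt run eval-name --json` to `runSucceeded` and set its own exit status from the result, since qt run does not do that itself.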

See Built-in Benchmarks for details on running built-in benchmarks and Custom Evaluations for guidance on building custom evaluations.

`qt resume`

Use qt resume to continue an interrupted workflow, restore completed steps from cache, and rerun only failed or incomplete steps.


qt resume "$RUN_ID"

You cannot resume a run whose status is completed in qt list or qt show output. To run the eval again, start a new run with qt run.

When resuming a custom-code eval, the CLI reuses the stored run input and re-reads the command from the current quantiles.toml or .quantiles.toml config file.

See Resume Runs for details on recovering interrupted or failed runs.

`qt list`

List workflow runs in reverse chronological order:


qt list

The output includes run ID, eval name, status, sample count, creation time, and duration.

Example output:


ID  EVAL            STATUS     SAMPLES    CREATED                      DURATION
2   support-triage  completed  1000       2026-06-19T18:30:00.000000Z  3.000s
1   support-triage  failed     642        2026-06-19T18:15:00.000000Z  1.000s

Use --json for machine-readable output:


qt list --json
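To find runs that qt resume can pick up, a script can filter the qt list --json output by status. This TypeScript sketch assumes the output is a JSON array whose entries have `run_id` and `status` fields, mirroring the ID and STATUS columns above; the actual field names may differ, so verify them first. `resumableRunIds` is a hypothetical helper.

```typescript
// Sketch: pick the runs that `qt resume` can continue, from `qt list --json`.
// Assumption: the output is a JSON array whose entries carry `run_id` and
// `status` fields (mirroring the ID and STATUS columns); verify the real shape.
interface RunListEntry {
  run_id: number;
  status: string;
}

function resumableRunIds(stdout: string): number[] {
  const runs = JSON.parse(stdout) as RunListEntry[];
  // Completed runs cannot be resumed; every other status is a candidate.
  return runs
    .filter((run) => run.status !== "completed")
    .map((run) => run.run_id);
}
```

A run that is still executing would also have a non-completed status, so make sure nothing is still running before resuming the returned IDs.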

`qt show`

Inspect one workflow run:


qt show "$RUN_ID"

By default, qt show prints run metadata and metrics:

Eval name
Status
Creation time and duration
Input
Output
Error
Metrics

Machine-readable output

Use --json to inspect structured run details from scripts or agents:


qt show "$RUN_ID" --json

JSON output includes the run metadata, input, output, metrics, and sample-level results in a machine-readable format.

See Evaluation Results for details on inspecting evaluation results.

`qt compare`

Compare two workflow runs with IDs $RUN_ID_A and $RUN_ID_B:


qt compare "$RUN_ID_A" "$RUN_ID_B"

qt compare checks:

Workflow input
Final workflow output
Step presence
Step input hash changes
Step status changes
Step output changes
Emitted metrics

If you try to compare a run against itself, qt compare prints an error and exits with code 1.

qt compare prints a warning when the two runs have different eval or benchmark names.

Exit codes

qt compare exits with:

0 when the compared runs are identical
1 when the compared runs differ

Because comparing a run against itself also exits with code 1, check the error output before treating exit code 1 as a difference.
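In a script, you can avoid that ambiguity by refusing to compare a run with itself before calling the CLI, then mapping the remaining exit codes. This is a minimal TypeScript sketch with hypothetical helper names; only the exit codes come from this page, and it assumes no other errors exit with code 1, which is not guaranteed.

```typescript
// Sketch: build `qt compare` arguments and interpret its exit code.
// qtCompareArgs and classifyCompareExit are hypothetical helpers.
function qtCompareArgs(runA: number, runB: number): string[] {
  // Comparing a run with itself is an error that also exits with 1,
  // so reject it before calling the CLI.
  if (runA === runB) {
    throw new Error(`cannot compare run ${runA} with itself`);
  }
  return ["compare", String(runA), String(runB), "--json"];
}

type CompareOutcome = "identical" | "different" | "unexpected";

function classifyCompareExit(code: number | null): CompareOutcome {
  if (code === 0) return "identical";
  // With self-comparison ruled out above, 1 is read as "runs differ".
  // Assumption: no other CLI error exits with 1; check stderr if in doubt.
  if (code === 1) return "different";
  // Any other code, or null (the process did not exit normally), is unexpected.
  return "unexpected";
}
```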

See Compare Evals for details on comparing evaluation runs.

`qt serve`

Start the local HTTP server manually. The server binds to 127.0.0.1:8765 by default:


qt serve

Most workflows do not need qt serve directly because qt run starts a temporary server when one is not already reachable. Use qt serve when you want a persistent local server for multiple shells, SDK experiments, or direct REST API calls.

Use another address

Use --addr to choose a different address:


qt serve --addr 127.0.0.1:9000

Common evaluation workflows

Run an evaluation and show the results


# Run the evaluation, show a summary of results, and get the associated
# `run_id`
qt run "$EVAL_NAME"
 
# Use the `run_id` from above to show the same summary of results
qt show "$RUN_ID"
 
# Show results summary and sample-level results, all in machine-readable JSON
qt show "$RUN_ID" --json

Compare two evaluations


# Run the first evaluation to create a new run record and `run_id`
qt run "$EVAL_NAME"
 
# Run the second evaluation to create a new run record and `run_id`
qt run "$EVAL_NAME"
 
# Use the corresponding `run_id`s to compare the evaluations
qt compare "$RUN_ID_A" "$RUN_ID_B" --json
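The same workflow can be scripted end to end by reading each run_id from the qt run --json output. This TypeScript sketch takes the process runner as a parameter so it can be tested without the CLI. It assumes qt run --json prints a single JSON object with a numeric run_id on stdout; `Exec`, `ExecResult`, `runIdFrom`, and `runTwiceAndCompare` are hypothetical names.

```typescript
// Sketch: run an eval twice with `qt run --json`, then compare the two runs.
// Exec, ExecResult, runIdFrom, and runTwiceAndCompare are hypothetical names.
interface ExecResult {
  status: number | null; // process exit code, or null if it did not exit normally
  stdout: string;
}

type Exec = (args: string[]) => ExecResult;

// Assumption: `qt run --json` prints one JSON object with a numeric run_id.
function runIdFrom(result: ExecResult): number {
  const parsed = JSON.parse(result.stdout) as { run_id?: unknown };
  if (typeof parsed.run_id !== "number") {
    throw new Error("qt run --json output has no numeric run_id");
  }
  return parsed.run_id;
}

// Returns true when `qt compare` exits with 0 (runs identical).
function runTwiceAndCompare(evalName: string, exec: Exec): boolean {
  const runA = runIdFrom(exec(["run", evalName, "--json"]));
  const runB = runIdFrom(exec(["run", evalName, "--json"]));
  const compare = exec(["compare", String(runA), String(runB), "--json"]);
  return compare.status === 0;
}
```

With Node.js or Bun, a real runner could be `(args) => spawnSync("qt", args, { encoding: "utf8" })` from `node:child_process`; passing an argument array instead of a shell string avoids quoting problems.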

Resume an interrupted or failed evaluation


qt resume "$RUN_ID"