CLI Reference
The qt CLI runs local evaluation workflows. It stores run metadata in the local workspace, starts a local HTTP server when needed, records workflow steps and metrics, and compares runs from the command line.
Commands
| Command | Description | Example |
|---|---|---|
| qt run <eval_name> [--json] | Run a built-in or custom code eval. | qt run support-triage |
| qt run <eval_name> --input <json> [--json] | Run an eval with structured JSON input. | qt run support-triage --input '{"promptVersion":"A"}' |
| qt show <run_id> [--json] | Show details of a given eval run. | qt show 1 --json |
| qt resume <run_id> [--json] | Resume a previously failed or incomplete run. | qt resume 1 |
| qt list [--json] | List all eval runs. | qt list --json |
| qt compare <run_a> <run_b> [--json] | Compare two eval runs. | qt compare 7 8 --json |
qt init
Initialize a local Quantiles workspace explicitly:
qt init
This creates a .quantiles/ directory under the current project directory, including:
- .quantiles/quantiles.sqlite for workflow metadata
- .quantiles/metrics/ for metrics and benchmark analytics data
qt run creates the workspace automatically when needed, so you’ll rarely need to run qt init.
Example output:
Initialized Quantiles workspace at .quantiles/quantiles.sqlite
qt run
Run a built-in benchmark or a configured custom-code eval:
qt run "$EVAL_NAME"
qt run does the following:
- Starts the local Quantiles REST API server if one is not already reachable
- Runs the configured command with Quantiles environment variables
- Records step inputs, outputs, and metrics
- Saves the workflow output
- Stops the temporary server
- Records whether the workflow succeeded or failed
The qt CLI creates a workflow run, starts the local server if one is not reachable, runs the subprocess, records the result, and stops the temporary server when the subprocess finishes.
For custom-code evals, define the command in quantiles.toml or .quantiles.toml:
[benchmarks.eval-name]
type = "custom_code"
command = ["bun", "run", "eval-name.ts"]
Structured input
Pass JSON input with --input:
qt run eval-name --input '{"promptVersion":"A","model":"gpt-5-nano"}'
SDK workflows receive this JSON through the environment. In the current implementation, qt run injects:
| Variable | Meaning |
|---|---|
| QUANTILES_RUN_ID | Numeric run ID for the subprocess |
| QUANTILES_WORKFLOW_NAME | Workflow name passed to qt run |
| QUANTILES_BASE_URL | Local server URL |
| QUANTILES_INPUT | JSON input string, or {} when no input is provided |
The default local server URL is http://127.0.0.1:8765. You can override it by setting QUANTILES_BASE_URL.
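As an illustration, a custom-code workflow could read the injected variables as in the sketch below. It assumes the configured command launches a Python script. The variable names and the default URL come from the table above, while `RunContext` and `load_run_context` are hypothetical names chosen for this example and are not part of any Quantiles SDK.

```python
import json
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Default local server URL, as documented above.
DEFAULT_BASE_URL = "http://127.0.0.1:8765"


@dataclass
class RunContext:  # hypothetical container, not a Quantiles SDK type
    run_id: Optional[int]
    workflow_name: Optional[str]
    base_url: str
    input: Any


def load_run_context(env: Mapping[str, str] = os.environ) -> RunContext:
    """Collect the variables that qt run injects into the subprocess."""
    raw_id = env.get("QUANTILES_RUN_ID")
    return RunContext(
        run_id=int(raw_id) if raw_id else None,
        workflow_name=env.get("QUANTILES_WORKFLOW_NAME"),
        base_url=env.get("QUANTILES_BASE_URL", DEFAULT_BASE_URL),
        # qt run passes {} when no --input is given; use the same fallback
        # when the script is started outside qt run.
        input=json.loads(env.get("QUANTILES_INPUT", "{}")),
    )
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary when the script is not running under qt run.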
JSON output
Use --json when calling qt run from scripts:
qt run eval-name --json
In JSON mode, qt run prints machine-readable output.
For built-in evals, the output includes:
- run_id
- aggregate_metrics
- warning, when applicable
For custom-code evals, child process stdout and stderr are captured and returned inside one machine-readable JSON object. The output includes:
Run metadata
- run_id
- workflow_name
- input
- command
- base_url
Execution state
- server_started_by_us
- status
- success
- exit_code
- duration_seconds
Process output
- stdout
- stderr
- error
Quantiles marks a run as completed when the child process exits successfully and failed when it does not. qt run records the child process result, but intentionally does not propagate it as the CLI exit code.
For automation, inspect success, exit_code, or the saved run status.
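Because the CLI exit code does not reflect the child process result, a wrapper script has to read the JSON instead. The sketch below shows one way to do that for custom-code eval output. It assumes `success` is a JSON boolean and `exit_code` is an integer or null. The field names are documented above, but their types are not, and `exit_code_from_run_output` is a hypothetical helper name.

```python
import json


def exit_code_from_run_output(json_text: str) -> int:
    """Derive a shell-style exit code from the JSON printed by qt run --json."""
    result = json.loads(json_text)
    # Assumption: success is a JSON boolean.
    if result.get("success") is True:
        return 0
    code = result.get("exit_code")
    # Assumption: exit_code is an integer, or null/absent if the child
    # process never produced one.
    if isinstance(code, int) and not isinstance(code, bool) and code != 0:
        return code
    # Fall back to a generic failure code.
    return 1
```

A CI step could capture the stdout of `qt run eval-name --json`, pass it to this function, and call `sys.exit` with the result so that the pipeline fails when the eval does.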
See Built-in Benchmarks for details on running built-in benchmarks and Custom Evaluations for guidance on building custom evaluations.
Resume an interrupted or failed run
Use qt resume to continue an interrupted workflow, restore completed steps from cache, and rerun only failed or incomplete steps.
qt resume "$RUN_ID"
You cannot resume a run that is marked as status: completed in qt list or qt show output. To re-run it, start a new run with qt run.
When resuming a custom-code eval, the CLI reuses the stored run input and re-reads the command from the current quantiles.toml or .quantiles.toml config file.
See Resume Runs for details on recovering interrupted or failed runs.
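The resume rule above can be captured in a small helper that picks the next command for a run. This is a minimal sketch: the statuses `completed` and `failed` appear in this reference, other status values may exist, and `next_command` is a hypothetical name.

```python
from typing import List


def next_command(run_id: int, status: str, eval_name: str) -> List[str]:
    """Pick the qt command that continues or repeats a run."""
    if status == "completed":
        # Completed runs cannot be resumed; start a fresh run instead.
        return ["qt", "run", eval_name]
    # Failed or incomplete runs are resumed by ID.
    return ["qt", "resume", str(run_id)]
```

The returned list can be passed directly to `subprocess.run`. Note that a fresh qt run receives `{}` as input unless `--input` is supplied again, whereas qt resume reuses the stored input.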
qt list
List workflow runs in reverse chronological order:
qt list
The output includes run ID, eval name, status, sample count, creation time, and duration.
Example output:
ID EVAL STATUS SAMPLES CREATED DURATION
2 support-triage completed 1000 2026-06-19T18:30:00.000000Z 3.000s
1 support-triage failed 642 2026-06-19T18:15:00.000000Z 1.000s
Use --json for machine-readable output:
qt list --json
qt show
Inspect one workflow run:
qt show "$RUN_ID"
By default, qt show prints run metadata and metrics:
- Eval name
- Status
- Creation time and duration
- Input
- Output
- Error
- Metrics
Machine-readable output
Use --json to inspect structured run details from scripts or agents:
qt show "$RUN_ID" --json
JSON output includes the run metadata, input, output, metrics, and sample-level results in a machine-readable format.
See Evaluation Results for details on inspecting evaluation results.
qt compare
Compare two workflow runs with IDs $RUN_ID_A and $RUN_ID_B:
qt compare "$RUN_ID_A" "$RUN_ID_B"
qt compare checks:
- Workflow input
- Final workflow output
- Step presence
- Step input hash changes
- Step status changes
- Step output changes
- Emitted metrics
If you try to compare a run against itself, the qt CLI prints an error and exits with code 1.
The qt CLI displays a warning when the compared runs have different evaluation or benchmark names.
Exit codes
qt compare exits with:
- 0 when the compared runs are identical
- 1 when the compared runs differ
See Compare Evals for details on comparing evaluation runs.
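A script that gates on qt compare can branch on these exit codes. The sketch below is one way to do it. `describe_compare_exit` and `compare_runs` are hypothetical helper names, and the labels they return are illustrative. Because exit code 1 is also used for the self-comparison error, the wrapper rejects identical IDs before invoking the CLI.

```python
import subprocess
from typing import Sequence


def describe_compare_exit(returncode: int) -> str:
    """Map a qt compare exit code to a short label."""
    if returncode == 0:
        return "identical"
    if returncode == 1:
        # Exit code 1 is documented for differing runs and for the
        # self-comparison error, so it cannot distinguish the two.
        return "different-or-error"
    return "unexpected"


def compare_runs(
    run_a: int,
    run_b: int,
    argv_prefix: Sequence[str] = ("qt", "compare"),
) -> str:
    """Run qt compare on two run IDs and classify the exit code."""
    if run_a == run_b:
        raise ValueError("qt compare rejects comparing a run against itself")
    completed = subprocess.run(
        [*argv_prefix, str(run_a), str(run_b)],
        capture_output=True,
        text=True,
    )
    return describe_compare_exit(completed.returncode)
```

For example, `compare_runs(7, 8)` would run `qt compare 7 8` and return one of the three labels.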
qt serve
Start the local HTTP server manually. The server binds to 127.0.0.1:8765 by default:
qt serve
Most workflows do not need qt serve directly because qt run starts a temporary server when one is not already reachable. Use qt serve when you want a persistent local server for multiple shells, SDK experiments, or direct REST API calls.
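To check from a script whether a persistent server is already listening, a TCP probe against the default address is usually enough. This is a minimal sketch under that assumption: `is_server_reachable` is a hypothetical helper, it only confirms that something accepts connections on the port, and it does not reproduce whatever reachability check qt run performs internally.

```python
import socket


def is_server_reachable(
    host: str = "127.0.0.1",
    port: int = 8765,
    timeout: float = 0.5,
) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the server was started with a different `--addr`, pass the matching host and port.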
Use another address
Use --addr to choose a different address:
qt serve --addr 127.0.0.1:9000
Common evaluation workflows
Run an evaluation and show the results
# Run the evaluation, show a summary of results, and get the associated
# `run_id`
qt run "$EVAL_NAME"
# Use the `run_id` from above to show the same summary of results
qt show "$RUN_ID"
# Show results summary and sample-level results, all in machine-readable JSON
qt show "$RUN_ID" --json
Compare two evaluations
# Run the first evaluation to create a new run record and `run_id`
qt run "$EVAL_NAME"
# Run the second evaluation to create a new run record and `run_id`
qt run "$EVAL_NAME"
# Use the corresponding `run_id`s to compare the evaluations
qt compare "$RUN_ID_A" "$RUN_ID_B" --json
Resume an interrupted or failed evaluation
qt resume "$RUN_ID"
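The "Compare two evaluations" workflow above leaves copying each `run_id` to the reader. The sketch below shows how a script might automate it: it runs the same eval twice with different `--input` payloads, reads `run_id` from each JSON result, and builds the matching qt compare command. It assumes that `qt run --json` prints a single JSON object on stdout. `run_qt` and `compare_command_for` are hypothetical names, and the runner is injectable so the logic can be tried without the CLI installed.

```python
import json
import subprocess
from typing import Callable, List, Sequence


def run_qt(argv: Sequence[str]) -> str:
    """Run a qt command and return its stdout."""
    return subprocess.run(list(argv), capture_output=True, text=True).stdout


def compare_command_for(
    eval_name: str,
    input_a: dict,
    input_b: dict,
    runner: Callable[[Sequence[str]], str] = run_qt,
) -> List[str]:
    """Run an eval twice and build the qt compare command for both runs."""
    run_ids = []
    for payload in (input_a, input_b):
        stdout = runner(
            ["qt", "run", eval_name, "--input", json.dumps(payload), "--json"]
        )
        # run_id is documented for both built-in and custom-code output.
        run_ids.append(json.loads(stdout)["run_id"])
    return ["qt", "compare", str(run_ids[0]), str(run_ids[1]), "--json"]
```

Calling `compare_command_for("support-triage", {"promptVersion": "A"}, {"promptVersion": "B"})` returns the argument list for the final comparison, which can then be passed to `subprocess.run`.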