CLI Reference

The qt CLI runs local evaluation workflows. It stores run metadata in the local workspace, starts a local HTTP server when needed, records workflow steps and metrics, and compares runs from the command line.

Commands

Command | Description | Example
qt run <eval_name> [--json] | Run a built-in or custom-code eval. | qt run support-triage
qt run <eval_name> --input <json> [--json] | Run an eval with structured JSON input. | qt run support-triage --input '{"promptVersion":"A"}'
qt show <run_id> [--json] | Show details of a given eval run. | qt show 1 --json
qt resume <run_id> [--json] | Resume a previously failed or incomplete run. | qt resume 1
qt list [--json] | List all eval runs. | qt list --json
qt compare <run_a> <run_b> [--json] | Compare two eval runs. | qt compare 7 8 --json

qt init

Initialize a local Quantiles workspace explicitly:

qt init

This creates a .quantiles/ directory under the current project directory, including:

  • .quantiles/quantiles.sqlite for workflow metadata
  • .quantiles/metrics/ for metrics and benchmark analytics data

qt run creates the workspace automatically when needed, so you’ll rarely need to run qt init.

Example output:

Initialized Quantiles workspace at .quantiles/quantiles.sqlite

qt run

Run a built-in benchmark or a configured custom-code eval:

qt run "$EVAL_NAME"

qt run does the following:

  1. Starts the local Quantiles REST API server if one is not already reachable
  2. Runs the configured command with Quantiles environment variables
  3. Records step inputs, outputs, and metrics
  4. Saves the workflow output
  5. Stops the temporary server
  6. Records whether the workflow succeeded or failed

The qt CLI creates a workflow run, starts the local server if one is not reachable, runs the subprocess, records the result, and stops the temporary server when the subprocess finishes.

For custom-code evals, define the command in quantiles.toml or .quantiles.toml:

[benchmarks.eval-name]
type = "custom_code"
command = ["bun", "run", "eval-name.ts"]

Structured input

Pass JSON input with --input:

qt run eval-name --input '{"promptVersion":"A","model":"gpt-5-nano"}'
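
When the input is built by a script rather than typed by hand, serializing it with a JSON library and passing the arguments as a list avoids shell-quoting mistakes. The following is a minimal sketch, assuming qt is on PATH; build_run_command and run_eval are hypothetical helper names, not part of Quantiles.

```python
import json
import subprocess


def build_run_command(eval_name, run_input=None, json_mode=False):
    """Build the argv list for `qt run` (hypothetical helper for illustration)."""
    argv = ["qt", "run", eval_name]
    if run_input is not None:
        # Compact separators give the same shape as the hand-written example.
        argv += ["--input", json.dumps(run_input, separators=(",", ":"))]
    if json_mode:
        argv.append("--json")
    return argv


def run_eval(eval_name, run_input=None, json_mode=False):
    """Invoke qt without a shell, so the JSON string needs no extra quoting."""
    return subprocess.run(
        build_run_command(eval_name, run_input, json_mode),
        capture_output=True,
        text=True,
    )
```

For example, build_run_command("eval-name", {"promptVersion": "A", "model": "gpt-5-nano"}) produces the same arguments as the command above.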

SDK workflows receive this JSON through the environment. In the current implementation, qt run injects:

Variable | Meaning
QUANTILES_RUN_ID | Numeric run ID for the subprocess
QUANTILES_WORKFLOW_NAME | Workflow name passed to qt run
QUANTILES_BASE_URL | Local server URL
QUANTILES_INPUT | JSON input string, or {} when no input is provided

The default local server URL is http://127.0.0.1:8765. You can override it by setting QUANTILES_BASE_URL.
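
The sketch below shows one way a custom-code eval script might read these variables. It is written in Python for illustration; the variables are ordinary environment variables, so a script in any language can read them the same way. The function name read_quantiles_env and the shape of the returned dictionary are assumptions made for this example, not part of the Quantiles SDK.

```python
import json
import os


def read_quantiles_env(environ=None):
    """Collect the variables that qt run injects into the subprocess."""
    env = os.environ if environ is None else environ
    return {
        # Documented as a numeric run ID, so it is parsed as an integer here.
        "run_id": int(env["QUANTILES_RUN_ID"]),
        "workflow_name": env["QUANTILES_WORKFLOW_NAME"],
        # Fall back to the documented default local server URL.
        "base_url": env.get("QUANTILES_BASE_URL", "http://127.0.0.1:8765"),
        # QUANTILES_INPUT is a JSON string; {} is used when no input was given.
        "input": json.loads(env.get("QUANTILES_INPUT", "{}")),
    }
```

With the input from the example above, read_quantiles_env()["input"].get("promptVersion") would return "A" inside the subprocess.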

JSON output

Use --json when calling qt run from scripts:

qt run eval-name --json

In JSON mode, qt run prints machine-readable output.

For built-in evals, the output includes:

  • run_id
  • aggregate_metrics
  • warning, when applicable

For custom-code evals, child process stdout and stderr are captured and returned inside one machine-readable JSON object. The output includes:

Run metadata

  • run_id
  • workflow_name
  • input
  • command
  • base_url

Execution state

  • server_started_by_us
  • status
  • success
  • exit_code
  • duration_seconds

Process output

  • stdout
  • stderr
  • error

Quantiles marks a run as completed when the child process exits successfully and failed when it does not. qt run records the child process result, but intentionally does not propagate it as the CLI exit code.

For automation, inspect success, exit_code, or the saved run status.
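
Because the CLI exit code does not reflect the child process result, a script has to read the JSON fields itself. The following sketch shows one way to do that for a custom-code eval. It assumes that success is a boolean, that exit_code is an integer, and that stdout contains only the JSON object; summarize_run is a hypothetical helper name.

```python
import json


def summarize_run(stdout_text):
    """Parse the JSON printed by `qt run <eval> --json` for a custom-code eval."""
    result = json.loads(stdout_text)
    # Assumption: success is a boolean and exit_code is an integer.
    ok = bool(result.get("success")) and result.get("exit_code") == 0
    return {
        "run_id": result.get("run_id"),
        "status": result.get("status"),
        "ok": ok,
        "stderr": result.get("stderr", ""),
    }
```

A CI wrapper could then call sys.exit(0 if summary["ok"] else 1) to turn the recorded result into its own exit code.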

See Built-in Benchmarks for details on running built-in benchmarks and Custom Evaluations for guidance on building custom evaluations.

Resume an interrupted or failed run

Use qt resume to continue an interrupted workflow, restore completed steps from cache, and rerun only failed or incomplete steps.

qt resume "$RUN_ID"

You cannot resume a run whose status is completed in qt list or qt show output. To rerun it, start a new run with qt run.

When resuming a custom-code eval, the CLI reuses the stored run input and re-reads the command from the current quantiles.toml or .quantiles.toml config file.

See Resume Runs for details on recovering interrupted or failed runs.

qt list

List workflow runs in reverse chronological order:

qt list

The output includes run ID, eval name, status, sample count, creation time, and duration.

Example output:

ID  EVAL            STATUS     SAMPLES  CREATED                      DURATION
2   support-triage  completed  1000     2026-06-19T18:30:00.000000Z  3.000s
1   support-triage  failed     642      2026-06-19T18:15:00.000000Z  1.000s

Use --json for machine-readable output:

qt list --json

qt show

Inspect one workflow run:

qt show "$RUN_ID"

By default, qt show prints run metadata and metrics:

  • Eval name
  • Status
  • Creation time and duration
  • Input
  • Output
  • Error
  • Metrics

Machine-readable output

Use --json to inspect structured run details from scripts or agents:

qt show "$RUN_ID" --json

JSON output includes the run metadata, input, output, metrics, and sample-level results in a machine-readable format.

See Evaluation Results for details on inspecting evaluation results.

qt compare

Compare two workflow runs with IDs $RUN_ID_A and $RUN_ID_B:

qt compare "$RUN_ID_A" "$RUN_ID_B"

qt compare checks:

  • Workflow input
  • Final workflow output
  • Step presence
  • Step input hash changes
  • Step status changes
  • Step output changes
  • Emitted metrics

If you try to compare a run against itself, the qt CLI prints an error and exits with code 1.

The qt CLI displays a warning when the two runs have different evaluation or benchmark names.

Exit codes

qt compare exits with:

  • 0 when the compared runs are identical
  • 1 when the compared runs differ
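
The exit codes above can be used in a script, for example as a regression check. The sketch below is one possible wrapper, assuming qt is on PATH; describe_compare_exit and compare_runs are hypothetical helper names. Because comparing a run against itself also exits with code 1, the exit code alone does not distinguish a difference from that error.

```python
import subprocess


def describe_compare_exit(returncode):
    """Map a qt compare exit code to a label (hypothetical helper)."""
    if returncode == 0:
        return "identical"
    if returncode == 1:
        # Code 1 covers differing runs and the self-comparison error.
        return "differ-or-error"
    return "unexpected"


def compare_runs(run_a, run_b):
    """Run qt compare and return the label together with the raw output."""
    proc = subprocess.run(
        ["qt", "compare", str(run_a), str(run_b), "--json"],
        capture_output=True,
        text=True,
    )
    return describe_compare_exit(proc.returncode), proc.stdout
```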

See Compare Evals for details on comparing evaluation runs.

qt serve

Start the local HTTP server manually. The server binds to 127.0.0.1:8765 by default:

qt serve

Most workflows do not need qt serve directly because qt run starts a temporary server when one is not already reachable. Use qt serve when you want a persistent local server for multiple shells, SDK experiments, or direct REST API calls.

Use another address

Use --addr to choose a different address:

qt serve --addr 127.0.0.1:9000
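
If the server runs on a non-default address, subprocesses need a matching QUANTILES_BASE_URL. The sketch below derives that URL from an --addr value. It assumes the server speaks plain HTTP, as the documented default URL does; base_url_for and env_for_server are hypothetical helper names.

```python
def base_url_for(addr):
    """Turn a host:port address into a base URL (assumes plain HTTP)."""
    host, _, port = addr.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {addr!r}")
    return f"http://{host}:{port}"


def env_for_server(addr, base_env):
    """Return a copy of base_env with QUANTILES_BASE_URL set for addr."""
    env = dict(base_env)
    env["QUANTILES_BASE_URL"] = base_url_for(addr)
    return env
```

For the address above, base_url_for("127.0.0.1:9000") returns "http://127.0.0.1:9000".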

Common evaluation workflows

Run an evaluation and show the results

# Run the evaluation, show a summary of results, and get the associated
# `run_id`
qt run "$EVAL_NAME"
# Use the `run_id` from above to show the same summary of results
qt show "$RUN_ID"
# Show results summary and sample-level results, all in machine-readable JSON
qt show "$RUN_ID" --json

Compare two evaluations

# Run the first evaluation to create a new run record and `run_id`
qt run "$EVAL_NAME"
# Run the second evaluation to create a new run record and `run_id`
qt run "$EVAL_NAME"
# Use the corresponding `run_id`s to compare the evaluations
qt compare "$RUN_ID_A" "$RUN_ID_B" --json
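
To avoid copying run IDs by hand, a script can read run_id from the JSON output of each run and pass both IDs to qt compare. The following is a sketch, assuming qt is on PATH and that qt run --json prints only the JSON object on stdout. The function names and the injectable runner parameter are hypothetical, added so the logic can be exercised without the CLI.

```python
import json
import subprocess


def run_and_get_id(eval_name, runner=subprocess.run):
    """Run an eval in JSON mode and return the run_id from its output."""
    proc = runner(["qt", "run", eval_name, "--json"], capture_output=True, text=True)
    return json.loads(proc.stdout)["run_id"]


def run_twice_and_compare(eval_name, runner=subprocess.run):
    """Create two runs of the same eval, then compare them by run ID."""
    run_a = run_and_get_id(eval_name, runner)
    run_b = run_and_get_id(eval_name, runner)
    cmp = runner(
        ["qt", "compare", str(run_a), str(run_b), "--json"],
        capture_output=True,
        text=True,
    )
    # The compare exit code is 0 for identical runs and 1 when they differ.
    return run_a, run_b, cmp.returncode
```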

Resume an interrupted or failed evaluation

qt resume "$RUN_ID"