CLI Reference¶
The metaharness CLI covers four workflows:
- scaffold a project
- run or probe a backend
- inspect and export results
- execute repeated experiment matrices
Show help:
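Assuming the conventional `--help` flag (an assumption, not confirmed by this reference):

```shell
# Print top-level help for the metaharness CLI
uv run metaharness --help
```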
Long commands below are wrapped with `\` so they stay readable and copy cleanly.
Many reporting commands support:
- plain text output by default
- `--json` for machine-readable output
- `--tsv` for spreadsheet-friendly export where supported
scaffold¶
Create a new coding-tool project:
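A minimal sketch, assuming `scaffold` takes a target directory as its positional argument (the directory name is illustrative):

```shell
# Scaffold a new coding-tool optimization project
uv run metaharness scaffold ./my-coding-tool-optimizer
```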
Profiles:
- `standard`
- `local-oss-smoke`
- `local-oss-medium`
Fast Local Smoke¶
Smaller harness aimed at local OSS smoke runs.
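A sketch of selecting this profile; the `--profile` flag name is an assumption (the reference lists profile names but not the selection flag):

```shell
# Scaffold with the smaller local-OSS smoke profile
uv run metaharness scaffold ./my-coding-tool-optimizer \
  --profile local-oss-smoke
```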
Medium Local OSS¶
Restores bootstrap and test scripts while staying lighter than the full scaffold.
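A sketch of selecting this profile; as above, the `--profile` flag name is an assumption:

```shell
# Scaffold with the medium local-OSS profile
uv run metaharness scaffold ./my-coding-tool-optimizer \
  --profile local-oss-medium
```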
run¶
Run one optimization project:
Use this when you want a single benchmark or project run and care about the winning candidate, not aggregate trial statistics.
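A sketch using flags from the option list in this section; the budget value and run name are illustrative:

```shell
# Run a single optimization project with a named run
uv run metaharness run ./my-coding-tool-optimizer \
  --budget 8 \
  --run-name first-run
```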
Important options:
- `--backend`
- `--budget`
- `--run-name`
- `--hosted`
- `--oss`
- `--local-provider`
- `--model`
- `--proposal-timeout`
Fake Backend¶
Best for smoke checks and development.
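Assuming the backend is selected with the `--backend` flag listed above, and that the backend identifier is `fake` (an assumption taken from this heading):

```shell
# Smoke-check the pipeline without real model calls
uv run metaharness run ./my-coding-tool-optimizer \
  --backend fake \
  --budget 2
```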
Hosted Codex¶
Best current path for real benchmark quality.
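A sketch using the documented `--hosted` flag; the budget value is illustrative:

```shell
# Run against hosted Codex
uv run metaharness run ./my-coding-tool-optimizer \
  --hosted \
  --budget 8
```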
Local Codex Over Ollama¶
Local-only path for OSS model runs.
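A sketch mirroring the flags shown in the `smoke codex` Ollama probe later in this reference; the model name is illustrative:

```shell
# Run locally via Ollama with an OSS model
uv run metaharness run ./my-coding-tool-optimizer \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b
```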
Gemini CLI¶
Use Gemini as an experimental proposer backend.
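A sketch assuming the backend identifier is `gemini` and is passed via the documented `--backend` flag (both the identifier and its spelling are assumptions):

```shell
# Run with the experimental Gemini CLI proposer backend
uv run metaharness run ./my-coding-tool-optimizer \
  --backend gemini \
  --budget 4
```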
Pi¶
Use Pi in JSON print mode as an experimental proposer backend.
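A sketch assuming the backend identifier is `pi` (an assumption); the model name matches the `smoke pi` example later in this reference:

```shell
# Run with the experimental Pi proposer backend
uv run metaharness run ./my-coding-tool-optimizer \
  --backend pi \
  --model anthropic/claude-sonnet-4-5
```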
experiment¶
Run a benchmark × backend × budget × trial matrix:
Use this when you want repeatable benchmark results instead of one-off runs.
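A sketch of a small matrix, assuming the flag names mirror the config keys documented below (`backends`, `budgets`, `trial_count`); all three flag spellings are assumptions:

```shell
# Run a small experiment matrix: one backend, one budget, three trials
uv run metaharness experiment ./my-coding-tool-optimizer \
  --backends fake \
  --budgets 2 \
  --trial-count 3
```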
Saved Config¶
The most reusable path for teams.
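A sketch assuming a `--config` flag for pointing at a saved config file (the flag name and file path are assumptions):

```shell
# Run an experiment matrix from a shared, versioned config
uv run metaharness experiment --config ./experiments/nightly.yaml
```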
Multiple Budgets¶
Compare how much improvement you get from a larger search budget.
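A sketch assuming a `--budgets` flag accepting multiple values (both the flag name and its list syntax are assumptions):

```shell
# Compare a small search budget against a larger one
uv run metaharness experiment ./my-coding-tool-optimizer \
  --budgets 2 8 \
  --trial-count 3
```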
TSV Export¶
Send aggregate results straight to a spreadsheet or notebook.
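A sketch using the documented `--tsv` flag:

```shell
# Emit spreadsheet-friendly TSV output for the experiment matrix
uv run metaharness experiment ./my-coding-tool-optimizer \
  --tsv
```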
This command writes:
- `experiment.json`
- `trials.json`
- `aggregates.json`
- `trials.tsv`
- `aggregates.tsv`
Config files can contain:
- `project_dirs`
- `backends`
- `budgets`
- `trial_count`
- `models`
- `results_dir`
- `backend_overrides`
If a config file is provided, relative paths are resolved from the config file location. CLI flags override the corresponding config values.
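A sketch of a config file using the keys listed above; the YAML format and all values are assumptions for illustration:

```yaml
# Experiment matrix config (illustrative values)
project_dirs:
  - ./my-coding-tool-optimizer
backends:
  - fake
budgets:
  - 2
  - 8
trial_count: 3
models:
  - gpt-oss:20b
results_dir: ./results
backend_overrides: {}
```

Relative paths here resolve from the config file's own location, and CLI flags take precedence over these values.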
smoke codex¶
Probe the Codex path before spending model calls:
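A sketch mirroring the Ollama probe below, without the local-provider flags:

```shell
# Probe the hosted Codex path without spending model calls
uv run metaharness smoke codex \
  ./my-coding-tool-optimizer \
  --probe-only
```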
Probe the local Ollama path:
```shell
uv run metaharness smoke codex \
  ./my-coding-tool-optimizer \
  --probe-only \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b
```
Use this when you want to verify the environment, provider, and model path before running a benchmark.
smoke gemini¶
Probe the Gemini CLI path before spending model calls:
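A sketch mirroring the `smoke codex` probe, assuming `smoke gemini` accepts the same project-directory argument and `--probe-only` flag:

```shell
# Probe the Gemini CLI path without spending model calls
uv run metaharness smoke gemini \
  ./my-coding-tool-optimizer \
  --probe-only
```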
Run one Gemini-backed smoke iteration:
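A sketch mirroring the `smoke pi` example below, assuming `smoke gemini` accepts the same `--budget` flag:

```shell
# Run a single Gemini-backed smoke iteration
uv run metaharness smoke gemini \
  ./my-coding-tool-optimizer \
  --budget 1
```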
smoke pi¶
Probe the Pi path before spending model calls:
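A sketch mirroring the `smoke codex` probe, assuming `smoke pi` accepts the same `--probe-only` flag:

```shell
# Probe the Pi path without spending model calls
uv run metaharness smoke pi \
  ./my-coding-tool-optimizer \
  --probe-only
```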
Run one Pi-backed smoke iteration:
```shell
uv run metaharness smoke pi \
  ./my-coding-tool-optimizer \
  --budget 1 \
  --model anthropic/claude-sonnet-4-5
```
inspect¶
Inspect one completed run:
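A sketch assuming `inspect` takes a run directory, using a run path from the `compare` examples later in this reference:

```shell
# Human-readable view of one completed run
uv run metaharness inspect \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401
```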
This is the quickest human-readable view of:
- candidate outcomes
- validity
- proposal application
- scope violations
- objective scores
ledger¶
Export the per-candidate ledger for one run:
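A sketch assuming `ledger` takes a run directory, using a run path from the `compare` examples below:

```shell
# Export the per-candidate ledger for one run
uv run metaharness ledger \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401
```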
TSV export:
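The same sketch with the documented `--tsv` flag:

```shell
# Export the per-candidate ledger as TSV
uv run metaharness ledger \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401 \
  --tsv
```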
Use this when you want one row per candidate with outcomes, changed-file counts, summaries, and scope violations.
summarize¶
Summarize all runs in a project:
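A sketch assuming `summarize` takes a project directory, using the example project from the `compare` section:

```shell
# Project-wide summary of all runs
uv run metaharness summarize ./examples/python_fixture_benchmark
```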
TSV export:
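The same sketch with the documented `--tsv` flag:

```shell
# Project-wide summary as TSV
uv run metaharness summarize ./examples/python_fixture_benchmark --tsv
```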
Use this when you want a project-wide view of scores, durations, and outcome counts.
compare¶
Compare specific run directories:
```shell
uv run metaharness compare \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401 \
  ./examples/python_fixture_benchmark/runs/ollama-20b-20260401 \
  ./examples/python_fixture_benchmark/runs/ollama-120b-20260401
```
TSV export:
```shell
uv run metaharness compare \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401 \
  ./examples/python_fixture_benchmark/runs/ollama-120b-20260401 \
  --tsv
```
Use this when you want an explicit side-by-side comparison between selected runs rather than every run in a project.
Output Files To Know¶
The most useful stored artifacts are usually:
- `run_config.json`
- `indexes/leaderboard.json`
- `manifest.json`
- `proposal/result.json`
- `proposal/workspace.diff`
- `validation/result.json`
- `evaluation/result.json`