CLI Reference

The metaharness CLI covers four workflows:

  1. scaffold a project
  2. run or probe a backend
  3. inspect and export results
  4. execute repeated experiment matrices

Show help:

uv run metaharness --help

Long commands below are wrapped with \ so they stay readable and copy cleanly.

Many reporting commands support:

  • plain text output by default
  • --json for machine-readable output
  • --tsv for spreadsheet-friendly export where supported

scaffold

Create a new coding-tool project:

uv run metaharness scaffold \
  coding-tool \
  ./my-coding-tool-optimizer

Profiles:

  • standard
  • local-oss-smoke
  • local-oss-medium

Standard

Best default for a new project.

uv run metaharness scaffold coding-tool ./my-project

Fast Local Smoke

Smaller harness aimed at local OSS smoke runs.

uv run metaharness scaffold \
  coding-tool \
  ./my-local-oss-smoke \
  --profile local-oss-smoke

Medium Local OSS

Restores bootstrap and test scripts while staying lighter than the full scaffold.

uv run metaharness scaffold \
  coding-tool \
  ./my-local-oss-medium \
  --profile local-oss-medium

run

Run one optimization project:

uv run metaharness run \
  ./my-coding-tool-optimizer \
  --backend fake \
  --budget 1

Use this when you want a single benchmark or project run and care about the winning candidate, not aggregate trial statistics.

Important options:

  • --backend
  • --budget
  • --run-name
  • --hosted
  • --oss
  • --local-provider
  • --model
  • --proposal-timeout

Fake Backend

Best for smoke checks and development.

uv run metaharness run \
  ./my-coding-tool-optimizer \
  --backend fake \
  --budget 1

Hosted Codex

Best current path for real benchmark quality.

uv run metaharness run \
  ./my-coding-tool-optimizer \
  --backend codex \
  --hosted \
  --budget 1

Local Codex Over Ollama

Local-only path for OSS model runs.

uv run metaharness run \
  ./my-coding-tool-optimizer \
  --backend codex \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b \
  --proposal-timeout 240 \
  --budget 1

Gemini CLI

Use Gemini as an experimental proposer backend.

uv run metaharness run \
  ./my-coding-tool-optimizer \
  --backend gemini \
  --model gemini-2.5-pro \
  --proposal-timeout 180 \
  --budget 1

Pi

Use Pi in JSON print mode as an experimental proposer backend.

uv run metaharness run \
  ./my-coding-tool-optimizer \
  --backend pi \
  --model anthropic/claude-sonnet-4-5 \
  --proposal-timeout 180 \
  --budget 1

experiment

Run a benchmark × backend × budget × trial matrix:

uv run metaharness experiment \
  ./examples/python_fixture_benchmark \
  --backend fake \
  --trials 3

Use this when you want repeatable benchmark results instead of one-off runs.

Saved Config

The most reusable path for teams.

uv run metaharness experiment \
  --config ./examples/experiment_configs/fake-benchmarks.json

Multiple Budgets

Compare how much improvement you get from a larger search budget.

uv run metaharness experiment \
  ./examples/python_fixture_benchmark \
  --backend fake \
  --budget 1 \
  --budget 2 \
  --trials 2

TSV Export

Send aggregate results straight to a spreadsheet or notebook.

uv run metaharness experiment \
  ./examples/python_fixture_benchmark \
  --backend fake \
  --trials 3 \
  --tsv

This command writes:

  • experiment.json
  • trials.json
  • aggregates.json
  • trials.tsv
  • aggregates.tsv
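The TSV exports are plain tab-separated text, so any TSV reader can consume them. A minimal Python sketch for trials.tsv (the column names below are hypothetical placeholders, not the harness's actual schema — check the header row of your own export):

```python
import csv
import io

# Hypothetical TSV content; real column names come from the trials.tsv header.
sample = "trial\tbackend\tbudget\tscore\n1\tfake\t1\t0.75\n2\tfake\t1\t0.80\n"

def mean_score(tsv_text: str) -> float:
    """Average a 'score' column across the rows of a TSV export."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return sum(float(r["score"]) for r in rows) / len(rows)

print(mean_score(sample))  # mean of the two sample scores
```

The same pattern works for aggregates.tsv; only the column names differ.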

Config files can contain:

  • project_dirs
  • backends
  • budgets
  • trial_count
  • models
  • results_dir
  • backend_overrides

If a config file is provided, relative paths are resolved from the config file location. CLI flags override the corresponding config values.
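Putting those keys together, a config file might look like the following. The key names come from the list above; the value shapes (lists vs. scalars) and all values are illustrative assumptions, not a verified schema:

```json
{
  "project_dirs": ["./examples/python_fixture_benchmark"],
  "backends": ["fake"],
  "budgets": [1, 2],
  "trial_count": 3,
  "models": [],
  "results_dir": "./results",
  "backend_overrides": {}
}
```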

smoke codex

Probe the Codex path before spending model calls:

uv run metaharness smoke codex ./my-coding-tool-optimizer --probe-only

Probe the local Ollama path:

uv run metaharness smoke codex \
  ./my-coding-tool-optimizer \
  --probe-only \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b

Use this when you want to verify the environment, provider, and model path before running a benchmark.

smoke gemini

Probe the Gemini CLI path before spending model calls:

uv run metaharness smoke gemini ./my-coding-tool-optimizer --probe-only

Run one Gemini-backed smoke iteration:

uv run metaharness smoke gemini \
  ./my-coding-tool-optimizer \
  --budget 1 \
  --model gemini-2.5-pro

smoke pi

Probe the Pi path before spending model calls:

uv run metaharness smoke pi ./my-coding-tool-optimizer --probe-only

Run one Pi-backed smoke iteration:

uv run metaharness smoke pi \
  ./my-coding-tool-optimizer \
  --budget 1 \
  --model anthropic/claude-sonnet-4-5

inspect

Inspect one completed run:

uv run metaharness inspect \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401

This is the quickest human-readable view of:

  • candidate outcomes
  • validity
  • proposal application
  • scope violations
  • objective scores

ledger

Export the per-candidate ledger for one run:

uv run metaharness ledger \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401

TSV export:

uv run metaharness ledger \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401 \
  --tsv

Use this when you want one row per candidate with outcomes, changed-file counts, summaries, and scope violations.

summarize

Summarize all runs in a project:

uv run metaharness summarize \
  ./examples/python_fixture_benchmark

TSV export:

uv run metaharness summarize \
  ./examples/python_fixture_benchmark \
  --tsv

Use this when you want a project-wide view of scores, durations, and outcome counts.

compare

Compare specific run directories:

uv run metaharness compare \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401 \
  ./examples/python_fixture_benchmark/runs/ollama-20b-20260401 \
  ./examples/python_fixture_benchmark/runs/ollama-120b-20260401

TSV export:

uv run metaharness compare \
  ./examples/python_fixture_benchmark/runs/hosted-codex-20260401 \
  ./examples/python_fixture_benchmark/runs/ollama-120b-20260401 \
  --tsv

Use this when you want an explicit side-by-side comparison between selected runs rather than every run in a project.

Output Files To Know

The most useful stored artifacts are usually:

  • run_config.json
  • indexes/leaderboard.json
  • manifest.json
  • proposal/result.json
  • proposal/workspace.diff
  • validation/result.json
  • evaluation/result.json
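To see at a glance which of these artifacts a given run produced, a small sketch (the relative paths are taken from the list above; the function name is ours, not part of the CLI):

```python
from pathlib import Path

# Relative artifact paths from the list above.
KNOWN_ARTIFACTS = [
    "run_config.json",
    "indexes/leaderboard.json",
    "manifest.json",
    "proposal/result.json",
    "proposal/workspace.diff",
    "validation/result.json",
    "evaluation/result.json",
]

def list_run_artifacts(run_dir: str) -> dict[str, bool]:
    """Map each known artifact path to whether it exists under run_dir."""
    root = Path(run_dir)
    return {rel: (root / rel).is_file() for rel in KNOWN_ARTIFACTS}

# Example usage against a run directory:
# for rel, present in list_run_artifacts("./runs/hosted-codex-20260401").items():
#     print(("found   " if present else "missing ") + rel)
```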