Benchmarks

Overview

metaharness currently includes three example targets.

Two are real coding-tool benchmarks:

`python_fixture_benchmark`
`python_cli_benchmark`

One is a smaller deterministic example:

`ticket_router`

`python_fixture_benchmark`

Path:

examples/python_fixture_benchmark

What it exercises:

a real `python -m venv` bootstrap flow
a real `unittest` suite over a fixture package
deterministic instruction-file checks
helper script correctness
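
The bootstrap and test steps listed above can be pictured with a short shell sketch. It is not the content of this benchmark's `scripts/bootstrap.sh` or `scripts/test.sh`. The function names, the `.venv` and `tests` defaults, and the `--without-pip` flag (which assumes the suite needs only the standard library) are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a venv bootstrap plus unittest run. The real
# scripts/bootstrap.sh and scripts/test.sh in this benchmark may differ.
set -euo pipefail

# bootstrap: create an isolated virtual environment if it does not exist yet.
# --without-pip assumes the suite only needs the standard library; drop it if
# the project installs third-party dependencies.
bootstrap() {
  local venv_dir="${1:-.venv}"
  if [ ! -x "$venv_dir/bin/python" ]; then
    python3 -m venv --without-pip "$venv_dir"
  fi
}

# run_tests: discover and run unittest cases with the venv's interpreter.
run_tests() {
  local venv_dir="${1:-.venv}"
  local test_dir="${2:-tests}"
  "$venv_dir/bin/python" -m unittest discover -s "$test_dir"
}
```

Calling `bootstrap` before `run_tests` from the target root exercises both steps; the existence check keeps repeated bootstraps cheap and idempotent.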

What can change:

AGENTS.md
GEMINI.md
scripts/bootstrap.sh
scripts/validate.sh
scripts/test.sh
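
This page does not spell out what the deterministic instruction-file checks verify. As a purely hypothetical illustration, a check of this kind might require each editable instruction file to exist and to mention every helper script, so an agent reading `AGENTS.md` or `GEMINI.md` can find the bootstrap, validate, and test entry points. The function name and the required references are assumptions.

```bash
# check_instruction_file FILE: fail if FILE is missing or does not mention each
# helper script. Hypothetical check; the required references are illustrative.
check_instruction_file() {
  local file="$1" script missing=0
  if [ ! -f "$file" ]; then
    echo "missing instruction file: $file" >&2
    return 1
  fi
  for script in scripts/bootstrap.sh scripts/validate.sh scripts/test.sh; do
    # -F treats the script path as a fixed string, -q suppresses output.
    if ! grep -qF "$script" "$file"; then
      echo "$file does not mention $script" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Because a check like this is a plain text search, it returns the same verdict on every run for the same file contents.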

Configured write scope:

AGENTS.md
GEMINI.md
scripts/
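
This write scope means a proposal may touch `AGENTS.md`, `GEMINI.md`, and anything under `scripts/`. A hypothetical shell helper for screening a changed path against that scope could look like the following; it illustrates the rule and is not how metaharness enforces it.

```bash
# in_write_scope PATH: succeed if PATH (relative to the target root) falls
# inside the write scope listed above. Hypothetical helper for illustration.
in_write_scope() {
  local path="$1"
  # Reject absolute paths and parent-directory escapes such as scripts/../src.
  case "$path" in
    /* | ../* | */../* | */.. | ..) return 1 ;;
  esac
  case "$path" in
    AGENTS.md | GEMINI.md) return 0 ;;  # top-level instruction files
    scripts/?*) return 0 ;;             # anything under scripts/
    *) return 1 ;;
  esac
}
```

For example, `git diff --name-only | while read -r f; do in_write_scope "$f" || echo "out of scope: $f"; done` lists any changed file outside the scope.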

Typical runs:

```bash
uv run metaharness run examples/python_fixture_benchmark --backend fake --budget 1

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --hosted \
  --budget 1

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --oss \
  --local-provider ollama \
  --model gpt-oss:120b \
  --proposal-timeout 420 \
  --budget 1
```

`python_cli_benchmark`

Path:

examples/python_cli_benchmark

What it exercises:

a real `python -m venv` bootstrap flow
a real `unittest` suite
a real CLI smoke command against fixture data
deterministic instruction-file checks
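
A CLI smoke command is a quick end-to-end check: run the tool once against fixture data and fail if it errors or prints something unexpected. The helper below is a hypothetical sketch of that pattern; the benchmark's actual smoke command, CLI name, and fixture files are not shown on this page.

```bash
# smoke_check EXPECTED CMD [ARGS...]
# Hypothetical helper: run CMD once and fail unless it exits 0 and its
# combined stdout/stderr contains EXPECTED.
smoke_check() {
  local expected="$1"
  shift
  local out
  # Declaring out separately keeps the command's exit status visible here.
  if ! out="$("$@" 2>&1)"; then
    echo "smoke: command failed: $*" >&2
    return 1
  fi
  if [[ "$out" != *"$expected"* ]]; then
    echo "smoke: expected '$expected' in output" >&2
    return 1
  fi
  return 0
}
```

A call would take the form `smoke_check "<expected text>" <cli command> <fixture file>`, with the placeholders filled in from the target's own CLI and fixtures.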

What can change:

AGENTS.md
GEMINI.md
scripts/bootstrap.sh
scripts/validate.sh
scripts/test.sh

Configured write scope:

AGENTS.md
GEMINI.md
scripts/

Typical runs:

```bash
uv run metaharness run examples/python_cli_benchmark --backend fake --budget 1

uv run metaharness run \
  examples/python_cli_benchmark \
  --backend codex \
  --hosted \
  --budget 1

uv run metaharness run \
  examples/python_cli_benchmark \
  --backend codex \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b \
  --proposal-timeout 240 \
  --budget 1
```

`ticket_router`

Path:

examples/ticket_router

This is a smaller deterministic example that optimizes a Python router against a fixed dataset. It is useful for fast development checks and as a basic API example.
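
The real `ticket_router` example works in Python; the shell sketch below only illustrates the general idea of scoring a router against a fixed dataset, which is what makes such an example deterministic. The tab-separated dataset format, `score_router`, and the keyword-based `toy_router` are all hypothetical.

```bash
# score_router DATASET ROUTER: print "correct/total" for ROUTER on DATASET.
# DATASET: hypothetical tab-separated lines "ticket text<TAB>expected queue".
# ROUTER: a command or function that prints a queue name for the text in $1.
score_router() {
  local dataset="$1" router="$2"
  local total=0 correct=0 text expected got
  while IFS=$'\t' read -r text expected; do
    if [ -z "$text" ]; then
      continue  # skip blank lines
    fi
    # </dev/null stops the router from consuming the dataset on stdin.
    got="$("$router" "$text" </dev/null)"
    total=$((total + 1))
    if [ "$got" = "$expected" ]; then
      correct=$((correct + 1))
    fi
  done <"$dataset"
  echo "$correct/$total"
}

# toy_router: hypothetical, case-sensitive keyword router for illustration.
toy_router() {
  case "$1" in
    *refund* | *invoice*) echo billing ;;
    *password* | *login*) echo account ;;
    *) echo general ;;
  esac
}
```

With a fixed dataset and a deterministic router, repeated runs print the same `correct/total` score, so a change in the score can be attributed to a change in the router.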

Run it:

```bash
uv run python examples/ticket_router/run.py --backend fake --budget 1
```

Scaffold Profiles

The CLI scaffold also includes profiles for users who want to bring their own coding-tool project:

standard
local-oss-smoke
local-oss-medium

These are useful for starting a real project, but the benchmark directories are the clearest examples of how to structure a reusable target.
