Getting Started

This page walks through the fastest path from a clean checkout to a real metaharness run that you can inspect. It is written with newcomers in mind.

Prerequisites

  • Python 3.11 or newer
  • uv
  • optional: codex, gemini, pi, or opencode CLI for live provider runs
  • optional: Ollama with gpt-oss:20b or gpt-oss:120b for local runs

Install

Recommended newcomer path

If you only want to use the released CLI, install from PyPI with uv tool install. If you want to run the built-in examples from this repository, use a source checkout with uv sync.

Published package:

  • PyPI distribution: superagentic-metaharness
  • CLI command: metaharness
  • import package: metaharness

Install the CLI from PyPI:

uv tool install superagentic-metaharness

Check the installed command:

metaharness --help

If you want to add the library to another Python project:

uv add superagentic-metaharness

Command formatting note

Long commands on this page are wrapped with \ so they stay readable on narrower screens. You can copy them exactly as written.

If you are working from a source checkout of this repository, create the project environment with:

uv sync

If you want the docs toolchain too:

uv sync --group dev

Check the CLI:

uv run metaharness --help

The Fastest First Run

Recommended first run

Use the fake backend on a real benchmark. This exercises the full loop without needing provider auth, network access, or a local model server.

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend fake \
  --budget 1 \
  --run-name first-run

Expected result:

  • a run directory under examples/python_fixture_benchmark/runs/first-run
  • best_candidate_id=c0001
  • best_objective=1.000
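A run directory is plain files on disk, so you can browse it with any tool you like. As a minimal sketch (generic Python, not part of the metaharness API), this helper lists whatever artifacts a run produced:

```python
from pathlib import Path

def list_run_artifacts(run_dir: str) -> list[str]:
    """Return the relative paths of all files under a run directory, sorted.

    Yields nothing (rather than failing) if the directory does not exist yet.
    """
    root = Path(run_dir)
    return sorted(
        str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()
    )

# Point this at your own run directory after a first run:
for rel in list_run_artifacts("examples/python_fixture_benchmark/runs/first-run"):
    print(rel)
```

This is just a directory walk; the actual artifact names and layout are whatever metaharness wrote for your run.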

What To Inspect Next

Inspect A Single Run

Use this when you want a quick human-readable summary of the candidates and outcomes.

uv run metaharness inspect \
  examples/python_fixture_benchmark/runs/first-run

Export The Candidate Ledger

Use this when you want one row per candidate with outcomes, changed-file counts, and validation or evaluation summaries.

uv run metaharness ledger \
  examples/python_fixture_benchmark/runs/first-run \
  --tsv
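Because the ledger is tab-separated, it drops straight into spreadsheet tools or a few lines of standard-library Python. Here is a sketch using the csv module on an inline fixture; the column names (candidate_id, objective) are illustrative, not metaharness's actual schema:

```python
import csv
import io

# Illustrative TSV with made-up columns; a real ledger's header will differ.
LEDGER_TSV = "candidate_id\tobjective\nc0001\t1.000\nc0002\t0.250\n"

def best_row(tsv_text: str, score_column: str) -> dict:
    """Return the row with the highest value in score_column."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return max(rows, key=lambda r: float(r[score_column]))

print(best_row(LEDGER_TSV, "objective"))
```

Swap the fixture for `open(".../ledger.tsv").read()` style input against whatever file path your export produced.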

Summarize A Whole Benchmark

Use this when you want one row per run and a compact view of score, duration, and failure patterns.

uv run metaharness summarize examples/python_fixture_benchmark

Run A Saved Experiment Matrix

Once the single-run flow makes sense, move to repeated trials:

uv run metaharness experiment \
  --config examples/experiment_configs/fake-benchmarks.json

This writes:

  • experiment.json
  • trials.json
  • aggregates.json
  • trials.tsv
  • aggregates.tsv

Use this path when you want reproducible benchmarking rather than ad hoc manual runs.

Use Hosted Codex

Requirements:

  • codex CLI installed
  • authenticated Codex session or API key setup
  • outbound network access

Probe The CLI

uv run metaharness smoke codex examples/python_fixture_benchmark --probe-only

Run Hosted Codex

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --hosted \
  --budget 1 \
  --run-name hosted-codex

Use --hosted if a project config defaults to local Ollama. Hosted Codex is the strongest current path for real benchmark runs in this repository.

Use Gemini CLI

Gemini is an experimental backend in the current release. Use it if Gemini CLI is already part of your local workflow and you are comfortable with a try-it-yourself path.

Requirements:

  • gemini CLI installed
  • Gemini authentication configured in your local environment

Probe The CLI

uv run metaharness smoke gemini examples/python_fixture_benchmark --probe-only

Run Gemini

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend gemini \
  --model gemini-2.5-pro \
  --proposal-timeout 180 \
  --budget 1 \
  --run-name gemini-run

The integration is real, but it is not part of the main validated Codex-first release path.

Use Pi

Pi is an experimental backend in the current release. Use it if Pi is already part of your local workflow and you are comfortable with a try-it-yourself path.

Requirements:

  • pi CLI installed
  • Pi authentication configured for the model you want to use

Probe The CLI

uv run metaharness smoke pi examples/python_fixture_benchmark --probe-only

Run Pi

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend pi \
  --model anthropic/claude-sonnet-4-5 \
  --proposal-timeout 180 \
  --budget 1 \
  --run-name pi-run

Pi runs through its JSON print mode and defaults to ephemeral --no-session behavior inside metaharness. This keeps optimization runs isolated from Pi's normal interactive session workflow. It is not part of the main validated Codex-first release path.

Use Local Codex Over Ollama

Requirements:

  • Ollama server reachable on 127.0.0.1:11434
  • a local model such as gpt-oss:20b or gpt-oss:120b
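Before the metaharness probe, you can confirm that the Ollama server itself is up and the model is pulled. This sketch queries Ollama's documented /api/tags endpoint; the helper name is ours, not part of metaharness:

```python
import json
import urllib.request

def model_available(tags_response: dict, model: str) -> bool:
    """Check a parsed Ollama /api/tags response for a model by name."""
    return any(m.get("name") == model for m in tags_response.get("models", []))

if __name__ == "__main__":
    try:
        # Ollama's REST API lists pulled models at GET /api/tags.
        with urllib.request.urlopen("http://127.0.0.1:11434/api/tags", timeout=5) as resp:
            tags = json.load(resp)
    except OSError as exc:
        print(f"Ollama not reachable: {exc}")
    else:
        for name in ("gpt-oss:20b", "gpt-oss:120b"):
            print(name, "pulled" if model_available(tags, name) else "missing")
```

If a model shows as missing, pull it with `ollama pull gpt-oss:20b` before running metaharness against it.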

Probe The Local Path

uv run metaharness smoke codex \
  examples/python_fixture_benchmark \
  --probe-only \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b

Run gpt-oss:20b

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b \
  --proposal-timeout 240 \
  --budget 1 \
  --run-name ollama-20b

Run gpt-oss:120b

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --oss \
  --local-provider ollama \
  --model gpt-oss:120b \
  --proposal-timeout 420 \
  --budget 1 \
  --run-name ollama-120b

Create Your Own Project

If you want to optimize your own coding-agent harness, scaffold a project:

uv run metaharness scaffold coding-tool ./my-coding-tool-optimizer

Available scaffold profiles:

  • standard
  • local-oss-smoke
  • local-oss-medium

Examples:

uv run metaharness scaffold \
  coding-tool \
  ./my-local-oss-smoke \
  --profile local-oss-smoke

uv run metaharness scaffold \
  coding-tool \
  ./my-local-oss-medium \
  --profile local-oss-medium

If you want a checked-in experiment workflow for your own project, add a small JSON spec and run:

uv run metaharness experiment \
  --config ./my-experiment.json
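The exact spec schema is defined by the repository; use examples/experiment_configs/fake-benchmarks.json as the working reference. The fragment below only illustrates the general shape such a file might take, and every key in it is hypothetical:

```json
{
  "name": "my-first-experiment",
  "benchmarks": ["./my-coding-tool-optimizer"],
  "backends": ["fake"],
  "trials": 3,
  "budget": 1
}
```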

What A Successful First Session Looks Like

By the end of a first session, you should be able to:

  • run a benchmark with the fake backend
  • inspect the winning candidate
  • export a candidate ledger
  • run a saved experiment matrix
  • decide whether to use hosted Codex or a local Ollama model for the next step

Build The Docs

Serve locally:

uv run mkdocs serve

Build the site:

uv run mkdocs build --strict