# Getting Started
This page walks through the fastest path from a clean checkout to a real metaharness run that you can inspect.
It is written with newcomers in mind.
## Prerequisites
- Python 3.11 or newer
- `uv`
- optional: `codex`, `gemini`, `pi`, or `opencode` CLI for live provider runs
- optional: Ollama with `gpt-oss:20b` or `gpt-oss:120b` for local runs
## Install
**Recommended newcomer path**

If you only want to use the released CLI, install from PyPI with `uv tool install`.
If you want to run the built-in examples from this repository, use a source checkout with `uv sync`.
Published package:
- PyPI distribution: `superagentic-metaharness`
- CLI command: `metaharness`
- import package: `metaharness`
Install the CLI from PyPI:
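Based on the distribution name listed above, the install command is:

```shell
# Install the released CLI as a standalone tool via uv
uv tool install superagentic-metaharness
```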
Check the installed command:
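A quick smoke check; `--help` is assumed to print usage, as it does for most CLIs:

```shell
# Verify that the metaharness command is on your PATH
metaharness --help
```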
If you want to add the library to another Python project:
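With uv, adding the published package as a dependency of another project looks like:

```shell
# Add the library to the current uv-managed project's dependencies
uv add superagentic-metaharness
```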
**Command formatting note**

Long commands on this page are wrapped with `\` so they stay readable on narrower screens.
You can copy them exactly as written.
If you are working from a source checkout of this repository, create the project environment with:
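Per the newcomer note above, a source checkout uses `uv sync`:

```shell
# Create the project environment from the checkout's lockfile
uv sync
```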
If you want the docs toolchain too:
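A sketch, assuming the docs dependencies live in a uv dependency group named `docs` (the group name is an assumption; check `pyproject.toml` for the real one):

```shell
# Install the optional docs dependency group as well (group name assumed)
uv sync --group docs
```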
Check the CLI:
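From a source checkout, run the CLI through the project environment; `--help` is assumed to print usage:

```shell
# Confirm the CLI resolves inside the uv-managed environment
uv run metaharness --help
```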
## The Fastest First Run
**Recommended first run**
Use the fake backend on a real benchmark. This exercises the full loop without needing provider auth, network access, or a local model server.
```shell
uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend fake \
  --budget 1 \
  --run-name first-run
```
Expected result:
- a run directory under `examples/python_fixture_benchmark/runs/first-run`
- a summary reporting `best_candidate_id=c0001` and `best_objective=1.000`
## What To Inspect Next
### Inspect A Single Run
Use this when you want a quick human-readable summary of the candidates and outcomes.
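A hypothetical invocation: the `inspect` subcommand name and the run-directory argument are assumptions for illustration, so confirm the real names with `metaharness --help`:

```shell
# Hypothetical: print a human-readable summary of one run's candidates
uv run metaharness inspect \
  examples/python_fixture_benchmark/runs/first-run
```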
### Export The Candidate Ledger
Use this when you want one row per candidate with outcomes, changed-file counts, and validation or evaluation summaries.
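A hypothetical invocation: the `ledger` subcommand and the `--out` flag are assumptions, named here only to show the shape of the step:

```shell
# Hypothetical: write one row per candidate, with outcomes and summaries
uv run metaharness ledger \
  examples/python_fixture_benchmark/runs/first-run \
  --out candidates.tsv
```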
### Summarize A Whole Benchmark
Use this when you want one row per run and a compact view of score, duration, and failure patterns.
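A hypothetical invocation: the `summarize` subcommand name is an assumption; check `metaharness --help` for the actual command:

```shell
# Hypothetical: one row per run across the whole benchmark
uv run metaharness summarize \
  examples/python_fixture_benchmark
```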
## Run A Saved Experiment Matrix
Once the single-run flow makes sense, move to repeated trials:
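A sketch of the step: the `experiment` subcommand and the spec path are assumptions, inferred from the output files this workflow writes:

```shell
# Hypothetical: execute a saved experiment matrix from a JSON spec
uv run metaharness experiment \
  examples/python_fixture_benchmark/experiment.json
```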
This writes:
- `experiment.json`
- `trials.json`
- `aggregates.json`
- `trials.tsv`
- `aggregates.tsv`
Use this path when you want reproducible benchmarking rather than ad hoc manual runs.
## Use Hosted Codex
Requirements:
- `codex` CLI installed
- an authenticated Codex session or API key setup
- outbound network access
Use `--hosted` if a project config defaults to local Ollama.
Hosted Codex is the strongest current path for real benchmark runs in this repository.
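A hosted run mirrors the fake-backend run above with the backend switched; the backend name `codex` and the run name are assumptions inferred from `--backend fake`:

```shell
# Assumed backend name "codex"; --hosted forces the hosted path
uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --hosted \
  --budget 1 \
  --run-name codex-run
```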
## Use Gemini CLI
Gemini is an experimental backend in the current release. Use it if Gemini CLI is already part of your local workflow and you are comfortable with a try-it-yourself path.
Requirements:
- `gemini` CLI installed
- Gemini authentication configured in your local environment
The integration is real, but it is not part of the main validated Codex-first release path.
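A sketch of a Gemini run: the backend name `gemini` and the run name are assumptions, following the pattern of the fake-backend run above:

```shell
# Assumed backend name "gemini"; requires local Gemini auth
uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend gemini \
  --budget 1 \
  --run-name gemini-run
```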
## Use Pi
Pi is an experimental backend in the current release. Use it if Pi is already part of your local workflow and you are comfortable with a try-it-yourself path.
Requirements:
- `pi` CLI installed
- Pi authentication configured for the model you want to use
Pi runs through its JSON print mode and defaults to ephemeral `--no-session` behavior inside metaharness.
This keeps optimization runs isolated from Pi's normal interactive session workflow.
It is not part of the main validated Codex-first release path.
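A sketch of a Pi run: the backend name `pi` and the run name are assumptions, following the same pattern as the other backends:

```shell
# Assumed backend name "pi"; session handling is ephemeral by default
uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend pi \
  --budget 1 \
  --run-name pi-run
```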
## Use Local Codex Over Ollama
Requirements:
- Ollama server reachable on `127.0.0.1:11434`
- a local model such as `gpt-oss:20b` or `gpt-oss:120b`
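A sketch of a local run, assuming a project config that defaults to Ollama (so `--hosted` is omitted); the backend name `codex` and the run name are assumptions:

```shell
# Assumed backend name "codex"; without --hosted, the project's
# local Ollama default applies (per the hosted-Codex note above)
uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --budget 1 \
  --run-name local-run
```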
## Create Your Own Project
If you want to optimize your own coding-agent harness, scaffold a project:
Available scaffold profiles:
- `standard`
- `local-oss-smoke`
- `local-oss-medium`
Examples:
```shell
uv run metaharness scaffold \
  coding-tool \
  ./my-local-oss-smoke \
  --profile local-oss-smoke
```

```shell
uv run metaharness scaffold \
  coding-tool \
  ./my-local-oss-medium \
  --profile local-oss-medium
```
If you want a checked-in experiment workflow for your own project, add a small JSON spec and run:
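A hypothetical invocation: the `experiment` subcommand and the spec path are assumptions, so adapt them to your project's layout:

```shell
# Hypothetical: run a checked-in experiment spec in your own project
uv run metaharness experiment ./experiments/smoke.json
```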
## What A Successful First Session Looks Like
By the end of a first session, you should be able to:
- run a benchmark with the fake backend
- inspect the winning candidate
- export a candidate ledger
- run a saved experiment matrix
- decide whether to use hosted Codex or a local Ollama model for the next step
## Build The Docs
Serve locally:
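Assuming an MkDocs toolchain (an assumption based on the page style), the local preview server is:

```shell
# Serve the docs with live reload on http://127.0.0.1:8000
uv run mkdocs serve
```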
Build the site:
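Still assuming MkDocs, a static build writes the rendered site to `site/` by default:

```shell
# Build the static documentation site
uv run mkdocs build
```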