Architecture
Core Idea
metaharness optimizes executable harnesses by keeping the outer loop simple and making the filesystem the source of truth.
This package is inspired by the Meta Harness paper and is an unofficial implementation of the core outer-loop ideas from that work.
Each run:
- materializes a baseline candidate
- asks a proposer backend to mutate the workspace
- validates the result
- evaluates the result
- stores everything on disk
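The steps above can be sketched as a small loop. This is a minimal illustration, not the package's engine: `run_loop`, the three callback types, and the manifest keys are hypothetical names chosen for the example, and starting each proposal from the current best candidate is an assumption.

```python
import json
import shutil
from pathlib import Path
from typing import Callable, Optional

# Hypothetical callback shapes; these are not metaharness's actual API.
Proposer = Callable[[Path], None]    # edits files inside the workspace in place
Validator = Callable[[Path], bool]   # True if the workspace is well-formed
Evaluator = Callable[[Path], float]  # higher score is better


def run_loop(baseline: Path, run_dir: Path, propose: Proposer,
             validate: Validator, evaluate: Evaluator, budget: int) -> str:
    """Run `budget` proposal rounds and return the id of the best candidate."""
    candidates = run_dir / "candidates"

    def record(cid: str, workspace: Path, applied: bool) -> Optional[float]:
        # Validate first; only valid workspaces are evaluated.
        valid = validate(workspace)
        score = evaluate(workspace) if valid else None
        manifest = {"id": cid, "applied": applied, "valid": valid, "score": score}
        (candidates / cid / "manifest.json").write_text(json.dumps(manifest, indent=2))
        return score

    # Candidate c0000 is the unmodified baseline.
    best_id = "c0000"
    best_ws = candidates / best_id / "workspace"
    shutil.copytree(baseline, best_ws)
    best_score = record(best_id, best_ws, applied=False)

    for i in range(1, budget + 1):
        cid = f"c{i:04d}"
        ws = candidates / cid / "workspace"
        shutil.copytree(best_ws, ws)  # assumption: mutate the current best
        propose(ws)
        score = record(cid, ws, applied=True)
        if score is not None and (best_score is None or score > best_score):
            best_id, best_ws, best_score = cid, ws, score
    return best_id
```

Because every candidate gets its own directory and manifest, the loop itself holds almost no state: anything needed to resume or audit a run can be read back from disk.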
Design Influences
Three influences are especially important for understanding why the library looks the way it does:
- the Meta Harness paper, which motivated treating executable harness code as the optimization target
- GEPA, which was useful as a reference point for packaging reusable optimization tooling
- Autoresearch by Andrej Karpathy, which influenced the focus on explicit experiment loops, keep-or-discard outcomes, and a constrained mutable scope
Main Components
Optimization Engine
The engine coordinates the loop and picks the best candidate based on the objective score.
Key file:
src/metaharness/core/engine.py
Run Store
The filesystem run store creates candidate workspaces, captures a compact environment bootstrap, enforces any configured write-scope allowlist, writes manifests, stores proposal output, and records diffs.
Key file:
src/metaharness/store/filesystem.py
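As an illustration of the diff-recording step, the sketch below compares a parent workspace with a candidate workspace using only the standard library. The function names are hypothetical, and the real store may compute and format its diffs differently; the output here loosely corresponds to a changed-file list plus a unified diff.

```python
import difflib
from pathlib import Path


def _files(root: Path) -> dict[str, str]:
    """Map POSIX-style relative paths to text content for every file under root."""
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def diff_workspaces(parent: Path, candidate: Path) -> tuple[list[str], str]:
    """Return (changed relative paths, unified diff text) between two workspaces."""
    before, after = _files(parent), _files(candidate)
    changed: list[str] = []
    chunks: list[str] = []
    for rel in sorted(before.keys() | after.keys()):
        old, new = before.get(rel, ""), after.get(rel, "")
        # Skip files that exist on both sides with identical content.
        if old == new and (rel in before) == (rel in after):
            continue
        changed.append(rel)
        chunks.extend(difflib.unified_diff(
            old.splitlines(keepends=True), new.splitlines(keepends=True),
            fromfile=f"a/{rel}", tofile=f"b/{rel}",
        ))
    return changed, "".join(chunks)
```

The changed-path list is also the natural input for a write-scope check, since it names every file the proposer touched.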
Proposer Backends
A proposer backend is the system that edits the candidate workspace.
Current backends:
- CodexExecBackend
- FakeBackend
- GeminiCliBackend
- PiCliBackend
Key files:
- src/metaharness/proposer/codex_exec.py
- src/metaharness/proposer/fake.py
- src/metaharness/proposer/gemini_cli.py
- src/metaharness/proposer/pi_cli.py
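To make the backend role concrete, here is one way such an interface could look. Everything in this sketch is hypothetical (`ProposerBackend`, `ProposalResult`, `AppendLineBackend`, and the `propose` signature); it does not reproduce the interface of the backends listed above. The deterministic stand-in only shows why a fake backend is useful for testing the loop without calling an external agent.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class ProposalResult:
    """Hypothetical return value: what a backend reports after editing."""
    applied: bool       # whether the backend reports having made edits
    summary: str = ""   # short human-readable description of the change


class ProposerBackend(Protocol):
    """Hypothetical interface: anything that can edit a workspace in place."""

    def propose(self, workspace: Path, prompt: str) -> ProposalResult: ...


class AppendLineBackend:
    """Deterministic stand-in that appends one fixed line to one file."""

    def __init__(self, relative_path: str, line: str) -> None:
        self.relative_path = relative_path
        self.line = line

    def propose(self, workspace: Path, prompt: str) -> ProposalResult:
        target = workspace / self.relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("a", encoding="utf-8") as fh:
            fh.write(self.line + "\n")
        return ProposalResult(applied=True, summary=f"appended to {self.relative_path}")
```

A CLI-driven backend would satisfy the same shape by launching its agent with the workspace as the working directory and reporting whether any files changed.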
Coding Tool Integration
This integration turns coding-agent instruction files and helper scripts into an optimization target with deterministic task scoring.
Key files:
- src/metaharness/integrations/coding_tool/config.py
- src/metaharness/integrations/coding_tool/runtime.py
Run Layout
Every run is stored on disk.
Typical shape:
runs/<run_id>/
  run_config.json
  indexes/
    leaderboard.json
  candidates/
    c0000/
      manifest.json
      workspace/
      validation/result.json
      evaluation/result.json
    c0001/
      manifest.json
      workspace/
        .metaharness/
          AGENTS.md
          bootstrap/
            summary.md
            snapshot.json
          experience/
      proposal/
        prompt.txt
        result.json
        events.json
        workspace.diff
        workspace_changes.json
      validation/result.json
      evaluation/result.json
Each candidate manifest records whether the proposal was applied, whether validation passed, the objective score, and the explicit candidate outcome.
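A manifest with those four facts could be modeled as below. The field names and helper functions are assumptions made for illustration; the actual keys in manifest.json may differ. The set of outcome labels is the one this page lists for candidates.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional

# Outcome labels named in this document.
OUTCOMES = {"keep", "discard", "crash", "timeout", "no-change", "scope-violation"}


@dataclass
class CandidateManifest:
    """Hypothetical manifest shape; real key names may differ."""
    candidate_id: str
    proposal_applied: bool
    validation_passed: bool
    objective_score: Optional[float]  # None when the candidate was never scored
    outcome: str

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def write_manifest(candidate_dir: Path, manifest: CandidateManifest) -> Path:
    """Serialize the manifest as JSON inside the candidate directory."""
    candidate_dir.mkdir(parents=True, exist_ok=True)
    path = candidate_dir / "manifest.json"
    path.write_text(json.dumps(asdict(manifest), indent=2), encoding="utf-8")
    return path


def read_manifest(candidate_dir: Path) -> CandidateManifest:
    """Load and re-validate a manifest written by write_manifest."""
    data = json.loads((candidate_dir / "manifest.json").read_text(encoding="utf-8"))
    return CandidateManifest(**data)
```

Recording the outcome as an explicit label, rather than inferring it later from the score, keeps crashes and timeouts distinguishable from candidates that simply scored worse.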
Why Filesystem First Matters
This design makes the system useful for real engineering work:
- proposals are concrete file edits
- failures are inspectable after the run
- diffs can be reviewed by humans
- environment facts are captured before the agent starts editing
- candidate outcomes such as keep, discard, crash, timeout, no-change, and scope-violation are recorded explicitly
- evaluation artifacts are easy to archive
- the optimization history can be re-used by future iterations
- reporting can export both run comparisons and per-candidate ledgers as TSV or JSON
- experiment matrices can aggregate repeated trials across benchmarks, backends, budgets, and models
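Because each candidate leaves a manifest on disk, a per-candidate ledger can be rebuilt from the run directory alone. The sketch below is not the package's reporting code: `ledger_tsv` is a hypothetical helper, and the `outcome` and `objective_score` keys are assumed manifest fields.

```python
import csv
import io
import json
from pathlib import Path

FIELDS = ["candidate_id", "outcome", "objective_score"]


def ledger_tsv(run_dir: Path) -> str:
    """Build a TSV ledger from every candidates/*/manifest.json under run_dir."""
    rows = []
    for path in sorted((run_dir / "candidates").glob("*/manifest.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        rows.append({
            "candidate_id": path.parent.name,           # directory name, e.g. c0001
            "outcome": data.get("outcome", ""),
            "objective_score": data.get("objective_score", ""),
        })
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # a null score is written as an empty cell
    return buf.getvalue()
```

The same rows could be dumped with `json.dumps` to produce the JSON form of the ledger.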
How A Coding Tool Project Is Evaluated
The coding-tool integration uses two types of tasks:
- file_phrase checks
- command checks
This lets you score both instruction quality and executable workflow behavior.
Examples:
- require specific repository safety guidance in AGENTS.md
- require context handoff guidance in GEMINI.md
- require scripts/bootstrap.sh to build a working environment
- require scripts/test.sh to pass a deterministic test suite
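The two check types can be sketched as follows. The task dictionary shape (`type`, `path`, `phrases`, `argv`), the case-insensitive phrase matching, and the equal-weight scoring are all assumptions for illustration; the integration's real task schema and scoring may differ.

```python
import subprocess
from pathlib import Path


def file_phrase_check(workspace: Path, path: str, phrases: list[str]) -> bool:
    """Pass if the file exists and contains every phrase (case-insensitive here)."""
    target = workspace / path
    if not target.is_file():
        return False
    text = target.read_text(encoding="utf-8", errors="replace").lower()
    return all(phrase.lower() in text for phrase in phrases)


def command_check(workspace: Path, argv: list[str], timeout_s: float = 60.0) -> bool:
    """Pass if the command exits with status 0 inside the workspace."""
    try:
        done = subprocess.run(argv, cwd=workspace, capture_output=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False  # missing executable or timeout counts as a failure
    return done.returncode == 0


def score_tasks(workspace: Path, tasks: list[dict]) -> float:
    """Return the fraction of tasks that pass (equal weights, an assumption)."""
    if not tasks:
        return 0.0
    passed = 0
    for task in tasks:
        if task["type"] == "file_phrase":
            ok = file_phrase_check(workspace, task["path"], task["phrases"])
        elif task["type"] == "command":
            ok = command_check(workspace, task["argv"])
        else:
            raise ValueError(f"unknown task type: {task['type']!r}")
        passed += int(ok)
    return passed / len(tasks)
```

Both checks are deterministic for a fixed workspace, provided the commands themselves are, which is what lets scores from different candidates be compared directly.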
Projects can also set allowed_write_paths in metaharness.json so only specific files or directories are mutable during optimization.
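A write-scope check against that setting might look like the sketch below. Only the `allowed_write_paths` key and the metaharness.json filename come from this page; the helper names, the prefix-matching semantics, and the choice to treat an empty allowlist as unrestricted are assumptions.

```python
import json
from pathlib import Path, PurePosixPath


def load_allowed_paths(project_dir: Path) -> list[str]:
    """Read allowed_write_paths from metaharness.json (empty list if absent)."""
    config = json.loads((project_dir / "metaharness.json").read_text(encoding="utf-8"))
    return list(config.get("allowed_write_paths", []))


def _inside(path: str, root: str) -> bool:
    """True if path equals root or lies under it; '..' segments are rejected."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    if ".." in p.parts:
        return False
    return p == r or r in p.parents


def scope_violations(changed: list[str], allowed: list[str]) -> list[str]:
    """Return changed paths that fall outside every allowed file or directory."""
    if not allowed:  # assumption: no allowlist means everything is writable
        return []
    return [c for c in changed if not any(_inside(c, a) for a in allowed)]
```

A candidate with a non-empty violation list would then be a natural fit for the scope-violation outcome rather than being scored.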
What The Current OSS Version Focuses On
The current package is strongest when the target under optimization is:
- instruction files
- helper scripts
- benchmark harness code
- routing or workflow glue code
It does not yet attempt a full reproduction of every benchmark domain in the paper. It aims to be a practical, reusable library for outer-loop harness optimization.