Providers
Provider Model
metaharness separates the optimization loop from the system that actually edits files.
That editing system is called a proposer backend.
Current status:
- `CodexExecBackend` is real and exercised in benchmark runs
- `FakeBackend` is deterministic and used for tests and smoke runs
- `GeminiCliBackend` is experimental
- `PiCliBackend` is experimental and uses Pi print-mode JSON output for integration
- `OpenCodeRunBackend` is experimental and uses `opencode run --format json`
The current package is Codex-first with an extensible backend interface. All real provider benchmark runs currently documented in this repository were executed through the Codex CLI path. That includes hosted Codex and local Ollama models used through Codex.
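The separation between the optimization loop and the proposer backend can be pictured with a small sketch. This is not the package's actual interface: `ProposerBackend`, `Proposal`, `DeterministicBackend`, and `optimize` are hypothetical names chosen for illustration, and the real classes may have a different shape.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Protocol


@dataclass
class Proposal:
    """Hypothetical record of what one proposer call did."""
    changed_files: List[str]
    summary: str


class ProposerBackend(Protocol):
    """Hypothetical interface: anything that can edit files in a workspace."""

    def propose(self, workspace: Path, instructions: str) -> Proposal:
        ...


class DeterministicBackend:
    """Stand-in for a fake backend: always writes the same file."""

    def propose(self, workspace: Path, instructions: str) -> Proposal:
        target = workspace / "NOTES.md"
        target.write_text(f"instructions: {instructions}\n")
        return Proposal(changed_files=[target.name], summary="wrote NOTES.md")


def optimize(backend: ProposerBackend, workspace: Path, budget: int) -> List[Proposal]:
    """The loop depends only on the interface, not on a specific provider."""
    return [backend.propose(workspace, f"iteration {i}") for i in range(budget)]
```

Under this assumed shape, swapping providers means passing a different object to `optimize`; the loop itself does not change.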
Hosted Codex
Hosted Codex is supported today.
Requirements:
- `codex` CLI installed
- authenticated session or API key
- network access to the provider
Important command:
uv run metaharness run \
examples/python_fixture_benchmark \
--backend codex \
--hosted \
--budget 1 \
--run-name hosted-codex
Why `--hosted` matters:
- some benchmark configs default to local Ollama
- `--hosted` clears the local-provider settings for that run
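One way to picture the effect of `--hosted` is as an override that drops local-provider keys from a benchmark's defaults for a single run. The sketch below is an illustration only: the key names (`oss`, `local_provider`, `model`) and the function `resolve_backend_settings` are assumptions, not the package's actual config schema or code.

```python
# Hypothetical key names for the settings that point a run at a local provider.
LOCAL_PROVIDER_KEYS = ("oss", "local_provider", "model")


def resolve_backend_settings(config_defaults: dict, hosted: bool) -> dict:
    """Return the settings for one run without mutating the config defaults."""
    settings = dict(config_defaults)
    if hosted:
        # Drop anything that would route the run to a local provider.
        for key in LOCAL_PROVIDER_KEYS:
            settings.pop(key, None)
    return settings
```

Because the function copies the defaults, the benchmark config itself is untouched and later runs without the flag still see the local settings.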
Current conclusion:
- hosted Codex is supported in the library today
- the remaining requirement is environment access and authentication
Local Codex Over Ollama
This path is supported and has been exercised with:
- `gpt-oss:20b`
- `gpt-oss:120b`
Probe first:
uv run metaharness smoke codex \
examples/python_fixture_benchmark \
--probe-only \
--oss \
--local-provider ollama \
--model gpt-oss:20b
Run:
uv run metaharness run \
examples/python_fixture_benchmark \
--backend codex \
--oss \
--local-provider ollama \
--model gpt-oss:20b \
--proposal-timeout 240 \
--budget 1
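If the probe and the run are scripted together, the two invocations above can be generated from one place so that the provider flags stay in sync. The helper below is a hypothetical convenience wrapper, not part of the package; it only reproduces the flags already shown in the commands above.

```python
import shlex
from typing import List, Optional


def codex_ollama_args(
    project: str,
    model: str,
    probe_only: bool,
    proposal_timeout: Optional[int] = None,
    budget: Optional[int] = None,
) -> List[str]:
    """Build the argv for the probe or the run shown above."""
    if probe_only:
        args = ["uv", "run", "metaharness", "smoke", "codex", project, "--probe-only"]
    else:
        args = ["uv", "run", "metaharness", "run", project, "--backend", "codex"]
    # Provider flags shared by both commands.
    args += ["--oss", "--local-provider", "ollama", "--model", model]
    if not probe_only:
        if proposal_timeout is not None:
            args += ["--proposal-timeout", str(proposal_timeout)]
        if budget is not None:
            args += ["--budget", str(budget)]
    return args


if __name__ == "__main__":
    project = "examples/python_fixture_benchmark"
    print(shlex.join(codex_ollama_args(project, "gpt-oss:20b", probe_only=True)))
    print(shlex.join(codex_ollama_args(project, "gpt-oss:20b", probe_only=False,
                                       proposal_timeout=240, budget=1)))
```

The returned list can be passed to `subprocess.run(args, check=True)`, running the probe first and the full run only if the probe succeeds.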
Current Provider Takeaways
Based on the recorded benchmark runs in this repository:
| Provider | Benchmark Result | Observed Pattern |
|---|---|---|
| Hosted Codex | solved both real benchmarks in one proposal iteration | fastest high quality path so far |
| Ollama `gpt-oss:20b` | timed out on both real benchmarks at 240s | useful for very small smoke runs, not reliable enough for the current real benchmarks |
| Ollama `gpt-oss:120b` | solved `python_fixture_benchmark` | slower than hosted Codex, but capable |
This means the project's current public benchmark evidence is centered on Codex. Other coding-agent benchmark writeups may emphasize Claude Code or Opus, but those are not the provider paths currently documented in this repository.
Experimental Backends
The following backends are implemented, but they are not part of the main validated release path today. They are best treated as try-it-yourself integrations unless and until they accumulate stronger benchmark evidence in this repository.
Gemini CLI
Gemini is implemented as an experimental backend.
What is implemented:
- non-interactive Gemini CLI invocation
- `stream-json` parsing
- model override support
- approval mode and sandbox config wiring
- proposal timeout handling
- `metaharness smoke gemini`
Useful command:
Current caveat:
- Gemini is experimental and the benchmark evidence in this repository is not yet strong enough to present it as a primary backend
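Proposal timeout handling, listed above for this backend, generally means bounding how long a CLI subprocess may run and reporting a timeout as a distinct outcome. The sketch below shows that pattern with the standard library only. It is not the backend's actual code, and the function name and result keys are assumptions made for illustration.

```python
import subprocess
from typing import List


def run_with_timeout(argv: List[str], timeout_s: float) -> dict:
    """Run a CLI command and report 'ok', 'error', or 'timeout'."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return {"status": "timeout", "stdout": "", "returncode": None}
    status = "ok" if done.returncode == 0 else "error"
    return {"status": status, "stdout": done.stdout, "returncode": done.returncode}
```

Keeping a timeout separate from a non-zero exit code lets the caller distinguish a slow model from a failed invocation.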
Pi
Pi is implemented as an experimental backend.
What is implemented:
- Pi CLI invocation in `--mode json`
- ephemeral `--no-session` default for optimization runs
- JSON event parsing for assistant text, tool usage, command output, and likely file changes
- model override support
- proposal timeout handling
- `metaharness smoke pi`
Useful command:
Current caveat:
- Pi is experimental and does not yet have successful real benchmark records checked into this repository
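The JSON event parsing described above can be sketched as a pass over newline-delimited JSON. The event schema here is an assumption made purely for illustration: the `type`, `text`, `name`, and `path` fields and the `write`/`edit` tool names are hypothetical and do not describe Pi's real output format.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical names of tools that modify files.
WRITE_TOOLS = {"write", "edit"}


def summarize_events(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect assistant text, tool names, and likely changed files."""
    summary: Dict[str, List[str]] = {"text": [], "tools": [], "likely_changed_files": []}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "text":
            summary["text"].append(event.get("text", ""))
        elif kind == "tool":
            name = event.get("name", "")
            summary["tools"].append(name)
            path = event.get("path")
            # A file counts as "likely changed" only if a write-style tool touched it.
            if name in WRITE_TOOLS and path and path not in summary["likely_changed_files"]:
                summary["likely_changed_files"].append(path)
    return summary
```

The changed-file list is a heuristic, which matches the "likely file changes" wording above: it is inferred from tool events rather than from a diff of the workspace.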
OpenCode
OpenCode is implemented as an experimental backend.
What is implemented:
- non-interactive `opencode run` invocation
- JSON event parsing for text, tool usage, command execution, and likely changed files
- model override support
- agent and variant config wiring
- proposal timeout handling
- `metaharness smoke opencode`
Useful command:
Current caveat:
- OpenCode is experimental and does not yet have benchmark records checked into this repository
What To Use In Practice
If you want the most reliable current path:
- use hosted Codex for serious benchmark or project runs
If you want a local-only workflow:
- use `gpt-oss:20b` for quick scaffold smoke checks
- use `gpt-oss:120b` for more capable local proposal runs
- increase proposal timeout for the larger model
Next Provider Work
The next provider milestone after the current Codex path is a choice between two options:
- keep the current Codex-first scope
- selectively strengthen one experimental backend at a time if there is a concrete user need