
Providers

Provider Model

metaharness separates the optimization loop from the system that actually edits files. That editing system is called a proposer backend.

Current status:

  • CodexExecBackend is real and exercised in benchmark runs
  • FakeBackend is deterministic and used for tests and smoke runs
  • GeminiCliBackend is experimental
  • PiCliBackend is experimental and uses Pi print-mode JSON output for integration
  • OpenCodeRunBackend is experimental and uses opencode run --format json

The current package is Codex-first, with an extensible backend interface. All real-provider benchmark runs currently documented in this repository were executed through the Codex CLI path, covering both hosted Codex and local Ollama models driven through Codex.
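The split between the optimization loop and the proposer backend can be illustrated with a minimal sketch. Every name below (Proposal, ProposerBackend, DeterministicFakeBackend, run_loop, PROPOSAL.txt) is hypothetical and is not the real metaharness API; the sketch only shows the shape of the idea: the loop never edits files itself, and a deterministic fake backend in the spirit of FakeBackend makes the loop testable without any provider.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass
class Proposal:
    """Hypothetical result type: what a backend reports after one proposal attempt."""
    changed_files: list[str] = field(default_factory=list)
    summary: str = ""
    timed_out: bool = False


class ProposerBackend(Protocol):
    """Hypothetical interface: the only thing the optimization loop depends on."""
    name: str

    def propose(self, workspace: Path, instructions: str) -> Proposal: ...


class DeterministicFakeBackend:
    """Hypothetical stand-in: the same instructions always produce the same edit."""
    name = "fake"

    def propose(self, workspace: Path, instructions: str) -> Proposal:
        target = workspace / "PROPOSAL.txt"
        target.write_text(f"applied: {instructions}\n")
        return Proposal(changed_files=[target.name], summary=f"wrote {target.name}")


def run_loop(backend: ProposerBackend, workspace: Path, budget: int) -> list[Proposal]:
    """Hypothetical loop: one backend call per budget unit, no direct file edits."""
    return [backend.propose(workspace, f"iteration {i}") for i in range(budget)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for proposal in run_loop(DeterministicFakeBackend(), Path(tmp), budget=2):
            print(proposal.summary)
```

Because run_loop only sees the ProposerBackend protocol, swapping the fake for a Codex-, Gemini-, Pi- or OpenCode-style backend would not change the loop itself.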

Hosted Codex

Hosted Codex is supported today.

Requirements:

  • codex CLI installed
  • authenticated session or API key
  • network access to the provider

Example command:

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --hosted \
  --budget 1 \
  --run-name hosted-codex

Why --hosted matters:

  • some benchmark configs default to local Ollama
  • --hosted clears the local-provider settings for that run
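The effect of --hosted can be pictured as a per-run override on top of the benchmark's stored config. This is a minimal sketch under assumptions: the key names oss and local_provider are hypothetical and only mirror the --oss and --local-provider flags; they are not metaharness's real config schema, and this page does not say whether a model override is also cleared.

```python
from __future__ import annotations

# HYPOTHETICAL key names mirroring --oss and --local-provider; not the real config schema.
LOCAL_PROVIDER_KEYS = ("oss", "local_provider")


def effective_backend_config(benchmark_config: dict, hosted: bool) -> dict:
    """Return the settings for one run without mutating the benchmark's stored config."""
    config = dict(benchmark_config)  # shallow copy so the benchmark default stays intact
    if hosted:
        for key in LOCAL_PROVIDER_KEYS:
            config.pop(key, None)  # drop local-provider settings only for this run
    return config
```

The copy-then-pop design keeps the benchmark's local-Ollama default in place for later runs, matching the "for that run" scope described above.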

Current conclusion:

  • hosted Codex is supported in the library today
  • the remaining requirements are environment access and authentication

Local Codex Over Ollama

This path is supported and has been exercised with:

  • gpt-oss:20b
  • gpt-oss:120b

Probe first:

uv run metaharness smoke codex \
  examples/python_fixture_benchmark \
  --probe-only \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b

Run:

uv run metaharness run \
  examples/python_fixture_benchmark \
  --backend codex \
  --oss \
  --local-provider ollama \
  --model gpt-oss:20b \
  --proposal-timeout 240 \
  --budget 1
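The probe-then-run sequence above can be scripted. This is a minimal sketch that uses only the flags shown on this page; probe_cmd, run_cmd and probe_then_run are hypothetical helper names, and gating the run on the probe's exit status assumes the probe exits non-zero on failure, which this page does not state.

```python
from __future__ import annotations

import shlex
import subprocess

BENCHMARK = "examples/python_fixture_benchmark"


def probe_cmd(model: str) -> list[str]:
    """The documented probe invocation for Codex over Ollama."""
    return ["uv", "run", "metaharness", "smoke", "codex", BENCHMARK,
            "--probe-only", "--oss", "--local-provider", "ollama", "--model", model]


def run_cmd(model: str, proposal_timeout: int = 240, budget: int = 1) -> list[str]:
    """The documented run invocation, parameterized by model, timeout and budget."""
    return ["uv", "run", "metaharness", "run", BENCHMARK,
            "--backend", "codex", "--oss", "--local-provider", "ollama",
            "--model", model, "--proposal-timeout", str(proposal_timeout),
            "--budget", str(budget)]


def probe_then_run(model: str, proposal_timeout: int = 240) -> int:
    """Run the probe first; start the real run only if the probe exits with status 0."""
    probe = subprocess.run(probe_cmd(model))
    if probe.returncode != 0:
        return probe.returncode
    return subprocess.run(run_cmd(model, proposal_timeout)).returncode


if __name__ == "__main__":
    # Print the exact shell command instead of executing it.
    print(shlex.join(run_cmd("gpt-oss:20b")))
```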

Current Provider Takeaways

Based on the recorded benchmark runs in this repository:

| Provider | Benchmark result | Observed pattern |
| --- | --- | --- |
| Hosted Codex | Solved both real benchmarks in one proposal iteration | Fastest high-quality path so far |
| Ollama gpt-oss:20b | Timed out on both real benchmarks at 240s | Useful for very small smoke runs, not reliable enough for the current real benchmarks |
| Ollama gpt-oss:120b | Solved python_fixture_benchmark | Slower than hosted Codex, but capable |

This means the project's current public benchmark evidence is centered on Codex. Other coding-agent benchmark writeups may emphasize Claude Code or Opus, but those are not the provider paths currently documented in this repository.

Experimental Backends

The following backends are implemented, but they are not part of the main validated release path today. They are best treated as try-it-yourself integrations unless and until they accumulate stronger benchmark evidence in this repository.

Gemini CLI

Gemini is implemented as an experimental backend.

What is implemented:

  • non-interactive Gemini CLI invocation
  • stream-json parsing
  • model override support
  • approval mode and sandbox config wiring
  • proposal timeout handling
  • metaharness smoke gemini
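Stream-json output is newline-delimited JSON, so parsing it mostly means reading one object per line and tolerating noise. This is a minimal sketch under assumptions: the event shape {"type": "message", "content": ...} is hypothetical and is not taken from Gemini CLI's documented schema, and iter_events and assistant_text are illustrative names, not metaharness functions.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON, skipping blank lines and non-JSON noise."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray log line mixed into stdout
        if isinstance(event, dict):
            yield event


def assistant_text(lines: Iterable[str]) -> str:
    """Concatenate text from events shaped like the HYPOTHETICAL {"type": "message", "content": ...}."""
    return "".join(
        event["content"]
        for event in iter_events(lines)
        if event.get("type") == "message" and isinstance(event.get("content"), str)
    )
```

Skipping undecodable lines rather than failing keeps a single noisy log line from discarding an otherwise usable proposal.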

Useful command:

uv run metaharness smoke gemini \
  ./my-coding-tool-optimizer \
  --probe-only

Current caveat:

  • Gemini is experimental and the benchmark evidence in this repository is not yet strong enough to present it as a primary backend

Pi

Pi is implemented as an experimental backend.

What is implemented:

  • Pi CLI invocation in --mode json
  • ephemeral --no-session default for optimization runs
  • JSON event parsing for assistant text, tool usage, command output, and likely file changes
  • model override support
  • proposal timeout handling
  • metaharness smoke pi
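"Likely file changes" means changes inferred from the event stream rather than confirmed on disk. This is a minimal sketch of such a heuristic under assumptions: the event fields (type, tool, args.path) and the tool names write and edit are hypothetical and are not Pi's actual JSON format, and likely_changed_files is an illustrative name.

```python
from __future__ import annotations

from typing import Iterable

# HYPOTHETICAL tool names; Pi's real event format may differ.
FILE_EDITING_TOOLS = {"write", "edit"}


def likely_changed_files(events: Iterable[dict]) -> list[str]:
    """Return paths touched by file-editing tool calls, in first-seen order, without duplicates."""
    seen: dict[str, None] = {}  # dict preserves insertion order, acting as an ordered set
    for event in events:
        if event.get("type") != "tool_call" or event.get("tool") not in FILE_EDITING_TOOLS:
            continue
        path = (event.get("args") or {}).get("path")
        if isinstance(path, str) and path:
            seen.setdefault(path, None)
    return list(seen)
```

The result is only "likely" because a tool call can fail after being emitted, and shell commands run by the agent can change files without any edit event; comparing the workspace before and after the proposal would be the stricter check.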

Useful command:

uv run metaharness smoke pi \
  ./my-coding-tool-optimizer \
  --probe-only

Current caveat:

  • Pi is experimental and does not yet have successful real benchmark records checked into this repository

OpenCode

OpenCode is implemented as an experimental backend.

What is implemented:

  • non-interactive opencode run invocation
  • JSON event parsing for text, tool usage, command execution, and likely changed files
  • model override support
  • agent and variant config wiring
  • proposal timeout handling
  • metaharness smoke opencode
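Proposal timeout handling, listed for all three experimental backends, amounts to bounding a non-interactive CLI call. This is a minimal sketch using the standard library's subprocess.run timeout; ProposalRun and run_with_timeout are hypothetical names, not metaharness internals.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ProposalRun:
    returncode: int | None  # None when the process was killed for exceeding the timeout
    stdout: str
    timed_out: bool


def run_with_timeout(argv: list[str], timeout_s: float) -> ProposalRun:
    """Run a non-interactive agent command, killing it if it exceeds timeout_s seconds."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; keep any partial output.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # partial output may arrive as bytes even with text=True
            partial = partial.decode(errors="replace")
        return ProposalRun(returncode=None, stdout=partial, timed_out=True)
    return ProposalRun(returncode=done.returncode, stdout=done.stdout, timed_out=False)


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

Note that subprocess.run kills only the direct child process; an agent CLI that spawns its own subprocesses may need process-group handling to be stopped completely.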

Useful command:

uv run metaharness smoke opencode \
  ./my-coding-tool-optimizer \
  --probe-only

Current caveat:

  • OpenCode is experimental and does not yet have benchmark records checked into this repository

What To Use In Practice

If you want the most reliable current path:

  • use hosted Codex for serious benchmark or project runs

If you want a local-only workflow:

  • use gpt-oss:20b for quick scaffold smoke checks
  • use gpt-oss:120b for more capable local proposal runs
  • increase the proposal timeout (--proposal-timeout) for the larger model

Next Provider Work

The next provider milestone after the current Codex path is a choice between:

  • keeping the current Codex-first scope
  • selectively strengthening one experimental backend at a time, when there is a concrete user need for it