Skip to content

Benchmark Commands

Run coding harness benchmarks across multiple agent targets.


benchmark run

Run benchmark tasks across one or more targets.

superqode benchmark run <tasks.json> [OPTIONS]

Arguments

Argument Description
tasks.json Path to a JSON file defining benchmark tasks

Options

Option Description
--target Target to benchmark (repeatable, e.g., --target superqode --target opencode)

Examples

superqode benchmark run tasks.json --target superqode --target opencode --target pi --target deepagents

Tasks File Format

The tasks file is a JSON array of task definitions:

[
  {
    "id": "task-001",
    "prompt": "Implement a fibonacci function in Python"
  },
  {
    "id": "task-002",
    "prompt": "Write a markdown parser in TypeScript"
  }
]

Each task contains a unique id and a prompt sent to each target. Results are compared across targets by task ID, producing a side-by-side report of completion rates, time, and output quality.

Supported Targets

Target Description
superqode SuperQode harness
opencode OpenCode ACP agent
pi pi.ai coding agent
deepagents DeepAgents harness