Using CodexOpt with Codex

Use this guide when your repo already has Codex instruction files and you want CodexOpt to improve them safely.

CodexOpt works with the same files Codex loads:

AGENTS.md
.codex/skills/**/SKILL.md
.agents/skills/**/SKILL.md

Start With A Preview

Run this from the repo where you use Codex:

uv run codexopt improve

This command:

finds AGENTS.md and SKILL.md files
mines starter tasks from git history and skill descriptions
runs the reflective optimizer in preview mode
shows what would change
writes review artifacts under .codexopt/

The default preview stays offline. It spends no Codex or API budget unless you opt in, for example by passing --live.

Run The Live Codex Loop

Use live mode when you want CodexOpt to evaluate actual Codex behavior:

uv run codexopt improve --live

Live mode uses codex exec as the optimizer and judge. CodexOpt evaluates the candidate instruction file, captures feedback from the run, proposes a focused rewrite, and keeps the rewrite only when it improves held-out tasks.
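
The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CodexOpt's implementation: the names Task, reflective_step, keyword_score, and append_rules are hypothetical, scoring is a keyword check rather than a Codex run, and the validation list is assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Task:
    description: str
    keyword: str  # toy stand-in for "what a good response must cover"


Scorer = Callable[[str, Task], float]
Proposer = Callable[[str, List[Task]], str]


def mean_score(text: str, tasks: Sequence[Task], score: Scorer) -> float:
    """Average score of an instruction text over a non-empty task list."""
    return sum(score(text, task) for task in tasks) / len(tasks)


def reflective_step(current: str, train: Sequence[Task],
                    validation: Sequence[Task],
                    score: Scorer, propose: Proposer) -> str:
    """Evaluate, collect feedback, propose a rewrite, and gate it on held-out tasks."""
    # Feedback here is simply the list of train tasks the current text fails.
    failures = [task for task in train if score(current, task) < 1.0]
    candidate = propose(current, failures)
    # Keep the rewrite only if it strictly improves the held-out score.
    if mean_score(candidate, validation, score) > mean_score(current, validation, score):
        return candidate
    return current


def keyword_score(text: str, task: Task) -> float:
    """Hypothetical scorer: 1.0 when the text mentions the task's keyword."""
    return 1.0 if task.keyword in text.lower() else 0.0


def append_rules(text: str, failures: List[Task]) -> str:
    """Hypothetical proposer: add one rule per failed train task."""
    rules = [f"- Always address the {task.keyword}." for task in failures]
    return "\n".join([text, *rules])
```

The strict comparison means a rewrite that leaves the held-out score unchanged is discarded, which matches the "keeps the rewrite only when it improves" wording.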

Apply The Result

After reviewing the preview, apply validated changes:

uv run codexopt improve --live --apply

CodexOpt writes backups before changing files.

Review The Report

Write a markdown report after any run:

uv run codexopt report --output codexopt-report.md

The report shows:

files found
files improved
validation score movement
accepted reflective edits
sampled feedback that led to the edit
fallback notes when CodexOpt had to use a weaker signal

Step-By-Step Workflow

Use this flow when you want more control than the single improve command gives you:

uv run codexopt init
uv run codexopt scan
uv run codexopt benchmark
uv run codexopt optimize skills --engine reflective
uv run codexopt apply --kind skills --dry-run
uv run codexopt report --output codexopt-report.md

Review the dry-run diff, then apply:

uv run codexopt apply --kind skills

For AGENTS.md:

uv run codexopt optimize agents --engine reflective --file AGENTS.md
uv run codexopt apply --kind agents --dry-run

Add Simple Task Evidence

Task evidence tells CodexOpt what “better” means for your repo.

Create tasks.md:

- Update changelog entries for patch releases.
- Add regression tests before changing parser behavior.
- Summarize risky changes in the final response.

Reference it in codexopt.yaml:

evidence:
  task_files:
    - tasks.md

Then run:

uv run codexopt improve

CodexOpt splits these tasks into train and validation sets. A candidate must improve the held-out validation score before it can win.
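
To make the train and validation idea concrete, here is a minimal sketch of a reproducible split over the bullets in tasks.md. How CodexOpt actually splits tasks is not documented here, so the helper names parse_tasks and split_tasks, the 0.34 fraction, and the seed are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def parse_tasks(markdown: str) -> List[str]:
    """Collect the text of every '- ' bullet in a tasks.md-style file."""
    tasks = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            tasks.append(stripped[2:].strip())
    return tasks


def split_tasks(tasks: Sequence[str], validation_fraction: float = 0.34,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (train, validation) using a seeded shuffle so reruns agree."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one task; with a single task, train is empty.
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

With the three example tasks above, this yields two train tasks and one held-out task. A fixed seed keeps the held-out set stable, so score movement between runs reflects the candidate rather than a different split.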

Mine Starter Tasks

If you do not have task evidence yet, generate a starter file:

uv run codexopt tasks init

Review the generated codexopt-tasks.json, trim anything noisy, then add it to evidence.task_files.

Add Command Rollouts

Use command rollouts when a deterministic verifier can decide whether a skill supports a workflow.

Create skill-rollouts.json:

[
  {
    "name": "release-skill-smoke",
    "description": "Verify the release skill mentions changelog and tests.",
    "command": "python scripts/verify_release_skill.py",
    "timeout_seconds": 30,
    "expected_stdout_contains": "ok"
  }
]
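
The verifier named in "command" is a script you supply. A minimal sketch of what scripts/verify_release_skill.py might look like follows. The skill path .codex/skills/release/SKILL.md and the required terms are assumptions chosen to match the description above; adjust both to your repo.

```python
"""Hypothetical scripts/verify_release_skill.py for the rollout above."""
import sys
from pathlib import Path

# Assumed location of the skill under test; not defined by CodexOpt.
SKILL_PATH = Path(".codex/skills/release/SKILL.md")
# "test" also matches "tests" and "testing".
REQUIRED_TERMS = ("changelog", "test")


def missing_terms(text: str, required=REQUIRED_TERMS) -> list:
    """Return the required terms that do not appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in required if term not in lowered]


def main(path: Path = SKILL_PATH) -> bool:
    """Print 'ok' to stdout on success; report problems on stderr only."""
    if not path.is_file():
        print(f"skill file not found: {path}", file=sys.stderr)
        return False
    missing = missing_terms(path.read_text(encoding="utf-8"))
    if missing:
        print("missing terms: " + ", ".join(missing), file=sys.stderr)
        return False
    print("ok")
    return True


if __name__ == "__main__":
    main()
```

Failure messages go to stderr so that stdout contains "ok" only on success, which is what expected_stdout_contains checks. Whether the exit code is also inspected is not stated in this guide, so the sketch does not rely on it.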

Reference it:

evidence:
  task_files:
    - skill-rollouts.json

Run:

uv run codexopt improve

CodexOpt copies the repo to a temporary directory, writes the candidate SKILL.md, runs the verifier, and uses pass rate as a strong reward signal.
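
The copy, write, run, and score sequence can be sketched as follows. This is an illustration of the technique, not CodexOpt's code: run_command_rollout and pass_rate are hypothetical names, and the command is passed as an argument list (a string such as the one in skill-rollouts.json could be split with shlex.split first).

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_command_rollout(repo: Path, skill_relpath: str, candidate_text: str,
                        command: Sequence[str], expected_stdout_contains: str,
                        timeout_seconds: int = 30) -> bool:
    """Run a verifier against a candidate skill inside a throwaway repo copy."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        shutil.copytree(repo, work)  # the real repo is never modified
        target = work / skill_relpath
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(candidate_text, encoding="utf-8")
        try:
            result = subprocess.run(list(command), cwd=work, capture_output=True,
                                    text=True, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            return False  # a timeout counts as a failed rollout
        return expected_stdout_contains in result.stdout


def pass_rate(results: Sequence[bool]) -> float:
    """Fraction of rollouts that passed; 0.0 when there are none."""
    return sum(results) / len(results) if results else 0.0
```

Because the verifier runs in a temporary copy, a candidate that fails leaves no trace in your working tree.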

Add Codex Rollouts

Use Codex rollouts when you want to test how Codex behaves with a candidate skill.

Create codex-rollouts.json:

[
  {
    "name": "codex-release-notes",
    "backend": "codex",
    "description": "Ask Codex to use the candidate release skill on a release-note task.",
    "codex_prompt": "Use the local release skill to update CHANGELOG.md for a patch release.",
    "timeout_seconds": 120,
    "expected_final_response_contains": "CHANGELOG.md",
    "expected_command_contains": "git status",
    "expected_file_change": "CHANGELOG.md",
    "expected_file_contains": {
      "path": "CHANGELOG.md",
      "contains": "Patch"
    }
  }
]

Run live mode:

uv run codexopt improve --live

CodexOpt runs codex exec --json in a temporary repo copy and records the trajectory:

final response
command executions
file changes
token usage
errors
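
One way to picture how the expected_* fields relate to a recorded trajectory is the sketch below. The Trajectory class and check_expectations function are hypothetical, and the real event format emitted by codex exec --json is not modeled here. As a simplification, expected_file_contains is checked only against files the run changed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    """Hypothetical, simplified record of one Codex run."""
    final_response: str = ""
    commands: List[str] = field(default_factory=list)
    changed_files: Dict[str, str] = field(default_factory=dict)  # path -> new content


def check_expectations(rollout: dict, traj: Trajectory) -> Dict[str, bool]:
    """Evaluate each expected_* field present in the rollout definition."""
    results: Dict[str, bool] = {}
    if "expected_final_response_contains" in rollout:
        needle = rollout["expected_final_response_contains"]
        results["expected_final_response_contains"] = needle in traj.final_response
    if "expected_command_contains" in rollout:
        needle = rollout["expected_command_contains"]
        results["expected_command_contains"] = any(needle in cmd for cmd in traj.commands)
    if "expected_file_change" in rollout:
        results["expected_file_change"] = rollout["expected_file_change"] in traj.changed_files
    if "expected_file_contains" in rollout:
        spec = rollout["expected_file_contains"]
        content = traj.changed_files.get(spec["path"], "")
        results["expected_file_contains"] = spec["contains"] in content
    return results
```

Returning one boolean per expectation, rather than a single pass or fail, keeps enough detail to explain which check a candidate missed.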

What SkillOpt Means In CodexOpt

CodexOpt now includes SkillOpt-style discipline in the Codex workflow:

train and validation task splits
bounded edits
validation-gated acceptance
rollout-based reward when available
textual feedback that drives reflective mutation
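
As an illustration of the "bounded edits" item, the sketch below rejects a rewrite that touches too many lines. This guide does not define how CodexOpt bounds an edit, so the line-based measure, the function names, and the default limit of 10 are assumptions.

```python
from difflib import SequenceMatcher


def changed_line_count(before: str, after: str) -> int:
    """Count lines touched by the edit, using difflib's opcode ranges."""
    a, b = before.splitlines(), after.splitlines()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replace of 2 lines by 3 counts as 3 touched lines.
            changed += max(i2 - i1, j2 - j1)
    return changed


def within_edit_budget(before: str, after: str, max_changed_lines: int = 10) -> bool:
    """Hypothetical gate: accept only rewrites that stay under the line budget."""
    return changed_line_count(before, after) <= max_changed_lines
```

A small budget keeps each accepted rewrite easy to review and makes it clearer which change caused a score to move.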

For most users, the entry point is still simple:

uv run codexopt improve --live