Using CodexOpt with Codex
Use this guide when your repo already has Codex instruction files and you want CodexOpt to improve them safely.
CodexOpt works with the same files Codex loads:
- AGENTS.md
- .codex/skills/**/SKILL.md
- .agents/skills/**/SKILL.md
Start With A Preview
Run this from the repo where you use Codex:
This command:
- finds AGENTS.md and SKILL.md files
- mines starter tasks from git history and skill descriptions
- runs the reflective optimizer in preview mode
- shows what would change
- writes review artifacts under .codexopt/
The default preview stays offline. It does not spend Codex or API budget unless you ask it to.
Run The Live Codex Loop
Use live mode when you want CodexOpt to evaluate actual Codex behavior:
Live mode uses codex exec as the optimizer and judge. CodexOpt evaluates the
candidate instruction file, captures feedback from the run, proposes a focused
rewrite, and keeps the rewrite only when it improves held-out tasks.
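The acceptance rule described above can be sketched in a few lines. This is a minimal illustration, not CodexOpt's implementation: the function names and the keyword-based scorer are assumptions made for the example, and a real run would score candidates from Codex feedback rather than keyword matches.

```python
from typing import Callable, Sequence


def mean_score(instructions: str, tasks: Sequence[str],
               score: Callable[[str, str], float]) -> float:
    """Average a per-task score over a task set (0.0 for an empty set)."""
    if not tasks:
        return 0.0
    return sum(score(instructions, task) for task in tasks) / len(tasks)


def accept_if_better(current: str, candidate: str, held_out: Sequence[str],
                     score: Callable[[str, str], float]) -> str:
    """Keep the candidate only when it strictly improves the held-out score."""
    if mean_score(candidate, held_out, score) > mean_score(current, held_out, score):
        return candidate
    return current


# Toy scorer (an assumption): 1.0 when the instructions mention the task keyword.
def keyword_score(instructions: str, task: str) -> float:
    return 1.0 if task.lower() in instructions.lower() else 0.0


held_out = ["changelog", "regression"]
current = "Always update the changelog."
better = "Always update the changelog and add regression tests."
worse = "Be helpful."

print(accept_if_better(current, better, held_out, keyword_score) == better)
print(accept_if_better(current, worse, held_out, keyword_score) == current)
```

The strict comparison means a rewrite that only ties the current file is discarded, which keeps churn out of the instruction files.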
Apply The Result
After reviewing the preview, apply validated changes:
CodexOpt writes backups before changing files.
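For readers curious what a backup-before-write step looks like in general, here is a small sketch. The helper name, the backup directory, and the `.bak` suffix are all hypothetical; the guide does not say where CodexOpt stores its backups or how it names them.

```python
import shutil
import tempfile
from pathlib import Path


def write_with_backup(path: Path, new_text: str, backup_dir: Path) -> Path:
    """Copy the existing file into backup_dir, then overwrite the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / (path.name + ".bak")  # hypothetical naming scheme
    shutil.copy2(path, backup_path)  # preserves content and timestamps
    path.write_text(new_text, encoding="utf-8")
    return backup_path


# Demonstrate on a throwaway directory rather than a real repo.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "AGENTS.md"
    target.write_text("old instructions\n", encoding="utf-8")
    backup = write_with_backup(target, "new instructions\n", Path(tmp) / "backups")
    print(backup.name, "holds:", backup.read_text(encoding="utf-8").strip())
```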
Review The Report
Write a markdown report after any run:
The report shows:
- files found
- files improved
- validation score movement
- accepted reflective edits
- sampled feedback that led to the edit
- fallback notes when CodexOpt had to use a weaker signal
Step By Step Workflow
Use this flow when you want more control than the improve command gives you:
uv run codexopt init
uv run codexopt scan
uv run codexopt benchmark
uv run codexopt optimize skills --engine reflective
uv run codexopt apply --kind skills --dry-run
uv run codexopt report --output codexopt-report.md
Review the dry-run diff, then apply:
For AGENTS.md:
uv run codexopt optimize agents --engine reflective --file AGENTS.md
uv run codexopt apply --kind agents --dry-run
Add Simple Task Evidence
Task evidence tells CodexOpt what “better” means for your repo.
Create tasks.md:
- Update changelog entries for patch releases.
- Add regression tests before changing parser behavior.
- Summarize risky changes in the final response.
Reference it in codexopt.yaml:
Then run:
CodexOpt uses these tasks for train and validation splits. A candidate must improve held-out validation score before it can win.
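To make the train and validation split concrete, the sketch below parses a bullet list like the tasks.md above and holds out part of it. The parsing rule, the split fraction, and the fixed seed are assumptions chosen for illustration; CodexOpt's own splitting logic may differ.

```python
import random


def parse_tasks(markdown: str) -> list[str]:
    """Collect the text of each non-empty '- ' bullet line."""
    return [line[2:].strip() for line in markdown.splitlines()
            if line.startswith("- ") and line[2:].strip()]


def split_tasks(tasks: list[str], validation_fraction: float = 0.34,
                seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one task whenever there are two or more to split.
    n_val = max(1, round(len(shuffled) * validation_fraction)) if len(shuffled) >= 2 else 0
    return shuffled[n_val:], shuffled[:n_val]


TASKS_MD = """\
- Update changelog entries for patch releases.
- Add regression tests before changing parser behavior.
- Summarize risky changes in the final response.
"""

train, validation = split_tasks(parse_tasks(TASKS_MD))
print(len(train), "train tasks,", len(validation), "validation task(s)")
```

A fixed seed keeps the split stable between runs, so a score change reflects the candidate rather than a reshuffled validation set.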
Mine Starter Tasks
If you do not have task evidence yet, generate a starter file:
Review the generated codexopt-tasks.json, trim anything noisy, then add it to
evidence.task_files.
Add Command Rollouts
Use command rollouts when a deterministic verifier can decide whether a skill supports a workflow.
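A deterministic verifier can be very small. The sketch below shows one possible core for a script such as scripts/verify_release_skill.py: it checks that the skill text mentions a changelog and tests, and reports "ok" on success. The required terms and the message wording are assumptions for this example, not a format CodexOpt prescribes.

```python
# Hypothetical verifier logic: the required terms are illustrative only.
REQUIRED_TERMS = ("changelog", "test")


def verify(skill_text: str, required: tuple[str, ...] = REQUIRED_TERMS) -> tuple[int, str]:
    """Return (exit_code, message); the message is 'ok' only when all terms appear."""
    lowered = skill_text.lower()
    missing = [term for term in required if term not in lowered]
    if missing:
        return 1, "missing terms: " + ", ".join(missing)
    return 0, "ok"


# Example inputs standing in for the contents of a SKILL.md file.
print(verify("Update the changelog and run the tests before tagging."))
print(verify("Tag the release."))
```

Saved as a script, it would read the candidate SKILL.md from disk, print the message, and exit with the returned code. Keeping "ok" out of every failure message matters if the rollout matches on stdout containing that string.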
Create skill-rollouts.json:
[
  {
    "name": "release-skill-smoke",
    "description": "Verify the release skill mentions changelog and tests.",
    "command": "python scripts/verify_release_skill.py",
    "timeout_seconds": 30,
    "expected_stdout_contains": "ok"
  }
]
Reference it:
Run:
CodexOpt copies the repo to a temporary directory, writes the candidate
SKILL.md, runs the verifier, and uses pass rate as a strong reward signal.
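The copy, write, run, and score steps can be sketched as follows. This is an illustration under stated assumptions, not CodexOpt's code: the function names are invented, and treating a non-zero exit code or a timeout as a failure is a choice made for the example.

```python
import shlex
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_command_rollout(repo: Path, skill_relpath: str,
                        candidate_text: str, rollout: dict) -> bool:
    """Run one rollout against a temporary copy of the repo."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "repo"
        shutil.copytree(repo, workdir)  # the real repo is never modified
        skill_path = workdir / skill_relpath
        skill_path.parent.mkdir(parents=True, exist_ok=True)
        skill_path.write_text(candidate_text, encoding="utf-8")
        try:
            result = subprocess.run(
                shlex.split(rollout["command"]),
                cwd=workdir, capture_output=True, text=True,
                timeout=rollout.get("timeout_seconds", 30),
            )
        except subprocess.TimeoutExpired:
            return False  # assumption: a timeout counts as a failed rollout
        expected = rollout.get("expected_stdout_contains", "")
        return result.returncode == 0 and expected in result.stdout


def pass_rate(results: list[bool]) -> float:
    """Fraction of rollouts that passed (0.0 when there are none)."""
    return sum(results) / len(results) if results else 0.0


print(pass_rate([True, True, False, True]))
```

Each entry in skill-rollouts.json would be passed as the `rollout` dictionary, and the resulting pass rate compared between the current and candidate SKILL.md.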
Add Codex Rollouts
Use Codex rollouts when you want to test how Codex behaves with a candidate skill.
Create codex-rollouts.json:
[
  {
    "name": "codex-release-notes",
    "backend": "codex",
    "description": "Ask Codex to use the candidate release skill on a release-note task.",
    "codex_prompt": "Use the local release skill to update CHANGELOG.md for a patch release.",
    "timeout_seconds": 120,
    "expected_final_response_contains": "CHANGELOG.md",
    "expected_command_contains": "git status",
    "expected_file_change": "CHANGELOG.md",
    "expected_file_contains": {
      "path": "CHANGELOG.md",
      "contains": "Patch"
    }
  }
]
Run live mode:
CodexOpt runs codex exec --json in a temporary repo copy and records the
trajectory:
- final response
- command executions
- file changes
- token usage
- errors
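One way to picture how a recorded trajectory relates to the expected_* fields in codex-rollouts.json is the sketch below. The `Trajectory` class, its field names, and the matching rules are assumptions made for illustration; the guide does not specify how CodexOpt stores trajectories or evaluates expectations.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical container for what a Codex run produced."""
    final_response: str = ""
    commands: list[str] = field(default_factory=list)
    changed_files: dict[str, str] = field(default_factory=dict)  # path -> new content
    token_usage: int = 0
    errors: list[str] = field(default_factory=list)


def check_expectations(spec: dict, traj: Trajectory) -> dict[str, bool]:
    """Evaluate each expected_* field present in the spec against the trajectory."""
    checks: dict[str, bool] = {}
    if "expected_final_response_contains" in spec:
        checks["final_response"] = spec["expected_final_response_contains"] in traj.final_response
    if "expected_command_contains" in spec:
        needle = spec["expected_command_contains"]
        checks["command"] = any(needle in cmd for cmd in traj.commands)
    if "expected_file_change" in spec:
        checks["file_change"] = spec["expected_file_change"] in traj.changed_files
    if "expected_file_contains" in spec:
        want = spec["expected_file_contains"]
        checks["file_contains"] = want["contains"] in traj.changed_files.get(want["path"], "")
    return checks


spec = {
    "expected_final_response_contains": "CHANGELOG.md",
    "expected_command_contains": "git status",
    "expected_file_change": "CHANGELOG.md",
    "expected_file_contains": {"path": "CHANGELOG.md", "contains": "Patch"},
}
traj = Trajectory(
    final_response="Updated CHANGELOG.md for the patch release.",
    commands=["git status", "git diff"],
    changed_files={"CHANGELOG.md": "## Patch 1.0.1\n- Fix parser crash.\n"},
)
print(check_expectations(spec, traj))
```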
What SkillOpt Means In CodexOpt
CodexOpt now includes SkillOpt-style discipline in the Codex workflow:
- train and validation task splits
- bounded edits
- validation-gated acceptance
- rollout-based reward when available
- textual feedback that drives reflective mutation
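As an illustration of what a bounded edit could mean in practice, the sketch below counts changed lines between two versions of an instruction file and rejects rewrites above a budget. The line-based measure and the default budget of five lines are assumptions for this example, not CodexOpt's actual rule.

```python
import difflib


def changed_line_count(old: str, new: str) -> int:
    """Count lines touched by replace, insert, or delete operations."""
    matcher = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines(),
                                      autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed


def within_budget(old: str, new: str, max_changed_lines: int = 5) -> bool:
    """Accept only rewrites that stay inside the edit budget."""
    return changed_line_count(old, new) <= max_changed_lines


old = "Run tests.\nUpdate docs.\nTag the release."
new = "Run tests.\nUpdate the changelog and docs.\nTag the release."
print(changed_line_count(old, new), within_budget(old, new))
```

Bounding the edit size keeps each accepted rewrite small enough to review by eye and to attribute a score change to a specific edit.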
For most users, the entry point is still simple: