CodexOpt
Benchmark and improve AGENTS.md and SKILL.md for Codex with a repeatable developer workflow.
CodexOpt helps teams benchmark and optimize Codex instruction assets with a repeatable workflow.
It focuses on Codex repo-local instruction assets:
- AGENTS.md
- .codex/skills/**/SKILL.md
- .agents/skills/**/SKILL.md
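To make the file locations concrete, here is a minimal Python sketch that finds those assets with `pathlib` glob patterns. The function name `find_instruction_assets` and the constant `INSTRUCTION_PATTERNS` are hypothetical and only for illustration. This is not CodexOpt's actual scanner, which may resolve files differently.

```python
from pathlib import Path

# Glob patterns taken from the list above. The names below are illustrative
# assumptions, not part of CodexOpt's API.
INSTRUCTION_PATTERNS = (
    "AGENTS.md",
    ".codex/skills/**/SKILL.md",
    ".agents/skills/**/SKILL.md",
)


def find_instruction_assets(repo_root):
    """Return sorted repo-relative POSIX paths of instruction files under repo_root."""
    root = Path(repo_root)
    found = set()
    for pattern in INSTRUCTION_PATTERNS:
        for path in root.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(root).as_posix())
    return sorted(found)
```

Calling `find_instruction_assets(".")` from a repository root would list the root `AGENTS.md` plus any `SKILL.md` nested under the two skills directories.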
CodexOpt gives developers a practical loop:
- scan instruction assets
- benchmark their quality
- generate improved candidates
- review diffs and reports
- apply only validated improvements
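The last two steps of this loop (review a diff, then apply only what validates) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CodexOpt's implementation: `review_candidate` is a hypothetical helper, and `score` stands in for whatever benchmark function is used, where higher means better.

```python
import difflib


def review_candidate(original, candidate, score):
    """Return (accepted, diff_text) for a proposed rewrite of an instruction file.

    `score` is a caller-supplied function mapping text to a number (an
    assumption for this sketch). The candidate is accepted only if it scores
    strictly higher than the original.
    """
    diff_text = "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            candidate.splitlines(keepends=True),
            fromfile="AGENTS.md",
            tofile="AGENTS.md (candidate)",
        )
    )
    accepted = score(candidate) > score(original)
    return accepted, diff_text
```

Requiring a strict improvement means a candidate that merely reshuffles text without raising the score is never applied.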
Why CodexOpt?
Most teams maintain AGENTS.md and SKILL.md manually. Over time these files drift:
- duplicated rules
- contradictory instructions
- missing verification guidance
- weak skill triggers
- prompt bloat
CodexOpt makes those problems measurable and easier to improve safely.
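As a rough illustration of what "measurable" can mean here, the following Python sketch counts duplicated bullet rules and total words in an instruction file. The function names and the metrics chosen are assumptions made for this example. CodexOpt's own checks are not shown in this document and may differ.

```python
import re
from collections import Counter


def normalize_rule(line):
    """Strip the bullet marker, lowercase, and collapse punctuation and whitespace."""
    text = re.sub(r"^\s*[-*]\s+", "", line)
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def drift_metrics(markdown_text):
    """Count bullet rules, duplicated rules, and total words (a crude bloat signal)."""
    bullets = [
        line
        for line in markdown_text.splitlines()
        if re.match(r"^\s*[-*]\s+\S", line)
    ]
    counts = Counter(normalize_rule(line) for line in bullets)
    duplicates = sorted(rule for rule, n in counts.items() if n > 1)
    return {
        "rules": len(bullets),
        "duplicated_rules": duplicates,
        "word_count": len(markdown_text.split()),
    }
```

Normalizing before comparing lets the sketch catch rules that differ only in case or trailing punctuation. Contradictory instructions and weak skill triggers need semantic checks that are beyond this example.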
What It Does
- scans a repo for agent and skill instruction files
- benchmarks them with static checks plus optional task / issue evidence
- improves them with heuristic cleanup or the reflective engine inspired by SkillOpt and GEPA
- records artifacts under .codexopt/
- generates markdown reports for review and PR discussion
Why Developers Use It
Instruction files tend to drift long before teams notice:
- duplicated rules
- contradictory constraints
- weak testing guidance
- vague skill triggers
- prompt bloat
CodexOpt gives developers a way to improve those files with something closer to a normal engineering loop than ad hoc prompt editing.
Demo Repository
If you want a small example repo with intentionally messy instructions, use the companion demo:
- Demo repo: https://github.com/SuperagenticAI/codexopt-demo
- Demo guide: see the demo walkthrough
Try It
uv sync --extra dev
uv run codexopt init
uv run codexopt improve
uv run codexopt improve --live
uv run codexopt report --output codexopt-report.md
If you want a guided example with sample inputs, evidence files, and ready-made commands, start with the demo walkthrough.