CodexOpt

Benchmark and improve AGENTS.md and SKILL.md for Codex with a repeatable developer workflow.

Targeted: Focused on repo-local Codex assets, AGENTS.md and SKILL.md.

Measurable: Score instruction quality, attach evidence, and review artifact-backed changes.

Practical: Preview, run live Codex checks, review, apply, and report from a single CLI.

CodexOpt helps teams benchmark and optimize Codex instruction assets with a repeatable workflow.

It focuses on Codex repo-local instruction assets: AGENTS.md and SKILL.md.

CodexOpt gives developers a practical loop: preview, run live Codex checks, review, apply, and report.

Why CodexOpt?

Most teams maintain AGENTS.md and SKILL.md manually, and over time these files drift.

CodexOpt makes that drift measurable and easier to fix safely.

CodexOpt:

scans a repo for agent and skill instruction files
benchmarks them with static checks plus optional task and issue evidence
improves them with heuristic cleanup or the reflective engine inspired by SkillOpt and GEPA
records artifacts under .codexopt/
generates Markdown reports for review and PR discussion
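To make the scan step concrete, here is a minimal sketch of what locating instruction files in a repo could look like. It is not CodexOpt's actual scanner: the function name `find_instruction_files` is hypothetical, and it assumes that instruction files are identified purely by the file names AGENTS.md and SKILL.md.

```shell
# Hypothetical helper, not part of CodexOpt: list every AGENTS.md and
# SKILL.md under a directory (default: current directory), skipping .git.
find_instruction_files() {
  root="${1:-.}"
  # -prune stops find from descending into .git; the second branch
  # prints regular files whose name matches either instruction file.
  find "$root" -type d -name .git -prune -o \
    -type f \( -name 'AGENTS.md' -o -name 'SKILL.md' \) -print | sort
}
```

Calling `find_instruction_files path/to/repo` prints one matching path per line, which gives a quick view of what a scan would need to cover before any benchmarking happens.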

Instruction files tend to drift long before teams notice.

CodexOpt gives developers a way to improve those files with something closer to a normal engineering loop than ad hoc prompt editing.

If you want a small example repo with intentionally messy instructions, use the companion demo.

uv sync --extra dev
uv run codexopt init
uv run codexopt improve
uv run codexopt improve --live
uv run codexopt report --output codexopt-report.md

If you want a guided example with sample inputs, evidence files, and ready-made commands, start with the demo walkthrough.