SkillOpt Rollouts
This page documents the SkillOpt-inspired optimization work added to CodexOpt.
What Changed
CodexOpt now has a skillopt optimization engine for SKILL.md files.
The engine keeps the existing CodexOpt workflow, but adds SkillOpt-style controls:
- train/validation splitting for task evidence
- bounded skill edits through an edit budget
- validation-gated candidate acceptance
- rejected-candidate tracking
- deterministic candidate proposal
- executable rollout validation when JSON rollout tasks are configured
This makes skill optimization more rigorous than static instruction cleanup alone. A candidate skill has to improve held-out validation evidence before it can become the selected candidate.
Command
For the maintained reflective optimizer, use:
Use skillopt when you want a single-shot validation gate. Use reflective or
improve --live when you want feedback-driven mutation.
Configuration
optimization:
  engine: "skillopt"
  max_metric_calls: 60
  skillopt_train_ratio: 0.67
  skillopt_edit_budget: 24
  skillopt_validation_delta: 0.01
Settings:
- skillopt_train_ratio: fraction of task or rollout evidence used for candidate proposal.
- skillopt_edit_budget: maximum line edit operations allowed for a candidate.
- skillopt_validation_delta: minimum held-out validation gain required for acceptance.
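To make the effect of skillopt_train_ratio concrete, here is a minimal sketch of an order-preserving split. The function name split_evidence and the rounding rule are assumptions for illustration; CodexOpt's actual split logic may differ.

```python
def split_evidence(items, train_ratio=0.67):
    """Split evidence into (train, validation) without reordering.

    Hypothetical helper: the rounding and clamping rules are assumptions,
    not CodexOpt's confirmed behavior.
    """
    items = list(items)
    if len(items) < 2:
        # Nothing can be held out, so there is no validation slice.
        return items, []
    cut = round(len(items) * train_ratio)
    # Keep at least one item on each side of the split.
    cut = max(1, min(len(items) - 1, cut))
    return items[:cut], items[cut:]


train, validation = split_evidence(["task-a", "task-b", "task-c"], 0.67)
print(len(train), len(validation))
```

Under these assumptions, two tasks split into one train and one validation task, which is the smallest input that leaves any held-out evidence.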
Evidence Modes
skillopt supports two validation modes.
Static Evidence Gate
Markdown task files and issue files still work as before. CodexOpt uses them to build task keywords, score repository alignment, and split task evidence into train and validation slices.
Example:
In this mode, skillopt gates candidates using CodexOpt's existing quality score over held-out
task evidence.
Executable Rollout Gate
JSON task files can define executable verifier commands. When at least two rollout tasks are
available, skillopt splits those tasks into train and validation sets and uses validation pass
rate as the acceptance score.
Example skill-rollouts.json:
[
  {
    "name": "skill-smoke",
    "description": "Verify the skill supports release validation.",
    "command": ["python", "scripts/verify_skill.py"],
    "timeout_seconds": 30,
    "expected_stdout_contains": "ok"
  },
  {
    "name": "skill-regression",
    "description": "Verify the skill handles regression checks.",
    "command": "python scripts/verify_regression_skill.py",
    "timeout_seconds": 30
  }
]
Codex task rollouts can be defined with backend: "codex" and codex_prompt:
[
  {
    "name": "codex-release-task",
    "backend": "codex",
    "description": "Ask Codex to use the candidate skill on a release task.",
    "codex_prompt": "Read the release skill and update CHANGELOG.md for a patch release.",
    "timeout_seconds": 120,
    "expected_final_response_contains": "CHANGELOG.md",
    "expected_command_contains": "git status",
    "expected_file_change": "CHANGELOG.md",
    "expected_file_contains": {
      "path": "CHANGELOG.md",
      "contains": "Patch"
    }
  }
]
Configure it like this:
For each candidate, CodexOpt:
- Copies the repository to a temporary directory.
- Writes the candidate SKILL.md into the copied repo.
- Runs each rollout command from the temporary repo root.
- Records pass/fail details.
- Uses held-out rollout pass rate as the validation gate.
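The steps above can be sketched roughly as follows. This is an illustration only: the function name rollout_pass_rate, its parameters, and the rule that exit code 0 means "pass" are assumptions, and it handles argv-list commands only.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def rollout_pass_rate(repo_root, skill_rel_path, candidate_text, commands, timeout=30):
    """Run argv-list commands against a throwaway copy of the repo.

    Hypothetical helper. Returns (pass_rate, results); the original
    repository is never modified.
    """
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        shutil.copytree(repo_root, work)  # copy the repository
        target = work / skill_rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(candidate_text, encoding="utf-8")  # write candidate SKILL.md
        for argv in commands:
            try:
                proc = subprocess.run(
                    argv, cwd=work, capture_output=True, text=True, timeout=timeout
                )
                passed = proc.returncode == 0  # assumption: exit code 0 means pass
            except subprocess.TimeoutExpired:
                passed = False
            results.append({"command": argv, "passed": passed})
    rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return rate, results
```

Running against a copy keeps a failing or destructive verifier from touching the working tree, which is why the candidate is written into the temporary repo rather than in place.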
Rollout Task Fields
Required:
- command: command to run. It can be a string or a list of argv parts.
Optional:
- name: readable task name used in artifacts.
- description: task statement used for evidence text.
- timeout_seconds: per-task timeout. Defaults to 30 seconds.
- expected_stdout_contains: required substring in stdout.
- expected_stderr_contains: required substring in stderr.
- expected_file_contains: object or list of objects with path and contains, checked after the command or Codex run.
- backend: use "codex" for a Codex non-interactive rollout.
- codex_prompt: prompt passed to codex exec when backend is "codex".
- codex_binary: optional Codex executable path. Defaults to codex.
- codex_args: optional full argument list inserted between the executable and prompt. Defaults to exec --skip-git-repo-check --sandbox workspace-write --ask-for-approval never --json --.
- expected_final_response_contains: required substring in Codex's final agent message.
- expected_command_contains: required substring in a command recorded by Codex JSONL events.
- expected_file_change: required substring in a file path recorded by Codex JSONL file-change events.
String commands run through the shell. List commands run without a shell.
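As an illustration of how these fields could be evaluated for a single command task, consider the sketch below. The function name run_command_task, the result shape, and the rule that a non-zero exit code fails the task are assumptions, not CodexOpt's confirmed implementation.

```python
import subprocess
from pathlib import Path


def run_command_task(task, cwd):
    """Run one command rollout task and return {"passed": bool, "reasons": [...]}."""
    command = task["command"]
    try:
        proc = subprocess.run(
            command,
            shell=isinstance(command, str),  # string -> shell, list -> no shell
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=task.get("timeout_seconds", 30),
        )
    except subprocess.TimeoutExpired:
        return {"passed": False, "reasons": ["timed out"]}

    reasons = []
    # Assumption: a non-zero exit code fails the task.
    if proc.returncode != 0:
        reasons.append(f"exit code {proc.returncode}")
    for field, output in (
        ("expected_stdout_contains", proc.stdout),
        ("expected_stderr_contains", proc.stderr),
    ):
        expected = task.get(field)
        if expected is not None and expected not in output:
            reasons.append(f"{field} not satisfied")

    # expected_file_contains may be one object or a list of objects.
    checks = task.get("expected_file_contains") or []
    if isinstance(checks, dict):
        checks = [checks]
    for check in checks:
        path = Path(cwd) / check["path"]
        if not path.is_file() or check["contains"] not in path.read_text(encoding="utf-8"):
            reasons.append(f"{check['path']} missing expected text")
    return {"passed": not reasons, "reasons": reasons}
```

Preferring the list form avoids shell quoting issues; the string form is convenient when the verifier needs pipes or shell expansion.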
For Codex rollouts, CodexOpt parses codex exec --json output into codex_trajectory metadata
with final response text, command executions, file changes, token usage, and errors.
Artifact Output
SkillOpt metadata is written into optimize.json under each file result:
{
  "metadata": {
    "skillopt": {
      "train_task_count": 1,
      "validation_task_count": 1,
      "baseline_validation_score": 0.0,
      "rollout_gate_used": true,
      "accepted_count": 1,
      "rejected_count": 2,
      "rejected_candidates": []
    }
  }
}
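For quick inspection, a file result shaped like the example above could be condensed into one line. This is a sketch: summarize_skillopt is a hypothetical helper, and it assumes only the keys shown in the example, not the rest of the optimize.json layout.

```python
import json


def summarize_skillopt(file_result):
    """Return a one-line summary of the skillopt metadata in one file result."""
    meta = file_result.get("metadata", {}).get("skillopt")
    if meta is None:
        return "no skillopt metadata"
    gate = "rollout" if meta.get("rollout_gate_used") else "static"
    return (
        f"{gate} gate, "
        f"{meta.get('train_task_count', 0)} train / "
        f"{meta.get('validation_task_count', 0)} validation tasks, "
        f"baseline {meta.get('baseline_validation_score', 0.0):.2f}, "
        f"{meta.get('accepted_count', 0)} accepted, "
        f"{meta.get('rejected_count', 0)} rejected"
    )


example = json.loads("""
{"metadata": {"skillopt": {"train_task_count": 1, "validation_task_count": 1,
 "baseline_validation_score": 0.0, "rollout_gate_used": true,
 "accepted_count": 1, "rejected_count": 2, "rejected_candidates": []}}}
""")
print(summarize_skillopt(example))
```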
Each candidate also includes details such as:
- train score
- validation score
- static score
- rollout score
- edit operations
- acceptance flag
- rejection reasons
- rollout command results
The CLI summary and markdown report also show whether rollout gating was used.
Acceptance Rules
A candidate is rejected if:
- it exceeds skillopt_edit_budget
- its validation score does not improve by at least skillopt_validation_delta
When rollout tasks are available, validation score means held-out rollout pass rate. Otherwise, validation score means CodexOpt's static quality score over held-out task evidence.
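The two rejection rules can be sketched as a small check. How CodexOpt actually counts "line edit operations" is not specified here; the difflib-based count and the function names below are assumptions for illustration.

```python
import difflib


def count_line_edits(original, candidate):
    """Count changed lines between two texts (assumed edit-operation metric)."""
    a = original.splitlines()
    b = candidate.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced block counts as the larger of its two sides.
            edits += max(i2 - i1, j2 - j1)
    return edits


def rejection_reasons(original, candidate, baseline_score, candidate_score,
                      edit_budget=24, validation_delta=0.01):
    """Return a list of rejection reasons; an empty list means accepted."""
    reasons = []
    edits = count_line_edits(original, candidate)
    if edits > edit_budget:
        reasons.append(f"edit budget exceeded: {edits} > {edit_budget}")
    if candidate_score - baseline_score < validation_delta:
        reasons.append("validation gain below skillopt_validation_delta")
    return reasons
```

For example, with a baseline validation score of 0.0 and a candidate score of 0.5, a candidate within the edit budget produces no rejection reasons under this sketch.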
Current Boundary
Codex-backed rollouts use codex exec --json and parse the resulting JSONL event stream into
trajectory metadata. They are suitable for validating concrete Codex workflows, but CodexOpt does
not yet run a full multi-epoch live Codex training loop with trajectory reflection and patch
generation from those traces. Today, Codex trajectories are used for validation and artifact
inspection.
Files Changed
Implementation:
- src/codexopt/optimizer.py
- src/codexopt/patches.py
- src/codexopt/rollouts.py
- src/codexopt/quality.py
- src/codexopt/benchmark.py
- src/codexopt/cli.py
- src/codexopt/config.py
- src/codexopt/reporter.py
- src/codexopt/types.py
Tests:
- tests/test_cli.py
Docs and examples:
- README.md
- codexopt.example.yaml
- docs/configuration.md
- docs/optimization.md
- docs/benchmarking.md
- docs/skillopt-rollouts.md