Evolution Engine¶
The Evolution Engine automatically improves agent prompts using GEPA-style optimization based on evaluation results.
What is Evolution?¶
Evolution is the process of automatically improving agent prompts based on evaluation results. It uses GEPA (Genetic-Pareto) to generate better prompts through reflective mutation.
How Evolution Works¶
1. Analyze Failures¶
The engine analyzes evaluation failures:
2. Generate Proposals¶
It generates improved prompt proposals using GEPA:
proposed_prompt = gepa_proposer.propose_improved_prompt(
current_prompt=current_prompt,
reflective_dataset=failures
)
3. Test Candidates¶
It tests candidate prompts:
4. Select Best¶
It selects the best-performing prompt:
Using the Evolution Engine¶
Basic Usage¶
from codeoptix.evolution import EvolutionEngine
from codeoptix.evaluation import EvaluationEngine
from codeoptix.reflection import ReflectionEngine
from codeoptix.artifacts import ArtifactManager
# Create engines
eval_engine = EvaluationEngine(adapter, llm_client)
reflection_engine = ReflectionEngine(artifact_manager)
artifact_manager = ArtifactManager()
# Create evolution engine
evolution_engine = EvolutionEngine(
adapter=adapter,
evaluation_engine=eval_engine,
llm_client=llm_client,
artifact_manager=artifact_manager
)
# Evolve prompts
evolved = evolution_engine.evolve(
evaluation_results=results,
reflection=reflection_content,
behavior_names=["insecure-code"]
)
With Configuration¶
config = {
"max_iterations": 3, # Number of evolution iterations
"population_size": 3, # Number of candidates per iteration
"minibatch_size": 2, # Scenarios per evaluation
"improvement_threshold": 0.05, # Minimum improvement to accept
"proposer": {
"use_gepa": True, # Use GEPA for proposal
"model": "gpt-5.2"
}
}
evolution_engine = EvolutionEngine(
adapter, eval_engine, llm_client, artifact_manager, config=config
)
Evolution Process¶
Iteration 1¶
- Generate 3 candidate prompts
- Evaluate each on minibatch
- Select best candidate
- If improved, accept; otherwise stop
Iteration 2¶
- Generate new candidates from best
- Evaluate again
- Continue until no improvement
Final Result¶
Best prompt is saved and applied to adapter.
Evolved Prompts Structure¶
{
"agent": "codex",
"prompts": {
"system_prompt": "Improved prompt text..."
},
"metadata": {
"iterations": 2,
"initial_score": 0.65,
"final_score": 0.85,
"improvement": 0.20,
"evolution_history": [...]
}
}
GEPA Integration¶
The Evolution Engine uses GEPA for prompt proposal:
GEPA uses: - Reflective Dataset: Failure examples - Instruction Proposal: LLM-based prompt generation - Iterative Improvement: Multiple refinement rounds
Configuration Options¶
Evolution Parameters¶
{
"max_iterations": 3, # Maximum iterations
"population_size": 3, # Candidates per iteration
"minibatch_size": 2, # Scenarios per evaluation
"improvement_threshold": 0.05 # Minimum improvement
}
Proposer Configuration¶
{
"proposer": {
"use_gepa": True, # Use GEPA
"model": "gpt-5.2", # LLM model
"temperature": 0.7 # Generation temperature
}
}
Best Practices¶
1. Start with Evaluation¶
Always evaluate before evolving:
2. Generate Reflection¶
Generate reflection for better evolution:
3. Focus on Specific Behaviors¶
Evolve for specific behaviors:
4. Review Evolved Prompts¶
Always review evolved prompts:
Evolution History¶
The evolution history tracks progress:
history = evolved["metadata"]["evolution_history"]
for iteration in history:
print(f"Iteration {iteration['iteration']}:")
print(f" Score: {iteration['best_score']:.2f}")
print(f" Improvement: {iteration['improvement']:.2f}")
Limitations¶
1. Iteration Limit¶
Evolution stops after max_iterations or when no improvement is found.
2. Minibatch Evaluation¶
Uses minibatch for speed, may not reflect full performance.
3. LLM Dependency¶
Requires LLM access for prompt generation.
Next Steps¶
- GEPA Integration - Learn about GEPA
- Python API Guide - Advanced usage
- CLI Usage - Command-line usage