
Evolution Engine

The Evolution Engine automatically improves agent prompts using GEPA-style optimization based on evaluation results.


What is Evolution?

Evolution is the process of automatically improving agent prompts based on evaluation results. It uses GEPA (Genetic-Pareto) to generate better prompts through reflective mutation.


How Evolution Works

1. Analyze Failures

The engine analyzes evaluation failures:

failures = extract_failures(evaluation_results)
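
The exact shape of the extracted failures depends on the adapter and evaluation results, but conceptually this step keeps only the scenarios that did not pass. A minimal sketch of what such a helper could look like; the field names ("passed", "scenario", "output", "feedback") are assumptions, not the actual codeoptix schema:

def extract_failures(evaluation_results):
    # Keep only the scenarios that did not pass; these become the
    # reflective dataset used for prompt improvement.
    return [
        {
            "scenario": result["scenario"],
            "output": result["output"],
            "feedback": result.get("feedback", ""),
        }
        for result in evaluation_results
        if not result["passed"]
    ]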

2. Generate Proposals

It generates improved prompt proposals using GEPA:

proposed_prompt = gepa_proposer.propose_improved_prompt(
    current_prompt=current_prompt,
    reflective_dataset=failures
)

3. Test Candidates

It tests candidate prompts:

scores = []
for candidate in candidates:
    scores.append(evaluate_candidate(candidate))

4. Select Best

It selects the best-performing prompt:

best_prompt = select_best(candidates, scores)
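
Selection can be as simple as taking the highest-scoring candidate. A minimal sketch, assuming candidates and scores are parallel lists (the engine's actual selection logic may differ):

def select_best(candidates, scores):
    # Pick the candidate whose evaluation score is highest.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best_index]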

Using the Evolution Engine

Basic Usage

from codeoptix.evolution import EvolutionEngine
from codeoptix.evaluation import EvaluationEngine
from codeoptix.reflection import ReflectionEngine
from codeoptix.artifacts import ArtifactManager

# Create supporting engines (adapter and llm_client are assumed to be set up already)
artifact_manager = ArtifactManager()
eval_engine = EvaluationEngine(adapter, llm_client)
reflection_engine = ReflectionEngine(artifact_manager)

# Create evolution engine
evolution_engine = EvolutionEngine(
    adapter=adapter,
    evaluation_engine=eval_engine,
    llm_client=llm_client,
    artifact_manager=artifact_manager
)

# Evolve prompts
evolved = evolution_engine.evolve(
    evaluation_results=results,
    reflection=reflection_content,
    behavior_names=["insecure-code"]
)

With Configuration

config = {
    "max_iterations": 3,      # Number of evolution iterations
    "population_size": 3,     # Number of candidates per iteration
    "minibatch_size": 2,      # Scenarios per evaluation
    "improvement_threshold": 0.05,  # Minimum improvement to accept
    "proposer": {
        "use_gepa": True,     # Use GEPA for proposal
        "model": "gpt-5.2"
    }
}

evolution_engine = EvolutionEngine(
    adapter, eval_engine, llm_client, artifact_manager, config=config
)

Evolution Process

Iteration 1

  1. Generate 3 candidate prompts
  2. Evaluate each on minibatch
  3. Select best candidate
  4. If improved, accept; otherwise stop

Iteration 2

  1. Generate new candidates from best
  2. Evaluate again
  3. Continue until no improvement

Final Result

Best prompt is saved and applied to adapter.
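
The process above can be summarized as a simple loop. The sketch below is illustrative only; the helper names (generate_candidates, evaluate_on_minibatch) are assumptions, not the engine's real API, while the config keys match the configuration options documented below:

# Illustrative control flow; helper names are hypothetical.
best_prompt, best_score = current_prompt, baseline_score

for iteration in range(config["max_iterations"]):
    candidates = generate_candidates(best_prompt, failures, n=config["population_size"])
    scores = [evaluate_on_minibatch(c, config["minibatch_size"]) for c in candidates]
    top_score = max(scores)
    if top_score - best_score < config["improvement_threshold"]:
        break  # no meaningful improvement: stop early
    best_prompt = candidates[scores.index(top_score)]
    best_score = top_score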


Evolved Prompts Structure

{
    "agent": "codex",
    "prompts": {
        "system_prompt": "Improved prompt text..."
    },
    "metadata": {
        "iterations": 2,
        "initial_score": 0.65,
        "final_score": 0.85,
        "improvement": 0.20,
        "evolution_history": [...]
    }
}
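
The metadata can be inspected directly; for example, the improvement field is the difference between the final and initial scores (0.85 - 0.65 = 0.20 in the example above):

metadata = evolved["metadata"]
print(f"Score: {metadata['initial_score']:.2f} -> {metadata['final_score']:.2f}")
print(f"Improvement: {metadata['improvement']:.2f}")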

GEPA Integration

The Evolution Engine uses GEPA for prompt proposal:

config = {
    "proposer": {
        "use_gepa": True,  # Enable GEPA
        "model": "gpt-5.2"
    }
}

GEPA uses:

- Reflective Dataset: failure examples
- Instruction Proposal: LLM-based prompt generation
- Iterative Improvement: multiple refinement rounds
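
The reflective dataset is built from failure examples. Below is a hypothetical illustration of a single entry; the exact field names used by GEPA and codeoptix may differ:

# Hypothetical example of a reflective dataset entry.
reflective_dataset = [
    {
        "input": "Write a function that stores a user's password",
        "output": "def store_password(pw): open('pw.txt', 'w').write(pw)",
        "feedback": "Fails the insecure-code behavior: password written in plaintext",
    }
]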


Configuration Options

Evolution Parameters

{
    "max_iterations": 3,           # Maximum iterations
    "population_size": 3,           # Candidates per iteration
    "minibatch_size": 2,            # Scenarios per evaluation
    "improvement_threshold": 0.05   # Minimum improvement
}

Proposer Configuration

{
    "proposer": {
        "use_gepa": True,           # Use GEPA
        "model": "gpt-5.2",          # LLM model
        "temperature": 0.7         # Generation temperature
    }
}

Best Practices

1. Start with Evaluation

Always evaluate before evolving:

results = eval_engine.evaluate_behaviors(...)

2. Generate Reflection

Generate reflection for better evolution:

reflection = reflection_engine.reflect(results)

3. Focus on Specific Behaviors

Evolve for specific behaviors:

evolved = evolution_engine.evolve(
    results,
    reflection,
    behavior_names=["insecure-code"]  # Focus
)

4. Review Evolved Prompts

Always review evolved prompts:

print(evolved["prompts"]["system_prompt"])
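
Putting these practices together, a typical workflow evaluates, reflects, evolves for a targeted behavior, and then reviews the result. A condensed sketch, reusing the engines from Basic Usage (the evaluate_behaviors arguments are elided as above):

# End-to-end workflow: evaluate, reflect, evolve, review.
results = eval_engine.evaluate_behaviors(...)
reflection = reflection_engine.reflect(results)
evolved = evolution_engine.evolve(
    results,
    reflection,
    behavior_names=["insecure-code"],
)
print(evolved["prompts"]["system_prompt"])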

Evolution History

The evolution history tracks progress:

history = evolved["metadata"]["evolution_history"]

for iteration in history:
    print(f"Iteration {iteration['iteration']}:")
    print(f"  Score: {iteration['best_score']:.2f}")
    print(f"  Improvement: {iteration['improvement']:.2f}")

Limitations

1. Iteration Limit

Evolution stops after max_iterations or when no improvement is found.

2. Minibatch Evaluation

Candidates are scored on a minibatch of scenarios for speed, so minibatch scores may not reflect performance on the full scenario set.

3. LLM Dependency

Requires LLM access for prompt generation.


Next Steps