# Prompt Optimization

## What is Prompt Optimization?

Prompt optimization is the process of improving an agent's core instructions, persona definition, reasoning patterns, and response formatting to produce better outputs. This is the foundation layer of agent optimization.

**Key Insight**: GEPA doesn't just tweak prompt text. It learns strategies for how to structure instructions, when to use reasoning, and how to format responses for maximum clarity and effectiveness.

## What GEPA Optimizes in Prompts

### 1. Persona and Role Definition

**What It Is**: The agent's identity, role, and behavioral guidelines

**What GEPA Learns**:
- Role clarity and specificity
- Goal articulation
- Communication style
- Domain expertise framing

**Example Configuration**:
```yaml
spec:
  persona:
    role: Senior Software Engineer & Security Reviewer
    goal: Provide thorough, actionable code reviews
    traits:
      - detail-oriented
      - security-conscious
      - constructive
```
**Before Optimization**:

```yaml
role: Code Reviewer
goal: Review code
```

**After GEPA Optimization**:

```yaml
role: Senior Software Engineer & Security Reviewer
goal: >-
  Provide thorough, actionable code reviews that improve security,
  performance, and maintainability with specific solutions
traits: [detail-oriented, security-conscious, constructive, pragmatic]
```

**Impact**: More focused, professional reviews with clear authority
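To make the effect concrete, here is a minimal sketch of how a persona block might be rendered into a system-prompt preamble. The `render_persona` helper and its template are hypothetical, not the framework's actual rendering logic:

```python
def render_persona(persona: dict) -> str:
    """Render a persona block into a system-prompt preamble (hypothetical template)."""
    traits = ", ".join(persona.get("traits", []))
    return (
        f"You are a {persona['role']}.\n"
        f"Your goal: {persona['goal']}.\n"
        f"You are {traits}."
    )

persona = {
    "role": "Senior Software Engineer & Security Reviewer",
    "goal": "Provide thorough, actionable code reviews",
    "traits": ["detail-oriented", "security-conscious", "constructive"],
}
print(render_persona(persona))
```

The more specific the rendered preamble, the less the model has to guess about scope and tone.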
### 2. Task Instructions

**What It Is**: Core instructions for each agent task

**What GEPA Learns**:
- Instruction clarity
- Step specification
- Output requirements
- Success criteria

**Example**:
```yaml
tasks:
  - name: review_code
    instruction: |
      Analyze the code for:
      1. Security vulnerabilities (SQL injection, XSS, hardcoded secrets)
      2. Performance issues (O(n²) complexity, inefficient loops)
      3. Code quality (cyclomatic complexity, duplication, naming)

      For each issue:
      - Identify the specific line/pattern
      - Explain why it's problematic
      - Provide a concrete solution with code example
      - Cite relevant documentation or standards
```
**Before Optimization**:

```yaml
instruction: Review the code and find problems
```

**After GEPA Optimization**:

```yaml
instruction: |
  Analyze code systematically:
  1. Security: Check for OWASP Top 10 vulnerabilities
  2. Performance: Identify O(n²) or worse complexity
  3. Quality: Calculate cyclomatic complexity (threshold: 4)

  For each finding:
  - Specify exact line and pattern
  - Explain impact (security risk, performance cost, etc.)
  - Provide executable solution with code
  - Reference standards (OWASP, PEP 8, etc.)
```

**Impact**: Structured, comprehensive reviews vs. vague suggestions
### 3. Reasoning Patterns

**What It Is**: Chain-of-thought, step-by-step thinking process

**What GEPA Learns**:
- When to use chain-of-thought
- How many reasoning steps
- Depth of analysis
- Thinking structure

**Example**:
```yaml
reasoning:
  style: chain_of_thought
  steps:
    - Scan code for security patterns
    - Calculate complexity metrics
    - Check against best practices
    - Prioritize findings by severity
    - Formulate actionable recommendations
```
**Before Optimization**: The agent reasons implicitly, producing inconsistent analysis.

**After GEPA Optimization**: The agent follows explicit reasoning steps:

1. Scan for security patterns (SQL concatenation, hardcoded secrets)
2. Run complexity analysis (cyclomatic complexity, nesting depth)
3. Check best practices (naming, error handling, DRY)
4. Prioritize by severity (critical > high > medium > low)
5. Generate solutions with code examples

**Impact**: Consistent, thorough analysis every time
### 4. Response Formatting

**What It Is**: How the agent structures its output

**What GEPA Learns**:
- Response organization
- Level of detail
- Code example formatting
- Citation style

**Example Output Structure**:
```yaml
output_fields:
  - name: review
    type: str
    description: Comprehensive code review with structured findings
  - name: severity
    type: str
    description: Overall severity (critical, high, medium, low)
  - name: recommendations
    type: list
    description: Prioritized action items
```
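For reference, here is what this schema might correspond to on the Python side; a minimal sketch, not the framework's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ReviewOutput:
    """Python mirror of the output_fields schema above (illustrative only)."""
    review: str                 # comprehensive code review with structured findings
    severity: str               # one of: critical, high, medium, low
    recommendations: list[str] = field(default_factory=list)  # prioritized action items
```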
**Before Optimization**:

"Your code has some issues. Check for SQL injection and complexity."

**After GEPA Optimization**:

````
SEVERITY: CRITICAL

FINDINGS:

1. SQL Injection Vulnerability (CRITICAL)
   Line: query = "SELECT * FROM users WHERE name = '" + username + "'"
   Issue: User input concatenated directly into SQL query
   Solution:
   ```python
   query = "SELECT * FROM users WHERE name = ?"
   result = db.execute(query, (username,))
   ```
   Reference: OWASP Top 10 #1

2. High Cyclomatic Complexity (MEDIUM)
   Function: login()
   Complexity: 8 (threshold: 4)
   Issue: Too many nested conditions
   Solution: Use early returns
   ```python
   if not result:
       return False
   if not result['active']:
       return False
   return True
   ```

RECOMMENDATIONS:
1. Fix SQL injection immediately (security risk)
2. Refactor login() to reduce complexity (maintainability)
````

**Impact**: Actionable, professional reviews vs. vague suggestions
## Before/After Comparison

### Scenario: Code Review Request

**Input Code**:

```python
def get_user(username):
    query = "SELECT * FROM users WHERE name = '" + username + "'"
    return db.execute(query)
```
**Before Prompt Optimization**:

"Your code might have security issues. Consider using better practices."

**After GEPA Prompt Optimization**:

````
SEVERITY: CRITICAL

FINDING: SQL Injection Vulnerability
Line: query = "SELECT * FROM users WHERE name = '" + username + "'"
Issue: User input (username) is concatenated directly into SQL query without
sanitization, allowing attackers to inject malicious SQL code.

Attack Example:
username = "admin' OR '1'='1"
→ Returns all users, bypassing authentication

Solution:
```python
def get_user(username):
    query = "SELECT * FROM users WHERE name = ?"
    return db.execute(query, (username,))
```

Why This Works:
- Parameterized queries prevent SQL injection
- Database driver handles escaping automatically
- Industry standard (OWASP recommendation)

Reference: OWASP Top 10 - Injection
````

The optimized review:
- Identifies the specific issue
- Explains the attack scenario
- Provides an executable solution
- Cites relevant standards

**Improvement**: From a vague caution to a specific, actionable review
---
## How GEPA Learns Prompt Strategies
### The Optimization Process
1. **Analysis Phase**
   - GEPA evaluates agent responses
   - Identifies vague or incomplete reviews
   - Notes missing elements (severity, solutions, citations)

2. **Reflection Phase**

   ```
   GEPA Reflection:
   "The agent identified 'security issue' but didn't specify SQL injection.
   It didn't provide a code solution or cite OWASP standards.
   Need more specific instructions for security findings."
   ```

3. **Mutation Phase**
   - Generates improved prompt variations
   - Tests: "Always specify exact vulnerability type (SQL injection, XSS, etc.)"
   - Tests: "Provide executable code solutions"
   - Tests: "Cite security standards (OWASP, CWE)"

4. **Selection Phase**
   - Evaluates each variation
   - Selects best-performing prompts
   - Keeps improvements, discards regressions

5. **Iteration**
   - Repeats process
   - Compounds improvements
   - Converges on optimal prompts
**Result**: Learned prompt strategies, not just text tweaks
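In code, the process is roughly an evaluate-reflect-mutate-select loop. The sketch below is a simplification with stub helpers, not GEPA's actual implementation; a real run uses a reflection LM and a task metric where the stubs use keyword checks:

```python
def evaluate(prompt: str, examples: list[tuple[str, str]]) -> float:
    """Stub metric: fraction of expected keywords the prompt asks for."""
    return sum(kw in prompt for _, kw in examples) / max(len(examples), 1)

def reflect(prompt: str, examples: list[tuple[str, str]]) -> list[str]:
    """Stub reflection: requirements the prompt never mentions (a real system asks an LM)."""
    return [kw for _, kw in examples if kw not in prompt]

def mutate(prompt: str, feedback: list[str]) -> list[str]:
    """Stub mutation: variations addressing the feedback (a real system rewrites with an LM)."""
    return [prompt + f"\nAlways check for: {kw}." for kw in feedback]

def optimize(prompt: str, examples: list[tuple[str, str]], iterations: int = 5) -> str:
    best, best_score = prompt, evaluate(prompt, examples)
    for _ in range(iterations):
        for candidate in mutate(best, reflect(best, examples)):
            score = evaluate(candidate, examples)
            if score > best_score:  # keep improvements, discard regressions
                best, best_score = candidate, score
    return best

examples = [("code with string-built SQL", "SQL injection"),
            ("deeply nested login()", "cyclomatic complexity")]
print(optimize("Review the code and find problems.", examples))
```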
---
## Best Practices
### 1. Start with Clear Base Prompts
```yaml
# Good starting point
persona:
  role: [Specific role with expertise level]
  goal: [Clear, measurable objective]
  traits: [Relevant characteristics]

tasks:
  - instruction: [Step-by-step process, not vague request]
```

### 2. Define Expected Output Format
```yaml
output_fields:
  - name: finding
    description: Specific issue identified
  - name: severity
    description: Impact level
  - name: solution
    description: Executable fix with code
```
GEPA will learn to match this structure consistently.
### 3. Provide RSpec-Style BDD Scenarios

```yaml
feature_specifications:
  scenarios:
    - name: security_detection
      description: Agent should detect and explain security issues
      input:
        code: [Code with SQL injection]
      expected_output:
        review: Must mention "SQL injection" and "parameterized queries"
        severity: critical
```

GEPA optimizes prompts to match these specifications.
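Internally, a scenario like this typically becomes a pass/fail check over the agent's output. A minimal sketch, assuming the response arrives as a dict whose keys mirror the scenario's `expected_output` fields:

```python
def check_security_detection(response: dict) -> bool:
    """Pass/fail check mirroring the security_detection scenario above (sketch)."""
    review = response.get("review", "").lower()
    return (
        "sql injection" in review
        and "parameterized queries" in review
        and response.get("severity") == "critical"
    )

# A response that satisfies the scenario:
assert check_security_detection({
    "review": "SQL injection found; switch to parameterized queries.",
    "severity": "critical",
})
```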
### 4. Use Datasets for Real-World Patterns

```yaml
datasets:
  - source: ./data/real_reviews.csv
    limit: 100
```

GEPA learns effective phrasing from real examples.
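A minimal sketch of what honoring `source` and `limit` could look like, using only the standard library (the CSV path and its columns are assumptions taken from the config above):

```python
import csv
from itertools import islice

def load_examples(path: str, limit: int) -> list[dict]:
    """Load at most `limit` rows from a CSV of example reviews."""
    with open(path, newline="") as f:
        return list(islice(csv.DictReader(f), limit))

examples = load_examples("./data/real_reviews.csv", limit=100)
```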
## Common Patterns GEPA Learns

### Pattern 1: Specificity Over Generality

**Before**: "Check for issues"
**After**: "Check for: SQL injection, XSS, hardcoded secrets, complexity > 4"

### Pattern 2: Actionability

**Before**: "This could be better"
**After**: "Replace X with Y. Here's the code: [executable solution]"

### Pattern 3: Citations

**Before**: "This is bad practice"
**After**: "Violates SOLID principles (reference: Clean Code, Chapter 3)"

### Pattern 4: Structured Output

**Before**: Freeform text response
**After**: Severity → Findings → Solutions → References
## Metrics and Results

### What Gets Measured

- **Specificity**: % of responses with specific issue identification
- **Actionability**: % of responses with executable solutions
- **Citation Rate**: % of responses with references
- **Format Compliance**: % of responses matching the expected output structure
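Each of these can be approximated with simple string checks over a batch of responses. A minimal sketch, using keyword heuristics as stand-ins for real scorers (the specific markers are assumptions based on the output format shown earlier):

```python
def pct(flags: list[bool]) -> float:
    """Percentage of True values across per-response checks."""
    return 100 * sum(flags) / max(len(flags), 1)

def score_batch(responses: list[str]) -> dict[str, float]:
    """Keyword-heuristic versions of the four metrics above (stand-ins for real scoring)."""
    return {
        "specificity": pct(["Line:" in r for r in responses]),
        "actionability": pct(["Solution:" in r for r in responses]),  # has a proposed fix
        "citation_rate": pct([("OWASP" in r) or ("PEP 8" in r) for r in responses]),
        "format_compliance": pct([r.startswith("SEVERITY:") for r in responses]),
    }
```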
### Typical Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Response Specificity | 30% | 90% | +60% |
| Actionable Solutions | 20% | 85% | +65% |
| Citation Rate | 10% | 75% | +65% |
| Format Compliance | 40% | 95% | +55% |
## Quick Start

### Enable Prompt Optimization

Prompts are automatically optimized when you run GEPA:

```bash
# 1. Create agent with good base prompts
super spec generate code_reviewer --template genie

# 2. Compile
super agent compile code_reviewer

# 3. Evaluate baseline
super agent evaluate code_reviewer

# 4. Optimize prompts (and other enabled layers)
super agent optimize code_reviewer --auto medium

# 5. See improvement
super agent evaluate code_reviewer  # automatically loads optimized weights
```

GEPA automatically optimizes all prompts in your playbook!
## Advanced: Prompt-Specific Tuning

### Control What GEPA Optimizes

```yaml
optimization:
  optimizer:
    name: GEPA
    params:
      # Focus on prompts
      variables_to_optimize:
        - persona.role
        - persona.goal
        - tasks[*].instruction
```
### Optimization Intensity

```yaml
optimization:
  optimizer:
    params:
      # Pick one:
      auto: light        # Quick prompt tweaks (3-5 iterations)
      # auto: medium     # Balanced optimization (10-15 iterations)
      # auto: intensive  # Thorough exploration (20-30 iterations)
```
## Common Prompt Improvements

### Improvement 1: Adding Specificity

**Before**: "Analyze the code"
**After**: "Analyze code for: (1) Security vulnerabilities per OWASP Top 10, (2) Performance issues with O(n²) or worse complexity, (3) Code quality issues with cyclomatic complexity > 4"

### Improvement 2: Adding Structure

**Before**: "Provide feedback"
**After**: "Structure review as: Severity → Findings (line number, issue, impact) → Solutions (code example) → References (standards)"

### Improvement 3: Adding Examples

**Before**: "Suggest improvements"
**After**: "Suggest improvements with before/after code examples showing exact changes needed"

### Improvement 4: Adding Context

**Before**: "Review code quality"
**After**: "Review code quality considering: project type, team size, production criticality, industry standards"
## Integration with Other Layers

Prompt optimization works best when combined with other layers:

**Prompt + RAG**:
Prompt: "Search security documentation before analyzing SQL queries"
→ Combines optimized instructions with optimized retrieval

**Prompt + Tools**:
Prompt: "Use complexity_calculator for nested conditions exceeding 3 levels"
→ Combines optimized instructions with optimized tool selection

**Prompt + Memory**:
Prompt: "Reference similar past findings when identifying patterns"
→ Combines optimized instructions with optimized memory retrieval
## Troubleshooting

### Issue: Prompts Not Improving

**Symptoms**: Optimization runs but prompts stay similar

**Solutions**:
1. Check that RSpec-style BDD scenarios are specific enough
2. Ensure datasets have diverse examples
3. Increase iterations: `--auto intensive`
4. Add a reflection LM: `--reflection-lm llama3.1:8b`

### Issue: Prompts Too Long

**Symptoms**: Optimized prompts exceed token limits

**Solutions**:
1. Set a max_tokens constraint in the optimization config
2. Use summarization in optimization params
3. Focus on key instructions, not exhaustive lists

### Issue: Inconsistent Format

**Symptoms**: Agent responses vary in structure

**Solutions**:
1. Define a strict output_fields schema
2. Add format examples to RSpec-style BDD scenarios
3. Use structured output in the task configuration
## Related Guides

- RAG Optimization - Optimize knowledge retrieval
- Tool Optimization - Optimize tool selection
- Memory Optimization - Optimize context selection
- Dataset-Driven Optimization - Train on large-scale data
- Full-Stack Example - See all layers together
- GEPA Optimizer Guide - Technical details
- SuperSpec DSL - Playbook configuration

**Next**: Learn how GEPA optimizes RAG retrieval strategies →