Full-Stack Optimization: Complete Example
Overview
This guide shows how all 6 optimization layers work together in a real production agent. We'll use a Code Review Agent as our example because it demonstrates every optimization layer naturally.
What You'll See:
- How each layer contributes to the final result
- The compound effect of multi-layer optimization
- Before/after comparisons at each layer
- A complete end-to-end workflow
The Use Case: Code Review Agent
Agent Purpose
Analyze code for security vulnerabilities, performance issues, and code quality problems, providing actionable recommendations with code examples.
Why This Example?
Code review requires ALL optimization layers:
- Prompts: How to structure comprehensive reviews
- RAG: When to search security/performance documentation
- Tools: Which analysis tools to use (complexity, security scanning)
- Memory: Recall similar past findings
- Protocols: Use MCP for advanced file operations
- Datasets: Learn from 100 real GitHub code reviews
The Agent Configuration
Full Playbook Structure
```yaml
apiVersion: agent/v1
kind: AgentSpec
metadata:
  name: Code Review Assistant
  level: genies
spec:
  # Language Model
  language_model:
    provider: ollama
    model: llama3.1:8b

  # Layer 1: Prompts (always optimized)
  persona:
    role: Senior Software Engineer & Security Reviewer
    goal: Provide thorough, actionable code reviews
    traits: [detail-oriented, security-conscious, constructive]

  # Layer 2: RAG (knowledge retrieval)
  rag:
    enabled: true
    knowledge_base:
      - ./knowledge/security/*.md
      - ./knowledge/performance/*.md
      - ./knowledge/best_practices/*.md
    top_k: 5

  # Layer 3: Tools (analysis capabilities)
  tools:
    enabled: true
    specific_tools:
      - complexity_calculator
      - security_scanner
      - performance_analyzer

  # Layer 4: Memory (context from past reviews)
  memory:
    enabled: true
    enable_context_optimization: true
    max_context_tokens: 2000

  # Layer 5: Protocols (MCP for advanced ops)
  # (Can be added if needed)

  # Layer 6: Datasets (learn from real reviews)
  datasets:
    - name: github_reviews
      source: ./data/real_code_reviews.csv
      limit: 100

  # GEPA optimizes ALL layers
  optimization:
    optimizer:
      name: GEPA
      params:
        auto: medium
```
The Test Input: Complex Real-World Code
```python
# Production authentication code with multiple issues
password = "admin123"  # Issue 1: Hardcoded credential
def authenticate_user(username):
    # Issue 2: SQL injection via string concatenation
    query = "SELECT * FROM users WHERE username='" + username + "'"
    result = db.execute(query)
    # Issue 4: High cyclomatic complexity (nested conditionals)
    if result:
        if result['password'] == password:  # Issue 3: Plaintext password
            if result['active']:
                if result['verified']:
                    if result['subscription'] == 'premium':
                        return True
    return False
```
Issues Present:
1. Hardcoded password (CRITICAL)
2. SQL injection (CRITICAL)
3. Plaintext password comparison (CRITICAL)
4. High cyclomatic complexity (MEDIUM)
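The guide never shows how `security_scanner` works internally. Purely as an illustration, here is a minimal, hypothetical line-by-line pattern scanner that flags issues 1 and 2 in the snippet above. The function name `scan_source` and its regex rules are assumptions for this sketch, not the tool's real implementation, and regexes like these miss many real-world variants.

```python
import re

# Hypothetical rules: (finding name, severity, regex applied to each source line)
RULES = [
    ("Hardcoded credential", "CRITICAL",
     re.compile(r"(password|secret|api_key)\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE)),
    ("SQL injection (string concatenation)", "CRITICAL",
     # an SQL keyword, then a string literal that is closed and concatenated with +
     re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b.*[\"']\s*\+", re.IGNORECASE)),
]

def scan_source(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, finding, severity) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, severity, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, name, severity))
    return findings

SAMPLE = '''# Production authentication code with multiple issues
password = "admin123"
def authenticate_user(username):
    # SQL built by string concatenation
    query = "SELECT * FROM users WHERE username='" + username + "'"
'''

for finding in scan_source(SAMPLE):
    print(finding)
```

Issues 3 and 4 are deliberately out of scope here: detecting a plaintext comparison or deep nesting needs the parse tree rather than single-line patterns.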
Baseline Performance (Before Optimization)
Initial Agent Behavior
```bash
super agent compile code_review_assistant
super agent evaluate code_review_assistant
```
Result:
```text
Testing 8 RSpec-style BDD scenarios:

❌ SQL Injection Detection: FAIL
   Expected: Specific vulnerability with solution
   Got: "Check your SQL queries"

❌ Hardcoded Credentials: FAIL
   Expected: Identify hardcoded secret with env var solution
   Got: Generic "use better practices"

✅ Complexity Analysis: PASS
   (Agent happened to mention complexity)

❌ Security Comprehensive: FAIL
   Expected: All 3 security issues identified
   Got: Only 1 issue found

Overall: 1/8 PASS (12.5%)
```
Agent Output (Baseline):
"Your code has some security issues and could be improved."
Problems:
- Extremely vague
- No specific issues identified
- No solutions provided
- Not actionable
- No tool usage
- No knowledge retrieval
- No past context referenced
Layer-by-Layer Optimization
Step 1: Run GEPA Optimization
```bash
super agent optimize code_review_assistant --auto medium --fresh
```
What Happens: GEPA optimizes all 6 layers simultaneously
Layer 1: Prompt Optimization in Action
GEPA Reflection (Iteration 3):
"Agent gave vague 'security issues' response. Need specific vulnerability
identification. Optimize persona to include security expertise and goal
to include 'specific findings with severity classification'."
Prompt Evolution:

Before:
```yaml
persona:
  role: Code Reviewer
  goal: Find code issues
```

After GEPA:
```yaml
persona:
  role: Senior Software Engineer & Security Reviewer with expertise in
    OWASP Top 10, secure coding, and performance optimization
  goal: Identify specific security vulnerabilities, performance bottlenecks,
    and code quality issues with severity classification and executable
    solutions
```
Result: Agent now knows to be specific and provide solutions
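The reflection above follows an evaluate → reflect → revise cycle. The loop below is a toy, hypothetical sketch of that cycle only. GEPA itself uses a language model to write the reflection and selects among many candidates. Here, a hard-coded rule stands in for the reflection, and three required phrases stand in for the evaluation scenarios.

```python
# Toy evaluation: the revised goal must mention each required phrase.
REQUIREMENTS = ["specific findings", "severity classification", "executable solutions"]

def evaluate(goal: str) -> tuple[float, list[str]]:
    """Return (score in [0, 1], requirements the goal fails to mention)."""
    failed = [req for req in REQUIREMENTS if req not in goal.lower()]
    return 1 - len(failed) / len(REQUIREMENTS), failed

def reflect(goal: str, failed: list[str]) -> str:
    """Stand-in for the LLM reflection step: address the first failed requirement."""
    return f"{goal}; include {failed[0]}"

def optimize(goal: str, budget: int = 10) -> tuple[str, float]:
    score, failed = evaluate(goal)
    for _ in range(budget):
        if not failed:
            break  # every check passes
        candidate = reflect(goal, failed)
        cand_score, cand_failed = evaluate(candidate)
        if cand_score <= score:
            break  # reflection did not help; keep the current best
        goal, score, failed = candidate, cand_score, cand_failed
    return goal, score

best_goal, best_score = optimize("Find code issues")
print(best_score, best_goal)
```

The design point is that each revision is driven by *which* checks failed, not just by the score. This is why the reflection in the text names the missing behavior ("severity classification") explicitly.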
Layer 2: RAG Optimization in Action
GEPA Reflection (Iteration 5):
"Agent should search security documentation BEFORE analyzing SQL queries.
Pattern: String concatenation in SQL context → Retrieve sql_injection.md"
RAG Strategy Evolution:
Before:
Query: Generic "code review"
Retrieved: random_doc.md, naming_conventions.md, testing.md
Relevance: 25% (wrong topics)
After GEPA:
Learned Strategy:
- Detect: String concatenation + SQL keywords
- Query: "SQL injection prevention parameterized queries OWASP"
- Retrieved: sql_injection.md, database_security.md, owasp_a03.md
- Relevance: 95% (perfect match)
Result: Agent retrieves exact security docs needed
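The learned RAG strategy is essentially query routing: a code pattern decides what to search for. The toy sketch below rests on stated assumptions. It uses a four-document, in-memory knowledge base with hypothetical contents, and shared-word counting stands in for the embedding similarity a real RAG setup would use. It shows why the targeted query outranks the generic "code review" query.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base: file name -> contents
DOCS = {
    "sql_injection.md": "Prevent SQL injection with parameterized queries. OWASP A03 Injection.",
    "database_security.md": "Database security: least privilege, parameterized queries, auditing.",
    "naming_conventions.md": "Use descriptive names for variables and functions.",
    "testing.md": "Write unit tests for every code path.",
}

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def build_query(code: str) -> str:
    """Learned rule from the text: SQL keyword + string concatenation -> targeted query."""
    if re.search(r"\bSELECT\b", code, re.IGNORECASE) and "+" in code:
        return "SQL injection prevention parameterized queries OWASP"
    return "code review"

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by shared-word count (a stand-in for embedding similarity)."""
    q = tokens(query)
    scores = {name: sum((q & tokens(text)).values()) for name, text in DOCS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:top_k] if scores[name] > 0]

code = 'query = "SELECT * FROM users WHERE username=\'" + username + "\'"'
print(retrieve(build_query(code)))
```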
Layer 3: Tool Optimization in Action
GEPA Reflection (Iteration 7):
"Agent should use complexity_calculator for nested conditions.
Pattern: >3 nested if statements → Use complexity_calculator
Then cite threshold violation in review."
Tool Usage Evolution:
Before:
Tools Available: complexity_calculator, security_scanner
Tools Used: None
Result: No metrics, vague assessment
After GEPA:
Learned Strategy:
1. Detect nested conditionals → Use complexity_calculator
2. Detect string concatenation in SQL → Use security_scanner
3. Combine results for comprehensive review
Tools Used:
- complexity_calculator → Returns: complexity = 5
- security_scanner → Returns: [SQL injection, hardcoded secret]
Result: Metric-driven, tool-backed findings
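One way a `complexity_calculator` could derive its metrics is sketched below with Python's standard `ast` module. The helper names are hypothetical, and this is not the tool's actual code. Note that the textbook McCabe count is decision points + 1. By that definition, the five `if` statements in the sample give 5 decision points and a cyclomatic complexity of 6. A tool that reports 5 is presumably counting decision points only.

```python
import ast

SAMPLE = '''
def authenticate_user(username):
    query = "SELECT * FROM users WHERE username='" + username + "'"
    result = db.execute(query)
    if result:
        if result['password'] == password:
            if result['active']:
                if result['verified']:
                    if result['subscription'] == 'premium':
                        return True
    return False
'''

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def decision_points(func: ast.AST) -> int:
    """Count branching nodes; each `and`/`or` adds one per extra operand."""
    count = 0
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            count += len(node.values) - 1
    return count

def max_if_depth(node: ast.AST, depth: int = 0) -> int:
    """Deepest chain of nested `if` statements (an `elif` counts as one more level)."""
    here = depth + 1 if isinstance(node, ast.If) else depth
    return max([here] + [max_if_depth(child, here) for child in ast.iter_child_nodes(node)])

func = ast.parse(SAMPLE).body[0]  # parsing only, so the undefined `db` is fine
decisions = decision_points(func)
print("decision points:", decisions)
print("McCabe complexity:", decisions + 1)
print("nesting depth:", max_if_depth(func))
```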
Layer 4: Memory Optimization in Action
GEPA Reflection (Iteration 9):
"Agent should reference similar past SQL injection findings.
Memory optimization should prioritize high-importance security memories."
Memory Selection Evolution:
Before:
Memories: Last 10 reviews chronologically
Relevance: 30% (mostly irrelevant)
After GEPA:
Learned Strategy:
Query: "SQL injection"
Selected Memories:
- Memory #47: Previous SQL injection in auth code (0.95 similarity)
- Memory #23: Parameterized query pattern used before (0.88 similarity)
- Memory #61: Team security standard (0.82 similarity, high importance)
Result: Highly relevant past context
Result: Agent says "Similar to previous finding #47. Use parameterized queries as recommended before."
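Memory selection like this is a ranking problem: similarity to the current query, plus a bonus for important memories. The minimal sketch below assumes that each memory carries an embedding vector and an importance score. The `Memory` fields, the 0.2 importance weight, and the toy 2-D vectors are all hypothetical. Memories #61 and #30 are equally similar to the query, and importance breaks the tie in favor of the team standard.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    id: int
    embedding: list[float]  # toy 2-D vectors; real systems use model embeddings
    importance: float       # 0..1, e.g. high for team security standards

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def select_memories(query: list[float], memories: list[Memory],
                    k: int = 3, weight: float = 0.2) -> list[int]:
    """Score = (1 - weight) * similarity + weight * importance; return the top-k ids."""
    ranked = sorted(
        memories,
        key=lambda m: (1 - weight) * cosine(query, m.embedding) + weight * m.importance,
        reverse=True,
    )
    return [m.id for m in ranked[:k]]

MEMORIES = [
    Memory(12, [0.0, 1.0], 0.3),  # most recent, but unrelated
    Memory(30, [0.6, 0.8], 0.1),
    Memory(61, [0.6, 0.8], 1.0),  # team security standard
    Memory(23, [0.8, 0.6], 0.5),
    Memory(47, [1.0, 0.0], 0.5),
]
print(select_memories([1.0, 0.0], MEMORIES))
```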
Layer 5: Protocol Optimization in Action
GEPA Reflection (Iteration 11):
"For simple code snippets, use built-in tools (faster).
For file system operations, use MCP when recursive or complex."
Protocol Strategy Evolution:
Before:
Always initialize MCP → Slow startup (500ms overhead)
After GEPA:
Learned Strategy:
- Single code snippet → Use built-in tools (fast)
- Directory analysis → Use MCP filesystem (powerful)
- Git context needed → Use MCP github (required)
Current query: Single code snippet
Tool: built-in security_scanner (50ms vs. 500ms)
Result: Optimal tool source selection
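The learned protocol strategy reduces to a routing rule. In the minimal sketch below, `ReviewRequest`, its fields, and return labels such as `"mcp:filesystem"` are hypothetical names for illustration. They are not part of the framework's API or of MCP.

```python
from dataclasses import dataclass

@dataclass
class ReviewRequest:
    scope: str                      # "snippet" or "directory"
    needs_git_context: bool = False

def choose_tool_source(req: ReviewRequest) -> str:
    """Pick the cheapest tool source that can handle the request."""
    if req.needs_git_context:
        return "mcp:github"          # per the learned strategy, Git context requires MCP
    if req.scope == "directory":
        return "mcp:filesystem"      # recursive file operations
    return "builtin"                 # single snippet: skip the MCP startup cost

print(choose_tool_source(ReviewRequest("snippet")))
```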
Layer 6: Dataset Learning in Action
GEPA Learning from 100 Real Reviews:
Pattern Extraction:
- 85% of expert reviews cite OWASP standards
- 90% provide executable code solutions
- 75% include attack examples for critical issues
- 95% classify severity based on exploitability
- 80% prioritize security over code quality
Agent Learns:
→ Always cite OWASP for web security
→ Provide code solutions with imports
→ Show attack vector for CRITICAL issues
→ Classify: CRITICAL > HIGH > MEDIUM > LOW
→ Prioritize security in recommendations
Result: Agent review style matches expert reviews
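Pattern extraction like the percentages above comes down to counting how often each trait appears across the dataset. The minimal sketch below uses the standard `csv` module and assumes a hypothetical two-column CSV (`review`, `severity`). The guide does not specify the real dataset's schema, and the regex "traits" are crude, illustrative proxies.

```python
import csv
import io
import re
from collections import Counter

SAMPLE_CSV = """review,severity
"SQL injection (OWASP A03). Use parameterized queries: cursor.execute(q, (u,))",CRITICAL
"Hardcoded secret, see CWE-798. Use os.environ instead.",CRITICAL
"Rename variable x to user_count.",LOW
"Deeply nested ifs; consider early returns.",MEDIUM
"""

# Crude, illustrative proxies for the traits listed above
TRAITS = {
    "cites_standard": re.compile(r"\b(OWASP|CWE-\d+)"),
    "suggests_fix": re.compile(r"\b(use|consider)\b", re.IGNORECASE),
}

def trait_rates(rows: list[dict]) -> dict[str, float]:
    """Fraction of reviews in which each trait appears."""
    return {name: sum(bool(p.search(r["review"])) for r in rows) / len(rows)
            for name, p in TRAITS.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(trait_rates(rows))
print(Counter(r["severity"] for r in rows))
```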
Final Optimized Performance
Re-Evaluation After Optimization
```bash
super agent evaluate code_review_assistant  # automatically loads optimized weights
```
Result:
```text
Testing 8 RSpec-style BDD scenarios:

✅ SQL Injection Detection: PASS
   ✓ Identified SQL injection with specific attack example
   ✓ Provided parameterized query solution
   ✓ Cited OWASP Top 10 A03:2021 (Injection)

✅ Hardcoded Credentials: PASS
   ✓ Found hardcoded password
   ✓ Recommended environment variables
   ✓ Cited CWE-798

✅ Complexity Analysis: PASS
   ✓ Used complexity_calculator tool
   ✓ Reported complexity 5 against a threshold of 4
   ✓ Suggested early returns pattern

✅ Security Comprehensive: PASS
   ✓ Found all 3 security issues
   ✓ Prioritized by severity
   ✓ Cited OWASP standards

✅ Performance Issues: PASS
✅ Memory Reference: PASS
✅ Multi-Issue Handling: PASS
✅ Professional Format: PASS

Overall: 8/8 PASS (100%)
Improvement: +87.5 percentage points (12.5% → 100%)
```
Complete Optimized Agent Output
Live Query Result
```bash
super agent run code_review_assistant --code "[authentication code above]"
```
Complete Agent Response (All Layers Working Together):
═══════════════════════════════════════════════════════════════════════════
SECURITY CODE REVIEW RESULTS
═══════════════════════════════════════════════════════════════════════════
OVERALL SEVERITY: 🔴 CRITICAL
📋 CRITICAL ISSUES (Fix Before Deployment):
1. Hardcoded Database Credential (Line 2)
─────────────────────────────────────────────────────────────────────
Code: password = "admin123"
🎯 Issue [Tool: security_scanner]:
Sensitive credential stored directly in source code
⚠️ Risk:
- Credential exposed if code repository accessed
- Security breach if code deployed to production
- Violates security compliance standards
✅ Solution:
```python
import os

password = os.environ.get('DB_PASSWORD')
if not password:
    raise ValueError("DB_PASSWORD environment variable not set")
```
📚 Reference [Retrieved from: knowledge/security/hardcoded_secrets.md]:
- OWASP ASVS 4.0 V2.10.4: Secrets must not be included in source code
- CWE-798: Use of Hard-coded Credentials
- Similar to: Previous finding #61 (resolved with env vars)
2. SQL Injection Vulnerability (Line 5)
─────────────────────────────────────────────────────────────────────
Code: query = "SELECT * FROM users WHERE username='" + username + "'"
🎯 Issue [Tool: security_scanner]:
User input concatenated directly into SQL query
⚠️ Attack Example:
Input: username = "admin' OR '1'='1"
Result: SELECT * FROM users WHERE username='admin' OR '1'='1'
Impact: Returns ALL users, authentication bypass
✅ Solution:
```python
def authenticate_user(username):
    # Placeholder style depends on the driver: "?" for sqlite3, "%s" for psycopg2
    query = "SELECT * FROM users WHERE username = ?"
    result = db.execute(query, (username,)).fetchone()
```
📚 Reference [Retrieved from: knowledge/security/sql_injection.md]:
- OWASP Top 10 2021: A03 Injection
- CWE-89: SQL Injection
- Parameterized queries keep user input out of the SQL text, preventing this class of injection
- Similar to: Previous finding #47 (parameterized queries recommended)
3. Plaintext Password Comparison (Line 7)
─────────────────────────────────────────────────────────────────────
Code: if result['password'] == password
🎯 Issue [Tool: security_scanner]:
Direct password comparison without hashing
⚠️ Risk:
- Passwords stored in plaintext in database
- Non-constant-time comparison (timing side channel)
- Complete compromise if database breached
✅ Solution:
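```python
import bcrypt

def hash_password(password: str) -> bytes:
    # When storing: save the returned bytes in a password_hash column
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt())

def verify_password(password: str, password_hash: bytes) -> bool:
    # When verifying: checkpw re-hashes with the stored salt and compares
    return bcrypt.checkpw(password.encode(), password_hash)
```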
📚 Reference [Retrieved from: knowledge/security/password_hashing.md]:
- OWASP ASVS 2.4.1: Passwords must be hashed with bcrypt/Argon2
- Use constant-time comparison to prevent timing attacks
═══════════════════════════════════════════════════════════════════════════
📊 MEDIUM PRIORITY ISSUES:
4. High Cyclomatic Complexity (Lines 7-12)
─────────────────────────────────────────────────────────────────────
Function: authenticate_user()
🎯 Metrics [Tool: complexity_calculator]:
- Cyclomatic Complexity: 5
- Recommended Threshold: 4
- Nesting Depth: 5 levels
⚠️ Impact:
- Hard to test (2^5 = 32 test cases needed)
- Difficult to maintain
- Error-prone modifications
✅ Solution (Early Returns Pattern):
```python
import bcrypt

def authenticate_user(username, password):
    # Assumes a DB-API connection whose rows support result['column'] access
    query = "SELECT * FROM users WHERE username = ?"
    result = db.execute(query, (username,)).fetchone()
    if not result:
        return False
    if not bcrypt.checkpw(password.encode(), result['password_hash']):
        return False
    if not result['active']:
        return False
    if not result['verified']:
        return False
    if result['subscription'] != 'premium':
        return False
    return True
```
📚 Reference [Retrieved from: knowledge/best_practices/code_smells.md]:
- Clean Code: Cyclomatic complexity should be <4
- Early returns reduce nesting and improve readability
- Similar to: Issue #23 (refactored using early returns)
═══════════════════════════════════════════════════════════════════════════
📊 ANALYSIS METRICS:
Lines Analyzed: 13
Issues Found: 4
- CRITICAL: 3
- HIGH: 0
- MEDIUM: 1
- LOW: 0
Tool Usage:
- security_scanner: 3 findings
- complexity_calculator: 1 finding
Knowledge Retrieved:
- security/sql_injection.md (0.94 relevance)
- security/hardcoded_secrets.md (0.91 relevance)
- security/password_hashing.md (0.89 relevance)
- best_practices/code_smells.md (0.87 relevance)
Memory References:
- Issue #47: SQL injection (similar pattern)
- Issue #61: Hardcoded credentials (team standard)
- Issue #23: Complexity refactoring (same solution)
Analysis Time: 2.8 seconds
═══════════════════════════════════════════════════════════════════════════
🎯 RECOMMENDATIONS (Priority Order):
IMMEDIATE (Deploy Blockers):
1. Fix SQL injection (Line 5) - CRITICAL security risk
2. Remove hardcoded password (Line 2) - CRITICAL exposure
3. Implement password hashing (Line 7) - CRITICAL crypto failure
BEFORE NEXT RELEASE:
4. Refactor complexity (Lines 7-12) - Maintainability
STANDARDS COMPLIANCE CHECK:
❌ OWASP Top 10 A03:2021 Injection - VIOLATED
❌ OWASP ASVS 2.4.1: Password Hashing - VIOLATED
❌ OWASP ASVS V2.10.4: Hardcoded Credentials - VIOLATED
⚠️ Clean Code: Complexity Threshold - EXCEEDED
RISK LEVEL: 🔴 DEPLOY BLOCKED (3 critical security issues)
═══════════════════════════════════════════════════════════════════════════
📚 REFERENCES & RESOURCES:
Security:
- OWASP Top 10 (2021): https://owasp.org/Top10
- SQL Injection Prevention: [Local: knowledge/security/sql_injection.md]
- Password Security: [Local: knowledge/security/password_hashing.md]
Code Quality:
- Clean Code (Martin): Chapter 3 (Functions)
- Cyclomatic Complexity: https://en.wikipedia.org/wiki/Cyclomatic_complexity
Team Standards:
- Previous similar findings: #23, #47, #61
- Security review checklist: ./docs/security_checklist.md
═══════════════════════════════════════════════════════════════════════════
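To exercise the review's findings together, here is a self-contained sketch that uses only the Python standard library. Its assumptions are:
- It substitutes `hashlib.scrypt` with `hmac.compare_digest` for bcrypt, which is a third-party package.
- It uses an in-memory `sqlite3` table whose columns are inferred from the reviewed snippet.
- All helper names are hypothetical.

The password is a function argument rather than a module-level constant, which also removes the hardcoded credential.

```python
import hashlib
import hmac
import os
import sqlite3

# Demo cost parameters; raise n for production (a larger n also needs a larger maxmem)
SCRYPT = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str) -> bytes:
    """Return a random 16-byte salt followed by the scrypt digest."""
    salt = os.urandom(16)
    return salt + hashlib.scrypt(password.encode(), salt=salt, **SCRYPT)

def verify_password(password: str, stored: bytes) -> bool:
    salt, expected = stored[:16], stored[16:]
    actual = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT)
    return hmac.compare_digest(actual, expected)  # constant-time comparison

def authenticate_user(db: sqlite3.Connection, username: str, password: str) -> bool:
    # Finding 2: the driver binds username as data, so it cannot alter the SQL
    row = db.execute("SELECT * FROM users WHERE username = ?", (username,)).fetchone()
    # Finding 4: early returns instead of five nested ifs
    if row is None:
        return False
    # Finding 3: compare salted hashes, never plaintext
    if not verify_password(password, row["password_hash"]):
        return False
    if not row["active"] or not row["verified"]:
        return False
    return row["subscription"] == "premium"

def vulnerable_lookup(db: sqlite3.Connection, username: str) -> list:
    """DO NOT USE: the original concatenated query, kept only to reproduce the attack."""
    return db.execute("SELECT * FROM users WHERE username='" + username + "'").fetchall()

def make_demo_db(demo_password: str) -> sqlite3.Connection:
    """In-memory table with columns inferred from the reviewed snippet."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row  # enables row["column"] access
    db.execute("CREATE TABLE users (username TEXT, password_hash BLOB, "
               "active INTEGER, verified INTEGER, subscription TEXT)")
    db.execute("INSERT INTO users VALUES (?, ?, 1, 1, 'premium')",
               ("alice", hash_password(demo_password)))
    return db
```

The `vulnerable_lookup` helper reproduces the attack from finding 2. Passing `x' OR '1'='1` returns every row in the table. The parameterized `authenticate_user` treats the same string as an ordinary (nonexistent) username.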
Layer-by-Layer Breakdown
How Each Layer Contributed
1. Prompts 💬:
- Learned to structure reviews: Severity → Findings → Solutions → References
- Learned to quantify impact: "2^5 = 32 test cases needed"
- Learned to prioritize: CRITICAL before MEDIUM

2. RAG 🔍:
- Searched security docs when detecting SQL concatenation
- Retrieved 4 highly relevant docs (relevance ≥ 0.87)
- Integrated citations naturally: "[Retrieved from: sql_injection.md]"

3. Tools 🛠️:
- Used security_scanner for vulnerability detection
- Used complexity_calculator for metrics
- Combined tool outputs: "3 security + 1 complexity findings"

4. Memory 🧠:
- Recalled similar past findings (#23, #47, #61)
- Referenced team standards and previous solutions
- Selected only relevant memories (3 of 50)

5. Protocols 🔌:
- Used built-in tools (code snippet analysis, no MCP needed)
- Would use MCP for directory scanning or Git context

6. Datasets 📊:
- Learned professional phrasing from 100 real reviews
- Learned to provide attack examples (from dataset)
- Learned severity classification (CRITICAL for exploitable issues)
- Learned to cite standards (85% of dataset reviews cite OWASP)
Optimization Metrics
Performance Improvement
| Metric | Before | After | Change |
|---|---|---|---|
| Overall Accuracy | 12.5% | 100% | +87.5 pts |
| Security Detection | 33% (1/3) | 100% (3/3) | +67 pts |
| RAG Relevance | 25% | 95% | +70 pts |
| Tool Usage Accuracy | 0% | 100% | +100 pts |
| Memory Relevance | 30% | 90% | +60 pts |
| Response Actionability | 20% | 95% | +75 pts |
| Token Efficiency | 5,000 tokens | 2,000 tokens | -60% tokens |
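In the table above, the accuracy-style rows improve by an absolute number of percentage points, while token usage falls by a relative 60%. The two readings differ sharply. Going from 12.5% to 100% is +87.5 points, but it is a 700% relative increase. The small helper below makes the distinction explicit. The function names are illustrative only.

```python
def point_change(before: float, after: float) -> float:
    """Absolute change, in percentage points when inputs are percentages."""
    return after - before

def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, in percent."""
    return (after - before) * 100 / before

print(point_change(12.5, 100.0), relative_change(12.5, 100.0))  # accuracy
print(relative_change(5000, 2000))                              # tokens
```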
The Compound Effect
Single-Layer vs. Full-Stack
Optimizing ONLY Prompts:
Result: Better instructions, but...
- ❌ Still retrieves wrong knowledge docs
- ❌ Still doesn't use tools correctly
- ❌ Still includes irrelevant memories
- ❌ Still uses generic phrasing
Final Accuracy: ~45%
Optimizing ALL Layers:
Result: Better instructions AND
- ✅ Retrieves perfect security docs
- ✅ Uses tools strategically
- ✅ Recalls relevant past findings
- ✅ Uses expert-level phrasing from dataset
Final Accuracy: 100%
Key Insight: Each layer amplifies the others. In this example, full-stack optimization reached 100% accuracy versus ~45% for prompt-only optimization, roughly 2.2x better.
Complete Workflow
End-to-End Optimization Process
```bash
# 1. Initialize project
super init code_review_project
cd code_review_project

# 2. Create knowledge base
mkdir -p knowledge/security knowledge/performance knowledge/best_practices
# ... add your documentation files

# 3. Prepare dataset (optional but recommended)
# Download or create real code review examples
mkdir -p data
cp your_real_reviews.csv data/code_reviews.csv

# 4. Create agent playbook
mkdir -p agents/code_reviewer
cat > agents/code_reviewer/playbook.yaml << 'EOF'
spec:
  persona:
    role: Code Reviewer
    goal: Find issues in code
  rag:
    enabled: true
    knowledge_base: [./knowledge/**/*.md]
  tools:
    enabled: true
    specific_tools: [complexity_calculator, security_scanner]
  memory:
    enabled: true
    enable_context_optimization: true
  datasets:
    - source: ./data/code_reviews.csv
      limit: 100
  optimization:
    optimizer: {name: GEPA, params: {auto: medium}}
EOF

# 5. Compile
super agent compile code_reviewer
# → Generates pipeline with all layers enabled

# 6. Baseline evaluation
super agent evaluate code_reviewer
# → Shows baseline (12.5% accuracy in this example)

# 7. Optimize ALL layers
super agent optimize code_reviewer --auto medium --fresh
# → GEPA optimizes: prompts + RAG + tools + memory + dataset learning
# → Takes 10-15 minutes
# → Shows progress for each layer

# 8. Re-evaluate
super agent evaluate code_reviewer  # automatically loads optimized weights
# → Shows improvement (100% accuracy, +87.5 percentage points, in this example)

# 9. Test on real code
super agent run code_reviewer --code "$(cat suspicious_code.py)"
# → Professional, comprehensive security audit

# 10. Deploy
super agent run code_reviewer --interactive
# → Production-ready code review agent!
```
What Makes This Production-Ready?
Quality Indicators
1. Comprehensive Coverage ✅
- Covers common vulnerability classes (SQL injection, XSS, hardcoded secrets, etc.)
- Analyzes performance and code quality
- Handles edge cases and complex patterns

2. Actionable Output ✅
- Specific issue identification
- Executable code solutions
- Clear priority ordering
- Standards compliance

3. Professional Quality ✅
- Expert-level phrasing (learned from dataset)
- Proper citations (OWASP, CWE, Clean Code)
- Severity classification
- Risk assessment

4. Efficient Operation ✅
- Strategic knowledge retrieval (95% relevance)
- Correct tool usage (100% accuracy)
- Optimized memory selection (60% fewer tokens)
- Fast response time (<3 seconds)

5. Robust Performance ✅
- Handles unseen code variations (dataset generalization)
- References past findings (memory consistency)
- Graceful error handling (tool/protocol failures)
Applying to Your Agents
Step 1: Identify Your Layers
Which layers does your agent need?
Customer Support:
- Prompts: ✅ (response quality)
- RAG: ✅ (product knowledge)
- Tools: ⚠️ (maybe ticketing system)
- Memory: ✅✅ (conversation history)
- Protocols: ❌ (not needed)
- Datasets: ✅ (real support tickets)
Research Agent:
- Prompts: ✅ (research methodology)
- RAG: ✅✅ (academic papers)
- Tools: ✅ (search, citations)
- Memory: ⚠️ (maybe past searches)
- Protocols: ✅ (MCP for data sources)
- Datasets: ✅ (research examples)
Code Generator:
- Prompts: ✅ (coding standards)
- RAG: ✅ (API docs, patterns)
- Tools: ✅ (syntax validators)
- Memory: ⚠️ (maybe past code)
- Protocols: ⚠️ (maybe MCP filesystem)
- Datasets: ✅✅ (code examples)
Step 2: Enable Your Layers
```yaml
spec:
  # Enable only what you need
  rag:
    enabled: true  # If you have a knowledge base
  tools:
    enabled: true  # If you need tools
  memory:
    enabled: true  # If you need context
    enable_context_optimization: true
  datasets:
    - source: your_data.csv  # If you have examples
```
Step 3: Optimize
```bash
super agent optimize your_agent --auto medium
```
GEPA automatically optimizes all enabled layers!
Key Takeaways
What You Learned
- Full-Stack > Prompts Only: All 6 layers working together produces 2-3x better results
- Each Layer Compounds: RAG finds better docs, tools provide better metrics, memory recalls better context
- GEPA Learns Strategies: Not just what to do, but WHEN, WHICH, and HOW
- Datasets Amplify: Real examples teach real-world patterns and expert-level quality
- Production-Ready: 12.5% → 100% pass rate on the 8 BDD scenarios signals deployment readiness
The Full-Stack Advantage
| Optimization Approach | Accuracy | Quality | Production-Ready |
|---|---|---|---|
| Prompts Only | 45% | Basic | ❌ No |
| Prompts + RAG | 65% | Better | ⚠️ Maybe |
| Prompts + RAG + Tools | 75% | Good | ⚠️ Close |
| Full-Stack (All 6 Layers) | 100% | Expert | ✅ Yes |
Next Steps
1. Explore Individual Layers
Deep-dive into specific layers:
- 💬 Prompt Optimization
- 🔍 RAG Optimization
- 🛠️ Tool Optimization
- 🧠 Memory Optimization
- 🔌 Protocol Optimization
- 📊 Dataset Optimization
2. Try the Workflow
```bash
# Create an agent with multiple layers
# Optimize with GEPA
# See the compound effect!
```
3. Learn GEPA Internals
4. Scale with Datasets
- Dataset Import Guide
- Browse HuggingFace: https://huggingface.co/datasets
Related Guides: