# RAG Optimization

## What is RAG Optimization?
RAG (Retrieval-Augmented Generation) optimization is the process of improving when, which, and how an agent retrieves knowledge from its knowledge base. While traditional RAG uses fixed retrieval strategies, GEPA learns dynamic, context-aware retrieval patterns.
**Key Insight**: It's not enough to have a knowledge base. The agent must learn WHEN to search it, WHICH documents to retrieve, and HOW to integrate that knowledge into its response.
## The RAG Optimization Problem

### Without Optimization

**Scenario**: Agent reviewing code for SQL injection

1. Agent receives code with SQL injection
2. Agent retrieves random documents from knowledge base
3. Agent might get: "Python naming conventions.md" (irrelevant)
4. Agent gives vague response: "Check your code"

**Problem**: Wrong documents retrieved, no actionable solution
### With GEPA Optimization

**Scenario**: Same code review

1. Agent receives code with SQL injection
2. GEPA-learned strategy: "Search security docs for SQL patterns"
3. Agent retrieves: "sql_injection.md" (highly relevant)
4. Agent gives specific response: "SQL injection detected. Use parameterized queries: `query = 'SELECT * WHERE id = ?'`"

**Solution**: The right documents at the right time, and an actionable solution
---

## What GEPA Optimizes in RAG

### 1. Retrieval Strategy (When to Search)

**What It Is**: Learning when to query the knowledge base

**What GEPA Learns**:

- Which scenarios require knowledge retrieval
- When to search before analysis vs. after
- When to skip retrieval (the agent already has the knowledge)

**Example Configuration**:
```yaml
spec:
  rag:
    enabled: true
    vector_database: chromadb
    collection: code_review_knowledge
    knowledge_base:
      - ./knowledge/security/*.md
      - ./knowledge/python/*.md
      - ./knowledge/performance/*.md
```
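In code, a learned when-to-search strategy boils down to routing (or skipping) retrieval based on detected patterns. Here is a minimal Python sketch of that idea; the regexes and collection names are illustrative assumptions, not rules GEPA is guaranteed to produce:

```python
import re

# Hypothetical routing rules of the kind GEPA converges on: each maps a
# detected code pattern to the knowledge collection worth searching.
ROUTING_RULES = [
    (re.compile(r"(SELECT|INSERT|UPDATE|DELETE).*[\"']\s*\+", re.I), "security"),
    (re.compile(r"\bfor\b.*\bfor\b", re.S), "performance"),  # nested loops
    (re.compile(r"def [a-z]+[A-Z]"), "python"),              # camelCase function name
]

def choose_collection(code: str) -> str | None:
    """Return the collection to search, or None to skip retrieval entirely."""
    for pattern, collection in ROUTING_RULES:
        if pattern.search(code):
            return collection
    return None  # e.g. simple syntax errors need no retrieval

code = "query = \"SELECT * FROM users WHERE name = '\" + username"
print(choose_collection(code))  # -> security
```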
**Before Optimization**:

```
Strategy: Always search all knowledge sources for every query
Result: Slow, often irrelevant docs retrieved
```

**After GEPA Optimization**:

```
Learned Strategies:
- "Search security/*.md when code contains string concatenation in SQL"
- "Search performance/*.md when detecting loops or recursion"
- "Search python/*.md for naming and style issues"
- "Skip search for simple syntax errors"
```

**Impact**: 3x faster retrieval; document relevance up from 30% to 85%
---

### 2. Document Selection (Which to Retrieve)

**What It Is**: Choosing the most relevant documents from the knowledge base

**What GEPA Learns**:

- Query formulation for semantic search
- Relevance threshold tuning
- Number of documents to retrieve (`top_k`)

**Configuration**:

```yaml
rag:
  top_k: 5
  similarity_threshold: 0.7
```
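Under the hood, document selection with these settings might look like the following ChromaDB sketch. The collection name matches the earlier config; the storage path and the distance-to-similarity conversion are assumptions that depend on how the collection was created:

```python
import chromadb

# Minimal sketch of top_k retrieval against the collection configured above.
client = chromadb.PersistentClient(path="./chroma")  # assumed storage path
collection = client.get_or_create_collection("code_review_knowledge")

# A GEPA-learned query is specific, not a generic "code review".
learned_query = "SQL injection prevention parameterized queries OWASP"
results = collection.query(query_texts=[learned_query], n_results=5)  # top_k

# ChromaDB returns distances (lower = closer); for a cosine-space collection,
# similarity ≈ 1 - distance, which we compare to similarity_threshold.
for doc_id, dist in zip(results["ids"][0], results["distances"][0]):
    if 1.0 - dist >= 0.7:
        print(f"{doc_id}: similarity {1.0 - dist:.2f}")
```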
**Before Optimization**:

```
Query: Generic "code review"
Retrieved: Random 5 docs, low relevance
```

**After GEPA Optimization**:

```
Learned Query Strategies:
- For SQL code: "SQL injection prevention parameterized queries OWASP"
- For loops: "performance optimization time complexity O(n)"
- For naming: "Python naming conventions PEP 8"

Retrieved: Top 5 highly relevant docs (>0.85 similarity)
```

**Impact**: Precision increased from 40% → 90%
---

### 3. Context Integration (How to Use)

**What It Is**: Incorporating retrieved knowledge into agent responses

**What GEPA Learns**:

- How to cite retrieved documents (sketched at the end of this subsection)
- How to combine multiple sources
- When to quote vs. paraphrase
- How to attribute knowledge

**Before Optimization**:

```
Response: "This is a security issue."
[Retrieved doc not cited or used effectively]
```
**After GEPA Optimization**:

```
Response: "SQL Injection Vulnerability (CRITICAL)

According to OWASP Top 10 2024 [Retrieved from: security/sql_injection.md]:
'Injection attacks occur when untrusted data is sent to an interpreter as
part of a command or query.'

Solution (from best practices):

    query = "SELECT * FROM users WHERE name = ?"
    db.execute(query, (username,))

Reference: OWASP Top 10 #1, CWE-89"
```
**Impact**: Clear attribution, professional citations, actionable solutions
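The learned integration behavior can be approximated as a context-assembly step: retrieved snippets are injected into the prompt with explicit source tags so the model can cite them. A minimal sketch, with a hypothetical snippet format:

```python
def build_cited_context(snippets: list[dict]) -> str:
    """Format retrieved snippets with source tags the agent can cite.

    Assumes each snippet looks like:
    {"source": "security/sql_injection.md", "text": "..."}
    """
    blocks = [f"[Retrieved from: {s['source']}]\n{s['text']}" for s in snippets]
    return "\n\n".join(blocks)

context = build_cited_context([
    {"source": "security/sql_injection.md",
     "text": "Injection attacks occur when untrusted data is sent to an interpreter."},
])
prompt = (
    "Answer using ONLY the sources below, and cite each source you rely on.\n\n"
    + context
)
```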
---
### 4. Relevance Scoring
**What It Is**: Filtering and ranking retrieved documents
**What GEPA Learns**:
- Optimal similarity thresholds
- Re-ranking strategies
- Filtering irrelevant results
**Configuration**:
```yaml
rag:
  similarity_threshold: 0.7
  rerank: true
```
**Before Optimization**:

```
Retrieved 5 docs with similarities: [0.71, 0.68, 0.65, 0.64, 0.62]
Used all 5 (including low-relevance ones)
```

**After GEPA Optimization**:

```
Learned Threshold: 0.75 for security queries, 0.65 for general
Retrieved 5 docs, filtered to top 2 with >0.85 similarity
Re-ranked by: recency + domain match + keyword overlap
```

**Impact**: Higher quality context, less noise
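The learned filtering and re-ranking behavior can be sketched as a two-step function: drop documents below a per-domain threshold, then re-score the survivors. The weights and document fields below are illustrative assumptions:

```python
def filter_and_rerank(docs: list[dict], query_domain: str) -> list[dict]:
    """Threshold-filter retrieved docs, then re-rank the survivors.

    Each doc is assumed to look like:
    {"name": "sql_injection.md", "similarity": 0.86, "domain": "security",
     "age_days": 30, "keyword_overlap": 0.4}
    """
    thresholds = {"security": 0.75}          # learned per-domain cutoffs
    cutoff = thresholds.get(query_domain, 0.65)
    kept = [d for d in docs if d["similarity"] >= cutoff]

    def score(d: dict) -> float:
        recency = 1.0 / (1.0 + d["age_days"] / 365)  # newer docs score higher
        domain_match = 1.0 if d["domain"] == query_domain else 0.0
        # Weighted blend of similarity + recency + domain match + keyword overlap.
        return (0.5 * d["similarity"] + 0.2 * recency
                + 0.2 * domain_match + 0.1 * d["keyword_overlap"])

    return sorted(kept, key=score, reverse=True)
```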
---

## Before/After Comparison

### Scenario: Security Code Review

**Input Code**:

```python
def get_user(username):
    query = "SELECT * FROM users WHERE name = '" + username + "'"
    return db.execute(query)
```
**Before RAG Optimization**:

```
RAG Behavior:
- Query: "code review"
- Retrieved: naming_conventions.md, file_organization.md, testing.md
- Relevance: Low (wrong topic)

Agent Response:
"This code could be improved. Consider better practices."
```
**After GEPA RAG Optimization**:

```
RAG Behavior:
- Learned Strategy: "Detect SQL pattern → Search security docs"
- Query: "SQL injection prevention parameterized queries"
- Retrieved: sql_injection.md, database_security.md
- Relevance: High (>0.90 similarity)

Agent Response:
"SQL Injection Vulnerability (CRITICAL)

Issue: String concatenation in SQL query allows injection attacks.

According to OWASP Top 10 [Retrieved from: sql_injection.md]:
This is the #1 web security risk in 2024.

Solution:

    def get_user(username):
        query = "SELECT * FROM users WHERE name = ?"
        return db.execute(query, (username,))

Why: Parameterized queries prevent injection by treating user input as data, not code.

Reference: OWASP A03:2021 - Injection"
```
**Improvement**: From irrelevant docs → Precise security knowledge
---
## How GEPA Learns RAG Strategies
### The Optimization Process
1. **Analysis Phase**
```
GEPA Observes:
- Agent retrieved "naming_conventions.md" for SQL injection code
- Document wasn't relevant to security issue
- Agent gave vague response without specific solution
```
2. **Reflection Phase**
```
GEPA Reflection:
"The agent should search security documentation BEFORE analyzing
SQL queries. String concatenation in SQL context is a security
pattern that requires security knowledge retrieval."
```
3. **Mutation Phase**
```
GEPA Tests:
- Strategy 1: "Always search security docs for any SQL code"
- Strategy 2: "Search security docs when detecting string concatenation in SQL"
- Strategy 3: "Search security docs after finding potential injection"
```
4. **Evaluation Phase**
```
Results:
- Strategy 1: 70% (too broad, slow)
- Strategy 2: 95% (precise, fast) ← Winner!
- Strategy 3: 60% (too late, misses context)
```
5. **Selection Phase**
```
GEPA Keeps: Strategy 2
Next Iteration: Build on this strategy for other patterns
```
**Result**: Learned when and what to retrieve for maximum relevance
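In rough Python, the mutation-and-evaluation loop reduces to scoring each candidate strategy against the scenario suite and keeping the winner. This is a sketch, not GEPA's actual internals; `run_agent` is an assumed test harness, not a documented API:

```python
def select_best_strategy(strategies, scenarios, run_agent):
    """Score candidate retrieval strategies and keep the best one.

    run_agent(strategy, scenario) -> bool is an assumed harness that
    returns True when the agent passes the scenario with that strategy.
    """
    scores = {
        strategy: sum(run_agent(strategy, sc) for sc in scenarios) / len(scenarios)
        for strategy in strategies
    }
    best = max(scores, key=scores.get)  # e.g. Strategy 2 at 95%
    return best, scores
```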
---
## Best Practices
### 1. Organize Knowledge Base by Topic
```yaml
rag:
  knowledge_base:
    - ./knowledge/security/*.md        # Security topics
    - ./knowledge/performance/*.md     # Performance topics
    - ./knowledge/best_practices/*.md  # General practices
```

GEPA learns which directory to search for which scenario.
### 2. Use Descriptive Document Names
✅ Good:
- sql_injection_prevention.md
- xss_mitigation.md
- password_hashing_best_practices.md
❌ Bad:
- doc1.md
- security.md
- notes.md
### 3. Structure Documents Consistently

```markdown
# SQL Injection Prevention

## What is SQL Injection?
[Clear explanation]

## How to Prevent
[Specific solutions with code]

## References
[OWASP, CWE links]
```
Consistent structure helps GEPA learn effective retrieval patterns.
### 4. Combine with RSpec-Style BDD Scenarios

```yaml
feature_specifications:
  scenarios:
    - name: sql_injection_detection
      description: Agent should use security docs for SQL analysis
      input:
        code: [SQL injection code]
      expected_output:
        review: Must mention "SQL injection" and cite "OWASP"
```
GEPA optimizes RAG to pass these scenarios.
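The `expected_output` assertion above is easy to picture as a check function; a minimal sketch of what passing that scenario means:

```python
def passes_sql_injection_scenario(review: str) -> bool:
    """Scenario check: the review must name SQL injection and cite OWASP."""
    text = review.lower()
    return "sql injection" in text and "owasp" in text

assert passes_sql_injection_scenario(
    "SQL Injection Vulnerability (CRITICAL). See OWASP A03:2021 - Injection."
)
assert not passes_sql_injection_scenario("Check your code.")
```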
---

## Metrics and Results

### What Gets Measured

- **Retrieval Precision**: % of retrieved docs that are relevant
- **Retrieval Recall**: % of relevant docs that are retrieved
- **Response Relevance**: % of responses using retrieved knowledge
- **Citation Accuracy**: % of citations that are correct
### Typical Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Retrieval Precision | 30% | 85% | +55 pts |
| Response Relevance | 40% | 90% | +50 pts |
| Citation Accuracy | 25% | 95% | +70 pts |
| Retrieval Speed | 2.5s | 0.8s | 3x faster |
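Precision and recall are computed the standard way from the retrieved and relevant document sets; a small sketch:

```python
def retrieval_metrics(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant share of retrieved; recall = retrieved share of relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

p, r = retrieval_metrics(
    retrieved={"sql_injection.md", "naming_conventions.md"},
    relevant={"sql_injection.md", "database_security.md"},
)
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=50%, recall=50%
```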
---

## Quick Start

### Enable RAG Optimization

```yaml
spec:
  # RAG configuration
  rag:
    enabled: true
    vector_database: chromadb
    knowledge_base:
      - ./knowledge/**/*.md
    top_k: 5

  # GEPA automatically optimizes RAG strategies
  optimization:
    optimizer:
      name: GEPA
      params:
        auto: medium
```

Then compile and optimize:

```bash
super agent compile your_agent
super agent optimize your_agent --auto medium
```
GEPA will learn optimal retrieval strategies!
---

## Advanced: RAG-Specific Configuration

### Fine-Tune Retrieval Parameters

```yaml
rag:
  # Semantic search config
  top_k: 5                   # Number of docs to retrieve
  similarity_threshold: 0.7  # Minimum similarity score
  rerank: true               # Re-rank results

  # Document processing
  chunk_size: 512            # Tokens per chunk
  chunk_overlap: 50          # Overlap between chunks

  # Embedding model
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
```
GEPA learns optimal values for these parameters through optimization.
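To make `chunk_size` and `chunk_overlap` concrete, here is what sliding-window chunking does with those two values (a sketch; the real tokenizer is whatever the embedding model uses):

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, overlap: int = 50) -> list:
    """Split tokens into overlapping windows: each chunk repeats the last
    `overlap` tokens of the previous one so context isn't cut mid-thought."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

chunks = chunk_tokens(list(range(1200)))
print(len(chunks), [len(c) for c in chunks])  # 3 chunks: [512, 512, 276]
```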
---

## Common RAG Strategies GEPA Learns

### Strategy 1: Topic-Aware Retrieval

**Before**: Search all knowledge indiscriminately
**After**: "Search `security/*.md` for SQL/XSS patterns, `performance/*.md` for loops/recursion"

### Strategy 2: Pre-emptive Retrieval

**Before**: Retrieve after analysis (too late)
**After**: "Search BEFORE analyzing SQL queries to have security context"

### Strategy 3: Contextual Queries

**Before**: Generic query "code review"
**After**: Specific query "SQL injection prevention parameterized queries OWASP"

### Strategy 4: Multi-Source Combination

**Before**: Use only top result
**After**: "Combine security doc + code example doc + OWASP reference for comprehensive answer"
---

## Integration with Other Layers

RAG optimization amplifies other layer optimizations:

**RAG + Prompts**:

```
Optimized Prompt: "Search security docs for SQL patterns"
Optimized RAG: Retrieves sql_injection.md with 0.95 similarity
→ Agent has perfect context for security analysis
```

**RAG + Tools**:

```
Optimized RAG: Retrieves complexity best practices
Optimized Tools: Uses complexity_calculator
→ Agent cites doc: "Per Clean Code guidelines, complexity should be <4"
→ Then shows: "Your code: 8 (calculated)"
```

**RAG + Memory**:

```
Optimized Memory: "Similar SQL injection found in previous review"
Optimized RAG: Retrieves same security doc used before
→ Consistent, high-quality security recommendations
```
---

## Real-World Example

### Use Case: Security Code Review

**Knowledge Base Structure**:

```
knowledge/
├── security/
│   ├── sql_injection.md          (OWASP Top 10 #1)
│   ├── xss_prevention.md         (OWASP Top 10 #3)
│   ├── hardcoded_secrets.md      (Security best practice)
│   └── password_hashing.md       (Cryptography)
├── performance/
│   ├── time_complexity.md        (Big O notation)
│   └── optimization_patterns.md  (Performance tips)
└── best_practices/
    └── solid_principles.md       (Clean code)
```
**Input**: Code with SQL injection

**Before RAG Optimization**:

```
Retrieved:
1. solid_principles.md (similarity: 0.68) - Wrong topic
2. optimization_patterns.md (similarity: 0.65) - Wrong topic
3. time_complexity.md (similarity: 0.63) - Wrong topic

Response: "Your code needs improvement."
```
**After GEPA RAG Optimization**:

```
Learned Strategy: "SQL string concatenation → security domain"

Retrieved:
1. sql_injection.md (similarity: 0.94) - Perfect!
2. hardcoded_secrets.md (similarity: 0.87) - Relevant!
3. password_hashing.md (similarity: 0.82) - Related!

Response:
"SQL Injection Vulnerability (CRITICAL)

Issue: User input concatenated into SQL query (OWASP Top 10 #1)
[Retrieved from: security/sql_injection.md]

Attack Example: username = "admin' OR '1'='1"
→ Returns all users, bypasses authentication

Solution (from OWASP guidelines):

    query = "SELECT * FROM users WHERE name = ?"
    result = db.execute(query, (username,))

Reference: OWASP A03:2021 - Injection, CWE-89"
```
**Impact**: From 0% helpful → 100% actionable, with professional citations
---

## Troubleshooting

### Issue: Low Retrieval Relevance

**Symptoms**: Agent retrieves wrong documents

**Solutions**:

1. Add more RSpec-style BDD scenarios showing expected retrieval
2. Increase `top_k` to give GEPA more options
3. Improve document organization by topic
4. Use more specific document titles
### Issue: Slow Retrieval

**Symptoms**: RAG queries take too long

**Solutions**:

1. GEPA learns to skip retrieval when not needed
2. Reduce `top_k` (GEPA finds the optimal value)
3. Use a smaller embedding model
4. Enable caching for repeated queries (see the sketch below)
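Caching repeated queries can be as simple as memoizing the retrieval call; `search_kb` below is a hypothetical stand-in for the real vector-store lookup:

```python
from functools import lru_cache

def search_kb(query: str) -> list[str]:
    # Stand-in for the actual (slow) vector search, e.g. collection.query(...).
    return [f"doc-for:{query}"]

@lru_cache(maxsize=256)
def cached_retrieve(query: str) -> tuple[str, ...]:
    # Identical repeated queries hit the cache and skip the vector search.
    return tuple(search_kb(query))  # tuple: hashable, safe to cache
```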
### Issue: Documents Not Used in Response

**Symptoms**: Docs retrieved but not cited

**Solutions**:

1. Add citation requirements to RSpec-style BDD scenarios
2. Optimize prompts to include "cite sources"
3. Add examples showing proper citation format
---

## Related Guides
- 💬 Prompt Optimization - Optimize instructions
- 🛠️ Tool Optimization - Optimize tool usage
- 🧠 Memory Optimization - Optimize context
- 📊 Dataset-Driven Optimization - Train on data
- 🎯 Full-Stack Example - See all layers
- RAG Configuration Guide - RAG setup details
- MCP + RAG Complete Guide - Advanced RAG
**Next**: Learn how GEPA optimizes tool selection and usage →