RAG Optimization Tutorial
Learn how to optimize Retrieval-Augmented Generation (RAG) systems using SuperOptiX's advanced techniques and GEPA optimization.
Overview
This tutorial covers: - Setting up RAG systems with multiple vector databases - Optimizing retrieval parameters - Improving context relevance with GEPA - Performance monitoring and evaluation
Prerequisites
Install SuperOptiX
pip install superoptix
For vector database support:
pip install superoptix[vectordb]
Includes: - SuperOptiX core with GEPA 0.0.17 - GenericRAGAdapter for RAG optimization - All vector databases (ChromaDB, LanceDB, Weaviate, Qdrant, Milvus)
Requirements: - Python 3.11+ - Git (for DSPy dependency) - Basic understanding of RAG concepts
Step 1: Initialize RAG Project
# Create new project
super init rag_optimization_project
cd rag_optimization_project
# Pull RAG demo agent
super agent pull rag_chroma_demo
Step 2: Configure Vector Database
Choose your vector database:
# agents/rag_chroma_demo.yaml
spec:
rag:
enabled: true
backend: chromadb
config:
collection_name: "superoptix_docs"
persist_directory: "./chroma_db"
embedding_model: "all-MiniLM-L6-v2"
chunk_size: 512
chunk_overlap: 50
top_k: 5
spec:
rag:
enabled: true
backend: lancedb
config:
table_name: "documents"
uri: "./lancedb"
embedding_model: "all-MiniLM-L6-v2"
chunk_size: 512
chunk_overlap: 50
top_k: 5
spec:
rag:
enabled: true
backend: weaviate
config:
url: "http://localhost:8080"
class_name: "Document"
embedding_model: "all-MiniLM-L6-v2"
chunk_size: 512
chunk_overlap: 50
top_k: 5
spec:
rag:
enabled: true
backend: qdrant
config:
url: "http://localhost:6333"
collection_name: "documents"
embedding_model: "all-MiniLM-L6-v2"
chunk_size: 512
chunk_overlap: 50
top_k: 5
spec:
rag:
enabled: true
backend: milvus
config:
host: "localhost"
port: 19530
collection_name: "documents"
embedding_model: "all-MiniLM-L6-v2"
chunk_size: 512
chunk_overlap: 50
top_k: 5
Step 3: Prepare Your Documents
# Create documents directory
mkdir -p documents
# Add your documents
cp /path/to/your/docs/*.pdf documents/
cp /path/to/your/docs/*.txt documents/
cp /path/to/your/docs/*.md documents/
# Or use sample documents
echo "SuperOptiX is a full-stack agentic AI optimization framework." > documents/intro.txt
echo "GEPA is the universal optimizer that works across all frameworks." > documents/gepa.txt
echo "RAG systems improve AI responses with relevant context." > documents/rag.txt
Step 4: Compile and Test RAG Agent
# Compile the RAG agent
super agent compile rag_chroma_demo
# Test with sample query
super agent run rag_chroma_demo --goal "What is SuperOptiX?"
Expected output:
Response: SuperOptiX is a full-stack agentic AI optimization framework that provides comprehensive tools for building, optimizing, and deploying AI agents across multiple frameworks.
Step 5: Evaluate RAG Performance
# Run evaluation
super agent evaluate rag_chroma_demo
This will test: - Retrieval accuracy - Response relevance - Context utilization - Response quality
Step 6: Optimize RAG Parameters
6.1 Chunk Size Optimization
# Test different chunk sizes
spec:
rag:
config:
chunk_size: 256 # Smaller chunks for precise retrieval
chunk_overlap: 25
top_k: 5
super agent compile rag_chroma_demo
super agent evaluate rag_chroma_demo
6.2 Top-K Optimization
# Test different top_k values
spec:
rag:
config:
chunk_size: 512
chunk_overlap: 50
top_k: 3 # Fewer results for focused context
super agent compile rag_chroma_demo
super agent evaluate rag_chroma_demo
6.3 Embedding Model Optimization
# Test different embedding models
spec:
rag:
config:
embedding_model: "sentence-transformers/all-mpnet-base-v2" # Better quality
chunk_size: 512
chunk_overlap: 50
top_k: 5
Step 7: GEPA Optimization for RAG
Optimize the RAG system using GEPA:
# Optimize with GEPA
super agent optimize rag_chroma_demo --auto medium
# Evaluate optimized version
super agent evaluate rag_chroma_demo # automatically loads optimized weights
GEPA will optimize: - Retrieval parameters - Context selection - Response generation - Relevance scoring
Step 8: Advanced RAG Techniques
8.1 Hybrid Search
spec:
rag:
config:
search_type: "hybrid" # Combines semantic + keyword search
semantic_weight: 0.7
keyword_weight: 0.3
chunk_size: 512
top_k: 5
8.2 Query Expansion
spec:
rag:
config:
query_expansion: true
expansion_model: "gpt-3.5-turbo"
max_expansions: 3
chunk_size: 512
top_k: 5
8.3 Context Re-ranking
spec:
rag:
config:
rerank: true
rerank_model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
rerank_top_k: 10
final_top_k: 5
Step 9: Performance Monitoring
9.1 Set Up Observability
# Enable MLFlow tracking
super agent compile rag_chroma_demo --observability mlflow
# Enable LangFuse tracing
super agent compile rag_chroma_demo --observability langfuse
9.2 Monitor Metrics
# Run with monitoring
super agent run rag_chroma_demo --goal "What is GEPA?" --monitor
# View metrics
super observe metrics rag_chroma_demo
Key metrics to monitor: - Retrieval Accuracy: How relevant are retrieved chunks? - Response Quality: How good are the generated responses? - Latency: How fast is the RAG system? - Token Usage: How many tokens are consumed?
Step 10: Production Deployment
10.1 Optimize for Production
# Final optimization
super agent optimize rag_chroma_demo --auto intensive
# Build production version
super agent compile rag_chroma_demo --production
10.2 Deploy with Orchestra
# Create orchestra for RAG system
super orchestra create rag_orchestra
# Add RAG agent to orchestra
super orchestra add-agent rag_chroma_demo
# Run orchestra
super orchestra run rag_orchestra
Best Practices
Document Preparation
- Clean your documents: Remove headers, footers, and irrelevant content
- Consistent formatting: Use consistent structure across documents
- Metadata inclusion: Add relevant metadata to documents
Chunking Strategy
- Optimal chunk size: 256-512 tokens for most use cases
- Overlap: 10-20% overlap between chunks
- Semantic boundaries: Split at sentence or paragraph boundaries
Retrieval Optimization
- Top-K tuning: Start with 5-10, adjust based on performance
- Embedding models: Use domain-specific models when available
- Hybrid search: Combine semantic and keyword search for better results
Evaluation Metrics
- Relevance: How relevant are retrieved chunks?
- Accuracy: How accurate are the responses?
- Completeness: Do responses cover all aspects of the query?
- Consistency: Are responses consistent across similar queries?
Troubleshooting
Common Issues
Low Retrieval Accuracy
# Try different chunk sizes
super agent compile rag_chroma_demo --chunk-size 256
# Try different embedding models
super agent compile rag_chroma_demo --embedding-model "all-mpnet-base-v2"
Slow Performance
# Reduce top_k
super agent compile rag_chroma_demo --top-k 3
# Use faster embedding model
super agent compile rag_chroma_demo --embedding-model "all-MiniLM-L6-v2"
Irrelevant Context
# Enable query expansion
super agent compile rag_chroma_demo --goal-expansion
# Use hybrid search
super agent compile rag_chroma_demo --search-type hybrid