Memory Optimization
Overview
SuperOptiX provides GEPA-based memory optimization that intelligently manages context windows, memory selection, and token budgets. As agents accumulate memories over time, including all memories leads to context overflow and irrelevant information diluting the agent's focus.
The Solution: GEPA learns to select only the most relevant memories within your token budget, optimizing across relevance, importance, and recency.
Proven Impact:
- Token Usage: 60% reduction (5000 → 2000 tokens)
- Memory Relevance: +55 points (30% → 85%)
- Task Success Rate: 30-50% boost
- Optimization Speed: <100ms per query
The Problem
Unoptimized Memory Context
After 20+ interactions, agents accumulate many memories. Without optimization, all memories get included:
```text
Query: "What happened with my shipping issue?"

Unoptimized Context (ALL 20 memories):
1. Order #AAA placed (Sept 1) - 200 tokens
2. Order #BBB placed (Sept 5) - 200 tokens
3. Order #CCC placed (Sept 10) - 200 tokens
...
15. More old orders - 200 tokens each
16. Shipping issue with #12345 (Oct 18) - 300 tokens ← RELEVANT!
17-20. More irrelevant data - 800 tokens

Total: 5000+ tokens → Context overflow!
Relevant: 300 / 5000 = 6%
```
Problems:
- Context overflow (exceeds token limits)
- Low signal-to-noise ratio (6% relevant)
- Wasted tokens on irrelevant memories
- Poor agent performance
GEPA-Optimized Memory Context
GEPA selects only relevant memories:
```text
Query: "What happened with my shipping issue?"

GEPA-Optimized Context (6 selected memories):
1. Shipping issue with #12345 (Oct 18) - 300 tokens ← HIGH RELEVANCE
2. VIP customer since 2020 - 100 tokens ← HIGH IMPORTANCE
3. Customer prefers email - 80 tokens ← MEDIUM RELEVANCE
4. Recent message (Oct 20) - 150 tokens ← HIGH RECENCY
5. Tracking info for #12345 - 200 tokens ← HIGH RELEVANCE
6. Previous shipping delay resolved - 180 tokens ← RELEVANT

Total: 1010 tokens ← Fits in budget!
Relevant: 900 / 1010 = 89%
```
Benefits:
- 80% token reduction (5000 → 1010)
- +83 points relevance (6% → 89%)
- Fits within token budget
- Higher quality agent responses
How It Works
Architecture
SuperOptiX memory optimization consists of three components:
```text
superoptix/optimizers/memory/
├── context_optimizer.py    # Main GEPA-based optimizer
├── memory_ranker.py        # Multi-factor memory ranking
└── memory_summarizer.py    # Memory compression
```
Integrated with:
- `superoptix/memory/agent_memory.py` (provides `get_optimized_context()`)
Optimization Process
Step 1: Score All Memories
GEPA evaluates each memory using three factors:
1. Relevance (0.0-1.0)
   - Keyword overlap with query
   - Semantic similarity
   - Phrase matches
   - GEPA Chain of Thought reasoning
2. Importance (0.0-1.0)
   - Set when storing memory
   - VIP status, critical info, business rules
   - User-defined priority
3. Recency (0.0-1.0)
   - Exponential decay over time
   - Half-life: 1 hour
   - More recent = higher score
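The recency factor above is an exponential decay with a 1-hour half-life. A minimal sketch (the half-life comes from the text; the function name and signature are ours, not SuperOptiX's API):

```python
from datetime import datetime, timedelta

def recency_score(timestamp, now, half_life_hours=1.0):
    """Exponential decay: the score halves every half-life (1 hour in the text)."""
    age_hours = max((now - timestamp).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)

now = datetime(2024, 10, 20, 12, 0)
print(recency_score(now, now))                       # brand new  -> 1.0
print(recency_score(now - timedelta(hours=1), now))  # 1 hour old -> 0.5
print(recency_score(now - timedelta(hours=3), now))  # 3 hours old -> 0.125
```

A short half-life like this strongly favors the current conversation; older context must win on relevance or importance instead.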
Step 2: Task-Specific Weighting
GEPA learns optimal weights for different task types:
| Task Type | Relevance | Importance | Recency |
|---|---|---|---|
| Q&A | 60% | 30% | 10% |
| Conversation | 30% | 20% | 50% |
| Knowledge Search | 40% | 50% | 10% |
| Customer Support | 35% | 35% | 30% |
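The three factors combine into a single score as a weighted sum. A sketch using the weights from the table (the table values are from the doc; the helper itself is illustrative, not the library's API):

```python
# Weights copied from the task-type table above
TASK_WEIGHTS = {
    "qa":               {"relevance": 0.60, "importance": 0.30, "recency": 0.10},
    "conversation":     {"relevance": 0.30, "importance": 0.20, "recency": 0.50},
    "knowledge":        {"relevance": 0.40, "importance": 0.50, "recency": 0.10},
    "customer_support": {"relevance": 0.35, "importance": 0.35, "recency": 0.30},
}

def combined_score(relevance, importance, recency, task_type):
    w = TASK_WEIGHTS[task_type]
    return (relevance * w["relevance"]
            + importance * w["importance"]
            + recency * w["recency"])

# The same memory ranks very differently depending on the task:
print(round(combined_score(0.9, 0.2, 0.1, "qa"), 2))            # 0.61
print(round(combined_score(0.9, 0.2, 0.1, "conversation"), 2))  # 0.36
```

A highly relevant but stale memory dominates in Q&A mode yet drops sharply in conversation mode, which is exactly the behavior the table encodes.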
Step 3: Budget-Aware Selection
```python
# Pseudo-code for the selection algorithm
def select_memories(scored_memories, max_tokens):
    selected = []
    total_tokens = 0

    # Always include the N most recent (preserve context)
    for memory in most_recent(3):
        selected.append(memory)
        total_tokens += estimate_tokens(memory)

    # Add highest-scoring memories until the budget is exhausted
    for score, memory in sorted(scored_memories, reverse=True):
        if score < min_threshold:
            break
        if total_tokens + estimate_tokens(memory) <= max_tokens:
            selected.append(memory)
            total_tokens += estimate_tokens(memory)
        elif can_summarize(memory):
            # Compress the memory to fit the remaining budget
            summary = summarize(memory, max_tokens - total_tokens)
            selected.append(summary)
            total_tokens += estimate_tokens(summary)

    return selected
```
Step 4: Intelligent Ordering
Memories are ordered using one of three strategies:
- Recency First: Most recent first (best for conversations)
- Relevance First: Most relevant first (best for Q&A)
- Chronological: Oldest first (best for narratives)
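The three ordering strategies above can be sketched as simple sorts (the strategy names and memory-dict shape here are illustrative, not SuperOptiX's internal representation):

```python
def order_memories(memories, strategy="relevance_first"):
    """Order selected memories; each memory dict has 'timestamp' and 'score' keys."""
    if strategy == "recency_first":      # best for conversations
        return sorted(memories, key=lambda m: m["timestamp"], reverse=True)
    if strategy == "relevance_first":    # best for Q&A
        return sorted(memories, key=lambda m: m["score"], reverse=True)
    if strategy == "chronological":      # best for narratives
        return sorted(memories, key=lambda m: m["timestamp"])
    raise ValueError(f"unknown strategy: {strategy}")

mems = [
    {"id": "a", "timestamp": 1, "score": 0.9},
    {"id": "b", "timestamp": 3, "score": 0.4},
    {"id": "c", "timestamp": 2, "score": 0.7},
]
print([m["id"] for m in order_memories(mems, "recency_first")])    # ['b', 'c', 'a']
print([m["id"] for m in order_memories(mems, "relevance_first")])  # ['a', 'c', 'b']
```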
Quick Start
Enable in Agent Playbook
```yaml
spec:
  memory:
    enabled: true
    enable_context_optimization: true
    max_context_tokens: 2000
```
That's it! The agent automatically uses optimized context.
Programmatic Usage
```python
from superoptix.memory import AgentMemory

# Initialize with optimization enabled
memory = AgentMemory(
    agent_id="support_agent",
    enable_context_optimization=True,
    max_context_tokens=2000
)

# Store memories with importance scores
memory.remember(
    "Customer Sarah ordered laptop #12345",
    memory_type="short"
)
memory.remember(
    "Sarah prefers email contact",
    memory_type="long",
    importance=0.8
)
memory.remember(
    "VIP customer since 2020",
    memory_type="long",
    importance=0.9
)

# Get optimized context for a query
context_info = memory.get_optimized_context(
    query="What happened with my shipping issue?",
    task_type="customer_support"
)

print(f"Selected {context_info['optimization_info']['selected_count']} memories")
print(f"Total tokens: {context_info['optimization_info']['total_tokens']}")
print(f"\n{context_info['context_string']}")
```
Example Output
```text
Selected 6 memories
Total tokens: 1200

## Relevant Memories

### Memory 1: Shipping Issue (Score: 0.92)
Customer reported delayed delivery for order #12345...

### Memory 2: VIP Status (Score: 0.85)
VIP customer since 2020, lifetime value $50K...

### Memory 3: Contact Preference (Score: 0.68)
Sarah prefers email contact for updates...
```
Configuration Options
ContextWindowOptimizer
```python
from superoptix.optimizers.memory import ContextWindowOptimizer

optimizer = ContextWindowOptimizer(
    max_tokens=4096,          # Token budget
    enable_gepa=True,         # Use GEPA scoring vs heuristics
    min_relevance_score=0.3,  # Filter threshold
    preserve_recency=True,    # Always keep recent memories
)

result = optimizer.optimize_context(
    query="What is the return policy?",
    available_memories=all_memories,
    task_type="customer_support",
    preserve_n_recent=3,      # Always include the 3 most recent
)
```
Result Structure:
```python
{
    "selected_memories": [...],   # Selected memory objects
    "total_tokens": 1500,         # Tokens used
    "strategy": "gepa_optimized_customer_support",
    "scores": {                   # Transparency
        "memory_1": 0.85,
        "memory_2": 0.72,
        ...
    },
    "optimization_time": 0.045,   # Seconds
    "total_available": 20,        # Total memories
    "selected_count": 6           # Selected count
}
```
AgentMemory Integration
```python
from superoptix.memory import AgentMemory

memory = AgentMemory(
    agent_id="support_agent",
    enable_context_optimization=True,  # Enable GEPA optimization
    max_context_tokens=2000,           # Token budget
    backend=None,                      # Default SQLite
    short_term_capacity=100,           # Short-term memory size
    enable_embeddings=True             # For semantic search
)
```
Advanced Usage
Task-Specific Optimization
Different tasks need different memory selection strategies:
```python
# Q&A - Prioritize relevance
context = memory.get_optimized_context(
    query="What is our refund policy?",
    task_type="qa"
)

# Conversation - Prioritize recency
context = memory.get_optimized_context(
    query="Continue our discussion",
    task_type="conversation"
)

# Knowledge Search - Prioritize importance
context = memory.get_optimized_context(
    query="Find all critical business rules",
    task_type="knowledge"
)
```
Custom Weighting
Override default task weights:
```python
# Custom task type for a specialized workload
result = optimizer.optimize_context(
    query="Emergency protocol check",
    available_memories=all_memories,
    task_type="custom",
)
```
Memory Summarization
When full memory content doesn't fit in token budget, GEPA automatically compresses:
```python
# Original memory
{
    "content": "Customer Sarah Johnson (sarah@email.com) called on Oct 18 at 3pm "
               "regarding delayed shipping for order #12345. She ordered a laptop "
               "(Dell XPS 15) on Oct 10 with expedited shipping but tracking shows "
               "it's still in transit. She's frustrated because she needs it for a "
               "presentation on Oct 22. We offered overnight shipping for her next "
               "order and 20% discount code SORRY20.",
    "tokens": 300
}

# Compressed summary (when budget is tight)
{
    "content": "Customer Sarah: Delayed order #12345 (laptop). Needs by Oct 22. "
               "Offered overnight + 20% discount.",
    "tokens": 80,
    "is_summary": True
}
```
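Both budget checks and the decision to summarize hinge on a token estimate per memory. The real estimator is internal to SuperOptiX; a common rough heuristic (about 4 characters per English token) can be sketched as:

```python
def estimate_tokens(text):
    """Rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

original = ("Customer Sarah Johnson called on Oct 18 regarding delayed shipping "
            "for order #12345; needs the laptop by Oct 22 for a presentation.")
summary = "Customer Sarah: Delayed order #12345 (laptop). Needs by Oct 22."

# The summary costs a fraction of the original's budget
print(estimate_tokens(original) > estimate_tokens(summary))  # True
```

Heuristic estimates like this can be off by 20-30% versus a real tokenizer, which is one reason to leave headroom in the budget.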
Optimization Metrics
Before vs After
| Metric | Unoptimized | GEPA-Optimized | Improvement |
|---|---|---|---|
| Avg tokens used | 4500 | 1800 | -60% |
| Relevance % | 30% | 85% | +55 pts |
| Memories selected | 18 | 6 | Optimized |
| Task success rate | 65% | 90% | +25 pts |
| Optimization time | - | <100ms | Fast |
Transparency & Monitoring
Track optimization performance:
```python
# Get optimization statistics
stats = optimizer.get_stats()

print(f"Total optimizations: {stats['total_optimizations']}")
print(f"Avg tokens used: {stats['avg_tokens_used']:.0f}")
print(f"Avg memories selected: {stats['avg_memories_selected']:.1f}")
print(f"Avg relevance score: {stats['avg_relevance_score']:.2f}")
```
Best Practices
Token Budget Sizing
Recommended budgets:
- Conversation agents: 2000-4000 tokens
- Q&A agents: 1000-2000 tokens
- Knowledge agents: 4000-8000 tokens
- Customer support: 2000-3000 tokens
Rule of thumb: Set budget to 30-50% of model's total context window.
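The rule of thumb above can be captured in a tiny helper (this function is ours, purely illustrative, not part of the SuperOptiX API):

```python
def suggested_memory_budget(model_context_window, fraction=0.4):
    """Rule of thumb: reserve 30-50% of the model's context window (default 40%)."""
    if not 0.3 <= fraction <= 0.5:
        raise ValueError("fraction outside the recommended 0.3-0.5 range")
    return int(model_context_window * fraction)

print(suggested_memory_budget(8192))         # 8K context   -> 3276 tokens
print(suggested_memory_budget(32768, 0.3))   # 32K context  -> 9830 tokens
```

The remaining 50-70% of the window stays free for the system prompt, retrieved knowledge, tool results, and the model's response.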
Importance Scoring
Set importance when storing memories:
```python
# Critical business rules
memory.remember(
    "Refunds must be approved within 24 hours",
    memory_type="long",
    importance=1.0  # Maximum importance
)

# VIP customer info
memory.remember(
    "Customer is VIP tier, lifetime value $100K",
    memory_type="long",
    importance=0.9
)

# Regular interaction
memory.remember(
    "Customer asked about shipping times",
    memory_type="short",
    importance=0.5  # Default
)

# Low-priority note
memory.remember(
    "Customer mentioned they like blue color",
    memory_type="short",
    importance=0.2
)
```
Preserve Recent Memories
Always include most recent memories for context continuity:
```python
result = optimizer.optimize_context(
    query="Continue our conversation",
    available_memories=all_memories,
    preserve_n_recent=3  # Always include the 3 most recent
)
```
Use Cases
Customer Support Agent
```yaml
spec:
  persona:
    role: Customer Support Agent
    goal: Help customers with orders and issues
  memory:
    enabled: true
    enable_context_optimization: true
    max_context_tokens: 2000

# GEPA automatically prioritizes:
# - Recent interactions (high recency)
# - VIP status (high importance)
# - Related issues (high relevance)
```
Knowledge Base Agent
```yaml
spec:
  persona:
    role: Knowledge Base Agent
    goal: Answer questions from documentation
  memory:
    enabled: true
    enable_context_optimization: true
    max_context_tokens: 4000

# GEPA automatically prioritizes:
# - Relevant documentation (high relevance)
# - Critical policies (high importance)
# - Recent updates (medium recency)
```
Conversational Agent
```yaml
spec:
  persona:
    role: Conversational Assistant
    goal: Engage in natural dialogue
  memory:
    enabled: true
    enable_context_optimization: true
    max_context_tokens: 3000

# GEPA automatically prioritizes:
# - Recent messages (high recency)
# - Conversation topics (high relevance)
# - User preferences (high importance)
```
Technical Architecture
Components
1. ContextWindowOptimizer
Main GEPA-based optimizer that orchestrates memory selection:
```python
class ContextWindowOptimizer:
    """
    Optimizes:
    - Which memories to include (relevance, importance, recency)
    - How much of each memory (full, summary, keywords)
    - Order of memories (chronological, relevance-based, hybrid)
    - Token budget allocation across memory types
    """

    def optimize_context(
        self,
        query: str,
        available_memories: List[Dict],
        task_type: str = "general",
        preserve_n_recent: int = 3,
    ) -> Dict:
        # Step 1: Score all memories
        # Step 2: Preserve most recent
        # Step 3: Select within budget
        # Step 4: Order optimally
        ...
```
2. MemoryRanker
Multi-factor ranking with task-specific weights:
```python
class MemoryRanker:
    """
    Ranks memories by:
    - Relevance to query
    - Importance score
    - Recency (time decay)
    """

    def rank_hybrid(
        self,
        query: str,
        memories: List[Dict],
        weights: Dict[str, float] = None
    ) -> List[Tuple[float, Dict]]:
        # Combines relevance + importance + recency
        ...
```
3. MemorySummarizer
Compresses memories when budget is tight:
```python
class MemorySummarizer:
    """
    Summarizes memories to fit the token budget.
    Preserves key information while reducing tokens.
    """

    def summarize(
        self,
        memory: Dict,
        target_tokens: int
    ) -> Dict:
        # GEPA-based compression
        ...
```
GEPA Scoring Algorithm
GEPA uses Chain of Thought to score memory relevance:
```python
class MemoryRelevanceScorer(dspy.Signature):
    """Score how relevant a memory is for answering a query."""

    query = dspy.InputField(desc="User query or current task")
    memory_content = dspy.InputField(desc="Memory content to evaluate")
    memory_metadata = dspy.InputField(desc="Memory metadata (type, age, importance)")
    task_context = dspy.InputField(desc="Additional task context")

    relevance_score = dspy.OutputField(
        desc="Relevance score 0.0-1.0",
        prefix="Score:"
    )
    reasoning = dspy.OutputField(
        desc="Brief explanation of score",
        prefix="Reasoning:"
    )

# GEPA learns to reason about memory relevance
scorer = dspy.ChainOfThought(MemoryRelevanceScorer)
```
Scoring Formula
```python
def score_memory(memory, query, task_type):
    # Calculate component scores
    relevance = calculate_relevance(query, memory.content)
    importance = memory.importance
    recency = calculate_recency(memory.timestamp)

    # Get task-specific weights (GEPA optimizes these!)
    weights = get_task_weights(task_type)

    # Combine with learned weights
    final_score = (
        relevance * weights['relevance'] +
        importance * weights['importance'] +
        recency * weights['recency']
    )
    return final_score
```
Demo Example
Pull Demo Agent
```bash
# Pull customer support agent with memory
super agent pull customer_support_memory

# Compile
super agent compile customer_support_memory

# Run with verbose mode to see memory selection
super agent run customer_support_memory --verbose \
  --goal "What happened with my shipping issue?"
```
Expected Output
```text
🧠 Memory Optimization:
   Available: 20 memories (5000 tokens)
   Selected: 6 memories (1010 tokens)
   Strategy: gepa_optimized_customer_support

   Top memories:
   1. Shipping issue #12345 (score: 0.92)
   2. VIP customer status (score: 0.85)
   3. Contact preference (score: 0.68)
   ...

✅ Response generated with optimized context!
```
Performance
Benchmarks
- Optimization Time: <100ms per query
- Memory Footprint: No increase (lazy loading)
- Scalability: Tested with 1000+ memories
- Accuracy: 85%+ relevance in selected memories
Fallback Behavior
If GEPA optimization fails, SuperOptiX gracefully falls back to heuristic scoring:
```python
try:
    # Use GEPA Chain of Thought
    score = gepa_scorer(query, memory_content, metadata)
except Exception:
    # Fall back to heuristic scoring
    score = heuristic_score(relevance, importance, recency)
```
Framework-Agnostic
Memory optimization works across all supported frameworks:
- DSPy: Native integration
- OpenAI SDK: Compatible
- CrewAI: Compatible
- Google ADK: Compatible
- Microsoft: Compatible
- DeepAgents: Compatible
The optimization layer is completely independent of the framework layer!
Comparison
vs Simple Memory
```python
# Simple (include all recent)
context = memory.get_recent(20)
# Result: 5000 tokens, 30% relevant

# GEPA-optimized
context = memory.get_optimized_context(query)
# Result: 1800 tokens, 85% relevant
```
vs Manual Selection
```python
# Manual (hardcoded rules)
if "shipping" in query:
    context = memory.filter(category="orders")
elif "refund" in query:
    context = memory.filter(category="payments")
# Brittle, doesn't scale

# GEPA (learns patterns)
context = memory.get_optimized_context(query, task_type="customer_support")
# Automatically learns what's relevant for each query type
```
vs RAG
Memory optimization and RAG serve different purposes:
| Feature | Memory Optimization | RAG Optimization |
|---|---|---|
| Purpose | Select agent's past experiences | Retrieve external knowledge |
| Source | Agent's own memories | Vector database |
| Optimization | Relevance + Importance + Recency | Query + Chunk selection |
| Use Case | Personalization, continuity | Knowledge grounding |
| Combinable | Yes! Use both together | Yes! |
Integration with Other Optimizations
Memory optimization works alongside other SuperOptiX optimizations:
Memory + Prompt Optimization
```yaml
spec:
  memory:
    enabled: true
    enable_context_optimization: true  # Optimize memory selection
  optimization:
    strategy: gepa                     # Optimize prompts
    metrics: [accuracy, relevance]
```
GEPA optimizes both:
- Which memories to include in context
- How to phrase prompts using those memories
Memory + RAG Optimization
```yaml
spec:
  memory:
    enabled: true
    enable_context_optimization: true
  knowledge:
    enabled: true
    rag_enabled: true
```
Combines:
- Agent's personal memories (optimized)
- External knowledge (RAG retrieval)
Memory + Tool Optimization
```yaml
spec:
  memory:
    enabled: true
    enable_context_optimization: true
  tools:
    - name: get_order_status
      mcp_enabled: true
```
Memory provides context, MCP optimizes tool usage!
Troubleshooting
High Token Usage
If memory context still uses too many tokens:
```python
# Reduce the budget
memory = AgentMemory(
    agent_id="agent",
    max_context_tokens=1000  # Lower budget
)

# Increase the minimum relevance threshold
optimizer = ContextWindowOptimizer(
    max_tokens=2000,
    min_relevance_score=0.5  # Higher threshold (default: 0.3)
)
```
Low Relevance
If selected memories aren't relevant:
```python
# Enable GEPA scoring
optimizer = ContextWindowOptimizer(
    enable_gepa=True  # Use GEPA instead of heuristics
)

# Adjust the task type
context = memory.get_optimized_context(
    query="...",
    task_type="qa"  # Try different task types
)
```
Missing Recent Context
If recent memories aren't included:
```python
# Increase recency preservation
result = optimizer.optimize_context(
    query="...",
    preserve_n_recent=5  # Include the 5 most recent (default: 3)
)
```
API Reference
optimize_context()
```python
optimizer.optimize_context(
    query: str,                      # Current query/task
    available_memories: List[Dict],  # All memories
    task_type: str = "general",      # Task category
    preserve_n_recent: int = 3       # Always include N recent
) -> Dict
```
Returns:
- selected_memories: List of selected memory dicts
- total_tokens: Token count for selected memories
- strategy: Optimization strategy used
- scores: Dict of memory_id → relevance_score
- optimization_time: Time taken (seconds)
- total_available: Count of available memories
- selected_count: Count of selected memories
get_optimized_context()
```python
memory.get_optimized_context(
    query: str,                  # Current query
    task_type: str = "general",  # Task category
    preserve_n_recent: int = 3   # Recent memory count
) -> Dict
```
Returns:
- context_string: Formatted context for LLM
- selected_memories: Selected memory objects
- optimization_info: Stats and scores
Next Steps
- Try it: Add `enable_context_optimization: true` to your agent playbook
- Measure: Track token usage and relevance with `get_stats()`
- Tune: Adjust `max_context_tokens` and `min_relevance_score`
- Monitor: Use verbose mode to see memory selection in action
Related Guides:
- Memory System Guide
- RAG Optimization
- GEPA Optimization
- Full-Stack Optimization
Memory optimization is part of SuperOptiX's full-stack optimization approach.
Optimize prompts. Optimize RAG. Optimize tools. Optimize memory. All with GEPA.