
Memory Optimization

What is Memory Optimization?

Memory optimization is the process of improving how an agent selects which memories to include in its context, how it scores their relevance, and how it manages its token budget. While traditional approaches include all memories (leading to context overflow), GEPA learns intelligent, context-aware memory selection.

Key Insight: After 20+ interactions, an agent can't include all memories. GEPA learns which memories matter most for each query, balancing relevance, importance, and recency within token constraints.


The Memory Optimization Problem

Without Optimization

Scenario: Customer support agent after 30 interactions

1. Agent has 30 memories (15,000 tokens total)
2. Context limit: 2,000 tokens
3. Traditional approach: Include all → Context overflow
4. Agent: Error or truncated response

Problem: Too many memories, context overflow, poor performance

With GEPA Optimization

Scenario: Same agent, same 30 memories

1. Agent has 30 memories (15,000 tokens total)
2. Context limit: 2,000 tokens
3. GEPA-learned strategy: Select 6 most relevant (1,800 tokens)
4. Agent: High-quality response with perfect context

Solution: Optimized selection, perfect fit, better performance


What GEPA Optimizes in Memory

1. Context Selection (Which Memories to Include)

What It Is: Learning which memories are relevant for each query

What GEPA Learns:
- Relevance scoring (semantic similarity to query)
- Importance weighting (critical vs. minor info)
- Recency balance (recent vs. older memories)
- Task-specific patterns

Example Configuration:

spec:
  memory:
    enabled: true
    enable_context_optimization: true  # GEPA optimizes selection!
    max_context_tokens: 2000
    short_term_capacity: 100

Before Optimization:

Strategy: Include all recent memories chronologically
Result: 
- Context overflow (15,000 tokens > 2,000 limit)
- Irrelevant memories included
- Important older memories excluded

After GEPA Optimization:

Learned Strategy:
For security query:
  1. Search for memories matching "security", "SQL", "vulnerability"
  2. Prioritize: High importance memories (0.8-1.0)
  3. Include: Recent similar findings (last 7 days)
  4. Summarize: Older related memories
  5. Stop at: 1,800 tokens (buffer for response)

Result:
- 6 highly relevant memories selected
- Fits perfectly in 2,000 token budget
- Includes critical past findings

Impact: 60% fewer tokens, 55% higher relevance, better performance
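A minimal sketch of the learned selection loop described above, assuming a simple Memory record, fixed weights, and a greedy fill-until-budget policy (the dataclass, field names, and numbers are illustrative, not the framework's internal API):

from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    importance: float   # 0.0-1.0, set when the memory is stored
    relevance: float    # similarity of this memory to the current query
    tokens: int         # token count of the memory's content

def select_context(memories: list[Memory], budget: int = 2000, buffer: int = 200) -> list[Memory]:
    """Greedily pick the highest-scoring memories that fit under the token budget."""
    limit = budget - buffer                      # leave room for the response
    ranked = sorted(
        memories,
        key=lambda m: 0.5 * m.relevance + 0.3 * m.importance,  # illustrative learned weights
        reverse=True,
    )
    selected, used = [], 0
    for m in ranked:
        if used + m.tokens <= limit:             # skip memories that do not fit
            selected.append(m)
            used += m.tokens
    return selected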


2. Relevance Scoring (How to Score Memories)

What It Is: Calculating how relevant each memory is to the current query

What GEPA Learns:
- Semantic similarity weights
- Keyword matching importance
- Category matching
- Task-specific relevance

Scoring Algorithm (GEPA-Optimized):

# Pseudocode showing what GEPA learns

def score_memory(memory, query, task_type):
    # GEPA learns optimal weights for each task type
    weights = get_task_weights(task_type)  # GEPA optimizes these!

    relevance = calculate_similarity(query, memory.content)
    importance = memory.importance
    recency = calculate_recency(memory.timestamp)

    # GEPA learns optimal combination
    final_score = (
        relevance * weights['relevance'] +      # GEPA learned: 0.5
        importance * weights['importance'] +    # GEPA learned: 0.3
        recency * weights['recency']            # GEPA learned: 0.2
    )

    return final_score
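For illustration, once the helpers referenced above are defined, ranking a set of memories with this scoring function could look like the following (the memories list, query text, and task label are hypothetical):

ranked = sorted(
    memories,
    key=lambda m: score_memory(m, query="What happened with my shipping issue?",
                               task_type="customer_support"),
    reverse=True,
)
top_memories = ranked[:6]   # best candidates for the context window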

Before Optimization:

Weights: Equal (0.33, 0.33, 0.33)
Result: Recent but irrelevant memories rank high

After GEPA Optimization:

Learned Weights (Security Task):
- relevance: 0.6 (prioritize semantic match)
- importance: 0.3 (critical findings matter)
- recency: 0.1 (older security patterns still valid)

Learned Weights (Conversation Task):
- relevance: 0.4
- importance: 0.2
- recency: 0.4 (recent context matters more)

Impact: Task-specific scoring produces better context


3. Token Budget Management

What It Is: Fitting memories within context window constraints

What GEPA Learns:
- How many memories to include
- When to summarize vs. include the full memory
- Buffer allocation for the response
- Compression strategies

Before Optimization:

Approach: Include memories until limit reached
Result:
- Memory 1: 500 tokens (full)
- Memory 2: 500 tokens (full)
- Memory 3: 500 tokens (full)
- Memory 4: 500 tokens (full)
- Total: 2,000 tokens (no buffer for response!)

After GEPA Optimization:

Learned Strategy:
- Reserve 200 tokens for response buffer
- Budget: 1,800 tokens for memories
- Strategy: Include top 3 full (450 tokens each)
- Summarize next 3 (150 tokens each)
- Total: 1,800 tokens (perfect fit!)

Impact: Better token allocation, room for quality responses
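A sketch of this allocation rule, assuming the memories are already ranked; the function names, defaults, and the crude summarizer stand-in are illustrative assumptions, not framework APIs:

def summarize(text: str, max_tokens: int) -> str:
    # Crude stand-in for a real summarizer (e.g. an LLM call):
    # truncate at roughly 4 characters per token.
    return text[: max_tokens * 4]

def fit_to_budget(ranked_memories, budget=2000, buffer=200,
                  full_count=3, summary_tokens=150):
    """Include the top memories in full, summarize the rest, keep a response buffer."""
    limit = budget - buffer
    context, used = [], 0
    for i, memory in enumerate(ranked_memories):
        if i < full_count and used + memory.tokens <= limit:
            context.append(memory.content)                 # full text
            used += memory.tokens
        elif used + summary_tokens <= limit:
            context.append(summarize(memory.content,       # compressed version
                                     max_tokens=summary_tokens))
            used += summary_tokens
        else:
            break                                          # budget exhausted
    return context, used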


4. Summarization Strategies

What It Is: Learning when and how to compress memories

What GEPA Learns:
- When to summarize (old, low-importance, large)
- How much to compress
- What information to preserve
- Summary format

Before Optimization:

Memory: "Customer John reported shipping issue with order #12345 on 
        Oct 15. He lives in California, ordered 3 items (laptop, mouse, 
        keyboard), paid $1,500, wants refund, mentioned he's traveling 
        next week..."
Length: 800 tokens

After GEPA Optimization:

Learned Summarization:
Summary: "Customer John: Order #12345 shipping issue, refund requested (Oct 15)"
Length: 50 tokens
Preserved: Customer name, order ID, issue type, request, date
Removed: Address, item details, payment (not relevant to current query)

Impact: 16x compression while preserving key info
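A sketch of this decision rule using the thresholds mentioned in this guide (older than 7 days, relevance below 0.7, ~20% target length); the memory fields are assumptions:

from datetime import datetime, timedelta, timezone

def should_summarize(memory, relevance: float) -> bool:
    """Summarize old, low-relevance memories; keep recent or highly relevant ones in full."""
    age = datetime.now(timezone.utc) - memory.timestamp   # assumes a UTC timestamp field
    return age > timedelta(days=7) and relevance < 0.7

def summary_budget(memory) -> int:
    """Target roughly 20% of the original token count."""
    return max(1, int(memory.tokens * 0.2))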


Before/After Comparison

Scenario: Customer Support Query

Agent Memory (30 interactions, 15,000 tokens):
- Memories 1-5: Recent small talk (low importance)
- Memories 6-10: Previous product questions (medium importance)
- Memory 11: Shipping issue with order #12345 (high importance)
- Memory 12: Customer prefers email contact (high importance)
- Memories 13-30: Various unrelated interactions

Current Query: "What happened with my shipping issue?"

Before Memory Optimization:

Selection: Last 10 memories chronologically (memories 21-30)
Included:
- Recent small talk about weather
- Question about product features
- Unrelated order status check
- More irrelevant recent context

Missing:
- Memory 11: Actual shipping issue details!
- Memory 12: Contact preference

Result: Agent says "I don't have information about shipping issues"

After GEPA Memory Optimization:

GEPA-Learned Selection Strategy:

1. Semantic Search: "shipping issue" matches Memory 11 (0.95 similarity)
2. Importance Filter: Memory 11 (0.9) and Memory 12 (0.8) ranked high
3. Recency Boost: Memory 11 is recent enough (5 days ago)
4. Token Allocation: Memory 11 (500 tokens) + Memory 12 (200 tokens) = 700 tokens

Selected Memories:
- Memory 11: Shipping issue with order #12345 (full text)
- Memory 12: Email contact preference (full text)
- Memory 28: Recent interaction (summarized, 100 tokens)
Total: 800 tokens (well under 2,000 limit)

Agent Response:
"I found information about your shipping issue.

Order #12345 Status:
- Reported: Oct 15
- Issue: Package delayed at distribution center
- Expected: Oct 25
- Tracking: Updated yesterday

I'll send detailed status to your email (your preferred contact method).

Would you like me to check current tracking status or escalate for faster delivery?"

Improvement: From "no information" → Complete, accurate response


How GEPA Learns Memory Strategies

The Optimization Process

  1. Analysis Phase

    GEPA Observes:
    - Query about shipping issue
    - Agent selected recent irrelevant memories
    - Agent missed Memory 11 (actual shipping issue details)
    - Response was "I don't have that information" (FAIL)
    

  2. Reflection Phase

    GEPA Reflection:
    "Agent failed because it selected recent memories chronologically 
     instead of semantically relevant memories. Query contained 'shipping 
     issue' which matches Memory 11 with 0.95 similarity. Need to prioritize 
     semantic relevance over recency for factual queries."
    

  3. Mutation Phase

    GEPA Tests:
    - Strategy 1: Pure recency (chronological)
    - Strategy 2: Pure relevance (semantic)
    - Strategy 3: Balanced (0.6 relevance + 0.3 importance + 0.1 recency)
    

  4. Evaluation Phase

    Results:
    - Strategy 1: 30% (misses key info)
    - Strategy 2: 70% (good but ignores recent context)
    - Strategy 3: 95% (balanced!) ← Winner!
    

  5. Selection Phase

    GEPA Keeps: Strategy 3 (balanced approach)
    Fine-Tunes: Adjust weights per task type
    

Result: Learned task-specific memory selection strategies
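Conceptually, the evaluation and selection phases amount to scoring each candidate strategy against a scenario set and keeping the winner. The sketch below uses hypothetical stand-ins (run_scenario, the scenarios list) for what GEPA does internally:

candidate_strategies = [
    {"relevance": 0.0, "importance": 0.0, "recency": 1.0},  # Strategy 1: pure recency
    {"relevance": 1.0, "importance": 0.0, "recency": 0.0},  # Strategy 2: pure relevance
    {"relevance": 0.6, "importance": 0.3, "recency": 0.1},  # Strategy 3: balanced
]

def average_score(strategy, scenarios):
    # run_scenario() is a hypothetical evaluator returning a score in [0, 1]
    return sum(run_scenario(s, strategy) for s in scenarios) / len(scenarios)

best_strategy = max(candidate_strategies,
                    key=lambda strategy: average_score(strategy, scenarios))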


Best Practices

1. Enable Context Optimization

memory:
  enabled: true
  enable_context_optimization: true  # Critical!
  max_context_tokens: 2000

2. Set Appropriate Token Budgets

memory:
  max_context_tokens: 2000    # For general agents
  # max_context_tokens: 4000  # For complex reasoning agents
  # max_context_tokens: 1000  # For simple Q&A agents

3. Use Memory Importance Levels

# In code
memory.remember(
    content="Critical security finding: SQL injection",
    memory_type="long_term",
    importance=0.9  # High importance!
)

memory.remember(
    content="Small talk about weather",
    memory_type="short_term",
    importance=0.1  # Low importance
)

GEPA learns to prioritize high-importance memories.

4. Define Memory-Aware RSpec-Style BDD Scenarios

feature_specifications:
  scenarios:
    - name: memory_recall
      description: Agent should recall similar past issues
      given_memory:
        - content: "Previous SQL injection in authentication"
          importance: 0.9
      input:
        code: [SQL injection code]
      expected_output:
        review: Must mention "similar to previous finding"

Common Memory Strategies GEPA Learns

Strategy 1: Semantic Matching

Before: Chronological selection
After: "For factual queries, prioritize memories with >0.80 semantic similarity"

Strategy 2: Importance Weighting

Before: All memories equal weight
After: "Always include high-importance memories (>0.8) regardless of age"

Strategy 3: Task-Specific Balancing

Before: Same strategy for all tasks
After: "Security queries: 60% relevance, 30% importance, 10% recency. Conversation: 40% relevance, 20% importance, 40% recency"

Strategy 4: Dynamic Summarization

Before: Include full text or skip
After: "Summarize memories >7 days old with <0.7 relevance to save tokens"


Metrics and Results

What Gets Measured

  • Selection Accuracy: % of scenarios where right memories selected
  • Token Efficiency: Average tokens used vs. budget
  • Relevance Score: Average similarity of selected memories
  • Response Quality: Agent performance with optimized context
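Below is a minimal sketch of how the first two metrics could be computed from evaluation runs, assuming each run records the expected memory IDs, the selected IDs, and the tokens used against the budget (the field names are assumptions):

def selection_accuracy(runs) -> float:
    """Fraction of runs where every expected memory was selected."""
    hits = sum(1 for r in runs
               if set(r["expected_ids"]) <= set(r["selected_ids"]))
    return hits / len(runs)

def token_efficiency(runs) -> float:
    """Average share of the token budget actually used."""
    return sum(r["tokens_used"] / r["token_budget"] for r in runs) / len(runs)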

Typical Improvements

| Metric            | Before           | After | Improvement |
|-------------------|------------------|-------|-------------|
| Token Usage       | 5,000 (overflow) | 1,800 | -64%        |
| Memory Relevance  | 30%              | 85%   | +55%        |
| Response Accuracy | 45%              | 90%   | +45%        |
| Task Success Rate | 50%              | 85%   | +35%        |

Quick Start

Enable Memory Optimization

spec:
  # Memory configuration
  memory:
    enabled: true
    enable_context_optimization: true  # GEPA optimizes!
    max_context_tokens: 2000
    short_term_capacity: 100

  # GEPA automatically optimizes memory selection
  optimization:
    optimizer:
      name: GEPA
      params:
        auto: medium
Then compile and optimize the agent:

super agent compile your_agent
super agent optimize your_agent --auto medium

GEPA will learn optimal memory selection strategies!


Advanced: Memory-Specific Configuration

Fine-Tune Memory Behavior

memory:
  enabled: true
  enable_context_optimization: true

  # Token budget
  max_context_tokens: 2000

  # Capacity limits
  short_term_capacity: 100
  long_term_capacity: 1000

  # Embeddings for semantic search
  enable_embeddings: true
  embedding_model: sentence-transformers/all-MiniLM-L6-v2

  # Retention policy
  retention_policy: lru  # Least Recently Used

GEPA learns optimal configurations through optimization.
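For intuition, the semantic search that enable_embeddings turns on looks roughly like the standalone snippet below, which calls the configured model directly via the sentence-transformers library. The framework handles this internally, so this is only an illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

memories = [
    "Order #12345 shipping issue reported on Oct 15",
    "Customer prefers email contact",
    "Small talk about the weather",
]
query = "What's the status of my shipping issue?"

memory_embeddings = model.encode(memories, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

similarities = util.cos_sim(query_embedding, memory_embeddings)[0]
best = int(similarities.argmax())
print(memories[best], float(similarities[best]))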


Common Memory Strategies GEPA Learns

Strategy 1: Task-Specific Weighting

Before: Same weights for all queries
After: - "Security queries: Prioritize high-importance memories (importance weight: 0.4)" - "Conversation queries: Prioritize recent context (recency weight: 0.5)" - "Knowledge queries: Prioritize semantic relevance (relevance weight: 0.6)"

Strategy 2: Dynamic Summarization

Before: Include full memories or exclude them
After: "Memories >7 days old: Summarize to 20% of original length if relevance <0.7"

Strategy 3: Category-Aware Selection

Before: Ignore memory categories
After: "For security query, prioritize memories with category='security_patterns'"

Strategy 4: Similarity Clustering

Before: Select memories independently
After: "If selecting memory about SQL injection, also include related memories about database security (cluster similar topics)"


Integration with Other Layers

Memory optimization enhances other layers:

Memory + Prompts:

Optimized Memory: Includes past security finding
Optimized Prompt: "Reference similar past issues when available"
→ Agent says: "Similar to previous finding #47, this is SQL injection"

Memory + RAG:

Optimized Memory: Recalls "We used sql_injection.md doc before"
Optimized RAG: Retrieves same doc again for consistency
→ Consistent security recommendations across sessions

Memory + Tools:

Optimized Memory: "Last time complexity was 7, we refactored"
Optimized Tool: Calculates current complexity = 8
→ Agent says: "Similar to previous issue #23 (complexity 7). Recommend same refactoring approach"


Real-World Example

Use Case: Customer Support with Memory

Agent Memory (50 interactions over 2 weeks):

| ID  | Content                              | Type       | Importance | Age    | Tokens |
|-----|--------------------------------------|------------|------------|--------|--------|
| M1  | Small talk about weather             | short_term | 0.1        | 1h     | 50     |
| M2  | Product feature question             | short_term | 0.3        | 2h     | 150    |
| M3  | Order #12345 shipping issue reported | long_term  | 0.9        | 5 days | 400    |
| M4  | Customer prefers email contact       | long_term  | 0.8        | 5 days | 100    |
| M5  | Unrelated billing question           | short_term | 0.4        | 1 day  | 200    |
| ... | ...                                  | ...        | ...        | ...    | ...    |
| M50 | Small talk yesterday                 | short_term | 0.1        | 1 day  | 50     |

Total: 15,000 tokens, Budget: 2,000 tokens

Query: "What's the status of my shipping issue?"

Before Memory Optimization:

Selection: Last 20 memories chronologically (M31-M50)
Total Tokens: 2,100 (overflow!)
Relevance: 25% (mostly irrelevant recent chat)

Included:
- M50: Small talk about weather (irrelevant)
- M49: Product question (irrelevant)
- M48: Another product question (irrelevant)
- ... more irrelevant memories
- M31: Unrelated topic

Missing:
- M3: Actual shipping issue! (excluded because too old)
- M4: Contact preference (excluded)

Result: Agent can't answer the question!

After GEPA Memory Optimization:

GEPA-Learned Selection Strategy:

Step 1: Semantic Search
- Query: "shipping issue"
- M3 matches with 0.95 similarity ← Highly relevant!
- M4 matches with 0.60 similarity ← Contact info

Step 2: Importance Weighting
- M3: importance = 0.9 (high priority!)
- M4: importance = 0.8 (high priority!)

Step 3: Token Allocation
- M3: 400 tokens (full, most important)
- M4: 100 tokens (full, contact pref)
- M47-M50: 300 tokens (recent context, summarized)
- Total: 800 tokens (40% of budget)

Step 4: Final Selection
Selected: M3, M4, M47-M50 (6 memories)
Tokens: 800 (under budget)
Relevance: 85% (highly relevant)

Agent Response:
"Your order #12345 shipping issue status:

Original Report (Oct 15):
- Package delayed at distribution center
- Expected delivery: Oct 25

Current Status:
- Tracking updated yesterday
- Package in transit
- Estimated arrival: Oct 24 (1 day early!)

I'll send detailed tracking to your email (your preferred contact method).

Would you like me to set up delivery notifications?"

Improvement: From "no information" → Complete, personalized response


Troubleshooting

Issue: Context Overflow

Symptoms: Agent errors or truncated responses

Solutions:
1. Reduce max_context_tokens
2. Enable enable_context_optimization: true
3. Increase summarization aggressiveness
4. Use shorter memory formats

Issue: Irrelevant Memories Selected

Symptoms: Agent includes off-topic memories

Solutions:
1. Add RSpec-style BDD scenarios with given_memory showing the expected selection
2. Increase importance scores for critical memories
3. Tune the semantic similarity threshold
4. Optimize with more diverse scenarios

Issue: Important Memories Excluded

Symptoms: Agent misses key information

Solutions:
1. Increase the importance score when storing critical info
2. Use memory_type: long_term for persistent info
3. Add category tags for better organization
4. Increase the max_context_tokens budget



Next: Learn how GEPA optimizes protocol usage patterns (MCP) →