# Tool Optimization

## What is Tool Optimization?
Tool optimization is the process of improving an agent's tool selection, invocation order, and output combination strategies. While traditional approaches hardcode which tools to use, GEPA learns dynamic, context-aware tool usage patterns.
**Key Insight**: Having tools isn't enough. The agent must learn WHICH tools to use for WHICH scenarios, in WHAT order, and HOW to combine their outputs for comprehensive results.

## The Tool Optimization Problem

### Without Optimization
Scenario: Agent reviewing code with high complexity
1. Agent receives nested conditional code
2. Agent has complexity_calculator tool available
3. Agent doesn't use it (doesn't know when to use tools)
4. Agent says: "Your code might be complex"
Problem: Tool available but not used, vague response
### With GEPA Optimization
Scenario: Same code review
1. Agent receives nested conditional code
2. GEPA-learned strategy: "Use complexity_calculator for nested conditions"
3. Agent uses tool: complexity_calculator(code)
4. Tool returns: complexity = 8
5. Agent says: "High complexity detected (8/4 threshold). Refactor to reduce nesting."
Solution: Correct tool used, specific metric provided, actionable feedback
## What GEPA Optimizes in Tool Usage

### 1. Tool Selection (Which Tools to Use)

**What It Is**: Learning which tool(s) to use for each scenario

**What GEPA Learns**:
- Pattern → tool mapping
- When to use specific tools
- When to use multiple tools
- When to skip tools
**Example Configuration**:

```yaml
spec:
  tools:
    enabled: true
    specific_tools:
      - complexity_calculator   # Calculates cyclomatic complexity
      - security_scanner        # Detects vulnerabilities
      - performance_analyzer    # Identifies performance issues
      - code_smell_detector     # Finds code smells
```
Before Optimization:
Strategy: Random tool selection or no tool usage
Result: Tools used incorrectly or not at all
After GEPA Optimization:
Learned Strategies:
- "Use complexity_calculator when seeing nested if/for/while (>3 levels)"
- "Use security_scanner when detecting string concatenation in SQL/HTML"
- "Use performance_analyzer for loops with nested loops"
- "Use code_smell_detector for duplicated code patterns"
**Impact**: 0% → 100% correct tool selection
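To make the learned mapping concrete, here is a minimal sketch of what a converged pattern → tool rule set might look like; the `LEARNED_RULES` table, the regexes, and `select_tools` are illustrative assumptions, not part of the framework's API.

```python
import re

# Hypothetical sketch of a learned pattern -> tool mapping (names illustrative).
LEARNED_RULES = [
    # Deeply indented control flow -> measure cyclomatic complexity
    (re.compile(r"\n\s{8,}(if|for|while)\b"), "complexity_calculator"),
    # String concatenation near SQL keywords -> scan for injection
    (re.compile(r"(SELECT|INSERT|UPDATE|DELETE).*['\"]\s*\+", re.I), "security_scanner"),
    # Loop nested directly inside another loop -> analyze performance
    (re.compile(r"for\b[^\n]*:\n\s+for\b"), "performance_analyzer"),
]

def select_tools(code: str) -> list[str]:
    """Return the tools whose learned trigger pattern matches the code."""
    return [tool for pattern, tool in LEARNED_RULES if pattern.search(code)]

nested_sql = 'query = "SELECT * FROM users WHERE name=\'" + user'
print(select_tools(nested_sql))  # ['security_scanner']
```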
### 2. Tool Invocation Order (Orchestration)

**What It Is**: Learning the optimal sequence of tool calls

**What GEPA Learns**:
- Which tools to call first
- Dependencies between tools
- Parallel vs. sequential execution
- When to stop the tool chain
Before Optimization:
Random order:
1. code_smell_detector (finds duplication)
2. complexity_calculator (finds complexity: 8)
3. security_scanner (finds SQL injection)
After GEPA Optimization:
Learned Priority Order:
1. security_scanner FIRST (critical issues)
   → Finds: SQL injection (CRITICAL)
2. complexity_calculator SECOND (maintainability)
   → Finds: Complexity 8 (MEDIUM)
3. code_smell_detector THIRD (code quality)
   → Finds: DRY violation (LOW)
Result: Prioritized by severity automatically
Impact: Logical ordering, priority-based findings
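A minimal sketch of severity-ordered orchestration at invocation time; `SEVERITY_ORDER`, the callable tool registry, and the early-stop flag are assumptions for illustration, not the framework's actual interface.

```python
# Hypothetical sketch: run selected tools in a learned severity order
# and optionally stop once a critical finding is confirmed.
SEVERITY_ORDER = {"security_scanner": 0, "complexity_calculator": 1, "code_smell_detector": 2}

def run_tools(selected, registry, code, stop_on_critical=False):
    """Invoke tools sequentially, highest-severity category first."""
    findings = []
    for name in sorted(selected, key=lambda t: SEVERITY_ORDER.get(t, 99)):
        result = registry[name](code)           # each tool is a callable: code -> finding dict
        findings.append({"tool": name, **result})
        if stop_on_critical and result.get("severity") == "CRITICAL":
            break                               # learned "when to stop the tool chain"
    return findings

registry = {
    "security_scanner": lambda code: {"severity": "CRITICAL", "finding": "SQL injection"},
    "complexity_calculator": lambda code: {"severity": "MEDIUM", "finding": "complexity 8"},
}
print(run_tools(["complexity_calculator", "security_scanner"], registry, "..."))
```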
### 3. Output Combination (Multi-Tool Integration)

**What It Is**: Learning how to combine outputs from multiple tools

**What GEPA Learns**:
- How to merge tool results
- How to resolve conflicts
- How to synthesize comprehensive findings
- Citation format for tool outputs
Example Scenario: Code with multiple issues
Before Optimization:
Tool 1: complexity_calculator → "Complexity: 8"
Tool 2: security_scanner → "Potential injection"
Agent Response: "Complexity is 8. Potential injection found."
[Disconnected findings, no synthesis]
After GEPA Optimization:
Tool 1: complexity_calculator → "Complexity: 8"
Tool 2: security_scanner → "SQL injection detected"
Agent Response:
"Multiple issues detected:
1. SQL Injection (CRITICAL) [Tool: security_scanner]
Line: query = 'SELECT...'
Impact: Security breach possible
2. High Complexity (MEDIUM) [Tool: complexity_calculator]
Complexity: 8/4 threshold
Impact: Hard to maintain and test
RECOMMENDATION: Fix SQL injection first (security), then refactor (maintainability)"
Impact: Synthesized, prioritized findings vs. disconnected results
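One way the synthesis step could work is to rank findings by severity and attach tool citations, roughly as sketched below; the finding schema and the `synthesize` helper are hypothetical.

```python
# Hypothetical sketch: merge findings from several tools into one
# prioritized report with tool citations, as in the response above.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def synthesize(findings):
    """findings: list of dicts with 'tool', 'severity', 'title', 'detail'."""
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK.get(f["severity"], 99))
    lines = ["Multiple issues detected:"]
    for i, f in enumerate(ordered, 1):
        lines.append(f"{i}. {f['title']} ({f['severity']}) [Tool: {f['tool']}]")
        lines.append(f"   {f['detail']}")
    return "\n".join(lines)

report = synthesize([
    {"tool": "complexity_calculator", "severity": "MEDIUM",
     "title": "High Complexity", "detail": "Complexity 8 exceeds threshold 4"},
    {"tool": "security_scanner", "severity": "CRITICAL",
     "title": "SQL Injection", "detail": "User input concatenated into query"},
])
print(report)  # SQL injection listed first, complexity second
```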
### 4. Error Handling and Fallbacks

**What It Is**: Learning how to handle tool failures gracefully

**What GEPA Learns**:
- What to do when a tool fails
- Fallback strategies
- Error message interpretation
- Alternative tool selection
Before Optimization:
Tool fails → Agent gives up
"Unable to analyze code."
After GEPA Optimization:
Learned Strategies:
- Tool fails → Try alternative tool
- All tools fail → Use pattern matching from training data
- Partial failure → Continue with available results
Response: "Complexity tool unavailable. Based on pattern analysis:
Function has 6 nested conditions, likely high complexity.
Recommend refactoring."
Impact: Robust performance even when tools fail
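A rough sketch of that fallback chain, assuming tools are plain callables that raise on failure; the indentation-based estimate stands in for "pattern matching from training data" and is purely illustrative.

```python
# Hypothetical sketch of the learned fallback chain: try the primary tool,
# fall back to an alternative, and finally to a pattern-based estimate.
def analyze_complexity(code, primary, alternative=None):
    for tool in (primary, alternative):
        if tool is None:
            continue
        try:
            return {"source": "tool", "complexity": tool(code)}
        except Exception:
            continue  # tool failed or timed out; try the next option
    # All tools failed: estimate from nesting depth (4-space indents assumed).
    depth = max(((len(l) - len(l.lstrip())) // 4
                 for l in code.splitlines() if l.strip()), default=0)
    return {"source": "pattern_estimate", "complexity": depth + 1}
```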
## Before/After Comparison

### Scenario: Performance Code Review

**Input Code**:
```python
def find_duplicates(items):
    duplicates = []
    for i in range(len(items)):
        for j in range(len(items)):
            if i != j and items[i] == items[j]:
                if items[i] not in duplicates:
                    duplicates.append(items[i])
    return duplicates
```
Before Tool Optimization:
Tool Usage: None (agent doesn't know to use performance_analyzer)
Agent Response:
"This code could be more efficient."
After GEPA Tool Optimization:
Tool Usage:
1. performance_analyzer(code)
   → Detects: Nested loops, O(n²) complexity
2. complexity_calculator(code)
   → Returns: Cyclomatic complexity = 4
Learned Strategy: "Use both tools for loop analysis"
Agent Response:
"Performance Issue Detected
Complexity Analysis [Tool: complexity_calculator]:
- Cyclomatic Complexity: 4
- Nested loops detected
Performance Analysis [Tool: performance_analyzer]:
- Time Complexity: O(n²)
- Space Complexity: O(n)
- Issue: Nested iteration over same list
Impact: For 1000 items, 1,000,000 iterations
Optimized Solution (O(n)):
```python
def find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    return list(duplicates)
```

Performance Improvement: O(n²) → O(n) (1000x faster for large lists)"

**Improvement**: From vague → Specific metrics + executable solution
---
## How GEPA Learns Tool Strategies
### The Optimization Process
1. **Analysis Phase**
```
GEPA Observes:
- Agent saw nested loops but didn't use performance_analyzer
- Agent gave vague "could be better" response
- RSpec-style BDD scenario expected O(n²) mention and solution
- Scenario FAILED
```
2. **Reflection Phase**
```
GEPA Reflection:
"The agent should use performance_analyzer when detecting nested loops.
The tool would have identified O(n²) complexity and suggested O(n) solution.
Need to learn: nested loops ā performance_analyzer"
```
3. **Mutation Phase**
```
GEPA Tests:
- Strategy 1: "Always use performance_analyzer"
- Strategy 2: "Use performance_analyzer when detecting loops"
- Strategy 3: "Use performance_analyzer for nested loops only"
```
4. **Evaluation Phase**
```
Results:
- Strategy 1: 50% (too many false positives)
- Strategy 2: 75% (good but misses single loops)
   - Strategy 3: 95% (precise!) → Winner!
```
5. **Selection Phase**
```
GEPA Keeps: Strategy 3
Next: Learn to combine with complexity_calculator
```
**Result**: Learned precise tool selection patterns
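The phases above can be read as a simple mutate → evaluate → select loop. Here is a minimal sketch under the assumption that each scenario supplies a pass/fail check and that a hypothetical `apply_strategy` produces a response for a candidate instruction; neither is the framework's actual API.

```python
# Hypothetical sketch of the mutate -> evaluate -> select loop described above.
# `scenarios` is a list of (code, passes) pairs where passes(response) -> bool.
def optimize_strategy(candidates, scenarios, apply_strategy):
    best, best_score = None, -1.0
    for strategy in candidates:                 # candidates come from the mutation phase
        passed = sum(passes(apply_strategy(strategy, code)) for code, passes in scenarios)
        score = passed / len(scenarios)         # evaluation phase: pass rate
        if score > best_score:                  # selection phase: keep the best strategy
            best, best_score = strategy, score
    return best, best_score
```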
---
## Best Practices
### 1. Provide Diverse Tool Categories
```yaml
tools:
  categories:
    - code_analysis   # Static analysis tools
    - security        # Security scanning tools
    - utilities       # General utilities
  specific_tools:
    - complexity_calculator
    - security_scanner
    - performance_analyzer
```

GEPA learns which category for which scenario.
### 2. Define Tool Purposes in RSpec-Style BDD

```yaml
feature_specifications:
  scenarios:
    - name: complexity_detection
      description: Agent should use complexity_calculator for nested code
      input:
        code: [Nested conditionals]
      expected_output:
        review: Must include "complexity: 8" (from tool)
```

GEPA learns: This scenario requires complexity_calculator.
### 3. Show Tool Usage in Datasets

```csv
code,review
"[nested loops]","Performance: O(n²) [Tool: performance_analyzer]"
"[SQL concat]","SQL Injection [Tool: security_scanner]"
```

GEPA learns tool patterns from real examples.
### 4. Enable Tool Reflection

```yaml
tools:
  max_iterations: 5   # Allow multi-step tool usage
  timeout: 30         # Per-tool timeout
```
## Common Tool Patterns GEPA Learns

### Pattern 1: Conditional Tool Usage
Before: Use tools randomly
After: "Use complexity_calculator only for functions >5 lines with conditionals"
### Pattern 2: Tool Chaining

Before: Use one tool in isolation
After: "Run security_scanner → If findings, use vulnerability_db for severity → Cite from knowledge base"
### Pattern 3: Tool Result Validation
Before: Trust tool output blindly
After: "If complexity_calculator returns unexpected value, verify with manual pattern check"
### Pattern 4: Multi-Tool Synthesis
Before: Report each tool result separately
After: "Combine security_scanner + complexity_calculator + code_smell_detector for comprehensive review"
## Metrics and Results

### What Gets Measured
- Tool Selection Accuracy: % of scenarios where correct tools used
- Tool Usage Rate: % of scenarios where tools should be used and are
- Output Synthesis Quality: % of multi-tool outputs properly combined
- Error Handling: % of tool failures handled gracefully
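As a rough illustration, the first two metrics could be computed from a log of optimization scenarios like this; the field names (`expected_tools`, `called_tools`) are assumptions, not the framework's schema.

```python
# Hypothetical sketch: compute tool usage rate and selection accuracy
# from per-scenario records of expected vs. actually called tools.
def tool_metrics(scenarios):
    needing_tools = [s for s in scenarios if s["expected_tools"]]
    if not needing_tools:
        return {"tool_usage_rate": 0.0, "tool_selection_accuracy": 0.0}
    used_any = sum(1 for s in needing_tools if s["called_tools"])
    correct = sum(1 for s in needing_tools
                  if set(s["called_tools"]) == set(s["expected_tools"]))
    return {
        "tool_usage_rate": used_any / len(needing_tools),
        "tool_selection_accuracy": correct / len(needing_tools),
    }
```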
### Typical Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Tool Selection Accuracy | 25% | 100% | +75% |
| Tool Usage Rate | 30% | 95% | +65% |
| Multi-Tool Synthesis | 20% | 85% | +65% |
| Error Handling | 0% | 90% | +90% |
## Quick Start

### Enable Tool Optimization

```yaml
spec:
  # Tool configuration
  tools:
    enabled: true
    categories:
      - code_analysis
      - security
    specific_tools:
      - complexity_calculator
      - security_scanner

  # GEPA automatically optimizes tool usage
  optimization:
    optimizer:
      name: GEPA
      params:
        auto: medium
```
```bash
super agent compile your_agent
super agent optimize your_agent --auto medium
```

GEPA will learn optimal tool selection strategies!
## Advanced: Tool-Specific Configuration

### Fine-Tune Tool Behavior

```yaml
tools:
  enabled: true
  max_iterations: 5        # Allow multi-step tool usage
  timeout: 30              # Per-tool timeout
  retry_on_failure: true   # Retry failed tools

  # Tool categories
  categories:
    - code_analysis
    - security
    - utilities

  # Specific tools with configs
  specific_tools:
    - name: complexity_calculator
      params:
        threshold: 4
    - name: security_scanner
      params:
        rules: ["sql-injection", "xss", "secrets"]
```

GEPA learns optimal parameters through optimization.
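As a rough illustration of how per-tool `params` might be consumed, here is a hypothetical tool wrapper; the class and its crude complexity heuristic are assumptions, not the framework's actual implementation.

```python
# Hypothetical sketch: a tool that reads its threshold from the YAML params above.
class ComplexityCalculator:
    def __init__(self, params=None):
        self.threshold = (params or {}).get("threshold", 4)

    def __call__(self, code: str) -> dict:
        # Crude stand-in for real cyclomatic complexity: 1 + number of branch keywords.
        branches = sum(code.count(kw) for kw in ("if ", "for ", "while ", "and ", "or "))
        complexity = 1 + branches
        return {"complexity": complexity, "exceeds_threshold": complexity > self.threshold}

tool = ComplexityCalculator({"threshold": 4})
print(tool("if a:\n    if b:\n        return x"))  # {'complexity': 3, 'exceeds_threshold': False}
```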
## Common Tool Strategies GEPA Learns

### Strategy 1: Pattern-Based Selection
Before: Use all tools for everything
After: "Use security_scanner only when code contains: string concatenation, user input, DB queries, file operations"
### Strategy 2: Progressive Tool Usage
Before: Use all tools upfront
After: "Start with security_scanner. If issues found, then use vulnerability_db for details. If no security issues, check complexity_calculator"
### Strategy 3: Tool Result Interpretation

Before: Report tool output verbatim
After: "Complexity: 8 → Explain: 'This exceeds threshold of 4, making code hard to test. Recommend refactoring.'"
### Strategy 4: Multi-Tool Combination

Before: Report each tool separately
After: "Combine security_scanner + complexity_calculator → Comprehensive review: 'Code has CRITICAL security issue (priority 1) and MEDIUM complexity issue (priority 2)'"
## Integration with Other Layers

Tool optimization amplifies other optimizations:

**Tools + Prompts**:
Optimized Prompt: "Use complexity_calculator for nested conditions"
Optimized Tool: Correctly identifies nested conditions → Calls tool
→ Agent provides metric-driven feedback

**Tools + RAG**:
Optimized RAG: Retrieves complexity best practices doc
Optimized Tool: Calculates actual complexity = 8
→ Agent says: "Your complexity (8) exceeds recommended (4) per Clean Code guidelines"

**Tools + Memory**:
Optimized Memory: Recalls "Previous similar code had complexity 7, we refactored"
Optimized Tool: Calculates current complexity = 8
→ Agent says: "Similar to issue #47 (complexity 7). Recommend same refactoring approach"
## Real-World Example

### Use Case: Comprehensive Code Analysis

**Available Tools**:
- complexity_calculator - Cyclomatic complexity
- security_scanner - Vulnerability detection
- performance_analyzer - Performance patterns
- code_smell_detector - Code quality issues
**Input Code**:

```python
password = "admin123"  # Hardcoded secret

def login(user):
    # SQL injection vulnerability
    query = "SELECT * FROM users WHERE username='" + user + "'"
    result = db.execute(query)

    # High complexity (nested conditionals)
    if result:
        if result['password'] == password:
            if result['active']:
                if result['verified']:
                    return True
    return False
```
Before Tool Optimization:
Tool Usage: None (doesn't know which tools to use)
Response: "Your code has some issues."
After GEPA Tool Optimization:
Learned Multi-Tool Strategy:
Step 1: security_scanner
   → Detects: Hardcoded password, SQL injection, plaintext password check
Step 2: complexity_calculator
   → Calculates: Cyclomatic complexity = 5
Step 3: Synthesis
   → Prioritizes: Security (CRITICAL) before complexity (MEDIUM)
Agent Response:
"Multiple Critical Issues Detected
1. Hardcoded Credentials (CRITICAL) [Tool: security_scanner]
Line: password = 'admin123'
Risk: Security breach if code exposed
2. SQL Injection (CRITICAL) [Tool: security_scanner]
Line: query = 'SELECT...' + user + '...'
Risk: Database compromise possible
3. Plaintext Password Comparison (CRITICAL) [Tool: security_scanner]
Line: result['password'] == password
Risk: Passwords should be hashed
4. High Cyclomatic Complexity (MEDIUM) [Tool: complexity_calculator]
Complexity: 5/4 threshold
Issue: Nested conditionals reduce maintainability
PRIORITY:
1. Fix security issues immediately (3 CRITICAL)
2. Refactor complexity after security fixed"
**Improvement**: From no tool usage → Strategic multi-tool orchestration
## How GEPA Learns Tool Strategies

### Example: Learning to Use complexity_calculator
Iteration 1:
Scenario: Code with nested conditionals
Agent: Doesn't use complexity_calculator
Result: FAIL (expected "complexity: 5" in output)
GEPA Reflection:
"Agent failed to use complexity_calculator for nested conditionals.
Tool is available but not invoked. Need to learn pattern."
Iteration 2:
Modified Prompt: "For code with if/else statements, use complexity_calculator"
Agent: Uses tool for ALL if statements (even simple ones)
Result: PARTIAL (too broad, slow)
GEPA Reflection:
"Tool usage too aggressive. Should only use for nested (>3 levels)"
Iteration 3:
Refined Prompt: "Use complexity_calculator when detecting >3 nested conditionals"
Agent: Uses tool selectively for complex code only
Result: PASS (100% accuracy)
GEPA: Strategy learned! Keep this pattern.
Result: Learned precise tool invocation pattern through reflection
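The learned rule implies some lightweight trigger check before invoking the tool. Below is one hedged sketch that approximates conditional nesting depth from indentation; it is a simplification for illustration, not the complexity_calculator itself.

```python
# Hypothetical trigger check behind the learned rule: only invoke
# complexity_calculator when conditional nesting exceeds three levels.
def max_conditional_nesting(code: str, indent: int = 4) -> int:
    depth = 0
    for line in code.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(("if ", "elif ", "for ", "while ")):
            level = (len(line) - len(stripped)) // indent + 1
            depth = max(depth, level)
    return depth

def should_use_complexity_calculator(code: str) -> bool:
    return max_conditional_nesting(code) > 3
```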
## Troubleshooting

### Issue: Tools Not Being Used

**Symptoms**: Agent has tools but doesn't call them

**Solutions**:
1. Add RSpec-style BDD scenarios requiring tool outputs
2. Include tool usage examples in datasets
3. Increase optimization iterations
4. Check that tool registration is working
### Issue: Wrong Tools Used

**Symptoms**: Agent uses inappropriate tools for scenarios

**Solutions**:
1. Add more specific RSpec-style BDD scenarios
2. Increase training examples showing correct tool usage
3. Use tool categories to group related tools
4. Add tool descriptions to help GEPA understand purpose
### Issue: Tool Outputs Not Integrated

**Symptoms**: Tool called but output not used in response

**Solutions**:
1. Require tool outputs in the expected_output of RSpec-style BDD scenarios
2. Optimize prompts to include "cite tool results"
3. Add examples showing proper tool citation
## Related Guides

- Prompt Optimization - Optimize instructions
- RAG Optimization - Optimize knowledge retrieval
- Memory Optimization - Optimize context
- Protocol Optimization - Optimize MCP usage
- Full-Stack Example - See all layers
- Tool Development Guide - Create custom tools
- MCP Optimization Tutorial - Advanced tool protocols

**Next**: Learn how GEPA optimizes memory and context selection →