Performance Guide

Optimizing CodeOptiX performance for different use cases.


Performance Optimization Strategies

For Local Development

Use Ollama for the fastest local evaluation:

# Install Ollama and pull a model
ollama pull llama3.2

# Run evaluation (no API key needed)
codeoptix eval \
  --agent claude-code \
  --behaviors insecure-code \
  --llm-provider ollama

Benefits:

  • ✅ No API rate limits
  • ✅ No network latency
  • ✅ No API costs
  • ✅ Runs entirely locally

For CI/CD Pipelines

Use the ci command, which is optimized for automation:

codeoptix ci \
  --agent codex \
  --behaviors insecure-code \
  --fail-on-failure \
  --output-format summary

Configuration for CI:

evaluation:
  scenario_generator:
    num_scenarios: 2  # Reduce from default 5
    use_bloom: false  # Disable complex scenario generation

evolution:
  max_iterations: 1   # Reduce evolution iterations

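A minimal sketch of a CI step that uses this configuration, assuming it is saved as ci-config.yaml and that ci accepts the same --config flag used with eval later in this guide:

# Fail the pipeline when any behavior check fails
codeoptix ci \
  --agent codex \
  --behaviors insecure-code \
  --config ci-config.yaml \
  --fail-on-failure \
  --output-format summary
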
For Production Evaluation

Use cloud providers with optimized settings:

adapter:
  llm_config:
    provider: anthropic
    model: claude-opus-4-5-20251101  # Latest Opus model
    temperature: 0.1  # Lower temperature for consistency

evaluation:
  scenario_generator:
    num_scenarios: 3
    use_bloom: true

  static_analysis:
    bandit: true
    safety: true
    # Disable slower linters
    pylint: false
    mypy: false

Benchmarking Performance

Run Performance Tests

# Time a full evaluation
time codeoptix eval \
  --agent codex \
  --behaviors "insecure-code,vacuous-tests" \
  --llm-provider openai

Profile Memory Usage

# Monitor memory during evaluation
codeoptix eval --agent codex --behaviors insecure-code &
pid=$!
while kill -0 $pid 2>/dev/null; do
    ps -o pid,ppid,cmd,%mem,%cpu -p $pid
    sleep 1
done

Compare Different Configurations

# Test with different providers
for provider in openai anthropic google ollama; do
    echo "Testing $provider..."
    time codeoptix eval --agent codex --behaviors insecure-code --llm-provider $provider
done

Performance by Component

LLM Providers (API Response Time)

Provider    | Model            | Avg Response Time | Cost
Ollama      | llama3.2         | 2-5s              | Free
Anthropic   | claude-opus-4-5  | 3-8s              | $$$
OpenAI      | gpt-5.2          | 2-6s              | $$$
Google      | gemini-3-pro     | 4-10s             | $$

Behaviors (Evaluation Time)

Behavior       | Complexity | Avg Time
insecure-code  | Medium     | 10-30s
vacuous-tests  | Low        | 5-15s
plan-drift     | High       | 20-60s

Scenario Generation

  • Without Bloom: 5-15s per scenario
  • With Bloom: 30-90s per scenario
  • Recommendation: Use Bloom only when needed for complex behaviors
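
To see the difference on your own hardware, time the same evaluation with Bloom enabled and then disabled, using two config files that differ only in use_bloom (a sketch; the file names are illustrative):

for config in no-bloom-config.yaml bloom-config.yaml; do
    echo "Timing with $config..."
    time codeoptix eval \
      --agent codex \
      --behaviors insecure-code \
      --config "$config"
done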

Memory Optimization

Large Codebases

For evaluating large codebases:

evaluation:
  static_analysis:
    # Enable memory-efficient linters
    bandit: true
    safety: true
    ruff: true

    # Disable memory-intensive linters
    pylint: false
    mypy: false

linters:
  # Limit concurrent linters
  max_concurrent: 2

Resource Constraints

For systems with limited resources:

# Use smaller Ollama models
ollama pull llama3.2:1b  # 1B parameter model

# Limit evaluation scope
codeoptix eval \
  --agent claude-code \
  --behaviors insecure-code \
  --config minimal-config.yaml
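
The contents of minimal-config.yaml are up to you; a minimal sketch that mirrors the Fast Development template later in this guide, written inline from the shell:

cat > minimal-config.yaml <<'EOF'
evaluation:
  scenario_generator:
    num_scenarios: 1
    use_bloom: false

llm:
  provider: ollama
  model: llama3.2:1b
EOF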

Caching and Reuse

Reuse Evaluation Results

# Save results for reuse
codeoptix eval \
  --agent codex \
  --behaviors insecure-code \
  --output results.json

# Generate reflection from saved results
codeoptix reflect --input results.json

# Evolve from saved results
codeoptix evolve --input results.json
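
In scripts, a simple existence check keeps repeat runs from re-evaluating when saved results are already present (a sketch; add your own staleness check if the code under evaluation changes):

# Re-run the evaluation only when there are no saved results
if [ ! -f results.json ]; then
    codeoptix eval \
      --agent codex \
      --behaviors insecure-code \
      --output results.json
fi

codeoptix reflect --input results.json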

Cache LLM Responses

For repeated evaluations of similar code:

# Implement caching in a custom adapter; lru_cache cannot hash a
# list of messages, so key a plain dict on a serialized form of the call
import json

class CachedLLMClient:
    def __init__(self, client, max_size=100):
        self.client = client
        self.max_size = max_size
        self._cache = {}

    def chat_completion(self, messages, model, **kwargs):
        # Cache based on prompt content, model, and extra parameters
        key = json.dumps([messages, model, kwargs], sort_keys=True, default=str)
        if key not in self._cache:
            if len(self._cache) >= self.max_size:
                # Evict the oldest entry (dicts preserve insertion order)
                self._cache.pop(next(iter(self._cache)))
            self._cache[key] = self.client.chat_completion(messages, model, **kwargs)
        return self._cache[key]

Scaling Considerations

Multiple Agents

Evaluate multiple agents in parallel:

# Run evaluations in background
for agent in claude-code codex gemini-cli; do
    codeoptix eval --agent $agent --behaviors insecure-code &
done
wait
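
To keep each agent's results separate for later analysis, the same loop can write one results file per agent using the --output and reflect commands shown earlier (file names are illustrative):

for agent in claude-code codex gemini-cli; do
    codeoptix eval --agent $agent --behaviors insecure-code --output "results-$agent.json" &
done
wait

# Reflect on each agent's saved results
for agent in claude-code codex gemini-cli; do
    codeoptix reflect --input "results-$agent.json"
done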

Batch Processing

Process multiple evaluation requests:

from concurrent.futures import ThreadPoolExecutor

def evaluate_batch(requests, evaluate_single, max_workers=4):
    # evaluate_single is your own callable that runs a single evaluation
    # request, for example by wrapping codeoptix's evaluation engine
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(evaluate_single, req)
            for req in requests
        ]
        return [f.result() for f in futures]

Monitoring and Alerting

Set Up Monitoring

# Monitor evaluation performance
codeoptix eval --agent codex --behaviors insecure-code 2>&1 | tee eval.log

# Check for errors
if grep -q "ERROR\|FAILED" eval.log; then
    echo "Evaluation failed"
    exit 1
fi

Performance Alerts

# Alert if evaluation takes too long
timeout 300 codeoptix eval --agent codex --behaviors insecure-code

if [ $? -eq 124 ]; then
    echo "Evaluation timed out after 5 minutes"
    # Send alert
fi

Best Practices

Development Workflow

  1. Local Development: Use Ollama for fast iteration
  2. Pre-commit: Use codeoptix lint for quick checks (see the hook sketch after this list)
  3. CI/CD: Use codeoptix ci for automated quality gates
  4. Production: Use cloud providers with optimized configs
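
For step 2, a minimal Git pre-commit hook is enough to run the lint check before each commit (a sketch; save it as .git/hooks/pre-commit, make it executable, and add any lint flags your setup needs):

#!/bin/sh
# Block the commit if codeoptix lint reports problems
codeoptix lint || {
    echo "codeoptix lint failed; fix the issues or commit with --no-verify to bypass"
    exit 1
}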

Configuration Templates

Fast Development:

evaluation:
  scenario_generator:
    num_scenarios: 1
    use_bloom: false

llm:
  provider: ollama
  model: llama3.2

Production Quality:

evaluation:
  scenario_generator:
    num_scenarios: 5
    use_bloom: true

  static_analysis:
    bandit: true
    safety: true
    ruff: true

llm:
  provider: anthropic
  model: claude-opus-4-5-20251101

CI/CD Optimized:

evaluation:
  scenario_generator:
    num_scenarios: 2
    use_bloom: false

output:
  format: summary

behavior:
  fail_on_failure: true