Quick Start¶
Get up and running with CodeOptiX in 5 minutes! This guide will walk you through your first evaluation.
Step 1: Install CodeOptiX¶
If you haven't already, install CodeOptiX:
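A typical install, assuming the package is published on PyPI under the project name:
# Package name assumed; see the installation guide for your setup
pip install codeoptix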
Step 2: Choose Your Setup¶
Option A: Ollama (Free, No API Key Required)¶
Perfect for getting started! Use local Ollama models - no API keys, no costs, works offline.
# Install Ollama (https://ollama.com)
# macOS: brew install ollama
# Linux: curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama
ollama serve
# Pull a model (in another terminal)
ollama pull llama3.2:3b
# Run evaluation
codeoptix eval \
--agent basic \
--behaviors insecure-code \
--llm-provider ollama
Option B: Cloud Providers (Requires API Key)¶
Use OpenAI, Anthropic, or Google models for more advanced evaluations.
Set your API key:
export OPENAI_API_KEY="sk-your-api-key-here"
# OR
export ANTHROPIC_API_KEY="sk-ant-your-api-key-here"
# OR
export GOOGLE_API_KEY="your-api-key-here"
Step 3: Run Your First Evaluation¶
Start with Ollama - It's Free & Works Offline!
Recommended for first-time users! Skip API keys and start evaluating immediately.
Quick Test with Ollama (Recommended)¶
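# Same command as in Step 2, Option A
codeoptix eval \
  --agent basic \
  --behaviors insecure-code \
  --llm-provider ollama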
Expected Output:
CodeOptiX Evaluation
============================================================
Agent: basic
Behavior(s): insecure-code
Adapter created: basic
Using local Ollama provider.
Running evaluation...
============================================================
Evaluation Complete!
============================================================
Overall Score: 85.71%
Results: .codeoptix/artifacts/results_*.json
Advanced Evaluation with Cloud Providers¶
For more advanced analysis using the latest models:
# OpenAI GPT-5.2
codeoptix eval \
--agent basic \
--behaviors insecure-code \
--llm-provider openai
# Anthropic Claude Opus 4.5
codeoptix eval \
--agent claude-code \
--behaviors insecure-code \
--llm-provider anthropic
What Happens During Evaluation¶
All commands will:
- Create the specified agent adapter: sets up the evaluation environment
- Generate test scenarios: creates diverse security test cases automatically
- Run behavioral analysis: evaluates the agent against each scenario
- Save detailed results: stores everything in .codeoptix/artifacts/results_*.json
Step 4: Check Your Results¶
View Evaluation Summary¶
CodeOptiX automatically saves results. Check them:
# List all evaluation runs
codeoptix list-runs
# View detailed results (requires jq for pretty printing)
cat .codeoptix/artifacts/results_*.json | jq .
Understanding Your Results¶
- High Score (80-100%): Your agent performs well on security evaluation
- Medium Score (50-79%): Some security issues detected - review recommendations
- Low Score (0-49%): Significant security concerns - needs improvement
Sample Results:
{
  "run_id": "7d42c92c",
  "overall_score": 0.857,   // 85.7% - Good performance!
  "behaviors": {
    "insecure-code": {
      "score": 0.857,
      "passed": true,        // Evaluation passed
      "evidence": []         // No critical issues found
    }
  }
}
Key Metrics:
- overall_score: 0.0 to 1.0 (higher is better)
- passed: true if behavior requirements met
- evidence: specific issues or examples found
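If you want to inspect results programmatically, here is a minimal sketch; it assumes the JSON layout shown above and the default artifacts path:
import glob
import json

# Pick the most recent results file (default artifacts path; adjust if needed)
path = sorted(glob.glob(".codeoptix/artifacts/results_*.json"))[-1]
with open(path) as f:
    results = json.load(f)

print(f"Overall Score: {results['overall_score']:.1%}")
for name, data in results["behaviors"].items():
    print(f"{name}: score={data['score']:.1%} passed={data['passed']}")
    for item in data.get("evidence", []):
        print(f"  evidence: {item}")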
Step 5: Generate Reflection Report¶
Get deep insights into your agent's performance:
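# Point --input at a results file saved during evaluation
codeoptix reflect --input results.json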
This generates a comprehensive reflection report explaining:
- What went well: analysis of successful behaviors and patterns
- What needs improvement: identification of problematic patterns
- Root causes of issues: deep analysis of why problems occurred
- Actionable recommendations: specific suggestions for improvement
Step 6: Evolve the Agent (Advanced)¶
Automatically improve your agent's prompts using AI:
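# Evolve prompts from a saved results file
codeoptix evolve --input results.json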
Evolution Process:
- Analyzes evaluation results: identifies patterns and issues
- Generates improved prompts: uses the GEPA optimization algorithm
- Tests new prompts: validates that improvements work
- Saves evolved prompts: stores them in .codeoptix/artifacts/evolved_prompts_*.yaml
Evolution requires API keys
This advanced feature needs cloud LLM access for the optimization process.
Complete Python Example¶
Here's a complete example using Ollama (no API keys needed):
from codeoptix.adapters.factory import create_adapter
from codeoptix.evaluation import EvaluationEngine
from codeoptix.utils.llm import create_llm_client, LLMProvider

# 1. Create a basic adapter with Ollama
adapter = create_adapter("basic", {
    "llm_config": {
        "provider": "ollama",
        "model": "llama3.2:3b"  # Use any installed Ollama model
    }
})

# 2. Create evaluation engine
llm_client = create_llm_client(LLMProvider.OLLAMA)
engine = EvaluationEngine(adapter, llm_client)

# 3. Evaluate behaviors
results = engine.evaluate_behaviors(
    behavior_names=["insecure-code"]
)

# 4. Print results
print(f"Overall Score: {results['overall_score']:.1%}")
for behavior_name, behavior_data in results['behaviors'].items():
    status = "PASS" if behavior_data['passed'] else "FAIL"
    print(f"{behavior_name}: {behavior_data['score']:.1%} {status}")
Expected output (with the sample score of 0.857; your scores will vary by model and run):
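Overall Score: 85.7%
insecure-code: 85.7% PASS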
What's Next?¶
Now that you've run your first evaluation:
- Your First Evaluation - Detailed walkthrough
- Core Concepts - Understand how CodeOptiX works
- Python API Guide - Advanced usage
- CLI Usage - All CLI commands
Common Commands¶
Here are the most common commands you'll use:
# Evaluate agent
codeoptix eval --agent codex --behaviors insecure-code
# Generate reflection
codeoptix reflect --input results.json
# Evolve prompts
codeoptix evolve --input results.json
# Run full pipeline
codeoptix run --agent codex --behaviors insecure-code --evolve
# List all runs
codeoptix list-runs
Tips for Beginners¶
Start Simple¶
Begin with a single behavior:
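codeoptix eval --agent basic --behaviors insecure-code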
Use Context¶
Provide context for better evaluations:
codeoptix eval \
--agent codex \
--behaviors plan-drift \
--context '{"plan": "Create a secure API"}'
Check Results¶
Always review the results:
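# List runs, then pretty-print the saved JSON
codeoptix list-runs
cat .codeoptix/artifacts/results_*.json | jq .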
Troubleshooting¶
"API key not found"¶
Make sure you've set your API key:
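export OPENAI_API_KEY="sk-your-api-key-here"
# OR
export ANTHROPIC_API_KEY="sk-ant-your-api-key-here"
# OR
export GOOGLE_API_KEY="your-api-key-here"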
"Agent not found"¶
Check that you're using a supported agent:
- codex - OpenAI GPT-5.2
- claude-code - Anthropic Claude (Sonnet 4.5, Opus 4.5)
- gemini-cli - Google Gemini (Gemini 3, Gemini 3 Flash)
"Behavior not found"¶
Use one of the built-in behaviors:
- insecure-code - Security vulnerabilities
- vacuous-tests - Test quality
- plan-drift - Plan alignment
Need Help?¶
- Read the full documentation
- Ask questions in Discussions
- Report issues on GitHub