Quick Start

Get up and running with CodeOptiX in 5 minutes! This guide will walk you through your first evaluation.


Step 1: Install CodeOptiX

If you haven't already, install CodeOptiX:

pip install codeoptix

Step 2: Choose Your Setup

Option A: Ollama (Free, No API Key Required) 🆓

Perfect for getting started! Use local Ollama models - no API keys, no costs, works offline.

# Install Ollama (https://ollama.com)
# macOS: brew install ollama
# Linux: curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama
ollama serve

# Pull a model (in another terminal)
ollama pull llama3.2:3b

# Run evaluation
codeoptix eval \
  --agent basic \
  --behaviors insecure-code \
  --llm-provider ollama
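
Before running the evaluation, you can sanity-check that the Ollama server is reachable. A minimal Python sketch, assuming Ollama's default local port (11434) and its /api/tags model-listing endpoint:

import urllib.request

# Ollama listens on localhost:11434 by default; /api/tags lists installed models
try:
    with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=3):
        print("Ollama is running")
except OSError:
    print("Ollama is not reachable - run `ollama serve` first")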

Option B: Cloud Providers (Requires API Key) ☁️

Use OpenAI, Anthropic, or Google models for more advanced evaluations.

Set your API key:

export OPENAI_API_KEY="sk-your-api-key-here"
# OR
export ANTHROPIC_API_KEY="sk-ant-your-api-key-here"
# OR
export GOOGLE_API_KEY="your-api-key-here"
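
To confirm which keys are actually visible to CodeOptiX, a quick check of the environment (using only the standard variable names above):

import os

# Report which provider API keys are set in the current environment
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(f"{var}: {'set' if os.environ.get(var) else 'missing'}")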

Step 3: Run Your First Evaluation

Start with Ollama - It's Free & Works Offline!

Recommended for first-time users! Skip API keys and start evaluating immediately.

codeoptix eval \
  --agent basic \
  --behaviors insecure-code \
  --llm-provider ollama

Expected Output:

🔍 CodeOptiX Evaluation
============================================================
📊 Agent: basic
📋 Behavior(s): insecure-code
✅ Adapter created: basic
🧠 Using local Ollama provider.

🚀 Running evaluation...
============================================================
✅ Evaluation Complete!
============================================================
📊 Overall Score: 85.71%
📁 Results: .codeoptix/artifacts/results_*.json

Advanced Evaluation with Cloud Providers

For more advanced analysis using the latest models:

# OpenAI GPT-5.2
codeoptix eval \
  --agent basic \
  --behaviors insecure-code \
  --llm-provider openai

# Anthropic Claude Opus 4.5
codeoptix eval \
  --agent claude-code \
  --behaviors insecure-code \
  --llm-provider anthropic

What Happens During Evaluation

All commands will:

  • ✅ Create the specified agent adapter - sets up the evaluation environment
  • ✅ Generate test scenarios - creates diverse security test cases automatically
  • ✅ Run behavioral analysis - evaluates the agent against each scenario
  • ✅ Save detailed results - stores everything in .codeoptix/artifacts/results_*.json

Step 4: Check Your Results

View Evaluation Summary

CodeOptiX automatically saves results. Check them:

# List all evaluation runs
codeoptix list-runs

# View detailed results (requires jq for pretty printing)
cat .codeoptix/artifacts/results_*.json | jq .
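
If you prefer Python over jq, a small sketch that loads the most recent run (assuming only the results_*.json layout shown in this guide):

import glob
import json
import os

# Pick the newest results file by modification time
latest = max(glob.glob(".codeoptix/artifacts/results_*.json"), key=os.path.getmtime)

with open(latest) as f:
    results = json.load(f)

print(f"Run {results['run_id']}: overall score {results['overall_score']:.1%}")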

Understanding Your Results

High Score (80-100%): Your agent performs well on the security evaluation.
Medium Score (50-79%): Some security issues detected - review the recommendations.
Low Score (0-49%): Significant security concerns - the agent needs improvement.

Sample Results:

{
  "run_id": "7d42c92c",
  "overall_score": 0.857,  // 85.7% - Good performance!
  "behaviors": {
    "insecure-code": {
      "score": 0.857,
      "passed": true,        // ✅ Evaluation passed
      "evidence": []         // No critical issues found
    }
  }
}

Key Metrics:

  • overall_score: 0.0 to 1.0 (higher is better)
  • passed: true if the behavior requirements were met
  • evidence: specific issues or examples found
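
As a sketch, the score bands above map onto the overall_score field like this (the thresholds are the ones stated in this guide, not a library constant):

# Translate overall_score (0.0-1.0) into the score bands described above
def score_band(overall_score: float) -> str:
    pct = overall_score * 100
    if pct >= 80:
        return "High - performs well on security evaluation"
    if pct >= 50:
        return "Medium - some issues detected, review recommendations"
    return "Low - significant concerns, needs improvement"

print(score_band(0.857))  # High - performs well on security evaluation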


Step 5: Generate Reflection Report

Get deep insights into your agent's performance:

codeoptix reflect --input .codeoptix/artifacts/results_*.json

This generates a comprehensive reflection report explaining:

  • ✅ What went well - analysis of successful behaviors and patterns
  • 🔍 What needs improvement - identification of problematic patterns
  • 🔧 Root causes of issues - deep analysis of why problems occurred
  • 💡 Actionable recommendations - specific suggestions for improvement

Step 6: Evolve the Agent (Advanced)

Automatically improve your agent's prompts using AI:

codeoptix evolve \
  --input .codeoptix/artifacts/results_*.json \
  --iterations 2

Evolution Process:

  • ๐Ÿ” Analyzes evaluation results
  • Identifies patterns and issues
  • ๐Ÿง  Generates improved prompts
  • Uses GEPA optimization algorithm
  • ๐Ÿงช Tests new prompts
  • Validates improvements work
  • ๐Ÿ’พ Saves evolved prompts
  • Stores in .codeoptix/artifacts/evolved_prompts_*.yaml

Evolution requires API keys

This advanced feature needs cloud LLM access for the optimization process.
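
The evolved prompts are plain YAML, so you can inspect them with any YAML reader once a run finishes. A minimal sketch, assuming only the evolved_prompts_*.yaml artifact path mentioned above (the file's internal structure depends on your run):

import glob
import os

import yaml  # pip install pyyaml

# Load the newest evolved-prompts artifact
latest = max(glob.glob(".codeoptix/artifacts/evolved_prompts_*.yaml"), key=os.path.getmtime)

with open(latest) as f:
    evolved = yaml.safe_load(f)

print(evolved)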


Complete Python Example

Here's a complete example using Ollama (no API keys needed):

from codeoptix.adapters.factory import create_adapter
from codeoptix.evaluation import EvaluationEngine
from codeoptix.utils.llm import create_llm_client, LLMProvider

# 1. Create a basic adapter with Ollama
adapter = create_adapter("basic", {
    "llm_config": {
        "provider": "ollama",
        "model": "llama3.2:3b"  # Use any installed Ollama model
    }
})

# 2. Create evaluation engine
llm_client = create_llm_client(LLMProvider.OLLAMA)
engine = EvaluationEngine(adapter, llm_client)

# 3. Evaluate behaviors
results = engine.evaluate_behaviors(
    behavior_names=["insecure-code"]
)

# 4. Print results
print(f"Overall Score: {results['overall_score']:.1%}")
for behavior_name, behavior_data in results['behaviors'].items():
    status = "✅ PASS" if behavior_data['passed'] else "❌ FAIL"
    print(f"{behavior_name}: {behavior_data['score']:.1%} {status}")

Expected Output:

Overall Score: 85.7%
insecure-code: 85.7% ✅ PASS


What's Next?

Now that you've run your first evaluation:

  1. Your First Evaluation - Detailed walkthrough
  2. Core Concepts - Understand how CodeOptiX works
  3. Python API Guide - Advanced usage
  4. CLI Usage - All CLI commands

Common Commands

Here are the most common commands you'll use:

# Evaluate agent
codeoptix eval --agent codex --behaviors insecure-code

# Generate reflection
codeoptix reflect --input results.json

# Evolve prompts
codeoptix evolve --input results.json

# Run full pipeline
codeoptix run --agent codex --behaviors insecure-code --evolve

# List all runs
codeoptix list-runs

Tips for Beginners

Start Simple

Begin with a single behavior:

codeoptix eval --agent codex --behaviors insecure-code

Use Context

Provide context for better evaluations:

codeoptix eval \
  --agent codex \
  --behaviors plan-drift \
  --context '{"plan": "Create a secure API"}'
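
Because --context takes a JSON string, it is safer to serialize the value than to hand-write the quoting. A sketch that builds the same command from Python's standard library (the flags are the ones shown above):

import json
import subprocess

# json.dumps handles quoting/escaping of the context payload
context = json.dumps({"plan": "Create a secure API"})

subprocess.run(
    ["codeoptix", "eval",
     "--agent", "codex",
     "--behaviors", "plan-drift",
     "--context", context],
    check=True,
)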

Check Results

Always review the results:

codeoptix reflect --input results.json

Troubleshooting

"API key not found"

Make sure you've set your API key:

echo $OPENAI_API_KEY

"Agent not found"

Check that you're using a supported agent:

  • codex - OpenAI GPT-5.2
  • claude-code - Anthropic Claude (Sonnet 4.5, Opus 4.5)
  • gemini-cli - Google Gemini (Gemini 3, Gemini 3 Flash)

"Behavior not found"

Use one of the built-in behaviors:

  • insecure-code - Security vulnerabilities
  • vacuous-tests - Test quality
  • plan-drift - Plan alignment

Need Help?