
🐍 Pydantic AI Integration

SuperOptiX now supports Pydantic AI - a modern agent framework with native MCP support!

Works great with Ollama 8b models (No API Keys Needed for Local Models!)

Native MCP (Model Context Protocol) Support - Built-in tool integration

Plain Text Output Mode - Natural responses without JSON formatting issues

Model Settings - Full control over generation parameters


🎯 What is Pydantic AI?

Pydantic AI is a modern, type-safe framework for building AI agents with:

  • 🎯 Type Safety: Structured outputs using Pydantic models
  • 🔧 Tool Integration: Native MCP (Model Context Protocol) support for tools
  • 🌐 Provider Agnostic: Works with OpenAI, Ollama, Anthropic, and 100+ LLMs
  • ⚡ Async/Await: Built-in async support for high-performance applications
  • 📊 Model Settings: Fine-grained control (max_tokens, top_p, etc.)
  • 🔌 MCP Native: Direct integration with MCP servers for tool discovery

Perfect for production applications requiring type safety and reliable tool integration!
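
To get a feel for the API, here is a minimal standalone Pydantic AI agent against a local Ollama model (a sketch; SuperOptiX generates equivalent wiring from your playbook, so you normally don't write this by hand):

from pydantic_ai import Agent

# Minimal standalone agent (illustrative). Assumes Ollama is running locally;
# the generated pipeline normally sets OLLAMA_BASE_URL=http://localhost:11434/v1 for you.
agent = Agent(
    "ollama:llama3.1:8b",
    instructions="You are a concise, helpful assistant.",
)

result = agent.run_sync("Summarize what the Model Context Protocol (MCP) is.")
print(result.output)  # plain text response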


📦 Installation

pip install superoptix[frameworks-pydantic-ai]

Includes:

  • pydantic-ai 1.31.0 (exact version pinned)
  • SuperOptiX core with GEPA 0.0.17

Requirements:

  • Python 3.11+
  • Git (for DSPy dependency)


🚀 Quick Start

1. Initialize Project

super init my_project
cd my_project

2. Pull Demo Agent

super agent pull developer

This pulls the developer agent playbook into your project. The agent comes pre-configured with:

  • Ollama model setup (default: llama3.1:8b)
  • BDD test scenarios
  • Optimization configuration

3. Configure Model

✅ Uses Ollama by Default! (FREE, no API keys needed!)

The developer agent defaults to Ollama llama3.1:8b:

spec:
  language_model:
    provider: ollama
    model: llama3.1:8b  # Pydantic AI auto-adds 'ollama:' prefix if needed
    api_base: http://localhost:11434

Just install Ollama and run:

brew install ollama  # macOS
ollama pull llama3.1:8b
super agent compile developer --framework pydantic-ai
super agent run developer --goal "Implement a user registration API endpoint with email validation"

Also Works With Cloud Models (requires API key):

# OpenAI GPT-4
spec:
  language_model:
    provider: openai
    model: gpt-4o  # Pydantic AI auto-detects provider
    # Set: export OPENAI_API_KEY="sk-..."

# Anthropic Claude
spec:
  language_model:
    provider: anthropic
    model: claude-3-5-sonnet
    # Set: export ANTHROPIC_API_KEY="sk-ant-..."

4. Run the Workflow

# Compile
super agent compile developer --framework pydantic-ai

# Evaluate
super agent evaluate developer

# Optimize with GEPA (OPTIONAL)
# ⚠️ WARNING: Only run if you have:
#   - High-end GPU or cloud GPU access
#   - Understanding of cost implications (many LLM API calls)
#   - Use local Ollama (ollama/llama3.1:8b) to avoid API charges

# Ultra fast: only 3 metric calls (~30 seconds - 1 minute)
super agent optimize developer --framework pydantic-ai --max-metric-calls 3 --reflection-lm ollama/llama3.1:8b

# Super light for quick test (~1-2 minutes, ~10 API calls)
super agent optimize developer --framework pydantic-ai --max-metric-calls 10 --reflection-lm ollama/llama3.1:8b

# Or use light mode for better results (~5-10 minutes, ~50-100 API calls)
super agent optimize developer --framework pydantic-ai --auto light --reflection-lm ollama/llama3.1:8b

# Run
super agent run developer --goal "Your task here"

📋 Creating Your Own Pydantic AI Playbook

Basic Structure

apiVersion: agent/v1
kind: AgentSpec
metadata:
  name: My Assistant
  id: my_assistant
  namespace: custom
  version: 1.0.0
  level: genies

spec:
  language_model:
    provider: ollama
    model: llama3.1:8b
    api_base: http://localhost:11434

  input_fields:
    - name: query
      type: str
      description: User query or question

  output_fields:
    - name: response
      type: str
      description: Generated response

  persona:
    role: Helpful AI Assistant
    goal: Provide clear and helpful responses
    backstory: I am an AI assistant trained to help users with their questions.

  # BDD Scenarios
  feature_specifications:
    scenarios:
      - name: Test scenario
        input:
          query: "Hello!"
        expected_output:
          response: "Greeting"
          expected_keywords:
            - hello

  optimization:
    optimizer:
      name: GEPA
      params:
        reflection_lm: llama3.1:8b
        auto: medium

Output Configuration

Define what your agent should return:

spec:
  output_fields:
    - name: implementation
      type: str
      description: Code implementation
    - name: explanation
      type: str
      description: Brief explanation of the code

Plain Text Mode (Default): The agent returns natural text responses, which works reliably with smaller models like llama3.1:8b. The output is mapped to your defined fields.

Example Output:

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Aspect         ┃ Value                                   ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Implementation │ def add_numbers(a, b):                  │
│                │     """Add two numbers together."""     │
│                │     return a + b                        │
└────────────────┴─────────────────────────────────────────┘

This approach avoids JSON formatting issues that can occur with smaller models.

Model Settings

Configure generation parameters (excluding temperature as it's deprecated by OpenAI):

spec:
  language_model:
    provider: ollama
    model: llama3.1:8b
    max_tokens: 4000  # Default: 4000 (supports detailed responses)
    top_p: 0.9        # Optional: Nucleus sampling (0.0-1.0)
    frequency_penalty: 0.0  # Optional: Reduce repetition (-2.0 to 2.0)
    presence_penalty: 0.0   # Optional: Encourage new topics (-2.0 to 2.0)

Configuration Options:

  • max_tokens (default: 4000): Maximum number of tokens in the response.
    • Increase for longer responses (test plans, detailed code, comprehensive explanations)
    • Decrease for shorter responses (faster, cheaper)
    • Recommended values:
      • Quick responses: 1000-2000
      • Standard responses: 2000-4000 (default)
      • Detailed/comprehensive: 4000-8000
      • Very detailed: 8000-16000 (may require larger models)

  • top_p (optional): Nucleus sampling threshold (0.0-1.0). Controls diversity of output.
    • 0.9-1.0: More creative, diverse responses
    • 0.5-0.9: Balanced
    • 0.0-0.5: More focused, deterministic

  • frequency_penalty (optional): Reduces repetition (-2.0 to 2.0).
    • Positive values: Reduce repetition
    • Negative values: Allow more repetition

  • presence_penalty (optional): Encourages new topics (-2.0 to 2.0).
    • Positive values: Encourage new topics
    • Negative values: Stay on topic

These settings are passed to Pydantic AI's ModelSettings class and control the model's generation behavior.
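
As a rough sketch of how these playbook values map onto Pydantic AI (the generated pipeline builds this for you from spec.language_model; exact wiring may differ):

from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Playbook generation parameters mapped onto Pydantic AI's ModelSettings (sketch).
settings = ModelSettings(
    max_tokens=4000,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
)

agent = Agent(
    "ollama:llama3.1:8b",
    instructions="You are a helpful assistant.",
    model_settings=settings,
)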


🔌 MCP (Model Context Protocol) Integration

Pydantic AI has native MCP support! You can connect to MCP servers directly in your playbook.

Basic MCP Configuration

spec:
  mcp:
    enabled: true
    servers:
      - name: filesystem
        type: stdio
        config:
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"]  # Use /private/tmp on macOS (or /tmp on Linux)
        tool_prefix: "fs_"  # Optional: prefix to avoid naming conflicts

Supported MCP Server Types

1. Local stdio Server

Runs MCP server as a subprocess:

spec:
  mcp:
    enabled: true
    servers:
      - name: filesystem
        type: stdio
        config:
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"]  # Use /private/tmp on macOS (or /tmp on Linux)
          env:  # Optional environment variables
            API_KEY: "${MY_API_KEY}"
          timeout: 30  # Optional timeout in seconds

2. Remote Streamable HTTP Server

Connects to a remote MCP server over HTTP:

spec:
  mcp:
    enabled: true
    servers:
      - name: weather_api
        type: streamable_http
        config:
          url: "https://mcp-server.com/mcp"
        tool_prefix: "weather_"

3. Remote SSE Server (Deprecated)

Connects to a remote MCP server using Server-Sent Events:

spec:
  mcp:
    enabled: true
    servers:
      - name: legacy_server
        type: sse
        config:
          url: "http://localhost:3001/sse"

Multiple MCP Servers

You can connect multiple MCP servers:

spec:
  mcp:
    enabled: true
    servers:
      - name: filesystem
        type: stdio
        config:
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"]  # Use /private/tmp on macOS (or /tmp on Linux)
        tool_prefix: "fs_"

      - name: weather_api
        type: streamable_http
        config:
          url: "https://mcp-server.com/mcp"
        tool_prefix: "weather_"

Each server's tools will be available with their respective prefixes.
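
Under the hood this roughly corresponds to registering several MCP server objects as toolsets on the agent. A sketch, assuming the Pydantic AI MCP constructor arguments shown (the generated code and parameter names may differ across versions):

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio, MCPServerStreamableHTTP

# Two MCP servers registered as toolsets, each with its own prefix
# to avoid tool-name clashes (sketch based on the playbook above).
fs_server = MCPServerStdio(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"],
    tool_prefix="fs_",
)
weather_server = MCPServerStreamableHTTP(
    url="https://mcp-server.com/mcp",
    tool_prefix="weather_",
)

agent = Agent(
    "ollama:llama3.1:8b",
    instructions="You are a helpful assistant with file and weather tools.",
    toolsets=[fs_server, weather_server],
)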

MCP Tool Optimization

⚠️ Resource Warning: MCP tool optimization runs two sequential phases, effectively doubling the resource usage. Only run this if you have adequate GPU/compute resources and understand the cost implications.

SuperOptiX can optimize both MCP tool descriptions AND agent instructions in a two-phase optimization process. This ensures your agent uses tools effectively AND understands its role clearly.

Enable MCP Tool Optimization

Add the optimization section under mcp in your playbook:

spec:
  mcp:
    enabled: true
    servers:
      - name: filesystem
        type: stdio
        config:
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"]  # Use /private/tmp on macOS (or /tmp on Linux)
        tool_prefix: "fs_"  # Tools will be prefixed with fs_ at runtime

    # Enable MCP tool description optimization
    optimization:
      optimize_tool_descriptions: true
      # IMPORTANT: Use actual MCP server tool names (WITHOUT prefix)
      # The optimizer queries the server directly to find tools
      tool_names: ["read_file", "write_file", "list_directory"]

Important Notes:

  • Use actual MCP server tool names WITHOUT prefix - the optimizer queries the MCP server directly, which returns the original tool names (e.g., read_file, not fs_read_file)
  • The tool_prefix only affects how tools appear at runtime in the agent, not how the optimizer finds them
  • You can optimize multiple tools from the same server
  • Tool optimization uses the same training data (BDD scenarios) as instruction optimization

Common MCP Filesystem Server Tools:

| Server Tool Name | Description |
|------------------|-------------|
| read_file | Read file contents |
| write_file | Write/create files |
| list_directory | List directory contents |
| create_directory | Create directories |
| move_file | Move/rename files |
| search_files | Search for files |

Two-Phase Optimization Process

When you run super agent optimize, SuperOptiX performs two sequential optimizations:

Phase 1: MCP Tool Description Optimization

What Gets Optimized:

  • Tool descriptions for each tool in tool_names
  • GEPA learns better descriptions that help the model understand:
    • When to use each tool
    • What each tool does
    • How to use each tool effectively

Example Transformation:

Before Optimization:

{
  "tool_description_fs_read_file": "Tool: fs_read_file",
  "tool_description_fs_write_file": "Tool: fs_write_file",
  "tool_description_fs_list_files": "Tool: fs_list_files"
}

After GEPA Optimization:

{
  "tool_description_fs_read_file": "Read file contents from the filesystem. Use when user asks to view, show, display, or read file contents. Returns the full text content of the specified file path. Requires a valid file path parameter.",
  "tool_description_fs_write_file": "Write content to files on the filesystem. Use when user asks to create, save, update, or write file contents. Requires file path and content parameters. Overwrites existing files.",
  "tool_description_fs_list_files": "List files and directories in a given path. Use when user asks to see what files are available, browse directories, find files, or explore the filesystem. Returns a list of files and folders in the specified directory."
}

Output File:

{project_name}/agents/{agent_name}/optimized/{agent_name}_mcp_tool_descriptions.json

Phase 2: Agent Instruction Optimization

What Gets Optimized:

  • Agent's system prompt/instructions (built from persona.role, persona.goal, persona.backstory, etc.)
  • GEPA learns better instructions that help the model:
    • Understand its role more clearly
    • Use tools more effectively
    • Generate better responses

Output File:

{project_name}/agents/{agent_name}/optimized/{agent_name}_pydantic_ai_optimized.json

Running Optimization

⚠️ Before Running:

  • Ensure you have high-end GPU or cloud GPU access
  • Understand that optimization makes many LLM API calls
  • Use local Ollama models (e.g., ollama/llama3.1:8b) to minimize costs
  • Cloud models (GPT-4, Claude) will incur significant API charges

# Quick test (super light - ~1-2 minutes, ~20 API calls)
# RECOMMENDED: Use local Ollama to avoid API costs
super agent optimize developer \
  --framework pydantic-ai \
  --max-metric-calls 20 \
  --reflection-lm ollama/llama3.1:8b

# Light mode for better results (~5-10 minutes, ~50-100 API calls)
# Use local Ollama: --reflection-lm ollama/llama3.1:8b
# Cloud models (costly): --reflection-lm openai/gpt-4o
super agent optimize developer \
  --framework pydantic-ai \
  --auto light \
  --reflection-lm ollama/llama3.1:8b

What You'll See:

🔧 Phase 1: Optimizing MCP Tool Descriptions
   Optimizing 3 tool(s): fs_read_file, fs_write_file, fs_list_files
   ✅ MCP tool optimization complete!
   Best score: 0.850
   Saved to: .../developer_mcp_tool_descriptions.json

⚡ Phase 2: Running GEPA optimization for instructions...
   Budget: light
   Training examples: 5
   Validation examples: 0
   ✅ Optimization complete!
   Best score: 0.920
   Saved to: .../developer_pydantic_ai_optimized.json

Using Optimized Results

The generated pipeline automatically loads optimized values:

  1. Tool Descriptions: Applied when MCP servers are initialized
  2. Instructions: Loaded when the agent is created

You don't need to manually apply the optimizations - the pipeline handles it automatically!
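
Conceptually, loading these artifacts at startup looks roughly like the sketch below. The file names follow the output paths above, but the helper function and the "instructions" JSON key it reads are illustrative assumptions, not the actual SuperOptiX implementation:

import json
from pathlib import Path

def load_optimized_artifacts(optimized_dir: Path, agent_name: str):
    """Hypothetical helper showing how optimized artifacts could be picked up."""
    tool_descriptions: dict = {}
    instructions: str | None = None

    tools_file = optimized_dir / f"{agent_name}_mcp_tool_descriptions.json"
    if tools_file.exists():
        tool_descriptions = json.loads(tools_file.read_text())

    instr_file = optimized_dir / f"{agent_name}_pydantic_ai_optimized.json"
    if instr_file.exists():
        # The "instructions" key is an assumption for illustration.
        instructions = json.loads(instr_file.read_text()).get("instructions")

    return tool_descriptions, instructions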

Complete Example

spec:
  language_model:
    provider: ollama
    model: llama3.1:8b  # Works great with plain text output mode

  mcp:
    enabled: true
    servers:
      - name: filesystem
        type: stdio
        config:
          command: "npx"
          args: ["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"]  # Use /private/tmp on macOS (or /tmp on Linux)
        tool_prefix: "fs_"  # Runtime prefix (tools become fs__read_file, etc.)

    # Enable tool optimization
    optimization:
      optimize_tool_descriptions: true
      # Use actual MCP server tool names (WITHOUT prefix)
      tool_names: ["read_file", "write_file", "list_directory"]

  # Agent instruction optimization (always runs)
  optimization:
    optimizer:
      name: GEPA
      params:
        auto: light
        reflection_lm: ollama/llama3.1:8b  # Use forward slash for LiteLLM

Best Practices

⚠️ When to Skip Optimization:

  • Your agent already performs well (pass rate > 80%)
  • You don't have high-end GPU or cloud GPU access
  • You want to avoid API costs
  • You're in early development/testing phase

Optimization is optional - many agents work great without it!

  1. Use Tool Prefixes: Prevents naming conflicts when using multiple MCP servers

    tool_prefix: "fs_"  # Tools become fs_read_file, fs_write_file, etc.
    

  2. Optimize Key Tools: Focus on tools that are used frequently or are critical to your agent's functionality

  3. Reflection Model: Use a smaller, faster model for reflection (GEPA runs it many times):

    --reflection-lm ollama/llama3.1:8b  # Fast, good enough for reflection, FREE!
    
    ⚠️ Avoid cloud models (GPT-4, Claude) unless you understand the costs - they can cost $5-100+ per optimization run!

  4. Task Model: Use a larger model for the actual agent (better quality):

    model: llama3.1:8b  # Works well with plain text mode (local, free)
    # or gpt-oss:120b for better quality (requires GPU)
    

  5. Training Data: Ensure your BDD scenarios include examples that use MCP tools:

    feature_specifications:
      scenarios:
        - name: read_config_file
          input:
            feature_requirement: "Read /private/tmp/config.json and tell me what database host is configured"  # Use /private/tmp on macOS
          expected_output:
            implementation: "localhost"  # Should use fs_read_file tool
    

Benefits of Two-Phase Optimization

Better Tool Usage: Optimized descriptions help the model choose the right tool at the right time
Better Instructions: Optimized prompts improve overall agent behavior
Compound Effect: Both optimizations work together for maximum performance
Automatic: Pipeline automatically applies both optimizations when available

Troubleshooting

Issue: Tool optimization fails with "None of the specified tools found"

Solution: Use actual MCP server tool names without prefix:

# ❌ Wrong - uses prefixed names
tool_names: ["fs_read_file", "fs_write_file"]

# ✅ Correct - uses actual server tool names
tool_names: ["read_file", "write_file", "list_directory"]

The optimizer queries the MCP server directly, which returns unprefixed tool names. The tool_prefix only affects runtime tool naming in the agent.

Issue: Tool optimization fails but instruction optimization succeeds

Solution: Check that:

  • Tool names match actual MCP server tool names (without prefix)
  • MCP server is accessible and tools are available
  • Training scenarios include tool usage examples

Issue: Optimization takes too long

Quick Solutions:

  • Super Light: Use --max-metric-calls 20 for fastest test (~1-2 minutes)
  • Light Mode: Use --auto light for balanced speed/quality (~5-10 minutes)
  • Use smaller reflection model: --reflection-lm ollama/llama3.1:8b
  • Reduce number of tools to optimize


🎬 MCP Demo Tutorial

For a complete step-by-step demo of MCP with Pydantic AI, including:

  • ✅ Quick start with pydantic-mcp demo agent
  • ✅ Setting up filesystem MCP server
  • ✅ Testing file operations (read, write, list)
  • ✅ Verified working examples with llama3.1:8b
  • ✅ Troubleshooting common issues

See: Pydantic AI MCP Demo Guide

Quick Start:

super init swe && cd swe
super agent pull pydantic-mcp
super agent compile pydantic-mcp --framework pydantic-ai
super agent run pydantic-mcp --goal "List all files in /private/tmp"  # Use /private/tmp on macOS


🎯 GEPA Optimization

⚠️ IMPORTANT: Resource Requirements

GEPA optimization is resource-intensive and should only be run when:

  • ✅ You have a high-end GPU (or cloud GPU access)
  • ✅ You understand the cost implications (many LLM API calls)
  • ✅ You have adequate time budget (5-60 minutes depending on settings)

Resource Usage:

  • Makes many LLM API calls (reflection + evaluation)
  • Can consume significant GPU memory and compute resources
  • Cloud API costs can add up quickly (especially with GPT-4, Claude, etc.)

For local testing: Use --max-metric-calls 20 with ollama/llama3.1:8b to minimize resource usage.

What Gets Optimized

Pydantic AI has one main optimizable variable:

  • instructions: The agent's system prompt (built from persona.role, persona.goal, persona.backstory, etc.)

How GEPA Optimizes Pydantic AI Agents

GEPA optimizes the instructions field by:

  1. Analyzing BDD test scenarios to understand success criteria
  2. Generating variations of the instructions prompt
  3. Testing each variation against your evaluation scenarios
  4. Selecting the best performer based on pass rate

Example transformation:

# Original (from playbook)
persona:
  role: Software Developer
  goal: Write clean, efficient code
  backstory: I am an experienced developer

→ instructions = "Software Developer\nGoal: Write clean, efficient code\nBackstory: I am an experienced developer"
# After GEPA optimization
→ instructions = "You are a Software Developer.

When writing code:
1. Ensure it is clean and maintainable
2. Follow best practices and conventions
3. Include proper error handling
4. Write comprehensive tests

Goal: Write clean, efficient code that meets requirements and is production-ready.

Backstory: I am an experienced developer with expertise in multiple programming languages and frameworks."

GEPA typically expands the instructions to be more explicit and structured, which improves agent behavior consistency.
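
As a rough sketch of how the baseline string is assembled before GEPA touches it (the helper name is illustrative; the compiler's actual template may differ):

def build_instructions(persona: dict) -> str:
    # Baseline instructions assembled from the playbook's persona fields (sketch).
    return (
        f"{persona['role']}\n"
        f"Goal: {persona['goal']}\n"
        f"Backstory: {persona['backstory']}"
    )

persona = {
    "role": "Software Developer",
    "goal": "Write clean, efficient code",
    "backstory": "I am an experienced developer",
}
baseline = build_instructions(persona)
# GEPA then rewrites this baseline into a more explicit, structured prompt.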

Optimization Command

⚠️ Resource & Cost Warning:

GEPA optimization is resource-intensive and makes many LLM API calls:

  • Super Light: ~20 API calls (~$0.10-2.00 with cloud models)
  • Light: ~50-100 API calls (~$0.50-10.00 with cloud models)
  • Medium: ~150-300 API calls (~$5-50 with cloud models)
  • Heavy: ~300-600 API calls (~$20-100+ with cloud models)

Recommendations:

  • ✅ Use local Ollama models (ollama/llama3.1:8b) to avoid API costs
  • ✅ Only optimize when you have high-end GPU or cloud GPU access
  • ✅ Start with --max-metric-calls 20 to test
  • ❌ Avoid cloud models (GPT-4, Claude) unless you understand the costs

Quick Test (Super Light) ⚡

⚠️ Resource Warning: Even "super light" optimization makes LLM API calls and can take time.

💡 Tip: If your playbook has auto: light or max_full_evals set, the CLI --max-metric-calls argument will override it. CLI arguments always take precedence.

For fastest optimization to test if it works:

# Very fast: only 3 metric calls (~30 seconds - 1 minute)
super agent optimize my_agent \
  --framework pydantic-ai \
  --max-metric-calls 3 \
  --reflection-lm ollama/llama3.1:8b

# Quick test: 10 metric calls (~1-2 minutes)
super agent optimize my_agent \
  --framework pydantic-ai \
  --max-metric-calls 10 \
  --reflection-lm ollama/llama3.1:8b

# Light optimization: 20 metric calls (~2-3 minutes)
super agent optimize my_agent \
  --framework pydantic-ai \
  --max-metric-calls 20 \
  --reflection-lm ollama/llama3.1:8b

Use this when:

  • Testing if optimization works
  • Limited resources/time
  • Quick iteration during development
  • Just need to verify the process
  • Using local Ollama models (recommended to avoid API costs)

Note: --max-metric-calls 20 limits total evaluations more precisely than --max-full-evals 1, ensuring faster completion.

Cost Tip: Use local ollama/llama3.1:8b for reflection_lm to avoid API costs. Cloud models (GPT-4, Claude) will incur charges.

For better results, run a light-mode optimization:

super agent optimize my_agent \
  --framework pydantic-ai \
  --auto light \
  --reflection-lm ollama/llama3.1:8b

Budget Options:

  • --auto light: Fast optimization (~2-5 iterations, ~5-10 minutes) ⭐ Recommended
  • --auto medium: Balanced optimization (~5-10 iterations, ~15-30 minutes)
  • --auto heavy: Thorough optimization (~10-20 iterations, ~30-60 minutes)
  • --max-full-evals N: Specify exact number of iterations (use 1 for super quick test)
  • --max-metric-calls N: Limit total metric evaluations

Optimization Results

Optimized instructions are saved to:

{project_name}/agents/{agent_name}/optimized/{agent_name}_pydantic_ai_optimized.json

The generated pipeline automatically loads optimized instructions if available.


🔍 Field Description Optimization

⚠️ IMPORTANT: Resource Requirements

Field description optimization is resource-intensive and should only be run when:

  • ✅ You have a high-end GPU (or cloud GPU access)
  • ✅ You understand the cost implications (additional LLM API calls)
  • ✅ You plan to use structured output mode (required for optimized descriptions to take effect)

What Gets Optimized

Field Description Optimization uses GEPA to optimize Pydantic model field descriptions (Field(description=...)) for structured output. This improves the model's understanding of what each output field should contain.

Requires:

  • output_fields defined in your playbook
  • optimize_field_descriptions: true in optimization config
  • Structured output mode enabled (output_mode: structured) to use optimized descriptions

How It Works

GEPA optimizes field descriptions by:

  1. Extracting field descriptions from output_fields in your playbook
  2. Creating evaluation scenarios based on your BDD test cases
  3. Generating variations of each field description
  4. Testing each variation to see which descriptions lead to better structured outputs
  5. Selecting the best descriptions that improve structured data extraction accuracy

Example Transformation

Before Optimization:

spec:
  output_fields:
    - name: implementation
      type: string
      description: The code implementation of the feature

After GEPA Optimization:

{
  "original_descriptions": {
    "implementation": "The code implementation of the feature"
  },
  "optimized_descriptions": {
    "implementation": "Complete, production-ready code implementation with proper imports, error handling, and documentation. Include full function/class definitions, not pseudocode or descriptions."
  },
  "score": 0.95,
  "iterations": 3
}

The optimized description is more explicit about what the model should produce, leading to better structured output quality.

Enable Field Description Optimization

Add to your playbook's optimization section:

spec:
  output_fields:
    - name: implementation
      type: string
      description: The code implementation of the feature
      required: true

  optimization:
    optimize_field_descriptions: true  # Enable field description optimization
    optimizer:
      name: GEPA
      params:
        auto: light
        reflection_lm: ollama/llama3.1:8b

Important Notes:

  • Field description optimization runs as Phase 1.5 (between MCP tool optimization and instruction optimization)
  • It only runs if output_fields are defined in your playbook
  • Optimized descriptions are saved but only used when structured output mode is enabled

Running Field Description Optimization

# Quick test (super light - ~1-2 minutes)
super agent optimize developer \
  --framework pydantic-ai \
  --max-metric-calls 20 \
  --reflection-lm ollama/llama3.1:8b

# Light mode (~5-10 minutes)
super agent optimize developer \
  --framework pydantic-ai \
  --auto light \
  --reflection-lm ollama/llama3.1:8b

What You'll See:

🔧 Phase 1: Optimizing MCP Tool Descriptions (if enabled)
   ...

📋 Phase 1.5: Optimizing Field Descriptions
   Optimizing 1 field descriptions:
     - implementation: The code implementation of the feature

   ✅ Field description optimization complete!
   Best score: 0.95
   Saved to: .../developer_field_descriptions_optimized.json

⚡ Phase 2: Running GEPA optimization for instructions...
   ...

Output File

Optimized field descriptions are saved to:

{project_name}/agents/{agent_name}/optimized/{agent_name}_field_descriptions_optimized.json

File Format:

{
  "original_descriptions": {
    "implementation": "The code implementation of the feature"
  },
  "optimized_descriptions": {
    "implementation": "Complete, production-ready code implementation..."
  },
  "score": 0.95,
  "iterations": 3
}

Using Optimized Field Descriptions

Optimized descriptions are automatically used when:

  1. Structured output mode is enabled (output_mode: structured)
  2. The optimization file exists in the optimized/ directory
  3. You've run super agent optimize with optimize_field_descriptions: true

The generated pipeline automatically loads and applies optimized descriptions when creating the BaseModel for structured output.
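
A simplified sketch of how such a model can be assembled dynamically from output_fields plus optimized descriptions, using pydantic.create_model (the generated pipeline's actual code may differ):

from pydantic import Field, create_model

# output_fields from the playbook, plus descriptions loaded from
# {agent_name}_field_descriptions_optimized.json when it exists (sketch).
output_fields = [
    {"name": "implementation", "type": str,
     "description": "The code implementation of the feature"},
]
optimized = {
    "implementation": "Complete, production-ready code implementation with "
                      "proper imports, error handling, and documentation.",
}

DeveloperOutput = create_model(
    "DeveloperOutput",
    **{
        f["name"]: (f["type"], Field(description=optimized.get(f["name"], f["description"])))
        for f in output_fields
    },
)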

Benefits

Better Structured Output: More explicit field descriptions improve the model's understanding
Improved Accuracy: Optimized descriptions lead to better structured data extraction
Type Safety: Works seamlessly with Pydantic's BaseModel validation
Automatic: Pipeline automatically applies optimized descriptions when available

When to Use

Use field description optimization when:

  • ✅ You're using structured output mode
  • ✅ Your structured outputs aren't accurate enough
  • ✅ You have well-defined BDD test scenarios
  • ✅ You have adequate GPU/compute resources

Skip field description optimization when:

  • ❌ You're using plain text output mode (descriptions won't be used)
  • ❌ Your structured outputs already work well
  • ❌ You don't have resources for additional optimization
  • ❌ output_fields aren't defined in your playbook


📊 Structured Output Mode

Pydantic AI supports structured output using Pydantic BaseModel for type-safe, validated responses. SuperOptiX provides an opt-in structured output mode that uses optimized field descriptions when available.

What is Structured Output?

Structured Output uses Pydantic BaseModel to enforce type-safe responses:

  • ✅ Type Validation: Responses are validated against the BaseModel schema
  • ✅ Field Descriptions: Each field has a description that guides the model
  • ✅ Type Safety: Python type hints ensure correct data types
  • ✅ Automatic Parsing: Responses are automatically parsed into BaseModel instances

Default Mode (Plain Text):

  • Agent returns plain text strings
  • No JSON structure enforcement
  • Works great for code generation, explanations, etc.
  • Better compatibility with smaller models (8b)

Structured Output Mode (Opt-in):

  • Agent returns validated BaseModel instances
  • Type-safe, structured data
  • Uses optimized field descriptions when available
  • Requires larger models (70b+) for reliable results

Enable Structured Output

Add output_mode: structured to your playbook:

spec:
  output_mode: structured  # Enable structured output (opt-in, defaults to plain)
  output_fields:
    - name: implementation
      type: string
      description: The code implementation of the feature
      required: true

Requirements:

  • output_fields must be defined
  • Requires larger models (70b+) for reliable structured output
  • Works best with optimized field descriptions

Example Playbook

apiVersion: agent/v1
kind: AgentSpec
metadata:
  name: Developer Assistant
  id: developer
spec:
  # Enable structured output
  output_mode: structured

  language_model:
    provider: ollama
    model: llama3.1:70b  # Larger model recommended for structured output
    api_base: http://localhost:11434

  output_fields:
    - name: implementation
      type: string
      description: The code implementation of the feature
      required: true

  optimization:
    optimize_field_descriptions: true  # Optimize field descriptions
    optimizer:
      name: GEPA
      params:
        auto: light
        reflection_lm: ollama/llama3.1:8b

How It Works

When structured output is enabled:

  1. BaseModel Creation: A Pydantic BaseModel is created from output_fields
  2. Optimized Descriptions: If available, optimized field descriptions are used
  3. Agent Configuration: Agent is configured with output_type=BaseModel
  4. Response Validation: Model responses are validated against the BaseModel
  5. Type-Safe Output: Responses are returned as BaseModel instances

Generated Code:

# BaseModel created from output_fields
class DeveloperOutput(BaseModel):
    implementation: str = Field(
        description="Complete, production-ready code implementation..."  # Optimized description if available
    )

# Agent configured with structured output
agent = Agent(
    model=model,
    instructions=instructions,
    output_type=DeveloperOutput  # Structured output enabled
)

Verification

When running an agent with structured output, you'll see:

✅ Using structured output mode (BaseModel)
   Output Model: DeveloperOutput
   ✅ Using optimized field descriptions

Response Output:

✅ Structured Output Received!
   Type: DeveloperOutput
   Model: DeveloperOutput
   📊 Pydantic v2 model validated successfully
   📋 Structured Data (JSON):
   {
     "implementation": "...actual code here..."
   }

Benefits

Type Safety: Responses are validated against Pydantic models
Better Structure: Enforces consistent output format
Optimized Descriptions: Uses GEPA-optimized field descriptions
Validation: Automatic validation ensures correct data types
Integration: Works seamlessly with Pydantic AI's native structured output

When to Use Structured Output

Use structured output when:

  • ✅ You need type-safe, validated responses
  • ✅ You're using larger models (70b+)
  • ✅ You have well-defined output schemas
  • ✅ You've optimized field descriptions
  • ✅ You need consistent data structure

Use plain text output when:

  • ✅ You're using smaller models (8b)
  • ✅ You want maximum compatibility
  • ✅ Output format is flexible
  • ✅ You don't need structured validation
  • ✅ Default mode - works great for most use cases

Switching Between Modes

Enable structured output:

spec:
  output_mode: structured

Disable (use plain text - default):

spec:
  # output_mode: plain  # Default, can omit
  # or remove output_mode entirely

Important: Always recompile after changing output_mode:

super agent compile developer --framework pydantic-ai


📈 Performance Characteristics

Baseline Performance

Task: Code generation and explanation
Model: Ollama llama3.1:8b
Framework: Pydantic AI

Pydantic AI achieves good baseline performance with local Ollama models. Results vary based on:

  • Hardware capabilities (RAM, CPU/GPU)
  • Model size and quality (8b vs 70b)
  • BDD scenario complexity
  • Model settings (max_tokens, top_p, etc.)

Framework Comparison

Pydantic AI strengths:

  • ✅ Type-safe structured outputs (validated by Pydantic)
  • ✅ Native MCP support (no extra configuration)
  • ✅ Modern async/await API
  • ✅ Clean, simple architecture
  • ✅ Works seamlessly with Ollama

DSPy strengths:

  • ✅ More optimization targets (all signatures)
  • ✅ Better for focused, well-defined tasks
  • ✅ Greater improvement potential through optimization

OpenAI SDK strengths:

  • ✅ Built-in multi-agent handoffs
  • ✅ Session management
  • ✅ Guardrails support


🏗️ Architecture

SuperSpec YAML Playbook
        ↓
    Compiler (AgentCompiler)
        ↓
Pydantic AI Pipeline Template (pydantic_ai_pipeline.py.jinja2)
        ↓
Generated Python Pipeline
        ├─ MyAssistantComponent (BaseComponent wrapper)
        │   ├─ _initialize_model() → infer_model() (Pydantic AI)
        │   ├─ _initialize_mcp_servers() → [MCPServerStdio, ...] (if enabled)
        │   ├─ _get_model_settings() → ModelSettings
        │   ├─ _initialize_agent() → Agent(
        │   │                         model,
        │   │                         instructions,  ← Optimized by GEPA!
        │   │                         model_settings,
        │   │                         toolsets=[...]  ← MCP servers
        │   │                       )
        │   └─ forward() → agent.run() (async, plain text output)
        └─ MyAssistantPipeline
            ├─ run()
            ├─ evaluate()
            ├─ optimize_with_gepa() ← Universal GEPA
            └─ run_bdd_test_suite()

Note: The template uses plain text output mode (no output_type parameter) for reliable responses with smaller models.


🔄 Model Configuration

Ollama (Local)

spec:
  language_model:
    provider: ollama
    model: llama3.1:8b  # or llama3.1:70b for better quality
    api_base: http://localhost:11434
    max_tokens: 4000  # Adjust based on response length needs (default: 4000)
    top_p: 0.9  # Optional: Control output diversity

Setup:

# Install Ollama
brew install ollama  # macOS
# or download from https://ollama.com

# Pull model
ollama pull llama3.1:8b

# Set environment (optional, auto-configured by pipeline)
# Note: The pipeline automatically sets these if api_base is provided in playbook
export OLLAMA_BASE_URL=http://localhost:11434/v1
export OLLAMA_API_KEY=ollama  # Placeholder key (Ollama doesn't require real key)

The pipeline automatically (see the sketch below):

  • Adds ollama: prefix if missing
  • Sets OLLAMA_BASE_URL with /v1 suffix
  • Uses Pydantic AI's infer_model() for automatic model creation
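
In essence, the pipeline performs this normalization before handing the model string to Pydantic AI. A sketch; the helper below is illustrative, not the actual SuperOptiX code:

import os
from pydantic_ai.models import infer_model

def prepare_ollama_model(model: str, api_base: str = "http://localhost:11434"):
    # Illustrative helper mirroring the pipeline's normalization steps.
    if not model.startswith("ollama:"):
        model = f"ollama:{model}"                       # add missing prefix
    os.environ.setdefault("OLLAMA_BASE_URL", api_base.rstrip("/") + "/v1")
    os.environ.setdefault("OLLAMA_API_KEY", "ollama")   # placeholder key
    return infer_model(model)

model = prepare_ollama_model("llama3.1:8b")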

OpenAI (Cloud)

spec:
  language_model:
    provider: openai
    model: gpt-4o
    max_tokens: 4000  # Adjust as needed (default: 4000)
    top_p: 0.9  # Optional

Setup:

export OPENAI_API_KEY="sk-..."

Pydantic AI automatically detects OpenAI from the model string or provider field.

Anthropic (Cloud)

spec:
  language_model:
    provider: anthropic
    model: claude-3-5-sonnet
    max_tokens: 4000  # Adjust as needed (default: 4000)
    top_p: 0.9  # Optional (Anthropic may ignore this)

Setup:

export ANTHROPIC_API_KEY="sk-ant-..."

Other Providers

Pydantic AI supports 100+ providers via LiteLLM. Just specify the provider and model:

spec:
  language_model:
    provider: google  # or groq, together, bedrock, etc.
    model: gemini-pro

🐛 Troubleshooting

Model Not Found

Symptom: Unknown provider: llama3.1 or ModelHTTPError: 404

Solutions:

  1. Ensure model string has provider prefix: ollama:llama3.1:8b
  2. Check OLLAMA_BASE_URL includes /v1: http://localhost:11434/v1
  3. Verify Ollama is running: curl http://localhost:11434/api/tags
  4. Check model is downloaded: ollama list

The pipeline auto-detects Ollama models, but explicit prefix is safer.

Low Pass Rate in Evaluation

Symptom: Evaluation scenarios failing

Solutions:

  1. Check BDD scenario keywords are realistic
  2. Lower threshold in evaluate() method (default is 0.6)
  3. Run GEPA optimization to improve instructions
  4. Try a different model (llama3.1:70b for more capability)
  5. Adjust model settings in playbook:

spec:
  language_model:
    max_tokens: 8000  # Increase for longer responses
    top_p: 0.9

MCP Server Connection Issues

Symptom: Failed to initialize MCP server or tools not available

Solutions:

  1. For stdio servers:
     • Verify command exists: which npx
     • Check args are correct
     • Ensure MCP server package is installed

  2. For remote servers:
     • Verify URL is accessible: curl https://mcp-server.com/mcp
     • Check network connectivity
     • Verify server is running

  3. General:
     • Check server logs for errors
     • Verify mcp package is installed: pip install mcp
     • Test server independently first

Import Error

Symptom: ModuleNotFoundError: No module named 'pydantic_ai'

Solution:

pip install superoptix[frameworks-pydantic-ai]
# or
pip install pydantic-ai==1.31.0

Optimization Takes Too Long

Symptom: Optimization never completes or takes too long

⚠️ Note: Optimization is inherently resource-intensive. If it's taking too long, consider if optimization is necessary for your use case. Many agents work well without optimization.

Quick Solutions:

  1. Ultra Fast (~30s-1m, ~3 API calls): Minimal metric calls for quick verification:

    # Use local Ollama to avoid API costs
    super agent optimize developer --framework pydantic-ai --max-metric-calls 3 --reflection-lm ollama/llama3.1:8b
    

  2. Super Light (~1-2 minutes, ~10 API calls): Limit total metric calls:

    # Use local Ollama to avoid API costs
    super agent optimize developer --framework pydantic-ai --max-metric-calls 10 --reflection-lm ollama/llama3.1:8b
    

  3. Light Mode (Recommended - ~5-10 minutes, ~50-100 API calls):

    # Use local Ollama for cost-free optimization
    super agent optimize developer --framework pydantic-ai --auto light --reflection-lm ollama/llama3.1:8b
    

Cloud models are costly:

# ❌ NOT RECOMMENDED - Expensive!
super agent optimize developer --framework pydantic-ai --auto light --reflection-lm openai/gpt-4o
# This can cost $5-20+ per optimization run!

  4. Reduce iterations in playbook:

    optimization:
      optimizer:
        params:
          max_metric_calls: 20  # Limit total evaluations
          reflection_lm: ollama/llama3.1:8b
    

  5. Use smaller reflection model: --reflection-lm ollama/llama3.1:8b

  6. Reduce training dataset size (fewer BDD scenarios)

JSON Metadata Instead of Content

Symptom: Agent returns JSON like {"action": "do_something", "params": {...}} instead of actual content

Solution: This was fixed in SuperOptiX 0.2.1. The template now uses plain text output mode:

pip install --upgrade superoptix
super agent compile your_agent --framework pydantic-ai  # Recompile

MCP Server Not Initializing

Symptom: No "🛠️ Initialized MCP stdio server" message during run

Solutions:

  1. Check playbook filename uses underscores: my_agent_playbook.yaml (not hyphens)
  2. Verify mcp.enabled: true in playbook
  3. Check MCP server command is correct:

mcp:
  enabled: true
  servers:
    - name: filesystem
      type: stdio
      config:
        command: npx
        args: ["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"]  # Use /private/tmp on macOS (or /tmp on Linux)


🔬 Under the Hood

Model Initialization

The pipeline uses Pydantic AI's infer_model() for automatic model creation:

from pydantic_ai.models import infer_model

# Auto-detects provider from model string
model = infer_model("ollama:llama3.1:8b")
# or
model = infer_model("openai:gpt-4o")

For Ollama, it automatically:

  • Adds ollama: prefix if missing
  • Sets OLLAMA_BASE_URL environment variable
  • Configures OpenAI-compatible API endpoint

Plain Text Output Mode

The template uses plain text output for reliable responses with all model sizes:

# Agent configured for plain text output
agent = Agent(
    model=model,
    instructions=instructions,
    # No output_type - uses plain text mode
    toolsets=[server] if server else None,  # ← MCP servers as toolsets
)

# Result is plain text mapped to output fields
result = await agent.run(input_text)
response_text = str(result.output)

# Mapped to first output field
return {"implementation": response_text}

Why Plain Text Mode?

  • ✅ Works reliably with 8b models
  • ✅ No JSON formatting issues
  • ✅ Natural, readable responses
  • ✅ Better for code generation and documentation tasks

MCP Server Integration

MCP servers are registered as toolsets on the Agent:

from pydantic_ai.mcp import MCPServerStdio

server = MCPServerStdio(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/private/tmp"],  # Use /private/tmp on macOS (or /tmp on Linux)
)

agent = Agent(
    model=model,
    instructions=instructions,
    toolsets=[server],  # ← MCP tools automatically available!
)

Agent Execution Flow

  1. User input received
  2. Component's forward() called (async method)
  3. Agent initialized (lazy, cached)
  4. await agent.run(input) executed (async execution)
  5. Agent processes with model + tools
  6. Returns result.output (validated BaseModel or str)
  7. Mapped to output fields dict
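
A condensed sketch of this flow (names are illustrative; the generated component wraps this logic in its forward() method):

import asyncio
from pydantic_ai import Agent

agent = Agent("ollama:llama3.1:8b", instructions="You are a software developer.")

async def forward(query: str) -> dict:
    # Steps 2-7 above: run the agent asynchronously and map the plain-text
    # output onto the playbook's first output field (field name illustrative).
    result = await agent.run(query)
    return {"implementation": str(result.output)}

print(asyncio.run(forward("Write a function that adds two numbers.")))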

🎯 The SuperOptiX Multi-Framework Advantage

One Playbook, Multiple Frameworks

SuperOptiX allows you to write your agent specification once and compile to any supported framework:

# Same playbook, different frameworks
super agent compile my_agent --framework pydantic-ai
super agent compile my_agent --framework dspy
super agent compile my_agent --framework openai
super agent compile my_agent --framework deepagents

# GEPA optimization works across all frameworks
super agent optimize my_agent --framework pydantic-ai --max-metric-calls 20  # Super light test
# or
super agent optimize my_agent --framework pydantic-ai --auto light  # Recommended

When to Use Pydantic AI

Choose Pydantic AI when:

  • ✅ You need type-safe structured outputs
  • ✅ You want native MCP tool integration
  • ✅ You prefer modern async/await APIs
  • ✅ You're building production applications
  • ✅ You need validated, reliable responses

Choose DSPy when:

  • ✅ You need maximum optimization flexibility
  • ✅ You want to optimize multiple components
  • ✅ You have well-defined, focused tasks
  • ✅ You want proven optimization improvements

Choose OpenAI SDK when:

  • ✅ You need multi-agent handoffs
  • ✅ You want built-in session management
  • ✅ You need guardrails support
  • ✅ You prefer a simple, straightforward API



🎉 Next Steps

  1. Try the demo: super agent pull developer && super agent compile developer --framework pydantic-ai
  2. Add MCP tools: Configure MCP servers in your playbook
  3. Optimize (Optional): Run GEPA optimization to improve performance
     • ⚠️ Only run if: You have a high-end GPU AND understand the costs
     • ⚠️ Resource Warning: Makes many LLM API calls (20-600+ depending on settings)
     • ⚠️ Cost Warning: Use local ollama/llama3.1:8b to avoid API charges ($0 vs $5-100+)
     • Ultra fast: --max-metric-calls 3 (~30s-1m, ~3 API calls)
     • Quick test: --max-metric-calls 10 (~1-2 minutes, ~10 API calls)
     • Recommended: --auto light (~5-10 minutes, ~50-100 API calls)
     • Skip optimization if your agent already works well - it's optional!
  4. Deploy: Use the generated pipeline in your application

📊 Optimization Time Guide

| Option | Command | Time | API Calls | Use Case |
|--------|---------|------|-----------|----------|
| Ultra Fast | --max-metric-calls 3 | ~30s-1m | ~3 calls | Verify optimization works |
| Super Light | --max-metric-calls 10 | ~1-2 min | ~10 calls | Quick test |
| Light | --max-metric-calls 20 | ~2-3 min | ~20 calls | Quick test, verify it works |
| Light | --auto light | ~5-10 min | ~50-100 calls | Recommended - Balanced speed/quality |
| Medium | --auto medium | ~15-30 min | ~150-300 calls | Better results, more iterations |
| Heavy | --auto heavy | ~30-60 min | ~300-600 calls | Maximum quality, production ready |

⚠️ Cost Estimates (with cloud models like GPT-4o):

  • Super Light: ~$0.10-2.00
  • Light: ~$0.50-10.00
  • Medium: ~$5-50.00
  • Heavy: ~$20-100+

💡 Save money: Use --reflection-lm ollama/llama3.1:8b for free local optimization!


📊 Observability with LogFire

SuperOptiX includes native LogFire integration for Pydantic AI agents, providing comprehensive observability for your agents. See the LogFire Integration Guide for:

  • ✅ Tracing agent executions
  • ✅ Monitoring LLM calls and tool usage
  • ✅ Tracking token usage and costs
  • ✅ Viewing traces in LogFire dashboard or local backends (Jaeger)

Happy building! 🚀