RSpec-Style BDD in SuperOptiX
What is BDD?
Behavior-Driven Development (BDD) is a software development methodology that bridges the gap between technical and non-technical stakeholders by describing software behavior in natural language. BDD focuses on behavior rather than implementation details.
RSpec is the most popular BDD testing framework for Ruby and Ruby on Rails, created to make tests more readable and expressive. SuperOptiX follows RSpec's philosophy of clear, behavior-focused specifications for AI agents.
Core BDD Principles
graph LR
A[Business Requirements] --> B[BDD Scenarios]
B --> C[Executable Specifications]
C --> D[Test-Driven Development]
D --> E[Quality Assurance]
E --> F[Continuous Delivery]
style A fill:#1e3a8a,stroke:#3b82f6,stroke-width:2px,color:#ffffff
style B fill:#7c3aed,stroke:#a855f7,stroke-width:2px,color:#ffffff
style C fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style D fill:#d97706,stroke:#f59e0b,stroke-width:2px,color:#ffffff
style E fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style F fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
Original BDD Structure (Gherkin)
Feature: User Authentication
As a user
I want to log into the system
So that I can access my account
Scenario: Successful login with valid credentials
Given I am on the login page
When I enter valid username and password
And I click the login button
Then I should be redirected to the dashboard
And I should see my profile information
RSpec-Style BDD (Ruby/Rails)
RSpec brings BDD to Ruby with a cleaner, more expressive syntax:
# spec/models/user_spec.rb
describe User do
describe '#authenticate' do
it 'logs in with valid credentials' do
user = User.create(email: 'test@example.com', password: 'secret')
expect(user.authenticate('secret')).to be true
end
it 'rejects invalid passwords' do
user = User.create(email: 'test@example.com', password: 'secret')
expect(user.authenticate('wrong')).to be false
end
end
end
SuperOptiX adapts this RSpec philosophy for AI agents!
BDD in Software Development
Why BDD Works
BDD transforms software development by:
- Shared Understanding: Business and technical teams speak the same language
- Focus on Behavior: Describes what the system should do, not how
- Living Documentation: Scenarios serve as executable specifications
- Test-Driven: Every behavior is testable and validated
- Quality Gates: Clear pass/fail criteria for deployment
BDD Workflow in Original Development
graph TB
A[Business Requirements] --> B[Write BDD Scenarios]
B --> C[Implement Features]
C --> D[Run BDD Tests]
D --> E{All Tests Pass?}
E -->|Yes| F[Deploy to Production]
E -->|No| G[Fix Implementation]
G --> C
style A fill:#1e3a8a,stroke:#3b82f6,stroke-width:2px,color:#ffffff
style B fill:#7c3aed,stroke:#a855f7,stroke-width:2px,color:#ffffff
style C fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style D fill:#d97706,stroke:#f59e0b,stroke-width:2px,color:#ffffff
style E fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style F fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style G fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
BDD for AI Agent Development
The Perfect Match: BDD + AI Agents
BDD is perfectly suited for AI agent development because:
1. Behavior-First Approach
- AI agents are defined by their behavioral capabilities
- BDD scenarios describe expected agent responses
- Focus on what the agent should do, not internal implementation
2. Iterative Improvement
- BDD scenarios become training data for optimization
- Test → Optimize → Test cycle drives continuous improvement
- Quality gates ensure reliable agent behavior
3. Testable Specifications
- Every agent capability can be specified and tested
- Pass/fail criteria for each behavioral expectation
- Regression testing prevents quality degradation
BDD in SuperOptiX: SuperSpec Feature Specifications
SuperOptiX implements BDD through SuperSpec, our domain-specific language for agent specifications. BDD scenarios are defined as feature_specifications within the SuperSpec playbook structure:
# SuperSpec Feature Specifications (BDD Scenarios)
feature_specifications:
  scenarios:
    - name: "robust_api_endpoint_creation"
      description: "Given a REST API requirement, the agent should generate secure, validated, well-documented endpoints"
      input:
        feature_requirement: "Create a user authentication endpoint with email validation, password hashing, rate limiting, and comprehensive error handling"
      expected_output:
        implementation: |
          from fastapi import APIRouter, HTTPException, Request
          from pydantic import BaseModel, EmailStr
          from passlib.context import CryptContext
          from slowapi import Limiter
          from slowapi.util import get_remote_address

          router = APIRouter()
          pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
          limiter = Limiter(key_func=get_remote_address)

          class AuthRequest(BaseModel):
              email: EmailStr  # email format is validated by EmailStr
              password: str

          @router.post("/auth/login")
          @limiter.limit("5/minute")
          async def authenticate_user(request: Request, payload: AuthRequest):
              if not payload.password or len(payload.password) < 8:
                  raise HTTPException(status_code=400, detail="Invalid password format")
              # Verify the submitted password against the stored bcrypt hash,
              # e.g. pwd_context.verify(payload.password, stored_hash)
              # (database lookup would go here)
              return {"status": "success", "token": "jwt_token_here"}
BDD + DSPy: The Evaluation-First Revolution
Why BDD is Perfect for DSPy's Evaluation-First Approach
DSPy's evaluation-first methodology aligns perfectly with BDD principles:
1. Specification-Driven Development
graph LR
A[BDD Scenarios] --> B[DSPy Gold Examples]
B --> C[Optimization Training]
C --> D[Improved Prompts]
D --> E[Better Agent Behavior]
E --> F[Re-evaluation]
F --> G{Quality Gates Pass?}
G -->|Yes| H[Production Ready]
G -->|No| I[Further Optimization]
I --> C
style A fill:#1e3a8a,stroke:#3b82f6,stroke-width:2px,color:#ffffff
style B fill:#7c3aed,stroke:#a855f7,stroke-width:2px,color:#ffffff
style C fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style D fill:#d97706,stroke:#f59e0b,stroke-width:2px,color:#ffffff
style E fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style F fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style G fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style H fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style I fill:#d97706,stroke:#f59e0b,stroke-width:2px,color:#ffffff
2. Dual-Purpose Scenarios
Your BDD scenarios serve two critical functions:
- Training Data: Converted to DSPy gold examples for optimization
- Test Cases: Used for evaluation and quality assurance
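Conceptually, a single scenario backs both roles. The sketch below shows how one SuperSpec scenario could be viewed as a training pair and as a named test case; the helper names and flattened shapes are hypothetical, not SuperOptiX's internal API.

```python
# Hypothetical sketch: one BDD scenario serving as both a gold example
# (for optimization) and a test case (for evaluation).
scenario = {
    "name": "robust_api_endpoint_creation",
    "input": {"feature_requirement": "Create a user authentication endpoint"},
    "expected_output": {"implementation": "from fastapi import APIRouter ..."},
}

def to_gold_example(scenario):
    """Flatten a BDD scenario into an (inputs, expected) training pair."""
    return {"inputs": scenario["input"], "expected": scenario["expected_output"]}

def to_test_case(scenario):
    """The same scenario, viewed as a named test case with a pass threshold."""
    return {
        "name": scenario["name"],
        "inputs": scenario["input"],
        "expected": scenario["expected_output"],
        "threshold": 0.6,  # matches the default threshold_used shown in reports
    }

gold = to_gold_example(scenario)
case = to_test_case(scenario)
```

The point of the dual view: the scenario file stays the single source of truth, and the two projections are derived from it.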
3. Continuous Feedback Loop
# The SuperOptiX BDD/DSPy Workflow
super agent compile developer # Compile with BDD scenarios
super agent evaluate developer # Establish baseline (BDD tests)
super agent optimize developer # DSPy optimization using BDD scenarios
super agent evaluate developer # Re-evaluate (measure improvement)
super agent run developer # Production execution
Professional BDD Spec Runner
SuperOptiX features a BDD specification framework with professional-grade tooling comparable to pytest, Cucumber, and other industry-standard testing tools.
Quick Start
# Standard specification execution
super agent evaluate developer
# Detailed analysis with verbose output
super agent evaluate developer --verbose
# Auto-tuning for improved results
super agent evaluate developer --auto-tune
Professional Output Formats
# Table format (default) - beautiful console output
super agent evaluate developer --format table
# JSON format - for CI/CD integration
super agent evaluate developer --format json
# Save detailed report to file
super agent evaluate developer --save-report test_results.json
Multi-Criteria Evaluation System
Evaluation Metrics
Each BDD specification is evaluated using four weighted criteria:
| Criterion | Weight | Description |
|---|---|---|
| Semantic Similarity | 50% | How closely the output matches expected meaning |
| Keyword Presence | 20% | Important terms and concepts inclusion |
| Structure Match | 20% | Format, length, and organization similarity |
| Output Length | 10% | Basic sanity check for response completeness |
Quality Gates
- ≥ 80%: EXCELLENT - Production ready
- 60-79%: GOOD - Minor improvements needed
- < 60%: NEEDS WORK - Significant improvements required
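The gate thresholds above can be expressed as a small helper (a sketch; SuperOptiX's internal naming may differ):

```python
def quality_gate(pass_rate: float) -> str:
    """Map a pass rate (0.0-1.0) to the quality gate labels above."""
    if pass_rate >= 0.80:
        return "EXCELLENT"
    if pass_rate >= 0.60:
        return "GOOD"
    return "NEEDS WORK"
```

For example, the 60.0% pass rate shown in the sample output later in this guide lands in the GOOD band.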
Scoring System
Confidence Score = (
    semantic_similarity × 0.5 +
    keyword_presence × 0.2 +
    structure_match × 0.2 +
    output_length × 0.1
)
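The weighted sum is straightforward to compute directly; here is a minimal sketch with illustrative criterion values (each criterion is assumed to be scored 0.0-1.0):

```python
WEIGHTS = {
    "semantic_similarity": 0.5,
    "keyword_presence": 0.2,
    "structure_match": 0.2,
    "output_length": 0.1,
}

def confidence_score(criteria: dict) -> float:
    """Weighted sum of the four evaluation criteria (each 0.0-1.0)."""
    return sum(criteria[name] * weight for name, weight in WEIGHTS.items())

# Illustrative values:
score = confidence_score({
    "semantic_similarity": 0.8,
    "keyword_presence": 0.7,
    "structure_match": 0.75,
    "output_length": 1.0,
})
# 0.8*0.5 + 0.7*0.2 + 0.75*0.2 + 1.0*0.1 = 0.79
```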
Professional Spec Runner Features
1. Session Information Panel
The spec runner starts with a professional session overview:
Spec Execution Session
  Agent:     developer
  Session:   2025-01-07 14:30:15
  Mode:      Standard validation
  Verbosity: Summary
2. Real-Time Progress Tracking
Watch your specifications execute in real-time with spinners and status updates:
✓ Pipeline loaded
Discovering BDD Specifications...
Found 5 BDD specifications
Executing BDD Specification Suite
  Executing: developer_comprehensive_task...
  Executing: developer_problem_solving...
3. Beautiful Specification Results Table
Professional tabular output showing all specification results at a glance:
| Specification                | Status | Score | Description                            |
|------------------------------|--------|-------|----------------------------------------|
| developer_comprehensive_task | PASS   | 0.87  | Complex software requirements handl... |
| developer_problem_solving    | FAIL   | 0.45  | Problem-solving approach demonstra...  |
| developer_best_practices     | PASS   | 0.78  | Industry standards and guidelines...   |
4. Comprehensive Summary Dashboard
Color-coded quality gates with detailed metrics:
Specification Results Summary
  Total Specs: 5       Pass Rate: 60.0%
  Passed: 3            Model: llama3.1:8b
  Failed: 2            Capability: 0.68
  Quality Gate: GOOD   Status: Optimized
5. Intelligent Failure Analysis
Detailed breakdown of failing specifications with specific fix suggestions:
Failure Analysis

| Failed Specification      | Issue                         | Fix Suggestion             |
|---------------------------|-------------------------------|----------------------------|
| developer_problem_solving | semantic meaning differs      | Improve response relevance |
| api_error_handling        | missing key terms or concepts | Include technical terms    |
Verbose Mode - Deep Analysis
Use --verbose flag for detailed test analysis:
super agent evaluate developer --verbose
Detailed Test Results
Each failing specification gets a comprehensive analysis panel:
Spec #2: FAILED
  Specification: developer_problem_solving
  Description: When facing software challenges, the agent should demonstrate systematic problem-solving
  Confidence Score: 0.452
  Semantic Similarity: 0.234
  Failure Reason: semantic meaning differs significantly

  Fix Guidance:
  - Review and improve the response quality
  - Ensure the output addresses all aspects of the input
  - Make the response more relevant to the expected output
  - Use similar terminology and concepts
Advanced Techniques
Auto-Tuning
Automatically adjust evaluation criteria based on model performance:
super agent evaluate developer --auto-tune
Ignore Checks (Development Mode)
Skip validation for rapid development:
super agent evaluate developer --ignore-checks
CI/CD Integration
# GitHub Actions example
- name: Evaluate Agent
run: |
super agent evaluate developer --format json --save-report results.json
- name: Check Quality Gate
run: |
python -c "
import json
results = json.load(open('results.json'))
if results['quality_gate'] != 'EXCELLENT':
exit(1)
"
BDD in SuperOptiX: Real-World Example
Step 1: Define BDD Scenarios in SuperSpec Playbook
# agents/developer/playbook/developer_playbook.yaml (SuperSpec DSL)
feature_specifications:
scenarios:
- name: "developer_comprehensive_task"
description: "Given a complex software requirement, the agent should provide detailed analysis and recommendations"
input:
feature_requirement: "Design a microservices architecture for an e-commerce platform with user authentication, product catalog, order management, and payment processing"
expected_output:
implementation: |
**Microservices Architecture Design**
**1. Service Decomposition:**
- User Service: Authentication, profiles, preferences
- Product Service: Catalog, inventory, search
- Order Service: Order lifecycle, status tracking
- Payment Service: Payment processing, refunds
- Notification Service: Email, SMS, push notifications
**2. Technology Stack:**
- API Gateway: Kong or AWS API Gateway
- Service Mesh: Istio for inter-service communication
- Database: PostgreSQL for each service (database per service pattern)
- Message Queue: RabbitMQ or Apache Kafka
- Monitoring: Prometheus + Grafana
**3. Security Considerations:**
- JWT tokens for authentication
- API rate limiting
- Data encryption in transit and at rest
- Service-to-service authentication
Step 2: Compile SuperSpec and Evaluate
# Compile SuperSpec playbook with BDD scenarios
super agent compile developer
# Run BDD evaluation (establishes baseline)
super agent evaluate developer
Output:
Spec Execution Session
  Agent:     developer
  Session:   2025-01-07 14:30:15
  Mode:      Standard validation
  Verbosity: Summary

Executing BDD Specification Suite
Progress: Running 5 BDD specifications... 0/5

| Specification                | Status | Score | Description                            |
|------------------------------|--------|-------|----------------------------------------|
| developer_comprehensive_task | PASS   | 0.87  | Complex software requirements handl... |
| developer_problem_solving    | FAIL   | 0.45  | Problem-solving approach demonstra...  |
| developer_best_practices     | PASS   | 0.78  | Industry standards and guidelines...   |

Specification Results Summary
  Total Specs: 5       Pass Rate: 60.0%
  Passed: 3            Model: llama3.1:8b
  Failed: 2            Capability: 0.68
  Quality Gate: GOOD   Status: Optimized
Step 3: Optimize Using SuperSpec BDD Scenarios
# DSPy optimization using SuperSpec BDD scenarios as training data
super agent optimize developer
What happens during optimization:
1. SuperSpec BDD scenarios are converted to DSPy gold examples
2. DSPy BootstrapFewShot uses the scenarios to improve prompts
3. The optimized pipeline is saved for future use
Step 4: Re-evaluate SuperSpec and Measure Improvement
# Re-run BDD tests to measure improvement
super agent evaluate developer
Expected improvement:
Specification Results Summary
  Total Specs: 5            Pass Rate: 80.0%
  Passed: 4                 Model: llama3.1:8b
  Failed: 1                 Capability: 0.82
  Quality Gate: EXCELLENT   Status: Optimized
BDD Evaluation Metrics in SuperOptiX
SuperSpec BDD scenarios are scored with the same four weighted criteria and quality gates described in the Multi-Criteria Evaluation System section above.
Detailed Scoring
{
"scenario_name": "robust_error_handling",
"description": "When implementing functionality that can fail...",
"passed": true,
"confidence_score": 0.82,
"semantic_similarity": 0.85,
"criteria_breakdown": {
"semantic_similarity": 0.85,
"output_length": 1.0,
"keyword_presence": 0.75,
"structure_match": 0.80
},
"failure_reason": null,
"expected": {...},
"actual": {...},
"threshold_used": 0.6
}
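A saved report containing result objects in this shape can be filtered for failures in a few lines. The sketch below assumes a top-level JSON list of scenario results; the actual report layout produced by --save-report may differ.

```python
import json

# Assumed report shape: a JSON list of scenario result objects.
report_json = """
[
  {"scenario_name": "robust_error_handling", "passed": true, "confidence_score": 0.82},
  {"scenario_name": "developer_problem_solving", "passed": false, "confidence_score": 0.45}
]
"""

results = json.loads(report_json)
failures = [r["scenario_name"] for r in results if not r["passed"]]
pass_rate = sum(r["passed"] for r in results) / len(results)
```

This is the same kind of post-processing the CI/CD example above performs when checking the quality gate.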
BDD Best Practices for AI Agents
DO's
1. Write Specific, Testable Scenarios
# Good: Specific and testable
- name: "secure_password_validation"
description: "When validating user passwords, the agent should enforce security requirements"
input:
feature_requirement: "Implement password validation with minimum 8 characters, uppercase, lowercase, number, and special character"
expected_output:
implementation: |
def validate_password(password):
if len(password) < 8:
return False, "Password must be at least 8 characters"
if not re.search(r'[A-Z]', password):
return False, "Password must contain uppercase letter"
# ... additional validation
return True, "Password is valid"
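A complete, runnable version of the validator sketched in that expected output could look like the following; the "... additional validation" portion is filled in with the remaining checks from the stated requirement (illustrative only):

```python
import re

def validate_password(password: str) -> tuple[bool, str]:
    """Enforce: at least 8 chars, uppercase, lowercase, digit, special char."""
    if len(password) < 8:
        return False, "Password must be at least 8 characters"
    if not re.search(r"[A-Z]", password):
        return False, "Password must contain an uppercase letter"
    if not re.search(r"[a-z]", password):
        return False, "Password must contain a lowercase letter"
    if not re.search(r"\d", password):
        return False, "Password must contain a number"
    if not re.search(r"[^A-Za-z0-9]", password):
        return False, "Password must contain a special character"
    return True, "Password is valid"
```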
2. Cover Multiple Behavioral Aspects
# Comprehensive scenario coverage
- name: "happy_path_scenario" # Normal operation
- name: "error_handling_scenario" # Error conditions
- name: "edge_case_scenario" # Boundary conditions
- name: "security_scenario" # Security requirements
- name: "performance_scenario" # Performance expectations
3. Use Realistic, Representative Data
# Realistic input data
input:
feature_requirement: "Create a REST API for user registration with email validation, password hashing, and rate limiting"
DON'Ts
1. Don't Write Vague Scenarios
# Bad: Too vague
- name: "create_function"
description: "Make a function"
input:
feature_requirement: "Function that does something"
expected_output:
implementation: "def func(): pass"
2. Don't Ignore Error Cases
# Missing error handling scenarios
# Always include scenarios for:
# - Invalid input handling
# - Error response formats
# - Edge case behavior
3. Don't Over-Complicate Scenarios
# Keep scenarios focused on single responsibilities
# One scenario = one specific behavior
# Multiple scenarios = comprehensive coverage
BDD Development Workflow
The Complete SuperSpec BDD/TDD Cycle
graph TB
A[Define SuperSpec BDD Scenarios] --> B[Compile SuperSpec]
B --> C[Run Baseline Evaluation]
C --> D[Analyze Results]
D --> E{Quality Gates Pass?}
E -->|Yes| F[Deploy to Production]
E -->|No| G[Optimize Agent]
G --> H[Re-evaluate]
H --> D
style A fill:#1e3a8a,stroke:#3b82f6,stroke-width:2px,color:#ffffff
style B fill:#7c3aed,stroke:#a855f7,stroke-width:2px,color:#ffffff
style C fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style D fill:#d97706,stroke:#f59e0b,stroke-width:2px,color:#ffffff
style E fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style F fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style G fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style H fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
Command Sequence
# 1. Define SuperSpec BDD scenarios in playbook
vim agents/developer/playbook/developer_playbook.yaml
# 2. Compile SuperSpec with BDD scenarios
super agent compile developer
# 3. Establish baseline performance
super agent evaluate developer
# 4. Optimize using SuperSpec scenarios as training data
super agent optimize developer
# 5. Measure improvement
super agent evaluate developer
# 6. Deploy if quality gates pass
super agent run developer --goal "Your production task"
Advanced BDD Features
Verbose Mode for Deep Analysis
# Detailed analysis of each SuperSpec scenario
super agent evaluate developer --verbose
Output includes:
- Detailed failure analysis for SuperSpec scenarios
- Specific fix recommendations
- Confidence score breakdown
- Expected vs actual output comparison
Custom Validation Criteria
# Enhanced scenarios with validation hints
- name: "security_focused_implementation"
description: "Agent should generate secure code with proper input validation"
input:
feature_requirement: "Create a password reset endpoint with security best practices"
expected_output:
implementation: |
# Expected secure implementation here
validation_criteria: # Optional hints
- "Uses secure random token generation"
- "Includes rate limiting"
- "Validates email format"
- "Handles edge cases gracefully"
Scenario Categories
feature_specifications:
scenarios:
# Basic functionality
- name: "happy_path_scenario"
category: "functionality"
# ...
# Error handling
- name: "error_handling_scenario"
category: "error_handling"
# ...
# Performance
- name: "efficiency_scenario"
category: "performance"
# ...
# Security
- name: "security_scenario"
category: "security"
# ...
BDD vs Traditional Testing
Traditional Unit Testing
def test_password_validation():
    assert validate_password("weak") is False
    assert validate_password("Strong123!") is True
SuperSpec BDD in SuperOptiX
- name: "password_validation_behavior"
description: "When validating passwords, the agent should enforce security requirements"
input:
feature_requirement: "Implement password validation with security requirements"
expected_output:
implementation: |
def validate_password(password):
# Comprehensive validation logic
# Security-focused implementation
# Clear error messages
Key Differences
| Aspect | Traditional Testing | BDD in SuperOptiX |
|---|---|---|
| Focus | Implementation details | Behavioral expectations |
| Language | Technical code | Natural language + examples |
| Stakeholders | Developers only | Business + Technical |
| Training Data | No | Yes (SuperSpec → DSPy optimization) |
| Quality Gates | Pass/Fail | Multi-criteria scoring |
Conclusion
SuperSpec BDD in SuperOptiX represents a revolutionary approach to AI agent development that combines:
- Behavior-driven specifications that focus on what agents should do
- SuperSpec + DSPy integration that uses scenarios for both training and testing
- Evaluation-first development that ensures quality before deployment
- Multi-criteria quality gates that provide comprehensive validation
- Continuous improvement through iterative optimization cycles
The SuperOptiX BDD Advantage
- Professional Spec Runner: Beautiful UI with detailed analysis
- AI-Powered Optimization: BDD scenarios become DSPy training data
- Quality Assurance: Multi-criteria evaluation with clear metrics
- Iterative Development: Continuous improvement through feedback loops
- Production Readiness: Quality gates ensure reliable deployment
Start using SuperSpec BDD in SuperOptiX today and experience the difference of scientifically validated, behavior-driven AI agents!
SuperOptiX Workflow Integration
The Complete Workflow
graph TD
A[Define Agent Playbook] --> B[Compile Agent]
B --> C[Evaluate Agent]
C --> D{Pass Quality Gate?}
D -->|Yes| E[Run Agent]
D -->|No| F[Optimize Agent]
F --> B
E --> G[Add to Orchestra]
G --> H[Run Orchestra]
style A fill:#1e3a8a,stroke:#3b82f6,stroke-width:2px,color:#ffffff
style B fill:#7c3aed,stroke:#a855f7,stroke-width:2px,color:#ffffff
style C fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style D fill:#dc2626,stroke:#ef4444,stroke-width:2px,color:#ffffff
style E fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
style F fill:#d97706,stroke:#f59e0b,stroke-width:2px,color:#ffffff
style G fill:#7c3aed,stroke:#a855f7,stroke-width:2px,color:#ffffff
style H fill:#059669,stroke:#10b981,stroke-width:2px,color:#ffffff
1. Define Agent Playbook
Write declarative specifications using SuperSpec DSL:
apiVersion: agent/v1
kind: Agent
metadata:
name: customer-service
tier: genie
spec:
context:
memory: true
tools: true
tasks:
- name: "handle_inquiry"
description: "Handle customer inquiries"
2. Compile Agent
Translate playbooks into executable pipelines:
super agent compile customer-service
3. Evaluate Agent
Run BDD specifications against the compiled agent:
super agent evaluate customer-service
4. Optimize Agent
If evaluation fails, optimize based on feedback:
super agent optimize customer-service
5. Run Agent
Once evaluation passes, run the agent:
super agent run customer-service --input "Help me with my order"
Pro Tip: Start with 3-5 well-crafted SuperSpec BDD scenarios for your agents. Quality over quantity leads to better optimization and more reliable evaluation results. Remember: your SuperSpec BDD scenarios serve dual purposes - they are both your test cases AND your training data!