CI/CD Integration Guide
SuperOptiX provides comprehensive CI/CD integration capabilities that enable automated testing, optimization, and deployment of AI agents. This guide covers how to integrate SuperOptiX into your CI/CD pipeline for professional-grade agent development.
Overview
CI/CD integration in SuperOptiX follows BDD/TDD best practices to ensure reliable, production-ready AI agents. The workflow emphasizes:
- Baseline Measurement: Establish performance baselines before optimization
- Quality Gates: Automated pass/fail thresholds for deployment
- Iterative Improvement: Measure optimization effectiveness quantitatively
- Production Validation: Final validation before deployment
CLI Command Options for CI/CD
The super agent evaluate command provides comprehensive testing capabilities with multiple output formats for different CI/CD use cases:
Basic Usage
# Standard evaluation
super agent evaluate <agent_name>
# Verbose output with detailed results
super agent evaluate <agent_name> --verbose
# Auto-tuning for improved evaluation accuracy
super agent evaluate <agent_name> --auto-tune
Output Formats for Automation
# Table format (default) - Beautiful console output
super agent evaluate <agent_name> --format table
# JSON format - For CI/CD integration and automation
super agent evaluate <agent_name> --format json
# JUnit format - Compatible with CI/CD systems
super agent evaluate <agent_name> --format junit
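The GitLab and Jenkins examples later in this guide publish a test-results.xml file. One plausible way to produce it is to combine the JUnit format with --save-report; whether --save-report honors the selected format may vary by SuperOptiX version, so verify the output locally before wiring it into a pipeline:
# Generate a JUnit-style report for CI test publishers (verify locally that
# --format junit combined with --save-report produces XML in your version)
super agent evaluate developer --format junit --save-report test-results.xml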
Report Generation
# Save detailed report to file
super agent evaluate <agent_name> --save-report test_results.json
# Combine with JSON format for automation
super agent evaluate <agent_name> --format json --save-report results.json
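The saved JSON report drives most of the automation in this guide. A minimal sketch of consuming it in a CI step, assuming the report exposes .summary.pass_rate and .improvement.percentage as the platform examples below do (check a locally generated report to confirm the exact schema):
# Extract the pass rate from a saved report and fail the step below a threshold
super agent evaluate developer --format json --save-report results.json
PASS_RATE=$(jq -r '.summary.pass_rate // "0"' results.json | tr -d '%')
echo "Pass rate: ${PASS_RATE}%"
# awk handles decimal percentages that integer test operators cannot
awk "BEGIN {exit !($PASS_RATE >= 80)}" || { echo "Below threshold"; exit 1; }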
Development Options
# Ignore non-essential checks for rapid development
super agent evaluate <agent_name> --ignore-checks
# Verbose mode for detailed analysis
super agent evaluate <agent_name> --verbose
Proper BDD/TDD Workflow
The correct CI/CD workflow follows BDD/TDD best practices:
Phase 1: Specification-Driven Development
# 1. Define BDD scenarios FIRST
vim agents/<agent_name>/Playbook/developer_playbook.yaml
# Add comprehensive feature_specifications
# 2. Compile agent with scenarios
super agent compile developer
# 3. Run baseline evaluation (should show current performance)
super agent evaluate developer
# This gives us baseline metrics before optimization
Phase 2: Iterative Improvement
# 4. Analyze baseline results
# - Identify failing scenarios
# - Understand performance gaps
# - Plan optimization strategy
# 5. Optimize based on evaluation feedback
super agent optimize developer
# 6. Re-evaluate to measure improvement
super agent evaluate developer
# 7. Iterate until quality gates pass
# Repeat steps 5-6 until pass rate ≥ 80%
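If you prefer to script the optimize-and-re-evaluate loop instead of running steps 5-6 by hand, a rough sketch (assuming the JSON report exposes .summary.pass_rate as in the examples below, and capping the number of rounds to avoid unbounded optimization runs):
# Repeat optimize + evaluate until the pass rate reaches 80% (at most 3 rounds)
TARGET=80
for round in 1 2 3; do
    super agent evaluate developer --format json --save-report eval_${round}.json
    PASS_RATE=$(jq -r '.summary.pass_rate' eval_${round}.json | tr -d '%')
    echo "Round ${round} pass rate: ${PASS_RATE}%"
    if awk "BEGIN {exit !($PASS_RATE >= $TARGET)}"; then
        echo "Quality gate reached"
        break
    fi
    super agent optimize developer
done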
Phase 3: Production Deployment
# 8. Final validation
super agent evaluate developer --verbose
# 9. Deploy only if quality gates pass
super agent run developer --goal "production task"
CI/CD Platform Integration
Note: The following CI/CD configurations are example setups designed to demonstrate integration patterns. Adapt them to your specific use cases, environment requirements, and organizational needs: modify agent names, quality gate thresholds, and deployment strategies according to your project requirements.
GitHub Actions Integration
# .github/workflows/agent-quality.yml
name: Agent Quality Check
on: [push, pull_request]
jobs:
bdd-workflow:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v3
with:
python-version: '3.11'
- name: Setup SuperOptiX
run: pip install superoptix
- name: Compile Agent
run: super agent compile developer
- name: Baseline Evaluation
run: |
super agent evaluate developer --format json --save-report baseline.json
# Check if baseline is acceptable
pass_rate=$(jq -r '.summary.pass_rate' baseline.json | tr -d '%')
if (( $(echo "$pass_rate < 40" | bc -l) )); then
echo "Baseline too low: $pass_rate%"
exit 1
fi
- name: Optimize Agent
run: super agent optimize developer
- name: Final Evaluation
run: |
super agent evaluate developer --format json --save-report results.json
# Quality gate: must improve by at least 20%
improvement=$(jq -r '.improvement.percentage' results.json | tr -d '%')
if (( $(echo "$improvement < 20" | bc -l) )); then
echo "Insufficient improvement: $improvement%"
exit 1
fi
- name: Upload Test Results
uses: actions/upload-artifact@v3
with:
name: bdd-spec-results
path: results.json
GitLab CI Integration
# .gitlab-ci.yml
stages:
- test
- optimize
- validate
agent-test:
stage: test
image: python:3.11
script:
- pip install superoptix
- super agent compile developer
- super agent evaluate developer --format json --save-report baseline.json
- |
# Extract pass rate for quality gate
PASS_RATE=$(python -c "
import json
with open('baseline.json') as f:
data = json.load(f)
print(data['summary']['pass_rate'].replace('%', ''))
")
echo "Pass rate: $PASS_RATE%"
if [ "$PASS_RATE" -lt 40 ]; then
echo "Baseline too low: $PASS_RATE%"
exit 1
fi
artifacts:
reports:
junit: test-results.xml
paths:
- baseline.json
agent-optimize:
stage: optimize
image: python:3.11
script:
- pip install superoptix
- super agent optimize developer
dependencies:
- agent-test
agent-validate:
stage: validate
image: python:3.11
script:
- pip install superoptix
- super agent evaluate developer --format json --save-report final.json
- |
# Quality gate check
IMPROVEMENT=$(python -c "
import json
with open('final.json') as f:
data = json.load(f)
print(data.get('improvement', {}).get('percentage', '0').replace('%', ''))
")
echo "Improvement: $IMPROVEMENT%"
if [ "$IMPROVEMENT" -lt 20 ]; then
echo "Insufficient improvement: $IMPROVEMENT%"
exit 1
fi
dependencies:
- agent-optimize
Jenkins Pipeline Integration
// Jenkinsfile
pipeline {
agent any
stages {
stage('Setup') {
steps {
sh 'pip install superoptix'
}
}
stage('Compile') {
steps {
sh 'super agent compile developer'
}
}
stage('Baseline Test') {
steps {
sh 'super agent evaluate developer --format json --save-report baseline.json'
script {
def baseline = readJSON file: 'baseline.json'
def passRate = baseline.summary.pass_rate.replace('%', '').toFloat()
if (passRate < 40) {
error "Baseline too low: ${passRate}%"
}
}
}
}
stage('Optimize') {
steps {
sh 'super agent optimize developer'
}
}
stage('Final Test') {
steps {
sh 'super agent evaluate developer --format json --save-report final.json'
script {
def finalReport = readJSON file: 'final.json'
def improvement = finalReport.improvement?.percentage?.replace('%', '')?.toFloat() ?: 0
if (improvement < 20) {
error "Insufficient improvement: ${improvement}%"
}
}
}
}
}
post {
always {
archiveArtifacts artifacts: '*.json', fingerprint: true
junit allowEmptyResults: true, testResults: 'test-results.xml'
}
}
}
CircleCI Integration
# .circleci/config.yml
version: 2.1
jobs:
test-agent:
docker:
- image: cimg/python:3.11
steps:
- checkout
- run:
name: Install SuperOptiX
command: pip install superoptix
- run:
name: Compile Agent
command: super agent compile developer
- run:
name: Baseline Evaluation
command: |
super agent evaluate developer --format json --save-report baseline.json
# Check baseline quality gate
PASS_RATE=$(python -c "
import json
with open('baseline.json') as f:
data = json.load(f)
print(data['summary']['pass_rate'].replace('%', ''))
")
if [ "$PASS_RATE" -lt 40 ]; then
echo "Baseline too low: $PASS_RATE%"
exit 1
fi
- run:
name: Optimize Agent
command: super agent optimize developer
- run:
name: Final Evaluation
command: |
super agent evaluate developer --format json --save-report final.json
# Check improvement quality gate
IMPROVEMENT=$(python -c "
import json
with open('final.json') as f:
data = json.load(f)
print(data.get('improvement', {}).get('percentage', '0').replace('%', ''))
")
if [ "$IMPROVEMENT" -lt 20 ]; then
echo "Insufficient improvement: $IMPROVEMENT%"
exit 1
fi
      - store_artifacts:
          path: baseline.json
          destination: test-results
      - store_artifacts:
          path: final.json
          destination: test-results
workflows:
version: 2
test:
jobs:
- test-agent
Quality Gates Implementation
Automated Quality Gates
For custom CI/CD scripts, you can implement quality gates:
#!/bin/bash
# quality-gate.sh
# Set minimum thresholds
MIN_PASS_RATE=80
MIN_IMPROVEMENT=20
# Run evaluation and save the report
super agent evaluate developer --format json --save-report results.json
# Extract metrics (default to 0 if a field is missing)
PASS_RATE=$(jq -r '.summary.pass_rate // "0"' results.json | tr -d '%')
IMPROVEMENT=$(jq -r '.improvement.percentage // "0"' results.json | tr -d '%')
echo "Pass Rate: $PASS_RATE%"
echo "Improvement: $IMPROVEMENT%"
# Quality gate checks (awk handles decimal percentages)
if awk "BEGIN {exit !($PASS_RATE < $MIN_PASS_RATE)}"; then
    echo "❌ Quality gate failed: $PASS_RATE% < $MIN_PASS_RATE%"
    exit 1
fi
if awk "BEGIN {exit !($IMPROVEMENT < $MIN_IMPROVEMENT)}"; then
    echo "❌ Improvement gate failed: $IMPROVEMENT% < $MIN_IMPROVEMENT%"
    exit 1
fi
echo "✅ All quality gates passed!"
Quality Gate Thresholds
| Metric | Threshold | Action |
|---|---|---|
| Baseline Pass Rate | ≥ 40% | Continue to optimization |
| Final Pass Rate | ≥ 80% | Deploy to production |
| Improvement | ≥ 20% | Accept optimization |
| Response Time | < 30s | Performance check |
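The response-time row is not checked by quality-gate.sh above. One rough way to approximate it is to time a single run against the 30-second budget; this is only a wall-clock sketch and assumes a representative goal for your agent:
# Rough response-time check: time one run and compare against the 30s budget
START=$(date +%s)
super agent run developer --goal "smoke test task"
END=$(date +%s)
ELAPSED=$((END - START))
echo "Response time: ${ELAPSED}s"
if [ "$ELAPSED" -ge 30 ]; then
    echo "Response time gate failed: ${ELAPSED}s >= 30s"
    exit 1
fi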
Advanced CI/CD Features
Multi-Agent Testing
Test multiple agents in parallel:
# .github/workflows/multi-agent-test.yml
name: Multi-Agent Testing
on: [push, pull_request]
jobs:
test-agents:
strategy:
matrix:
agent: [developer, customer-service, data-analyst]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v3
with:
python-version: '3.11'
- name: Setup SuperOptiX
run: pip install superoptix
- name: Test Agent
run: |
super agent compile ${{ matrix.agent }}
super agent evaluate ${{ matrix.agent }} --format json --save-report ${{ matrix.agent }}_results.json
- name: Upload Results
uses: actions/upload-artifact@v3
with:
name: ${{ matrix.agent }}-results
path: ${{ matrix.agent }}_results.json
Conditional Optimization
Only optimize when needed:
# .github/workflows/conditional-optimization.yml
name: Conditional Optimization
on: [push, pull_request]
jobs:
evaluate-and-optimize:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup SuperOptiX
run: pip install superoptix
- name: Baseline Evaluation
run: |
super agent evaluate developer --format json --save-report baseline.json
PASS_RATE=$(jq -r '.summary.pass_rate' baseline.json | tr -d '%')
echo "PASS_RATE=$PASS_RATE" >> $GITHUB_ENV
- name: Optimize if Needed
if: env.PASS_RATE < 80
run: |
super agent optimize developer
super agent evaluate developer --format json --save-report final.json
- name: Skip Optimization
if: env.PASS_RATE >= 80
run: |
echo "Pass rate already at $PASS_RATE%, skipping optimization"
cp baseline.json final.json
Automated Deployment
Deploy only after quality gates pass:
# .github/workflows/deploy.yml
name: Deploy Agent
on:
workflow_run:
workflows: ["Agent Quality Check"]
types: [completed]
jobs:
deploy:
if: ${{ github.event.workflow_run.conclusion == 'success' }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup SuperOptiX
run: pip install superoptix
- name: Download Test Results
uses: actions/download-artifact@v3
with:
name: bdd-spec-results
path: ./
- name: Verify Quality Gates
run: |
PASS_RATE=$(jq -r '.summary.pass_rate' results.json | tr -d '%')
if [ "$PASS_RATE" -lt 80 ]; then
echo "Quality gate failed: $PASS_RATE% < 80%"
exit 1
fi
- name: Deploy to Production
run: |
# Your deployment commands here
echo "Deploying agent to production..."
Monitoring and Reporting
Test Results Dashboard
Create a dashboard for test results:
# .github/workflows/dashboard.yml
name: Test Results Dashboard
on:
workflow_run:
workflows: ["Agent Quality Check"]
types: [completed]
jobs:
update-dashboard:
runs-on: ubuntu-latest
steps:
- name: Download Results
uses: actions/download-artifact@v3
with:
name: bdd-spec-results
path: ./
- name: Generate Dashboard
run: |
# Generate HTML dashboard from JSON results
python -c "
import json
with open('results.json') as f:
data = json.load(f)
# Generate HTML dashboard
"
- name: Deploy Dashboard
run: |
# Deploy dashboard to GitHub Pages or other hosting
echo "Dashboard updated"
Slack Notifications
Send notifications to Slack:
# .github/workflows/notifications.yml
name: Notifications
on:
workflow_run:
workflows: ["Agent Quality Check"]
types: [completed]
jobs:
notify:
runs-on: ubuntu-latest
steps:
- name: Download Results
uses: actions/download-artifact@v3
with:
name: bdd-spec-results
path: ./
      - name: Extract Metrics
        run: |
          echo "PASS_RATE=$(jq -r '.summary.pass_rate' results.json)" >> $GITHUB_ENV
          echo "IMPROVEMENT=$(jq -r '.improvement.percentage' results.json)" >> $GITHUB_ENV
      - name: Send Slack Notification
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ github.event.workflow_run.conclusion }}
          text: |
            Agent Quality Check: ${{ github.event.workflow_run.conclusion }}
            Pass Rate: ${{ env.PASS_RATE }}
            Improvement: ${{ env.IMPROVEMENT }}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
Best Practices
✅ DO's
- Always evaluate before optimizing
- Set clear quality gates
- Measure improvement quantitatively
- Iterate based on evaluation feedback
- Use scenarios as both training data and test cases
- Automate everything
- Monitor trends over time (see the sketch below)
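A lightweight way to monitor trends is to append each run's pass rate to a history file that your CI system stores as an artifact. A sketch using the same report fields as the examples above (the CSV format and file name are arbitrary choices):
# Append this run's pass rate to a simple CSV history for trend tracking
PASS_RATE=$(jq -r '.summary.pass_rate' results.json | tr -d '%')
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),${GITHUB_SHA:-local},${PASS_RATE}" >> pass_rate_history.csv
tail -n 5 pass_rate_history.csv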
❌ DON'Ts
- Don't optimize without baseline
- Don't skip evaluation after optimization
- Don't deploy without quality gates
- Don't ignore failing scenarios
- Don't deploy on failing tests
Troubleshooting
Common CI/CD Issues
Quality Gate Failures
# Check baseline quality
super agent evaluate developer --format json --save-report baseline.json
jq '.summary.pass_rate' baseline.json
# Check improvement
super agent evaluate developer --format json --save-report final.json
jq '.improvement.percentage' final.json
Optimization Failures
# Check if agent is compiled
super agent compile developer
# Check if scenarios exist
super agent evaluate developer --verbose
# Force re-optimization
super agent optimize developer --force
Deployment Failures
# Verify quality gates locally
./quality-gate.sh
# Check agent status
super agent inspect developer
# Test agent manually
super agent run developer --goal "test task"
Related Resources
- Evaluation & Testing Guide - Detailed testing information
- Optimization Guide - Agent optimization techniques
- BDD Guide - Behavior-Driven Development practices
- Agent Development Guide - Agent development workflow
Key Takeaway: CI/CD integration ensures reliable, production-ready AI agents through automated testing, quality gates, and iterative improvement. Always evaluate before optimizing and deploy only when quality gates pass.