Behavior Specifications

Behavior specifications define what CodeOptiX evaluates. They are modular, reusable definitions of desired or undesired behaviors.


What are Behavior Specifications?

Behavior specifications are rules that define how to evaluate agent output. Each one checks whether the agent's code exhibits a specific behavior.


Built-in Behaviors

CodeOptiX includes three built-in behaviors:

1. insecure-code

Detects security vulnerabilities in generated code.

What it checks:

  • Hardcoded secrets (passwords, API keys)
  • SQL injection vulnerabilities
  • Insecure authentication patterns
  • Exposed credentials

Example:

from codeoptix.behaviors import create_behavior

behavior = create_behavior("insecure-code")
result = behavior.evaluate(agent_output)

if not result.passed:
    print(f"Issues found: {result.evidence}")

2. vacuous-tests

Identifies low-quality or meaningless tests.

What it checks:

  • Tests with no assertions
  • Trivial tests (always pass)
  • Missing edge cases
  • Incomplete test coverage

Example:

behavior = create_behavior("vacuous-tests")
result = behavior.evaluate(agent_output)

print(f"Test quality score: {result.score}")

3. plan-drift

Detects deviations from planning artifacts and requirements.

What it checks:

  • Missing planned features
  • Requirements not addressed
  • API contract violations
  • Architecture mismatches

Example:

behavior = create_behavior("plan-drift")
result = behavior.evaluate(
    agent_output,
    context={
        "plan": "Create secure authentication API",
        "requirements": ["JWT tokens", "Password hashing"]
    }
)


Behavior Result Structure

Each behavior evaluation returns a BehaviorResult:

from dataclasses import dataclass
from typing import Dict, List

from codeoptix.behaviors.base import Severity

@dataclass
class BehaviorResult:
    behavior_name: str      # Name of the behavior
    passed: bool            # Whether it passed
    score: float            # Score from 0.0 to 1.0
    evidence: List[str]     # Specific issues found
    severity: Severity      # LOW, MEDIUM, HIGH, CRITICAL
    metadata: Dict          # Additional data

Score Interpretation

  • 0.9 - 1.0: Excellent - No issues
  • 0.7 - 0.9: Good - Minor issues
  • 0.5 - 0.7: Fair - Some issues
  • 0.0 - 0.5: Poor - Significant issues
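
These bands translate directly into code; the score_label helper below is our own illustration, not part of CodeOptiX:

def score_label(score: float) -> str:
    # Map a 0.0-1.0 behavior score onto the bands above.
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Fair"
    return "Poor"

print(score_label(0.85))  # "Good"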

Using Behaviors

Basic Usage

from codeoptix.behaviors import create_behavior
from codeoptix.adapters.base import AgentOutput

# Create behavior
behavior = create_behavior("insecure-code")

# Create agent output (example)
agent_output = AgentOutput(
    code='def connect():\n    password = "secret123"\n    return password',
    tests="def test_connect():\n    assert True"
)

# Evaluate
result = behavior.evaluate(agent_output)

# Check results
print(f"Passed: {result.passed}")
print(f"Score: {result.score}")
print(f"Evidence: {result.evidence}")

With Configuration

behavior = create_behavior("insecure-code", {
    "severity": "high",
    "enabled": True,
    "strict_mode": True
})

With Context

result = behavior.evaluate(
    agent_output,
    context={
        "plan": "Create secure API",
        "requirements": ["No hardcoded secrets", "Use environment variables"]
    }
)

Creating Custom Behaviors

You can create custom behaviors by extending BehaviorSpec:

from codeoptix.behaviors.base import BehaviorSpec, BehaviorResult, Severity

class MyCustomBehavior(BehaviorSpec):
    def get_name(self) -> str:
        return "my-custom-behavior"

    def get_description(self) -> str:
        return "Checks for specific patterns in code"

    def evaluate(self, agent_output, context=None):
        code = agent_output.code or ""
        evidence = []
        score = 1.0

        # Your evaluation logic
        if "bad_pattern" in code:
            evidence.append("Found bad pattern")
            score = 0.5

        return BehaviorResult(
            behavior_name=self.get_name(),
            passed=score >= 0.7,
            score=score,
            evidence=evidence,
            severity=Severity.MEDIUM,
            metadata={}  # required field on BehaviorResult
        )

Registering Custom Behaviors

from codeoptix.behaviors import create_behavior

# Register your behavior with the framework first;
# the exact hook depends on your CodeOptiX version.

behavior = create_behavior("my-custom-behavior")
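
If your installation does not expose a registration hook, a small local registry gives the same ergonomics. The register_behavior and create_local_behavior helpers below are illustrative, not part of the CodeOptiX API, and assume MyCustomBehavior (from above) takes no constructor arguments:

from typing import Dict, Type

from codeoptix.behaviors.base import BehaviorSpec

# Hypothetical local registry; not part of CodeOptiX.
_REGISTRY: Dict[str, Type[BehaviorSpec]] = {}

def register_behavior(name: str, cls: Type[BehaviorSpec]) -> None:
    # Map a behavior name to its implementing class.
    _REGISTRY[name] = cls

def create_local_behavior(name: str) -> BehaviorSpec:
    # Instantiate the registered class on demand.
    return _REGISTRY[name]()

register_behavior("my-custom-behavior", MyCustomBehavior)
behavior = create_local_behavior("my-custom-behavior")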

Behavior Configuration

Behaviors can be configured:

config = {
    "severity": "high",      # LOW, MEDIUM, HIGH, CRITICAL
    "enabled": True,         # Enable/disable behavior
    "threshold": 0.7,        # Passing threshold
    # Behavior-specific options
}

Evaluation Process

When a behavior evaluates agent output, it follows five steps (see the annotated sketch after this list):

  1. Extract Code: Gets code from AgentOutput
  2. Run Checks: Performs behavior-specific checks
  3. Collect Evidence: Gathers specific issues
  4. Calculate Score: Computes score based on findings
  5. Return Result: Returns BehaviorResult
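
The custom behavior above already follows this shape. Annotated step by step, an evaluate method looks roughly like this (an illustrative sketch, not the built-in implementation; bad_pattern stands in for a real check):

# Inside a BehaviorSpec subclass (see "Creating Custom Behaviors" above)
def evaluate(self, agent_output, context=None):
    code = agent_output.code or ""        # 1. Extract code from AgentOutput
    evidence = []                         # 3. Evidence collected as we go
    if "bad_pattern" in code:             # 2. Run behavior-specific checks
        evidence.append("Found bad pattern")
    score = max(1.0 - 0.5 * len(evidence), 0.0)  # 4. Score based on findings
    return BehaviorResult(                # 5. Return the result
        behavior_name=self.get_name(),
        passed=score >= 0.7,
        score=score,
        evidence=evidence,
        severity=Severity.MEDIUM,
        metadata={}
    )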

Best Practices

1. Use Appropriate Behaviors

Choose behaviors relevant to your use case:

# For security-focused projects
behaviors = ["insecure-code"]

# For test quality
behaviors = ["vacuous-tests"]

# For plan compliance
behaviors = ["plan-drift"]

2. Provide Context

Always provide context when available:

result = behavior.evaluate(
    agent_output,
    context={
        "plan": plan_content,
        "requirements": requirements_list
    }
)

3. Review Evidence

Always review the evidence in results:

for issue in result.evidence:
    print(f"  - {issue}")

Combining Behaviors

You can evaluate multiple behaviors:

behaviors = [
    create_behavior("insecure-code"),
    create_behavior("vacuous-tests"),
    create_behavior("plan-drift")
]

results = {}
for behavior in behaviors:
    results[behavior.get_name()] = behavior.evaluate(agent_output)
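
From there, aggregation is up to you; a simple convention (not mandated by CodeOptiX) is to require every behavior to pass and report the weakest score:

all_passed = all(r.passed for r in results.values())
worst = min(results.values(), key=lambda r: r.score)

print(f"All passed: {all_passed}")
print(f"Weakest: {worst.behavior_name} ({worst.score:.2f})")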

Next Steps