Behavior Specifications¶
Behavior specifications define what CodeOptiX evaluates. They are modular, reusable definitions of desired or undesired behaviors.
What are Behavior Specifications?¶
Behavior specifications are rules that define how agent output is evaluated: each one checks whether the agent's code exhibits a specific desired or undesired behavior.
Built-in Behaviors¶
CodeOptiX includes three built-in behaviors:
1. insecure-code¶
Detects security vulnerabilities in generated code.
What it checks:
- Hardcoded secrets (passwords, API keys)
- SQL injection vulnerabilities
- Insecure authentication patterns
- Exposed credentials
Example:
from codeoptix.behaviors import create_behavior
behavior = create_behavior("insecure-code")
result = behavior.evaluate(agent_output)
if not result.passed:
print(f"Issues found: {result.evidence}")
2. vacuous-tests¶
Identifies low-quality or meaningless tests.
What it checks:
- Tests with no assertions
- Trivial tests that always pass
- Missing edge cases
- Incomplete test coverage
Example:
behavior = create_behavior("vacuous-tests")
result = behavior.evaluate(agent_output)
print(f"Test quality score: {result.score}")
3. plan-drift¶
Detects deviations from planning artifacts and requirements.
What it checks:
- Missing planned features
- Requirements not addressed
- API contract violations
- Architecture mismatches
Example:
behavior = create_behavior("plan-drift")
result = behavior.evaluate(
    agent_output,
    context={
        "plan": "Create secure authentication API",
        "requirements": ["JWT tokens", "Password hashing"]
    }
)
Behavior Result Structure¶
Each behavior evaluation returns a BehaviorResult:
@dataclass
class BehaviorResult:
    behavior_name: str   # Name of the behavior
    passed: bool         # Whether it passed
    score: float         # Score from 0.0 to 1.0
    evidence: List[str]  # Specific issues found
    severity: Severity   # LOW, MEDIUM, HIGH, CRITICAL
    metadata: Dict       # Additional data
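For example, a caller might treat high-severity findings as blocking. This is a minimal sketch that reuses the behavior and agent_output objects from the examples in this page and assumes Severity exposes the members listed in the comment above:
from codeoptix.behaviors.base import Severity
result = behavior.evaluate(agent_output)
if not result.passed and result.severity in (Severity.HIGH, Severity.CRITICAL):
    # Treat serious findings as blocking; lower severities can be logged instead
    raise RuntimeError(f"{result.behavior_name} failed: {result.evidence}")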
Score Interpretation¶
- 0.9 - 1.0: Excellent - No issues
- 0.7 - 0.9: Good - Minor issues
- 0.5 - 0.7: Fair - Some issues
- 0.0 - 0.5: Poor - Significant issues
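If you want to apply these bands programmatically, a small helper like the following can label a score (illustrative only; interpret_score is not part of CodeOptiX):
def interpret_score(score: float) -> str:
    # Map a 0.0-1.0 behavior score to the bands above
    if score >= 0.9:
        return "Excellent"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Fair"
    return "Poor"
print(interpret_score(result.score))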
Using Behaviors¶
Basic Usage¶
from codeoptix.behaviors import create_behavior
from codeoptix.adapters.base import AgentOutput
# Create behavior
behavior = create_behavior("insecure-code")
# Create agent output (example)
agent_output = AgentOutput(
    code='def connect():\n    password = "secret123"\n    return password',
    tests="def test_connect():\n    assert True"
)
# Evaluate
result = behavior.evaluate(agent_output)
# Check results
print(f"Passed: {result.passed}")
print(f"Score: {result.score}")
print(f"Evidence: {result.evidence}")
With Configuration¶
behavior = create_behavior("insecure-code", {
"severity": "high",
"enabled": True,
"strict_mode": True
})
With Context¶
result = behavior.evaluate(
    agent_output,
    context={
        "plan": "Create secure API",
        "requirements": ["No hardcoded secrets", "Use environment variables"]
    }
)
Creating Custom Behaviors¶
You can create custom behaviors by extending BehaviorSpec:
from codeoptix.behaviors.base import BehaviorSpec, BehaviorResult, Severity
class MyCustomBehavior(BehaviorSpec):
    def get_name(self) -> str:
        return "my-custom-behavior"
    def get_description(self) -> str:
        return "Checks for specific patterns in code"
    def evaluate(self, agent_output, context=None):
        code = agent_output.code or ""
        evidence = []
        score = 1.0
        # Your evaluation logic
        if "bad_pattern" in code:
            evidence.append("Found bad pattern")
            score = 0.5
        return BehaviorResult(
            behavior_name=self.get_name(),
            passed=score >= 0.7,
            score=score,
            evidence=evidence,
            severity=Severity.MEDIUM
        )
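Once defined, the class can be used directly like any built-in behavior (assuming BehaviorSpec does not require constructor arguments):
behavior = MyCustomBehavior()
result = behavior.evaluate(agent_output)
print(result.passed, result.score, result.evidence)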
Registering Custom Behaviors¶
from codeoptix.behaviors import create_behavior
# Register your behavior
# (Implementation depends on your setup)
behavior = create_behavior("my-custom-behavior")
Behavior Configuration¶
Behaviors can be configured:
config = {
"severity": "high", # LOW, MEDIUM, HIGH, CRITICAL
"enabled": True, # Enable/disable behavior
"threshold": 0.7, # Passing threshold
# Behavior-specific options
}
Evaluation Process¶
When a behavior evaluates agent output:
- Extract Code: Gets code from AgentOutput
- Run Checks: Performs behavior-specific checks
- Collect Evidence: Gathers specific issues
- Calculate Score: Computes a score based on the findings
- Return Result: Returns a BehaviorResult
Best Practices¶
1. Use Appropriate Behaviors¶
Choose behaviors relevant to your use case:
# For security-focused projects
behaviors = ["insecure-code"]
# For test quality
behaviors = ["vacuous-tests"]
# For plan compliance
behaviors = ["plan-drift"]
2. Provide Context¶
Always provide context when available:
result = behavior.evaluate(
    agent_output,
    context={
        "plan": plan_content,
        "requirements": requirements_list
    }
)
3. Review Evidence¶
Always review the evidence in results rather than relying on the pass/fail flag or score alone.
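A minimal sketch, reusing the result object from the earlier examples:
result = behavior.evaluate(agent_output)
for issue in result.evidence:
    print(f"- {issue}")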
Combining Behaviors¶
You can evaluate multiple behaviors:
behaviors = [
create_behavior("insecure-code"),
create_behavior("vacuous-tests"),
create_behavior("plan-drift")
]
results = {}
for behavior in behaviors:
    results[behavior.get_name()] = behavior.evaluate(agent_output)
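You can then summarize across behaviors; for example (an illustrative sketch using the BehaviorResult fields described above):
all_passed = all(r.passed for r in results.values())
for name, r in results.items():
    print(f"{name}: score={r.score:.2f}, passed={r.passed}")
print("Overall:", "PASS" if all_passed else "FAIL")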
Next Steps¶
- Evaluation Engine - Run evaluations with behaviors
- Custom Behaviors Guide - Create your own
- Python API Guide - Advanced usage