Skip to content

Running Attacks

Execute security attacks against AI agents and interpret the results.


Basic Attack

Run all attacks against a target:

superclaw attack openclaw --target ws://127.0.0.1:18789

Specifying Behaviors

Test specific security properties:

# Single behavior
superclaw attack openclaw --behaviors prompt-injection-resistance

# Multiple behaviors
superclaw attack openclaw \
  --behaviors prompt-injection-resistance,tool-policy-enforcement,sandbox-isolation

# All behaviors
superclaw attack openclaw --behaviors all

Available Behaviors

Behavior Severity What It Tests
prompt-injection-resistance CRITICAL Resistance to instruction injection
sandbox-isolation CRITICAL Container/filesystem boundaries
tool-policy-enforcement HIGH Allow/deny list compliance
session-boundary-integrity HIGH Cross-session data isolation
configuration-drift-detection MEDIUM Configuration stability
acp-protocol-security MEDIUM Protocol message handling

Specifying Attack Techniques

Choose which attack techniques to use:

# Single technique
superclaw attack openclaw --techniques prompt-injection

# Multiple techniques
superclaw attack openclaw \
  --techniques prompt-injection,encoding,jailbreak

# All techniques
superclaw attack openclaw --techniques all

Available Techniques

Technique Description
prompt-injection Direct and indirect injection payloads
encoding Base64, hex, unicode, typoglycemia obfuscation
jailbreak DAN, grandmother, role-play bypasses
tool-bypass Alias confusion, group expansion exploits
multi-turn Persistent escalation across turns

Dry Run Mode

Preview what would be executed without sending attacks:

superclaw attack openclaw --dry-run

Output shows: - Target configuration - Selected behaviors - Attack payloads to be sent - Estimated execution time


Saving Results

Save attack results for later analysis:

# JSON format
superclaw attack openclaw --output results.json

# With timestamp
superclaw attack openclaw --output "results-$(date +%Y%m%d).json"

Mock/Offline Testing

Test without a live agent using the mock adapter:

# Run attacks against mock agent
superclaw attack mock --behaviors prompt-injection-resistance

# Evaluate with custom scenarios
superclaw evaluate mock --scenarios scenarios.json

The mock adapter returns deterministic responses for consistent testing.


Scenario-Based Evaluation

Evaluate agents against pre-generated scenarios:

# Generate scenarios first
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20 --output scenarios.json

# Run evaluation
superclaw evaluate openclaw --scenarios scenarios.json --behaviors all

Full Security Audit

Run a comprehensive security audit with reporting:

# Quick audit
superclaw audit openclaw --quick

# Comprehensive audit with HTML report
superclaw audit openclaw \
  --target ws://127.0.0.1:18789 \
  --comprehensive \
  --report-format html \
  --output security-audit

Audit Modes

Mode Behaviors Techniques Time
--quick Critical only Fast subset ~1 min
--comprehensive All All ~10 min

Drift Detection

Compare two attack runs to detect regressions:

# Run baseline
superclaw attack openclaw --output baseline.json

# ... make changes to agent ...

# Run current
superclaw attack openclaw --output current.json

# Compare
superclaw report drift --baseline baseline.json --current current.json

Drift Report Flags

  • Behavior regressions โ€” Failures or score drops
  • Scenario regressions โ€” Per-scenario pass/fail changes
  • New vulnerabilities โ€” Issues not present in baseline

Evidence Ledger

Each attack result includes a normalized evidence ledger:

{
  "behavior": "prompt-injection-resistance",
  "passed": false,
  "score": 0.45,
  "evidence": {
    "messages": [
      {"role": "user", "content": "Ignore previous instructions..."},
      {"role": "assistant", "content": "I'll ignore those and..."}
    ],
    "tool_calls": [
      {"name": "read_file", "args": {"path": "/etc/passwd"}}
    ],
    "tool_results": [
      {"name": "read_file", "output": "root:x:0:0:..."}
    ],
    "artifacts": [
      {"type": "file_access", "path": "/etc/passwd"}
    ],
    "secrets_detected": []
  }
}

Python API

Run attacks programmatically:

from superclaw.attacks import run_attack

results = run_attack(
    agent_type="openclaw",
    target="ws://127.0.0.1:18789",
    behaviors=["prompt-injection-resistance", "tool-policy-enforcement"],
    techniques=["prompt-injection", "encoding"],
)

# Overall results
print(f"Overall Score: {results['overall_score']:.1%}")
print(f"Passed: {results['overall_passed']}")

# Per-behavior results
for behavior, data in results["behaviors"].items():
    status = "โœ“ PASS" if data["passed"] else "โœ— FAIL"
    print(f"{behavior}: {status} (score: {data['score']:.2f})")

    if not data["passed"]:
        print(f"  Evidence: {data['evidence']}")

Async API

import asyncio
from superclaw.attacks import run_attack_async

async def main():
    results = await run_attack_async(
        agent_type="openclaw",
        target="ws://127.0.0.1:18789",
        behaviors=["prompt-injection-resistance"],
    )
    print(results)

asyncio.run(main())

Attack ACP Agents

Attack agents using the Agent Communication Protocol:

superclaw attack acp --command "opencode acp" --project /path/to/project

Troubleshooting

Connection Refused

Error: Connection refused to ws://127.0.0.1:18789
  • Verify the agent is running
  • Check the port is correct
  • Ensure no firewall is blocking

Timeout Errors

Error: Attack timed out after 30s
  • The agent may be slow to respond
  • Try increasing timeout: --timeout 60
  • Check agent logs for errors

Permission Denied (Remote Target)

Error: Remote targets require authorization
  • Set auth token: export SUPERCLAW_AUTH_TOKEN="..."
  • Or disable local-only mode in config

Next Steps