
Human-in-the-Loop & Approval System

Overview

RLM Code agents execute arbitrary code in pursuit of their tasks. While sandboxing provides a first line of defense, some actions -- deleting files, making network requests, running privileged commands -- carry inherent risk that no sandbox can fully contain. The Human-in-the-Loop (HITL) and Approval System provides a safety layer that evaluates every agent action for risk, enforces approval policies, and maintains a complete audit trail of all decisions.

This system answers a fundamental question: should the agent be allowed to do this?


Why Safety Matters

Autonomous code execution introduces risks at multiple levels:

| Risk Category | Examples | Potential Impact |
|---|---|---|
| Data loss | rm -rf, DROP TABLE, file overwrites | Irreversible loss of files, databases, or state |
| System compromise | sudo commands, privilege escalation | Security breaches, unauthorized access |
| Network exfiltration | HTTP POST to external services | Data leaks, API key exposure |
| Resource exhaustion | Infinite loops, fork bombs | System instability, denial of service |
| Side effects | Git force-push, package installation | Environment corruption, team disruption |

The core problem

An LLM generating code cannot be fully trusted to avoid harmful actions. Even well-intentioned prompts can lead to destructive code through hallucination, misinterpretation, or emergent behavior. The approval system provides a programmatic safety net that works regardless of the model's intent.


The Approval Workflow

Every agent action passes through a structured approval workflow before execution:

Agent Action --> Risk Assessment --> Policy Check --> Handler --> Decision --> Audit Log
     |               |                   |              |            |            |
     v               v                   v              v            v            v
  dict with      RiskAssessor       ApprovalPolicy   Handler     approve/    AuditEntry
  action/code    evaluates 40+      determines if    prompts     deny        logged to
  fields         risk rules         approval needed  user/auto               file + memory

Step-by-Step Flow

1. Agent Action

The agent produces an action dictionary containing the action type, code to execute, and metadata:

action = {
    "action": "code",
    "code": "import shutil; shutil.rmtree('/tmp/experiment')",
    "reasoning": "Clean up temporary files",
}

2. Risk Assessment

The RiskAssessor evaluates the action against 40+ configurable risk rules using pattern matching. Each triggered rule contributes to the overall risk level:

assessment = RiskAssessment(
    level=ToolRiskLevel.HIGH,
    reasons=["File deletion may cause data loss"],
    affected_resources=["file:/tmp/experiment"],
    reversible=False,
    estimated_impact="Significant impact, may require manual intervention to undo",
    recommendations=["Review the action carefully before approving"],
)
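
To get a feel for the assessor in isolation, it can also be exercised outside the gate. A minimal sketch, reusing the action dict from step 1 and assuming RiskAssessor constructs with defaults and exposes an assess(action) method returning the RiskAssessment above (the method name is an assumption; see Risk Assessment for the exact API):

from rlm_code.rlm.approval import RiskAssessor, ToolRiskLevel

assessor = RiskAssessor()

# Hypothetical call: `assess` and its argument shape are assumptions based on
# the workflow described on this page.
assessment = assessor.assess(action)

if assessment.level in (ToolRiskLevel.HIGH, ToolRiskLevel.CRITICAL):
    print("Risky action:", "; ".join(assessment.reasons))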

See Risk Assessment for full documentation.

3. Policy Check

The ApprovalPolicy determines whether the assessed risk level requires human approval. Six policy modes are available, from fully permissive (AUTO_APPROVE) to fully restrictive (CONFIRM_ALL):

# Only require approval for HIGH and CRITICAL actions
policy = ApprovalPolicy.CONFIRM_HIGH_RISK

# Under this policy, the HIGH-risk assessment from step 2 triggers approval
requires_approval = True

See Approval Gates for full documentation.

4. Handler

If approval is required, an ApprovalHandler manages the approval interaction. Handlers range from interactive terminal prompts to automated callbacks for integration with external systems:

# Console handler: prompts user in terminal
# Auto handlers: approve or deny without interaction
# Callback handler: delegates to custom function
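
For headless environments, a callback handler lets application code make or relay the decision. A minimal sketch, assuming CallbackApprovalHandler wraps a plain function, that the function receives the ApprovalRequest and returns a truthy value to approve, and that the request exposes a request_id attribute (all three are assumptions; see Approval Gates for the exact signatures):

from rlm_code.rlm.approval import ApprovalGate, ApprovalPolicy, CallbackApprovalHandler

def forward_for_review(request):
    # Hypothetical integration point: notify an external channel, then deny by
    # default so nothing runs until a human re-submits the action.
    print(f"Approval needed for request {request.request_id}")
    return False  # assumed convention: truthy means approved

gate = ApprovalGate(
    policy=ApprovalPolicy.CONFIRM_HIGH_RISK,
    approval_handler=CallbackApprovalHandler(forward_for_review).handle,
)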

See Approval Gates for handler documentation.

5. Decision

The handler returns an ApprovalResponse with the decision:

response = ApprovalResponse(
    request_id="abc123",
    status=ApprovalStatus.APPROVED,
    approved=True,
    reason="User approved via console",
    approver="console_user",
)

6. Audit Log

Every decision -- whether approved, denied, auto-approved, or timed out -- is recorded in the audit log for compliance and debugging:

entry = AuditEntry(
    entry_id="abc123-2025-01-15",
    timestamp="2025-01-15T10:30:00Z",
    request_id="abc123",
    action_type="code",
    risk_level="high",
    approved=True,
    status="approved",
    reason="User approved via console",
    approver="console_user",
    code_preview="import shutil; shutil.rmtree('/tmp/experiment')",
    affected_resources=["file:/tmp/experiment"],
)
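
Because the Quick Start below writes the log to audit.jsonl, past decisions can also be reviewed offline. A minimal sketch, assuming the file stores one JSON object per line with the AuditEntry field names shown above (the on-disk layout is an assumption based on the .jsonl extension):

import json

# Print every denied action recorded in the persisted audit log.
with open("audit.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        if not entry.get("approved", False):
            print(entry["timestamp"], entry["risk_level"], entry["code_preview"])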

See Audit Logging for full documentation.


Architecture

The approval system is built from the following components:

+-------------------+     +------------------+     +------------------+
|   ApprovalGate    |---->|  RiskAssessor    |     | ApprovalHandler  |
|   (orchestrator)  |     |  (40+ rules)     |     | (interaction)    |
|                   |     +------------------+     +------------------+
|  check_action()   |                                      |
|  request_approval |--------------------------------------+
|  approve/deny     |
+-------------------+
        |
        v
+-------------------+
| ApprovalAuditLog  |
| (persistence)     |
+-------------------+

| Component | Module | Responsibility |
|---|---|---|
| ApprovalGate | approval.gate | Orchestrates the entire workflow |
| RiskAssessor | approval.policy | Evaluates action risk using rules |
| ApprovalPolicy | approval.policy | Determines approval requirements |
| ApprovalHandler | approval.handlers | Manages human/automated approval |
| ApprovalAuditLog | approval.audit | Records all decisions |

Quick Start

Basic Setup

from rlm_code.rlm.approval import (
    ApprovalGate,
    ApprovalPolicy,
    ConsoleApprovalHandler,
    ApprovalAuditLog,
)

# Create audit log
audit_log = ApprovalAuditLog(log_file="audit.jsonl")

# Create console handler for interactive approval
handler = ConsoleApprovalHandler(timeout_seconds=60)

# Create approval gate
gate = ApprovalGate(
    policy=ApprovalPolicy.CONFIRM_HIGH_RISK,
    approval_handler=handler.handle,
    audit_log=audit_log,
)

Checking an Action

# Agent produces an action
action = {
    "action": "code",
    "code": "os.remove('/important/file.txt')",
}

# Check if approval is needed
request = gate.check_action(action)

if request.requires_approval:
    # Request approval (async)
    response = await gate.request_approval(request)
    if response.approved:
        # Execute the action
        execute(action)
    else:
        print(f"Action denied: {response.reason}")
else:
    # Low risk, execute directly
    execute(action)
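
Because request_approval is awaited, the check normally runs inside an async function. A small driver that wraps the logic above, where execute() stands in for whatever actually runs the agent's code:

import asyncio

async def run_with_approval(gate, action, execute):
    # Gate the action, then run it only if it is allowed.
    request = gate.check_action(action)
    if not request.requires_approval:
        return execute(action)

    response = await gate.request_approval(request)
    if response.approved:
        return execute(action)
    raise PermissionError(f"Action denied: {response.reason}")

# asyncio.run(run_with_approval(gate, action, execute))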

Non-Interactive Setup

from rlm_code.rlm.approval import (
    ApprovalGate,
    ApprovalPolicy,
    AutoDenyHandler,
)

# Deny all risky actions automatically (safest for CI/CD)
gate = ApprovalGate(
    policy=ApprovalPolicy.CONFIRM_MEDIUM_AND_UP,
    approval_handler=AutoDenyHandler().handle,
)
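
At the other end of the spectrum, AutoApproveHandler approves everything without interaction (use with caution). Pairing it with an audit log at least keeps a record of what was waved through; a sketch assuming AutoApproveHandler takes no constructor arguments, like AutoDenyHandler above:

from rlm_code.rlm.approval import (
    ApprovalAuditLog,
    ApprovalGate,
    ApprovalPolicy,
    AutoApproveHandler,
)

# Fully automated runs: approve risky actions without prompting, but record
# every auto-approval in the audit log for later review.
gate = ApprovalGate(
    policy=ApprovalPolicy.CONFIRM_MEDIUM_AND_UP,
    approval_handler=AutoApproveHandler().handle,
    audit_log=ApprovalAuditLog(log_file="audit.jsonl"),
)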

Module Reference

| Import | Description |
|---|---|
| ApprovalGate | Central orchestrator for the approval workflow |
| ApprovalRequest | Represents a request for approval with risk assessment |
| ApprovalResponse | Represents the approval decision |
| ApprovalStatus | Enum of possible decision states |
| ApprovalPolicy | Enum of approval policy modes |
| RiskAssessor | Evaluates action risk using configurable rules |
| ToolRiskLevel | Enum of risk levels (SAFE through CRITICAL) |
| RiskAssessment | Data class containing risk evaluation results |
| ApprovalHandler | Base class for approval handlers |
| ConsoleApprovalHandler | Interactive terminal-based handler |
| AutoApproveHandler | Automatic approval (use with caution) |
| AutoDenyHandler | Automatic denial (strictest) |
| CallbackApprovalHandler | Custom callback-based handler |
| ApprovalAuditLog | Persistent audit log for compliance |
| AuditEntry | Single audit log entry |

All of the above are importable from rlm_code.rlm.approval:

from rlm_code.rlm.approval import (
    ApprovalGate,
    ApprovalPolicy,
    ToolRiskLevel,
    ApprovalRequest,
    ApprovalResponse,
    ApprovalStatus,
    RiskAssessor,
    RiskAssessment,
    ConsoleApprovalHandler,
    AutoApproveHandler,
    AutoDenyHandler,
    CallbackApprovalHandler,
    ApprovalAuditLog,
    AuditEntry,
)