Docker Runtime

The Docker Runtime executes agent-generated code inside an ephemeral Docker container, providing process isolation, filesystem restrictions, configurable memory limits, and network policy controls.

Module

rlm_code.sandbox.runtimes.docker_runtime

Class: `DockerSandboxRuntime`

class DockerSandboxRuntime:
    """Executes code inside a Docker container."""

    name = "docker"

    def __init__(
        self,
        image: str = "python:3.11-slim",
        memory_limit_mb: int = 512,
        cpus: float | None = 1.0,
        network_enabled: bool = False,
        extra_args: list[str] | None = None,
    ):
        ...

    def execute(self, request: RuntimeExecutionRequest) -> RuntimeExecutionResult:
        ...

    @staticmethod
    def check_health(timeout_seconds: float = 2.5) -> tuple[bool, str]:
        ...

    @staticmethod
    def normalize_workdir(workdir: Path) -> str:
        ...

How It Works

For each execute() call, the runtime:

1. Resolves the working directory to an absolute path.
2. Builds a docker run --rm command with:
   - A bind mount of workdir to /workspace inside the container.
   - Environment variables from request.env, injected via --env flags.
   - Network, memory, and CPU constraints.
   - Any user-specified extra_args, appended.
3. Runs the command with subprocess.run(), enforcing the request's timeout_seconds.
4. Returns the container's exit code, stdout, and stderr as a RuntimeExecutionResult.
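The steps above can be sketched in plain Python. This is a minimal illustration, not the runtime's actual code: build_docker_command, run_with_timeout, the flag spellings and ordering, the --workdir flag, the `python /workspace/<file>` container command, and the 124 timeout status are all assumptions chosen for the example.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Assumed status for a timed-out step (borrowed from GNU `timeout`).
TIMEOUT_RETURN_CODE = 124


def build_docker_command(
    workdir: Path,
    code_relpath: str,
    env: dict[str, str],
    image: str = "python:3.11-slim",
    memory_limit_mb: int = 512,
    cpus: float | None = 1.0,
    network_enabled: bool = False,
    extra_args: list[str] | None = None,
) -> list[str]:
    """Assemble a `docker run --rm` argv (hypothetical helper)."""
    host_dir = workdir.resolve()  # step 1: absolute working directory
    cmd = ["docker", "run", "--rm"]
    # Bind-mount the workdir read-write at /workspace and run from there.
    # The runtime adds this mount itself; the same flag in extra_args is blocked.
    cmd += ["--volume", f"{host_dir}:/workspace:rw", "--workdir", "/workspace"]
    # One --env flag per variable from request.env.
    for key, value in env.items():
        cmd += ["--env", f"{key}={value}"]
    # Isolation constraints: no network unless enabled, memory and CPU caps.
    if not network_enabled:
        cmd += ["--network", "none"]
    cmd += ["--memory", f"{memory_limit_mb}m"]
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]
    # Policy-checked extra_args go before the image name.
    cmd += list(extra_args or [])
    # Assumed container command: the image's python runs the step file.
    cmd += [image, "python", f"/workspace/{code_relpath}"]
    return cmd


def run_with_timeout(cmd: list[str], timeout_seconds: float) -> tuple[int, str, str]:
    """Run argv and return (return_code, stdout, stderr)."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the docker client here, but that does not
        # necessarily stop the container; a real runtime may need to.
        return TIMEOUT_RETURN_CODE, "", f"timed out after {timeout_seconds}s"
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    print(build_docker_command(Path("/tmp/workspace"), "step.py", {"TASK_ID": "abc123"}))
    # Exercise the runner with the local interpreter so Docker is not required.
    print(run_with_timeout([sys.executable, "-c", "print('hi')"], 10))
```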

Ephemeral containers

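Every execution creates a fresh container, which the --rm flag removes when it exits. Only files written to /workspace, the bind-mounted working directory, persist between steps; in-memory state and anything written elsewhere in the container is discarded.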

Configuration

Select and configure the runtime in YAML, in Python with create_runtime(), or from the TUI.

YAML config:

sandbox:
  runtime: docker
  docker:
    image: "python:3.11-slim"
    memory_limit_mb: 512
    cpus: 1.0
    network_enabled: false
    extra_args: []

create_runtime():

from rlm_code.sandbox.runtimes.registry import create_runtime

# cfg: your loaded sandbox configuration
runtime = create_runtime("docker", sandbox_config=cfg)

TUI command:

/sandbox use docker

Configuration Parameters

Parameter	Type	Default	Description
`image`	`str`	`"python:3.11-slim"`	Docker image to use for execution
`memory_limit_mb`	`int`	`512`	Container memory limit in MB (`--memory`)
`cpus`	`float | None`	`1.0`	CPU quota (`--cpus`)
`network_enabled`	`bool`	`false`	Whether to allow container networking; when disabled, the container runs with `--network none`
`extra_args`	`list[str]`	`[]`	Additional `docker run` arguments, checked against the blocked-flag policy
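As a rough sketch, assuming the keys under sandbox.docker map one-to-one onto the DockerSandboxRuntime constructor arguments (the names match, but the actual loading code is not shown here), a hypothetical docker_kwargs helper could merge user settings over the documented defaults and reject unknown keys:

```python
from __future__ import annotations

from typing import Any

# Defaults copied from the Configuration Parameters table.
DOCKER_DEFAULTS: dict[str, Any] = {
    "image": "python:3.11-slim",
    "memory_limit_mb": 512,
    "cpus": 1.0,
    "network_enabled": False,
    "extra_args": [],
}


def docker_kwargs(sandbox_section: dict[str, Any]) -> dict[str, Any]:
    """Merge `sandbox.docker` over the defaults (hypothetical helper)."""
    docker = sandbox_section.get("docker") or {}
    # Reject typos such as `memory_limit` instead of `memory_limit_mb`.
    unknown = set(docker) - set(DOCKER_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sandbox.docker option(s): {sorted(unknown)}")
    merged = {**DOCKER_DEFAULTS, **docker}
    # Copy the list so callers never mutate the shared default.
    merged["extra_args"] = list(merged["extra_args"] or [])
    return merged
```

The result could then be passed as DockerSandboxRuntime(**docker_kwargs(sandbox_section)). Failing loudly on unknown keys is a design choice of this sketch, so that a misspelled option does not silently fall back to its default.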

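Docker Image Configuration

Choose an image that already contains the packages your agent code needs. Because container networking is disabled by default, agent code cannot install packages at run time.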

# Minimal Python (fastest pull, smallest surface)
sandbox:
  docker:
    image: "python:3.11-slim"

# Full Debian-based Python image (larger; includes compilers for building packages)
sandbox:
  docker:
    image: "python:3.11"

# Custom image with pre-installed packages
sandbox:
  docker:
    image: "myregistry/rlm-sandbox:latest"

Pre-pull for speed

The first execution pulls the image if it is not cached locally. Pre-pull with docker pull python:3.11-slim to avoid latency on the first run.
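To automate the pre-pull step, a hypothetical ensure_image helper can use the real docker image inspect and docker pull commands; docker image inspect exits with status 0 only when the image is present locally. The helper name and the timeouts are assumptions for illustration.

```python
import shutil
import subprocess


def ensure_image(image: str, pull: bool = False, pull_timeout: float = 600) -> bool:
    """Return True if `image` is available locally, optionally pulling it."""
    if shutil.which("docker") is None:
        return False  # Docker CLI not installed
    try:
        # `docker image inspect` exits 0 only when the image exists locally.
        inspect = subprocess.run(
            ["docker", "image", "inspect", image],
            capture_output=True,
            timeout=10,
        )
        if inspect.returncode == 0:
            return True
        if not pull:
            return False
        pulled = subprocess.run(
            ["docker", "pull", image], capture_output=True, timeout=pull_timeout
        )
        return pulled.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # daemon unresponsive or pull too slow
```

Calling ensure_image("python:3.11-slim", pull=True) once before the first execution has the same effect as running docker pull by hand.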

Volume Mounts

The runtime automatically mounts the working directory as a read-write bind mount:

host: <workdir>  -->  container: /workspace:rw

The allowed_mount_roots configuration controls which host paths are permitted as bind-mount sources. By default, the project root (.) and /tmp are allowed.

sandbox:
  allowed_mount_roots:
    - "."
    - "/tmp"

Explicit volume mounts are blocked

The --volume, -v, and --mount flags in extra_args are blocked by the dangerous flag detector. Only the automatic workdir mount is permitted.

Network Policy

By default, container networking is disabled (--network none). This prevents agent-generated code from making outbound HTTP calls, exfiltrating data, or downloading arbitrary packages.

# Enable networking (use with caution)
sandbox:
  docker:
    network_enabled: true

Enable networking only when required

Allowing network access means agent code can reach the internet, internal services, and cloud metadata endpoints. Only enable this when the task genuinely requires it.

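Memory Limits

The memory_limit_mb parameter sets a hard cap via Docker's --memory flag. If processes in the container exceed this limit, the kernel's OOM killer terminates them with SIGKILL; when the main process is killed, the container typically exits with status 137.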

sandbox:
  docker:
    memory_limit_mb: 1024  # 1 GB
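When a step is killed this way, the exit status is usually the only signal left, because --rm removes the container as soon as it exits. A hypothetical describe_exit helper can decode the statuses docker run reserves for itself (125, 126, 127) and the 128 + signal convention. Agent code can also exit with any of these values on its own, so treat the result as a hint: 137 (SIGKILL) is consistent with an OOM kill but can have other causes.

```python
def describe_exit(return_code: int) -> str:
    """Decode a `docker run` exit status (hypothetical helper)."""
    if return_code == 0:
        return "success"
    # Statuses reserved by `docker run` itself.
    if return_code == 125:
        return "docker itself reported an error (e.g. an invalid flag)"
    if return_code == 126:
        return "container command could not be invoked"
    if return_code == 127:
        return "container command not found"
    if return_code > 128:
        signum = return_code - 128
        if signum == 9:  # SIGKILL, e.g. from the kernel OOM killer
            return "killed by SIGKILL (possibly the memory limit)"
        return f"killed by signal {signum}"
    return f"exited with status {return_code}"
```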

Dangerous Flag Detection

The registry maintains a blocklist of Docker flags that would weaken sandbox isolation. Both create_runtime() and run_runtime_doctor() enforce this policy.

Blocked Flags

Flag	Why It Is Blocked
`--privileged`	Grants full host device access to the container
`--pid=host`	Shares the host PID namespace
`--network=host`	Shares the host network stack (bypasses `--network`)
`--ipc=host`	Shares the host IPC namespace
`--uts=host`	Shares the host UTS namespace
`--cap-add=ALL`	Grants all Linux capabilities
`--volume` / `-v`	Arbitrary host mounts (use `allowed_mount_roots`)
`--mount`	Arbitrary mounts (use `allowed_mount_roots`)

Additionally, any argument starting with --volume= or --mount= is blocked.
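The policy can be sketched as an exact-match set plus a prefix check. This is an illustration, not the registry's code: check_extra_args is a hypothetical name, ConfigurationError here is a local stand-in for the real exception, and the sketch matches only the spellings in the table (it does not, for example, normalize a two-token form such as --pid host).

```python
class ConfigurationError(Exception):
    """Local stand-in for the registry's ConfigurationError."""


# Exact flags from the Blocked Flags table.
BLOCKED_FLAGS = frozenset({
    "--privileged", "--pid=host", "--network=host", "--ipc=host",
    "--uts=host", "--cap-add=ALL", "--volume", "-v", "--mount",
})
# Any argument starting with these prefixes is also rejected.
BLOCKED_PREFIXES = ("--volume=", "--mount=")


def check_extra_args(extra_args: list[str]) -> None:
    """Raise ConfigurationError on the first blocked argument."""
    for arg in extra_args:
        if arg in BLOCKED_FLAGS or arg.startswith(BLOCKED_PREFIXES):
            raise ConfigurationError(
                f"Docker extra arg '{arg}' is blocked by sandbox policy."
            )
```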

What Happens When a Blocked Flag Is Detected

from rlm_code.sandbox.runtimes.registry import create_runtime

# config_with_privileged: a sandbox config whose docker.extra_args
# includes "--privileged". This raises ConfigurationError immediately:
create_runtime("docker", config_with_privileged)
# ConfigurationError: Docker extra arg '--privileged' is blocked by sandbox policy.

Defence in depth

The flag check runs when the runtime is created, before any container is launched. Even if configuration is loaded from an untrusted source, the blocked flags above cannot be used to weaken container isolation.

Health Check

The Docker Runtime provides a static check_health() method that probes the Docker daemon:

from rlm_code.sandbox.runtimes.docker_runtime import DockerSandboxRuntime

ok, detail = DockerSandboxRuntime.check_health()
# e.g. ok=True, detail="docker daemon ready (server 24.0.7)"

The check runs docker info --format "{{.ServerVersion}}" with a 2.5-second default timeout (the timeout_seconds parameter) and reports one of:

- docker CLI not found: Docker is not installed or not on PATH.
- docker check timed out: the daemon is unresponsive.
- docker daemon unavailable: the daemon returned an error.
- docker daemon ready (server X.Y.Z): ready to use.
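An equivalent probe can be written with the standard library alone. This sketch is an assumption about how check_health() might work, not its source; it reuses the documented command, default timeout, and detail messages, and check_docker_health is a hypothetical name.

```python
from __future__ import annotations

import shutil
import subprocess


def check_docker_health(timeout_seconds: float = 2.5) -> tuple[bool, str]:
    """Probe the Docker daemon; returns (ok, detail)."""
    docker = shutil.which("docker")
    if docker is None:
        return False, "docker CLI not found"
    try:
        proc = subprocess.run(
            [docker, "info", "--format", "{{.ServerVersion}}"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, "docker check timed out"
    version = proc.stdout.strip()
    # A non-zero exit or an empty version means the daemon did not answer.
    if proc.returncode != 0 or not version:
        return False, "docker daemon unavailable"
    return True, f"docker daemon ready (server {version})"
```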

Setup

1. Install Docker

macOS:

brew install --cask docker
# Then open Docker Desktop

Linux:

curl -fsSL https://get.docker.com | sh
sudo systemctl start docker
sudo usermod -aG docker $USER
# Log out and back in for the docker group membership to take effect

Windows:

Download and install Docker Desktop.

2. Pre-pull the Image

docker pull python:3.11-slim

3. Configure RLM Code

sandbox:
  runtime: docker
  docker:
    image: "python:3.11-slim"
    memory_limit_mb: 512
    network_enabled: false

4. Verify

/sandbox doctor

This runs run_runtime_doctor() and reports the status of every check:

[pass] configured_runtime: Runtime set to 'docker'.
[pass] env_allowlist: 0 host env var(s) allowed.
[pass] docker_cli: docker CLI found at /usr/local/bin/docker.
[pass] docker_daemon: docker daemon ready (server 24.0.7)
[pass] docker_image: Image 'python:3.11-slim' is available locally.
[pass] docker_network_policy: Container networking is disabled.
[pass] docker_extra_args: Docker extra args passed policy checks.
[pass] mount_policy: Temp dir '/tmp' is allowed for bind mounts.
[pass] temp_write_access: Writable temp directory: /tmp

Usage Example

from pathlib import Path
from rlm_code.sandbox.runtimes.base import RuntimeExecutionRequest
from rlm_code.sandbox.runtimes.docker_runtime import DockerSandboxRuntime

runtime = DockerSandboxRuntime(
    image="python:3.11-slim",
    memory_limit_mb=256,
    network_enabled=False,
)

# The code file must live inside workdir, the only host directory
# mounted into the container (at /workspace).
workdir = Path("/tmp/workspace")
workdir.mkdir(parents=True, exist_ok=True)
code_file = workdir / "step.py"
code_file.write_text('import os\nprint("task", os.environ["TASK_ID"])\n')

request = RuntimeExecutionRequest(
    code_file=code_file,
    workdir=workdir,
    timeout_seconds=30,
    python_executable=Path("python"),  # ignored inside the container
    env={"TASK_ID": "abc123"},
)

result = runtime.execute(request)
print(f"Exit: {result.return_code}")
print(f"Stdout: {result.stdout}")
print(f"Stderr: {result.stderr}")