
# RLM Code

Research Playground & Evaluation OS for Recursive Language Model Agentic Systems

v0.1.5 · Python 3.11+ · Apache 2.0

RLM Code is a research operating system for building, running, evaluating, comparing, and optimizing LLM-based coding agents. It supports multiple agent paradigms (Pure RLM, CodeAct, and Traditional) in a single unified platform with built-in safety, observability, and reproducibility.
## What RLM Code Solves

RLM (the method) addresses the underlying long-context reasoning problem. RLM Code addresses the tooling and workflow problems around using that method in practice.
Core product problems it targets:
- Implementation friction: provide a runnable RLM environment (`llm_query`, REPL, run loop) without custom scaffolding.
- Experiment management: run, replay, compare, and benchmark experiments in one place.
- Safety controls: route execution through secure backends and explicit runtime settings.
- Reproducibility: store traces, metrics, and benchmark artifacts for repeatable research.
- Operational visibility: expose observability, status, and diagnostics for debugging experiments.
In short, RLM Code is a research tooling layer for building and evaluating RLM-style workflows.
## Highlights

### Multi-Paradigm Engine
Run Pure RLM (paper-compliant with context-as-variable), CodeAct (context-in-tokens), or Traditional agent orchestration, all from one TUI.
### Built-in Research Tab
A dedicated Research tab inside the TUI with Dashboard, Trajectory, Benchmarks, Replay, and Live Events sub-tabs for real-time experiment tracking.
### Benchmarks & Leaderboard
10 preset benchmarks with 33+ test cases, a multi-metric leaderboard, and side-by-side paradigm comparison.
### Session Replay
Time-travel through any RLM run step-by-step with forward/backward navigation, reward curve visualization, and checkpoint/restore.
### Hot-Swappable Policies
Swap reward, action selection, compaction, and termination policies at runtime via the Policy Lab.
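Conceptually, each of these policy surfaces is a strategy object the runner resolves on every step, which is what makes swapping safe mid-run. A minimal, self-contained sketch of that idea (illustrative only; the class and attribute names here are assumptions, not the `rlm_code.rlm.policies` API):

```python
from typing import Callable

class MiniRunner:
    """Toy runner: the reward policy is looked up on every step, so
    rebinding the attribute at runtime changes behavior immediately."""

    def __init__(self) -> None:
        # Default reward policy: +1 for any non-empty observation.
        self.reward_policy: Callable[[str], float] = lambda obs: 1.0 if obs else 0.0

    def step(self, observation: str) -> float:
        # Resolved per step, so a swap takes effect on the very next step.
        return self.reward_policy(observation)

runner = MiniRunner()
print(runner.step("ok"))  # default policy: 1.0

# Hot-swap a length-based reward mid-run, as a Policy Lab might.
runner.reward_policy = lambda obs: float(len(obs))
print(runner.step("ok"))  # swapped policy: 2.0
```

The key design point is indirection: because the runner never caches the policy, a swap needs no restart and no state migration.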
### HITL Approval Gates
Risk assessment with 40+ rules, 6 approval modes, and full audit logging to keep humans in the loop for every critical action.
### Pluggable Observability
7 sinks including JSONL, MLflow, OpenTelemetry, LangSmith, LangFuse, and Logfire to trace every step of every run.
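At its simplest, a sink just receives structured events and persists them somewhere queryable. A minimal JSONL sink sketch (conceptual only; this is not the `rlm_code.rlm.observability` interface):

```python
import json
import tempfile
from pathlib import Path

class JsonlSink:
    """Append one JSON object per line so a run can be tailed live
    and replayed later from the same file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def emit(self, event: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

# Write two events and read them back.
trace = Path(tempfile.mkdtemp()) / "trace.jsonl"
sink = JsonlSink(trace)
sink.emit({"type": "step_started", "step": 1})
sink.emit({"type": "step_finished", "step": 1, "reward": 0.5})
events = [json.loads(line) for line in trace.read_text().splitlines()]
print(len(events))  # 2
```

JSONL is append-only and line-delimited, which is why it works equally well for live tailing and post-hoc replay.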
### Sandbox Runtimes

Six runtimes for safe, isolated code execution: Local, Docker, Apple Container, Modal, E2B, and Daytona.
## RLM Research Lab
## Quick Start
1. Install and launch
2. Connect to a model
3. Run your first benchmark
4. Keep runs bounded
5. Compare benchmark output
6. Switch to the Research tab: press Ctrl+5 or F6 to see your run's dashboard, trajectory, reward curves, and live events.
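As a sketch, the steps above might look like the following. The `pip` package name is an assumption (chosen to match the `rlm-code` CLI shown in the architecture diagram); only `rlm-code` and `/rlm bench preset=dspy_quick` appear elsewhere on this page.

```shell
# 1. Install and launch (package name assumed to match the CLI)
pip install rlm-code
rlm-code

# 2-5. Inside the TUI, drive a run with slash commands, e.g.:
#   /rlm bench preset=dspy_quick    # run a preset benchmark
# then inspect the leaderboard output to compare paradigms.

# 6. Press Ctrl+5 or F6 to open the Research tab.
```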
## Architecture
```mermaid
graph TB
    CLI["rlm-code CLI"]
    CLI --> TUI["Unified TUI"]
    TUI --> RLM["RLM"]
    TUI --> FILES["Files"]
    TUI --> DETAILS["Details"]
    TUI --> SHELL["Shell"]
    TUI --> RESEARCH["Research"]
    CLI --> CMD["50+ Slash Commands"]
    CMD --> RUNNER["RLM Runner"]
    RUNNER --> EVENTS["Event Bus (27+ types)"]
    RUNNER --> OBS["Observability (7 sinks)"]
    RUNNER --> TRAJ["Trajectory Logger"]
    RUNNER --> POL["Policy Lab"]
    RUNNER --> HITL["HITL Approval Gates"]
    RUNNER --> ENV["Environments"]
    ENV --> PURE["Pure RLM"]
    ENV --> DSPY["DSPy Coding"]
    ENV --> GEN["Generic"]
    RUNNER --> SAND["Sandbox Runtimes"]
    SAND --> LOCAL["Local"]
    SAND --> DOCKER["Docker"]
    SAND --> CLOUD["Modal · E2B · Daytona"]
    CMD --> BENCH["Benchmarks (10 presets)"]
    CMD --> LB["Leaderboard"]
    CMD --> SR["Session Replay"]
```

## Feature Matrix
| Feature | Module |
|---|---|
| RLM Runner (multi-paradigm) | `rlm_code.rlm.runner` |
| Pure RLM Environment | `rlm_code.rlm.pure_rlm_environment` |
| Event System (27+ types) | `rlm_code.rlm.events` |
| Policy Lab (16 policies) | `rlm_code.rlm.policies` |
| HITL Approval Gates | `rlm_code.rlm.approval` |
| Observability (7 sinks) | `rlm_code.rlm.observability` |
| Benchmarks (10 presets) | `rlm_code.rlm.benchmarks` |
| Leaderboard | `rlm_code.rlm.leaderboard` |
| Session Replay | `rlm_code.rlm.session_replay` |
| Paradigm Comparison | `rlm_code.rlm.comparison` |
| Trajectory Logging | `rlm_code.rlm.trajectory` |
| Memory Compaction | `rlm_code.rlm.memory_compaction` |
| 6 Sandbox Runtimes | `rlm_code.sandbox.runtimes` |
| 12+ LLM Providers | `rlm_code.models` |
| MCP Server | `rlm_code.mcp` |
| Unified TUI (5 tabs) | `rlm_code.ui.tui_app` |
| 50+ Slash Commands | `rlm_code.commands` |
| Code Validation | `rlm_code.validation` |
| Framework Adapters | `rlm_code.rlm.frameworks` |
## The TUI at a Glance
RLM Code ships a single unified TUI with 5 tabs:
| Tab | Shortcut | Purpose |
|---|---|---|
| RLM | Ctrl+1 / F2 | Converse with LLMs, run slash commands |
| Files | Ctrl+2 / F3 | Browse project files with syntax preview |
| Details | Ctrl+3 / F4 | Status panel, diff viewer |
| Shell | Ctrl+4 / F5 | Persistent stateful shell |
| Research | Ctrl+5 / F6 | Dashboard, trajectory, benchmarks, replay, live events |
The Research tab has 5 internal sub-tabs for organizing experiment data:
- Dashboard: Run metrics, reward sparkline, summary
- Trajectory: Step-by-step timeline of actions and rewards
- Benchmarks: Leaderboard table from `/rlm bench` runs
- Replay: Step-through controls for time-travel debugging
- Events: Live event stream from the RLM event bus
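The live Events view is easiest to understand as subscribers on an event bus. A toy publish/subscribe sketch (the event names here are assumptions for illustration, not the real 27+ event types):

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Toy event bus: handlers subscribe by event type and run synchronously."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
seen: list[dict] = []
bus.subscribe("step_finished", seen.append)  # a live-events pane would render these
bus.publish("step_finished", {"step": 1, "reward": 0.25})
bus.publish("run_finished", {"steps": 1})    # no subscriber: silently dropped
print(len(seen))  # 1
```

Decoupling producers from consumers this way is what lets a UI pane, a trajectory logger, and an observability sink all watch the same run without knowing about each other.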
> **Research Tab:** Press Ctrl+5 after running `/rlm bench preset=dspy_quick` to see real experiment data populate the Research tab dashboards.
## Documentation Guide
| Section | What You'll Find |
|---|---|
| Getting Started | Installation, quick start, CLI reference, configuration |
| Core Engine | RLM Runner, environments, events, termination, trajectory |
| Policies & Safety | Reward, action, compaction, termination policies + HITL gates |
| Terminal UI | Tab reference, Research tab, widgets, theme system |
| Benchmarks & Replay | Presets, leaderboard, session replay |
| Observability | Sink architecture, MLflow, OTel, LangSmith, LangFuse, Logfire |
| Platform | Sandbox runtimes, LLM providers, MCP, framework adapters |
| Reference | Full API reference |