Observability

RLM Code ships with a pluggable, multi-sink observability architecture that captures every run start, step event, and run completion, then fans them out to one or more telemetry backends in real time. Whether you are debugging locally with JSONL files or shipping distributed traces to a production Jaeger cluster, the system adapts without code changes -- you only toggle environment variables.


Architecture at a Glance

flowchart LR
    Runner["RLMRunner"] -->|events| Obs["RLMObservability"]
    Obs --> S1["LocalJSONLSink"]
    Obs --> S2["MLflowSink"]
    Obs --> S3["OpenTelemetrySink"]
    Obs --> S4["LangSmithSink"]
    Obs --> S5["LangFuseSink"]
    Obs --> S6["LogfireSink"]
    Obs --> S7["CompositeSink"]
    S7 --> S7a["Custom Sink A"]
    S7 --> S7b["Custom Sink B"]

The central coordinator, RLMObservability, iterates over its list of sinks and calls each one inside a try/except guard. A failing sink never crashes the run.
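
Conceptually, the fan-out works like the sketch below. This is an illustration of the guard pattern only, not the library's actual code; the class, attribute, and logger names are assumptions, and the on_step hook it forwards is defined by the protocol in the next section.

import logging
from typing import Any

logger = logging.getLogger("rlm_observability_sketch")


class FanOutSketch:
    """Illustrative coordinator: forward each event to every sink, isolating failures."""

    def __init__(self, sinks: list[Any]) -> None:
        self.sinks = sinks

    def on_step(self, run_id: str, *, event: dict[str, Any], cumulative_reward: float) -> None:
        for sink in self.sinks:
            try:
                sink.on_step(run_id, event=event, cumulative_reward=cumulative_reward)
            except Exception:
                # A broken sink is logged and skipped; the run itself continues.
                logger.warning("sink %r failed in on_step; continuing", getattr(sink, "name", sink))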


The RLMObservabilitySink Protocol

Every sink -- built-in or custom -- must satisfy the RLMObservabilitySink structural protocol defined in rlm_code.rlm.observability:

from pathlib import Path
from typing import Any, Protocol


class RLMObservabilitySink(Protocol):
    """Sink contract for RLM observability events."""

    name: str

    def status(self) -> dict[str, Any]:
        """Return sink status for CLI visibility."""
        ...

    def on_run_start(
        self,
        run_id: str,
        *,
        task: str,
        environment: str,
        params: dict[str, Any],
    ) -> None:
        """Hook called at run start."""
        ...

    def on_step(
        self,
        run_id: str,
        *,
        event: dict[str, Any],
        cumulative_reward: float,
    ) -> None:
        """Hook called after each step event."""
        ...

    def on_run_end(
        self,
        run_id: str,
        *,
        result: Any,
        run_path: Path,
    ) -> None:
        """Hook called once at run completion."""
        ...
| Method | When It Fires | Key Arguments |
| --- | --- | --- |
| on_run_start | Immediately before the first iteration | run_id, task, environment, params |
| on_step | After every iteration completes | run_id, step event dict, cumulative_reward |
| on_run_end | After the run finishes (success or failure) | run_id, result object, run_path |
| status | Any time the CLI queries sink health | Returns a dict with name, enabled, available, detail |
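
As a concrete illustration, here is a minimal custom sink that satisfies the protocol by printing to stdout. The class name and output format are purely illustrative and are not part of the library:

from pathlib import Path
from typing import Any


class StdoutSink:
    """Minimal example sink satisfying the RLMObservabilitySink protocol."""

    name = "stdout"

    def status(self) -> dict[str, Any]:
        return {"name": self.name, "enabled": True, "available": True, "detail": "stdout"}

    def on_run_start(self, run_id: str, *, task: str, environment: str, params: dict[str, Any]) -> None:
        print(f"[{self.name}] run {run_id} started: {task} ({environment})")

    def on_step(self, run_id: str, *, event: dict[str, Any], cumulative_reward: float) -> None:
        # The event dict schema comes from the runner; here we only report the reward.
        print(f"[{self.name}] step in run {run_id}: cumulative_reward={cumulative_reward:.3f}")

    def on_run_end(self, run_id: str, *, result: Any, run_path: Path) -> None:
        print(f"[{self.name}] run {run_id} finished; artifacts in {run_path}")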

Available Sinks

RLM Code provides 7 sinks out of the box:

| # | Sink | Class | Always Active? | Activation |
| --- | --- | --- | --- | --- |
| 1 | Local JSONL | LocalJSONLSink | Yes (default) | DSPY_RLM_OBS_LOCAL_JSONL=true |
| 2 | MLflow | MLflowSink | No | DSPY_RLM_MLFLOW_ENABLED=true + MLFLOW_TRACKING_URI |
| 3 | OpenTelemetry | OpenTelemetrySink | No | DSPY_RLM_OTEL_ENABLED=true + OTEL_EXPORTER_OTLP_ENDPOINT |
| 4 | LangSmith | LangSmithSink | No | DSPY_RLM_LANGSMITH_ENABLED=true + LANGCHAIN_API_KEY |
| 5 | LangFuse | LangFuseSink | No | DSPY_RLM_LANGFUSE_ENABLED=true + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY |
| 6 | Logfire | LogfireSink | No | DSPY_RLM_LOGFIRE_ENABLED=true + LOGFIRE_TOKEN |
| 7 | Composite | CompositeSink | N/A (wrapper) | Programmatic only |
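
The CompositeSink is never activated from the environment; you construct it in code and register it yourself. A minimal sketch, assuming the import path matches the other sinks and that the constructor accepts a list of child sinks (both assumptions; see the Sink Architecture page for the real factory functions):

from rlm_code.rlm.observability import CompositeSink  # assumed import path

# sink_a and sink_b stand in for any objects satisfying RLMObservabilitySink,
# e.g. two instances of a custom sink like the StdoutSink sketch above.
composite = CompositeSink([sink_a, sink_b])  # assumed constructor: list of child sinks
obs.add_sink(composite)  # obs is an RLMObservability instance (see Runtime Sink Management)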

Master Switch

Set DSPY_RLM_OBS_ENABLED=false to disable all observability sinks at once. The default is true.


Automatic Activation from Environment Variables

When RLMObservability.default() is called (which happens automatically at the start of every RLM run), it reads the following environment variables and instantiates sinks accordingly:

| Environment Variable | Default | Description |
| --- | --- | --- |
| DSPY_RLM_OBS_ENABLED | true | Master switch for all observability |
| DSPY_RLM_OBS_LOCAL_JSONL | true | Enable the local JSONL file sink |
| DSPY_RLM_MLFLOW_ENABLED | false | Enable MLflow experiment tracking |
| DSPY_RLM_MLFLOW_EXPERIMENT | rlm-code-rlm | MLflow experiment name |
| MLFLOW_TRACKING_URI | (none) | MLflow server URI |
| DSPY_RLM_OTEL_ENABLED | false | Enable OpenTelemetry tracing |
| OTEL_EXPORTER_OTLP_ENDPOINT | (none) | OTLP gRPC endpoint |
| OTEL_SERVICE_NAME | rlm-code | OTEL service name |
| DSPY_RLM_OTEL_METRICS_ENABLED | true | Enable OTEL metrics alongside traces |
| DSPY_RLM_LANGSMITH_ENABLED | false | Enable LangSmith tracing |
| LANGCHAIN_API_KEY | (none) | LangSmith API key |
| LANGCHAIN_PROJECT | rlm-code | LangSmith project name |
| DSPY_RLM_LANGFUSE_ENABLED | false | Enable LangFuse observability |
| LANGFUSE_PUBLIC_KEY | (none) | LangFuse public API key |
| LANGFUSE_SECRET_KEY | (none) | LangFuse secret API key |
| LANGFUSE_HOST | https://cloud.langfuse.com | LangFuse host URL |
| DSPY_RLM_LOGFIRE_ENABLED | false | Enable Logfire (Pydantic) tracing |
| LOGFIRE_TOKEN | (none) | Logfire API token |
| LOGFIRE_PROJECT_NAME | rlm-code | Logfire project name |
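
In essence, RLMObservability.default() reads these flags and assembles the sink list accordingly. The snippet below is a simplified illustration of that decision logic, not the actual implementation; the exact flag parsing and the sink constructors live in the library, and the truthy-value set shown here is an assumption.

import os


def flag(name: str, default: str) -> bool:
    """Interpret an environment variable as a boolean (assumed truthy values)."""
    return os.getenv(name, default).strip().lower() in {"1", "true", "yes"}


active: list[str] = []
if flag("DSPY_RLM_OBS_ENABLED", "true"):
    if flag("DSPY_RLM_OBS_LOCAL_JSONL", "true"):
        active.append("local-jsonl")
    if flag("DSPY_RLM_MLFLOW_ENABLED", "false") and os.getenv("MLFLOW_TRACKING_URI"):
        active.append("mlflow")
    if flag("DSPY_RLM_OTEL_ENABLED", "false") and os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"):
        active.append("opentelemetry")
    # ...and so on for the LangSmith, LangFuse, and Logfire flags above.

print(active)  # e.g. ['local-jsonl', 'mlflow', 'opentelemetry'] with the Quick Start exports below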

Multiple Sinks Simultaneously

You can activate as many sinks as you like. For example, you might keep the local JSONL sink for offline analysis, MLflow for experiment tracking dashboards, and OpenTelemetry for production tracing -- all at the same time.


Runtime Sink Management

The RLMObservability coordinator exposes add_sink, remove_sink, and get_sink for managing sinks at runtime, plus status() for inspecting them:

Adding a Sink

from rlm_code.rlm.observability import RLMObservability

obs = RLMObservability.default(workdir=workdir, run_dir=run_dir)

# Add a custom sink at runtime
obs.add_sink(my_custom_sink)

Removing a Sink

removed = obs.remove_sink("mlflow")  # Returns True if found and removed

Retrieving a Sink

otel_sink = obs.get_sink("opentelemetry")
if otel_sink:
    trace_id = otel_sink.get_trace_id(run_id)

Querying Sink Status

for sink_status in obs.status():
    print(f"{sink_status['name']}: enabled={sink_status['enabled']}, "
          f"available={sink_status['available']}, detail={sink_status['detail']}")

Example output:

local-jsonl: enabled=True, available=True, detail=/home/user/.rlm_code/rlm/observability
mlflow: enabled=True, available=True, detail=http://localhost:5000
opentelemetry: enabled=False, available=False, detail=disabled
langsmith: enabled=False, available=False, detail=disabled
langfuse: enabled=False, available=False, detail=disabled
logfire: enabled=False, available=False, detail=disabled

Event Flow

Every RLM run follows this lifecycle through the observability system:

sequenceDiagram
    participant R as RLMRunner
    participant O as RLMObservability
    participant S as Sink (any)

    R->>O: on_run_start(run_id, task, env, params)
    O->>S: on_run_start(...)

    loop Each Iteration
        R->>O: on_step(run_id, event, cumulative_reward)
        O->>S: on_step(...)
    end

    R->>O: on_run_end(run_id, result, run_path)
    O->>S: on_run_end(...)

Error Isolation

If any sink raises an exception during any hook, the RLMObservability coordinator catches it, logs a warning, and proceeds to the next sink. Your run is never interrupted by a sink failure.


Quick Start

Enable MLflow and OpenTelemetry alongside the default local sink:

export DSPY_RLM_MLFLOW_ENABLED=true
export MLFLOW_TRACKING_URI=http://localhost:5000
export DSPY_RLM_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

rlm-code run --task "Build a DSPy signature" --environment dspy

All three sinks will receive the same events in parallel.
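
To confirm which sinks were picked up before launching a longer run, you can build the coordinator yourself and print its status. The paths below are placeholders; pass the workdir and run directory you actually use:

from pathlib import Path

from rlm_code.rlm.observability import RLMObservability

obs = RLMObservability.default(workdir=Path("."), run_dir=Path("runs/demo"))
for sink_status in obs.status():
    print(f"{sink_status['name']}: enabled={sink_status['enabled']}, available={sink_status['available']}")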


What's Next

| Page | Description |
| --- | --- |
| Sink Architecture | Deep dive into the sink protocol, the CompositeSink, custom sink creation, and factory functions |
| MLflow | MLflow experiment tracking integration |
| OpenTelemetry | Distributed tracing with OTEL, Jaeger, and Zipkin |
| LangSmith | LangChain's LLM observability platform |
| LangFuse | Open-source LLM observability |
| Logfire | Pydantic's structured observability platform |