LangFuse Integration¶
The LangFuseSink sends RLM traces to LangFuse, an open-source LLM observability platform that provides trace visualization, cost tracking, prompt management, and evaluation tools.
Overview¶
| Property | Value |
|---|---|
| Class | rlm_code.rlm.observability_sinks.LangFuseSink |
| Sink name | langfuse |
| Activation | DSPY_RLM_LANGFUSE_ENABLED=true |
| Primary env vars | LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY |
| Optional dependency | pip install langfuse |
Activation¶
export DSPY_RLM_LANGFUSE_ENABLED=true
export LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
export LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
export LANGFUSE_HOST=https://cloud.langfuse.com # optional, default
Self-Hosted LangFuse
If you are running a self-hosted LangFuse instance, set LANGFUSE_HOST to your instance URL (e.g., http://localhost:3000).
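Before starting a run, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the sink: the helper names `missing_langfuse_vars` and `effective_host` are hypothetical, and only the variable names and the default host come from this page.

```python
import os

# Variable names are taken from the Activation section; the helpers are hypothetical.
REQUIRED_VARS = (
    "DSPY_RLM_LANGFUSE_ENABLED",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
)
DEFAULT_HOST = "https://cloud.langfuse.com"


def missing_langfuse_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def effective_host(env=None):
    """Return LANGFUSE_HOST if it is set, otherwise the cloud default."""
    env = os.environ if env is None else env
    return env.get("LANGFUSE_HOST") or DEFAULT_HOST


# A plain dict stands in for os.environ so the example has no side effects.
example_env = {
    "DSPY_RLM_LANGFUSE_ENABLED": "true",
    "LANGFUSE_PUBLIC_KEY": "pk-lf-your-public-key",
}
print(missing_langfuse_vars(example_env))  # ['LANGFUSE_SECRET_KEY']
print(effective_host(example_env))         # https://cloud.langfuse.com
```

Calling both helpers without arguments checks the real process environment.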
Features¶
Open-Source LLM Observability¶
LangFuse is fully open-source and can be self-hosted. The RLM sink provides:
- Trace-level visibility into every RLM run
- Span-level detail for each step
- Automatic scoring of traces based on reward and completion
- Tag-based organization for filtering by environment
Trace Visualization¶
Each RLM run creates a trace in LangFuse:
| Trace Field | Value |
|---|---|
| id | The RLM run_id |
| name | rlm-run |
| input | Task text |
| metadata | Environment, params dict |
| tags | ["rlm", "<environment>"] |
At run end, the trace is updated with output data:
| Output Field | Description |
|---|---|
| completed | Whether the run completed |
| steps | Total steps taken |
| total_reward | Final reward |
| final_answer | The final answer (first 500 chars) |
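The two tables above can be read as a pair of plain dictionaries. This is an illustrative sketch only: the function names and the nested key names inside `input` and `metadata` are assumptions, and the sink's real payload may be shaped differently.

```python
MAX_CHARS = 500  # truncation length stated in the output table


def trace_start_fields(run_id, task, environment, params):
    """Fields set when the trace is created (key names inside dicts are assumed)."""
    return {
        "id": run_id,
        "name": "rlm-run",
        "input": {"task": task},
        "metadata": {"environment": environment, "params": params},
        "tags": ["rlm", environment],
    }


def trace_end_output(completed, steps, total_reward, final_answer):
    """Output attached at run end; the final answer is cut to 500 characters."""
    return {
        "completed": completed,
        "steps": steps,
        "total_reward": total_reward,
        "final_answer": str(final_answer)[:MAX_CHARS],
    }


start = trace_start_fields("abc12345", "Create a DSPy signature...", "dspy", {"max_steps": 3})
end = trace_end_output(True, 3, 1.5, "x" * 800)
print(start["tags"])              # ['rlm', 'dspy']
print(len(end["final_answer"]))   # 500
```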
Step Spans¶
Each step creates a span nested under the trace:
| Span Field | Value |
|---|---|
| name | step-<n> |
| input | Action type and code (first 500 chars) |
| metadata | Step number, reward, cumulative reward |
| output | Success flag and output (first 500 chars) |
| level | ERROR if the step failed, DEFAULT otherwise |
| status_message | Error message if the step failed |
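As a rough sketch of how the span fields in the table relate to a single step, the hypothetical helper below builds them from step data. The function name, its parameters, and the nested key names are assumptions made for illustration; only the field names, the 500-character limit, and the ERROR/DEFAULT rule come from the table.

```python
def step_span_fields(step, action, code, success, output,
                     reward, cumulative_reward, error=None):
    """Build the span fields for one step (illustrative, not the sink's code)."""
    return {
        "name": f"step-{step}",
        "input": {"action": action, "code": code[:500]},
        "metadata": {
            "step": step,
            "reward": reward,
            "cumulative_reward": cumulative_reward,
        },
        "output": {"success": success, "output": output[:500]},
        # Failed steps are flagged so they stand out in the trace timeline.
        "level": "DEFAULT" if success else "ERROR",
        "status_message": None if success else error,
    }


ok = step_span_fields(1, "run_python", "print('hi')", True, "hi", 0.5, 0.5)
bad = step_span_fields(2, "run_python", "1/0", False, "", 0.0, 0.5,
                       error="ZeroDivisionError: division by zero")
print(ok["name"], ok["level"])    # step-1 DEFAULT
print(bad["name"], bad["level"])  # step-2 ERROR
```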
Cost Tracking¶
LangFuse automatically tracks token usage and cost when you use its LLM integrations. The RLM sink provides per-step and per-run metrics that LangFuse uses to aggregate cost data across your project.
Automatic Scoring¶
At run end, the sink creates two scores on the trace:
| Score Name | Value | Description |
|---|---|---|
| reward | float | The total cumulative reward for the run |
| completed | 1.0 or 0.0 | Whether the run completed successfully |
These scores are visible in the LangFuse dashboard and can be used for filtering, aggregation, and evaluation.
# Scores are created automatically at run end
self._langfuse.score(
    trace_id=run_id,
    name="reward",
    value=float(getattr(result, "total_reward", 0.0)),
)
self._langfuse.score(
    trace_id=run_id,
    name="completed",
    value=1.0 if getattr(result, "completed", False) else 0.0,
)
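The `getattr` defaults in the excerpt above mean that a result object missing either attribute still yields valid scores. The sketch below isolates that logic in a hypothetical `scores_for` helper so it can be tried without a LangFuse client; `SimpleNamespace` stands in for the real result type.

```python
from types import SimpleNamespace


def scores_for(result):
    """Compute the two score values using the same defaults as the excerpt."""
    return {
        "reward": float(getattr(result, "total_reward", 0.0)),
        "completed": 1.0 if getattr(result, "completed", False) else 0.0,
    }


finished = SimpleNamespace(total_reward=1.5, completed=True)
partial = SimpleNamespace()  # neither attribute is present

print(scores_for(finished))  # {'reward': 1.5, 'completed': 1.0}
print(scores_for(partial))   # {'reward': 0.0, 'completed': 0.0}
```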
Automatic Flush¶
The sink calls self._langfuse.flush() at the end of each run to ensure all data is sent to the LangFuse backend before the process exits.
Setup Guide¶
Option A: LangFuse Cloud¶
- Sign up at langfuse.com
- Create a project and obtain your API keys
- Configure environment variables:
export DSPY_RLM_LANGFUSE_ENABLED=true
export LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
export LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
Option B: Self-Hosted LangFuse¶
- Deploy LangFuse using Docker:
Refer to the LangFuse self-hosting guide for the full docker-compose.yml.
- Configure with your local instance:
export DSPY_RLM_LANGFUSE_ENABLED=true
export LANGFUSE_PUBLIC_KEY=pk-lf-your-local-key
export LANGFUSE_SECRET_KEY=sk-lf-your-local-key
export LANGFUSE_HOST=http://localhost:3000
Install the SDK¶
pip install langfuse
Run a Task¶
View Traces¶
Open the LangFuse dashboard (cloud or self-hosted). Navigate to Traces and find the run by its run_id. The trace view shows:
- Timeline: Visual span hierarchy for each step
- Input/Output: Full task and result data
- Scores: Reward and completion scores
- Tags: Environment-based tags for filtering
Configuration Options¶
| Parameter | Type | Default | Env Var | Description |
|---|---|---|---|---|
| enabled | bool | False | DSPY_RLM_LANGFUSE_ENABLED | Enable/disable the sink |
| host | str \| None | None | LANGFUSE_HOST | LangFuse host URL |
API Keys
LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are read directly by the langfuse Python SDK, not by the sink constructor. They must be set as environment variables.
Programmatic Usage¶
from rlm_code.rlm.observability_sinks import LangFuseSink
sink = LangFuseSink(
    enabled=True,
    host="http://localhost:3000",
)
print(sink.status())
# {'name': 'langfuse', 'enabled': True, 'available': True,
# 'detail': 'http://localhost:3000'}
Factory Function¶
from rlm_code.rlm.observability_sinks import create_langfuse_sink_from_env
# Reads DSPY_RLM_LANGFUSE_ENABLED and LANGFUSE_HOST
sink = create_langfuse_sink_from_env()
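To make the environment-to-parameter mapping concrete, here is a rough sketch of how such a factory could derive `enabled` and `host`. It is not the implementation of `create_langfuse_sink_from_env`: the `LangFuseSettings` dataclass, the `settings_from_env` helper, and the set of accepted truthy spellings are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the accepted spellings of "true" are not documented on this page.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass
class LangFuseSettings:
    """Mirrors the two constructor parameters in the configuration table."""
    enabled: bool = False
    host: Optional[str] = None


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> LangFuseSettings:
    """Read DSPY_RLM_LANGFUSE_ENABLED and LANGFUSE_HOST from a mapping."""
    env = os.environ if env is None else env
    flag = env.get("DSPY_RLM_LANGFUSE_ENABLED", "").strip().lower()
    host = env.get("LANGFUSE_HOST") or None  # empty string is treated as unset
    return LangFuseSettings(enabled=flag in TRUTHY, host=host)


print(settings_from_env({"DSPY_RLM_LANGFUSE_ENABLED": "true"}))
print(settings_from_env({"LANGFUSE_HOST": "http://localhost:3000"}))
```

The API keys are deliberately absent here, since they are read by the langfuse SDK rather than passed to the sink.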
Connection Validation¶
During initialization, the sink validates the connection by calling self._langfuse.auth_check():
try:
    from langfuse import Langfuse
    self._langfuse = Langfuse(host=self.host) if self.host else Langfuse()
    self._langfuse.auth_check()
    self._available = True
    self._detail = self.host or "https://cloud.langfuse.com"
except Exception as exc:
    self._available = False
    self._detail = f"connection failed: {exc}"
If the auth check fails, the sink becomes inactive and all subsequent hook calls return immediately.
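The behaviour described above, where a failed check turns every hook into a no-op, can be sketched with a stand-in client. Everything here is hypothetical scaffolding: `FailingClient`, `WorkingClient`, `SketchSink`, and the `on_step` hook name are illustrative and do not reproduce the real sink.

```python
class FailingClient:
    """Stand-in for a client whose credentials are rejected."""
    def auth_check(self):
        raise RuntimeError("invalid credentials")


class WorkingClient:
    """Stand-in for a client that authenticates successfully."""
    def auth_check(self):
        return True


class SketchSink:
    def __init__(self, client, host=None):
        self.recorded = []
        try:
            client.auth_check()
            self._available = True
            self._detail = host or "https://cloud.langfuse.com"
        except Exception as exc:
            self._available = False
            self._detail = f"connection failed: {exc}"

    def on_step(self, step):
        if not self._available:
            return  # inactive sink: return immediately, send nothing
        self.recorded.append(step)

    def status(self):
        return {"name": "langfuse", "available": self._available,
                "detail": self._detail}


broken = SketchSink(FailingClient())
broken.on_step(1)
print(broken.status()["detail"])  # connection failed: invalid credentials
print(broken.recorded)            # []

healthy = SketchSink(WorkingClient(), host="http://localhost:3000")
healthy.on_step(1)
print(healthy.recorded)           # [1]
```

Catching the exception at construction time keeps a misconfigured sink from interrupting the run itself.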
Trace Structure¶
A typical 3-step RLM run creates this structure in LangFuse:
Trace: rlm-run (id: abc12345)
Tags: [rlm, dspy]
Input: { task: "Create a DSPy signature..." }
|
+-- Span: step-1
| Input: { action: "run_python", code: "..." }
| Output: { success: true, output: "..." }
| Metadata: { step: 1, reward: 0.5, cumulative_reward: 0.5 }
|
+-- Span: step-2
| Input: { action: "run_python", code: "..." }
| Output: { success: true, output: "..." }
| Metadata: { step: 2, reward: 0.5, cumulative_reward: 1.0 }
|
+-- Span: step-3
Input: { action: "submit", code: "" }
Output: { success: true, output: "..." }
Metadata: { step: 3, reward: 0.5, cumulative_reward: 1.5 }
|
Output: { completed: true, steps: 3, total_reward: 1.5 }
Scores: reward=1.5, completed=1.0
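The metadata values in this structure follow directly from the per-step rewards: each `cumulative_reward` is the running sum, and the final one equals the `reward` score. A short sketch of that arithmetic, using the rewards from the example above:

```python
from itertools import accumulate

step_rewards = [0.5, 0.5, 0.5]  # per-step rewards from the example trace

# Running totals, one per step.
cumulative = list(accumulate(step_rewards))

# Metadata attached to each span, numbered from 1.
span_metadata = [
    {"step": i, "reward": r, "cumulative_reward": c}
    for i, (r, c) in enumerate(zip(step_rewards, cumulative), start=1)
]

total_reward = cumulative[-1]  # matches the trace's reward score

print(cumulative)         # [0.5, 1.0, 1.5]
print(span_metadata[-1])  # {'step': 3, 'reward': 0.5, 'cumulative_reward': 1.5}
print(total_reward)       # 1.5
```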
Troubleshooting¶
| Symptom | Cause | Solution |
|---|---|---|
| available: False, langfuse not installed | SDK not installed | pip install langfuse |
| available: False, connection failed | Bad API keys or network | Verify LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY, and check that LANGFUSE_HOST is reachable |
| Traces not appearing | Sink not enabled | Set DSPY_RLM_LANGFUSE_ENABLED=true |
| Traces appear on the wrong host | LANGFUSE_HOST mismatch | Set LANGFUSE_HOST to the correct URL |
| Missing scores | Run did not reach on_run_end | Check for run errors; scores are created at run end |
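The first rows of this table can be checked mechanically from the dictionary returned by `sink.status()`. The `diagnose` helper below is a hypothetical sketch that assumes the status keys shown under Programmatic Usage and matches on the detail strings listed in the Symptom column.

```python
def diagnose(status):
    """Map a sink status dict to the matching hint from the table."""
    if not status.get("enabled"):
        return "Set DSPY_RLM_LANGFUSE_ENABLED=true"
    detail = str(status.get("detail", ""))
    if not status.get("available"):
        if "not installed" in detail:
            return "pip install langfuse"
        if "connection failed" in detail:
            return "Verify LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY"
        return f"Unrecognized failure: {detail}"
    return f"Sink is active; traces go to {detail}"


print(diagnose({"name": "langfuse", "enabled": True, "available": False,
                "detail": "langfuse not installed"}))
print(diagnose({"name": "langfuse", "enabled": True, "available": True,
                "detail": "http://localhost:3000"}))
```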