LangSmith Integration

The LangSmithSink sends RLM run traces to LangSmith, LangChain's observability platform for LLM application debugging, testing, and monitoring.


Overview

| Property | Value |
| --- | --- |
| Class | rlm_code.rlm.observability_sinks.LangSmithSink |
| Sink name | langsmith |
| Activation | DSPY_RLM_LANGSMITH_ENABLED=true |
| Primary env var | LANGCHAIN_API_KEY |
| Optional dependency | pip install langsmith |

Activation

export DSPY_RLM_LANGSMITH_ENABLED=true
export LANGCHAIN_API_KEY=ls-your-api-key-here
export LANGCHAIN_PROJECT=rlm-code              # optional, default: rlm-code
export LANGCHAIN_TRACING_V2=true               # auto-set by the sink if not present

LANGCHAIN_TRACING_V2

The sink automatically calls os.environ.setdefault("LANGCHAIN_TRACING_V2", "true") during initialization. You do not need to set this variable manually unless you want to ensure it is set before any other LangChain code runs.
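A quick standalone illustration of the setdefault semantics (not part of the sink itself):

import os

# setdefault only writes the key when it is absent, so a value you export
# yourself (for example LANGCHAIN_TRACING_V2=false) is never overwritten.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
print(os.environ["LANGCHAIN_TRACING_V2"])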


Features

Run Tracing

Each RLM run is represented as a root RunTree in LangSmith:

  • Name: rlm-run-<first 8 chars of run_id>
  • Run type: chain
  • Project: Configurable via LANGCHAIN_PROJECT (default: rlm-code)
  • Inputs: Task text, environment name, and full parameters dict
  • Metadata: run_id and environment
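A minimal sketch of how such a root run can be created with the LangSmith SDK; the variable values are placeholders and the sink's actual code may differ:

from langsmith.run_trees import RunTree

# Placeholder values standing in for the sink's real run data.
run_id = "abc12345-0000-0000-0000-000000000000"
task = "Create a DSPy module"
environment = "dspy"
params = {"max_steps": 10}

# Root "chain" run mirroring the structure described above (sketch only).
root = RunTree(
    name=f"rlm-run-{run_id[:8]}",
    run_type="chain",
    project_name="rlm-code",
    inputs={"task": task, "environment": environment, "params": params},
    extra={"metadata": {"run_id": run_id, "environment": environment}},
)
root.post()  # sends the run to LangSmith (requires LANGCHAIN_API_KEY)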

Step-Level Child Runs

Each step is created as a child run under the root:

| Field | Source |
| --- | --- |
| Name | step-<n> |
| Run type | tool |
| Inputs | Action type and code (first 500 chars) |
| Outputs | Success flag, output (first 500 chars), reward, cumulative reward |
| Error | Set if the step did not succeed |
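Continuing the root run from the sketch above, a child run for a single step could be recorded like this (a hedged sketch; field names follow the table, values are illustrative):

# `root` is the RunTree from the previous sketch.
code, output = "import dspy\nprint(dspy.__version__)", "3.0.0"

child = root.create_child(
    name="step-1",
    run_type="tool",
    inputs={"action": "run_python", "code": code[:500]},  # code truncated to 500 chars
)
child.post()

# ... execute the step, then record the result ...
child.end(
    outputs={
        "success": True,
        "output": output[:500],  # output also truncated to 500 chars
        "reward": 0.5,
        "cumulative_reward": 0.5,
    },
    # error="traceback ..."  # set instead when the step did not succeed
)
child.patch()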

Run Completion

At run end, the root RunTree is updated with outputs:

| Output field | Description |
| --- | --- |
| completed | Whether the run completed successfully |
| steps | Total number of steps taken |
| total_reward | Final cumulative reward |
| final_answer | The final answer text (first 500 chars) |

If the run did not complete, an error message is attached.
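In SDK terms this roughly corresponds to ending and patching the root run (sketch; `root` comes from the earlier sketches and the values are illustrative):

final_answer = "class Module(dspy.Module): ..."

root.end(
    outputs={
        "completed": True,
        "steps": 3,
        "total_reward": 1.5,
        "final_answer": final_answer[:500],  # truncated to 500 chars
    },
    # error="run did not complete"  # attached instead when the run fails
)
root.patch()  # pushes the final state to LangSmith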

Feedback Collection

LangSmith supports feedback/evaluation annotations on runs. While the sink does not automatically create feedback, you can add it via the LangSmith SDK using the run_id logged in the trace.
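For example, a score could be attached after the fact; the feedback key is arbitrary and the run_id is a placeholder:

from langsmith import Client

client = Client()
client.create_feedback(
    "abc12345-0000-0000-0000-000000000000",  # run_id from the trace
    key="task_quality",                      # arbitrary feedback key
    score=1.0,
    comment="Module compiled and passed a smoke test.",
)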

Dataset Creation

You can use LangSmith's dataset features to create evaluation datasets from RLM benchmark results. Export runs from the LangSmith UI or use the SDK to query by project name.
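One possible sketch using the SDK, assuming the runs carry the inputs and outputs described above (the dataset name is arbitrary):

from langsmith import Client

client = Client()

# Turn root runs from the project into dataset examples.
dataset = client.create_dataset(dataset_name="rlm-benchmark-v1")
for run in client.list_runs(project_name="rlm-code", run_type="chain"):
    if not run.outputs:
        continue
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id,
    )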


Setup Guide

1. Create a LangSmith Account

Sign up at smith.langchain.com and obtain an API key.

2. Install the SDK

pip install langsmith

3. Configure Environment

export DSPY_RLM_LANGSMITH_ENABLED=true
export LANGCHAIN_API_KEY=ls-your-api-key-here
export LANGCHAIN_PROJECT=rlm-code

4. Run a Task

rlm-code run --task "Create a DSPy module" --environment dspy

5. View Traces

Open smith.langchain.com, navigate to your project (rlm-code), and view the run traces. Each trace shows:

  • The root run with inputs and outputs
  • Child runs for each step with timing
  • Input/output data for debugging
  • Error details for failed steps

Configuration Options

| Parameter | Type | Default | Env Var | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | False | DSPY_RLM_LANGSMITH_ENABLED | Enable/disable the sink |
| project | str | "rlm-code" | LANGCHAIN_PROJECT | LangSmith project name |

Programmatic Usage

from rlm_code.rlm.observability_sinks import LangSmithSink

sink = LangSmithSink(
    enabled=True,
    project="my-rlm-project",
)

print(sink.status())
# {'name': 'langsmith', 'enabled': True, 'available': True,
#  'detail': 'project: my-rlm-project', 'project': 'my-rlm-project'}

Factory Function

from rlm_code.rlm.observability_sinks import create_langsmith_sink_from_env

# Reads DSPY_RLM_LANGSMITH_ENABLED and LANGCHAIN_PROJECT
sink = create_langsmith_sink_from_env()
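Conceptually the factory behaves roughly like the following; this is a sketch, not the actual implementation, and the exact truthy-string handling is an assumption:

import os

from rlm_code.rlm.observability_sinks import LangSmithSink

# Approximate equivalent of create_langsmith_sink_from_env().
sink = LangSmithSink(
    enabled=os.getenv("DSPY_RLM_LANGSMITH_ENABLED", "").lower() in {"1", "true", "yes"},
    project=os.getenv("LANGCHAIN_PROJECT", "rlm-code"),
)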

Trace Structure

A typical RLM run creates this hierarchy in LangSmith:

rlm-run-abc12345 (chain)
  |-- Inputs: { task: "...", environment: "dspy", params: {...} }
  |-- Metadata: { run_id: "abc12345", environment: "dspy" }
  |
  +-- step-1 (tool)
  |    |-- Inputs: { action: "run_python", code: "import dspy..." }
  |    |-- Outputs: { success: true, output: "...", reward: 0.5 }
  |
  +-- step-2 (tool)
  |    |-- Inputs: { action: "run_python", code: "class Module..." }
  |    |-- Outputs: { success: true, output: "...", reward: 0.5 }
  |
  +-- step-3 (tool)
  |    |-- Inputs: { action: "submit", code: "" }
  |    |-- Outputs: { success: true, reward: 0.5 }
  |
  |-- Outputs: { completed: true, steps: 3, total_reward: 1.5 }

Connection Validation

During initialization, the sink tests the connection to LangSmith by calling self._client.list_projects(limit=1). If this call fails (invalid API key, network error, etc.), the sink sets _available=False and records the error:

# Inside LangSmithSink initialization; Client comes from `from langsmith import Client`.
try:
    self._client = Client()
    self._client.list_projects(limit=1)  # cheap call that fails fast on bad credentials
    self._available = True
    self._detail = f"project: {self.project}"
except Exception as exc:
    self._available = False
    self._detail = f"connection failed: {exc}"

Check Status

After initialization, call sink.status() to verify the connection. The available field tells you whether the sink is live.
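For example, before launching a long run:

status = sink.status()
if not status["available"]:
    print(f"LangSmith sink is not live: {status['detail']}")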


Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| available: False, detail mentions ImportError | langsmith package not installed | pip install langsmith |
| available: False, detail mentions connection failed | Invalid API key or network issue | Verify LANGCHAIN_API_KEY and network connectivity |
| Traces not appearing in UI | Sink not enabled | Ensure DSPY_RLM_LANGSMITH_ENABLED=true |
| Traces in wrong project | LANGCHAIN_PROJECT mismatch | Set LANGCHAIN_PROJECT to the correct project name |
| Missing step details | Step truncation | Expected behavior: code and output are truncated to 500 chars in LangSmith |