Trajectory Logging¶

Module

rlm_code.rlm.trajectory

Trajectory logging provides JSONL-based execution tracing for RLM runs. It records every phase of execution -- reasoning, code, output, LLM calls, child agents, and termination -- in a format compatible with agent evaluation frameworks and visualization tools.

Overview¶

Every RLM run produces a trajectory: a chronological sequence of events capturing what the LLM thought, what code it wrote, what output it received, and how it arrived at its final answer. Trajectories are stored as JSONL (one JSON object per line) for streaming-friendly append-only writes.

traces/run_001.jsonl
{"event_type": "run_start", "timestamp": 1706400000.0, "run_id": "run_001", "data": {"task": "..."}}
{"event_type": "iteration_reasoning", "timestamp": 1706400001.2, "run_id": "run_001", "iteration": 1, "data": {"reasoning": "..."}}
{"event_type": "iteration_code", "timestamp": 1706400001.5, "run_id": "run_001", "iteration": 1, "data": {"code": "..."}}
{"event_type": "iteration_output", "timestamp": 1706400002.1, "run_id": "run_001", "iteration": 1, "data": {"output": "..."}, "duration_ms": 600}
{"event_type": "final_detected", "timestamp": 1706400005.0, "run_id": "run_001", "data": {"answer": "..."}}
{"event_type": "run_end", "timestamp": 1706400005.1, "run_id": "run_001", "data": {"success": true}, "duration_ms": 5100}

Classes¶

`TrajectoryEventType`¶

Enumeration of 18 event types for trajectory logging.

class TrajectoryEventType(str, Enum):
    # Run lifecycle
    RUN_START = "run_start"
    RUN_END = "run_end"

    # REPL iterations
    ITERATION_START = "iteration_start"
    ITERATION_REASONING = "iteration_reasoning"
    ITERATION_CODE = "iteration_code"
    ITERATION_OUTPUT = "iteration_output"
    ITERATION_END = "iteration_end"

    # LLM calls
    LLM_REQUEST = "llm_request"
    LLM_RESPONSE = "llm_response"

    # Sub-LLM (llm_query from code)
    SUB_LLM_REQUEST = "sub_llm_request"
    SUB_LLM_RESPONSE = "sub_llm_response"

    # Child agents
    CHILD_SPAWN = "child_spawn"
    CHILD_RESULT = "child_result"

    # Termination
    FINAL_DETECTED = "final_detected"

    # Context
    CONTEXT_LOAD = "context_load"
    CONTEXT_UPDATE = "context_update"

    # Memory
    MEMORY_COMPACT = "memory_compact"

    # Errors
    ERROR = "error"

Category	Event Types
Run lifecycle	`RUN_START`, `RUN_END`
Iterations	`ITERATION_START`, `ITERATION_REASONING`, `ITERATION_CODE`, `ITERATION_OUTPUT`, `ITERATION_END`
LLM calls	`LLM_REQUEST`, `LLM_RESPONSE`
Sub-LLM	`SUB_LLM_REQUEST`, `SUB_LLM_RESPONSE`
Child agents	`CHILD_SPAWN`, `CHILD_RESULT`
Termination	`FINAL_DETECTED`
Context	`CONTEXT_LOAD`, `CONTEXT_UPDATE`
Memory	`MEMORY_COMPACT`
Errors	`ERROR`

`TrajectoryEvent`¶

A single event in a trajectory.

@dataclass
class TrajectoryEvent:
    event_type: TrajectoryEventType          # Event category
    timestamp: float = field(...)            # Unix timestamp (time.time())
    run_id: str = ""                         # Run identifier
    iteration: int | None = None             # Iteration number
    depth: int = 0                           # Recursion depth (0 = root)
    parent_id: str | None = None             # Parent agent ID

    # Event-specific data
    data: dict[str, Any] = field(...)        # Arbitrary event payload

    # Metrics
    tokens_in: int | None = None             # Input tokens
    tokens_out: int | None = None            # Output tokens
    duration_ms: float | None = None         # Duration in milliseconds

Field	Type	Description
`event_type`	`TrajectoryEventType`	Category of the event
`timestamp`	`float`	Unix timestamp (from `time.time()`)
`run_id`	`str`	Correlates events within a single run
`iteration`	`int \\| None`	Which iteration this event belongs to
`depth`	`int`	Recursion depth (0 for root agent, 1+ for children)
`parent_id`	`str \\| None`	ID of the parent agent (for child events)
`data`	`dict[str, Any]`	Event-specific payload
`tokens_in`	`int \\| None`	Input token count
`tokens_out`	`int \\| None`	Output token count
`duration_ms`	`float \\| None`	Duration in milliseconds

Serialization¶

event = TrajectoryEvent(
    event_type=TrajectoryEventType.ITERATION_CODE,
    run_id="run_abc123",
    iteration=2,
    data={"code": "print(len(context))"},
)

d = event.to_dict()
# {"event_type": "iteration_code", "timestamp": 1706400001.5,
#  "run_id": "run_abc123", "iteration": 2,
#  "data": {"code": "print(len(context))"}}

Deserialization¶

event = TrajectoryEvent.from_dict({
    "event_type": "iteration_output",
    "timestamp": 1706400002.1,
    "run_id": "run_abc123",
    "iteration": 2,
    "data": {"output": "45230"},
    "duration_ms": 15.3,
})

`TrajectoryLogger`¶

JSONL trajectory logger for RLM execution. Provides convenience methods for logging every event type.

class TrajectoryLogger:
    def __init__(
        self,
        output_path: str | Path,
        run_id: str | None = None,
        metadata: dict[str, Any] | None = None,
    ):

Parameter	Type	Default	Description
`output_path`	`str \\| Path`	required	Path to the JSONL output file
`run_id`	`str \\| None`	Auto-generated	Run identifier (default: `run_{timestamp_ms}`)
`metadata`	`dict[str, Any] \\| None`	`None`	Arbitrary metadata stored with `run_start`

Directory Creation

The logger automatically creates parent directories if they do not exist.

Context Manager Support¶

with TrajectoryLogger("traces/run_001.jsonl") as logger:
    logger.log_run_start(task="Analyze data")
    # ... log events ...
    logger.log_run_end(success=True, answer="42")

Logging Methods¶

`log_event(event)`¶

Log a raw TrajectoryEvent. Automatically sets run_id, iteration, depth, and parent_id from logger state.

`log_run_start(task, context_length=None, model=None)`¶

Log the beginning of a run.

logger.log_run_start(
    task="Analyze sentiment of customer reviews",
    context_length=45230,
    model="gpt-4o",
)

`log_run_end(success, answer=None, total_tokens=None, duration_seconds=None)`¶

Log the end of a run.

logger.log_run_end(
    success=True,
    answer="Overall sentiment is positive (87% confidence)",
    total_tokens=8523,
)

`log_iteration_start(iteration)`¶

Log the start of an iteration. Updates the logger's current iteration counter.

`log_iteration(iteration, reasoning, code, output, duration_ms=None, tokens_used=None)`¶

Convenience method to log a complete iteration (reasoning + code + output) in one call.

logger.log_iteration(
    iteration=1,
    reasoning="First, explore the data structure",
    code='print(f"Length: {len(context)}")',
    output="Length: 45230",
    duration_ms=15.3,
)

`log_llm_call(prompt, response, tokens_in=None, tokens_out=None, duration_ms=None, is_sub_llm=False)`¶

Log an LLM call (root or sub-LLM).

# Root LLM call
logger.log_llm_call(
    prompt="Analyze this data...",
    response="The data shows...",
    tokens_in=500,
    tokens_out=200,
    duration_ms=1200,
)

# Sub-LLM call from llm_query()
logger.log_llm_call(
    prompt="Summarize this chunk...",
    response="This chunk discusses...",
    is_sub_llm=True,
)

Response Truncation

Responses longer than 1,000 characters are truncated in the log to keep file sizes manageable.

`log_child_spawn(child_id, task, depth)`¶

Log a child agent being spawned.

`log_child_result(child_id, result, success)`¶

Log a child agent's result. Results are truncated to 500 characters.

`log_final(answer)`¶

Log final answer detection.

`log_context_load(context_type, length, preview=None)`¶

Log context being loaded. Preview is truncated to 200 characters.

`log_error(error, traceback=None)`¶

Log an error with optional traceback.

Depth Management¶

For recursive/child agent tracing:

logger.push_depth("child_agent_001")  # depth becomes 1
# ... log child events ...
logger.pop_depth()                     # depth returns to 0

`close()`¶

Close the underlying file handle. Called automatically by the context manager.

`TrajectoryViewer`¶

Viewer for trajectory JSONL files. Provides visualization and analysis.

class TrajectoryViewer:
    def __init__(self, trajectory_path: str | Path):

Events are loaded from the JSONL file on construction. Invalid lines are silently skipped.

Methods¶

`events() -> list[TrajectoryEvent]`¶

Get all loaded events.

`iterations() -> Iterator[list[TrajectoryEvent]]`¶

Yield events grouped by iteration number.

viewer = TrajectoryViewer("traces/run_001.jsonl")
for iteration_events in viewer.iterations():
    print(f"Iteration with {len(iteration_events)} events")

`summary() -> dict[str, Any]`¶

Get a comprehensive trajectory summary.

summary = viewer.summary()

Returns:

Key	Type	Description
`run_id`	`str`	Run identifier
`task`	`str`	Original task
`success`	`bool`	Whether the run succeeded
`answer`	`str`	Final answer
`total_events`	`int`	Total event count
`total_iterations`	`int`	Number of unique iterations
`max_depth`	`int`	Maximum recursion depth reached
`total_tokens_in`	`int`	Sum of all input tokens
`total_tokens_out`	`int`	Sum of all output tokens
`total_tokens`	`int`	`total_tokens_in + total_tokens_out`
`total_duration_ms`	`float`	Sum of all event durations
`event_counts`	`dict[str, int]`	Count of each event type

`format_tree() -> str`¶

Format the trajectory as a human-readable tree visualization.

print(viewer.format_tree())

Example output:

Trajectory: run_abc123
Task: Analyze sentiment of customer reviews...
Status: SUCCESS

[Iteration 1]
  THINK: First, let me explore the structure of the context dat...
  CODE: print(f"Context length: {len(context)} chars")...
  OUTPUT: Context length: 45230 chars... (15ms)
[Iteration 2]
  THINK: Now let me process chunks using llm_query_batched...
  CODE: chunks = [context[i:i+10000] for i in range(0, len(cont...
  OUTPUT: Processed 5 chunks successfully... (3200ms)
  -> SUB_LLM_CALL
[Iteration 3]
  THINK: Aggregate summaries and produce final answer...
  CODE: final = llm_query(f"Combine: {summaries}")...
  FINAL: Overall sentiment is positive (87% confidence)...

Summary: 3 iterations, 8523 tokens, 4500ms

`export_html(output_path)`¶

Export the trajectory as an interactive HTML file with a dark theme, collapsible iterations, and syntax-highlighted code blocks.

viewer.export_html("traces/run_001.html")

The HTML includes:

Header with run metadata and summary metrics (status, iterations, tokens, duration, depth)
Timeline with collapsible iteration sections
Color-coded events (reasoning in orange, code in monospace, output in green, finals in teal, errors in red)
Click-to-expand iterations (first iteration expanded by default)
Sub-LLM and child events indented to show nesting

Standalone Functions¶

`load_trajectory(path) -> TrajectoryViewer`¶

Convenience function to load a trajectory file for viewing.

from rlm_code.rlm.trajectory import load_trajectory

viewer = load_trajectory("traces/run_001.jsonl")
print(viewer.summary())

`compare_trajectories(paths) -> dict[str, Any]`¶

Compare multiple trajectory files and compute aggregate statistics.

from rlm_code.rlm.trajectory import compare_trajectories

comparison = compare_trajectories([
    "traces/run_001.jsonl",
    "traces/run_002.jsonl",
    "traces/run_003.jsonl",
])

Returns:

{
    "trajectories": [
        {
            "path": "traces/run_001.jsonl",
            "run_id": "run_001",
            "task": "Analyze sentiment...",
            "success": True,
            "iterations": 3,
            "tokens": 8523,
            "duration_ms": 4500,
        },
        # ... more trajectories ...
    ],
    "comparison": {
        "avg_iterations": 3.7,
        "avg_tokens": 9100,
        "avg_duration_ms": 5200,
        "success_rate": 0.67,  # 2 out of 3 succeeded
    },
}

JSONL Format Specification¶

Each line in a trajectory JSONL file is a valid JSON object with the following structure:

Required Fields¶

Field	Type	Description
`event_type`	`string`	One of the `TrajectoryEventType` values
`timestamp`	`float`	Unix timestamp
`run_id`	`string`	Run identifier

Optional Fields¶

Field	Type	Description
`iteration`	`integer`	Iteration number
`depth`	`integer`	Recursion depth (omitted when 0)
`parent_id`	`string`	Parent agent ID
`data`	`object`	Event-specific payload
`tokens_in`	`integer`	Input tokens
`tokens_out`	`integer`	Output tokens
`duration_ms`	`float`	Duration in milliseconds

Example Complete Trajectory¶

{"event_type": "run_start", "timestamp": 1706400000.0, "run_id": "run_001", "data": {"task": "Analyze sentiment", "context_length": 45230, "model": "gpt-4o"}}
{"event_type": "iteration_start", "timestamp": 1706400000.1, "run_id": "run_001", "iteration": 1}
{"event_type": "iteration_reasoning", "timestamp": 1706400001.0, "run_id": "run_001", "iteration": 1, "data": {"reasoning": "Explore context structure"}}
{"event_type": "iteration_code", "timestamp": 1706400001.1, "run_id": "run_001", "iteration": 1, "data": {"code": "print(len(context))"}}
{"event_type": "iteration_output", "timestamp": 1706400001.5, "run_id": "run_001", "iteration": 1, "data": {"output": "45230"}, "duration_ms": 15}
{"event_type": "sub_llm_request", "timestamp": 1706400002.0, "run_id": "run_001", "iteration": 2, "data": {"prompt": "Summarize..."}, "tokens_in": 500}
{"event_type": "sub_llm_response", "timestamp": 1706400003.5, "run_id": "run_001", "iteration": 2, "data": {"response": "This discusses..."}, "tokens_out": 200, "duration_ms": 1500}
{"event_type": "final_detected", "timestamp": 1706400005.0, "run_id": "run_001", "iteration": 3, "data": {"answer": "Sentiment is positive"}}
{"event_type": "run_end", "timestamp": 1706400005.1, "run_id": "run_001", "data": {"success": true, "answer": "Sentiment is positive", "total_iterations": 3}, "duration_ms": 5100}

Complete Usage Example¶

from rlm_code.rlm.trajectory import TrajectoryLogger, TrajectoryViewer

# --- Logging ---
with TrajectoryLogger("traces/my_run.jsonl", metadata={"model": "gpt-4o"}) as logger:
    logger.log_run_start(task="Analyze data", context_length=50000)

    logger.log_iteration(
        iteration=1,
        reasoning="Explore data structure",
        code='print(type(context), len(context))',
        output="<class 'str'> 50000",
        duration_ms=12,
    )

    logger.log_llm_call(
        prompt="Summarize the first section...",
        response="The first section covers...",
        tokens_in=600,
        tokens_out=150,
        duration_ms=1100,
        is_sub_llm=True,
    )

    logger.log_final("Analysis complete: found 3 key themes")
    logger.log_run_end(success=True, answer="3 key themes", total_tokens=2500)

# --- Viewing ---
viewer = TrajectoryViewer("traces/my_run.jsonl")
print(viewer.format_tree())
viewer.export_html("traces/my_run.html")

summary = viewer.summary()
print(f"Run {summary['run_id']}: {summary['total_iterations']} iterations, "
      f"{summary['total_tokens']} tokens")

Trajectory Logging¶

Overview¶

Classes¶

TrajectoryEventType¶

TrajectoryEvent¶

Serialization¶

Deserialization¶

TrajectoryLogger¶

Context Manager Support¶

Logging Methods¶

log_event(event)¶

log_run_start(task, context_length=None, model=None)¶

log_run_end(success, answer=None, total_tokens=None, duration_seconds=None)¶

log_iteration_start(iteration)¶

log_iteration(iteration, reasoning, code, output, duration_ms=None, tokens_used=None)¶

log_llm_call(prompt, response, tokens_in=None, tokens_out=None, duration_ms=None, is_sub_llm=False)¶

log_child_spawn(child_id, task, depth)¶

log_child_result(child_id, result, success)¶

log_final(answer)¶

log_context_load(context_type, length, preview=None)¶

log_error(error, traceback=None)¶