
Trace Analysis

rlm-code includes a HALO-style trace analysis environment for diagnosing agent harness failures from one-span-per-line JSONL traces.

The environment is named trace_analysis. It indexes a trace file into a sidecar cache, exposes bounded trace-inspection actions to the RLM planner, and keeps large payloads under control by returning summaries or selected spans instead of blindly loading full traces into context.
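The sidecar-index idea can be sketched as follows. This is a minimal illustration and not rlm-code's actual implementation: the function names, the `.index.json` suffix, and the `max_spans` bound are all hypothetical. It records the byte offset of every span line per `trace_id`, so a later read can seek to the selected spans instead of loading the whole file.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_offset_index(trace_path):
    """Map each trace_id to the byte offsets of its span lines."""
    index = defaultdict(list)
    with open(trace_path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break  # end of file
            if not line.strip():
                continue  # tolerate blank lines
            span = json.loads(line)
            index[span["trace_id"]].append(offset)
    return dict(index)


def read_spans(trace_path, index, trace_id, max_spans=50):
    """Load at most max_spans spans of one trace by seeking to stored offsets."""
    spans = []
    with open(trace_path, "rb") as f:
        for offset in index.get(trace_id, [])[:max_spans]:
            f.seek(offset)
            spans.append(json.loads(f.readline()))
    return spans


def save_index(index, trace_path):
    """Persist the index next to the trace file (hypothetical sidecar name)."""
    sidecar = Path(str(trace_path) + ".index.json")
    sidecar.write_text(json.dumps(index))
    return sidecar
```

Keeping only offsets in the index means its size grows with the number of spans rather than with payload size, which is the property a bounded inspection action relies on.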

It can also export an AHE-style layered evidence corpus for downstream coding agents or MetaHarness: a benchmark-level overview.md, one detail report per selected trace, an index.json, and optional processed raw JSONL span files for drill-down.

Usage

/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6

The task can include either trace=<path> or trace_path=<path>. The planner can also explicitly load a file with the set_trace_path action.
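As a rough illustration of the two accepted spellings, the sketch below pulls a path out of a task string. It is an assumption about how such parsing could work, not the parser rlm-code ships: `extract_trace_path` is a hypothetical helper, and paths containing spaces are not handled.

```python
import re

# Matches either trace=<path> or trace_path=<path>; the path ends at whitespace.
_TRACE_ARG = re.compile(r"\b(?:trace_path|trace)=(\S+)")


def extract_trace_path(task):
    """Return the trace path embedded in a task string, or None if absent."""
    match = _TRACE_ARG.search(task)
    return match.group(1) if match else None


print(extract_trace_path("Find systemic harness failures trace=./traces.jsonl"))
```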

Actions

The environment supports these planner actions:

| Action | Purpose |
| --- | --- |
| set_trace_path | Load and index a trace JSONL file |
| get_dataset_overview | Return dataset-level trace, span, service, model, agent, token, and error counts |
| query_traces | List matching trace summaries with pagination |
| count_traces | Count matching traces without materializing summaries |
| view_trace | Read all spans for a small trace, or return a summary when the trace is oversized |
| search_trace | Search one trace for a literal substring |
| view_spans | Read selected spans at a higher per-attribute cap |
| export_evidence_corpus | Write layered evidence files for downstream harness optimization |
| final | Return the final evidence report |
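To make the ideas of a per-attribute cap and a literal substring search concrete, here is a small sketch. Both helpers are hypothetical, and the truncation marker and the choice to search the serialized span are assumptions rather than documented behavior of view_spans or search_trace.

```python
import json


def cap_attributes(span, max_chars):
    """Return a copy of the span with long string attributes truncated."""
    capped = dict(span)
    capped["attributes"] = {
        key: value[:max_chars] + "...[truncated]"
        if isinstance(value, str) and len(value) > max_chars
        else value
        for key, value in span.get("attributes", {}).items()
    }
    return capped


def search_spans(spans, needle):
    """Return the span_ids whose serialized JSON contains the literal needle."""
    return [
        span["span_id"]
        for span in spans
        if needle in json.dumps(span, ensure_ascii=False)
    ]
```

Searching the serialized form is a simplification: a needle containing characters that JSON escapes, such as double quotes, would need to be escaped the same way to match.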

Supported filters are has_errors, model_names, service_names, agent_names, and project_id.
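The sketch below shows one plausible reading of how these filters could combine. The semantics are assumptions: every supplied filter must match (AND), and a list-valued filter matches when any of its values appears in the trace. The shape of the `summary` dict is also hypothetical.

```python
def matches_filters(summary, filters):
    """Return True when a trace summary satisfies every supplied filter."""
    # Assumed semantics: has_errors is a boolean that must match exactly.
    if "has_errors" in filters and summary.get("has_errors") != filters["has_errors"]:
        return False
    # Assumed semantics: list filters match when any wanted value is present.
    for key in ("model_names", "service_names", "agent_names"):
        wanted = filters.get(key)
        if wanted and not set(wanted) & set(summary.get(key, [])):
            return False
    if "project_id" in filters and summary.get("project_id") != filters["project_id"]:
        return False
    return True
```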

Evidence Corpus Export

Use export_evidence_corpus when a report should be handed to another coding agent or passed to MetaHarness through its --trace-evidence option.

Planner action shape:

{
  "action": "export_evidence_corpus",
  "output_dir": "./trace-evidence",
  "filters": {"has_errors": true},
  "limit": 100,
  "include_raw": true
}

The output directory contains:

  • overview.md: compact entry point with dataset counts and links to detail files
  • detail/<trace-id>.md: per-trace summary, task ids, error spans, and tool-like spans
  • raw/<trace-id>.jsonl: processed selected raw spans for drill-down when include_raw is true
  • index.json: machine-readable corpus metadata and trace file references
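A downstream consumer might want to confirm that an exported directory has the layout listed above before using it. The helper below is a hypothetical sketch that checks only file names and locations; it makes no assumption about the contents of index.json, whose schema is not described here.

```python
from pathlib import Path


def describe_corpus(output_dir):
    """Report which of the expected evidence files exist under output_dir."""
    root = Path(output_dir)
    return {
        "has_overview": (root / "overview.md").is_file(),
        "has_index": (root / "index.json").is_file(),
        # Trace ids are taken from the file names, per the layout above.
        "detail_reports": sorted(p.stem for p in (root / "detail").glob("*.md")),
        "raw_traces": sorted(p.stem for p in (root / "raw").glob("*.jsonl")),
    }
```

Comparing `detail_reports` with `raw_traces` is a quick way to see whether the export was run with include_raw enabled.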

For MetaHarness, pass the generated overview directly:

uv run metaharness run ./my-harness \
  --trace-evidence ./trace-evidence/overview.md

Trace Shape

The first implementation expects one JSON object per line. Each line should represent one span with fields such as:

{
  "trace_id": "trace-1",
  "span_id": "span-1",
  "parent_span_id": null,
  "name": "agent.Root",
  "kind": "SPAN_KIND_INTERNAL",
  "start_time": "2026-01-01T00:00:00Z",
  "end_time": "2026-01-01T00:00:01Z",
  "status": {"code": "STATUS_CODE_ERROR"},
  "resource": {"attributes": {"service.name": "my-agent"}},
  "attributes": {
    "inference.project_id": "my-project",
    "inference.agent_name": "Root",
    "inference.llm.model_name": "gpt-test"
  }
}

This is intentionally compatible with the HALO/OpenTelemetry-style file export pattern where trace data is stored as JSONL and queried through a sidecar index.
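To show how dataset-level counts can be derived from this span shape, here is a minimal sketch that aggregates over JSONL lines. The field paths come from the example span above; the function name and the returned keys are hypothetical and are not the output format of get_dataset_overview.

```python
import json
from collections import Counter


def summarize_spans(lines):
    """Aggregate trace, span, error, service, model, and agent counts."""
    trace_ids, error_traces = set(), set()
    services, models, agents = Counter(), Counter(), Counter()
    span_count = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        span = json.loads(line)
        span_count += 1
        trace_ids.add(span["trace_id"])
        if (span.get("status") or {}).get("code") == "STATUS_CODE_ERROR":
            error_traces.add(span["trace_id"])
        resource_attrs = (span.get("resource") or {}).get("attributes") or {}
        attrs = span.get("attributes") or {}
        # Count a value only when the span actually carries it.
        for counter, value in (
            (services, resource_attrs.get("service.name")),
            (models, attrs.get("inference.llm.model_name")),
            (agents, attrs.get("inference.agent_name")),
        ):
            if value is not None:
                counter[value] += 1
    return {
        "traces": len(trace_ids),
        "spans": span_count,
        "traces_with_errors": len(error_traces),
        "services": dict(services),
        "models": dict(models),
        "agents": dict(agents),
    }
```

Because the function accepts any iterable of lines, it can be fed an open file handle and never needs the full trace file in memory.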