Building a Memory Layer for Coding Agents
SpecMem can be used as a practical external memory layer for coding agents. It turns repository knowledge, specifications, coding guidelines, Kiro configuration, and previous sessions into queryable context that an agent can retrieve before it edits code.
This guide is written for developers building agent memory systems with vector search, hybrid retrieval, and structured specifications.
What SpecMem Stores
SpecMem normalizes several kinds of project knowledge into memory:
| Source | Stored As | Why It Matters |
|---|---|---|
| .kiro/specs/, SpecKit, Tessl, Claude, Cursor, Codex, Gemini, Warp, Factory | SpecBlock records in SpecIR | Keeps requirements, design intent, and tasks portable across agents |
| Kiro steering files, hooks, and MCP config | Structured Kiro config records | Preserves how the workspace expects agents to behave |
| Coding guidelines | Pinned or searchable constraints | Keeps style, architecture, and team rules in context |
| Kiro coding sessions | Indexed session records | Lets agents recover prior decisions and debugging history |
| Impact graph and coverage data | Relationships between specs, files, and tests | Supports targeted retrieval and selective test runs |
| Lifecycle metadata | Active, deprecated, legacy, obsolete status | Prevents stale specs from dominating results |
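To make the table concrete, the sketch below models one normalized memory record in plain Python. The class name `MemoryRecord`, its field names, the `Lifecycle` enum, and the example spec path are assumptions for illustration; they are not SpecMem's actual SpecIR schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lifecycle(Enum):
    # Status values named in the lifecycle metadata row of the table
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    LEGACY = "legacy"
    OBSOLETE = "obsolete"


@dataclass
class MemoryRecord:
    """Hypothetical normalized record; every field name is illustrative."""
    record_id: str
    kind: str          # e.g. "requirement", "design", "task", "guideline"
    source_path: str   # file the record was extracted from
    text: str
    pinned: bool = False
    lifecycle: Lifecycle = Lifecycle.ACTIVE
    linked_files: list[str] = field(default_factory=list)


# Example record; the spec path is a made-up location under .kiro/specs/
record = MemoryRecord(
    record_id="auth-req-1",
    kind="requirement",
    source_path=".kiro/specs/auth/requirements.md",
    text="Sessions expire after 30 minutes of inactivity.",
    pinned=True,
    linked_files=["src/auth/service.py"],
)
print(record.kind, record.lifecycle.value, record.pinned)
```

Keeping source path, lifecycle state, and file links on the record itself is what lets later stages filter and link memory without re-reading the original files.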
Memory Architecture
flowchart LR
A[Specs and agent files] --> B[Adapters]
C[Kiro config and sessions] --> B
D[Coding guidelines] --> B
B --> E[SpecIR]
E --> F[Memory Bank]
E --> G[Impact Graph]
E --> H[Lifecycle and Validation]
F --> I[Vector Store]
I --> J[LanceDB, ChromaDB, Qdrant]
F --> K[Context Optimizer]
G --> K
H --> K
K --> L[CLI, MCP, Web UI, Python API]
The key idea is that memory is not only a vector database. The vector store finds semantically related records, while structured metadata controls what gets included, excluded, pinned, compressed, or linked to changed files.
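A minimal sketch of that division of labor, assuming toy three-dimensional embeddings and hypothetical record dictionaries (none of these names come from SpecMem): metadata decides which records are eligible and which are always included, and vector similarity only ranks what is left.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy records with hand-written "embeddings" (illustrative only)
records = [
    {"id": "auth-req", "vec": [0.9, 0.1, 0.0], "pinned": False, "status": "active"},
    {"id": "old-auth", "vec": [0.8, 0.2, 0.0], "pinned": False, "status": "obsolete"},
    {"id": "style-guide", "vec": [0.0, 0.1, 0.9], "pinned": True, "status": "active"},
    {"id": "billing", "vec": [0.1, 0.9, 0.1], "pinned": False, "status": "active"},
]


def select(query_vec, records, top_k=1):
    # Structured metadata decides what is eligible at all
    eligible = [r for r in records if r["status"] != "obsolete"]
    # Pinned records are always included, regardless of similarity
    pinned = [r for r in eligible if r["pinned"]]
    # Vector similarity ranks the remaining candidates
    rest = sorted(
        (r for r in eligible if not r["pinned"]),
        key=lambda r: cosine(query_vec, r["vec"]),
        reverse=True,
    )
    return [r["id"] for r in pinned + rest[:top_k]]


selected = select([1.0, 0.0, 0.0], records)
print(selected)
```

Note that `old-auth` is semantically close to the query but never reaches the result, while `style-guide` is included despite being unrelated to it.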
Retrieval Pattern
SpecMem uses a hybrid retrieval pattern:
- Semantic search finds relevant specs by embedding similarity.
- Pinned memory includes critical constraints even when the query would not rank them highly.
- Graph retrieval connects files to specs, specs to tests, and related project concepts.
- Lifecycle filtering suppresses obsolete memory and warns on deprecated memory.
- Token optimization ranks, truncates, and formats context so it fits an agent budget.
from specmem import SpecMemClient

sm = SpecMemClient()
# Retrieve context for the files about to change, within a 4000-token budget
bundle = sm.get_context_for_change(
    ["src/auth/service.py"],
    token_budget=4000,
)
print(bundle.to_markdown())
For agent integrations, expose this through MCP:
The agent can then call specmem_context, specmem_query, specmem_impact, specmem_validate, and related tools before making changes.
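The graph retrieval step in the pattern above can be pictured as a two-hop lookup from changed files to specs to tests. The sketch below uses plain dictionaries as a stand-in for the impact graph; the edge data and the `impact` function are hypothetical and do not reflect how SpecMem stores or queries its graph.

```python
# Hypothetical edges: file -> specs that govern it, spec -> tests that verify it
file_to_specs = {
    "src/auth/service.py": ["auth-requirements", "session-design"],
    "src/billing/invoice.py": ["billing-requirements"],
}
spec_to_tests = {
    "auth-requirements": ["tests/test_login.py"],
    "session-design": ["tests/test_login.py", "tests/test_session.py"],
    "billing-requirements": ["tests/test_invoice.py"],
}


def impact(changed_files):
    """Return the specs and tests reachable from a list of changed files."""
    specs, tests = [], []
    for path in changed_files:
        for spec in file_to_specs.get(path, []):
            if spec not in specs:
                specs.append(spec)
            for test in spec_to_tests.get(spec, []):
                if test not in tests:
                    tests.append(test)  # de-duplicate shared tests
    return specs, tests


specs, tests = impact(["src/auth/service.py"])
print(specs)
print(tests)
```

The same traversal supports both uses mentioned earlier: the spec list feeds targeted retrieval, and the test list feeds a selective test run.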
Qdrant for Production Memory
Use Qdrant when memory has to survive beyond a local developer machine, serve multiple agents, or scale to larger repositories.
Install the optional backend:
Use embedded Qdrant for local experiments:
Use Qdrant server or Qdrant Cloud for shared memory:
[vectordb]
backend = "qdrant"
path = ".specmem/qdrant"
[vectordb.qdrant]
url = "https://your-cluster.qdrant.io"
api_key = "${QDRANT_API_KEY}"
Qdrant gives SpecMem a production retrieval substrate while SpecMem keeps the agent-facing semantics: spec types, pinned constraints, lifecycle state, source paths, and audit history.
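The `${QDRANT_API_KEY}` placeholder in the configuration suggests the key is supplied through the environment rather than committed to the repository. How SpecMem resolves it is not described here; the sketch below only illustrates the general idea, assuming simple `${VAR}` substitution over an already-parsed config section. The `resolve_secrets` helper and the example key value are hypothetical.

```python
import os
from string import Template


def resolve_secrets(section, env=None):
    """Expand ${VAR} placeholders in string values.

    Raises KeyError if a referenced variable is not set, so a missing
    secret fails loudly instead of being sent as a literal placeholder.
    """
    env = os.environ if env is None else env
    return {
        key: Template(value).substitute(env) if isinstance(value, str) else value
        for key, value in section.items()
    }


# Values mirror the [vectordb.qdrant] table, already parsed into a dict
qdrant_section = {
    "url": "https://your-cluster.qdrant.io",
    "api_key": "${QDRANT_API_KEY}",
}
# An explicit env mapping is passed here so the example runs anywhere
resolved = resolve_secrets(qdrant_section, env={"QDRANT_API_KEY": "example-key"})
print(resolved["api_key"])
```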
Session Memory
Kiro session search turns past coding conversations into retrievable memory. This is useful when the architectural intent lives in a previous debugging or design session rather than in a formal spec.
specmem sessions config --auto --workspace-only
specmem sessions index --workspace-only
specmem sessions search "why did we choose qdrant" --days 30
specmem sessions view <session-id>
Use --robot when another tool or agent needs JSON:
Session search currently falls back to text search when no semantic session vector store is configured, so it works as a low-friction recovery path even before a full production setup.
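As a rough picture of what a text-search fallback with a day window can look like, the sketch below ranks session records by how many query terms they contain. The session records, their fields, and the scoring rule are assumptions for illustration, not SpecMem's implementation.

```python
from datetime import datetime, timedelta

# Hypothetical session records; the real session schema may differ
sessions = [
    {"id": "s1", "when": datetime(2025, 1, 20),
     "text": "We chose Qdrant for shared memory across agents."},
    {"id": "s2", "when": datetime(2025, 1, 25),
     "text": "Fixed flaky login test in the auth service."},
    {"id": "s3", "when": datetime(2024, 10, 1),
     "text": "Early notes comparing Qdrant and ChromaDB."},
]


def text_search(query, sessions, days, now):
    """Rank sessions from the last `days` days by number of matching query terms."""
    terms = set(query.lower().split())
    cutoff = now - timedelta(days=days)
    scored = []
    for s in sessions:
        if s["when"] < cutoff:
            continue  # outside the time window
        words = set(s["text"].lower().replace(".", " ").split())
        score = len(terms & words)
        if score:
            scored.append((score, s["id"]))
    return [sid for score, sid in sorted(scored, reverse=True)]


hits = text_search("why did we choose qdrant", sessions, days=30, now=datetime(2025, 2, 1))
print(hits)
```

Exact-term matching misses "chose" for "choose", which is the kind of gap a semantic session vector store closes.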
Context Optimization
Long-running coding tasks fail when the agent receives either too little context or too much undifferentiated context. SpecMem’s optimizer ranks memory and fits it to a token budget:
- pinned blocks first
- then higher semantic relevance
- then complete chunks before truncated chunks
- sentence-boundary truncation when a block is too large
- JSON, Markdown, or text formatting overhead included in the budget
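The ordering rules above can be sketched as a small packing routine. This is a simplified illustration under stated assumptions: tokens are approximated by whitespace-separated words, formatting overhead is a fixed cost per block, and the function names are hypothetical. It also handles truncation inline rather than deferring truncated chunks until all complete chunks are placed.

```python
import re


def count_tokens(text):
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word
    return len(text.split())


def truncate_to_sentences(text, budget):
    """Keep as many whole leading sentences as fit in `budget` tokens."""
    kept, used = [], 0
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        cost = count_tokens(sentence)
        if used + cost > budget:
            break
        kept.append(sentence)
        used += cost
    return " ".join(kept)


def pack(blocks, budget, overhead_per_block=2):
    """Fit blocks into `budget`: pinned first, then by descending relevance."""
    ordered = sorted(blocks, key=lambda b: (not b["pinned"], -b["score"]))
    out, remaining = [], budget
    for block in ordered:
        available = remaining - overhead_per_block  # reserve formatting overhead
        if available <= 0:
            break
        text = block["text"]
        if count_tokens(text) > available:
            text = truncate_to_sentences(text, available)
        if text:
            out.append((block["id"], text))
            remaining -= count_tokens(text) + overhead_per_block
    return out


blocks = [
    {"id": "design", "pinned": False, "score": 0.9,
     "text": "Tokens are signed with RS256. Keys rotate every 90 days. Rotation is automated."},
    {"id": "rule", "pinned": True, "score": 0.1,
     "text": "Never log credentials."},
]
packed = pack(blocks, budget=14)
print(packed)
```

With a budget of 14, the pinned rule is kept whole even though its relevance score is low, and the design block is cut back to its first sentence rather than mid-sentence.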
Use the streaming API when the agent UI or orchestration layer wants incremental context delivery:
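import asyncio

from specmem.context import StreamingContextAPI


async def main(memory_bank):
    # memory_bank is an already-initialized memory bank (setup not shown here)
    api = StreamingContextAPI(memory_bank, default_budget=4000)
    async for item in api.stream_query(
        "authentication requirements and test impact",
        profile="claude-code",
        timeout_ms=1500,
    ):
        print(item.to_dict())

# Run from synchronous code once a memory bank is available:
# asyncio.run(main(memory_bank))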
Agent profiles let each coding agent use its own context preferences without changing the underlying memory store.
Making Specs Stay Useful
A memory layer becomes noisy if stale specs never decay. SpecMem includes lifecycle tools to keep memory useful:
Use these before important agent work or in CI:
- validate detects contradictions, missing acceptance criteria, duplicates, and timeline issues.
- health scores spec quality and freshness.
- compress reduces verbose specs so they fit into context windows.
- prune archives stale or orphaned memory.
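To illustrate what "stale or orphaned" can mean in practice, the sketch below flags specs that have no linked files or that have not been updated within a threshold. The criteria, the 180-day default, the record fields, and the `find_prunable` name are all assumptions for illustration; SpecMem's own rules may differ.

```python
from datetime import date


def find_prunable(specs, today, max_age_days=180):
    """Flag specs that are orphaned (no linked files) or not updated recently."""
    prunable = []
    for spec in specs:
        age_days = (today - spec["updated"]).days
        if not spec["linked_files"]:
            prunable.append((spec["id"], "orphaned"))
        elif age_days > max_age_days:
            prunable.append((spec["id"], "stale"))
    return prunable


# Hypothetical spec metadata
specs = [
    {"id": "auth", "updated": date(2025, 1, 10), "linked_files": ["src/auth/service.py"]},
    {"id": "legacy-export", "updated": date(2024, 3, 1), "linked_files": ["src/export.py"]},
    {"id": "scratch", "updated": date(2025, 1, 30), "linked_files": []},
]
candidates = find_prunable(specs, today=date(2025, 2, 1))
print(candidates)
```

A check like this is cheap enough to run in CI, where it can report candidates without archiving anything automatically.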
Demo Flow for a Talk
For a live walkthrough of an agent memory layer:
- Initialize memory in a repo:
- Show semantic retrieval:
- Show repository-scale impact:
- Switch from local memory to Qdrant:
- Recover prior session context:
- Connect an agent through MCP and call specmem_context before editing code.
The story to emphasize is simple: vector search finds likely memory, structured specs preserve intent, and token-aware context optimization makes the memory usable by real coding agents.