Context API
The context API retrieves specification memory and optimizes it to fit an agent's token budget.
Use it when you need to serve context to an agent through an API, MCP tool, web UI, or custom orchestration layer.
Components
| Component | Purpose |
|---|---|
| `TokenEstimator` | Counts tokens and estimates format overhead |
| `ContextOptimizer` | Sorts, truncates, and fits memory chunks into a budget |
| `ContextFormatter` | Formats optimized chunks as JSON, Markdown, or text |
| `ProfileManager` | Applies per-agent context preferences |
| `StreamingContextAPI` | Synchronous and async streaming context retrieval |
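This page does not describe how `TokenEstimator` counts tokens. As a rough illustration of the idea (content tokens plus a per-format overhead), here is a minimal sketch that assumes a four-characters-per-token rule of thumb. The function names and the `FORMAT_OVERHEAD` values are hypothetical and are not taken from specmem.

```python
import math

# Hypothetical per-chunk overhead, in tokens, for each output format.
# These numbers are illustrative assumptions, not specmem's values.
FORMAT_OVERHEAD = {"json": 12, "markdown": 6, "text": 2}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate a token count from character length (a common rule of thumb)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def estimate_with_overhead(texts: list[str], fmt: str) -> int:
    """Estimate total tokens for chunks rendered in the given format."""
    content = sum(estimate_tokens(t) for t in texts)
    return content + FORMAT_OVERHEAD[fmt] * len(texts)
```

A real estimator would more likely use the target model's tokenizer; the heuristic above only shows why the formatted payload can cost more tokens than the raw chunk text.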
Synchronous Context
```python
from specmem.context import StreamingContextAPI

# memory_bank is assumed to be an existing memory bank created elsewhere.
api = StreamingContextAPI(memory_bank, default_budget=4000)

response = api.get_context(
    query="authentication requirements and impacted tests",
    token_budget=4000,
    format="markdown",
    top_k=20,
)

print(response.formatted_content)
print(response.total_tokens)
```
Streaming Context
```python
import asyncio

from specmem.context import ContextChunk, StreamCompletion, StreamingContextAPI

# memory_bank is assumed to be an existing memory bank created elsewhere.
api = StreamingContextAPI(memory_bank)


async def main():
    # `async for` must run inside a coroutine.
    async for item in api.stream_query(
        "payment retry behavior",
        token_budget=3000,
        format="json",
        timeout_ms=1500,
    ):
        if isinstance(item, ContextChunk):
            print(item.text)
        elif isinstance(item, StreamCompletion):
            print(item.to_dict())


asyncio.run(main())
```
Optimization Rules
The optimizer prioritizes:
- pinned memory
- higher relevance scores
- complete chunks over truncated chunks
- sentence-boundary truncation when a chunk is too large
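One way to read the priorities above is as a two-pass fit: take complete chunks in priority order first, then spend any leftover budget on sentence-boundary truncations. The sketch below illustrates that reading. It is not specmem's implementation; the `Chunk` class, the whitespace-based `count_tokens`, and the function names are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical stand-in for a memory chunk; field names are assumptions.
    text: str
    score: float
    pinned: bool = False


def count_tokens(text: str) -> int:
    # Crude whitespace tokenizer, used only to keep the sketch self-contained.
    return len(text.split())


def truncate_at_sentence(text: str, budget: int) -> str:
    """Keep as many whole leading sentences as fit within the budget."""
    kept, used = [], 0
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        cost = count_tokens(sentence)
        if used + cost > budget:
            break
        kept.append(sentence)
        used += cost
    return " ".join(kept)


def fit_to_budget(chunks, budget):
    """Return (selected chunks, tokens used, number of truncated chunks)."""
    # Pinned chunks first, then descending relevance score.
    ordered = sorted(chunks, key=lambda c: (not c.pinned, -c.score))
    selected, skipped, used = [], [], 0
    # Pass 1: take complete chunks in priority order.
    for chunk in ordered:
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
        else:
            skipped.append(chunk)
    # Pass 2: spend leftover budget on sentence-boundary truncations.
    truncated = 0
    for chunk in skipped:
        short = truncate_at_sentence(chunk.text, budget - used)
        if short:
            selected.append(Chunk(short, chunk.score, chunk.pinned))
            used += count_tokens(short)
            truncated += 1
    return selected, used, truncated
```

With this reading, a high-scoring chunk that is too large never crowds out a smaller chunk that fits whole; it only receives whatever budget remains afterwards, and it is dropped entirely if not even its first sentence fits.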
The response includes:
| Field | Description |
|---|---|
| `chunks` | Optimized context chunks |
| `total_tokens` | Token count after optimization |
| `token_budget` | Budget used for the request |
| `truncated_count` | Number of chunks shortened to fit |
| `formatted_content` | Rendered JSON, Markdown, or text payload |
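These fields are enough to log how much of the budget a request consumed. The following sketch uses a hypothetical `ContextResponseSketch` dataclass that only mirrors the field names in the table; the real response class and its chunk type may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ContextResponseSketch:
    # Hypothetical stand-in that mirrors the documented fields only.
    chunks: list = field(default_factory=list)
    total_tokens: int = 0
    token_budget: int = 0
    truncated_count: int = 0
    formatted_content: str = ""


def summarize(response) -> str:
    """Build a one-line budget summary suitable for logs."""
    if response.token_budget:
        pct = 100 * response.total_tokens / response.token_budget
    else:
        pct = 0
    return (
        f"{len(response.chunks)} chunks, "
        f"{response.total_tokens}/{response.token_budget} tokens ({pct:.0f}%), "
        f"{response.truncated_count} truncated"
    )
```

Because `summarize` only reads attributes, it should work on any object that exposes the same five field names.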
Agent Profiles
Pass `profile` to apply a stored agent profile.
Profiles can set token budgets, preferred output format, and type filters. This lets different coding agents share the same memory bank while receiving context in the shape they work best with.
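To make the idea concrete, the sketch below shows one way a stored profile could be merged with per-request arguments. Everything in it is hypothetical: the `AgentProfile` shape, the profile names and values, and the assumption that explicit request arguments take precedence over profile defaults. Check the `ProfileManager` reference for the actual behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentProfile:
    # Hypothetical profile shape based on the settings listed above.
    token_budget: int
    format: str
    type_filters: tuple = ()


# Illustrative profiles; the names and values are made up for this sketch.
PROFILES = {
    "reviewer": AgentProfile(
        token_budget=6000,
        format="markdown",
        type_filters=("requirement", "test"),
    ),
    "planner": AgentProfile(token_budget=2000, format="json"),
}


def resolve_settings(
    profile: str,
    token_budget: Optional[int] = None,
    format: Optional[str] = None,
) -> dict:
    """Merge a stored profile with per-request overrides (assumed precedence)."""
    base = PROFILES[profile]
    return {
        "token_budget": token_budget if token_budget is not None else base.token_budget,
        "format": format or base.format,
        "type_filters": list(base.type_filters),
    }
```

Under these assumptions, `resolve_settings("planner", token_budget=1000)` keeps the planner's JSON format but lowers the budget for that single request.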