Context, Compaction, and Pruning
The context for a run is the message stack provided to the model. Context is bounded by the model's context window (token limit), so long-running sessions use compaction and pruning to stay within limits without losing safety-critical information.
Context stack
Typical layers:
- System prompt (rules, tools, skills, runtime, injected files)
- Conversation history (user + assistant messages)
- Tool calls/results and attachments (command output, files, images/audio)
Tool schemas (contracts) are also part of what the model receives and therefore count toward the context window even though they are not plain-text history.
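As a rough illustration of why tool schemas matter for budgeting, the sketch below tallies an estimated token count per layer. The function names and the four-characters-per-token heuristic are assumptions for illustration only; a real gateway would use the provider's tokenizer.

```python
import json
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4) if text else 0


def context_usage(
    system_prompt: str,
    history: List[str],
    tool_results: List[str],
    tool_schemas: List[dict],
) -> Dict[str, int]:
    """Return estimated tokens per context layer, plus a total."""
    usage = {
        "system_prompt": estimate_tokens(system_prompt),
        "history": sum(estimate_tokens(m) for m in history),
        "tool_results": sum(estimate_tokens(r) for r in tool_results),
        # Schemas are serialized and sent with the request, so they are counted too.
        "tool_schemas": sum(
            estimate_tokens(json.dumps(s, sort_keys=True)) for s in tool_schemas
        ),
    }
    usage["total"] = sum(usage.values())
    return usage
```

Comparing `usage["total"]` against the model's window gives a simple signal for when compaction or pruning should run.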
Context reports (inspectability)
For observability, the gateway produces a per-run context report that captures:
- injected workspace files (raw vs injected sizes)
- system prompt section sizes
- largest tool schema contributors
- recent-history and tool-result contributions
Operator clients expose the report via /context list and /context detail (see Observability).
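One possible shape for such a report is sketched below. The class and field names are hypothetical and only mirror the items listed above; they are not the gateway's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InjectedFile:
    path: str
    raw_chars: int       # size of the file on disk
    injected_chars: int  # size after truncation for the prompt


@dataclass
class ContextReport:
    injected_files: List[InjectedFile] = field(default_factory=list)
    system_prompt_sections: Dict[str, int] = field(default_factory=dict)
    tool_schema_sizes: Dict[str, int] = field(default_factory=dict)
    history_chars: int = 0
    tool_result_chars: int = 0

    def largest_tool_schemas(self, n: int = 3) -> List[Tuple[str, int]]:
        """Return the n largest schemas, biggest first (ties broken by name)."""
        ranked = sorted(
            self.tool_schema_sizes.items(), key=lambda kv: (-kv[1], kv[0])
        )
        return ranked[:n]
```

A list-style view could print only the totals, while a detail view could expand `injected_files` and `largest_tool_schemas()`.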
Compaction
When the session approaches the context limit, older history is compacted into a summary that preserves safety-relevant and task-relevant facts.
Compaction should:
- Keep approvals, constraints, and user preferences intact.
- Preserve key decisions and unfinished threads.
- Avoid inventing facts or deleting obligations.
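The rules above can be sketched as follows. This is a minimal illustration, not the system's implementation: the `pinned` flag, the 80% threshold, and the caller-supplied `summarize` function (in practice likely a model call) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical marker for approvals, constraints, preferences


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token), assumed for illustration.
    return len(text) // 4


def compact(
    messages: List[Message],
    limit_tokens: int,
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
    threshold: float = 0.8,
) -> List[Message]:
    """Summarize older, unpinned messages once usage nears the limit."""
    used = sum(estimate_tokens(m.text) for m in messages)
    split = len(messages) - keep_recent
    if used < threshold * limit_tokens or split <= 0:
        return messages  # still within budget, or nothing old enough to compact

    older, recent = messages[:split], messages[split:]
    pinned = [m for m in older if m.pinned]         # carried over verbatim
    droppable = [m for m in older if not m.pinned]  # folded into the summary
    if not droppable:
        return messages

    summary = Message("system", "Summary of earlier conversation: " + summarize(droppable))
    return [summary] + pinned + recent
```

Keeping pinned messages verbatim, rather than trusting the summary to restate them, is one way to avoid deleting obligations; the summarizer itself still has to be constrained not to invent facts.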
Compaction vs long-term memory
Session compaction is a prompt-level optimization; it is not a long-term memory system.
- The compaction summary exists to keep ongoing work safe and coherent within a bounded context window.
- Long-term memory lives in the StateStore (agent-scoped) and is retrieved as a budgeted digest for each turn (see Memory).
At compaction boundaries, the system MAY trigger consolidation workflows that promote durable lessons (facts/preferences/procedures) out of ephemeral context into long-term memory. These workflows are budget-driven and auditable, and must not silently “remember” sensitive content.
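A consolidation pass along these lines might look like the sketch below. The `Lesson` type, the `sensitive` flag, and the item-count budget are hypothetical; how sensitivity is detected and how the StateStore is written are outside this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Lesson:
    kind: str   # "fact", "preference", or "procedure"
    text: str
    sensitive: bool = False  # assumed to be set by an upstream classifier or policy


def consolidate(
    candidates: List[Lesson], budget: int
) -> Tuple[List[Lesson], List[Tuple[str, str]]]:
    """Pick lessons to promote and return an audit trail of every decision."""
    promoted: List[Lesson] = []
    audit: List[Tuple[str, str]] = []
    for lesson in candidates:
        if lesson.sensitive:
            # Record the decision, but never the sensitive text itself.
            audit.append(("skipped_sensitive", lesson.kind))
        elif len(promoted) >= budget:
            audit.append(("skipped_budget", lesson.kind))
        else:
            promoted.append(lesson)
            audit.append(("promoted", lesson.kind))
    return promoted, audit
```

Every candidate produces exactly one audit entry, so an operator can see what was promoted and what was withheld, and why.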
Pruning (tool-result trimming)
Pruning reduces context bloat by trimming or clearing older tool results in the prompt for a single run while leaving the durable transcript intact.
Pruning:
- applies only to tool-result messages (never to user or assistant turns)
- is deterministic and policy-controlled
- is designed to improve cost and cache behavior for providers that support prompt caching
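A minimal sketch of such a pruning pass is shown below, assuming messages are plain dicts with `role` and `content` keys and that the policy is "keep the last N tool results whole, truncate older ones". The parameter names and the trim marker are illustrative, not the system's actual policy settings.

```python
from typing import Dict, List


def prune_tool_results(
    messages: List[Dict[str, str]],
    keep_last: int = 2,
    max_chars: int = 200,
) -> List[Dict[str, str]]:
    """Return a pruned copy for the prompt; the input transcript is not modified."""
    tool_indexes = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    protected = set(tool_indexes[-keep_last:]) if keep_last > 0 else set()

    pruned = []
    for i, m in enumerate(messages):
        is_old_tool = m["role"] == "tool" and i not in protected
        if is_old_tool and len(m["content"]) > max_chars:
            omitted = len(m["content"]) - max_chars
            # Build a new dict so the durable transcript entry stays intact.
            m = {**m, "content": m["content"][:max_chars] + f"\n[... {omitted} chars trimmed]"}
        pruned.append(m)
    return pruned
```

Because the output depends only on the transcript and the policy values, the same inputs always yield the same prompt prefix, which is what lets provider-side prompt caches keep hitting across turns.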