Memory & Context
The problem
An agent needs to remember the conversation — but a model's context window is finite and every token costs money and latency. So there are really two problems hiding under "memory":
- Persistence — where do past turns live, so the agent can recall them across runs and restarts?
- Context assembly — how do you fit a long history into a small window before each model call, without losing what matters?
Ravi splits these cleanly: a HistoryProvider owns persistence; a CompactionPipeline owns context assembly. They meet in the ContextConfig you pass to an agent.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
graph LR
classDef store fill:#F3E5F5,stroke:#6A1B9A,color:#4A148C,font-weight:bold
classDef process fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
classDef llm fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
H[("HistoryProvider<br/>full transcript")]:::store -->|"get_messages(session_id)"| C["CompactionPipeline<br/>trim to fit"]:::process
C -->|"compacted window"| LLM["Model call"]:::llm
LLM -->|"new turns"| H

The full transcript always lives in the history provider. Compaction produces a view for the model; it never deletes the source.
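The loop in the diagram can be sketched in a few lines. This is a toy illustration, not Ravi's code: `store`, `compact`, `fake_model`, and `run_turn` are hypothetical names, and plain strings stand in for chat messages. It shows that the model only ever sees a trimmed copy while the stored transcript keeps growing.

```python
import asyncio

# Hypothetical stand-ins: a dict-based transcript store and a "keep last N" compactor.
store: dict[str, list[str]] = {}


def compact(history: list[str], max_messages: int = 3) -> list[str]:
    """Return a trimmed copy of the history; the input list is left untouched."""
    return list(history[-max_messages:])


async def fake_model(window: list[str]) -> str:
    """Stand-in for a model call; it only reports how many messages it saw."""
    return f"reply (saw {len(window)} messages)"


async def run_turn(session_id: str, user_message: str) -> str:
    transcript = store.setdefault(session_id, [])
    transcript.append(user_message)   # persist the new user turn
    window = compact(transcript)      # build the view for the model
    reply = await fake_model(window)
    transcript.append(reply)          # persist the model's turn
    return reply


async def demo() -> tuple[str, int]:
    reply = ""
    for i in range(5):
        reply = await run_turn("s1", f"user message {i}")
    return reply, len(store["s1"])


last_reply, stored = asyncio.run(demo())
```

After five turns the store holds all ten messages, yet the last model call saw only three of them.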
Persistence: the HistoryProvider
A HistoryProvider is a small Protocol for reading and writing turns, always scoped by session_id so conversations never bleed into each other:
from typing import Protocol

# ChatMessage is Ravi's message type (import omitted here).
class HistoryProvider(Protocol):
    # Every method is scoped by session_id, so sessions stay isolated.
    async def append(self, agent_id, message, *, session_id, run_id): ...
    async def append_many(self, agent_id, messages, *, session_id, run_id): ...
    async def get_messages(self, agent_id, *, session_id) -> list[ChatMessage]: ...
    async def clear(self, agent_id, *, session_id): ...
    async def clear_run(self, agent_id, *, session_id, run_id): ...
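To make the contract concrete, here is a minimal sketch of a class that satisfies the same five method signatures. It is not the shipped `InMemoryHistoryProvider`: `DictHistoryProvider` and the `ChatMessage` dataclass below are hypothetical stand-ins, and keying the store by `(agent_id, session_id)` is an assumption about how scoping might be done.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChatMessage:
    """Hypothetical stand-in for Ravi's message type."""
    role: str
    content: str


class DictHistoryProvider:
    """Toy provider keyed by (agent_id, session_id). Not Ravi's implementation."""

    def __init__(self) -> None:
        # Each entry is a list of (run_id, message) pairs in arrival order.
        self._store = defaultdict(list)

    async def append(self, agent_id, message, *, session_id, run_id):
        self._store[(agent_id, session_id)].append((run_id, message))

    async def append_many(self, agent_id, messages, *, session_id, run_id):
        for message in messages:
            await self.append(agent_id, message, session_id=session_id, run_id=run_id)

    async def get_messages(self, agent_id, *, session_id):
        # Return a fresh list so callers cannot mutate the stored transcript.
        return [message for _, message in self._store[(agent_id, session_id)]]

    async def clear(self, agent_id, *, session_id):
        self._store.pop((agent_id, session_id), None)

    async def clear_run(self, agent_id, *, session_id, run_id):
        key = (agent_id, session_id)
        self._store[key] = [pair for pair in self._store[key] if pair[0] != run_id]


async def demo():
    history = DictHistoryProvider()
    await history.append("bot", ChatMessage("user", "hi"), session_id="a", run_id="r1")
    await history.append_many(
        "bot",
        [ChatMessage("assistant", "hello"), ChatMessage("user", "bye")],
        session_id="a",
        run_id="r2",
    )
    await history.append("bot", ChatMessage("user", "other"), session_id="b", run_id="r1")
    await history.clear_run("bot", session_id="a", run_id="r2")  # drops only run r2 in session a
    return (
        await history.get_messages("bot", session_id="a"),
        await history.get_messages("bot", session_id="b"),
    )


session_a, session_b = asyncio.run(demo())
```

Clearing run `r2` in session `a` leaves run `r1` in that session and everything in session `b` untouched, which is the isolation the `session_id` scoping is meant to give.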
Three backends ship, all interchangeable:
| Backend | Use |
|---|---|
| InMemoryHistoryProvider | Dev, tests, throwaway sessions |
| RedisHistoryProvider | Fast, TTL'd session memory |
| PostgresHistoryProvider | Durable, queryable transcripts |
Retention: how long history lives
Different agents want different lifetimes. ContextConfig carries a HistoryRetention policy:
| Policy | Behaviour | For |
|---|---|---|
| PERMANENT | Kept forever | User-facing assistants |
| RUN | Deleted when the run ends | Transient sub-agents |
| NONE | Never written | Stateless workers |
When a run completes, the Worker honours this: a RUN-retention sub-agent's history is cleared automatically, so a fleet of short-lived helpers doesn't accumulate junk.
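The three policies can be read as two decision points: whether a turn is written at all, and whether it is cleared when the run ends. The sketch below illustrates that reading under stated assumptions. The `HistoryRetention` enum here is a local stand-in with assumed values, and `ToyHistory`, `record`, and `on_run_complete` are hypothetical names, not the Worker's actual hooks.

```python
import asyncio
from enum import Enum


class HistoryRetention(Enum):
    """Local stand-in mirroring the three policies in the table; values are assumed."""
    PERMANENT = "permanent"
    RUN = "run"
    NONE = "none"


class ToyHistory:
    """Minimal store holding (run_id, text) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str]] = []

    async def append(self, text: str, *, run_id: str) -> None:
        self.rows.append((run_id, text))

    async def clear_run(self, *, run_id: str) -> None:
        self.rows = [row for row in self.rows if row[0] != run_id]


async def record(history, retention, text, *, run_id):
    """Write a turn unless the policy says history is never written."""
    if retention is not HistoryRetention.NONE:
        await history.append(text, run_id=run_id)


async def on_run_complete(history, retention, *, run_id):
    """Hypothetical completion hook: only RUN retention triggers a cleanup."""
    if retention is HistoryRetention.RUN:
        await history.clear_run(run_id=run_id)


async def simulate(retention):
    history = ToyHistory()
    await record(history, retention, "step 1", run_id="r1")
    await record(history, retention, "step 2", run_id="r1")
    written = len(history.rows)                       # rows present during the run
    await on_run_complete(history, retention, run_id="r1")
    return written, len(history.rows)                 # rows left after the run


results = {policy.name: asyncio.run(simulate(policy)) for policy in HistoryRetention}
```

Each entry in `results` is a pair of (rows during the run, rows after the run), which makes the difference between `RUN` and `NONE` visible: both end empty, but only `RUN` had history available while the run was active.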
Context assembly: the CompactionPipeline
Before every model call, the agent runs the history through a CompactionPipeline — an ordered chain of strategies, each taking the previous one's output:
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
graph LR
classDef raw fill:#FFF3E0,stroke:#E65100,color:#BF360C
classDef step fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
classDef out fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
RAW["full history"]:::raw --> S1["ToolResultCompaction<br/>shrink bulky tool output"]:::step
S1 --> S2["SlidingWindow<br/>keep last N turns"]:::step
S2 --> OUT["window sent to model"]:::out

Strategies that ship in agents/context/compaction/:
| Strategy | What it does |
|---|---|
| SlidingWindowCompaction | Keep the most recent N messages (the default) |
| ToolResultCompactionStrategy | Shrink large/old tool results, which bloat context fastest |
| SelectiveToolCallCompactionStrategy | Drop tool-call noise that's no longer relevant |
| SummarizationCompaction | Replace old turns with an LLM-generated summary |
| TruncationStrategy | Hard cap on message count |
| TokenBudgetComposedStrategy | Compose strategies to hit a token budget |
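As a rough illustration of what two of these strategies do, the sketch below implements a sliding window and a tool-result shrinker as plain functions. These are not the shipped classes: `Msg`, `sliding_window`, `shrink_tool_results`, and the `max_chars` parameter are hypothetical, and truncating by character count is only one assumed way to shrink tool output.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Msg:
    """Hypothetical message shape used only for this sketch."""
    role: str
    content: str


def sliding_window(history: list[Msg], max_messages: int) -> list[Msg]:
    """Keep only the most recent max_messages messages."""
    if max_messages <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-max_messages:]


def shrink_tool_results(history: list[Msg], max_chars: int) -> list[Msg]:
    """Truncate tool output longer than max_chars; other messages pass through."""
    out = []
    for msg in history:
        if msg.role == "tool" and len(msg.content) > max_chars:
            msg = replace(msg, content=msg.content[:max_chars] + " [truncated]")
        out.append(msg)
    return out


history = [
    Msg("user", "find the config"),
    Msg("tool", "x" * 500),
    Msg("assistant", "found it"),
    Msg("user", "thanks"),
]

shrunk = shrink_tool_results(history, max_chars=20)
recent = sliding_window(shrunk, max_messages=3)
```

Both functions return new lists and leave `history` as it was, matching the idea that compaction produces a view rather than editing the stored transcript.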
Compose them to taste:
from ravi.agents.context import (
    ContextConfig, InMemoryHistoryProvider,
    CompactionPipeline, ToolResultCompactionStrategy, SlidingWindowCompaction,
)
from ravi.kernel.agent.supervision import HistoryRetention

ctx = ContextConfig(
    InMemoryHistoryProvider(),
    CompactionPipeline([
        ToolResultCompactionStrategy(),             # runs first: shrink bulky tool output
        SlidingWindowCompaction(max_messages=40),   # then keep the latest 40 messages
    ]),
    retention=HistoryRetention.PERMANENT,
)
# ReActAgent and model are assumed to be imported and constructed elsewhere.
agent = ReActAgent("bot", model=model, context=ctx)
An empty pipeline is a valid no-op: it returns the history unchanged. A single strategy can also be passed directly, without wrapping it in a CompactionPipeline.
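The chaining behaviour can be sketched as a simple fold over an ordered list of callables. This is an assumption-laden toy, not the real `CompactionPipeline`: the `Pipeline` class, the string-based messages, and the two helper strategies are hypothetical, and accepting a bare callable in the constructor is just one way the "single strategy" shortcut might be handled.

```python
from typing import Callable, Sequence, Union

Strategy = Callable[[list[str]], list[str]]


class Pipeline:
    """Toy ordered chain: each strategy receives the previous one's output."""

    def __init__(self, strategies: Union[Strategy, Sequence[Strategy]] = ()) -> None:
        if callable(strategies):
            strategies = [strategies]  # a single strategy is treated as a one-step chain
        self.strategies = list(strategies)

    def __call__(self, history: list[str]) -> list[str]:
        window = list(history)  # work on a copy so the source is never modified
        for strategy in self.strategies:
            window = strategy(window)
        return window


def drop_tool_noise(history: list[str]) -> list[str]:
    """Hypothetical strategy: remove messages that start with 'tool:'."""
    return [m for m in history if not m.startswith("tool:")]


def last_two(history: list[str]) -> list[str]:
    """Hypothetical strategy: keep the two most recent messages."""
    return history[-2:]


history = ["user: a", "tool: blob", "assistant: b", "tool: blob2", "user: c"]

unchanged = Pipeline()(history)                            # empty pipeline: no-op
single = Pipeline(last_two)(history)                       # one strategy, unwrapped
chained = Pipeline([drop_tool_noise, last_two])(history)
reordered = Pipeline([last_two, drop_tool_noise])(history)
```

Comparing `chained` with `reordered` shows why order matters: filtering before windowing keeps two useful messages, while windowing first spends one of the two slots on tool output that is then dropped.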
When trimming isn't enough
Sliding windows and summaries are lossy by recency — they assume old means irrelevant. That assumption breaks for long-running agents that must recall a specific fact or decision from far back. For those, Ravi has three orthogonal advanced strategies that recall by relevance or structure instead of recency. They are not mutually exclusive — you can layer them with the basic pipeline above.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
graph TB
classDef base fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
classDef adv fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
BASE["Compaction pipeline<br/>(recency-based)"]:::base
BASE --> V["Vector Memory<br/>semantic similarity"]:::adv
BASE --> G["Graph Memory<br/>entity relationships"]:::adv
BASE --> P["Paged Memory<br/>explicit pages + retrieval"]:::adv

| Strategy | Recalls by | Best for | Failure mode |
|---|---|---|---|
| Vector Memory | Embedding similarity | Fuzzy recall over big histories | Misses causally-important but dissimilar turns |
| Graph Memory | Entity / relationship traversal | Structured facts, constraints, decisions | Loses unstructured narrative |
| Paged Memory | Explicit pages + agent-driven retrieval | Full-fidelity recall, agent decides what to load | Index summary may miss detail |
All three plug in as compaction strategies (or alongside them), backed by the in-memory stores in agents/storage/ for dev and the Postgres-backed PgVectorStore / AGEGraphStore in capabilities/ for production.
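To show the difference between recalling by recency and recalling by relevance, here is a toy similarity search over a short history. It does not use `PgVectorStore` or any Ravi API: `embed`, `cosine`, and `recall` are hypothetical helpers, and a bag-of-words counter stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recall(history: list[str], query: str, k: int = 1) -> list[str]:
    """Return the k messages most similar to the query, regardless of age."""
    q = embed(query)
    ranked = sorted(history, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]


history = [
    "the deploy target is the staging cluster in region eu-west",
    "please use tabs not spaces",
    "lunch is at noon",
    "ok sounds good",
    "what about the logo colour",
]

recent_only = history[-2:]  # what a two-message sliding window would keep
recalled = recall(history, "which cluster is the deploy target", k=1)
```

The oldest message falls outside the recency window but ranks first by similarity, which is the case vector memory is meant to cover. The table's failure mode applies here too: a message that matters causally but shares no vocabulary with the query would score near zero.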
Where this lives
| Piece | Location |
|---|---|
| HistoryProvider Protocol | kernel/storage/history.py |
| ContextConfig, AgentContext | agents/context/context.py |
| CompactionPipeline + strategies | agents/context/compaction/ |
| History backends | agents/context/history.py, capabilities/history/ |
| HistoryRetention | kernel/agent/supervision.py |
Next: Supervision & Budgets — bounding multi-agent systems.