Memory & Context

The problem

An agent needs to remember the conversation — but a model's context window is finite and every token costs money and latency. So there are really two problems hiding under "memory":

Persistence — where do past turns live, so the agent can recall them across runs and restarts?
Context assembly — how do you fit a long history into a small window before each model call, without losing what matters?

Ravi splits these cleanly: a HistoryProvider owns persistence; a CompactionPipeline owns context assembly. They meet in the ContextConfig you pass to an agent.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
graph LR
    classDef store   fill:#F3E5F5,stroke:#6A1B9A,color:#4A148C,font-weight:bold
    classDef process fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
    classDef llm     fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold

    H[("HistoryProvider<br/>full transcript")]:::store -->|"get_messages(session_id)"| C["CompactionPipeline<br/>trim to fit"]:::process
    C -->|"compacted window"| LLM["Model call"]:::llm
    LLM -->|"new turns"| H

The full transcript always lives in the history provider. Compaction produces a view for the model — it never deletes the source.

Persistence: the HistoryProvider

A HistoryProvider is a small Protocol for reading and writing turns, always scoped by session_id so conversations never bleed into each other:

class HistoryProvider(Protocol):
    async def append(self, agent_id, message, *, session_id, run_id): ...
    async def append_many(self, agent_id, messages, *, session_id, run_id): ...
    async def get_messages(self, agent_id, *, session_id) -> list[ChatMessage]: ...
    async def clear(self, agent_id, *, session_id): ...
    async def clear_run(self, agent_id, *, session_id, run_id): ...
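To make the contract concrete, here is a minimal sketch of a dict-backed provider with the same five methods. It is not Ravi's shipped `InMemoryHistoryProvider`: `DictHistoryProvider` is a hypothetical name, `ChatMessage` is a stand-in dataclass, and the idea that `clear_run` removes only the turns tagged with that `run_id` is an assumption inferred from the method name.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Stand-in for Ravi's ChatMessage; the real type likely has more fields.
    role: str
    content: str


class DictHistoryProvider:
    """Hypothetical dict-backed provider; illustrates the Protocol's shape only."""

    def __init__(self):
        # (agent_id, session_id) -> list of (run_id, message) pairs
        self._store = {}

    async def append(self, agent_id, message, *, session_id, run_id):
        self._store.setdefault((agent_id, session_id), []).append((run_id, message))

    async def append_many(self, agent_id, messages, *, session_id, run_id):
        for message in messages:
            await self.append(agent_id, message, session_id=session_id, run_id=run_id)

    async def get_messages(self, agent_id, *, session_id):
        return [msg for _, msg in self._store.get((agent_id, session_id), [])]

    async def clear(self, agent_id, *, session_id):
        self._store.pop((agent_id, session_id), None)

    async def clear_run(self, agent_id, *, session_id, run_id):
        # Assumption: drop only the turns written during this run.
        key = (agent_id, session_id)
        kept = [(r, m) for r, m in self._store.get(key, []) if r != run_id]
        self._store[key] = kept


async def demo():
    history = DictHistoryProvider()
    await history.append("bot", ChatMessage("user", "hi"), session_id="s1", run_id="r1")
    await history.append("bot", ChatMessage("user", "other"), session_id="s2", run_id="r1")
    # Sessions are isolated: s1 sees only its own turn.
    return await history.get_messages("bot", session_id="s1")
```

Keying the store on `(agent_id, session_id)` is what keeps conversations from bleeding into each other; a real backend would enforce the same scoping in its key prefix or `WHERE` clause.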

Three backends ship, all interchangeable:

Backend	Use
`InMemoryHistoryProvider`	Dev, tests, throwaway sessions
`RedisHistoryProvider`	Fast, TTL'd session memory
`PostgresHistoryProvider`	Durable, queryable transcripts

Retention: how long history lives

Different agents want different lifetimes. ContextConfig carries a HistoryRetention policy:

Policy	Behaviour	For
`PERMANENT`	Kept forever	User-facing assistants
`RUN`	Deleted when the run ends	Transient sub-agents
`NONE`	Never written	Stateless workers

When a run completes, the Worker honours this: a RUN-retention sub-agent's history is cleared automatically, so a fleet of short-lived helpers doesn't accumulate junk.
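The end-of-run behaviour can be pictured roughly as follows. This is a sketch under assumptions, not the Worker's actual code: `on_run_complete` is a hypothetical hook name, the enum is redefined locally so the example is self-contained, and whether the Worker calls `clear_run` or `clear` is a guess.

```python
import asyncio
from enum import Enum


class HistoryRetention(Enum):
    # Mirrors the three policies in the table; the real enum lives in Ravi.
    PERMANENT = "permanent"
    RUN = "run"
    NONE = "none"


class RecordingHistory:
    """Tiny fake provider that records which runs were cleared."""

    def __init__(self):
        self.cleared = []

    async def clear_run(self, agent_id, *, session_id, run_id):
        self.cleared.append((agent_id, session_id, run_id))


async def on_run_complete(history, retention, agent_id, *, session_id, run_id):
    """Hypothetical end-of-run hook: only RUN retention triggers cleanup."""
    if retention is HistoryRetention.RUN:
        await history.clear_run(agent_id, session_id=session_id, run_id=run_id)
    # PERMANENT keeps everything; NONE never wrote anything, so nothing to clear.


async def demo():
    history = RecordingHistory()
    await on_run_complete(
        history, HistoryRetention.PERMANENT, "assistant", session_id="s1", run_id="r1"
    )
    await on_run_complete(
        history, HistoryRetention.RUN, "helper", session_id="s1", run_id="r2"
    )
    return history.cleared
```

In this sketch only the `RUN`-retention helper is cleaned up; the user-facing assistant's transcript is left alone.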

Context assembly: the CompactionPipeline

Before every model call, the agent runs the history through a CompactionPipeline — an ordered chain of strategies, each taking the previous one's output:

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
graph LR
    classDef raw   fill:#FFF3E0,stroke:#E65100,color:#BF360C
    classDef step  fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
    classDef out   fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold

    RAW["full history"]:::raw --> S1["ToolResultCompaction<br/>shrink bulky tool output"]:::step
    S1 --> S2["SlidingWindow<br/>keep last N turns"]:::step
    S2 --> OUT["window sent to model"]:::out

Strategies that ship in agents/context/compaction/:

Strategy	What it does
`SlidingWindowCompaction`	Keep the most recent N messages (the default)
`ToolResultCompactionStrategy`	Shrink large/old tool results, which bloat context fastest
`SelectiveToolCallCompactionStrategy`	Drop tool-call noise that's no longer relevant
`SummarizationCompaction`	Replace old turns with an LLM-generated summary
`TruncationStrategy`	Hard cap on message count
`TokenBudgetComposedStrategy`	Compose strategies to hit a token budget

Compose them to taste:

from ravi.agents.context import (
    ContextConfig, InMemoryHistoryProvider,
    CompactionPipeline, ToolResultCompactionStrategy, SlidingWindowCompaction,
)
from ravi.kernel.agent.supervision import HistoryRetention

ctx = ContextConfig(
    InMemoryHistoryProvider(),
    CompactionPipeline([
        ToolResultCompactionStrategy(),
        SlidingWindowCompaction(max_messages=40),
    ]),
    retention=HistoryRetention.PERMANENT,
)
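# ReActAgent and model are assumed to be imported and configured elsewhere;
# their setup is not covered in this section.
agent = ReActAgent("bot", model=model, context=ctx)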

An empty pipeline is a valid no-op: it returns the history unchanged. A single strategy can also be passed directly, without wrapping it in a CompactionPipeline.
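The chaining itself is simple enough to sketch in a few lines. The classes below are toy stand-ins, not Ravi's implementations: `Pipeline`, `KeepLast`, `ShrinkToolResults`, and the synchronous `compact` method name are all assumptions made for illustration, and the real strategy interface may differ (for example, it may be async).

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Stand-in message type for this sketch.
    role: str
    content: str


class ShrinkToolResults:
    """Toy tool-result compaction: truncate long tool outputs."""

    def __init__(self, max_chars=20):
        self.max_chars = max_chars

    def compact(self, messages):
        return [
            Message(m.role, m.content[: self.max_chars] + "...")
            if m.role == "tool" and len(m.content) > self.max_chars
            else m
            for m in messages
        ]


class KeepLast:
    """Toy sliding window: keep the most recent max_messages messages."""

    def __init__(self, max_messages):
        self.max_messages = max_messages

    def compact(self, messages):
        if self.max_messages <= 0:
            return []
        return messages[-self.max_messages:]


class Pipeline:
    """Runs strategies in order, each consuming the previous one's output."""

    def __init__(self, strategies=()):
        self.strategies = list(strategies)

    def compact(self, messages):
        view = list(messages)  # work on a copy: the source history is not mutated
        for strategy in self.strategies:
            view = strategy.compact(view)
        return view


history = [
    Message("user", "find the report"),
    Message("tool", "x" * 500),
    Message("assistant", "found it"),
]
window = Pipeline([ShrinkToolResults(), KeepLast(2)]).compact(history)
unchanged = Pipeline().compact(history)  # empty pipeline: a no-op
```

Here `window` holds the truncated tool result plus the final assistant turn, while `history` still contains all three original messages, which is the "view, not deletion" property described earlier.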

When trimming isn't enough

Sliding windows and summaries are lossy by recency — they assume old means irrelevant. That assumption breaks for long-running agents that must recall a specific fact or decision from far back. For those, Ravi has three orthogonal advanced strategies that recall by relevance or structure instead of recency. They are not mutually exclusive — you can layer them with the basic pipeline above.

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
graph TB
    classDef base fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
    classDef adv  fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold

    BASE["Compaction pipeline<br/>(recency-based)"]:::base
    BASE --> V["Vector Memory<br/>semantic similarity"]:::adv
    BASE --> G["Graph Memory<br/>entity relationships"]:::adv
    BASE --> P["Paged Memory<br/>explicit pages + retrieval"]:::adv

Strategy	Recalls by	Best for	Failure mode
Vector Memory	Embedding similarity	Fuzzy recall over big histories	Misses causally-important but dissimilar turns
Graph Memory	Entity / relationship traversal	Structured facts, constraints, decisions	Loses unstructured narrative
Paged Memory	Explicit pages + agent-driven retrieval	Full-fidelity recall, agent decides what to load	Index summary may miss detail

All three plug in as compaction strategies (or alongside them), backed by the in-memory stores in agents/storage/ for dev and the Postgres-backed PgVectorStore / AGEGraphStore in capabilities/ for production.
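To illustrate what "recall by relevance instead of recency" means, here is a deliberately tiny sketch of similarity-based recall. It does not use Ravi's vector store or any real embedding model: `embed` is a toy bag-of-words function and `recall` is a hypothetical helper, chosen only so the example runs with the standard library.

```python
import math
from collections import Counter


def embed(text):
    # Toy embedding: bag-of-words counts. A real system would use a model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query, turns, k=1):
    """Return the k turns most similar to the query, regardless of age."""
    q = embed(query)
    ranked = sorted(turns, key=lambda turn: cosine(q, embed(turn)), reverse=True)
    return ranked[:k]


turns = [
    "the deploy region must stay in eu-west",  # oldest turn, but the one that matters
    "what should we have for lunch",
    "thanks, that looks good",
]
best = recall("which region do we deploy to", turns)
```

A sliding window of size 2 would have dropped the first turn; similarity ranking brings it back because it shares "deploy" and "region" with the query. The failure mode from the table also shows up here: a turn that matters causally but shares no vocabulary with the query would score zero.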

Where this lives

Piece	Location
`HistoryProvider` Protocol	`kernel/storage/history.py`
`ContextConfig`, `AgentContext`	`agents/context/context.py`
`CompactionPipeline` + strategies	`agents/context/compaction/`
History backends	`agents/context/history.py`, `capabilities/history/`
`HistoryRetention`	`kernel/agent/supervision.py`

Next: Supervision & Budgets — bounding multi-agent systems.