Vector Memory
What It Is
Vector memory stores every conversation turn as an embedding (a numeric vector that captures semantic meaning). When a new message arrives, the agent embeds it and searches for the most similar past turns. Only the top-k matches are pulled back into the context window — the rest stay in storage.
This is retrieval-augmented memory (RAM): the agent sees the past turns most semantically similar to the current message, not just the most recent ones.
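The similarity search described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the vectors are hand-written stand-ins for real embeddings, cosine similarity is assumed as the metric (the source does not say which metric is used), and `cosine_similarity` and `top_k_turns` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_turns(query_vec: list[float], stored: list[tuple[int, list[float]]], k: int) -> list[int]:
    """Return the indices of the k stored turns most similar to query_vec, best first."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [turn_index for turn_index, _ in ranked[:k]]


# Toy "embeddings": (turn index, vector). A real system would get these
# from an embedding model rather than writing them by hand.
stored_turns = [
    (1, [0.9, 0.1, 0.0]),  # "I prefer metric units"
    (2, [0.1, 0.9, 0.1]),  # small talk about the weather
    (3, [0.8, 0.2, 0.1]),  # "convert km to miles"
]
query_vector = [1.0, 0.0, 0.0]  # a new question about units

matches = top_k_turns(query_vector, stored_turns, k=2)
print(matches)
```

Turn 2 is never loaded for this query even though it is stored, which is the "rest stay in storage" behaviour in miniature.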
How It Works
Turn arrives
│
▼
Embed the message text
│
▼
Upsert into VectorStore (collection = session_id)
│
▼
Embed the current user query
│
▼
Search VectorStore → top-k similar past turns
│
▼
Build context window:
[system prompt]
+ [retrieved k turns, in chronological order]
+ [last N recent turns]
+ [current turn]
The final context the LLM sees is a blend of relevant history (semantic) and recent history (recency), not a raw sliding window.
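The last step of the flow, assembling the context window, might look like the sketch below. The `Turn` type and `build_context` function are hypothetical names used for illustration. The sketch also assumes that a retrieved turn which already falls inside the recent window should not be included twice; the source does not specify how overlaps are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    index: int  # position in the original conversation
    role: str
    text: str


def build_context(
    system_prompt: Turn,
    history: list[Turn],
    retrieved_indices: list[int],
    current: Turn,
    recent_n: int,
) -> list[Turn]:
    """Blend retrieved turns and recent turns around the current turn."""
    # history[-0:] would return the whole list, so guard the zero case.
    recent = history[-recent_n:] if recent_n > 0 else []
    recent_indices = {turn.index for turn in recent}
    by_index = {turn.index: turn for turn in history}

    # Retrieved turns go in chronological order, skipping duplicates and
    # anything the recent window already covers (an assumption of this sketch).
    retrieved = [
        by_index[i]
        for i in sorted(set(retrieved_indices))
        if i in by_index and i not in recent_indices
    ]
    return [system_prompt, *retrieved, *recent, current]


history = [Turn(i, "user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(6)]
context = build_context(
    system_prompt=Turn(-1, "system", "You are a helpful assistant."),
    history=history,
    retrieved_indices=[4, 1, 1],  # search results, best first, with a repeat
    current=Turn(6, "user", "turn 6"),
    recent_n=2,
)
print([turn.index for turn in context])
```

Turn 4 was retrieved but is dropped from the retrieved block because the recent window already carries it.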
Strengths
- No explicit context loss — every turn is stored; nothing is dropped, just not always loaded.
- Scales to very long sessions — only the embedding index grows, not the active context.
- Zero agent behaviour change — the agent doesn't need to call any retrieval tool; the compactor handles it transparently.
Weaknesses
- Probabilistic recall: a past turn that is causally important but semantically distant from the current query may never surface. The agent silently acts as if it doesn't exist.
- Embedding cost per turn: requires an EmbeddingClient call on every incoming message and on every query.
- Ordering artefacts: retrieved turns are reinserted out of their original conversational sequence, with gaps between them; some models are confused by non-contiguous history.
- InMemoryVectorStore is not persistent: its contents are lost on restart. Use PgVectorStore (L2) for real sessions.
When To Use It
| Scenario | Good fit? |
|---|---|
| User asks questions over a large knowledge base already in the vector store | Yes |
| Long personal-assistant sessions where old preferences matter | Yes (but watch causal-chain gap) |
| Short task-completion agents | Overkill — use sliding window |
| Agent needs to enforce an earlier hard constraint | Risky — prefer Paged Memory |
Where It Lives In The Codebase
kernel/storage/vector.py ← VectorStore Protocol, Document, SearchResult
agents/storage/vector.py ← InMemoryVectorStore (dev / tests)
capabilities/vector/ ← PgVectorStore (production)
capabilities/knowledge/ ← RAGPipeline, GraphRAGPipeline (document-level RAG)
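To make the roles of these types concrete, here is one way a VectorStore Protocol with Document and SearchResult could be shaped. Only the three type names and the upsert/search operations come from this page; every field, signature, and the `DictVectorStore` class are assumptions for illustration, and the real definitions in kernel/storage/vector.py may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:  # field names are assumed, not taken from the codebase
    id: str
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


@dataclass
class SearchResult:
    document: Document
    score: float


class VectorStore(Protocol):
    """Assumed interface: one collection per session_id."""

    def upsert(self, collection: str, documents: list[Document]) -> None: ...

    def search(self, collection: str, query: list[float], k: int) -> list[SearchResult]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    denom = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0


class DictVectorStore:
    """Toy in-process store; satisfies VectorStore structurally."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, Document]] = {}

    def upsert(self, collection: str, documents: list[Document]) -> None:
        bucket = self._data.setdefault(collection, {})
        for doc in documents:
            bucket[doc.id] = doc  # an existing id is overwritten, not duplicated

    def search(self, collection: str, query: list[float], k: int) -> list[SearchResult]:
        docs = self._data.get(collection, {}).values()
        results = [SearchResult(doc, _cosine(query, doc.embedding)) for doc in docs]
        results.sort(key=lambda r: r.score, reverse=True)
        return results[:k]


store: VectorStore = DictVectorStore()
store.upsert("session-a", [
    Document("t1", "I prefer metric units", [1.0, 0.0]),
    Document("t2", "Nice weather today", [0.0, 1.0]),
])
store.upsert("session-b", [Document("t1", "unrelated session", [1.0, 0.0])])

hits = store.search("session-a", [1.0, 0.1], k=1)
print(hits[0].document.text, round(hits[0].score, 3))
```

Keying collections by session keeps one conversation's turns from leaking into another's search results, which is why the flow above uses collection = session_id.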
For session memory specifically, the compactor would live at:
It implements compact(messages) -> list[ChatMessage] — same interface as every other compaction strategy. Internally it writes to a VectorStore and reads back the top-k at compact time.
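A compactor with that interface could be wired roughly as follows. Treat this as a sketch under stated assumptions: `ChatMessage` is given a hypothetical two-field shape, `ListBackedStore` and `toy_embed` are toy stand-ins for a real VectorStore and EmbeddingClient, and the first and last messages are assumed to be the system prompt and the current turn.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChatMessage:  # hypothetical shape; the real class may differ
    role: str
    content: str


class ListBackedStore:
    """Toy stand-in for a VectorStore: linear scan, dot-product ranking."""

    def __init__(self) -> None:
        self._rows: dict[int, list[float]] = {}

    def upsert(self, key: int, vector: list[float]) -> None:
        self._rows[key] = vector

    def search(self, query: list[float], k: int) -> list[int]:
        def score(key: int) -> float:
            return sum(a * b for a, b in zip(self._rows[key], query))

        return sorted(self._rows, key=score, reverse=True)[:k]


class VectorCompactor:
    def __init__(self, embed: Callable[[str], list[float]], store: ListBackedStore,
                 top_k: int = 1, recent_n: int = 2) -> None:
        self.embed, self.store = embed, store
        self.top_k, self.recent_n = top_k, recent_n

    def compact(self, messages: list[ChatMessage]) -> list[ChatMessage]:
        if len(messages) <= self.recent_n + 2:
            return list(messages)  # nothing to trim yet
        system, history, current = messages[0], messages[1:-1], messages[-1]
        # A real compactor would upsert only new turns; this re-indexes everything.
        for i, msg in enumerate(history):
            self.store.upsert(i, self.embed(msg.content))
        recent_start = len(history) - self.recent_n
        # Over-fetch, then keep the best hits that the recent window does not cover.
        hits = self.store.search(self.embed(current.content), k=len(history))
        older = [i for i in hits if i < recent_start][: self.top_k]
        retrieved = [history[i] for i in sorted(older)]
        return [system, *retrieved, *history[recent_start:], current]


VOCAB = ["units", "metric", "weather", "rain", "distance"]


def toy_embed(text: str) -> list[float]:
    """Word counts over a tiny vocabulary; a placeholder for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


messages = [
    ChatMessage("system", "You are helpful."),
    ChatMessage("user", "I prefer metric units"),
    ChatMessage("assistant", "Noted"),
    ChatMessage("user", "Will it rain today"),
    ChatMessage("assistant", "Light rain expected"),
    ChatMessage("user", "Thanks"),
    ChatMessage("assistant", "Welcome"),
    ChatMessage("user", "Which units for the distance"),
]
compacted = VectorCompactor(toy_embed, ListBackedStore()).compact(messages)
print([m.content for m in compacted])
```

The early preference about metric units survives compaction because it scores highest against the current question, while the rain exchange is left in storage.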
Relationship To Other Memory Approaches
Vector memory is orthogonal to graph memory and paged memory.
- It can run alongside either without conflict.
- Use vector as the fuzzy recall layer and paged memory as the structured/explicit layer.
- Use graph memory to follow entity relationships that vector similarity would miss.
See Graph Memory and Paged Memory.