
Vector Memory

What It Is

Vector memory stores every conversation turn as an embedding (a numeric vector that captures semantic meaning). When a new message arrives, the agent embeds it and searches for the most similar past turns. Only the top-k matches are pulled back into the context window — the rest stay in storage.

This is retrieval-augmented memory (RAM): the agent sees the history that is most semantically similar to the current message, not just the most recent turns.
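
The ranking step can be illustrated with a minimal sketch. It assumes a toy bag-of-words "embedding" (`toy_embed`, a hypothetical stand-in for a real embedding model) and cosine similarity as the distance measure; neither is confirmed as what the project uses.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, turns: list[str], k: int) -> list[str]:
    """Return the k stored turns most similar to the query, best first."""
    q = toy_embed(query)
    return sorted(turns, key=lambda t: cosine(q, toy_embed(t)), reverse=True)[:k]


past_turns = [
    "my favourite editor is vim",
    "please book a flight to oslo",
    "the weather in oslo is cold",
]
best = top_k("which editor do i like", past_turns, k=1)
print(best)
```

Only the first stored turn shares a word with the query, so it is the one returned. A real embedding model would also match paraphrases that share no words, which this toy version cannot do.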


How It Works

Turn arrives
Embed the message text
Upsert into VectorStore (collection = session_id)
Embed the current user query
Search VectorStore → top-k similar past turns
Build context window:
  [system prompt]
  + [retrieved k turns, in chronological order]
  + [last N recent turns]
  + [current turn]

The final context the LLM sees is a blend of relevant history (semantic) and recent history (recency), not a raw sliding window.
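
The context-building step above could look roughly like the sketch below. `Turn` and `build_context` are hypothetical names introduced for illustration, and dropping retrieved turns that already appear in the recent window is an assumption, not something the steps specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    index: int  # position in the session; used to restore chronological order
    role: str
    text: str


def build_context(
    system_prompt: str,
    history: list[Turn],
    retrieved: list[Turn],
    current: Turn,
    n_recent: int,
) -> list[Turn]:
    """Blend retrieved (semantic) and recent (recency) turns into one window."""
    recent = history[-n_recent:] if n_recent > 0 else []
    recent_ids = {t.index for t in recent}
    # Assumption: skip retrieved turns already covered by the recent window,
    # and collapse duplicate hits, so no turn appears twice.
    unique = {t.index: t for t in retrieved if t.index not in recent_ids}
    older = sorted(unique.values(), key=lambda t: t.index)
    system = Turn(index=-1, role="system", text=system_prompt)
    return [system, *older, *recent, current]


history = [
    Turn(i, "user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(6)
]
retrieved = [history[4], history[1]]  # search results arrive in similarity order
current = Turn(6, "user", "turn 6")
context = build_context("You are helpful.", history, retrieved, current, n_recent=2)
print([t.index for t in context])  # [-1, 1, 4, 5, 6]
```

Turn 4 was retrieved but is already in the recent window, so it appears once; turn 1 is the only older turn that is spliced in.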


Strengths

  • No explicit context loss — every turn is stored; nothing is dropped, just not always loaded.
  • Scales to very long sessions — only the embedding index grows, not the active context.
  • Zero agent behaviour change — the agent doesn't need to call any retrieval tool; the compactor handles it transparently.

Weaknesses

  • Probabilistic recall — a past turn that is causally important but semantically distant from the current query may never surface. The agent silently acts as if it doesn't exist.
  • Embedding cost per turn — requires an EmbeddingClient call for every message stored and for every query issued.
  • Ordering artefacts — even when sorted chronologically, retrieved turns are non-contiguous and sit apart from the turns that originally surrounded them; some models are confused by history with gaps.
  • InMemoryVectorStore is not persistent — lost on restart. Use PgVectorStore (L2) for real sessions.

When To Use It

Scenario                                                                      Good fit?
User asks questions over a large knowledge base already in the vector store   Yes
Long personal-assistant sessions where old preferences matter                 Yes (but watch causal-chain gap)
Short task-completion agents                                                  Overkill — use sliding window
Agent needs to enforce an earlier hard constraint                             Risky — prefer Paged Memory

Where It Lives In The Codebase

kernel/storage/vector.py      ← VectorStore Protocol, Document, SearchResult
agents/storage/vector.py      ← InMemoryVectorStore  (dev / tests)
capabilities/vector/          ← PgVectorStore         (production)
capabilities/knowledge/       ← RAGPipeline, GraphRAGPipeline (document-level RAG)

For session memory specifically, the compactor would live at:

agents/context/compaction/semantic.py   ← SemanticMemoryCompactor (not yet built)

It would implement compact(messages) -> list[ChatMessage], the same interface as every other compaction strategy. Internally it would write each turn to a VectorStore and read back the top-k matches at compact time.
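
Since the compactor is not yet built, the following is only a sketch of one possible shape. `ChatMessage`, `ToyVectorStore`, `SemanticCompactorSketch`, and the `upsert`/`search` signatures are all hypothetical; the real VectorStore Protocol in kernel/storage/vector.py may look quite different.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChatMessage:  # hypothetical shape; the project's class may differ
    role: str
    content: str


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; not the project's VectorStore Protocol."""

    def __init__(self) -> None:
        self._rows: dict[int, list[float]] = {}

    def upsert(self, doc_id: int, vector: list[float]) -> None:
        self._rows[doc_id] = vector

    def search(self, vector: list[float], k: int) -> list[int]:
        ranked = sorted(
            self._rows, key=lambda i: _cosine(vector, self._rows[i]), reverse=True
        )
        return ranked[:k]


class SemanticCompactorSketch:
    def __init__(self, store: ToyVectorStore, embed: Callable[[str], list[float]],
                 k: int = 2, n_recent: int = 2) -> None:
        self._store, self._embed = store, embed
        self._k, self._n_recent = k, n_recent
        self._indexed = 0  # number of non-system turns already stored

    def compact(self, messages: list[ChatMessage]) -> list[ChatMessage]:
        system = [m for m in messages if m.role == "system"]
        turns = [m for m in messages if m.role != "system"]
        if not turns:
            return system
        # Write: store only the turns that have not been embedded yet.
        for i in range(self._indexed, len(turns)):
            self._store.upsert(i, self._embed(turns[i].content))
        self._indexed = max(self._indexed, len(turns))
        # Read: the tail is the last n_recent turns plus the current turn.
        tail_start = max(0, len(turns) - (self._n_recent + 1))
        query = self._embed(turns[-1].content)
        # Over-fetch so that k hits remain after dropping those in the tail.
        hits = self._store.search(query, self._k + len(turns) - tail_start)
        best = [i for i in hits if i < tail_start][: self._k]
        return system + [turns[i] for i in sorted(best)] + turns[tail_start:]


VOCAB = ("vim", "editor", "oslo", "weather")


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: counts of a few fixed keywords."""
    return [float(text.lower().count(w)) for w in VOCAB]


messages = [
    ChatMessage("system", "You are helpful."),
    ChatMessage("user", "I use vim as my editor"),
    ChatMessage("assistant", "Noted."),
    ChatMessage("user", "Book a flight to oslo"),
    ChatMessage("assistant", "Done."),
    ChatMessage("user", "What is the weather?"),
    ChatMessage("assistant", "Cold."),
    ChatMessage("user", "Which editor do I use?"),
]
compactor = SemanticCompactorSketch(ToyVectorStore(), toy_embed, k=1, n_recent=2)
compacted = compactor.compact(messages)
print([m.content for m in compacted])
```

In this run the early "vim" turn is pulled back in ahead of the recent window while the unrelated flight-booking turns are left in storage. Keeping the store keyed by turn position is a simplification; a real implementation would more likely key by a stable message ID within the session's collection.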


Relationship To Other Memory Approaches

Vector memory is orthogonal to graph memory and paged memory.

  • It can run alongside either without conflict.
  • Use vector as the fuzzy recall layer and paged memory as the structured/explicit layer.
  • Use graph memory to follow entity relationships that vector similarity would miss.

See Graph Memory and Paged Memory.