The five components of an agent memory system

When teams start building agents that survive past a single session, the first instinct is to reach for a vector store and call it a memory layer. That repeats the mistake the field made with retrieval-augmented generation in 2023: confusing a storage primitive with a system. A memory system that holds up in production has more parts than that, and those parts have to cooperate.

Across 12 production implementations (Mem0, Letta, Zep, Honcho, LangMem, and seven others) and the CoALA paper (arXiv:2309.02427), the taxonomy converges on five components. Removing any one of them silently degrades the system.

The five components

┌─────────────────────────────────────────────────────────────┐
│                       AGENT LOOP                             │
│                                                              │
│   ┌──────────────┐    recall blocks    ┌────────────────┐    │
│   │              │ ──────────────────► │                │    │
│   │   MEMORY     │                     │    CONTEXT     │    │
│   │  (durable)   │                     │  (ephemeral)   │    │
│   │              │ ◄────── NEVER ────  │                │    │
│   └──────┬───────┘                     └────────────────┘    │
│          │                                     │             │
│          │                                     ▼             │
│   ┌──────┴───────┐                       LLM Prompt          │
│   │ 5 components │                             │             │
│   │              │                             │             │
│   │ 1. Store     │     store_memory            │             │
│   │ 2. Write     │◄───(explicit via tool)──────┘             │
│   │ 3. Recall    │                                           │
│   │ 4. Dream     │         agent decides                     │
│   │ 5. Aux LLM   │         to store                          │
│   └──────────────┘                                           │
└─────────────────────────────────────────────────────────────┘
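The diagram enforces one directional rule. Recall copies memory into context, and only an explicit store_memory call writes back. Below is a minimal sketch of that contract. It assumes a toy word-overlap store in place of real similarity search. store_memory comes from the diagram; every other class and method name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Store(Protocol):
    """Durable side of the diagram: keyed write plus similarity search."""

    def put(self, key: str, text: str) -> None: ...

    def search(self, query: str, k: int) -> list[str]: ...


class ToyStore:
    """Hypothetical stand-in: 'similarity' is just shared words, for illustration only."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def put(self, key: str, text: str) -> None:
        self.rows[key] = text

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        return [t for t in self.rows.values() if words & set(t.lower().split())][:k]


@dataclass
class AgentLoop:
    memory: Store                                     # durable, survives sessions
    context: list[str] = field(default_factory=list)  # ephemeral, per session

    def recall(self, query: str, k: int = 3) -> None:
        # Recall blocks flow memory -> context, never the other way.
        self.context.extend(self.memory.search(query, k))

    def store_memory(self, key: str, text: str) -> None:
        # The only write path: invoked when the agent explicitly calls the tool.
        self.memory.put(key, text)

    def end_session(self) -> None:
        # Dropping the context leaves durable memory untouched.
        self.context.clear()
```

Ending a session clears only the context list. Whatever the agent chose to store is still in memory at the start of the next session.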

1. Store — persistence

The physical layer. Postgres + pgvector is the boring, correct default for the open-source tier; managed runtimes can layer object storage and durability guarantees on top. Two operations matter: keyed write and similarity search. Everything else (filtering, ranking, joins) is built on top of those two.

We picked Postgres + pgvector for the local mode of @usetheo/memory because most developers already have Postgres in their docker-compose setup, and the production runtime can swap the storage driver without changing the API.
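Below is a minimal in-memory sketch of the two Store operations. It assumes the caller already has embedding vectors. InMemoryStore and its methods are hypothetical names. The brute-force cosine scan stands in for what pgvector does with a table and an index.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    key: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 for a zero vector instead of dividing by zero.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a Postgres + pgvector table."""

    def __init__(self) -> None:
        self._rows: dict[str, Row] = {}

    def put(self, key: str, text: str, embedding: list[float]) -> None:
        # Keyed write with upsert semantics: writing an existing key replaces the row.
        self._rows[key] = Row(key, text, embedding)

    def get(self, key: str) -> Optional[Row]:
        return self._rows.get(key)

    def search(self, query_embedding: list[float], k: int = 5) -> list[tuple[str, float]]:
        # Similarity search: brute-force cosine over all rows, best match first.
        scored = [(r.key, cosine(query_embedding, r.embedding)) for r in self._rows.values()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

In Postgres + pgvector, search roughly corresponds to an `ORDER BY embedding <=> $1 LIMIT k` query, where `<=>` is pgvector's cosine-distance operator. put corresponds to an `INSERT ... ON CONFLICT (key) DO UPDATE` upsert.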

2. Write — when something becomes a memory

Two design choices, and both have empirical evidence behind them.

Explicit writes via tool call. The agent (the main LLM) decides when something is worth persisting by calling store_memory. No background heuristics extract memories automatically. The Anthropic Harvey case study (May 2026) reports a 6× improvement in task completion and an 8.4% gain in .docx output quality after adopting explicit, versioned writes. TriMem (arXiv:2605.19952) documents the opposite failure mode: automatic fact extraction suffers from "brevity bias", discarding fine-grained details that later turn out to matter.
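Below is a sketch of the explicit-write contract. It has two parts: a store_memory tool the main model can call, and a handler that is the only path into memory. The schema fields (key, content) and the append-only versioning are assumptions chosen to mirror "explicit versioned writes". They are not the API of any particular library. The name / description / input_schema shape follows the JSON-Schema style that tool-calling APIs such as Anthropic's Messages API accept.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical tool definition; field names are illustrative assumptions.
STORE_MEMORY_TOOL = {
    "name": "store_memory",
    "description": "Persist a fact that will matter beyond this turn. Keep full detail.",
    "input_schema": {
        "type": "object",
        "properties": {
            "key": {"type": "string", "description": "Stable identifier for the fact"},
            "content": {"type": "string", "description": "The fact, in full detail"},
        },
        "required": ["key", "content"],
    },
}


@dataclass
class VersionedMemory:
    # key -> list of versions, oldest first; writes append instead of overwriting.
    versions: dict[str, list[str]] = field(default_factory=dict)

    def write(self, key: str, content: str) -> int:
        history = self.versions.setdefault(key, [])
        history.append(content)
        return len(history)  # 1-based version number

    def latest(self, key: str) -> Optional[str]:
        history = self.versions.get(key)
        return history[-1] if history else None


def handle_tool_call(name: str, args: dict, memory: VersionedMemory) -> str:
    # The only write path: nothing is stored unless the agent emitted this tool call.
    if name != STORE_MEMORY_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    missing = [f for f in STORE_MEMORY_TOOL["input_schema"]["required"] if f not in args]
    if missing:
        return f"error: missing {', '.join(missing)}"
    version = memory.write(args["key"], args["content"])
    return f"stored {args['key']} v{version}"
```

Appending versions instead of overwriting keeps earlier detail recoverable. This guards against the loss of fine-grained facts that TriMem describes.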

Multi-tier targeting. A single bucket is wrong. The agent is writing about three different things:

User tier — semantic facts about a specific user that should survive every session. "Maria prefers Python over Go. Allergic to peanuts."
Session tier — episodic context bound to the current conversation. "We are debugging the migration in PR #142."
Agent tier — procedural skills learned across all users. "The Stripe webhook signature uses the Stripe-Signature header with HMAC-SHA256." Voyager-shaped (arXiv:2305.16291).

Mixing those three in one store is the cheap path. Separating them is the path that scales.
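One way to keep the three tiers separate is to scope every entry by tier plus an owner. The user tier uses the user id, the session tier uses the session id, and agent skills share a single scope. The sketch below works under that assumption; TieredMemory and its methods are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Tier(Enum):
    USER = "user"        # semantic facts about one user, survive every session
    SESSION = "session"  # episodic context bound to one conversation
    AGENT = "agent"      # procedural skills shared across all users


class TieredMemory:
    def __init__(self) -> None:
        # (tier, scope) -> entries; scope is a user id, a session id, or "*" for agent skills.
        self._buckets: dict[tuple[Tier, str], list[str]] = defaultdict(list)

    @staticmethod
    def _scope(tier: Tier, user_id: str, session_id: str) -> str:
        if tier is Tier.USER:
            return user_id
        if tier is Tier.SESSION:
            return session_id
        return "*"

    def write(self, tier: Tier, text: str, *, user_id: str, session_id: str) -> None:
        self._buckets[(tier, self._scope(tier, user_id, session_id))].append(text)

    def visible(self, *, user_id: str, session_id: str) -> dict[Tier, list[str]]:
        # Recall sees this user's facts, this session's episodes, and the shared skills.
        return {
            t: list(self._buckets.get((t, self._scope(t, user_id, session_id)), []))
            for t in Tier
        }
```

With this scoping, a different user sees only the shared agent skills. The same user in a new session still sees their own user-tier facts.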

3. Recall — getting it back

The unsexy component. Most teams default to "embed the query, take the top-K from the vector store, done" and give up 20-30 percentage points of accuracy relative to what the system could achieve.

What actually wins in production is hybrid retrieval. It combines vector similarity for semantic match, BM25 for exact-token match, and entity matching for the names and IDs that vectors lose. Mem0 LoCoMo (2026) reports 91.6% accuracy with this hybrid stack versus 72.9% for full-context baselines. That is an 18.7-percentage-point gain, achieved with 3.7× fewer tokens and 91% lower latency.
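Below is a compact sketch of hybrid retrieval over a handful of documents. It combines three signals: cosine similarity on caller-supplied embeddings, Okapi BM25 on tokens, and exact entity matching. The three rankings are fused with reciprocal rank fusion (RRF, k = 60). RRF is one common fusion choice, swapped in here for illustration; the source does not say how Mem0 combines its signals. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased tokens; keeps '#', '_' and '-' so IDs like inv-77 stay intact.
    return re.findall(r"[a-z0-9#_-]+", text.lower())


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    # Okapi BM25: rewards exact query tokens, weighted by rarity, normalized by length.
    if not docs:
        return []
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n or 1.0
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in set(tokenize(query)):
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def entity_scores(entities: set[str], docs: list[str]) -> list[float]:
    # Exact, case-sensitive match on names and IDs that embeddings tend to blur.
    return [float(sum(e in d for e in entities)) for d in docs]


def rank_positions(scores: list[float]) -> dict[int, int]:
    # Doc index -> 0-based rank; docs with no positive signal are left out of this list.
    order = sorted((i for i, s in enumerate(scores) if s > 0), key=lambda i: -scores[i])
    return {doc: r for r, doc in enumerate(order)}


def hybrid_search(
    query: str,
    query_vec: list[float],
    entities: set[str],
    docs: list[str],
    doc_vecs: list[list[float]],
    top_k: int = 3,
    rrf_k: int = 60,
) -> list[int]:
    rankings = [
        rank_positions([cosine(query_vec, v) for v in doc_vecs]),  # semantic
        rank_positions(bm25_scores(query, docs)),                  # exact tokens
        rank_positions(entity_scores(entities, docs)),             # names / IDs
    ]
    fused: Counter = Counter()
    for ranking in rankings:
        for doc, r in ranking.items():
            fused[doc] += 1.0 / (rrf_k + r + 1)  # reciprocal rank fusion
    return [doc for doc, _ in fused.most_common(top_k)]
```

RRF combines rank positions rather than raw scores. That means cosine values, BM25 scores and entity counts never need to be normalized against each other.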

There is a second axis: when to recall. Single-shot bootstrap (one recall at session start) is what most implementations did first. Production-grade systems do more:

Automatic recall every N turns (drift correction)
On-demand recall via tool when the agent asks for it
Loop reload after long tool chains
JIT recall when the agent touches a specific file or entity

The original architecture of @usetheo/memory planned only for single-shot recall. ADR-15 was strengthened after cross-validating it against Codex and Gemini CLI, which implement all four mechanisms. It is a rare case of the implementations teaching the spec.
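The single-shot bootstrap and the four recall triggers can live in one policy object that the loop consults on every turn. The sketch below assumes illustrative thresholds (every 5 turns, 8 tool calls) and a known-entity set for JIT recall. These are not values taken from Codex, Gemini CLI, or @usetheo/memory, and RecallPolicy is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class RecallPolicy:
    every_n_turns: int = 5  # drift-correction cadence (assumed value)
    long_chain: int = 8     # tool calls that count as a "long chain" (assumed value)
    known_entities: set[str] = field(default_factory=set)  # files/IDs with memories attached

    def triggers(
        self, turn: int, agent_requested: bool, tool_calls_since_recall: int, touched: set[str]
    ) -> list[str]:
        reasons = []
        if turn == 0:
            reasons.append("bootstrap")        # the original single-shot recall
        elif turn % self.every_n_turns == 0:
            reasons.append("periodic")         # automatic recall every N turns
        if agent_requested:
            reasons.append("on_demand")        # agent called a recall tool
        if tool_calls_since_recall >= self.long_chain:
            reasons.append("post_tool_chain")  # reload after a long tool chain
        if touched & self.known_entities:
            reasons.append("jit")              # agent touched a file or entity with memories
        return reasons
```

Returning the reasons instead of a bare boolean lets the loop log why a recall fired. It also lets the loop shape the recall query differently per trigger.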

4. Dream — consolidation while idle

This is the component most early implementations skip and most mature ones add later.

When the agent is idle (no active session), a background process consolidates the raw memory: deduplicates near-duplicates, merges related entries, summarizes long episodes into compact reflections, drops entries that have not been recalled in a long time. The Anthropic Dreaming paper (May 2026) showed that adding this loop to a system that already had Store/Write/Recall improved retrieval quality measurably on its real workloads — same model, same storage, just a periodic background pass.

Dream is the difference between a memory that grows without bound and a memory that stays useful at the 10k-entry mark.
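Below is a minimal Dream pass covering two of the jobs listed above. It prunes entries not recalled within an idle window and merges near-duplicates. Word-level Jaccard similarity serves as a cheap stand-in for embedding similarity. Summarizing episodes into reflections would be a call to the auxiliary LLM and is left out. The 90-day window and the 0.8 threshold are arbitrary placeholders, and dream and Entry are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    last_recalled: float  # unix seconds
    recall_count: int = 0


def jaccard(a: str, b: str) -> float:
    # Word-set overlap, ignoring case and punctuation.
    sa, sb = set(re.findall(r"\w+", a.lower())), set(re.findall(r"\w+", b.lower()))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def dream(
    entries: list[Entry], now: float, max_idle_s: float = 90 * 86400, dup_threshold: float = 0.8
) -> list[Entry]:
    # 1. Prune: drop entries nobody has recalled within the idle window.
    live = [e for e in entries if now - e.last_recalled <= max_idle_s]
    # 2. Merge: fold each near-duplicate into the most recently recalled match,
    #    keeping the longer (more detailed) text and summing recall counts.
    kept: list[Entry] = []
    for e in sorted(live, key=lambda x: x.last_recalled, reverse=True):
        for k in kept:
            if jaccard(k.text, e.text) >= dup_threshold:
                if len(e.text) > len(k.text):
                    k.text = e.text
                k.recall_count += e.recall_count
                break
        else:
            kept.append(Entry(e.text, e.last_recalled, e.recall_count))  # copy; inputs untouched
    return kept
```

Because the pass keeps the longer text when merging, consolidation does not trade detail for brevity. This matters given the brevity-bias failure mode described under Write.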

5. Auxiliary LLM — the cheap brain

A memory system needs LLM calls for jobs the main model should not do: extracting facts on Write, summarizing on Dream, ranking on Recall. Running those on the main model (Opus, Sonnet 4) is wasteful — the work is structured enough that a cheap model (Haiku 4.5, Gemini 2.0 Flash) does it well at 10-20× lower cost.

Most teams discover this after the first cost report. Designing it in from day one is cheaper than retrofitting.
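Routing is easiest to design in from day one when it is a single decision point. The sketch below assumes each model is wrapped as a plain prompt-to-text callable. The job names mirror the three jobs above (extracting on Write, summarizing on Dream, ranking on Recall) but are hypothetical identifiers, as is ModelRouter.

```python
from dataclasses import dataclass
from typing import Callable

# Memory-side jobs structured enough for a small, cheap model (hypothetical names).
AUX_JOBS = frozenset({"extract_facts", "summarize_episode", "rerank_candidates"})


@dataclass
class ModelRouter:
    main: Callable[[str], str]  # e.g. a client bound to the large agent model
    aux: Callable[[str], str]   # e.g. a client bound to a small, cheap model

    def run(self, job: str, prompt: str) -> str:
        # Memory jobs go to the auxiliary model; everything else stays on the main model.
        model = self.aux if job in AUX_JOBS else self.main
        return model(prompt)
```

Keeping the job list explicit makes the cost split auditable. Any call not on the list stays on the main model by default.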

What this taxonomy lets you skip

When the five components are explicit, the conversations about memory get easier:

"Can we use a vector store?" — yes, that is the Store. What about the other four?
"How does the agent know to remember things?" — that is the Write contract. Tool-call or sub-agent fold.
"Why is retrieval slow?" — Recall is doing the wrong thing, or doing too much. Hybrid retrieval is rarely slower than pure vector.
"Why is the memory full of junk after a month?" — there is no Dream loop. Add one.

If you are evaluating Mem0, Letta, Zep, Honcho, LangMem, or building your own, the five-component lens is the cheapest way to compare them. Each project makes different trade-offs on each component — there is no winner across all five.

We will go deeper on Write and Recall in upcoming posts. Push back or build on this in Discord.