Tested March 2026

Agent Memory Architecture — Which Type You Actually Need

Most agents don't need a vector database. Some do. The wrong choice costs you either money (over-engineering) or reliability (under-engineering). This is the decision guide — with real costs, real thresholds, and the exact point where you should upgrade.

The Four Memory Types

Every agent memory system is some combination of these four. They differ in how long information lives, how it's retrieved, and what it costs to operate.

Working Memory: the context window. Dies when the session ends. Free to use, limited by model window size.

Episodic Memory: daily logs of what happened. "On March 5th, we decided X." Retrieved by date or keyword search.

Semantic Memory: curated facts and lessons. "Always use Sonnet for support." Retrieved by meaning, not date.

Retrieval-Augmented: vector store + embeddings. Searches thousands of entries by semantic similarity. Overkill until it's not.

The File-Based Stack (Where 90% of Agents Should Start)

Three files. Zero infrastructure. Handles the first 6 months of any agent deployment.

workspace/
├── MEMORY.md              # Semantic — curated long-term facts (under 500 lines)
├── memory/
│   ├── 2026-03-05.md      # Episodic — what happened today
│   ├── 2026-03-04.md      # Episodic — what happened yesterday
│   └── ...
└── SOUL.md                # Includes recall instructions

Working memory = the context window (automatic, no setup needed).
Episodic memory = daily markdown files. Agent writes a summary at session end.
Semantic memory = MEMORY.md. Curated lessons and standing decisions. Promoted from daily files when a pattern repeats 3+ times.

Add this to your agent's instructions:

## Memory Protocol
- At session start: search MEMORY.md, then read today + yesterday's daily file
- At session end: write a summary to memory/YYYY-MM-DD.md
- Weekly: review last 7 daily files, promote patterns to MEMORY.md
- If MEMORY.md exceeds 400 lines: prune entries not referenced in 30 days
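The protocol above can be sketched as two helpers your harness calls around each session. The paths match the file tree shown earlier; the function names (`load_session_context`, `write_session_summary`) are illustrative, not part of any framework:

```python
from datetime import date, timedelta
from pathlib import Path

WORKSPACE = Path("workspace")
MEMORY_MD = WORKSPACE / "MEMORY.md"
DAILY_DIR = WORKSPACE / "memory"

def load_session_context() -> str:
    """Session start: MEMORY.md plus today's and yesterday's daily files."""
    parts = []
    if MEMORY_MD.exists():
        parts.append(MEMORY_MD.read_text())
    for day in (date.today(), date.today() - timedelta(days=1)):
        daily = DAILY_DIR / f"{day.isoformat()}.md"
        if daily.exists():
            parts.append(daily.read_text())
    return "\n\n".join(parts)

def write_session_summary(summary: str) -> Path:
    """Session end: append the summary to today's daily file."""
    DAILY_DIR.mkdir(parents=True, exist_ok=True)
    daily = DAILY_DIR / f"{date.today().isoformat()}.md"
    with daily.open("a") as f:
        f.write(summary.rstrip() + "\n")
    return daily
```

The weekly review and the 400-line prune are deliberately left to the agent itself, since "promote patterns that repeat 3+ times" is a judgment call, not a script.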

What this costs

MEMORY.md at 400 lines ≈ 2,000 tokens. Two daily files ≈ 1,000 tokens. That's 3,000 tokens of memory overhead per session — about $0.003 on Sonnet pricing. At 20 sessions/day, you're paying $0.06/day for memory. Negligible.

When it breaks down

File-based memory fails in two predictable ways: the daily-file count grows past a few hundred and keyword search stops surfacing the right entry, or you need to search by meaning ("what did we decide about billing?") and the files never use the word "billing." Either one is the signal to add a vector layer.

The Vector Store Upgrade (When You Actually Need It)

A vector store converts text into numerical embeddings and retrieves by meaning similarity instead of keyword match. This is the right tool when file-based search can't find what you need.

Practical options

Chroma (local, free), pgvector (one extension if you already run Postgres), Pinecone (managed, free tier), or OpenClaw's built-in memory_search. The decision flowchart later in this guide chooses between them.

The ClawVault pattern

Instead of migrating to a database, keep your files and add a semantic search layer on top. Your daily files and MEMORY.md remain the source of truth. The search index is a read-only view that can be rebuilt at any time.

# OpenClaw already does this with memory_search:
# It indexes MEMORY.md + memory/*.md and returns semantic matches

# For a custom setup with Chroma:
import chromadb

client = chromadb.PersistentClient(path="./memory-index")
collection = client.get_or_create_collection("agent_memory")

# Index a daily file
with open("memory/2026-03-05.md") as f:
    content = f.read()

# Split into chunks (one per "## " section), skipping empty ones
chunks = [c for c in content.split("\n## ") if c.strip()]
for i, chunk in enumerate(chunks):
    collection.upsert(               # upsert, so re-indexing the same file is safe
        documents=[chunk],
        ids=[f"2026-03-05-{i}"],
        metadatas=[{"date": "2026-03-05", "source": "daily"}]
    )

# Retrieve by meaning
results = collection.query(
    query_texts=["decisions about payment processing"],
    n_results=5
)
# Returns the 5 most semantically relevant chunks
# — even if none of them contain the word "payment"
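Chunk boundaries matter as much as the store. A stdlib-only chunking helper, a hypothetical function and not part of the chromadb API, that splits a markdown file into (heading, body) sections so each heading can be stored as metadata and empty sections never get indexed:

```python
def chunk_markdown(text: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) pairs, one per '## ' section.

    Text before the first heading is returned under the heading "".
    Sections with no body are skipped.
    """
    sections: list[tuple[str, str]] = []
    heading = ""
    lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("## "):
            if "\n".join(lines).strip():
                sections.append((heading, "\n".join(lines).strip()))
            heading = line[3:].strip()
            lines = []
        else:
            lines.append(line)
    if "\n".join(lines).strip():
        sections.append((heading, "\n".join(lines).strip()))
    return sections
```

Each pair becomes one document in the collection, with the heading stored in the metadata so a search hit can be traced back to its source section.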

What this costs

Chroma (local): Free. Disk space only. 10K memory chunks ≈ 50MB.

Pinecone (managed): Free up to 100K vectors. Beyond that, $0.33/hr for dedicated pods.

Embedding cost: You pay to convert text to vectors. OpenAI's text-embedding-3-small: $0.02 per million tokens. Indexing 6 months of daily files (≈ 500K tokens) costs $0.01 total. Retrieval queries cost fractions of a cent each.

The real cost is complexity — another service to run, another failure point, another thing to debug at 2 AM.

The Decision Flowchart

Which memory architecture do you need?

Q1: How many daily memory files do you have?

→ Under 100 files: File-based. Stop here. You don't need vectors yet.

→ Over 100 files: Continue to Q2.

Q2: Can keyword search find what you need?

→ Yes, grep/search works fine: File-based. Add better file naming conventions.

→ No, I need to search by meaning: Continue to Q3.

Q3: Are you already running a database?

→ Yes (Postgres): Add pgvector. One extension, no new service.

→ No: Continue to Q4.

Q4: Do you want to manage infrastructure?

→ Yes / I run locally: Chroma. Free, local, 5-minute setup.

→ No / I want managed: Pinecone free tier or OpenClaw memory_search.
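The four questions collapse into a dozen lines of code. A sketch, with the thresholds and option names taken from the flowchart above:

```python
def choose_memory_backend(
    daily_files: int,
    keyword_search_works: bool,
    runs_postgres: bool,
    wants_managed: bool,
) -> str:
    """Walk the decision flowchart and return a recommendation."""
    if daily_files < 100:
        return "file-based"                       # Q1: too little data for vectors
    if keyword_search_works:
        return "file-based + naming conventions"  # Q2: grep is good enough
    if runs_postgres:
        return "pgvector"                         # Q3: extend the DB you have
    if wants_managed:
        return "Pinecone free tier or memory_search"  # Q4: managed
    return "Chroma"                               # Q4: local, free
```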

How Memory Affects Reliability

Memory architecture is a reliability decision, not just a storage decision. The wrong setup causes specific, predictable failures: a crash wipes out a day of context because nothing was written down, a bloated prompt buries the one fact that mattered, a stale entry gets treated as current, or a memory is stored perfectly and never retrieved.

The reliability rules

  1. Always write before you close. Session-end summaries are non-negotiable. If the session crashes, you lose only the current turn, not the whole day.
  2. Load less, not more. Search first, then load only what's relevant. An agent with 3,000 tokens of targeted memory outperforms one with 15,000 tokens of everything.
  3. Timestamp everything. Every MEMORY.md entry gets a date. Facts older than 30 days without a reference should be verified or pruned.
  4. Test recall, not just storage. Writing to files is easy. The hard part is retrieving the right memory at the right time. Test by asking your agent questions about last week — does it find the answer?
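Rule 4 can be automated. A minimal recall smoke test, stdlib-only with made-up file names, that seeds fake daily files and checks that keyword search finds the right one. It also shows exactly where keyword search gives up, which is the Q2 signal from the flowchart:

```python
import tempfile
from pathlib import Path

def keyword_recall(memory_dir: Path, query: str) -> list[str]:
    """Return the daily files whose text contains every word in the query."""
    words = query.lower().split()
    hits = []
    for f in sorted(memory_dir.glob("*.md")):
        text = f.read_text().lower()
        if all(w in text for w in words):
            hits.append(f.name)
    return hits

with tempfile.TemporaryDirectory() as d:
    memory = Path(d)
    (memory / "2026-03-04.md").write_text("Chose Stripe for payment processing.")
    (memory / "2026-03-05.md").write_text("Shipped the onboarding flow.")
    assert keyword_recall(memory, "payment processing") == ["2026-03-04.md"]
    # A meaning-level query finds nothing: this is the gap vectors close
    assert keyword_recall(memory, "billing decisions") == []
```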

Cost Comparison: Real Numbers

Monthly cost at different scales

Solo agent, 20 sessions/day, file-based:

Memory overhead: 3K tokens/session × 20 × 30 = 1.8M tokens/month → $1.80/month (Sonnet)

3-agent team, 50 sessions/day, file-based:

Memory overhead: 3K × 50 × 30 = 4.5M tokens/month → $4.50/month

Same team + Chroma vector search:

File memory: $4.50 + Embedding indexing: ~$0.05 + Chroma: free (local) → $4.55/month

Same team + Pinecone:

File memory: $4.50 + Embeddings: ~$0.05 + Pinecone: free tier → $4.55/month (up to 100K vectors)
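The arithmetic above can be checked in a few lines. The rate constant is the one implied by this article's own figures (about $1 per million tokens), not an official price, so swap in current Sonnet pricing:

```python
PRICE_PER_M_TOKENS = 1.00   # implied by the figures above; check current pricing
OVERHEAD_TOKENS = 3_000     # MEMORY.md (~2K) + two daily files (~1K)

def monthly_memory_cost(sessions_per_day: int, days: int = 30) -> float:
    """Dollars per month of memory overhead at the assumed token rate."""
    tokens = OVERHEAD_TOKENS * sessions_per_day * days
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS

print(monthly_memory_cost(20))  # solo agent:   1.8M tokens/month -> 1.8
print(monthly_memory_cost(50))  # 3-agent team: 4.5M tokens/month -> 4.5
```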

Bottom line: Memory is one of the cheapest parts of running agents. The cost difference between file-based and vector-augmented is essentially zero. The real cost is the engineering time to set it up and the operational complexity of maintaining it.

The 15-Minute Setup

  1. Now (2 min): Create MEMORY.md and memory/ directory in your workspace.
  2. Now (5 min): Add the Memory Protocol block (above) to your agent's instructions.
  3. Now (3 min): Write your first MEMORY.md entry — 5-10 lines of standing decisions and preferences your agent should always know.
  4. End of today (5 min): Check that the agent wrote a daily file. If not, make the write instruction more explicit.
  5. In 3 months: If keyword search stops finding what you need, add Chroma or use memory_search. Not before.
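Steps 1 through 3 can be scripted. The layout matches the file tree above; the seed text is a placeholder to replace with your own standing decisions:

```python
from pathlib import Path

workspace = Path("workspace")
(workspace / "memory").mkdir(parents=True, exist_ok=True)  # daily files live here

memory_md = workspace / "MEMORY.md"
if not memory_md.exists():  # never clobber an existing memory file
    memory_md.write_text(
        "# MEMORY.md: curated long-term facts\n\n"
        "## Standing decisions\n"
        "- (replace with 5-10 lines your agent should always know)\n"
    )
print("created", memory_md)
```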
