The agents that feel intelligent aren't necessarily smarter; they just remember better. This guide covers the complete memory stack for solopreneurs: when to use files vs. vector databases, which vector DB to pick (Chroma vs. Qdrant vs. Pinecone vs. pgvector), and five retrieval patterns that make the difference between a forgettable agent and one that compounds value with every session. Includes a copy-paste RAG pipeline you can implement this weekend.
Every solopreneur building with AI agents hits the same wall within a week: the agent is brilliant in the moment and amnesiac by the next session. You spent an hour configuring the perfect tone, teaching it your brand voice, walking it through your customer personas. Then you restart. It's gone.
This isn't a model problem. GPT-4o, Claude, Gemini all share the same architectural constraint: they are stateless by design. Every session starts with a blank slate unless you build the memory layer.
┌──────────────────────────────────────────────┐
│ LAYER 1: Working Memory (Context Window)     │ ← Dies when session ends
│ What's happening right now                   │
├──────────────────────────────────────────────┤
│ LAYER 2: Episodic Memory (Daily Files)       │ ← Days to weeks
│ What happened in past sessions               │
├──────────────────────────────────────────────┤
│ LAYER 3: Semantic Memory (MEMORY.md)         │ ← Months to years
│ Distilled facts, preferences, patterns       │
├──────────────────────────────────────────────┤
│ LAYER 4: External Memory (Vector Store)      │ ← Indefinite, queryable
│ Knowledge base, documents, history at scale  │
└──────────────────────────────────────────────┘
Layers 1β3 are the file-based foundation covered in Library Item #41. This guide goes deep on Layer 4: vector stores, semantic retrieval, and RAG pipelines.
Most solopreneurs reach for Pinecone too early. The honest decision matrix:
| Situation | Recommendation |
|---|---|
| Single user, under 500 sessions | Daily files + MEMORY.md. Skip vector DB. |
| Product knowledge base (docs, FAQs) | Vector DB; keyword search won't cut it |
| Multi-user agent with per-user history | Vector DB with per-user namespacing |
| Customer support bot, 10K+ past tickets | Definitely vector DB |
| Personal assistant for 1β5 people | Still fine with files |
| Under 10,000 total memory chunks | Grep and files beat the ops overhead |
Rule of thumb: If you can grep for it in under 2 seconds, you don't need a vector database.
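For the file-first rows in that matrix, "retrieval" can literally be a text scan. A minimal sketch of that approach (the `memory_dir` layout and function name are illustrative, not from this guide):

```python
from pathlib import Path

def recall_from_files(query: str, memory_dir: str = "./memory") -> list[str]:
    """Naive keyword recall: return every line in the memory files
    that contains the query string (case-insensitive)."""
    hits = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        for line in path.read_text().splitlines():
            if query.lower() in line.lower():
                hits.append(f"{path.name}: {line.strip()}")
    return hits
```

This is the two-second grep test in code form: if this function returns what you need, you don't need embeddings yet.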
A vector database stores information as high-dimensional numeric arrays called embeddings. Instead of searching by keywords, you search by meaning.
Your text                Embedding model                Vector stored in DB
"pricing strategy"   ───────► [0.23, -0.41, 0.87, ...] ───────► chroma/pinecone

Query at retrieval time:
"what do we charge?" ───────► [0.21, -0.39, 0.91, ...] ───────► cosine similarity
                                                                ↳ returns pricing docs
                                                                  (similarity: 0.97)
Similar meanings produce similar vectors. The database finds the closest matches by measuring the angle between vectors (cosine similarity). This is how your agent can answer "what are our pricing rules?" even when the stored memory says "subscription tiers" β same meaning, different words.
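The comparison the database runs is simple enough to write from scratch. A sketch using the toy 3-dimensional vectors from the diagram above (real embeddings have 1,536 dimensions, but the math is identical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors:
    1.0 = same direction (same meaning), 0.0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The two example embeddings from the diagram score very close to 1.0
cosine_similarity([0.23, -0.41, 0.87], [0.21, -0.39, 0.91])
```

A vector database is essentially this function plus an index structure that avoids comparing your query against every stored vector.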
Chroma runs locally, needs no API key, and takes 5 minutes to set up. It's the right starting point for prototypes and personal agents.
pip install chromadb openai
import chromadb
from chromadb.utils import embedding_functions
ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="your-openai-key",
    model_name="text-embedding-3-small"  # ~$0.02 per 1M tokens, nearly free
)
client = chromadb.PersistentClient(path="./memory/vector-store")
collection = client.get_or_create_collection(
name="agent_memory",
embedding_function=ef
)
# Store a memory
collection.add(
documents=["User prefers bullet points over prose. Dislikes long intros."],
metadatas=[{"type": "preference", "date": "2026-03-06", "user": "pk"}],
ids=["pref_001"]
)
# Retrieve relevant memories
results = collection.query(
query_texts=["how should I format my response?"],
n_results=3
)
# Returns: [["User prefers bullet points over prose..."]]
Chroma pros: Free, local, private, fast under 100K entries, persistent across restarts.
Chroma cons: Single machine only, no cloud sync.
Graduate when: You need multi-machine access or multiple users querying the same store.
Qdrant is the best option for production performance without Pinecone's pricing. Open-source, runs on a $6/month VPS.
# Run Qdrant via Docker
docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
)
client = QdrantClient(host="localhost", port=6333)
client.create_collection(
    collection_name="agent_memory",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)  # 1536 dims = text-embedding-3-small
)
def remember(text: str, metadata: dict, point_id: str):
    # Note: Qdrant point IDs must be unsigned integers or UUID strings
    vector = embed(text)  # your OpenAI embed call
    client.upsert(
        collection_name="agent_memory",
        points=[PointStruct(id=point_id, vector=vector, payload=metadata | {"text": text})]
    )
def recall(query: str, top_k: int = 5, query_filter: Filter | None = None):
    vector = embed(query)
    results = client.search(
        collection_name="agent_memory",
        query_vector=vector,
        limit=top_k,
        # e.g. Filter(must=[FieldCondition(key="user", match=MatchValue(value="pk"))])
        query_filter=query_filter
    )
    return [r.payload for r in results]
pgvector means zero additional infrastructure: just add the extension to your existing Postgres database.
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create memory table
CREATE TABLE agent_memory (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536),  -- matches text-embedding-3-small
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);
-- Build the ANN index after loading data; ivfflat picks its clusters from existing rows
CREATE INDEX ON agent_memory USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
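Retrieval is then a plain SQL query. A sketch, with the query embedding passed in as a parameter (`$1`) from your application; `<=>` is pgvector's cosine-distance operator, so lower distance means closer meaning:

```sql
-- Top 3 memories closest in meaning to the query embedding
SELECT content, metadata, 1 - (embedding <=> $1) AS similarity
FROM agent_memory
ORDER BY embedding <=> $1
LIMIT 3;
```

Because memories live in a normal table, you get metadata filtering for free with an ordinary `WHERE metadata->>'user' = '...'` clause, no separate namespacing API required.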
The rest covers Pattern 2 (hybrid BM25 + semantic), Pattern 3 (multi-query retrieval), Pattern 4 (contextual compression), Pattern 5 (self-querying with metadata filters), the complete RAG pipeline, embedding model comparison, and the solopreneur starter stack.
Includes 54+ library items + Daily Briefing. 30-day money-back guarantee.