Building Agent Memory with Vector Databases
By Diesel
tutorial · memory · vector-databases · implementation
An agent without memory is a goldfish with a PhD. Brilliant in the moment, completely lost about what happened five minutes ago. Every conversation starts from zero. Every lesson is relearned. Every mistake is repeated.
Memory is what turns a chatbot into an agent. Not just conversation history (that's context, not memory), but actual persistent knowledge that survives across sessions and grows over time.
## Three Types of Memory
**Conversation memory.** What happened in this session. The current dialogue, recent tool calls, intermediate results. Short-lived. Discarded when the session ends.
**Semantic memory.** Facts, patterns, and knowledge. "The user prefers TypeScript." "The codebase uses PostgreSQL." "Last time we tried approach X, it failed because of Y." Long-lived. Persists across sessions.
**Episodic memory.** Records of past interactions with context. "On March 5th, the user asked about authentication and we built an OAuth flow." Searchable history. Lets the agent recall relevant past experiences.
We're implementing all three.
## The Stack
```bash
pip install chromadb langchain-anthropic langchain-chroma langchain-community sentence-transformers
```
ChromaDB for the vector store. Runs locally, no infrastructure needed. In production, swap it for pgvector or Pinecone. The interface stays the same.
## Embedding Model
Everything we store needs to be searchable by meaning, not just keywords. That's what embeddings do.
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="BAAI/bge-small-en-v1.5",
model_kwargs={"device": "cpu"},
encode_kwargs={"normalize_embeddings": True}
)
```

It is worth reading about [memory architecture patterns](/blog/agent-memory-patterns) alongside this.
BGE-small. Fast enough for real-time use, good enough for retrieval. Run it locally so you're not sending user data to an embedding API.
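The `normalize_embeddings=True` flag matters more than it looks: once every vector has unit length, cosine similarity reduces to a plain dot product, which is exactly what the cosine-distance index below exploits. A toy illustration with hand-rolled 3-dimensional vectors (stand-ins for real BGE output, which is 384-dimensional):

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Toy 3-d stand-ins for real 384-d BGE embeddings
query = normalize([0.9, 0.1, 0.2])
close = normalize([0.8, 0.2, 0.1])
far = normalize([0.1, 0.9, 0.4])

assert cosine_similarity(query, close) > cosine_similarity(query, far)
```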
## Conversation Memory: The Short-Term Buffer
This is the simplest layer. Keep the last N messages in a buffer. Inject them into every prompt.
```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage
class ConversationMemory:
def __init__(self, max_messages: int = 20):
self.max_messages = max_messages
self.sessions: dict[str, InMemoryChatMessageHistory] = {}
def get_session(self, session_id: str) -> InMemoryChatMessageHistory:
if session_id not in self.sessions:
self.sessions[session_id] = InMemoryChatMessageHistory()
return self.sessions[session_id]
def add_exchange(self, session_id: str, human: str, ai: str):
session = self.get_session(session_id)
session.add_message(HumanMessage(content=human))
session.add_message(AIMessage(content=ai))
# Trim to max
messages = session.messages
if len(messages) > self.max_messages:
session.messages = messages[-self.max_messages:]
def get_context(self, session_id: str) -> list:
return self.get_session(session_id).messages
```
Twenty messages is the buffer size. Adjust based on your token budget. The key point: this dies when the session ends. It's working memory, not storage.
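If you don't need LangChain's message objects, the same trim-to-max behavior comes for free from the standard library. A minimal sketch using `collections.deque`, which silently drops the oldest entries once `maxlen` is reached:

```python
from collections import deque

# A deque with maxlen drops the oldest entries on append --
# the same trimming the class above does by hand.
buffer: deque[tuple[str, str]] = deque(maxlen=20)

for i in range(30):
    buffer.append(("user", f"message {i}"))

# Only the 20 most recent messages survive
assert len(buffer) == 20
assert buffer[0] == ("user", "message 10")
```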
## Semantic Memory: The Knowledge Base
This is where it gets interesting. Semantic memory stores facts and patterns as vector embeddings, searchable by meaning.
```python
import chromadb
from datetime import datetime
import hashlib
import json
class SemanticMemory:
def __init__(self, persist_dir: str = "./memory_db"):
self.client = chromadb.PersistentClient(path=persist_dir)
self.collection = self.client.get_or_create_collection(
name="semantic_memory",
metadata={"hnsw:space": "cosine"}
)
self.embeddings = embeddings
def store(self, content: str, metadata: dict = None) -> str:
"""Store a fact or pattern in semantic memory."""
content_hash = hashlib.md5(content.encode()).hexdigest()[:12]
embedding = self.embeddings.embed_query(content)
meta = {
"stored_at": datetime.utcnow().isoformat(),
"type": "fact",
**(metadata or {})
}
# Upsert: same content updates instead of duplicating
self.collection.upsert(
ids=[content_hash],
embeddings=[embedding],
documents=[content],
metadatas=[meta]
)
return content_hash
def search(self, query: str, top_k: int = 5, min_score: float = 0.5) -> list[dict]:
"""Search memory by semantic similarity."""
embedding = self.embeddings.embed_query(query)
results = self.collection.query(
query_embeddings=[embedding],
n_results=top_k,
include=["documents", "metadatas", "distances"]
)
memories = []
for doc, meta, distance in zip(
results["documents"][0],
results["metadatas"][0],
results["distances"][0]
):
score = 1 - distance # Cosine distance to similarity
if score >= min_score:
memories.append({
"content": doc,
"metadata": meta,
"relevance": round(score, 3)
                })
        return memories

    def forget(self, memory_id: str):
        """Delete a specific memory."""
        self.collection.delete(ids=[memory_id])
```

It is worth reading about [choosing the right vector database](/blog/vector-database-comparison-2025) alongside this.
The content hash as ID is deliberate. Store the same fact twice and it updates instead of duplicating. Agents love to re-store things they already know.
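The dedup behavior is easy to see in isolation: hashing identical content twice yields the identical ID, so the upsert overwrites rather than appends.

```python
import hashlib

def memory_id(content: str) -> str:
    # Same scheme as SemanticMemory.store: first 12 hex chars of the MD5
    return hashlib.md5(content.encode()).hexdigest()[:12]

a = memory_id("The user prefers TypeScript.")
b = memory_id("The user prefers TypeScript.")  # re-stored verbatim
c = memory_id("The user prefers TypeScript")   # even a dropped period is new content

assert a == b  # identical content -> identical ID -> upsert overwrites
assert a != c  # near-duplicates still slip through
```

The flip side: paraphrases and near-duplicates get fresh IDs, so exact-match hashing only catches verbatim repeats. Semantic dedup would need a similarity check before storing.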
The minimum score threshold (0.5) filters out irrelevant matches. Vector search always returns results. Without a threshold, you'll inject noise into every prompt.
## Episodic Memory: The Experience Archive
Episodic memory records entire interactions with context. "What happened, when, and how did it turn out?"
```python
class EpisodicMemory:
def __init__(self, persist_dir: str = "./memory_db"):
self.client = chromadb.PersistentClient(path=persist_dir)
self.collection = self.client.get_or_create_collection(
name="episodic_memory",
metadata={"hnsw:space": "cosine"}
)
self.embeddings = embeddings
def record_episode(
self,
summary: str,
session_id: str,
outcome: str = "success",
tools_used: list[str] = None,
tags: list[str] = None
) -> str:
"""Record a completed interaction as an episode."""
episode_id = f"ep-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}"
embedding = self.embeddings.embed_query(summary)
self.collection.add(
ids=[episode_id],
embeddings=[embedding],
documents=[summary],
metadatas=[{
"session_id": session_id,
"timestamp": datetime.utcnow().isoformat(),
"outcome": outcome,
"tools_used": json.dumps(tools_used or []),
"tags": json.dumps(tags or [])
}]
)
return episode_id
def recall_similar(self, situation: str, top_k: int = 3) -> list[dict]:
"""Recall past episodes similar to the current situation."""
embedding = self.embeddings.embed_query(situation)
results = self.collection.query(
query_embeddings=[embedding],
n_results=top_k,
include=["documents", "metadatas", "distances"]
)
episodes = []
for doc, meta, distance in zip(
results["documents"][0],
results["metadatas"][0],
results["distances"][0]
):
episodes.append({
"summary": doc,
"outcome": meta["outcome"],
"tools_used": json.loads(meta["tools_used"]),
"when": meta["timestamp"],
"relevance": round(1 - distance, 3)
})
return episodes
```
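The `json.dumps`/`json.loads` pair is there because Chroma metadata values must be scalars (strings, numbers, booleans), not lists. Serializing to a JSON string on the way in and parsing on the way out keeps the round trip lossless:

```python
import json

# Chroma metadata values must be scalar, so lists get stored as JSON strings
tools_used = ["search", "code_editor"]
stored = json.dumps(tools_used)   # what record_episode writes into metadata
assert isinstance(stored, str)

recovered = json.loads(stored)    # what recall_similar hands back
assert recovered == tools_used
```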
## The Unified Memory Manager
Tie all three layers together with a single interface.
```python
class AgentMemory:
def __init__(self, persist_dir: str = "./memory_db"):
self.conversation = ConversationMemory(max_messages=20)
self.semantic = SemanticMemory(persist_dir)
self.episodic = EpisodicMemory(persist_dir)
def build_context(self, session_id: str, current_query: str) -> str:
"""Build the full memory context for a prompt."""
parts = []
# 1. Relevant semantic memories
facts = self.semantic.search(current_query, top_k=5)
if facts:
parts.append("## Relevant Knowledge")
for fact in facts:
parts.append(f"- {fact['content']} (relevance: {fact['relevance']})")
# 2. Similar past episodes
episodes = self.episodic.recall_similar(current_query, top_k=3)
if episodes:
parts.append("\n## Similar Past Interactions")
for ep in episodes:
parts.append(
f"- {ep['summary']} (outcome: {ep['outcome']}, "
f"tools: {', '.join(ep['tools_used'])})"
)
# 3. Recent conversation
messages = self.conversation.get_context(session_id)
if messages:
parts.append("\n## Recent Conversation")
for msg in messages[-6:]: # Last 3 exchanges
role = "User" if isinstance(msg, HumanMessage) else "Agent"
parts.append(f"**{role}:** {msg.content[:200]}")
return "\n".join(parts)
def learn(self, content: str, metadata: dict = None):
"""Store a new fact in semantic memory."""
return self.semantic.store(content, metadata)
def record(self, summary: str, session_id: str, **kwargs):
"""Record an episode in episodic memory."""
return self.episodic.record_episode(summary, session_id, **kwargs)
```

This connects directly to [sharing memory across agents](/blog/cross-agent-memory-sharing).
## Integrating with Your Agent
```python
from langchain_anthropic import ChatAnthropic
memory = AgentMemory(persist_dir="./agent_memory")
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
async def agent_respond(session_id: str, user_message: str) -> str:
# Build context from all memory layers
memory_context = memory.build_context(session_id, user_message)
# Construct the prompt with memory
system_prompt = f"""You are a helpful assistant with memory.
Use the following context from your memory to inform your responses.
If you learn something new and important, tell me so I can store it.
{memory_context}"""
    response = await llm.ainvoke([
        ("system", system_prompt),
        ("human", user_message)
    ])
# Store the conversation turn
memory.conversation.add_exchange(session_id, user_message, response.content)
return response.content
```
## Automatic Memory Extraction
Don't make the agent decide what to remember. Extract facts automatically after each interaction.
```python
async def extract_and_store_memories(conversation: str):
"""Use the LLM to extract storable facts from a conversation."""
extraction_prompt = """Analyze this conversation and extract any facts,
preferences, or patterns worth remembering for future interactions.
Return each fact on a new line. Only include genuinely useful information.
Skip pleasantries and routine exchanges.
Conversation:
{conversation}"""
    response = await llm.ainvoke([("human", extraction_prompt.format(conversation=conversation))])
facts = [f.strip() for f in response.content.strip().split("\n") if f.strip()]
for fact in facts:
memory.learn(fact, metadata={"source": "auto_extraction"})
return facts
```
Run this at session end. The agent gradually builds a knowledge base about users, projects, and patterns without anyone explicitly telling it what to remember.
## The Memory Lifecycle
Store aggressively, search selectively, forget deliberately. Every interaction potentially generates memories. Every prompt searches for relevant context. And periodically, you prune memories that are outdated, contradicted, or just noise.
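An age-based pruning pass can be sketched against the `stored_at` metadata already written by `SemanticMemory.store`. The `select_expired` helper and the 90-day cutoff below are illustrative choices, not part of the classes above:

```python
from datetime import datetime, timedelta

def select_expired(memories: list[dict], max_age_days: int = 90) -> list[str]:
    """Return IDs of memories whose stored_at timestamp is older than the cutoff.

    Hypothetical helper: `memories` is a list of {"id": ..., "stored_at": ...}
    dicts mirroring the metadata written by SemanticMemory.store.
    """
    cutoff = datetime.utcnow() - timedelta(days=max_age_days)
    return [
        m["id"]
        for m in memories
        if datetime.fromisoformat(m["stored_at"]) < cutoff
    ]

memories = [
    {"id": "abc123", "stored_at": "2020-01-01T00:00:00"},
    {"id": "def456", "stored_at": datetime.utcnow().isoformat()},
]
# Each returned ID would then be passed to SemanticMemory.forget
assert select_expired(memories) == ["abc123"]
```

Contradiction and noise pruning are harder: they need either an LLM pass over stored facts or a similarity check against newer memories, but the delete path is the same `forget` call.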
That's how you build an agent that actually gets better over time. Not by fine-tuning the model. By giving it a memory that accumulates experience.