Building Agent Memory with Vector Databases
By Diesel
tutorial · memory · vector-databases · implementation
An agent without memory is a goldfish with a PhD. Brilliant in the moment, completely lost about what happened five minutes ago. Every conversation starts from zero. Every lesson is relearned. Every mistake is repeated.
Memory is what turns a chatbot into an agent. Not just conversation history (that's context, not memory), but actual persistent knowledge that survives across sessions and grows over time.
## Three Types of Memory
**Conversation memory.** What happened in this session. The current dialogue, recent tool calls, intermediate results. Short-lived. Discarded when the session ends.
**Semantic memory.** Facts, patterns, and knowledge. "The user prefers TypeScript." "The codebase uses PostgreSQL." "Last time we tried approach X, it failed because of Y." Long-lived. Persists across sessions.
**Episodic memory.** Records of past interactions with context. "On March 5th, the user asked about authentication and we built an OAuth flow." Searchable history. Lets the agent recall relevant past experiences.
We're implementing all three.
## The Stack
```bash
pip install chromadb langchain-anthropic langchain-chroma langchain-community sentence-transformers
```
ChromaDB for the vector store. Runs locally, no infrastructure needed. In production, swap it for pgvector or Pinecone. The interface stays the same.
## Embedding Model
Everything we store needs to be searchable by meaning, not just keywords. That's what embeddings do.
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="BAAI/bge-small-en-v1.5",
model_kwargs={"device": "cpu"},
encode_kwargs={"normalize_embeddings": True}
)
```

It is worth reading about [memory architecture patterns](/blog/agent-memory-patterns) alongside this.
BGE-small. Fast enough for real-time use, good enough for retrieval. Run it locally so you're not sending user data to an embedding API.
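The `normalize_embeddings=True` flag matters more than it looks: once every vector has unit length, cosine similarity reduces to a plain dot product, which is exactly what the cosine-distance index below exploits. A toy illustration with hand-rolled 3-dimensional vectors (stand-ins for real BGE output, which is 384-dimensional):

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Toy 3-d stand-ins for real 384-d BGE embeddings
query = normalize([0.9, 0.1, 0.2])
close = normalize([0.8, 0.2, 0.1])
far = normalize([0.1, 0.9, 0.4])

assert cosine_similarity(query, close) > cosine_similarity(query, far)
```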
## Conversation Memory: The Short-Term Buffer
This is the simplest layer. Keep the last N messages in a buffer. Inject them into every prompt.
```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage
class ConversationMemory:
def __init__(self, max_messages: int = 20):
self.max_messages = max_messages
self.sessions: dict[str, InMemoryChatMessageHistory] = {}
def get_session(self, session_id: str) -> InMemoryChatMessageHistory:
if session_id not in self.sessions:
self.sessions[session_id] = InMemoryChatMessageHistory()
return self.sessions[session_id]
def add_exchange(self, session_id: str, human: str, ai: str):
session = self.get_session(session_id)
session.add_message(HumanMessage(content=human))
session.add_message(AIMessage(content=ai))
# Trim to max
messages = session.messages
if len(messages) > self.max_messages:
session.messages = messages[-self.max_messages:]
def get_context(self, session_id: str) -> list:
return self.get_session(session_id).messages
```
Twenty messages is the buffer size. Adjust based on your token budget. The key point: this dies when the session ends. It's working memory, not storage.
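If you don't need LangChain's message objects, the same trim-to-max behavior comes for free from the standard library. A minimal sketch using `collections.deque`, which silently drops the oldest entries once `maxlen` is reached:

```python
from collections import deque

# A deque with maxlen drops the oldest entries on append --
# the same trimming the class above does by hand.
buffer: deque[tuple[str, str]] = deque(maxlen=20)

for i in range(30):
    buffer.append(("user", f"message {i}"))

# Only the 20 most recent messages survive
assert len(buffer) == 20
assert buffer[0] == ("user", "message 10")
```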
## Semantic Memory: The Knowledge Base
This is where it gets interesting. Semantic memory stores facts and patterns as vector embeddings, searchable by meaning.
```python
import chromadb
from datetime import datetime
import hashlib
import json
class SemanticMemory:
def __init__(self, persist_dir: str = "./memory_db"):
self.client = chromadb.PersistentClient(path=persist_dir)
self.collection = self.client.get_or_create_collection(
name="semantic_memory",
metadata={"hnsw:space": "cosine"}
)
self.embeddings = embeddings
def store(self, content: str, metadata: dict = None) -> str:
"""Store a fact or pattern in semantic memory."""
content_hash = hashlib.md5(content.encode()).hexdigest()[:12]
embedding = self.embeddings.embed_query(content)
meta = {
"stored_at": datetime.utcnow().isoformat(),
"type": "fact",
**(metadata or {})
}
# Upsert: same content updates instead of duplicating
self.collection.upsert(
ids=[content_hash],
embeddings=[embedding],
documents=[content],
metadatas=[meta]
)
return content_hash
def search(self, query: str, top_k: int = 5, min_score: float = 0.5) -> list[dict]:
"""Search memory by semantic similarity."""
embedding = self.embeddings.embed_query(query)
results = self.collection.query(
query_embeddings=[embedding],
n_results=top_k,
include=["documents", "metadatas", "distances"]
)
memories = []
for doc, meta, distance in zip(
results["documents"][0],
results["metadatas"][0],
results["distances"][0]
):
score = 1 - distance # Cosine distance to similarity
if score >= min_score:
memories.append({
"content": doc,
"metadata": meta,
"relevance": round(score, 3)
                })
        return memories

    def forget(self, memory_id: str):
        """Delete a specific memory."""
        self.collection.delete(ids=[memory_id])
```

It is worth reading about [choosing the right vector database](/blog/vector-database-comparison-2025) alongside this.
The content hash as ID is deliberate. Store the same fact twice and it updates instead of duplicating. Agents love to re-store things they already know.
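The dedup behavior is easy to see in isolation: hashing identical content twice yields the identical ID, so the upsert overwrites rather than appends.

```python
import hashlib

def memory_id(content: str) -> str:
    # Same scheme as SemanticMemory.store: first 12 hex chars of the MD5
    return hashlib.md5(content.encode()).hexdigest()[:12]

a = memory_id("The user prefers TypeScript.")
b = memory_id("The user prefers TypeScript.")  # re-stored verbatim
c = memory_id("The user prefers TypeScript")   # even a dropped period is new content

assert a == b  # identical content -> identical ID -> upsert overwrites
assert a != c  # near-duplicates still slip through
```

The flip side: paraphrases and near-duplicates get fresh IDs, so exact-match hashing only catches verbatim repeats. Semantic dedup would need a similarity check before storing.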
The minimum score threshold (0.5) filters out irrelevant matches. Vector search always returns results. Without a threshold, you'll inject noise into every prompt.
## Episodic Memory: The Experience Archive
Episodic memory records entire interactions with context. "What happened, when, and how did it turn out?"
```python
class EpisodicMemory:
def __init__(self, persist_dir: str = "./memory_db"):
self.client = chromadb.PersistentClient(path=persist_dir)
self.collection = self.client.get_or_create_collection(
name="episodic_memory",
metadata={"hnsw:space": "cosine"}
)
self.embeddings = embeddings
def record_episode(
self,
summary: str,
session_id: str,
outcome: str = "success",
tools_used: list[str] = None,
tags: list[str] = None
) -> str:
"""Record a completed interaction as an episode."""
episode_id = f"ep-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}"
embedding = self.embeddings.embed_query(summary)
self.collection.add(
ids=[episode_id],
embeddings=[embedding],
documents=[summary],
metadatas=[{
"session_id": session_id,
"timestamp": datetime.utcnow().isoformat(),
"outcome": outcome,
"tools_used": json.dumps(tools_used or []),
"tags": json.dumps(tags or [])
}]
)
return episode_id
def recall_similar(self, situation: str, top_k: int = 3) -> list[dict]:
"""Recall past episodes similar to the current situation."""
embedding = self.embeddings.embed_query(situation)
results = self.collection.query(
query_embeddings=[embedding],
n_results=top_k,
include=["documents", "metadatas", "distances"]
)
episodes = []
for doc, meta, distance in zip(
results["documents"][0],
results["metadatas"][0],
results["distances"][0]
):
episodes.append({
"summary": doc,
"outcome": meta["outcome"],
"tools_used": json.loads(meta["tools_used"]),
"when": meta["timestamp"],
"relevance": round(1 - distance, 3)
})
return episodes
```
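The `json.dumps`/`json.loads` pair is there because Chroma metadata values must be scalars (strings, numbers, booleans), not lists. Serializing to a JSON string on the way in and parsing on the way out keeps the round trip lossless:

```python
import json

# Chroma metadata values must be scalar, so lists get stored as JSON strings
tools_used = ["search", "code_editor"]
stored = json.dumps(tools_used)   # what record_episode writes into metadata
assert isinstance(stored, str)

recovered = json.loads(stored)    # what recall_similar hands back
assert recovered == tools_used
```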
## The Unified Memory Manager
Tie all three layers together with a single interface.
```python
class AgentMemory:
def __init__(self, persist_dir: str = "./memory_db"):
self.conversation = ConversationMemory(max_messages=20)
self.semantic = SemanticMemory(persist_dir)
self.episodic = EpisodicMemory(persist_dir)
def build_context(self, session_id: str, current_query: str) -> str:
"""Build the full memory context for a prompt."""
parts = []
# 1. Relevant semantic memories
facts = self.semantic.search(current_query, top_k=5)
if facts:
parts.append("## Relevant Knowledge")
for fact in facts:
parts.append(f"- {fact['content']} (relevance: {fact['relevance']})")
# 2. Similar past episodes
episodes = self.episodic.recall_similar(current_query, top_k=3)
if episodes:
parts.append("\n## Similar Past Interactions")
for ep in episodes:
parts.append(
f"- {ep['summary']} (outcome: {ep['outcome']}, "
f"tools: {', '.join(ep['tools_used'])})"
)
# 3. Recent conversation
messages = self.conversation.get_context(session_id)
if messages:
parts.append("\n## Recent Conversation")
for msg in messages[-6:]: # Last 3 exchanges
role = "User" if isinstance(msg, HumanMessage) else "Agent"
parts.append(f"**{role}:** {msg.content[:200]}")
return "\n".join(parts)
def learn(self, content: str, metadata: dict = None):
"""Store a new fact in semantic memory."""
return self.semantic.store(content, metadata)
def record(self, summary: str, session_id: str, **kwargs):
"""Record an episode in episodic memory."""
return self.episodic.record_episode(summary, session_id, **kwargs)
```

This connects directly to [sharing memory across agents](/blog/cross-agent-memory-sharing).
## Integrating with Your Agent
```python
from langchain_anthropic import ChatAnthropic
memory = AgentMemory(persist_dir="./agent_memory")
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
async def agent_respond(session_id: str, user_message: str) -> str:
# Build context from all memory layers
memory_context = memory.build_context(session_id, user_message)
# Construct the prompt with memory
system_prompt = f"""You are a helpful assistant with memory.
Use the following context from your memory to inform your responses.
If you learn something new and important, tell me so I can store it.
{memory_context}"""
    response = await llm.ainvoke([
        ("system", system_prompt),
        ("human", user_message)
    ])
# Store the conversation turn
memory.conversation.add_exchange(session_id, user_message, response.content)
return response.content
```
## Automatic Memory Extraction
Don't make the agent decide what to remember. Extract facts automatically after each interaction.
```python
async def extract_and_store_memories(conversation: str):
"""Use the LLM to extract storable facts from a conversation."""
extraction_prompt = """Analyze this conversation and extract any facts,
preferences, or patterns worth remembering for future interactions.
Return each fact on a new line. Only include genuinely useful information.
Skip pleasantries and routine exchanges.
Conversation:
{conversation}"""
    response = await llm.ainvoke([("human", extraction_prompt.format(conversation=conversation))])
facts = [f.strip() for f in response.content.strip().split("\n") if f.strip()]
for fact in facts:
memory.learn(fact, metadata={"source": "auto_extraction"})
return facts
```
Run this at session end. The agent gradually builds a knowledge base about users, projects, and patterns without anyone explicitly telling it what to remember.
## The Memory Lifecycle
Store aggressively, search selectively, forget deliberately. Every interaction potentially generates memories. Every prompt searches for relevant context. And periodically, you prune memories that are outdated, contradicted, or just noise.
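An age-based pruning pass can be sketched against the `stored_at` metadata already written by `SemanticMemory.store`. The `select_expired` helper and the 90-day cutoff below are illustrative choices, not part of the classes above:

```python
from datetime import datetime, timedelta

def select_expired(memories: list[dict], max_age_days: int = 90) -> list[str]:
    """Return IDs of memories whose stored_at timestamp is older than the cutoff.

    Hypothetical helper: `memories` is a list of {"id": ..., "stored_at": ...}
    dicts mirroring the metadata written by SemanticMemory.store.
    """
    cutoff = datetime.utcnow() - timedelta(days=max_age_days)
    return [
        m["id"]
        for m in memories
        if datetime.fromisoformat(m["stored_at"]) < cutoff
    ]

memories = [
    {"id": "abc123", "stored_at": "2020-01-01T00:00:00"},
    {"id": "def456", "stored_at": datetime.utcnow().isoformat()},
]
# Each returned ID would then be passed to SemanticMemory.forget
assert select_expired(memories) == ["abc123"]
```

Contradiction and noise pruning are harder: they need either an LLM pass over stored facts or a similarity check against newer memories, but the delete path is the same `forget` call.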
That's how you build an agent that actually gets better over time. Not by fine-tuning the model. By giving it a memory that accumulates experience.