Agent Memory: Short-Term, Long-Term, and Why It Matters
By Diesel
ai-agents · memory · architecture
Ask an agent to fix a bug it encountered yesterday and it stares at you blankly. Ask it to remember the coding pattern it used successfully an hour ago and it's gone. Ask it to learn from a mistake it made three conversations back and you'll get a hallucinated answer that sounds confident and is completely wrong.
An agent without memory is just a fancy autocomplete. It generates one response at a time, with no continuity, no learning, and no accumulation of knowledge. It's a goldfish with a PhD.
Memory is what turns a stateless text generator into something that actually gets better at its job.
## The Context Window Is Not Memory
Let's get this out of the way. The context window (the text the model can see in a single interaction) is not memory. It's working memory at best, and it has brutal limitations. (Related reading: [sharing memory across agents](/blog/cross-agent-memory-sharing).)
It fills up. Current context windows range from 128K to 2M tokens. That sounds like a lot until your agent has been working for 20 minutes, reading files, making edits, checking results. The context fills up fast.
It's expensive. Every token in the context window costs money on every API call. Stuffing 200K tokens of conversation history into every request is a great way to bankrupt your API budget.
It's ephemeral. When the conversation ends, the context window is gone. Everything the agent "knew" disappears. Next conversation, blank slate.
Real memory has to survive beyond the context window. It has to persist across sessions. It has to be searchable, updatable, and relevant. For the storage side of that, see [vector databases for agent memory](/blog/building-agent-memory-vector-databases).
## Short-Term Memory: What Am I Doing Right Now?
Short-term memory is the agent's awareness of its current task. What has it done so far? What's the current state? What's left to do?
In most implementations, this is literally the conversation history, possibly compressed. The agent reads its own previous actions and uses them to decide what to do next.
### Patterns That Work
**Scratchpad.** Give the agent an explicit working memory space. Instead of relying on the conversation history to remember its plan, have it write a structured scratchpad that it updates as it works. "Current objective: Fix login bug. Steps completed: read auth.ts, identified null check missing. Next: write fix and test." (See also: [stateful versus stateless agent design](/blog/stateful-vs-stateless-agents).)
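A minimal sketch of the idea, with illustrative field names (no particular framework assumed): the agent rewrites this structured note as it works, and only the rendered block goes into each prompt.

```python
class Scratchpad:
    """Explicit working memory the agent updates instead of re-reading history."""

    def __init__(self, objective: str):
        self.objective = objective
        self.completed: list[str] = []
        self.next_step: str = ""

    def render(self) -> str:
        # Produce the compact text block injected into each prompt.
        done = "; ".join(self.completed) or "none"
        return (
            f"Current objective: {self.objective}\n"
            f"Steps completed: {done}\n"
            f"Next: {self.next_step}"
        )

pad = Scratchpad("Fix login bug")
pad.completed.append("read auth.ts")
pad.completed.append("identified missing null check")
pad.next_step = "write fix and test"
print(pad.render())
```

The win is that the scratchpad stays a few hundred tokens no matter how long the session runs.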
**Summarization.** When the context window gets full, summarize the old conversation rather than dropping it entirely. "In the first 50 messages, we established the database schema, identified three performance bottlenecks, and fixed two of them." The details are gone, but the narrative survives.
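A sketch of rolling summarization: once history exceeds a budget, collapse the oldest messages into a single summary entry. The `summarize()` stub here stands in for what would be an LLM call in a real agent.

```python
def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to write this summary.
    return f"[Summary of {len(messages)} earlier messages]"

def compact(history: list[str], keep_recent: int = 4) -> list[str]:
    """Keep the most recent messages verbatim; summarize the rest."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"message {i}" for i in range(10)]
history = compact(history)
print(history[0])    # the summary entry
print(len(history))  # 5: one summary + four verbatim messages
```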
**State objects.** Maintain an explicit state object that tracks key variables across the interaction. Not the full conversation, just the important bits: current file being edited, errors encountered, decisions made, items remaining.
```python
agent_state = {
    "current_task": "Fix authentication timeout",
    "files_modified": ["src/auth/session.ts", "src/auth/config.ts"],
    "tests_status": "2 passing, 1 failing",
    "blockers": ["Need database connection string format"],
    "decisions": ["Using JWT refresh tokens instead of session extension"],
}
```
## Long-Term Memory: What Have I Learned?
Long-term memory persists across conversations. It's how an agent remembers that your codebase uses a specific testing framework, that your API requires a particular authentication header, that the last time it tried approach X it failed because of Y.
This is where things get architecturally interesting.
### Vector Memory (Semantic Search)
The most common approach. Store memories as text, generate vector embeddings for each one, retrieve relevant memories by semantic similarity when needed.
Agent finishes a task. It stores a summary: "Fixed authentication timeout by implementing JWT refresh token rotation in session.ts. Key insight: the session middleware was checking expiry after processing, not before."
Next week, when the agent encounters a similar authentication issue, it searches its memory with the current problem description. The previous fix surfaces as relevant context. The agent doesn't have to rediscover the solution.
Strengths: flexible, works with any kind of text, good at finding conceptually similar memories even with different wording.
Weaknesses: embedding quality varies, similarity search can return plausible but irrelevant results, no structured querying.
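A toy illustration of the retrieval step. Real systems use learned embeddings from an embedding model plus an ANN index; here a crude bag-of-words vector stands in so the example stays self-contained.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Crude stand-in for a learned embedding: word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memories = [
    "Fixed authentication timeout with JWT refresh token rotation",
    "Optimized search query using a materialized view",
]
query = "authentication timeout bug in session handling"
best = max(memories, key=lambda m: cosine(embed(m), embed(query)))
print(best)
```

The shape is the same with real embeddings: embed the query, score it against stored memories, surface the closest ones as context.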
### Key-Value Memory (Exact Retrieval)
Simple and effective for factual information. The agent stores structured facts: "database_connection_format: postgresql://user:pass@host:5432/db". When it needs that information, it retrieves it by key.
Strengths: exact, fast, no ambiguity.
Weaknesses: you need to know the key. Doesn't handle fuzzy queries. Doesn't generalize.
### Episodic Memory (What Happened)
Stores sequences of events, not just isolated facts. "In session #47, we tried to optimize the search query by adding an index. It improved read performance by 40% but caused write latency to spike. We reverted and used a materialized view instead."
This is the agent equivalent of experience. Not just knowing facts, but remembering what happened and what the outcomes were. Especially valuable for avoiding repeated mistakes.
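One way to structure episodic records (the schema is illustrative, not a standard): capture the action, the outcome, and the lesson, so the agent can check past attempts before retrying an approach.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    session: int
    action: str
    outcome: str
    lesson: str

episodes = [
    Episode(
        session=47,
        action="Added index to optimize search query",
        outcome="Reads +40%, but write latency spiked; reverted",
        lesson="Use a materialized view for this table instead",
    ),
]

# Before repeating an action, scan for prior attempts mentioning it.
prior = [e for e in episodes if "index" in e.action.lower()]
print(prior[0].lesson if prior else "no prior attempts")
```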
### Hybrid Approaches
The best memory systems combine multiple patterns. Vector memory for fuzzy retrieval. Key-value for known facts. Episodic for experiences. Short-term scratchpad for current work.
I run a hybrid system with HNSW vector indexing on top of a SQL backend. Semantic search for discovery, exact key lookups for known entities, namespace separation for different memory types. It took effort to build, but an agent with this kind of memory is fundamentally different from one without.
## The Memory Lifecycle
Memory isn't just store and retrieve. It has a lifecycle that you need to manage.
**Ingestion.** What gets stored? Not everything. If you store every observation the agent makes, you'll drown in noise. Store patterns, decisions, outcomes, and errors. Skip routine observations that won't be useful later.
**Consolidation.** Over time, memories need to be consolidated. Ten separate memories about fixing authentication bugs should become one general pattern: "Authentication issues in this codebase usually stem from the session middleware. Check expiry timing first."
**Decay.** Old memories become less relevant. A memory about a codebase structure from six months ago might be actively harmful if the codebase has changed. Implement TTLs or confidence decay so stale memories don't poison current decisions.
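Confidence decay can be as simple as an exponential curve over memory age. The half-life below is an illustrative tuning value, not a recommendation.

```python
HALF_LIFE_DAYS = 30.0  # assumed tuning value, not a standard

def decayed_confidence(initial: float, age_days: float) -> float:
    """Exponential decay: confidence halves every HALF_LIFE_DAYS."""
    return initial * 0.5 ** (age_days / HALF_LIFE_DAYS)

# A six-month-old memory retains almost no weight...
print(round(decayed_confidence(1.0, 180), 3))
# ...while a week-old one is still near full strength.
print(round(decayed_confidence(1.0, 7), 3))
```

Retrieval then weights similarity by decayed confidence, so a stale memory has to be overwhelmingly relevant to beat a fresh one.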
**Deduplication.** Agents will try to store the same insight multiple times if you don't prevent it. Deduplicate before storing. Compare new memories against existing ones using similarity scores. If something is 90% similar to an existing memory, update the existing one instead of creating a duplicate.
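A dedup sketch using string similarity as a stand-in for embedding similarity; the 0.9 threshold is the illustrative cutoff from above.

```python
from difflib import SequenceMatcher

def store(memories: list[str], new: str, threshold: float = 0.9) -> None:
    """Update a near-duplicate in place; otherwise append as a new memory."""
    for i, existing in enumerate(memories):
        if SequenceMatcher(None, existing, new).ratio() >= threshold:
            memories[i] = new  # refresh the existing memory instead
            return
    memories.append(new)

memories = ["Auth timeouts come from session middleware expiry checks"]
store(memories, "Auth timeouts come from session middleware expiry check")
store(memories, "Search queries need a materialized view")
print(len(memories))  # 2: the near-duplicate was merged, the new fact kept
```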
## Memory and Multi-Agent Systems
Memory gets especially interesting with multiple agents. Does agent A have access to agent B's memories? Should they share a common memory pool or have isolated namespaces?
**Shared memory** enables collaboration. Agent A discovers a codebase pattern and stores it. Agent B, working on a different task, retrieves that pattern and benefits from it. Knowledge propagates across the system.
**Isolated memory** prevents contamination. Agent A's domain-specific knowledge might confuse agent B if their domains are different. A customer service agent's memory of product details shouldn't interfere with a code review agent's memory of coding patterns.
The practical answer is namespaced memory. Agents have their own namespaces for domain-specific knowledge and share common namespaces for project-wide information. Like giving each person their own notebook but keeping the team wiki accessible to everyone.
## The Honest Limitations
Memory systems today are nowhere near human memory. They don't generalize well. They don't form abstractions naturally. They don't know what they don't know.
An agent might retrieve a memory that's semantically similar but contextually irrelevant. It might fail to retrieve a memory that's critically important because the query wording didn't match the stored wording. It might trust a stale memory over fresh evidence.
These problems are manageable but real. The agents I trust most are the ones where I've invested heavily in memory quality, not memory quantity. Fewer, higher-quality memories beat a massive store of everything the agent has ever seen.
Memory is infrastructure, not a feature. Build it right and everything the agent does gets better over time. Skip it and your agent wakes up with amnesia every single session. Your call.