Data Leakage in AI Agents: When Your Agent Tells Secrets
By Diesel
security · data-leakage · privacy
Your AI agent just read a customer's medical records, summarised them perfectly, and then casually mentioned the diagnosis in a response to a different customer. Nobody noticed for three weeks.
That's not a hypothetical. I've seen variations of this story play out in production systems. The details change. The pattern doesn't. AI agents are phenomenal at processing information and absolutely terrible at understanding who should see what.
Data leakage in agent systems is a different beast from traditional data breaches. It's subtle, hard to detect, and can happen through mechanisms that don't look like security failures at all.
## How Agents Leak Data
There are several distinct leakage vectors, and most teams only think about one or two of them.
### Cross-Conversation Contamination
This is the most common one. Agent systems that maintain context across conversations, or use shared memory, can bleed information between users. User A asks about their account. The details enter the agent's context. User B asks a vaguely related question. The agent, trying to be helpful, references information from User A's session.
With shared memory systems, this gets worse. An agent stores a useful pattern it learned from processing one customer's data. That pattern, complete with specific details, gets retrieved when helping a different customer. The agent didn't "decide" to leak. It retrieved relevant context, and the context happened to contain someone else's private information.
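The failure mode is easy to reproduce. Here's a minimal sketch of the contamination pattern using a toy in-memory store; the store and its API are hypothetical, and the point is the missing user filter in `retrieve`:

```python
class SharedMemoryStore:
    """Naive shared memory: entries are retrieved by keyword relevance only."""

    def __init__(self):
        self.entries = []  # (user_id, text) pairs

    def save(self, user_id, text):
        self.entries.append((user_id, text))

    def retrieve(self, query):
        # BUG: matches on content alone, ignoring which user owns each entry.
        return [text for _, text in self.entries if query.lower() in text.lower()]


store = SharedMemoryStore()
store.save("user_a", "Account 8812 flagged: diagnosis of type 2 diabetes on file")

# User B asks a vaguely related question and gets User A's details back.
leaked = store.retrieve("account")
```

Nothing here looks like an exploit. Retrieval did exactly what it was built to do; the system just never asked who was allowed to see the result.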
### Tool Output Leakage
Your agent queries a database and gets back more data than it needs. Maybe the query returns full customer records when it only needed a name. The agent now has that excess data in its context. Even if it doesn't immediately expose it, it's there. It can influence responses. It can appear in logs. It can get stored in conversation history that someone else reviews later.
This is the data minimisation problem, and most agent architectures ignore it completely. They pass raw API responses and database results into the model context without filtering.
### Prompt and Context Logging
Every debugging and monitoring system I've seen for AI agents logs prompts and responses. This is necessary for debugging. It's also a massive data leakage surface. Those logs contain every piece of sensitive data the agent processed. Customer names, addresses, financial details, health information. All sitting in a logging system that probably has broader access controls than the production data itself. This connects directly to [least-privilege permissions](/blog/ai-agent-permissions-least-privilege).
I've seen teams with strict database access controls whose agent logs were readable by every developer on the team. The logs contained the same data that was supposedly protected by those database controls.
### Model Provider Exposure
If you're using a cloud-hosted model (and most of you are), every prompt you send is transmitted to a third party. Yes, providers have data processing agreements. Yes, they claim not to train on your data. But the data still leaves your infrastructure. For regulated industries, this can be a compliance violation regardless of what the provider's terms say.
### Embedding and Memory Leakage
Vector databases and retrieval systems store embeddings of your data. Those embeddings can be inverted, partially or fully, to recover the original text. If your embedding store has weaker access controls than your primary data store (and it usually does), you've created a shadow copy of your sensitive data with fewer protections.
## Why Traditional DLP Doesn't Work
Data Loss Prevention tools are designed for a world where data moves in predictable ways. Files get attached to emails. Records get exported to spreadsheets. Databases get queried through known interfaces.
AI agents break every assumption DLP relies on.
Data doesn't move as files. It flows as natural language through prompts and responses. A customer's social security number doesn't appear as a structured field. It appears embedded in a sentence: "The customer's SSN ending in 4532 was used for verification." Good luck writing a regex that catches every possible natural language expression of sensitive data.
The data paths are non-deterministic. The same agent, given the same input, might produce different outputs that include different subsets of sensitive data. You can't predict which data will appear where because the model's behaviour is stochastic.
Context windows are opaque. You can inspect what goes in and what comes out, but you can't see how data combines and influences outputs inside the model. Two innocuous pieces of information might combine to reveal something sensitive that neither piece reveals alone.
## Building Leak-Resistant Agent Architectures
You can't eliminate leakage risk entirely. But you can reduce it dramatically with the right architecture.
### Data Minimisation at the Tool Layer
Never pass raw data into the agent's context. Every tool should return the minimum data required for the task. If the agent needs a customer's name, the tool returns a name. Not a full customer record. Not a JSON blob with 47 fields. A name. For a deeper look, see [RAG access control](/blog/rag-access-control-permissions).
This requires building intentional data access layers instead of connecting your agent directly to your database. More work upfront. Dramatically less risk in production.
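A data access layer like that can be very small. The sketch below is illustrative (the field names and backing `CUSTOMERS` dict are not a real schema); the essential property is that the tool exposed to the agent returns exactly one field, never the raw record:

```python
CUSTOMERS = {
    "c-001": {
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "ssn": "123-45-6789",
        "diagnosis": "...",
        # ...dozens more fields the agent never needs
    }
}

def get_customer_name(customer_id: str) -> dict:
    """Tool exposed to the agent: returns the minimum data for the task."""
    record = CUSTOMERS.get(customer_id)
    if record is None:
        return {"error": "not found"}
    return {"name": record["name"]}  # never the full record
```

The discipline is per-tool: each tool gets its own narrow return shape, decided at design time, instead of one generic `get_customer` that hands the model everything.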
### Session Isolation
Each conversation gets its own isolated context. No shared memory between users. No cross-session context retrieval that could surface another user's data. If you need shared knowledge, use curated, non-sensitive knowledge bases.
For multi-user agent systems, this means maintaining strict tenant boundaries in your memory and retrieval systems. Every stored embedding, every cached response, every conversation log needs to be scoped to a specific user or organisation.
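In practice that means the tenant filter is applied inside the retrieval function, not left to the caller. A sketch, with stand-in items and naive term-overlap scoring in place of real vector similarity:

```python
def retrieve(index, query_terms, tenant_id):
    """Return stored texts for one tenant only, ranked by naive term overlap."""
    results = []
    for item in index:
        if item["tenant_id"] != tenant_id:   # hard tenant boundary, not optional
            continue
        score = sum(term in item["text"].lower() for term in query_terms)
        if score > 0:
            results.append((score, item["text"]))
    return [text for _, text in sorted(results, reverse=True)]
```

Most vector databases support metadata filters that do the same job; the design point is that `tenant_id` is a required parameter of every retrieval path, so there is no code path that searches across tenants by accident.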
### Output Scanning
Before any agent response reaches the user, scan it for data that shouldn't be there. PII detection, pattern matching for sensitive formats (card numbers, SSNs, API keys), and anomaly detection for responses that contain more specific information than the query warranted.
This isn't foolproof. Natural language makes pattern matching hard. But it catches the obvious cases, and the obvious cases are the ones most likely to happen at scale.
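A basic scanner covers the structured formats. The patterns below are illustrative, not exhaustive; they catch common shapes (SSNs, card numbers, key-like strings) and will miss free-text phrasings, which is exactly the limitation described above:

```python
import re

SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b"),
}

def scan_response(text: str) -> list[str]:
    """Return the names of any sensitive patterns found in an agent response."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]
```

Wire this in front of the response path and treat any non-empty result as a block-and-review event rather than a silent log line.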
### Structured Logging With Redaction
You need logs for debugging. You don't need sensitive data in those logs. Implement automatic redaction in your logging pipeline. Replace PII with tokens. Store the mapping separately with strict access controls. Developers can debug agent behaviour without seeing customer data.
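A minimal sketch of redaction in the logging path. The pattern coverage (just emails here) and token format are illustrative; the point is that the token-to-value mapping is built as a separate object that lives behind its own access controls, not next to the logs:

```python
import re
import uuid

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(message: str, mapping: dict) -> str:
    """Replace PII with opaque tokens; record the token→value map separately."""
    def replace(match):
        token = f"<pii:{uuid.uuid4().hex[:8]}>"
        mapping[token] = match.group(0)
        return token
    return EMAIL.sub(replace, message)
```

A developer reading the redacted log can still follow the agent's behaviour; resolving a token back to a real value becomes a deliberate, audited action.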
### Access-Scoped Tool Execution
When an agent acts on behalf of a user, its data access should be scoped to that user's permissions. Not the agent's service account permissions. Not the developer's permissions. The end user's permissions. This prevents the agent from accessing data the requesting user shouldn't see, even if the agent technically could.
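One way to enforce this is to make the authorisation check run against the end user's grants on every tool call. The permission model and tool registry below are hypothetical; what matters is which identity the check uses:

```python
PERMISSIONS = {
    "user_b": {"read:own_account"},
    "agent_svc": {"read:own_account", "read:all_accounts"},  # broad service acct
}

def run_tool(tool_name: str, required_scope: str, acting_user: str):
    """Authorise against the end user's grants, not the service account's."""
    if required_scope not in PERMISSIONS.get(acting_user, set()):
        raise PermissionError(f"{acting_user} lacks {required_scope}")
    return f"{tool_name} executed as {acting_user}"
```

The service account may hold broad scopes for operational reasons, but the agent only ever calls `run_tool` with the requesting user's identity, so those broad scopes are never exercised on a user's behalf.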
## The Compliance Dimension
GDPR, HIPAA, SOC 2, PCI DSS. Pick your framework. They all have data handling requirements that AI agents can violate in creative and unexpected ways. This connects directly to [sandboxed execution environments](/blog/sandboxing-ai-agents-containment).
GDPR's right to erasure becomes complicated when your agent's conversation logs, embedding stores, memory systems, and fine-tuning data all contain traces of the user's information. "Delete my data" means finding and removing data from potentially dozens of subsystems.
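Handling this well usually means an erasure orchestrator that knows every subsystem holding user data. A sketch, where the subsystem names and delete callables are placeholders for whatever actually stores the data (conversation logs, embedding stores, memory, caches):

```python
def erase_user(user_id: str, subsystems: dict) -> dict:
    """Run every subsystem's delete hook and report which ones succeeded."""
    report = {}
    for name, delete_fn in subsystems.items():
        try:
            delete_fn(user_id)
            report[name] = "deleted"
        except Exception as exc:
            report[name] = f"failed: {exc}"  # surface partial failures for retry
    return report
```

The registry itself is the valuable artefact: if a new store of user data can ship without registering a delete hook, the erasure path is already broken.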
HIPAA's minimum necessary standard directly conflicts with how most agents process medical data. Throwing a full patient record into an LLM context to answer a specific question violates minimum necessary, even if the agent's response only mentions the relevant detail.
The answer isn't to avoid using agents in regulated environments. It's to design your agent architecture with compliance requirements as first-class constraints, not afterthoughts.
## The Uncomfortable Reality
Every AI agent system leaks data. The question is whether it leaks to authorised parties in controlled ways, or to unauthorised parties in ways you don't detect.
Build for the assumption that data will flow in unexpected directions. Minimise what's available to flow. Monitor what actually flows. React quickly when flows go wrong.
Your agent is processing the most sensitive data in your organisation. Treat its data handling with the same rigour you'd apply to your most critical database. Actually, treat it with more rigour, because databases don't improvise.