Agent Communication Protocols: How AI Agents Talk to Each Other
By Diesel
Tags: multi-agent, protocols, communication
## The Translation Problem
Two humans can miscommunicate with a shared language, shared culture, and decades of social context. Now imagine two LLMs trying to coordinate with nothing but JSON payloads.
Agent communication isn't just message passing. It's establishing shared understanding between systems that have no ground truth, no shared memory (by default), and a tendency to confidently state things that are wrong. The protocol has to handle all of that.
I've iterated through four generations of agent communication in production. Each one taught me something the hard way. For a deeper look, see [the Model Context Protocol](/blog/model-context-protocol-mcp).
## Protocol 1: Unstructured Natural Language
The lazy approach. Agents talk to each other in plain English.
```
Agent A → Agent B: "Hey, can you review this code and check
for security issues? The file is auth.py and I changed the
token validation logic."
```
You'd think LLMs would excel at this since they're literally language models. They don't. Or rather, they're too good at it. They interpret, paraphrase, add context, and introduce ambiguity. Agent B might "helpfully" review more than auth.py because it inferred you'd want a broader review.
**The failure mode:** Semantic drift. By the third hop in a chain, the original intent is garbled. Like a game of telephone played by overthinkers.
I used this for exactly one project. Never again.
## Protocol 2: Structured Messages with Schemas
Define message types. Validate them. Reject anything that doesn't conform.
```python
from pydantic import BaseModel
from enum import Enum
class MessageType(Enum):
    TASK_ASSIGN = "task_assign"
    TASK_RESULT = "task_result"
    QUERY = "query"
    QUERY_RESPONSE = "query_response"
    STATUS_UPDATE = "status_update"

class AgentMessage(BaseModel):
    type: MessageType
    sender: str
    recipient: str
    payload: dict
    correlation_id: str  # ties request to response
    timestamp: float
    ttl: int = 30  # seconds before message expires
```
Now agents can't freestyle. They fill out forms. Boring? Yes. Reliable? Dramatically so.
The `correlation_id` is critical. Without it, you can't match a response to its request when multiple conversations are happening concurrently. Learn this from distributed systems, not from painful debugging at 2 AM. Actually, I learned it at 2 AM. You don't have to.
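To make this concrete, here's a minimal sketch of how a dispatcher might use `correlation_id` to match responses to in-flight requests. The `PendingRequests` helper and its method names are mine, not part of the article's schema:

```python
import asyncio
import uuid

class PendingRequests:
    """Matches responses to their originating requests via correlation_id."""
    def __init__(self):
        self._futures: dict[str, asyncio.Future] = {}

    def register(self) -> tuple[str, asyncio.Future]:
        # Mint a fresh correlation_id for an outgoing request
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._futures[correlation_id] = future
        return correlation_id, future

    def resolve(self, correlation_id: str, payload: dict) -> bool:
        # Called when a response arrives; unknown ids are ignored
        future = self._futures.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(payload)
        return True

async def demo():
    pending = PendingRequests()
    cid, fut = pending.register()
    # Simulate the response arriving from another agent
    pending.resolve(cid, {"status": "success"})
    return await fut

result = asyncio.run(demo())
```

With ten concurrent conversations, ten futures sit in the dict; each response finds its own request regardless of arrival order.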
### The Payload Problem
The schema above structures the envelope. But what about the payload? If the payload is unstructured, you've just moved the chaos one level deeper.
```python
from typing import Literal

from pydantic import BaseModel

class TaskAssignPayload(BaseModel):
    task_description: str
    input_files: list[str]
    expected_output_format: str
    constraints: list[str]
    max_tokens_budget: int
    quality_threshold: float  # 0-1

class TaskResultPayload(BaseModel):
    status: Literal["success", "failure", "partial"]
    output: dict
    confidence: float
    tokens_used: int
    issues_found: list[str] | None = None
```
Type the payloads too. Every field that's a free-text string is a field where agents will introduce ambiguity. Minimize them.
## Protocol 3: Tool-Based Communication
Instead of messages, agents expose tools to each other. Agent A doesn't "ask" Agent B to review code. Agent A calls Agent B's `review_code` tool with typed parameters.
```python
class ReviewerAgent:
    @tool(
        name="review_code",
        params={
            "file_path": str,
            "focus_areas": list[str],
            "severity_threshold": str,
        },
        returns={
            "issues": list[dict],
            "approval": bool,
            "summary": str,
        },
    )
    async def review_code(self, file_path, focus_areas,
                          severity_threshold):
        # Structured in, structured out
        ...
```
This is the MCP (Model Context Protocol) approach. Each agent is a server that exposes capabilities as tools. Callers don't need to know how the agent works internally. They just call the tool with the right parameters and get typed results back. It is worth reading about [consensus mechanisms](/blog/multi-agent-consensus) alongside this.
**The advantage:** Zero ambiguity in the interface. The tool signature is the contract.
**The disadvantage:** Rigidity. If an agent discovers something unexpected during review ("this file also has a SQL injection"), it has nowhere to put that finding unless the return schema accounts for it. You end up either over-engineering schemas to handle every edge case or losing valuable information that doesn't fit the mold.
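One middle ground I've used (my own convention, not part of MCP) is an explicit escape-hatch field in the return schema: a typed slot for findings outside the requested scope, so surprises are preserved without blowing up the contract.

```python
from pydantic import BaseModel, Field

class ReviewResult(BaseModel):
    issues: list[dict]
    approval: bool
    summary: str
    # Escape hatch: a structured slot for discoveries outside the
    # requested focus areas, so they aren't silently dropped
    out_of_scope_findings: list[str] = Field(default_factory=list)

result = ReviewResult(
    issues=[{"line": 42, "severity": "high", "note": "token not validated"}],
    approval=False,
    summary="Token validation bypass in auth.py",
    out_of_scope_findings=["possible SQL injection in query builder"],
)
```

The caller can ignore the field, log it, or route it to another agent, but the information survives the interface.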
## Protocol 4: Event-Driven with Typed Channels
My current preference. Agents publish events to typed channels. Other agents subscribe to channels they care about.
```python
from typing import Callable

from pydantic import BaseModel

class EventBus:
    def __init__(self):
        self.channels: dict[str, list[Callable]] = {}

    async def publish(self, channel: str, event: BaseModel):
        for handler in self.channels.get(channel, []):
            await handler(event)

    def subscribe(self, channel: str, handler: Callable):
        self.channels.setdefault(channel, []).append(handler)

# Typed events
class CodeWrittenEvent(BaseModel):
    file_path: str
    author_agent: str
    change_summary: str
    diff: str

class ReviewCompletedEvent(BaseModel):
    file_path: str
    reviewer_agent: str
    approved: bool
    issues: list[dict]
    needs_revision: bool
```
Agents don't talk to each other directly. They announce what happened. Interested parties react. This decouples the "who" from the "what."
The code author doesn't need to know a security agent exists. It publishes `CodeWrittenEvent`. If a security agent is subscribed, it reviews. If not, the event goes nowhere. Add a new agent type and you don't modify any existing agents. Just subscribe to the channels.
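Wiring this together looks like the sketch below. The channel name `code_written` and the agent names are my choices; the point is that the publisher never addresses the subscriber directly:

```python
import asyncio
from typing import Callable

from pydantic import BaseModel

class EventBus:
    def __init__(self):
        self.channels: dict[str, list[Callable]] = {}

    async def publish(self, channel: str, event: BaseModel):
        for handler in self.channels.get(channel, []):
            await handler(event)

    def subscribe(self, channel: str, handler: Callable):
        self.channels.setdefault(channel, []).append(handler)

class CodeWrittenEvent(BaseModel):
    file_path: str
    author_agent: str
    change_summary: str
    diff: str

reviewed: list[str] = []

async def security_review(event: CodeWrittenEvent):
    # The security agent reacts to the announcement;
    # the author doesn't know it exists
    reviewed.append(event.file_path)

async def main():
    bus = EventBus()
    bus.subscribe("code_written", security_review)
    await bus.publish("code_written", CodeWrittenEvent(
        file_path="auth.py",
        author_agent="coder-1",
        change_summary="Rewrote token validation",
        diff="--- a/auth.py ...",
    ))

asyncio.run(main())
```

Unsubscribe the security agent and the publisher's code doesn't change by a single line.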
## Practical Lessons
### Always Include Provenance
Every message or event needs to say where it came from and what it's based on.
```python
class Provenance(BaseModel):
    source_agent: str
    based_on: list[str]  # IDs of inputs used
    confidence: float
    reasoning_summary: str
```
Without provenance, you can't debug, you can't audit, and you can't detect when an agent's conclusion is based on another agent's hallucination.
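With provenance recorded, that detection can be mechanical: walk the `based_on` chain and surface the weakest confidence anywhere in it. The `weakest_link` helper and the record IDs below are illustrative, and the sketch assumes an acyclic provenance graph:

```python
from pydantic import BaseModel

class Provenance(BaseModel):
    source_agent: str
    based_on: list[str]  # IDs of inputs used
    confidence: float
    reasoning_summary: str

def weakest_link(record_id: str, records: dict[str, Provenance]) -> float:
    """Return the lowest confidence anywhere in the chain of inputs
    a conclusion depends on (assumes no cycles)."""
    record = records[record_id]
    lowest = record.confidence
    for parent_id in record.based_on:
        if parent_id in records:
            lowest = min(lowest, weakest_link(parent_id, records))
    return lowest

records = {
    "fact-1": Provenance(source_agent="researcher", based_on=[],
                         confidence=0.3, reasoning_summary="single weak source"),
    "conclusion-1": Provenance(source_agent="analyst", based_on=["fact-1"],
                               confidence=0.95, reasoning_summary="derived"),
}

# The conclusion states 0.95 confidence, but it rests on a 0.3 input
chain_confidence = weakest_link("conclusion-1", records)
```

A confidently stated conclusion built on a shaky input is exactly the hallucination-laundering case provenance exists to catch.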
### Set TTL on Everything
Messages expire. If Agent B is down and Agent A sent a task 10 minutes ago, that task shouldn't suddenly execute when Agent B wakes up. The context has moved on. Stale messages cause stale actions. For a deeper look, see [multi-agent orchestration patterns](/blog/multi-agent-orchestration-patterns).
```python
import time

def is_valid(message: AgentMessage) -> bool:
    age = time.time() - message.timestamp
    return age < message.ttl
```
### Rate Limit Agent-to-Agent Communication
Agents are chatty by nature. An agent asked to "continuously monitor" will flood the bus with status updates. Every message consumes context tokens on the receiving end.
```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute=10):
        self.limit = max_per_minute
        self.counts: dict[str, deque] = {}

    def allow(self, agent_id: str) -> bool:
        now = time.time()
        window = self.counts.setdefault(agent_id, deque())
        # Drop timestamps that have aged out of the 60-second window
        while window and window[0] < now - 60:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```
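In use, the limiter sits in front of the bus: one `allow` check per publish. A quick demonstration of the window behavior, with a deliberately tiny limit (the agent name is made up):

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute: int = 10):
        self.limit = max_per_minute
        self.counts: dict[str, deque] = {}

    def allow(self, agent_id: str) -> bool:
        now = time.time()
        window = self.counts.setdefault(agent_id, deque())
        # Drop timestamps that have aged out of the 60-second window
        while window and window[0] < now - 60:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True

# A chatty monitor gets its first three updates through, then is throttled
limiter = RateLimiter(max_per_minute=3)
results = [limiter.allow("monitor-agent") for _ in range(5)]
```

Dropped messages should be logged, not silently discarded; a flooding agent is itself a signal worth investigating.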
### Validate Both Sides
Don't just validate incoming messages. Validate outgoing ones too. Agents hallucinate. An agent might produce a `TaskResult` with `status: "success"` and an empty output field. Catch it before it propagates.
```python
async def send(self, message: AgentMessage):
    # Validate outgoing
    message.model_validate(message.model_dump())
    if message.type == MessageType.TASK_RESULT:
        payload = TaskResultPayload(**message.payload)
        assert payload.output, "Empty output on success"
    await self.bus.publish(message)
```
## The Protocol Stack
In practice, you layer these. Tool-based for direct commands ("do this specific thing"). Event-driven for coordination ("this happened, react if relevant"). Structured messages for queries ("what's the status of X").
Don't pick one protocol and force everything through it. Different communication needs deserve different protocols. The key is that every protocol in your stack enforces structure, includes provenance, expires stale data, and validates at both ends.
The agents themselves are non-deterministic. The communication layer can't afford to be.