Agent Communication Protocols: How AI Agents Talk to Each Other
By Diesel
Tags: multi-agent, protocols, communication
## The Translation Problem
Two humans can miscommunicate with a shared language, shared culture, and decades of social context. Now imagine two LLMs trying to coordinate with nothing but JSON payloads.
Agent communication isn't just message passing. It's establishing shared understanding between systems that have no ground truth, no shared memory (by default), and a tendency to confidently state things that are wrong. The protocol has to handle all of that.
I've iterated through four generations of agent communication in production. Each one taught me something the hard way. For a deeper look, see [the Model Context Protocol](/blog/model-context-protocol-mcp).
## Protocol 1: Unstructured Natural Language
The lazy approach. Agents talk to each other in plain English.
```
Agent A → Agent B: "Hey, can you review this code and check
for security issues? The file is auth.py and I changed the
token validation logic."
```
You'd think LLMs would excel at this since they're literally language models. They don't. Or rather, they're too good at it. They interpret, paraphrase, add context, and introduce ambiguity. Agent B might "helpfully" review more than auth.py because it inferred you'd want a broader review.
**The failure mode:** Semantic drift. By the third hop in a chain, the original intent is garbled. Like a game of telephone played by overthinkers.
I used this for exactly one project. Never again.
## Protocol 2: Structured Messages with Schemas
Define message types. Validate them. Reject anything that doesn't conform.
```python
from pydantic import BaseModel
from enum import Enum
class MessageType(Enum):
    TASK_ASSIGN = "task_assign"
    TASK_RESULT = "task_result"
    QUERY = "query"
    QUERY_RESPONSE = "query_response"
    STATUS_UPDATE = "status_update"

class AgentMessage(BaseModel):
    type: MessageType
    sender: str
    recipient: str
    payload: dict
    correlation_id: str  # ties request to response
    timestamp: float
    ttl: int = 30  # seconds before message expires
```
Now agents can't freestyle. They fill out forms. Boring? Yes. Reliable? Dramatically so.
The `correlation_id` is critical. Without it, you can't match a response to its request when multiple conversations are happening concurrently. Learn this from distributed systems, not from painful debugging at 2 AM. Actually, I learned it at 2 AM. You don't have to.
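To make this concrete, here's a minimal sketch of how a dispatcher might use `correlation_id` to match responses to in-flight requests. The `PendingRequests` helper and its method names are mine, not part of the article's schema:

```python
import asyncio
import uuid

class PendingRequests:
    """Matches responses to their originating requests via correlation_id."""
    def __init__(self):
        self._futures: dict[str, asyncio.Future] = {}

    def register(self) -> tuple[str, asyncio.Future]:
        # Mint a fresh correlation_id for an outgoing request
        correlation_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._futures[correlation_id] = future
        return correlation_id, future

    def resolve(self, correlation_id: str, payload: dict) -> bool:
        # Called when a response arrives; unknown ids are ignored
        future = self._futures.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(payload)
        return True

async def demo():
    pending = PendingRequests()
    cid, fut = pending.register()
    # Simulate the response arriving from another agent
    pending.resolve(cid, {"status": "success"})
    return await fut

result = asyncio.run(demo())
```

With ten concurrent conversations, ten futures sit in the dict; each response finds its own request regardless of arrival order.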
### The Payload Problem
The schema above structures the envelope. But what about the payload? If the payload is unstructured, you've just moved the chaos one level deeper.
```python
from typing import Literal

from pydantic import BaseModel

class TaskAssignPayload(BaseModel):
    task_description: str
    input_files: list[str]
    expected_output_format: str
    constraints: list[str]
    max_tokens_budget: int
    quality_threshold: float  # 0-1

class TaskResultPayload(BaseModel):
    status: Literal["success", "failure", "partial"]
    output: dict
    confidence: float
    tokens_used: int
    issues_found: list[str] | None = None
```
Type the payloads too. Every field that's a free-text string is a field where agents will introduce ambiguity. Minimize them.
## Protocol 3: Tool-Based Communication
Instead of messages, agents expose tools to each other. Agent A doesn't "ask" Agent B to review code. Agent A calls Agent B's `review_code` tool with typed parameters.
```python
class ReviewerAgent:
    @tool(
        name="review_code",
        params={
            "file_path": str,
            "focus_areas": list[str],
            "severity_threshold": str,
        },
        returns={
            "issues": list[dict],
            "approval": bool,
            "summary": str,
        },
    )
    async def review_code(self, file_path, focus_areas,
                          severity_threshold):
        # Structured in, structured out
        ...
```
This is the MCP (Model Context Protocol) approach. Each agent is a server that exposes capabilities as tools. Callers don't need to know how the agent works internally. They just call the tool with the right parameters and get typed results back. It is worth reading about [consensus mechanisms](/blog/multi-agent-consensus) alongside this.
**The advantage:** Zero ambiguity in the interface. The tool signature is the contract.
**The disadvantage:** Rigidity. If an agent discovers something unexpected during review ("this file also has a SQL injection"), it has nowhere to put that finding unless the return schema accounts for it. You end up either over-engineering schemas to handle every edge case or losing valuable information that doesn't fit the mold.
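One middle ground I've used (my own convention, not part of MCP) is an explicit escape-hatch field in the return schema: a typed slot for findings outside the requested scope, so surprises are preserved without blowing up the contract.

```python
from pydantic import BaseModel, Field

class ReviewResult(BaseModel):
    issues: list[dict]
    approval: bool
    summary: str
    # Escape hatch: a structured slot for discoveries outside the
    # requested focus areas, so they aren't silently dropped
    out_of_scope_findings: list[str] = Field(default_factory=list)

result = ReviewResult(
    issues=[{"line": 42, "severity": "high", "note": "token not validated"}],
    approval=False,
    summary="Token validation bypass in auth.py",
    out_of_scope_findings=["possible SQL injection in query builder"],
)
```

The caller can ignore the field, log it, or route it to another agent, but the information survives the interface.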
## Protocol 4: Event-Driven with Typed Channels
My current preference. Agents publish events to typed channels. Other agents subscribe to channels they care about.
```python
from typing import Callable

from pydantic import BaseModel

class EventBus:
    def __init__(self):
        self.channels: dict[str, list[Callable]] = {}

    async def publish(self, channel: str, event: BaseModel):
        for handler in self.channels.get(channel, []):
            await handler(event)

    def subscribe(self, channel: str, handler: Callable):
        self.channels.setdefault(channel, []).append(handler)

# Typed events
class CodeWrittenEvent(BaseModel):
    file_path: str
    author_agent: str
    change_summary: str
    diff: str

class ReviewCompletedEvent(BaseModel):
    file_path: str
    reviewer_agent: str
    approved: bool
    issues: list[dict]
    needs_revision: bool
```
Agents don't talk to each other directly. They announce what happened. Interested parties react. This decouples the "who" from the "what."
The code author doesn't need to know a security agent exists. It publishes `CodeWrittenEvent`. If a security agent is subscribed, it reviews. If not, the event goes nowhere. Add a new agent type and you don't modify any existing agents. Just subscribe to the channels.
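Wiring this together looks like the sketch below. The channel name `code_written` and the agent names are my choices; the point is that the publisher never addresses the subscriber directly:

```python
import asyncio
from typing import Callable

from pydantic import BaseModel

class EventBus:
    def __init__(self):
        self.channels: dict[str, list[Callable]] = {}

    async def publish(self, channel: str, event: BaseModel):
        for handler in self.channels.get(channel, []):
            await handler(event)

    def subscribe(self, channel: str, handler: Callable):
        self.channels.setdefault(channel, []).append(handler)

class CodeWrittenEvent(BaseModel):
    file_path: str
    author_agent: str
    change_summary: str
    diff: str

reviewed: list[str] = []

async def security_review(event: CodeWrittenEvent):
    # The security agent reacts to the announcement;
    # the author doesn't know it exists
    reviewed.append(event.file_path)

async def main():
    bus = EventBus()
    bus.subscribe("code_written", security_review)
    await bus.publish("code_written", CodeWrittenEvent(
        file_path="auth.py",
        author_agent="coder-1",
        change_summary="Rewrote token validation",
        diff="--- a/auth.py ...",
    ))

asyncio.run(main())
```

Unsubscribe the security agent and the publisher's code doesn't change by a single line.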
## Practical Lessons
### Always Include Provenance
Every message or event needs to say where it came from and what it's based on.
```python
class Provenance(BaseModel):
    source_agent: str
    based_on: list[str]  # IDs of inputs used
    confidence: float
    reasoning_summary: str
```
Without provenance, you can't debug, you can't audit, and you can't detect when an agent's conclusion is based on another agent's hallucination.
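With provenance recorded, that detection can be mechanical: walk the `based_on` chain and surface the weakest confidence anywhere in it. The `weakest_link` helper and the record IDs below are illustrative, and the sketch assumes an acyclic provenance graph:

```python
from pydantic import BaseModel

class Provenance(BaseModel):
    source_agent: str
    based_on: list[str]  # IDs of inputs used
    confidence: float
    reasoning_summary: str

def weakest_link(record_id: str, records: dict[str, Provenance]) -> float:
    """Return the lowest confidence anywhere in the chain of inputs
    a conclusion depends on (assumes no cycles)."""
    record = records[record_id]
    lowest = record.confidence
    for parent_id in record.based_on:
        if parent_id in records:
            lowest = min(lowest, weakest_link(parent_id, records))
    return lowest

records = {
    "fact-1": Provenance(source_agent="researcher", based_on=[],
                         confidence=0.3, reasoning_summary="single weak source"),
    "conclusion-1": Provenance(source_agent="analyst", based_on=["fact-1"],
                               confidence=0.95, reasoning_summary="derived"),
}

# The conclusion states 0.95 confidence, but it rests on a 0.3 input
chain_confidence = weakest_link("conclusion-1", records)
```

A confidently stated conclusion built on a shaky input is exactly the hallucination-laundering case provenance exists to catch.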
### Set TTL on Everything
Messages expire. If Agent B is down and Agent A sent a task 10 minutes ago, that task shouldn't suddenly execute when Agent B wakes up. The context has moved on. Stale messages cause stale actions. For a deeper look, see [multi-agent orchestration patterns](/blog/multi-agent-orchestration-patterns).
```python
import time

def is_valid(message: AgentMessage) -> bool:
    age = time.time() - message.timestamp
    return age < message.ttl
```
### Rate Limit Agent-to-Agent Communication
Agents are chatty by nature. An agent asked to "continuously monitor" will flood the bus with status updates. Every message consumes context tokens on the receiving end.
```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute=10):
        self.limit = max_per_minute
        self.counts: dict[str, deque] = {}

    def allow(self, agent_id: str) -> bool:
        now = time.time()
        window = self.counts.setdefault(agent_id, deque())
        # Drop timestamps that have aged out of the 60-second window
        while window and window[0] < now - 60:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```
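In use, the limiter sits in front of the bus: one `allow` check per publish. A quick demonstration of the window behavior, with a deliberately tiny limit (the agent name is made up):

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute: int = 10):
        self.limit = max_per_minute
        self.counts: dict[str, deque] = {}

    def allow(self, agent_id: str) -> bool:
        now = time.time()
        window = self.counts.setdefault(agent_id, deque())
        # Drop timestamps that have aged out of the 60-second window
        while window and window[0] < now - 60:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True

# A chatty monitor gets its first three updates through, then is throttled
limiter = RateLimiter(max_per_minute=3)
results = [limiter.allow("monitor-agent") for _ in range(5)]
```

Dropped messages should be logged, not silently discarded; a flooding agent is itself a signal worth investigating.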
### Validate Both Sides
Don't just validate incoming messages. Validate outgoing ones too. Agents hallucinate. An agent might produce a `TaskResult` with `status: "success"` and an empty output field. Catch it before it propagates.
```python
async def send(self, message: AgentMessage):
    # Validate outgoing
    message.model_validate(message.model_dump())
    if message.type == MessageType.TASK_RESULT:
        payload = TaskResultPayload(**message.payload)
        assert payload.output, "Empty output on success"
    await self.bus.publish(message)
```
## The Protocol Stack
In practice, you layer these. Tool-based for direct commands ("do this specific thing"). Event-driven for coordination ("this happened, react if relevant"). Structured messages for queries ("what's the status of X").
Don't pick one protocol and force everything through it. Different communication needs deserve different protocols. The key is that every protocol in your stack enforces structure, includes provenance, expires stale data, and validates at both ends.
The agents themselves are non-deterministic. The communication layer can't afford to be.