Hierarchical vs Flat: Choosing Your Multi-Agent Topology
By Diesel
multi-agenttopologyarchitecture
## The Org Chart for Robots
Every multi-agent system has a topology whether you design one or not. Skip the design phase and your agents will self-organize into chaos. That's not an emergent intelligence breakthrough. That's a production incident.
Topology defines who talks to whom, who delegates to whom, and who has authority over whom. Get it wrong and you'll spend more time debugging agent interactions than building features. I've done both. The debugging is worse.
## Flat Topology: Everyone's Equal
In a flat topology, every agent can communicate with every other agent. No hierarchy. No managers. Pure peer-to-peer democracy.
```
Agent A ←→ Agent B
↕ × ↕
Agent C ←→ Agent D
```
The appeal is obvious. Simple to reason about. No single point of failure. Any agent can pick up work from any other agent.
### Where Flat Works
Small teams. Three to five agents doing distinct jobs with minimal overlap. A coder, a reviewer, and a tester. They pass work between themselves, each one knows its lane, and there's not enough complexity to warrant management overhead.
```python
class FlatTeam:
def __init__(self, agents):
self.agents = {a.role: a for a in agents}
self.message_bus = MessageBus()
async def process(self, task):
# Any agent can claim work
for agent in self.agents.values():
self.message_bus.subscribe(
agent.role, agent.handle
)
await self.message_bus.publish("task.new", task)
```
Flat topologies shine when every agent is a specialist with zero role ambiguity. The coder never reviews. The reviewer never codes. Crystal clear boundaries.
### Where Flat Dies
Scale it past five agents and watch the communication explosion. With N agents in a fully connected mesh, you get N*(N-1)/2 potential communication channels. Ten agents means 45 channels. Fifteen means 105.
That's not just a networking problem. It's a context problem. Every agent needs to understand messages from every other agent. The security agent needs to parse messages from the UI agent, the database agent, the API agent, and the deployment agent. Its context window fills up with irrelevant cross-domain chatter. The related post on [orchestration patterns](/blog/multi-agent-orchestration-patterns) goes further on this point.
I've seen flat topologies where agents spent more tokens coordinating than working. That's the death signal.
## Hierarchical Topology: Chain of Command
A tree. Top-level coordinator delegates to mid-level managers who delegate to workers. Communication flows up and down the tree, never sideways between branches.
```
Architect
/ | \
Frontend API Data
/ \ | | \
React Style Routes DB Cache
```
### Where Hierarchy Works
Complex systems with natural domain boundaries. Enterprise projects. Anything that maps to how human engineering teams actually organize.
The killer advantage is context isolation. The React agent never sees database queries. The DB agent never sees CSS. Each agent operates in a focused context window with only the information relevant to its domain. This directly improves output quality because LLMs perform better with less noise in context.
```python
class HierarchicalNode:
def __init__(self, agent, children=None):
self.agent = agent
self.children = children or []
async def delegate(self, task):
subtasks = await self.agent.decompose(task)
results = await asyncio.gather(*[
self.route_to_child(st) for st in subtasks
])
return await self.agent.synthesize(results)
async def route_to_child(self, subtask):
child = self.find_best_child(subtask)
return await child.delegate(subtask)
```
### Where Hierarchy Dies
Latency. Information has to traverse the tree. If a React component needs to know about an API endpoint's response shape, that question goes up to Frontend Manager, up to Architect, down to API Manager, down to the Routes agent, and then the answer travels all the way back. Four hops for a simple question.
The other killer is the single point of failure at the top. If the Architect agent hallucinates or runs out of context, every branch is affected. You need the top of the tree to be the most reliable, highest-quality agent in your system. Which usually means the most expensive.
## Mesh Topology: Controlled Connections
A middle ground. Agents connect to some peers, not all. The connection graph is designed, not emergent.
```
Architect → Frontend → React
↓ ↘
API ← SharedTypes
↓
Database
```
You get lateral communication where it makes sense (Frontend talks directly to SharedTypes) without the chaos of full mesh. The key word is "designed." You decide which lateral connections exist based on actual workflow needs.
### Hierarchical Mesh: My Default
This is what I run in production. A hierarchy for authority and delegation, with selective mesh connections for agents that frequently need each other's context.
```python
class HierarchicalMesh:
def __init__(self):
self.hierarchy = {} # parent-child tree
self.lateral = {} # peer-to-peer shortcuts
def add_lateral(self, agent_a, agent_b, context):
"""Allow direct communication between peers"""
self.lateral[(agent_a, agent_b)] = context
# Agents can query each other without
# routing through the hierarchy
``` For a deeper look, see [supervisor-based hierarchies](/blog/supervisor-pattern-agent-managers).
The hierarchy handles task decomposition and result synthesis. The mesh handles "I need to know X from agent Y right now" without routing through three managers.
## The Topology Decision Framework
Don't pick based on what sounds elegant. Pick based on your constraints.
**Agent count under 5, distinct roles, simple workflow:** Flat. Don't overcomplicate it. A hierarchy of three agents is just bureaucracy.
**Agent count 5-15, clear domain boundaries, quality matters:** Hierarchical. The context isolation alone is worth the latency cost.
**Agent count 5-15, frequent cross-domain queries, latency-sensitive:** Hierarchical mesh. Hierarchy for structure, mesh shortcuts for speed.
**Agent count 15+:** You need a proper service mesh. Hierarchical mesh with routing tables, load balancing, and circuit breakers. At this point you're building a distributed system, and all the distributed systems literature applies.
## What People Get Wrong
**Premature hierarchy.** Three agents don't need a manager. They need a message bus and clear role definitions. Adding hierarchy to a small system adds latency and complexity for zero benefit.
**Flat at scale.** "But it's simpler!" No. It was simpler. Past five agents, the emergent complexity of uncontrolled communication dwarfs the designed complexity of a hierarchy. It is worth reading about [load balancing across agents](/blog/load-balancing-ai-agents) alongside this.
**Static topology.** Your topology should evolve with your system. Start flat, add hierarchy when coordination overhead becomes visible, add mesh shortcuts when latency hurts. Topology is a living design decision, not a one-time architecture choice.
**Ignoring the context window.** This is the constraint that makes agent topology different from microservice topology. Microservices have unlimited memory for their domain. Agents don't. Every message received is context consumed. Topology design is fundamentally a context budget exercise.
## The Practical Takeaway
I start every new project flat. Two or three agents, clear roles, shared message bus. When I add the fourth or fifth agent, I look at the communication patterns. If most messages flow in a clear direction (task comes in, gets decomposed, specialists work, results flow back), I refactor to hierarchy. If there are frequent lateral queries, I add mesh shortcuts.
The topology you ship is rarely the topology you started with. And that's fine. The mistake is treating topology as a fixed architectural decision instead of an evolving operational one.
Design for today's agent count. Plan for tomorrow's. Refactor when the pain is real, not theoretical.