The word "agent" has been beaten senseless by marketing departments. Every chatbot wrapper with a system prompt now calls itself an AI agent. Every automation script with an LLM call somewhere in the pipeline gets the agent label slapped on it. And every enterprise vendor who sold you RPA last year is now selling you "agentic AI" with the same code and a new landing page.
Let's fix that.
## What Actually Makes Something an Agent
An AI agent is a system that perceives its environment, reasons about what to do, takes actions, and observes the results. Then it does it again. And again. Until the goal is met or it decides it can't be.
That loop is the whole thing. Perception, reasoning, action, observation. Repeat.
A chatbot takes your input, generates a response, and stops. It doesn't go do anything. It doesn't check if its answer worked. It doesn't adapt. You ask it a question, it gives you text back, conversation over.
A script runs a fixed sequence of steps. No reasoning. No adaptation. If step 3 fails, it either crashes or runs the hardcoded error handler. It doesn't think about what went wrong and try a different approach.
A workflow orchestrator like n8n or Zapier connects steps together with conditional logic. Closer, but still no reasoning. The decision tree is authored by a human at design time. The system doesn't generate novel plans at runtime.
An agent does all of that. It decides what to do next based on what it observes, not based on a hardcoded flowchart.
### What That Loop Looks Like in Practice
Here's a concrete example. An agent tasked with researching a topic might:
1. Read the user's question and any prior search results it's collected
2. Decide whether what it has is sufficient, or whether more searching would help
3. Call a search API with a refined query based on gaps in its current knowledge
4. Read the new results and fold them into what it already knows
5. Go back to step 2
With each pass through the loop, the agent has more context. Its next decision is informed by what it's already tried. A chatbot can't do this. A script can't do this unless every branch was explicitly coded. A workflow orchestrator can simulate it, but only along paths someone anticipated at design time.
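The loop above can be sketched in a few lines of Python. Everything here is a stand-in: `decide` stubs the LLM's reasoning step and `search` stubs a search API, so the shape of the loop is visible and runnable without either dependency.

```python
# Minimal sketch of the perceive-reason-act-observe loop.
# `decide` and `search` are hypothetical stubs, not a real LLM or API.

def decide(question, notes):
    """Stand-in for the LLM: return a refined query, or None if done."""
    if len(notes) >= 2:          # pretend two results are "sufficient"
        return None
    return f"{question} (refinement {len(notes) + 1})"

def search(query):
    """Stand-in for a search API call."""
    return f"result for: {query}"

def research_agent(question, max_steps=5):
    notes = []                        # accumulated observations
    for _ in range(max_steps):        # hard cap so the loop always terminates
        query = decide(question, notes)   # reason over everything seen so far
        if query is None:                 # goal judged met
            break
        notes.append(search(query))       # act, then observe
    return notes

print(research_agent("history of RAID levels"))
```

The important structural detail is that `decide` sees the full history of observations on every iteration. That accumulation of context is what lets the next step differ from anything an author wrote down in advance.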
The genuine novelty is that agents generate plans the author never explicitly wrote. The branching logic emerges from the model's reasoning at runtime. That's a different category of system, not just a fancier chatbot.
## The Three Properties That Matter
**Autonomy.** An agent can operate without human input for stretches of time. Not infinitely. Not without guardrails. But it can take multiple steps toward a goal without someone pressing "continue" after each one.
This doesn't mean unsupervised. The most useful production agents have human checkpoints at high-stakes decision points. Autonomy is about the default mode, not the maximum. An agent that requires human approval for every action is a UI, not an agent.
**Tool use.** An agent can interact with the world beyond generating text. It can call APIs, query databases, read files, execute code, send emails. The specific tools don't matter. What matters is that it can act on external systems.
Tool use is what separates agents from very sophisticated text generators. Without tools, the model can only reason about the world. With tools, it can change the world. That combination is what makes agents worth the complexity they add.
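Mechanically, tool use usually amounts to a dispatch table: the model emits a tool name and arguments, and the runtime executes the matching function. A minimal sketch, with illustrative tool names and no real external systems:

```python
# Sketch of a tool registry. The model's output (a tool name plus
# arguments) is dispatched to a real function by the runtime.
# Tool names and return values here are illustrative.

def get_order(order_id):
    """Pretend lookup against an order system."""
    return {"id": order_id, "status": "shipped"}

def send_email(to, body):
    """Pretend side-effecting action."""
    return f"queued email to {to}"

TOOLS = {"get_order": get_order, "send_email": send_email}

def call_tool(name, **kwargs):
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(call_tool("get_order", order_id="A-17"))
```

The registry is also where permissions live: an agent only ever gets the keys of `TOOLS`, so scoping what it can do is a matter of deciding what goes in that dict.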
**Goal-directed behavior.** An agent works toward an objective, not just a single response. It can decompose a goal into steps, execute those steps, evaluate progress, and adjust its approach.
If your system has all three, it's an agent. If it's missing one or more, it might still be useful, but calling it an agent is just marketing.
### Why Tool Use Is the Pivotal Property
Of the three, tool use is the one that changes the risk profile most dramatically.
An agent that can only reason is a sophisticated autocomplete. If it gets something wrong, a human reads the output and catches it.
An agent that can act on external systems is something else entirely. It can send emails you didn't write. It can delete records you didn't mean to delete. It can call APIs that charge money, mutate state, or trigger downstream processes. The reasoning capability that makes it useful is the same capability that makes mistakes expensive.
This is why the engineering around agents matters more than the model powering them. The guardrails, the permission boundaries, the confirmation steps for irreversible actions: none of that is glamorous, and all of it is what stands between a useful system and an expensive incident.
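One of those unglamorous guardrails, sketched concretely: a wrapper that refuses to execute irreversible tools without an explicit human confirmation. The tool names and the `confirm` hook are assumptions for illustration.

```python
# Sketch of a permission boundary: tools tagged irreversible require a
# human confirmation hook before executing. Names are illustrative.

IRREVERSIBLE = {"delete_record", "send_email", "charge_card"}

def guarded_call(name, fn, confirm, **kwargs):
    """Run `fn` only if `name` is reversible or a human approves."""
    if name in IRREVERSIBLE and not confirm(name, kwargs):
        return {"status": "blocked", "tool": name}
    return {"status": "ok", "result": fn(**kwargs)}

# Deny-everything confirm hook for demonstration; in production this
# would surface the pending action to a human reviewer.
result = guarded_call("delete_record", lambda record_id: None,
                      confirm=lambda n, a: False, record_id=42)
print(result)   # the destructive call never runs
```

The point of putting the check in the runtime rather than the prompt is that a model can be talked out of an instruction; a wrapper can't.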
## What Agents Aren't
**Agents aren't magic.** They're software systems with clear architectural patterns. The LLM provides the reasoning capability, but the agent framework provides the structure, the tools, the memory, and the guardrails. Without that scaffolding, you just have a very expensive text generator.
The engineering work that makes agents valuable is mostly invisible in demos: state management, tool definitions, error handling, retry logic, observation formatting, output validation. None of it is exciting. All of it is necessary.
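A taste of that invisible scaffolding: tool calls against real systems fail transiently, so a naive agent needs at minimum a retry wrapper. A minimal sketch with exponential backoff (delays shortened here so it runs instantly):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Retry a flaky tool call with exponential backoff.
    Re-raises the last exception if all attempts fail."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# Simulated flaky tool: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retries(flaky))   # succeeds on the third attempt
```

In a real system you would retry only on errors you know to be transient, and log every attempt; blanket retries on a side-effecting tool can execute the side effect twice.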
**Agents aren't reliable by default.** LLMs hallucinate. They misunderstand instructions. They get stuck in loops. An agent built naively will do all of these things, repeatedly, with access to your production systems. The engineering challenge isn't making agents work. It's making them work reliably enough to trust.
Reliability doesn't come from picking a better model. It comes from systematic evaluation, tight permission scopes for tools, human checkpoints at decision boundaries, and honest monitoring of failure modes in production. The model is maybe 30% of the reliability story. The rest is infrastructure.
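One of those failure modes, loop-sticking, is cheap to guard against in the runtime rather than the model. A sketch of a guard that aborts when the agent repeats the exact same tool call, which is the classic signature of a stuck naive agent:

```python
# Sketch of a stuck-loop guard: abort the run when the agent issues
# the exact same tool call twice. The trace format is illustrative.

def run_with_loop_guard(steps):
    seen = set()
    executed = []
    for name, args in steps:
        key = (name, tuple(sorted(args.items())))   # hashable call signature
        if key in seen:
            return executed, "aborted: repeated identical call"
        seen.add(key)
        executed.append((name, args))
    return executed, "done"

trace = [("search", {"q": "raid"}), ("search", {"q": "raid"})]
print(run_with_loop_guard(trace))
```

Production versions are fuzzier (near-duplicate queries, budget caps on total steps or tokens), but the principle is the same: the harness, not the model, decides when to stop.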
**Agents aren't a replacement for good software design.** If you can solve a problem with a deterministic script, do that. Agents add complexity, latency, cost, and unpredictability. They're worth it when the problem genuinely requires flexible reasoning at runtime. They're not worth it when you just want to send a formatted email.
The worst agent implementations I've seen were built because agents are exciting, not because the problem needed one. A deterministic script would have been faster, cheaper, and more reliable. Choose the right tool for the problem, not the most interesting tool on the shelf.
## The Spectrum Nobody Acknowledges
The industry treats "agent" as a binary. Either your system is an agent or it isn't. Reality is a spectrum.
On one end: a chatbot with no tools, no memory, no autonomy. Pure text in, text out.
On the other end: a fully autonomous system that sets its own goals, builds its own tools, and operates indefinitely without human oversight. This doesn't really exist yet, despite what some demos suggest.
Everything interesting lives in the middle. A customer support system that can look up orders, issue refunds under $50, and escalate everything else to a human. A coding assistant that can read your codebase, propose changes, run tests, and iterate until they pass. A research agent that can search multiple sources, cross-reference findings, and produce a synthesis.
These are all agents. They all have different levels of autonomy, different tool sets, different trust boundaries. The level of independence should match the stakes involved, but that's a design decision, not a question of definition.
### Matching Autonomy to Stakes
A practical rule: autonomy should be inversely proportional to the cost of a mistake.
An agent that summarizes internal documents? Let it run. A bad summary is low-cost. A human reviews the output before acting on it.
An agent that sends customer emails? Add a human approval step for anything outside a narrow template. A bad email has reputational cost and potentially legal exposure.
An agent that executes financial transactions? Human in the loop at every significant action. Mistakes are measured in actual money.
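The rule can be encoded directly in the harness. A sketch of stake-based routing, in the spirit of the "refunds under $50" example above; the action names and dollar thresholds are illustrative:

```python
# Sketch of stake-based autonomy: actions under a per-action dollar
# limit run autonomously, everything else routes to a human.
# Action names and limits are illustrative assumptions.

AUTO_LIMITS = {"refund": 50, "discount": 20}   # dollars

def route(action, amount):
    limit = AUTO_LIMITS.get(action)
    if limit is None:
        return "escalate"                   # unknown action: always a human
    return "auto" if amount <= limit else "escalate"

print(route("refund", 30))          # within the $50 limit: runs autonomously
print(route("refund", 200))         # over the limit: human approval
print(route("wire_transfer", 10))   # not on the allowlist: human approval
```

Note the default: anything not explicitly allowed escalates. Allowlists fail safe; denylists fail expensive.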
This isn't a limitation of the technology. It's rational risk management. For a deeper look at designing this correctly, [autonomous vs assistive agents](/blog/autonomous-vs-assistive-agents) covers how to match agent independence to business risk.
## Why This Distinction Matters for Enterprise
When a vendor tells you they're selling an "AI agent," ask these questions:
1. What actions can it take? If the answer is "it generates responses," that's a chatbot.
2. Can it use tools? Which ones? With what permissions?
3. What happens when it's wrong? Is there a human in the loop? Where?
4. How does it handle multi-step problems? Does it plan, or does it just respond?
5. What does it remember between interactions?
The answers will tell you whether you're looking at an actual agent or a chatbot in an expensive trenchcoat.
### The Questions Nobody Asks But Should
Beyond the basic checklist, there are questions that separate vendors who understand agents from vendors who are riding a trend.
**How does it handle ambiguity?** Real problems are ambiguous. An agent that asks clarifying questions before acting is more useful than one that charges ahead and confidently does the wrong thing. If the vendor shows you a demo where the input is always perfectly formed, ask what happens when it isn't.
**What's the failure mode?** Every system fails. The interesting question is how. Does the agent fail silently? Does it escalate to a human? Does it produce an error a human can act on? A vendor who can't answer this hasn't thought about production.
**How is it evaluated?** If the answer is "we ran some demos," that's not evaluation. Real agent evaluation involves adversarial test cases, distribution shift testing, and ongoing monitoring in production. Ask for metrics, not screenshots.
**What does the audit trail look like?** For anything in a regulated industry or touching sensitive data, you need to know what the agent did and why. If the vendor can't show you a trace of every action and decision, you can't use it in anything that matters.
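The cheapest way to get that trail is to record at the tool boundary, since every action the agent takes passes through it. A minimal sketch using a decorator; the tool and log format are illustrative:

```python
import time

# Sketch of an audit trail: every tool call is recorded with its
# arguments, result, and timestamp. The log sink and tool are
# illustrative; production systems write to durable storage.

AUDIT_LOG = []

def audited(fn):
    def wrapper(**kwargs):
        result = fn(**kwargs)
        AUDIT_LOG.append({"tool": fn.__name__, "args": kwargs,
                          "result": result, "ts": time.time()})
        return result
    return wrapper

@audited
def issue_refund(order_id, amount):
    """Pretend refund action."""
    return {"order_id": order_id, "refunded": amount}

issue_refund(order_id="A-17", amount=25)
print(AUDIT_LOG[0]["tool"], AUDIT_LOG[0]["args"])
```

Capturing *why* an action was taken means also logging the model's reasoning output alongside the call, but the action-level record above is the non-negotiable floor.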
## The Honest Assessment
I've built enough agents to know that the technology is genuinely powerful and genuinely overhyped at the same time. Agents can do things that were impossible two years ago. They can also fail in ways that are spectacular, expensive, and occasionally hilarious.
The difference between a useful agent and a liability isn't the model. It's the engineering around it. The guardrails, the evaluation, the monitoring, the fallback strategies. The boring stuff that doesn't make it into demo videos.
If you're evaluating agents for your organization, ignore the demos. Ask about the failure modes. Ask about the testing. Ask about what happens at 3 AM when the agent encounters something it's never seen before.
That's where the real engineering lives.
If you're ready to build rather than evaluate, [building your first agent with LangGraph](/blog/building-first-agent-langgraph) gets you to a working, looping, tool-using agent in about 30 minutes.