Building Your First AI Agent with LangGraph: A Step-by-Step Guide
By Diesel
Tags: tutorial, langgraph, python, beginner
Everyone wants to build AI agents. Nobody wants to read another blog post explaining what an agent is. So I won't. You're here to build one. Let's go.
We're using LangGraph because it gives you explicit control over the agent loop. No magic. No hidden orchestration. You define the graph, the nodes, the edges, and the state. If something breaks, you can actually debug it because you built it.
If you want the conceptual grounding before touching code, [what AI agents actually are](/blog/what-ai-agents-actually-are) covers the fundamentals. But you can read that after. Build first.
## What We're Building
A research agent that takes a topic, searches the web, summarizes findings, and decides whether it needs more information. The key word is "decides." That's the agent part. It loops until it's satisfied.
This isn't a toy. It has a real loop, real state, real tools, and conditional routing based on what it finds. Run it, and it will call the web, spend real API credits, and produce something useful. That's how you know it's the right kind of tutorial.
## Why LangGraph and Not Something Else
LangGraph is a graph-based execution engine built on top of LangChain. You define nodes (functions that transform state) and edges (conditions that route between them). The framework handles the loop.
The alternative is rolling your own while loop, which works until it doesn't. LangGraph gives you:
**Checkpointing.** Serialize state at any point. Resume after failures. Implement human-in-the-loop without rebuilding the world.
**Streaming.** Emit intermediate state to a UI without hacking callbacks. Your users see something happening instead of staring at a spinner for 30 seconds.
**Subgraphs.** Compose agent behaviors by nesting graphs inside graphs. A supervisor agent that spawns specialist subgraphs is 30 lines, not 300.
**Visualization.** Render your agent as a Mermaid diagram. Useful for debugging and for explaining what the hell your code does to non-engineers.
Other frameworks abstract more. The more they abstract, the less you control. LangGraph sits at the right level for anything production-grade.
## Setup
```bash
pip install langgraph langchain-anthropic langchain-community tavily-python
```
You'll need API keys for Anthropic and Tavily. Set them as environment variables.
```bash
export ANTHROPIC_API_KEY="your-key-here"
export TAVILY_API_KEY="your-key-here"
```
Tavily is a search API built for LLM applications. It returns structured results rather than raw HTML, which means you don't have to parse and clean before feeding results to the model. For production you'd wrap multiple search providers, add rate limit handling, and cache aggressively. For now, Tavily is fine.
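Caching doesn't need infrastructure to start with. As a minimal sketch of the "cache aggressively" point, here's process-level memoization with the standard library; `run_search` and `cached_search` are invented names standing in for whatever provider call you wrap:

```python
from functools import lru_cache

# Stand-in for a real provider call (Tavily, etc.) -- an assumption for this sketch
def run_search(query: str) -> list[str]:
    return [f"result for {query}"]

@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple[str, ...]:
    """Identical queries within one process hit the cache, not the API.
    Results come back as a tuple because lru_cache requires hashable values."""
    return tuple(run_search(query))
```

An agent that loops tends to re-issue near-identical queries, so even this crude cache cuts real API spend during development.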
## Define the State
LangGraph agents are stateful. Every node in the graph reads and writes to a shared state object. Think of it as the agent's working memory.
```python
from typing import TypedDict, Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    research_topic: str
    findings: list[str]
    is_sufficient: bool
    iteration: int
```
The `add_messages` annotation is important. It tells LangGraph to append new messages instead of replacing the list. Every other field overwrites on update.
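To make those semantics concrete, here's a rough sketch of how a node's partial update gets merged into state. This is not LangGraph's actual implementation; `apply_update` and `reducers` are invented names for illustration:

```python
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into the shared state.
    Keys with a registered reducer (like add_messages) are combined
    with the existing value; every other key is simply overwritten."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](merged.get(key, []), value)
        else:
            merged[key] = value
    return merged
```

With `messages` registered as an append reducer, a node returning `{"messages": [response], "iteration": 2}` grows the message list but replaces the iteration counter, which is exactly the behavior the annotation buys you.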
This state design is intentionally minimal. In a real agent you'd add fields for errors, confidence scores, source URLs, and domain-specific context. The TypedDict doesn't validate anything at runtime on its own, but LangGraph builds its state channels from it, so a node that writes a key outside the schema fails loudly instead of silently doing nothing. That saves you from the subtle bug where a node returns the wrong key name, nothing changes, and you have no idea why.
## Define the Tools
Our agent needs one tool: web search. Tavily handles this cleanly.
```python
from langchain_community.tools.tavily_search import TavilySearchResults

search_tool = TavilySearchResults(max_results=3)
tools = [search_tool]
```
Three results per search keeps costs manageable during development. Bump to 5-10 in production and add deduplication. Tavily occasionally returns the same URL twice with slightly different snippets.
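Deduplication is a few lines of plain Python. A minimal sketch, assuming results arrive as dicts with a `url` key (Tavily's shape); `dedupe_results` is an invented helper name:

```python
def dedupe_results(results: list[dict]) -> list[dict]:
    """Drop repeated URLs, keeping the first occurrence of each."""
    seen: set[str] = set()
    unique = []
    for result in results:
        url = result.get("url", "")
        if url not in seen:
            seen.add(url)
            unique.append(result)
    return unique
```

Run it over the raw results before they reach the model; duplicate snippets waste context tokens and can bias the analysis toward whichever source got repeated.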
The tool binding step later (`llm.bind_tools(tools)`) converts your Python tool definitions into a schema the LLM understands. The model decides when to call them and what arguments to pass. It does not execute the tool. Your code does. This distinction matters when debugging. If the agent isn't searching, the problem is either the prompt (model deciding not to call the tool) or the routing (tool call not reaching the executor), not Tavily itself.
## Build the Nodes
Each node is a function that takes the state, does something, and returns updated state. Three nodes: research, analyze, and decide.
```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-20250514")
llm_with_tools = llm.bind_tools(tools)


def research_node(state: AgentState) -> dict:
    """Search for information on the topic."""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}


def analyze_node(state: AgentState) -> dict:
    """Analyze search results and extract findings."""
    messages = state["messages"]
    prompt = f"""Based on the conversation so far, extract the key findings
about '{state['research_topic']}'. Be specific and factual.
List each finding on a new line."""
    response = llm.invoke(messages + [("human", prompt)])
    findings = response.content.strip().split("\n")
    return {
        "findings": findings,
        "messages": [response],
        "iteration": state.get("iteration", 0) + 1,
    }


def decide_node(state: AgentState) -> dict:
    """Decide if we have enough information."""
    findings = state.get("findings", [])
    iteration = state.get("iteration", 0)
    if iteration >= 3:
        return {"is_sufficient": True}
    prompt = f"""You have {len(findings)} findings after {iteration} searches.
Findings so far: {findings}
Is this sufficient for a comprehensive summary? Answer only YES or NO."""
    response = llm.invoke([("human", prompt)])
    is_sufficient = "YES" in response.content.upper()
    return {"is_sufficient": is_sufficient, "messages": [response]}
```
A few things worth noticing here.
`research_node` uses `llm_with_tools`. `analyze_node` uses the plain `llm`. That's intentional. Research needs tool-calling capability. Analysis just needs reasoning. Using the tool-capable model for everything is wasteful, and using the plain model for research means your agent describes what it would search for rather than actually searching.
The iteration cap in `decide_node` is a guard against infinite loops. Without it, an overly curious agent can burn through your API budget chasing marginally better results. In production you'd also add a token budget check. Agents that run for a long time accumulate enormous context windows, and at some point you're paying for tokens that add noise rather than signal.
The `analyze_node` splits findings by newline, which is fragile. The model might use numbered lists, bullets, or paragraph breaks. In production, use structured output with Pydantic and parse a validated schema instead.
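As a stopgap before moving to structured output, a more defensive parser at least survives the common list formats. A sketch, with `parse_findings` as an invented helper:

```python
import re

def parse_findings(text: str) -> list[str]:
    """Strip common list markers (1., 2), -, *, •) and blank lines
    from model output, keeping one finding per surviving line."""
    findings = []
    for line in text.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            findings.append(cleaned)
    return findings
```

This still breaks on paragraph-style answers, which is why the structured-output version later in this post is the real fix.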
## Handle Tool Calls
When the LLM decides to use a tool, we need a node that actually executes it. LangGraph has a built-in for this.
```python
from langgraph.prebuilt import ToolNode
tool_node = ToolNode(tools)
```
`ToolNode` handles the full execution lifecycle: parsing the tool call from the LLM response, running the actual function, and formatting the result back into a message the LLM can read. One line that would be fifty if you wrote it yourself.
When you have multiple tools, `ToolNode` dispatches to the right one based on the tool name in the LLM's response. Add more tools to the `tools` list and everything else stays the same.
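To see roughly what that one line replaces, here's a simplified sketch of the dispatch step. This is not `ToolNode`'s actual code; `execute_tool_calls` and the `registry` dict are invented for illustration, and the real thing also wraps results in `ToolMessage` objects and handles errors:

```python
def execute_tool_calls(tool_calls: list[dict], registry: dict) -> list[dict]:
    """Dispatch each requested call to the matching tool by name
    and pair the result with the call ID so the LLM can correlate it."""
    results = []
    for call in tool_calls:
        tool = registry[call["name"]]
        results.append({
            "tool_call_id": call["id"],
            "content": tool(**call["args"]),
        })
    return results
```

The call ID pairing is the part people forget when hand-rolling this: without it, a model that issued two searches in one turn can't tell which result answers which query.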
## Wire the Graph
This is where LangGraph shines. You define the flow explicitly. No guessing what happens next.
```python
from langgraph.graph import StateGraph, END


def should_use_tools(state: AgentState) -> str:
    """Route based on whether the LLM wants to call a tool."""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "analyze"


def should_continue(state: AgentState) -> str:
    """Route based on whether research is sufficient."""
    if state.get("is_sufficient", False):
        return "done"
    return "research"


# Build the graph
graph = StateGraph(AgentState)

# Add nodes
graph.add_node("research", research_node)
graph.add_node("tools", tool_node)
graph.add_node("analyze", analyze_node)
graph.add_node("decide", decide_node)

# Add edges
graph.set_entry_point("research")
graph.add_conditional_edges("research", should_use_tools, {
    "tools": "tools",
    "analyze": "analyze",
})
graph.add_edge("tools", "research")
graph.add_edge("analyze", "decide")
graph.add_conditional_edges("decide", should_continue, {
    "done": END,
    "research": "research",
})

# Compile
agent = graph.compile()
```
Notice the edge from `tools` back to `research`. That's the inner loop. The LLM requests a search, `tool_node` executes it, the result feeds back to the LLM for interpretation. This can repeat multiple times within a single outer iteration. One research cycle might involve two or three searches before the model has what it needs.
The compiled graph is a Python object you can invoke, stream, or serve behind an API. Compilation validates the graph structure. You'll see errors here if your conditional routing has missing branches or unreachable nodes.
## Run It
```python
from langchain_core.messages import HumanMessage

result = agent.invoke({
    "messages": [HumanMessage(content="Research the current state of AI agents in enterprise software")],
    "research_topic": "AI agents in enterprise software",
    "findings": [],
    "is_sufficient": False,
    "iteration": 0,
})

# Print the findings
for i, finding in enumerate(result["findings"], 1):
    print(f"{i}. {finding}")
```
Run this and watch the output. You'll see tool call requests, search results, and analysis happening in sequence. It takes 15-45 seconds depending on model latency and how many iterations the agent decides it needs.
## Visualize the Graph
LangGraph can render your agent as an actual graph. Incredibly useful for debugging.
```python
from IPython.display import Image, display

display(Image(agent.get_graph().draw_mermaid_png()))
```
Outside Jupyter, write the PNG to disk:
```python
with open("agent_graph.png", "wb") as f:
    f.write(agent.get_graph().draw_mermaid_png())
```
When you're debugging a routing issue, this is faster than reading the code. You can see at a glance if an edge is missing or a conditional branch has no exit path.
## Adding Persistence
Real agents need memory between runs. LangGraph supports checkpointing out of the box.
```python
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
agent = graph.compile(checkpointer=memory)

# Now invoke with a thread_id
config = {"configurable": {"thread_id": "research-session-1"}}
result = agent.invoke({
    "messages": [HumanMessage(content="Research quantum computing trends")],
    "research_topic": "quantum computing trends",
    "findings": [],
    "is_sufficient": False,
    "iteration": 0,
}, config=config)
```
Resume the same session later by using the same `thread_id`. The agent picks up right where it left off.
`MemorySaver` stores state in-process, which is gone when your process restarts. For production use `SqliteSaver` (single-server) or `PostgresSaver` (multi-server, concurrent agents). The API is identical. Swap the checkpointer, keep everything else.
Thread IDs are how you implement multi-user agents. Each user gets their own thread. Their history, findings, and state are completely isolated. The same compiled graph serves all of them.
## What Can Go Wrong (And Will)
The tutorial works. Here's what breaks it in the real world.
**Context window overflow.** A looping agent accumulates messages. After five or six iterations on a complex topic, you might have 30,000+ tokens in context. You need to trim the message history or summarize older turns. LangGraph gives you hooks to do this inside any node.
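One simple trimming policy, sketched as a plain function you could call inside any node before invoking the model; `trim_history` is an invented helper, and keeping the first message assumes it holds the original task:

```python
def trim_history(messages: list, keep_last: int = 10) -> list:
    """Keep the first message (the original task) plus the most
    recent turns, dropping the middle of a long conversation."""
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[-keep_last:]
```

More sophisticated policies summarize the dropped middle instead of discarding it, but even this slice keeps a long-running agent from drowning in its own transcript.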
**Tool call failures.** Tavily can return errors. APIs rate-limit. Networks drop. Without error handling, a single failed tool call crashes the agent. Wrap tool nodes in try/except and add retry logic for transient failures.
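A minimal backoff wrapper covers most transient failures; `with_retries` is an invented name, and a production version would retry only on specific exception types rather than bare `Exception`:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Wrap a flaky callable with exponential backoff.
    Re-raises the last exception once attempts are exhausted."""
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)
    return wrapped
```

Wrap the search call itself (`with_retries(search_tool.invoke)`) rather than the whole node, so a retry replays only the network call, not the LLM reasoning around it.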
**Infinite loops from bad routing.** If `should_continue` always returns "research" because `is_sufficient` never gets set, your agent runs until you kill it or run out of money. Always have an iteration cap and a time budget. Test exit conditions explicitly.
**LLM output variability.** The model might return "yes" instead of "YES," breaking your string check. Use structured output with Pydantic for any node where you need a machine-readable response.
```python
from pydantic import BaseModel


class SufficiencyCheck(BaseModel):
    is_sufficient: bool
    reasoning: str


structured_llm = llm.with_structured_output(SufficiencyCheck)
result = structured_llm.invoke([("human", prompt)])
# result.is_sufficient is always a bool, never "yes" or "Yeah probably"
```
This pattern eliminates an entire class of bugs. Use it for any decision node that drives routing logic.
## What You've Actually Built
A system that perceives (search results), reasons (LLM analysis), acts (web search), and loops (the graph cycle). That's a real agent. Not a chatbot with a system prompt.
The graph structure means you can see exactly what's happening at every step. When it breaks (it will), you'll know which node failed and why. That's the entire point of LangGraph over frameworks that hide the loop from you.
## Where to Go Next
Add more tools. A calculator, a code executor, a database query tool. The graph pattern stays the same. Add a node, wire an edge, handle the routing. Every new capability is another node.
Or add human-in-the-loop. LangGraph supports interrupt points where the agent pauses and waits for human approval before continuing. Critical for anything touching production data.
Or scale to multiple agents. A supervisor that spawns specialist subgraphs, collects results, and synthesizes an answer. The same graph primitives, nested. Once you understand the pattern, multi-agent systems stop being intimidating and start being obvious.
The bones are here. Build on them.
## Further Reading
- [Building a Code Generation Agent with Claude and Tool Use](/blog/code-generation-agent-claude)
- [Building an MCP Server: Giving Your Agent Custom Tools](/blog/building-mcp-server-custom-tools)