Building Your First AI Agent with LangGraph: A Step-by-Step Guide
By Diesel
Tags: tutorial, langgraph, python, beginner
Everyone wants to build AI agents. Nobody wants to read another blog post explaining what an agent is. So I won't. You're here to build one. Let's go.
We're using LangGraph because it gives you explicit control over the agent loop. No magic. No hidden orchestration. You define the graph, the nodes, the edges, and the state. If something breaks, you can actually debug it because you built it.
If you want the conceptual grounding before touching code, [what AI agents actually are](/blog/what-ai-agents-actually-are) covers the fundamentals. But you can read that after. Build first.
## What We're Building
A research agent that takes a topic, searches the web, summarizes findings, and decides whether it needs more information. The key word is "decides." That's the agent part. It loops until it's satisfied.
This isn't a toy. It has a real loop, real state, real tools, and conditional routing based on what it finds. Run it, and it will call the web, spend real API credits, and produce something useful. That's how you know it's the right kind of tutorial.
## Why LangGraph and Not Something Else
LangGraph is a graph-based execution engine built on top of LangChain. You define nodes (functions that transform state) and edges (conditions that route between them). The framework handles the loop.
The alternative is rolling your own while loop, which works until it doesn't. LangGraph gives you:
**Checkpointing.** Serialize state at any point. Resume after failures. Implement human-in-the-loop without rebuilding the world.
**Streaming.** Emit intermediate state to a UI without hacking callbacks. Your users see something happening instead of staring at a spinner for 30 seconds.
**Subgraphs.** Compose agent behaviors by nesting graphs inside graphs. A supervisor agent that spawns specialist subgraphs is 30 lines, not 300.
**Visualization.** Render your agent as a Mermaid diagram. Useful for debugging and for explaining what the hell your code does to non-engineers.
Other frameworks abstract more. The more they abstract, the less you control. LangGraph sits at the right level for anything production-grade.
## Setup
```bash
pip install langgraph langchain-anthropic langchain-community tavily-python
```
You'll need API keys for Anthropic and Tavily. Set them as environment variables.
```bash
export ANTHROPIC_API_KEY="your-key-here"
export TAVILY_API_KEY="your-key-here"
```
Tavily is a search API built for LLM applications. It returns structured results rather than raw HTML, which means you don't have to parse and clean before feeding results to the model. For production you'd wrap multiple search providers, add rate limit handling, and cache aggressively. For now, Tavily is fine.
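Caching doesn't need infrastructure to start with. As a minimal sketch of the "cache aggressively" point, here's process-level memoization with the standard library; `run_search` and `cached_search` are invented names standing in for whatever provider call you wrap:

```python
from functools import lru_cache

# Stand-in for a real provider call (Tavily, etc.) -- an assumption for this sketch
def run_search(query: str) -> list[str]:
    return [f"result for {query}"]

@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple[str, ...]:
    """Identical queries within one process hit the cache, not the API.
    Results come back as a tuple because lru_cache requires hashable values."""
    return tuple(run_search(query))
```

An agent that loops tends to re-issue near-identical queries, so even this crude cache cuts real API spend during development.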
## Define the State
LangGraph agents are stateful. Every node in the graph reads and writes to a shared state object. Think of it as the agent's working memory.
```python
from typing import TypedDict, Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    research_topic: str
    findings: list[str]
    is_sufficient: bool
    iteration: int
```
The `add_messages` annotation is important. It tells LangGraph to append new messages instead of replacing the list. Every other field overwrites on update.
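To make those semantics concrete, here's a rough sketch of how a node's partial update gets merged into state. This is not LangGraph's actual implementation; `apply_update` and `reducers` are invented names for illustration:

```python
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into the shared state.
    Keys with a registered reducer (like add_messages) are combined
    with the existing value; every other key is simply overwritten."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](merged.get(key, []), value)
        else:
            merged[key] = value
    return merged
```

With `messages` registered as an append reducer, a node returning `{"messages": [response], "iteration": 2}` grows the message list but replaces the iteration counter, which is exactly the behavior the annotation buys you.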
This state design is intentionally minimal. In a real agent you'd add fields for errors, confidence scores, source URLs, and domain-specific context. The TypedDict doesn't validate anything at runtime on its own, but LangGraph builds its state channels from it, so a node that writes a key outside the schema fails loudly instead of silently doing nothing. That saves you from the subtle bug where a node returns the wrong key name, nothing changes, and you have no idea why.
## Define the Tools
Our agent needs one tool: web search. Tavily handles this cleanly.
```python
from langchain_community.tools.tavily_search import TavilySearchResults

search_tool = TavilySearchResults(max_results=3)
tools = [search_tool]
```
Three results per search keeps costs manageable during development. Bump to 5-10 in production and add deduplication. Tavily occasionally returns the same URL twice with slightly different snippets.
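Deduplication is a few lines of plain Python. A minimal sketch, assuming results arrive as dicts with a `url` key (Tavily's shape); `dedupe_results` is an invented helper name:

```python
def dedupe_results(results: list[dict]) -> list[dict]:
    """Drop repeated URLs, keeping the first occurrence of each."""
    seen: set[str] = set()
    unique = []
    for result in results:
        url = result.get("url", "")
        if url not in seen:
            seen.add(url)
            unique.append(result)
    return unique
```

Run it over the raw results before they reach the model; duplicate snippets waste context tokens and can bias the analysis toward whichever source got repeated.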
The tool binding step later (`llm.bind_tools(tools)`) converts your Python tool definitions into a schema the LLM understands. The model decides when to call them and what arguments to pass. It does not execute the tool. Your code does. This distinction matters when debugging. If the agent isn't searching, the problem is either the prompt (model deciding not to call the tool) or the routing (tool call not reaching the executor), not Tavily itself.
## Build the Nodes
Each node is a function that takes the state, does something, and returns updated state. Three nodes: research, analyze, and decide.
```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-20250514")
llm_with_tools = llm.bind_tools(tools)


def research_node(state: AgentState) -> dict:
    """Search for information on the topic."""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}


def analyze_node(state: AgentState) -> dict:
    """Analyze search results and extract findings."""
    messages = state["messages"]
    prompt = f"""Based on the conversation so far, extract the key findings
about '{state['research_topic']}'. Be specific and factual.
List each finding on a new line."""
    response = llm.invoke(messages + [("human", prompt)])
    findings = response.content.strip().split("\n")
    return {
        "findings": findings,
        "messages": [response],
        "iteration": state.get("iteration", 0) + 1,
    }


def decide_node(state: AgentState) -> dict:
    """Decide if we have enough information."""
    findings = state.get("findings", [])
    iteration = state.get("iteration", 0)
    if iteration >= 3:
        return {"is_sufficient": True}
    prompt = f"""You have {len(findings)} findings after {iteration} searches.
Findings so far: {findings}
Is this sufficient for a comprehensive summary? Answer only YES or NO."""
    response = llm.invoke([("human", prompt)])
    is_sufficient = "YES" in response.content.upper()
    return {"is_sufficient": is_sufficient, "messages": [response]}
```
A few things worth noticing here.
`research_node` uses `llm_with_tools`. `analyze_node` uses the plain `llm`. That's intentional. Research needs tool-calling capability. Analysis just needs reasoning. Using the tool-capable model for everything is wasteful, and using the plain model for research means your agent describes what it would search for rather than actually searching.
The iteration cap in `decide_node` is a guard against infinite loops. Without it, an overly curious agent can burn through your API budget chasing marginally better results. In production you'd also add a token budget check. Agents that run for a long time accumulate enormous context windows, and at some point you're paying for tokens that add noise rather than signal.
The `analyze_node` splits findings by newline, which is fragile. The model might use numbered lists, bullets, or paragraph breaks. In production, use structured output with Pydantic and parse a validated schema instead.
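As a stopgap before moving to structured output, a more defensive parser at least survives the common list formats. A sketch, with `parse_findings` as an invented helper:

```python
import re

def parse_findings(text: str) -> list[str]:
    """Strip common list markers (1., 2), -, *, •) and blank lines
    from model output, keeping one finding per surviving line."""
    findings = []
    for line in text.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            findings.append(cleaned)
    return findings
```

This still breaks on paragraph-style answers, which is why the structured-output version later in this post is the real fix.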
## Handle Tool Calls
When the LLM decides to use a tool, we need a node that actually executes it. LangGraph has a built-in for this.
```python
from langgraph.prebuilt import ToolNode
tool_node = ToolNode(tools)
```
`ToolNode` handles the full execution lifecycle: parsing the tool call from the LLM response, running the actual function, and formatting the result back into a message the LLM can read. One line that would be fifty if you wrote it yourself.
When you have multiple tools, `ToolNode` dispatches to the right one based on the tool name in the LLM's response. Add more tools to the `tools` list and everything else stays the same.
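To see roughly what that one line replaces, here's a simplified sketch of the dispatch step. This is not `ToolNode`'s actual code; `execute_tool_calls` and the `registry` dict are invented for illustration, and the real thing also wraps results in `ToolMessage` objects and handles errors:

```python
def execute_tool_calls(tool_calls: list[dict], registry: dict) -> list[dict]:
    """Dispatch each requested call to the matching tool by name
    and pair the result with the call ID so the LLM can correlate it."""
    results = []
    for call in tool_calls:
        tool = registry[call["name"]]
        results.append({
            "tool_call_id": call["id"],
            "content": tool(**call["args"]),
        })
    return results
```

The call ID pairing is the part people forget when hand-rolling this: without it, a model that issued two searches in one turn can't tell which result answers which query.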
## Wire the Graph
This is where LangGraph shines. You define the flow explicitly. No guessing what happens next.
```python
from langgraph.graph import StateGraph, END


def should_use_tools(state: AgentState) -> str:
    """Route based on whether the LLM wants to call a tool."""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "analyze"


def should_continue(state: AgentState) -> str:
    """Route based on whether research is sufficient."""
    if state.get("is_sufficient", False):
        return "done"
    return "research"


# Build the graph
graph = StateGraph(AgentState)

# Add nodes
graph.add_node("research", research_node)
graph.add_node("tools", tool_node)
graph.add_node("analyze", analyze_node)
graph.add_node("decide", decide_node)

# Add edges
graph.set_entry_point("research")
graph.add_conditional_edges("research", should_use_tools, {
    "tools": "tools",
    "analyze": "analyze",
})
graph.add_edge("tools", "research")
graph.add_edge("analyze", "decide")
graph.add_conditional_edges("decide", should_continue, {
    "done": END,
    "research": "research",
})

# Compile
agent = graph.compile()
```
Notice the edge from `tools` back to `research`. That's the inner loop. The LLM requests a search, `tool_node` executes it, the result feeds back to the LLM for interpretation. This can repeat multiple times within a single outer iteration. One research cycle might involve two or three searches before the model has what it needs.
The compiled graph is a Python object you can invoke, stream, or serve behind an API. Compilation validates the graph structure. You'll see errors here if your conditional routing has missing branches or unreachable nodes.
## Run It
```python
from langchain_core.messages import HumanMessage

result = agent.invoke({
    "messages": [HumanMessage(content="Research the current state of AI agents in enterprise software")],
    "research_topic": "AI agents in enterprise software",
    "findings": [],
    "is_sufficient": False,
    "iteration": 0,
})

# Print the findings
for i, finding in enumerate(result["findings"], 1):
    print(f"{i}. {finding}")
```
Run this and watch the output. You'll see tool call requests, search results, and analysis happening in sequence. It takes 15-45 seconds depending on model latency and how many iterations the agent decides it needs.
## Visualize the Graph
LangGraph can render your agent as an actual graph. Incredibly useful for debugging.
```python
from IPython.display import Image, display

display(Image(agent.get_graph().draw_mermaid_png()))
```
Outside Jupyter, write the PNG to disk:
```python
with open("agent_graph.png", "wb") as f:
    f.write(agent.get_graph().draw_mermaid_png())
```
When you're debugging a routing issue, this is faster than reading the code. You can see at a glance if an edge is missing or a conditional branch has no exit path.
## Adding Persistence
Real agents need memory between runs. LangGraph supports checkpointing out of the box.
```python
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
agent = graph.compile(checkpointer=memory)

# Now invoke with a thread_id
config = {"configurable": {"thread_id": "research-session-1"}}
result = agent.invoke({
    "messages": [HumanMessage(content="Research quantum computing trends")],
    "research_topic": "quantum computing trends",
    "findings": [],
    "is_sufficient": False,
    "iteration": 0,
}, config=config)
```
Resume the same session later by using the same `thread_id`. The agent picks up right where it left off.
`MemorySaver` stores state in-process, which is gone when your process restarts. For production use `SqliteSaver` (single-server) or `PostgresSaver` (multi-server, concurrent agents). The API is identical. Swap the checkpointer, keep everything else.
Thread IDs are how you implement multi-user agents. Each user gets their own thread. Their history, findings, and state are completely isolated. The same compiled graph serves all of them.
## What Can Go Wrong (And Will)
The tutorial works. Here's what breaks it in the real world.
**Context window overflow.** A looping agent accumulates messages. After five or six iterations on a complex topic, you might have 30,000+ tokens in context. You need to trim the message history or summarize older turns. LangGraph gives you hooks to do this inside any node.
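One simple trimming policy, sketched as a plain function you could call inside any node before invoking the model; `trim_history` is an invented helper, and keeping the first message assumes it holds the original task:

```python
def trim_history(messages: list, keep_last: int = 10) -> list:
    """Keep the first message (the original task) plus the most
    recent turns, dropping the middle of a long conversation."""
    if len(messages) <= keep_last + 1:
        return messages
    return [messages[0]] + messages[-keep_last:]
```

More sophisticated policies summarize the dropped middle instead of discarding it, but even this slice keeps a long-running agent from drowning in its own transcript.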
**Tool call failures.** Tavily can return errors. APIs rate-limit. Networks drop. Without error handling, a single failed tool call crashes the agent. Wrap tool nodes in try/except and add retry logic for transient failures.
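A minimal backoff wrapper covers most transient failures; `with_retries` is an invented name, and a production version would retry only on specific exception types rather than bare `Exception`:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Wrap a flaky callable with exponential backoff.
    Re-raises the last exception once attempts are exhausted."""
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)
    return wrapped
```

Wrap the search call itself (`with_retries(search_tool.invoke)`) rather than the whole node, so a retry replays only the network call, not the LLM reasoning around it.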
**Infinite loops from bad routing.** If `should_continue` always returns "research" because `is_sufficient` never gets set, your agent runs until you kill it or run out of money. Always have an iteration cap and a time budget. Test exit conditions explicitly.
**LLM output variability.** The model might return "yes" instead of "YES," breaking your string check. Use structured output with Pydantic for any node where you need a machine-readable response.
```python
from pydantic import BaseModel


class SufficiencyCheck(BaseModel):
    is_sufficient: bool
    reasoning: str


structured_llm = llm.with_structured_output(SufficiencyCheck)
result = structured_llm.invoke([("human", prompt)])
# result.is_sufficient is always a bool, never "yes" or "Yeah probably"
```
This pattern eliminates an entire class of bugs. Use it for any decision node that drives routing logic.
## What You've Actually Built
A system that perceives (search results), reasons (LLM analysis), acts (web search), and loops (the graph cycle). That's a real agent. Not a chatbot with a system prompt.
The graph structure means you can see exactly what's happening at every step. When it breaks (it will), you'll know which node failed and why. That's the entire point of LangGraph over frameworks that hide the loop from you.
## Where to Go Next
Add more tools. A calculator, a code executor, a database query tool. The graph pattern stays the same. Add a node, wire an edge, handle the routing. Every new capability is another node.
Or add human-in-the-loop. LangGraph supports interrupt points where the agent pauses and waits for human approval before continuing. Critical for anything touching production data.
Or scale to multiple agents. A supervisor that spawns specialist subgraphs, collects results, and synthesizes an answer. The same graph primitives, nested. Once you understand the pattern, multi-agent systems stop being intimidating and start being obvious.
The bones are here. Build on them.
## Further Reading
- [Building a Code Generation Agent with Claude and Tool Use](/blog/code-generation-agent-claude)
- [Building an MCP Server: Giving Your Agent Custom Tools](/blog/building-mcp-server-custom-tools)