Claude Agent SDK: Building Production Agents with Anthropic
By Diesel
Tags: tools, claude, anthropic, sdk
I've been building with Claude's Agent SDK since it dropped, and it's become my default for single-agent production systems. Not because it does the most things. Because it does the right things and gets out of the way for everything else.
Anthropic took a different approach from the LangChain ecosystem. Instead of building a framework that abstracts everything, they built a thin SDK that gives you tool use, streaming, and structured outputs. Then they said "the rest is your problem." Which, honestly, is the correct call for production systems.
## What the SDK Actually Gives You
The core loop is simple. You send messages to Claude. Claude can respond with text or tool calls. You execute the tools and send the results back. Claude continues until it's done. That's the agent loop, and the SDK handles the mechanical parts cleanly.
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_database",
        "description": "Search the product database",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer", "default": 10}
            },
            "required": ["query"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=tools,
    messages=[{"role": "user", "content": "Find laptops under $1000"}]
)
```
When Claude decides to use a tool, you get a `tool_use` content block with the tool name and arguments. You run your function, send back a `tool_result`, and let Claude continue. The SDK handles serialization, validation against your schema, and streaming of partial responses.
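Concretely, a turn that requests a tool looks like this. The stand-in `response` object below is a `SimpleNamespace` mock for illustration; in production it's the object returned by `client.messages.create` above.

```python
from types import SimpleNamespace

# Mock of a response whose stop_reason is "tool_use"; in production this
# comes back from client.messages.create.
response = SimpleNamespace(
    stop_reason="tool_use",
    content=[SimpleNamespace(
        type="tool_use",
        id="toolu_01",
        name="search_database",
        input={"query": "laptops", "limit": 10},
    )],
)

# stop_reason == "tool_use" means there is at least one tool call to execute.
if response.stop_reason == "tool_use":
    for block in response.content:
        if block.type == "tool_use":
            print(block.name, block.input)
```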
## The Tool Use Pattern That Actually Works
Here's what production tool use looks like. Not the demo version with one tool. The real version where your agent has ten tools and needs to use them in sequence, handle errors, and know when to stop.
```python
def extract_text(response) -> str:
    """Concatenate the text blocks from a response."""
    return "".join(b.text for b in response.content if b.type == "text")

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Collect all tool uses from this turn
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result)
                })

        # If no tool calls, we're done
        if not tool_results:
            return extract_text(response)

        # Add assistant response and tool results, continue
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
```
That while loop is your agent. It runs until Claude stops calling tools. Clean, debuggable, and you control every part of it. No framework magic hiding the flow from you.
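The loop leans on an `execute_tool` dispatcher, which is where error handling lives. A minimal sketch, assuming a `search_database` placeholder standing in for your real tool functions; the key design choice is returning errors as strings so Claude sees the failure in the `tool_result` and can retry or explain, instead of the loop crashing.

```python
def search_database(query: str, limit: int = 10) -> list[dict]:
    # Placeholder implementation for illustration.
    return [{"name": "Budget Laptop", "price": 799}][:limit]

# Map tool names (as declared in the schemas) to handlers.
TOOL_HANDLERS = {
    "search_database": lambda args: search_database(**args),
}

def execute_tool(name: str, args: dict):
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return handler(args)
    except Exception as e:
        # Surface the failure to Claude via the tool_result so it can
        # retry with different arguments or report the problem.
        return f"Error: {type(e).__name__}: {e}"
```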
## Extended Thinking Changes Everything
Claude's extended thinking feature is where things get genuinely interesting for agent work. Before calling tools or generating a response, Claude can think through the problem step by step in a dedicated thinking block. You can see the reasoning. You can log it. You can use it for debugging and auditing.
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=messages,
)
```
For production agents, this is gold. When your agent makes a bad tool call, you don't have to guess why. You read the thinking block. It says "I'm going to search for X because the user mentioned Y and the previous result showed Z." The reasoning is right there.
I've built evaluation pipelines where we capture thinking blocks alongside tool calls and final outputs. When something goes wrong in production, the thinking trace tells you exactly where the reasoning went sideways. No prompt archaeology required.
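Capturing the traces is a one-liner over the content blocks. A sketch, using a `SimpleNamespace` mock in place of a real thinking-enabled response:

```python
from types import SimpleNamespace

def extract_thinking(response) -> list[str]:
    """Collect the thinking blocks from a Messages API response."""
    return [b.thinking for b in response.content if b.type == "thinking"]

# Mock response for illustration; in production this is the object
# returned by client.messages.create(..., thinking={...}).
fake = SimpleNamespace(content=[
    SimpleNamespace(type="thinking", thinking="User wants X, so search for Y first."),
    SimpleNamespace(type="text", text="Here are the results."),
])
print(extract_thinking(fake))
```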
## Structured Outputs for Reliable Pipelines
The SDK gives you reliable JSON output through forced tool use. You define the output shape as a tool's `input_schema`, force Claude to call that tool with `tool_choice`, and the arguments come back as parsed JSON matching your schema. No "please respond with only JSON" prompting, no markdown fences to strip.
```python
classification_tool = {
    "name": "ticket_classification",
    "description": "Record the classification of a support ticket",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {"type": "string", "enum": ["bug", "feature", "question"]},
            "priority": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
            "summary": {"type": "string"}
        },
        "required": ["category", "priority", "summary"]
    }
}

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=[classification_tool],
    tool_choice={"type": "tool", "name": "ticket_classification"},
    messages=[{"role": "user", "content": "Classify this ticket..."}],
)

classification = response.content[0].input  # dict matching the schema
```
For agent pipelines where one step feeds into the next, structured outputs eliminate an entire category of bugs. No more "Claude returned a slightly different JSON shape and the parser choked." The shape is locked.
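Downstream steps can then consume the classification as a typed object rather than a loose dict. A minimal sketch, assuming `raw` is the parsed classification from the call above (a literal stands in here for illustration):

```python
from dataclasses import dataclass

@dataclass
class TicketClassification:
    category: str
    priority: str
    summary: str

# Stand-in for the parsed classification dict; in the pipeline this comes
# from the structured-output call above.
raw = {"category": "bug", "priority": "high", "summary": "Crash on login"}

# Construction fails loudly if a required field is missing, so shape
# problems surface at the pipeline boundary, not three steps later.
ticket = TicketClassification(**raw)
```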
## What's Missing (Honestly)
The SDK is deliberately minimal, and that cuts both ways.
**No built-in memory.** You manage conversation history yourself. For simple agents, that's fine. For long-running agents that need to handle context window limits, summarize old turns, or persist state across sessions, you're writing that infrastructure yourself or pulling in another library.
**No multi-agent coordination.** If you need Agent A to hand off to Agent B based on some condition, you build that routing yourself. LangGraph or CrewAI give you this out of the box. With the Claude SDK, it's your code.
**No built-in tool execution sandbox.** When Claude says "run this code," you need your own sandboxing. The SDK doesn't care what you do with tool calls. That's freedom, but it's also responsibility. The related post on [deployment patterns](/blog/agent-deployment-patterns) goes further on this point.
**No native checkpointing.** If your agent crashes mid-workflow, you restart from scratch unless you've built your own checkpointing layer.
These aren't criticisms. They're trade-offs. Anthropic chose to keep the SDK thin and let you compose it with whatever infrastructure you already have. For teams that have opinions about their database, their queue, their serialization format, that's the right call.
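To make the memory gap concrete, here is the kind of history management you end up writing yourself. A naive sliding-window sketch; real systems usually summarize the dropped turns instead of discarding them:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the first message (the task framing) plus the most recent turns."""
    if len(messages) <= max_messages:
        return messages
    # Drop the middle of the conversation; the first message anchors the task.
    return [messages[0]] + messages[-(max_messages - 1):]
```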
## Where It Shines in Production
The places where the Claude Agent SDK genuinely excels:
**Streaming.** The SSE streaming implementation is solid. You get thinking tokens, text tokens, and tool use blocks as they're generated. For user-facing agents, this means responsive UIs that show the agent working in real time.
**Batching.** The batch API lets you submit thousands of requests and get results asynchronously at half the cost. For offline processing, evaluation runs, and data pipelines, this is a significant cost advantage.
**Token counting.** The SDK gives you exact token counts for input and output. You can manage context windows precisely instead of guessing with tiktoken estimates.
**Model flexibility.** Same SDK, same tool definitions, same code. Switch between Haiku (fast and cheap), Sonnet (balanced), and Opus (maximum capability) by changing one string. Your agent architecture doesn't change when you change the model.
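Consuming the stream with the Python SDK's streaming helper looks like this. A sketch, assuming `ANTHROPIC_API_KEY` is set in the environment; the import is local so the function stands alone:

```python
def stream_reply(prompt: str) -> str:
    """Print a reply token-by-token as it streams and return the full text."""
    import anthropic  # local import so the sketch is easy to lift

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    chunks: list[str] = []
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # render tokens as they arrive
            chunks.append(text)
    return "".join(chunks)
```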
## The Verdict
The Claude Agent SDK is not trying to be a framework. It's trying to be the best possible interface to Claude's capabilities. If you want an opinionated framework that makes architectural decisions for you, look at LangGraph or CrewAI. If you want a clean, well-typed SDK that lets you build exactly the agent you want without fighting against framework abstractions, this is it.
I use it for every single-agent system I build. For multi-agent orchestration, I reach for LangGraph or Mastra and use Claude as the underlying model. The SDK is the building block. What you build with it is up to you.
That's not a limitation. That's the point.