Claude Agent SDK: orchestration patterns that hold up in prod

Published on April 22, 2026 · 4 min read

Claude
Agents
MCP
TypeScript
Orchestration

Over the past year, “agent” has become a catch-all word. On the ground, 90% of what gets called an agent is a while loop around an LLM call. That isn’t necessarily a problem, except in prod, where every shortcut comes due in full. Here’s what I’ve taken away after several Claude Agent SDK integrations for B2B clients.

Why move to the SDK instead of a homemade loop

A while loop with tool_use holds up as long as the scope stays small: three tools, two turns, one timeout. Beyond that, you end up reinventing context management, compaction, token budgets, per-tool retries, structured logging — all by hand. The SDK gives you that out of the box, and more importantly it imposes a discipline: you think in sub-agents, tool permissions, and sessions, not in “I call the API again.”

The real win isn’t performance. It’s operational readability. When an agent crashes at 2 AM, I want to know which tool was called, with what input, in which session, and why the model decided to call it. Without that minimum, debugging turns into fortune-telling.

The three patterns that hold up

Single agent with a bounded tool budget

The most underrated pattern. One agent, 3 to 5 tools max, an explicit turn budget. For most business cases (extraction, classification with search, targeted assistance), it’s enough — and often better than something more sophisticated.

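import { query } from "@anthropic-ai/claude-agent-sdk";

// query() returns an async iterable of messages, so iterate over it
// instead of awaiting a single value.
for await (const message of query({
  prompt: userRequest,
  options: {
    model: "claude-sonnet-4-6",
    maxTurns: 6, // hard stop on the number of agent turns
    allowedTools: ["search_crm", "read_contract", "create_ticket"],
    systemPrompt: TICKET_AGENT_PROMPT,
  },
})) {
  // The last message is the result; on success it carries the final text.
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
  }
}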

maxTurns isn’t overkill. Without it, an agent looping on a tool error can silently burn through 200k tokens. I’ve seen the bill.

Orchestrator + sub-agents in fan-out

As soon as a task splits into independent sub-tasks, the orchestrator pattern starts paying off. A main agent delegates to specialized sub-agents, each with its own context and tools. The benefit isn’t parallelization: it’s context isolation. A sub-agent that swallows 50 pages of docs doesn’t pollute the orchestrator’s reasoning.

My rule: if two sub-tasks don’t need to share intermediate context, they go in two sub-agents. If they do share it, keep them in one agent: split them and you pay for that context twice and blur the model’s reasoning.
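
Here is a minimal sketch of that fan-out, independent of the SDK. Everything in it is hypothetical: SubTask, SubResult, RunSubAgent and fanOut are names invented for illustration, and runSubAgent stands in for however you actually start a sub-agent. What it tries to show is the isolation: each sub-task gets its own prompt and tool list, and only a capped summary comes back to the orchestrator.

```typescript
// Hypothetical shapes for illustration; none of these names come from the SDK.
type SubTask = { name: string; prompt: string; tools: string[] };
type SubResult = { name: string; ok: boolean; summary: string };
type RunSubAgent = (task: SubTask) => Promise<string>;

// Run independent sub-tasks side by side. A failing sub-agent is reported
// as a failed result instead of rejecting the whole batch.
async function fanOut(
  tasks: SubTask[],
  runSubAgent: RunSubAgent,
  maxSummaryChars: number = 2000,
): Promise<SubResult[]> {
  return Promise.all(
    tasks.map(async (task): Promise<SubResult> => {
      try {
        const output = await runSubAgent(task);
        // Only a truncated summary reaches the orchestrator's context.
        return { name: task.name, ok: true, summary: output.slice(0, maxSummaryChars) };
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return { name: task.name, ok: false, summary: `failed: ${reason}` };
      }
    }),
  );
}
```

The cap on the summary is the part that matters: the sub-agent can read 50 pages, but the orchestrator only ever sees a few paragraphs of it.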

Human-in-the-loop at critical points

Anything that writes to a business system (sending an email, creating an invoice, modifying a CRM) goes through explicit human confirmation, not through “the user can cancel afterward.” The SDK exposes a per-tool permission hook, and I use it every time:

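for await (const message of query({
  prompt: userRequest,
  options: {
    // Check in your SDK version that send_email still reaches canUseTool:
    // tools on the allow list may be approved before the callback runs.
    allowedTools: ["draft_email", "send_email"],
    canUseTool: async (toolName, input) => {
      if (toolName === "send_email") {
        // askUserConfirmation is an app-side helper, assumed here to resolve to true or false.
        const approved = await askUserConfirmation(input);
        if (!approved) {
          return { behavior: "deny", message: "User declined to send this email." };
        }
      }
      return { behavior: "allow", updatedInput: input };
    },
  },
})) {
  // Log how the run ended (success or an error subtype).
  if (message.type === "result") console.log(message.subtype);
}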

An agent that fires off a wildly off-base email to a client earns you a chewing-out. An agent that drafts and waits for confirmation is a productivity gain. The difference is ten lines of code.

MCP: useful or hype

MCP (Model Context Protocol) is oversold on simple cases and underused on the one case where it shines: sharing the same tool inventory across multiple agents and multiple clients.

If your agent lives in a single Node service with three internal tools, write them in TypeScript directly. An MCP server for that is plumbing that slows down your prototyping. On the other hand, as soon as the same tools have to run in Claude Code, in your custom agent, and in some future support tool, MCP starts paying off. The threshold is clear: multiple consumers or nothing.

The non-negotiable guardrails

Three things I refuse to ship without:

Structured logging per session. Every tool call, input, output, latency, tokens. Without it, debugging an agent that goes off the rails is impossible.
Token budget per session. A hard cap on the application side, on top of maxTurns. It bounds what a prompt injection or a stubborn tool loop can cost you.
Permission isolation. A “read-only” agent has no access to write tools, even if that complicates the config. The SDK’s tool allowlists and permission rules are built for this, so use them.
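
The first two guardrails fit in one small application-side helper. The sketch below is an assumption about how you might wire it, not an SDK feature: createSessionGuard, ToolCall and the sink parameter are hypothetical names, and the token counts are whatever your application measures per call. It writes one JSON line per tool call and throws once the session goes over its cap.

```typescript
// Hypothetical helper, not part of the SDK. The application calls record()
// after each tool call with the numbers it measured itself.
type ToolCall = {
  tool: string;
  input: unknown;
  output: unknown;
  latencyMs: number;
  tokens: number;
};

function createSessionGuard(
  sessionId: string,
  maxTokens: number,
  sink: (line: string) => void = console.log,
) {
  let tokensUsed = 0;
  return {
    // Log the call as one JSON line, then enforce the hard cap.
    record(call: ToolCall): void {
      tokensUsed += call.tokens;
      sink(JSON.stringify({ sessionId, ...call, tokensUsed }));
      if (tokensUsed > maxTokens) {
        throw new Error(
          `session ${sessionId} exceeded its token budget: ${tokensUsed} > ${maxTokens}`,
        );
      }
    },
    tokensUsed: (): number => tokensUsed,
  };
}
```

The call is logged before the cap is checked on purpose: the line that blew the budget is the one you will want to read at 2 AM.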

You can stack on OpenTelemetry, per-user rate limiting, fallback to a smaller model for simple tasks. But those three are the minimum viable prod. No negotiation.

Where to start

If you’re hesitating:

Take a well-scoped internal workflow (ticket triage, lead qualification, contract extraction).
Write it first as a single agent with 3 tools and maxTurns: 6.
Measure on 50 to 100 real cases: success rate, average cost, p95 latency.
Only move to orchestrator + sub-agents after you’ve hit a clear ceiling with the simple pattern.
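
For the measurement step, something this small is enough to start with. It is a sketch under assumptions: CaseResult and summarize are hypothetical names, the cost is whatever you compute from your own usage data, and p95 uses the nearest-rank method (other definitions interpolate and give slightly different numbers).

```typescript
// Hypothetical result shape: one entry per real case you ran the agent on.
type CaseResult = { success: boolean; costUsd: number; latencyMs: number };

function summarize(results: CaseResult[]) {
  if (results.length === 0) throw new Error("no results to summarize");
  const n = results.length;
  const successRate = results.filter((r) => r.success).length / n;
  const avgCostUsd = results.reduce((sum, r) => sum + r.costUsd, 0) / n;
  const sorted = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  // Nearest-rank p95: the smallest latency with at least 95% of cases at or below it.
  const rank = Math.ceil((95 * n) / 100);
  const p95LatencyMs = sorted[rank - 1];
  return { successRate, avgCostUsd, p95LatencyMs };
}
```

Run it on the same 50 to 100 cases before and after each change, so you compare like with like.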

Most agent projects that break in prod don’t suffer from a lack of sophistication. They suffer from premature sophistication. A simple agent, well instrumented, with a human at the right spot, almost always beats a multi-agent architecture without guardrails.

If you have a concrete use case and you’re hesitating on the pattern, frame it on one page (input, expected output, available tools, business risk if the agent gets it wrong). Let’s talk.