Agent Deployment Patterns: From Dev to Production Without Losing Sleep
By Diesel
architecture · deployment · devops
## The Deployment Problem
You've built your agent. It works on your laptop. It passes your tests. You're ready to ship.
Stop. Take a breath. Because deploying an AI agent to production is fundamentally different from deploying a traditional web application, and the differences will bite you if you don't plan for them.
A web app is deterministic. Same input, same output. You can test exhaustively. An agent is probabilistic. Same input, different output every time. You can't test exhaustively. You can only test the boundaries and hope the middle holds.
A web app costs compute. An agent costs compute plus LLM tokens plus tool execution. A bug in a web app returns a wrong answer. A bug in an agent returns a wrong answer AND charges you for the privilege.
Different beast. Different deployment strategy.
## Pattern 1: Shadow Deployment
Before your agent talks to real users, have it listen. Shadow deployment runs the new agent alongside the existing system without serving its results to users. For a deeper look, see [deploying with FastAPI and Docker](/blog/deploying-ai-agents-fastapi-docker).
```typescript
class ShadowDeployment {
  async handle(request: Request): Promise<Response> {
    // Production agent handles the request
    const prodResponse = await this.prodAgent.run(request);

    // Shadow agent processes the same request in the background
    this.shadowAgent.run(request).then(shadowResponse => {
      this.compare(request, prodResponse, shadowResponse);
    }).catch(error => {
      this.logShadowFailure(request, error);
    });

    // Only the production response goes to the user
    return prodResponse;
  }

  private async compare(
    request: Request,
    prod: Response,
    shadow: Response,
  ): Promise<void> {
    const comparison = {
      request: request.id,
      prodCost: prod.tokenUsage.totalCost,
      shadowCost: shadow.tokenUsage.totalCost,
      prodLatency: prod.durationMs,
      shadowLatency: shadow.durationMs,
      outputSimilarity: await computeSimilarity(prod.output, shadow.output),
      qualityDelta: await evaluateQuality(prod.output, shadow.output, request),
    };
    await this.metrics.record(comparison);
  }
}
```
Shadow deployment answers the question "would the new version be better?" without any risk to users. You get real traffic patterns, real edge cases, real cost data. The shadow agent's cost is your testing budget. It's usually worth it.
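The `computeSimilarity` helper above is deliberately abstract. In practice you would likely compare embeddings, but even a cheap token-overlap metric catches gross divergence between production and shadow outputs. A minimal sketch (token-level Jaccard similarity — a stand-in, not a semantic comparison):

```typescript
// Token-level Jaccard similarity: |intersection| / |union| of the
// two outputs' word sets. Cheap to compute on every shadow request;
// low scores flag responses worth a closer (human or LLM) look.
function jaccardSimilarity(a: string, b: string): number {
  const tokensA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const tokensB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (tokensA.size === 0 && tokensB.size === 0) return 1;

  let intersection = 0;
  for (const t of tokensA) {
    if (tokensB.has(t)) intersection++;
  }
  const union = tokensA.size + tokensB.size - intersection;
  return intersection / union;
}
```

Identical outputs score 1.0, disjoint outputs score 0. The useful signal is the trend: if the shadow agent's average similarity suddenly drops, something changed in its behavior.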
## Pattern 2: Canary Releases
Route a small percentage of traffic to the new agent. Monitor everything. Gradually increase if things look good.
```typescript
class CanaryRouter {
  constructor(
    private prodAgent: Agent,
    private canaryAgent: Agent,
    private canaryPercent: number = 5,
  ) {}

  async handle(request: Request): Promise<Response> {
    const isCanary = this.shouldRouteToCanary(request);
    const agent = isCanary ? this.canaryAgent : this.prodAgent;
    const response = await agent.run(request);

    await this.metrics.record({
      variant: isCanary ? "canary" : "production",
      cost: response.tokenUsage.totalCost,
      latency: response.durationMs,
      success: response.success,
      quality: await this.evaluateQuality(response),
    });

    return response;
  }

  private shouldRouteToCanary(request: Request): boolean {
    // Consistent routing: the same user always gets the same variant
    const hash = hashUserId(request.userId);
    return hash % 100 < this.canaryPercent;
  }
}
```
The key is consistent routing. The same user should always get the same variant during the canary period. Otherwise you get confused users who see different behavior on consecutive requests.
Start at 5%. If error rate, cost, and latency are within bounds after 24 hours, go to 20%. Then 50%. Then 100%. At any point, if metrics degrade, roll back instantly.
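That ramp schedule is mechanical enough to encode as a small decision function. A sketch, assuming a `GateResult` shape that distinguishes critical failures from warnings (the shape and the step values are assumptions, not a fixed API):

```typescript
type GateResult = { passed: boolean; critical: boolean };

// The ramp described above: 5% -> 20% -> 50% -> 100%.
const RAMP_STEPS = [5, 20, 50, 100];

// Decide the next canary percentage from the current one and the
// latest quality-gate result. Returns 0 to signal "roll back now";
// returns the current value unchanged to signal "hold and investigate".
function nextCanaryPercent(current: number, gate: GateResult): number {
  if (!gate.passed) return gate.critical ? 0 : current;
  const idx = RAMP_STEPS.indexOf(current);
  if (idx === -1 || idx === RAMP_STEPS.length - 1) return current;
  return RAMP_STEPS[idx + 1];
}
```

A scheduler can call this once per evaluation window (24 hours in the schedule above) and apply the result to the router's `canaryPercent`.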
## Pattern 3: Feature Flags for Agent Capabilities
Don't deploy the whole agent at once. Deploy capabilities incrementally behind feature flags.
```typescript
class FeatureFlaggedAgent {
  async run(context: AgentContext): Promise<Response> {
    const tools = this.getAvailableTools(context.userId);
    const model = this.getModel(context.userId);
    const maxSteps = this.getMaxSteps(context.userId);

    return this.agent.run({
      ...context,
      tools,
      model,
      maxSteps,
    });
  }

  private getAvailableTools(userId: string): Tool[] {
    const base = [searchTool, readFileTool];
    if (featureFlags.isEnabled("agent-write-files", userId)) {
      base.push(writeFileTool);
    }
    if (featureFlags.isEnabled("agent-execute-code", userId)) {
      base.push(executeCodeTool);
    }
    return base;
  }
}
```
New tool? Roll it out to 10% of users. New model? Test it on internal users first. Higher step limits? Enable for power users only. Each capability gets its own flag, its own rollout schedule, its own monitoring.
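A percentage rollout like "10% of users" is typically implemented by hashing the flag name together with the user ID, so each flag slices the user base differently while staying stable for any given user. A sketch of what sits behind `featureFlags.isEnabled` (the FNV-1a hash and bucketing scheme are illustrative assumptions, not a specific flag library's internals):

```typescript
// FNV-1a: a simple, deterministic 32-bit string hash.
// Good enough for bucketing; not for anything cryptographic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// A user is in the rollout if their bucket (0-99) falls below the
// rollout percentage. Hashing flag + user keeps each user's bucket
// stable per flag, but uncorrelated across flags.
function isEnabledFor(flag: string, userId: string, percent: number): boolean {
  return fnv1a(`${flag}:${userId}`) % 100 < percent;
}
```

The uncorrelated-across-flags property matters: if every flag used the same user hash, the same unlucky 10% of users would absorb the risk of every experiment at once.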
## Pattern 4: Environment Parity with Guardrails
Your staging environment needs to be as close to production as possible. For agents, that means real LLM calls, not mocks. Mocking the LLM removes the exact thing you're trying to test: the non-deterministic behavior.
```typescript
const envConfig = {
  development: {
    model: "haiku",          // cheap model for iteration
    maxToolCalls: 5,         // low limits
    budgetCents: 10,         // tight budget
    tools: developmentTools, // sandboxed tools
    logging: "verbose",
  },
  staging: {
    model: "sonnet",         // production model
    maxToolCalls: 15,        // production limits
    budgetCents: 100,        // reasonable budget
    tools: stagingTools,     // real tools, sandboxed data
    logging: "verbose",
  },
  production: {
    model: "sonnet",
    maxToolCalls: 15,
    budgetCents: 500,
    tools: productionTools,
    logging: "structured",
  },
};
```
In staging, use the same model as production. Use the same tools. The only differences should be the data (sanitized copies of production data) and the users (your team, not real customers). This connects directly to [cost optimization](/blog/cost-optimization-ai-agents).
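Selecting the right block at startup should fail loudly rather than quietly fall back to a default. A minimal sketch (the environment-variable name and config shape are assumptions):

```typescript
type EnvName = "development" | "staging" | "production";

interface EnvConfig {
  model: string;
  maxToolCalls: number;
  budgetCents: number;
  logging: "verbose" | "structured";
}

const configs: Record<EnvName, EnvConfig> = {
  development: { model: "haiku", maxToolCalls: 5, budgetCents: 10, logging: "verbose" },
  staging: { model: "sonnet", maxToolCalls: 15, budgetCents: 100, logging: "verbose" },
  production: { model: "sonnet", maxToolCalls: 15, budgetCents: 500, logging: "structured" },
};

// Crash at boot on a typo'd or unset environment name. The failure
// mode you want to avoid is an agent silently running with the wrong
// budget and tool set.
function loadConfig(envName: string | undefined): EnvConfig {
  if (envName === "development" || envName === "staging" || envName === "production") {
    return configs[envName];
  }
  throw new Error(`Unknown environment: ${envName ?? "(unset)"}`);
}
```

Call it once at startup, e.g. `loadConfig(process.env.APP_ENV)`, and pass the result down rather than re-reading the environment everywhere.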
## Pattern 5: Rollback Strategy
When things go wrong (not if), you need to roll back fast. Agent rollbacks are trickier than web app rollbacks because agents have state.
```typescript
class AgentDeployment {
  private versions: Map<string, AgentVersion> = new Map();
  private activeVersion: string;

  async rollback(reason: string): Promise<void> {
    // Capture the current version before switching, so the audit
    // record reflects what we actually rolled back from
    const fromVersion = this.activeVersion;
    const previousVersion = this.getPreviousStableVersion();

    // 1. Stop routing to the current version
    this.router.pauseRouting();

    // 2. Drain in-flight requests (give running agents time to complete)
    await this.drainInflight(30_000);

    // 3. Switch to the previous version
    this.activeVersion = previousVersion;
    this.router.resumeRouting();

    // 4. Record the rollback
    await this.audit.record({
      action: "rollback",
      from: fromVersion,
      to: previousVersion,
      reason,
      timestamp: Date.now(),
    });

    // 5. Alert the team
    await this.notify(`Agent rolled back: ${reason}`);
  }

  private async drainInflight(timeoutMs: number): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    while (this.inflightCount() > 0 && Date.now() < deadline) {
      await sleep(1000);
    }
    if (this.inflightCount() > 0) {
      // Force-terminate remaining requests
      await this.terminateInflight();
    }
  }
}
```
The drain step is critical. You can't just cut over to a new version while agents are mid-execution. They have context, they have pending tool calls, they have partial results. Give them a grace period to complete. If they don't finish, terminate them with a clear error message to the user.
## Pattern 6: Automated Quality Gates
Don't rely on humans to catch regressions. Automate quality checks at every stage.
```typescript
class QualityGate {
  private checks: QualityCheck[] = [
    new CostCheck({ maxIncrease: 0.20 }),       // cost no more than 20% higher
    new LatencyCheck({ maxP99Ms: 10_000 }),     // P99 under 10s
    new ErrorRateCheck({ maxRate: 0.05 }),      // error rate under 5%
    new ToolCallCheck({ maxAverage: 12 }),      // average tool calls under 12
    new QualityScoreCheck({ minAverage: 0.7 }), // quality score above 0.7
  ];

  async evaluate(metrics: DeploymentMetrics): Promise<GateResult> {
    const results = await Promise.all(
      this.checks.map(c => c.evaluate(metrics))
    );
    const failed = results.filter(r => !r.passed);

    return {
      passed: failed.length === 0,
      checks: results,
      recommendation: failed.length === 0
        ? "proceed"
        : failed.some(f => f.severity === "critical")
          ? "rollback"
          : "pause_and_investigate",
    };
  }
}
```
Run these gates continuously during canary deployments. If any critical check fails, roll back automatically. If a warning fires, pause the rollout and alert the team. No human should have to watch dashboards 24/7 to catch deployment issues.
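What does an individual check like `CostCheck` look like inside? One plausible shape compares the canary's average cost against the baseline and escalates severity as the regression grows. A sketch, with an assumed metrics shape (the field names and the double-threshold escalation rule are illustrative):

```typescript
interface CostMetrics {
  baselineAvgCost: number;  // cents per request, current version
  candidateAvgCost: number; // cents per request, canary
}

interface CheckResult {
  name: string;
  passed: boolean;
  severity: "ok" | "warning" | "critical";
  detail: string;
}

// Fail the check past the allowed increase; escalate to critical at
// double the allowed increase, which triggers automatic rollback.
function costCheck(metrics: CostMetrics, maxIncrease = 0.2): CheckResult {
  const ratio = metrics.candidateAvgCost / metrics.baselineAvgCost - 1;
  const passed = ratio <= maxIncrease;
  return {
    name: "cost",
    passed,
    severity: passed ? "ok" : ratio > maxIncrease * 2 ? "critical" : "warning",
    detail: `cost change: ${(ratio * 100).toFixed(1)}%`,
  };
}
```

The warning band between "failed" and "critical" is what feeds the `pause_and_investigate` recommendation: bad enough to stop the ramp, not bad enough to auto-rollback.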
## Pattern 7: Blue-Green with State Migration
For major agent updates that change how state is structured, use blue-green deployment with state migration. This connects directly to [observability after deployment](/blog/agent-observability-tracing-logging).
```
Blue (current): Agent v1 + State Schema v1
Green (new):    Agent v2 + State Schema v2

Migration:
  1. Deploy green environment
  2. Migrate state from v1 schema to v2 schema
  3. Validate migrated state
  4. Switch traffic to green
  5. Monitor
  6. Decommission blue after stability period
```
```typescript
class StateMigration {
  async migrate(userId: string): Promise<void> {
    const v1State = await stateStoreV1.load(userId);
    const v2State = this.transform(v1State);

    // Validate the transformation
    const valid = v2Schema.safeParse(v2State);
    if (!valid.success) {
      throw new MigrationError(userId, valid.error);
    }

    await stateStoreV2.save(userId, v2State);
  }

  private transform(v1: V1State): V2State {
    return {
      ...v1,
      // New fields in v2
      preferences: v1.settings || defaultPreferences,
      interactionHistory: v1.history?.map(this.transformHistoryEntry) || [],
      version: 2,
    };
  }
}
```
Migrate state for a small batch of users first. Verify the green environment works with migrated state. Then migrate the rest. Keep the blue environment alive for a week as a safety net.
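The batch-then-verify approach can be sketched as a generic driver that runs the per-user migration in batches, collects failures, and aborts if the failure rate climbs, so a broken transform can't plough through the whole user base. The batch size and failure threshold here are illustrative defaults:

```typescript
// Run `migrate` over userIds in batches. Abort once the cumulative
// failure rate exceeds `maxFailureRate`; surviving failures are
// returned so they can be retried or investigated individually.
async function migrateInBatches(
  userIds: string[],
  migrate: (userId: string) => Promise<void>,
  batchSize = 100,
  maxFailureRate = 0.01,
): Promise<{ migrated: string[]; failed: string[] }> {
  const migrated: string[] = [];
  const failed: string[] = [];

  for (let i = 0; i < userIds.length; i += batchSize) {
    const batch = userIds.slice(i, i + batchSize);
    // allSettled: one user's bad state must not sink the whole batch
    const results = await Promise.allSettled(batch.map(id => migrate(id)));
    results.forEach((r, j) =>
      (r.status === "fulfilled" ? migrated : failed).push(batch[j]),
    );

    const processed = migrated.length + failed.length;
    if (failed.length / processed > maxFailureRate) {
      throw new Error(`Migration aborted after ${processed} users: ${failed.length} failures`);
    }
  }
  return { migrated, failed };
}
```

Start with a small `userIds` slice (your internal users), eyeball the green environment, then hand the driver the rest.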
## The Deployment Checklist
Before every agent deployment, run through this:
```
Pre-deploy:
  [ ] Shadow deployment results reviewed
  [ ] Cost comparison: new vs current
  [ ] Latency comparison: new vs current
  [ ] Quality comparison: new vs current
  [ ] Rollback plan documented
  [ ] Monitoring dashboards configured
  [ ] Alert thresholds set
  [ ] On-call engineer assigned

Deploy:
  [ ] Canary at 5% for 24 hours
  [ ] Quality gates passing
  [ ] No anomalies in cost or latency
  [ ] Canary at 20% for 24 hours
  [ ] Quality gates still passing
  [ ] Full rollout
  [ ] Monitor for 48 hours

Post-deploy:
  [ ] Clean up shadow environment
  [ ] Archive old version (don't delete)
  [ ] Update runbook
  [ ] Record deployment metrics
```
It's not glamorous. It's thorough. And thorough is what keeps you sleeping while your agent talks to users at 3 AM.
## The Rule
Here's the rule I live by: deploy your agent like it's going to do something stupid at the worst possible moment. Because eventually, it will. The question isn't whether. It's whether you've built the system to catch it, contain it, and recover from it.
That's not pessimism. That's production engineering.