Retry Strategies for AI Agents: Exponential Backoff Isn't Enough
By Diesel
Tags: architecture, reliability, error-handling
## The Naive Retry
You've seen this code. You've probably written this code. I definitely have.
```python
import time

for attempt in range(3):
    try:
        result = llm.generate(prompt)
        break
    except Exception:
        time.sleep(2 ** attempt)
```
Exponential backoff. The standard retry pattern for API calls. Wait 1 second, then 2, then 4. Give the server time to recover. Works great for rate limits and transient network errors.
Completely useless for most AI agent failures.
Here's why: when an LLM call fails in an agent system, the failure is usually not transient. It's structural. The prompt was too long. The tool call format was wrong. The model misunderstood the instructions. Retrying the exact same request with a longer delay doesn't fix any of these. It just wastes time and money.
Agent retries need to be smarter than backoff. They need to change the approach.
## Taxonomy of Agent Failures
Before you can retry intelligently, you need to know what failed. Not all failures are created equal.
**Transient failures**: rate limits, network timeouts, service unavailability. These DO benefit from backoff. Retry the same request after a delay.
**Context failures**: the prompt was too long, the context window overflowed, token limit exceeded. You need to reduce the input, not retry it.
**Format failures**: the model returned invalid JSON, called a tool that doesn't exist, produced malformed output. You need better instructions or output parsing, not repetition.
**Reasoning failures**: the model chose the wrong tool, made a logical error, hallucinated data. You need a different prompt strategy, not the same one again.
**Dependency failures**: an external tool or API the agent relies on is broken. Retrying won't help until the dependency recovers.
```typescript
type FailureType =
  | "transient"   // retry with backoff
  | "context"     // reduce input and retry
  | "format"      // fix output parsing and retry
  | "reasoning"   // modify prompt and retry
  | "dependency"  // skip or use alternative
  | "terminal";   // don't retry, fail gracefully

function classifyFailure(error: Error, context: AgentContext): FailureType {
  if (error instanceof RateLimitError) return "transient";
  if (error instanceof ContextLengthError) return "context";
  if (error instanceof JSONParseError) return "format";
  if (error instanceof ToolNotFoundError) return "format";
  if (error instanceof ToolExecutionError) return "dependency";
  if (context.sameErrorRepeated(3)) return "terminal";
  return "reasoning"; // default: assume the approach was wrong
}
```
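Typed error classes like `RateLimitError` aren't always available; many SDKs surface failures as plain `Error`s with a status code or phrase baked into the message. Here is a fallback sketch that classifies by message pattern; the regexes are illustrative assumptions, not any provider's actual error contract:

```typescript
type FailureType =
  | "transient" | "context" | "format" | "reasoning" | "dependency" | "terminal";

// Fallback classifier for untyped errors: match common message fragments.
// These patterns are illustrative, not any specific provider's wording.
const PATTERNS: Array<[RegExp, FailureType]> = [
  [/rate.?limit|429|timeout|unavailable|503/i, "transient"],
  [/context.?length|too many tokens|maximum.*tokens/i, "context"],
  [/invalid json|parse error|unknown tool/i, "format"],
  [/tool .*failed|upstream/i, "dependency"],
];

function classifyByMessage(error: Error): FailureType {
  for (const [pattern, type] of PATTERNS) {
    if (pattern.test(error.message)) return type;
  }
  return "reasoning"; // default: assume the approach was wrong
}
```

Pattern order matters: the first match wins, so put the most specific patterns first.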
## Strategy 1: Contextual Reduction
When the context window overflows or the model is confused by too much information, reduce the input.
```typescript
class ContextReductionRetry {
  private reductions = [
    (ctx: AgentContext) => ctx.summarizeOldMessages({ keepRecent: 5 }),
    (ctx: AgentContext) => ctx.removeToolResults({ keepRecent: 3 }),
    (ctx: AgentContext) => ctx.truncateSystemPrompt({ maxTokens: 2000 }),
    (ctx: AgentContext) => ctx.dropLowRelevanceContext({ threshold: 0.3 }),
  ];

  async retry(error: Error, context: AgentContext): Promise<AgentResult> {
    for (const reduce of this.reductions) {
      const reduced = await reduce(context);
      try {
        return await context.agent.run(reduced);
      } catch (nextError) {
        if (classifyFailure(nextError, context) !== "context") {
          throw nextError; // different failure type, don't continue reducing
        }
      }
    }
    throw new RetryExhaustedError("Context reduction didn't help");
  }
}
```
Each reduction is more aggressive than the last. First, summarize old messages. Then remove old tool results. Then truncate the system prompt. Then drop low-relevance context entirely. If none of these work, the task is too large for the context window and needs to be decomposed.
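The same escalating ladder can be written as pure functions over a message list, tried in order until the estimated token count fits the budget. A minimal sketch, assuming a simple message shape and a rough 4-characters-per-token estimate (both are assumptions for illustration):

```typescript
interface Msg { role: string; content: string }

// Rough token estimate: ~4 characters per token (an approximation, not exact).
const estimateTokens = (msgs: Msg[]) =>
  Math.ceil(msgs.reduce((n, m) => n + m.content.length, 0) / 4);

// Reductions ordered from least to most aggressive; each returns a new list.
const reductions: Array<(msgs: Msg[]) => Msg[]> = [
  // 1. Collapse everything but the last 5 messages into a summary stub.
  (msgs) => msgs.length <= 5 ? msgs
    : [{ role: "system", content: `[${msgs.length - 5} earlier messages summarized]` },
       ...msgs.slice(-5)],
  // 2. Drop old tool results, keeping only the last 3.
  (msgs) => {
    const keep = new Set(msgs.filter((m) => m.role === "tool").slice(-3));
    return msgs.filter((m) => m.role !== "tool" || keep.has(m));
  },
];

function reduceToFit(msgs: Msg[], budgetTokens: number): Msg[] {
  let current = msgs;
  for (const reduce of reductions) {
    if (estimateTokens(current) <= budgetTokens) return current;
    current = reduce(current);
  }
  if (estimateTokens(current) > budgetTokens) {
    throw new Error("Context reduction didn't help");
  }
  return current;
}
```

Because each reduction returns a new list, a failed attempt never corrupts the original context.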
## Strategy 2: Prompt Mutation
When the model's reasoning goes wrong, change how you ask.
```typescript
class PromptMutationRetry {
  private mutations = [
    // Add explicit constraints
    (prompt: string, error: Error) =>
      `${prompt}\n\nIMPORTANT: ${this.extractConstraint(error)}`,
    // Add an example of the expected output
    (prompt: string, error: Error) =>
      `${prompt}\n\nExample of expected output format:\n${this.getExample(error)}`,
    // Simplify the instruction
    (prompt: string, _error: Error) => this.simplify(prompt),
    // Make chain-of-thought explicit
    (prompt: string, _error: Error) =>
      `${prompt}\n\nThink through this step by step. Show your reasoning before giving the final answer.`,
  ];

  async retry(error: Error, context: AgentContext): Promise<AgentResult> {
    for (const mutate of this.mutations) {
      context.systemPrompt = mutate(context.systemPrompt, error);
      try {
        return await context.agent.run(context);
      } catch (nextError) {
        if (classifyFailure(nextError, context) === "terminal") {
          throw nextError;
        }
      }
    }
    throw new RetryExhaustedError("Prompt mutations exhausted");
  }
}
```
This connects directly to [fault-tolerant architectures](/blog/fault-tolerance-multi-agent).
The key here: each mutation is informed by the previous failure. If the model returned invalid JSON, the mutation adds an explicit JSON example. If the model used the wrong tool, the mutation adds constraints about which tools are appropriate. You're not blindly retrying. You're teaching. The [guardrails around retries](/blog/agent-guardrails-production) post is worth reading alongside this.
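The `extractConstraint` helper above is left abstract. One plausible shape, shown here as a standalone sketch with hypothetical category names and wording, is a lookup from failure category to a corrective instruction:

```typescript
// Hypothetical mapping from failure category to a corrective constraint
// that gets appended to the prompt on the next attempt.
const CONSTRAINTS: Record<string, string> = {
  format: "Respond with valid JSON only, with no prose before or after the object.",
  reasoning: "Use only the tools listed in the tool schema, and verify each claim against tool output.",
};

function extractConstraint(category: string): string {
  // Fall back to a generic nudge for categories without a tailored constraint.
  return CONSTRAINTS[category] ?? "Re-read the instructions carefully before answering.";
}
```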
## Strategy 3: Model Escalation
Sometimes the model simply isn't capable enough for the task. Escalate to a more powerful one.
```typescript
class ModelEscalationRetry {
  private escalationPath = ["haiku", "sonnet", "opus"];

  async retry(error: Error, context: AgentContext): Promise<AgentResult> {
    const currentIndex = this.escalationPath.indexOf(context.model);
    const remaining = this.escalationPath.slice(currentIndex + 1);
    let lastModel = context.model; // track who actually failed last
    for (const model of remaining) {
      context.addSystemMessage(
        `Previous attempt with ${lastModel} failed: ${error.message}. ` +
        `Please approach this differently.`
      );
      context.model = model;
      try {
        return await context.agent.run(context);
      } catch (nextError) {
        if (classifyFailure(nextError, context) === "terminal") {
          throw nextError;
        }
        lastModel = model;
      }
    }
    throw new EscalationExhaustedError("All models failed");
  }
}
```
```
Notice we're not just switching models. We're telling the new model what happened. "The previous attempt failed because of X. Take a different approach." This gives the more powerful model context about what didn't work.
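A refinement worth sketching: pass the whole failure history, not just the last error, so the stronger model sees everything that was already tried. The function name and message wording here are assumptions:

```typescript
// Builds the handoff note for the next model in the escalation path.
// Wording is illustrative; adjust to your own prompt conventions.
function escalationNote(failedModel: string, errors: string[]): string {
  const history = errors
    .map((msg, i) => `  attempt ${i + 1}: ${msg}`)
    .join("\n");
  return (
    `Previous attempts with ${failedModel} failed:\n${history}\n` +
    `Please approach this differently.`
  );
}
```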
## Strategy 4: Tool Substitution
When a tool fails, don't just retry it. Use an alternative.
```typescript
class ToolSubstitutionRetry {
  private alternatives: Map<string, string[]> = new Map([
    ["web_search", ["cached_search", "knowledge_base"]],
    ["sql_query", ["cached_query", "approximate_query"]],
    ["api_call", ["cached_response", "mock_response"]],
  ]);

  async retry(
    toolName: string,
    error: Error,
    context: AgentContext
  ): Promise<AgentResult> {
    const alts = this.alternatives.get(toolName) || [];
    for (const alt of alts) {
      try {
        return await context.executeTool(alt, context.lastToolArgs);
      } catch {
        continue; // try the next alternative
      }
    }
    // No alternative worked. Tell the agent to proceed without this tool.
    context.addSystemMessage(
      `Tool ${toolName} is unavailable. Available alternatives also failed. ` +
      `Please proceed with the information you have, or adjust your approach.`
    );
    return await context.agent.replan(context);
  }
}
```
```
The fallback chain matters. Can't search the web? Try the cache. Can't query the database? Try an approximate version. Can't call the API? Use a cached response. And if everything fails, tell the agent to work with what it has. This connects directly to [incident response playbooks](/blog/it-incident-response-ai-agents).
## Strategy 5: Task Decomposition
When a complex task fails, break it into simpler subtasks.
```typescript
class DecompositionRetry {
  async retry(error: Error, context: AgentContext): Promise<AgentResult> {
    const subtasks = await context.agent.decompose(context.task, {
      hint:
        `The full task failed with: ${error.message}. ` +
        `Break it into smaller, independent steps.`,
    });
    const results: AgentResult[] = [];
    for (const subtask of subtasks) {
      const result = await context.agent.run({
        ...context,
        task: subtask,
        maxRetries: 2,
      });
      results.push(result);
    }
    // Combine subtask results
    return await context.agent.synthesize(results, context.task);
  }
}
```
```
This is the most expensive retry strategy but also the most effective for complex failures. The task that was too hard as one piece becomes manageable as three pieces.
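Since the decomposition hint asks for independent steps, the subtasks don't have to run sequentially. A sketch of concurrent execution with `Promise.allSettled`, so one failed subtask doesn't abort the rest (`runSubtask` is a hypothetical stand-in for `context.agent.run`):

```typescript
// Run independent subtasks concurrently; collect successes and failures
// separately so one failed subtask doesn't abort the rest.
async function runAll<T>(
  subtasks: string[],
  runSubtask: (task: string) => Promise<T>
): Promise<{ ok: T[]; failed: string[] }> {
  const settled = await Promise.allSettled(subtasks.map(runSubtask));
  const ok: T[] = [];
  const failed: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") ok.push(s.value);
    else failed.push(subtasks[i]);
  });
  return { ok, failed };
}
```

Failed subtasks come back as a list, so the caller can re-decompose or retry just those instead of the whole task.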
## Composing Strategies
The real power is in composing these strategies into a retry pipeline.
```typescript
class AgentRetryPipeline {
  async execute(context: AgentContext): Promise<AgentResult> {
    try {
      return await context.agent.run(context);
    } catch (error) {
      const failureType = classifyFailure(error, context);
      switch (failureType) {
        case "transient":
          return exponentialBackoff(context, { maxRetries: 3 });
        case "context":
          return new ContextReductionRetry().retry(error, context);
        case "format":
          return new PromptMutationRetry().retry(error, context);
        case "reasoning":
          // Try prompt mutation first, then model escalation
          try {
            return await new PromptMutationRetry().retry(error, context);
          } catch {
            return await new ModelEscalationRetry().retry(error, context);
          }
        case "dependency":
          return new ToolSubstitutionRetry().retry(
            (error as ToolExecutionError).toolName, error, context
          );
        case "terminal":
          throw error; // Don't retry, fail gracefully
      }
    }
  }
}
```
```
Each failure type gets the appropriate strategy. Transient failures get backoff. Context failures get reduction. Reasoning failures get prompt mutation followed by model escalation. Terminal failures get a graceful exit.
## The Budget Check
Every retry costs money. Every retry adds latency. Your retry pipeline needs a budget.
```typescript
async execute(context: AgentContext): Promise<AgentResult> {
  const budget = context.retryBudget || {
    maxRetries: 5,
    maxAdditionalCost: context.originalEstimatedCost * 2,
    maxAdditionalTime: context.timeout * 0.5,
  };
  // ... retry logic with budget checks ...
  if (totalRetryCost > budget.maxAdditionalCost) {
    throw new BudgetExceededError(
      `Retries exceeded cost budget: ${totalRetryCost}c > ${budget.maxAdditionalCost}c`
    );
  }
}
```
```
A good rule of thumb: your retry budget should be at most 2x the original task cost. If retries would cost more than that, the task needs human intervention, not more machine time.
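That rule of thumb is easy to enforce with a small tracker that counts both attempts and spend. A minimal sketch; the class and field names are assumptions, and cost units are whatever your metering already uses:

```typescript
// Minimal retry-budget tracker. Units for cost are up to the caller
// (cents, dollars, tokens); names here are illustrative.
class RetryBudget {
  private retries = 0;
  private cost = 0;

  constructor(
    private maxRetries: number,
    private maxAdditionalCost: number
  ) {}

  // Record one retry attempt; returns false once either limit is exceeded.
  record(attemptCost: number): boolean {
    this.retries += 1;
    this.cost += attemptCost;
    return this.retries <= this.maxRetries && this.cost <= this.maxAdditionalCost;
  }
}
```

A pipeline would call `record` before each retry and hand off to a human as soon as it returns false.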
## What Exponential Backoff Gets Wrong
Backoff assumes the problem is temporary and will resolve itself with time. For LLM calls, the problem is almost never time-based. It's input-based. The same prompt will produce the same failure whether you wait 1 second or 100.
Smart retries change the input. They reduce context, mutate prompts, escalate models, substitute tools, or decompose tasks. The delay between retries is almost irrelevant compared to the strategy change.
Stop retrying the same thing harder. Start retrying smarter.