"Error Handling for AI APIs: A Developer's Complete Guide"

"Error Handling for AI APIs: A Developer's Complete Guide"

C
Crazyrouter Team
February 20, 2026

AI APIs fail. They fail in creative, unpredictable ways — rate limits at peak hours, random 500 errors, context length exceeded, content filters triggered, and the occasional provider outage that takes your entire application down. If you're building anything production-grade on top of AI APIs, robust error handling isn't optional.

This guide covers every error type you'll encounter, battle-tested retry strategies, and production patterns that keep your application running even when providers don't.

Common AI API Error Types#

Here's every error you'll realistically encounter, organized by HTTP status code:

| Status Code | Error | Cause | Recovery |
|---|---|---|---|
| 400 | Bad Request | Malformed request, invalid parameters | Fix request, don't retry |
| 401 | Unauthorized | Invalid or expired API key | Check/refresh credentials |
| 403 | Forbidden | Insufficient permissions or content policy | Check access level |
| 404 | Not Found | Wrong endpoint or model name | Fix URL/model |
| 429 | Rate Limited | Too many requests | Retry with backoff |
| 500 | Server Error | Provider-side issue | Retry with backoff |
| 502 | Bad Gateway | Infrastructure issue | Retry with backoff |
| 503 | Service Unavailable | Provider overloaded | Retry with backoff, consider fallback |
| 504 | Gateway Timeout | Request took too long | Retry, reduce input size |

The retryable errors (429, 500, 502, 503, 504) are the ones you need automated handling for. The rest require code fixes.
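That classification can be captured in a small helper you call before deciding how to respond to a failure. This is a sketch; the function name and strategy labels are illustrative, not part of any SDK:

```python
# Status codes worth retrying automatically, per the table above
RETRYABLE = {429, 500, 502, 503, 504}

def classify_error(status_code):
    """Map an HTTP status code to a handling strategy.

    Returns one of: "retry", "check_credentials", "fix_request".
    """
    if status_code in RETRYABLE:
        return "retry"
    if status_code in (401, 403):
        return "check_credentials"
    # 400, 404, and other 4xx errors need a code fix, not a retry
    return "fix_request"
```

Routing every caught exception through a single classifier like this keeps the retry policy in one place instead of scattered across `except` blocks.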

Retry Strategy: Exponential Backoff with Jitter#

The standard pattern for retrying failed API calls:

Python Implementation#

python
import time
import random
from openai import OpenAI, APIError, RateLimitError, APITimeoutError

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

def call_with_retry(
    messages,
    model="gpt-4.1",
    max_retries=5,
    base_delay=1.0,
    max_delay=60.0
):
    """Call AI API with exponential backoff and jitter."""
    
    for attempt in range(max_retries + 1):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response
            
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Honor the Retry-After header if the provider sent one
            retry_after = e.response.headers.get("retry-after") if e.response is not None else None
            delay = float(retry_after) if retry_after else min(base_delay * (2 ** attempt), max_delay)
            # Add jitter (±25%) to prevent thundering herd
            delay *= (0.75 + random.random() * 0.5)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
            
        except APITimeoutError:
            if attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            print(f"Timeout. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
            
        except APIError as e:
            # Not all APIError subclasses carry a status code, so read it defensively
            status = getattr(e, "status_code", None)
            if status in (500, 502, 503, 504):
                if attempt == max_retries:
                    raise
                delay = min(base_delay * (2 ** attempt), max_delay)
                delay *= (0.75 + random.random() * 0.5)  # jitter here too
                time.sleep(delay)
            else:
                raise  # Non-retryable error

# Usage
response = call_with_retry(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4.1"
)

Node.js Implementation#

javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-api-key',
  baseURL: 'https://crazyrouter.com/v1'
});

async function callWithRetry(messages, model = 'gpt-4.1', maxRetries = 5) {
  const baseDelay = 1000; // 1 second
  const maxDelay = 60000; // 60 seconds

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model, messages });
    } catch (error) {
      const isRetryable = error.status && [429, 500, 502, 503, 504].includes(error.status);
      
      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }

      // Exponential backoff with jitter
      const delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
      const jitteredDelay = delay * (0.75 + Math.random() * 0.5);
      
      console.log(`Retry ${attempt + 1}/${maxRetries} in ${(jitteredDelay / 1000).toFixed(1)}s`);
      await new Promise(resolve => setTimeout(resolve, jitteredDelay));
    }
  }
}

Provider Fallback Pattern#

When one provider goes down, automatically switch to another. This is where using an API aggregator like Crazyrouter shines — but you can also implement it yourself:

Multi-Provider Fallback (Python)#

python
FALLBACK_CHAIN = [
    {"model": "gpt-4.1", "provider": "openai"},
    {"model": "claude-sonnet-4-5", "provider": "anthropic"},
    {"model": "gemini-2.5-flash", "provider": "google"},
    {"model": "deepseek-v3", "provider": "deepseek"},
]

def call_with_fallback(messages, max_retries_per_provider=2):
    """Try each provider in the fallback chain."""
    errors = []
    
    for provider_config in FALLBACK_CHAIN:
        try:
            response = call_with_retry(
                messages=messages,
                model=provider_config["model"],
                max_retries=max_retries_per_provider
            )
            return response
        except Exception as e:
            errors.append({
                "provider": provider_config["provider"],
                "model": provider_config["model"],
                "error": str(e)
            })
            print(f"Provider {provider_config['provider']} failed, trying next...")
            continue
    
    # All providers failed
    raise Exception(f"All providers failed: {errors}")

# With Crazyrouter, you get automatic routing — but explicit fallback
# gives you control over the priority order

Circuit Breaker Pattern#

Prevent hammering a failing provider:

python
import time
from collections import defaultdict

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = defaultdict(int)
        self.last_failure_time = defaultdict(float)
        self.state = defaultdict(lambda: "closed")  # closed, open, half-open
    
    def can_execute(self, provider):
        if self.state[provider] == "closed":
            return True
        if self.state[provider] == "open":
            if time.time() - self.last_failure_time[provider] > self.recovery_timeout:
                self.state[provider] = "half-open"
                return True
            return False
        return True  # half-open: allow one test request
    
    def record_success(self, provider):
        self.failures[provider] = 0
        self.state[provider] = "closed"
    
    def record_failure(self, provider):
        self.failures[provider] += 1
        self.last_failure_time[provider] = time.time()
        if self.failures[provider] >= self.failure_threshold:
            self.state[provider] = "open"
            print(f"Circuit OPEN for {provider} — skipping for {self.recovery_timeout}s")

breaker = CircuitBreaker()

def call_with_circuit_breaker(messages):
    for config in FALLBACK_CHAIN:
        provider = config["provider"]
        if not breaker.can_execute(provider):
            continue
        try:
            response = call_with_retry(messages, model=config["model"], max_retries=2)
            breaker.record_success(provider)
            return response
        except Exception:
            breaker.record_failure(provider)
    raise Exception("All circuits open — no available providers")

Handling Specific Error Scenarios#

Context Length Exceeded#

When your input is too long for the model:

python
def smart_truncate(messages, model="gpt-4.1"):
    """Truncate conversation history to fit the model's context window."""
    MAX_TOKENS = {
        "gpt-4.1": 128000,
        "claude-sonnet-4-5": 200000,
        "gemini-2.5-flash": 1000000,
    }
    
    max_ctx = MAX_TOKENS.get(model, 128000)
    
    try:
        return client.chat.completions.create(model=model, messages=messages)
    except Exception as e:
        if "context_length_exceeded" in str(e).lower():
            print(f"Input exceeds {model}'s ~{max_ctx}-token window; truncating history")
            # Keep system messages; progressively drop the oldest half of the rest
            system_msgs = [m for m in messages if m["role"] == "system"]
            other_msgs = [m for m in messages if m["role"] != "system"]
            
            while len(other_msgs) > 1:
                other_msgs = other_msgs[len(other_msgs) // 2:]
                try:
                    return client.chat.completions.create(
                        model=model,
                        messages=system_msgs + other_msgs
                    )
                except Exception:
                    continue
            
            # Last resort: try a model with a larger context window
            return client.chat.completions.create(
                model="gemini-2.5-flash",  # 1M-token context
                messages=messages
            )
        raise

Content Filter Errors#

When the AI refuses to generate content:

python
def handle_content_filter(messages, model="gpt-4.1"):
    try:
        response = client.chat.completions.create(model=model, messages=messages)
        
        # Check for content filter in response
        if response.choices[0].finish_reason == "content_filter":
            return {
                "success": False,
                "reason": "content_filter",
                "message": "The request was flagged by content moderation."
            }
        
        return {"success": True, "content": response.choices[0].message.content}
        
    except Exception as e:
        if "content_policy" in str(e).lower() or "content_filter" in str(e).lower():
            return {
                "success": False,
                "reason": "content_policy",
                "message": "Request violates content policy."
            }
        raise

Timeout Handling#

python
from openai import OpenAI

# Set appropriate timeouts
client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1",
    timeout=60.0,        # Total request timeout
    max_retries=0        # We handle retries ourselves
)

# For long-running requests (e.g., complex reasoning)
def call_with_timeout(messages, model, timeout=120):
    """Adjust timeout based on expected response length."""
    try:
        response = client.with_options(timeout=timeout).chat.completions.create(
            model=model,
            messages=messages
        )
        return response
    except Exception as e:
        if "timeout" in str(e).lower():
            # Try a faster model
            return client.chat.completions.create(
                model="gpt-4.1-mini",
                messages=messages
            )
        raise

Production Error Monitoring#

Structured Logging#

python
import logging
import json
from datetime import datetime, timezone

logger = logging.getLogger("ai_api")

def log_api_call(model, messages, response=None, error=None, duration_ms=None):
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "input_messages": len(messages),
        "duration_ms": duration_ms,
    }
    
    if response:
        log_entry["status"] = "success"
        log_entry["tokens"] = {
            "prompt": response.usage.prompt_tokens,
            "completion": response.usage.completion_tokens,
            "total": response.usage.total_tokens
        }
        log_entry["finish_reason"] = response.choices[0].finish_reason
        logger.info(json.dumps(log_entry))
    
    if error:
        log_entry["status"] = "error"
        log_entry["error_type"] = type(error).__name__
        log_entry["error_message"] = str(error)
        log_entry["status_code"] = getattr(error, "status_code", None)
        logger.error(json.dumps(log_entry))

Key Metrics to Track#

| Metric | Why It Matters |
|---|---|
| Error rate by provider | Detect provider degradation early |
| P95 latency | Catch slowdowns before users complain |
| Rate limit hits/hour | Know when to scale or add providers |
| Fallback activation rate | Understand provider reliability |
| Token usage per request | Cost monitoring and optimization |
| Circuit breaker state changes | Track provider health over time |
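The per-provider error rate in the first row can be computed with a simple sliding window. This is a sketch, not a library API; the class and method names are illustrative:

```python
import time
from collections import deque

class ErrorRateTracker:
    """Sliding-window error rate per provider."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # (timestamp, provider, is_error) tuples

    def record(self, provider, is_error, now=None):
        """Record one API call outcome for a provider."""
        now = now if now is not None else time.time()
        self.events.append((now, provider, is_error))
        self._evict(now)

    def error_rate(self, provider, now=None):
        """Fraction of calls to `provider` that failed within the window."""
        now = now if now is not None else time.time()
        self._evict(now)
        outcomes = [is_err for (_, p, is_err) in self.events if p == provider]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def _evict(self, now):
        # Drop events older than the window
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
```

Feed `record()` from the same place you emit structured logs, and alert when `error_rate()` for any provider crosses a threshold.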

Why Crazyrouter Simplifies Error Handling#

Using an API aggregator like Crazyrouter reduces the error handling burden significantly:

  1. Automatic retries: Built-in retry logic for transient errors
  2. Unified error format: All providers return consistent error structures
  3. Load balancing: Requests distributed across healthy endpoints
  4. Rate limit pooling: Higher effective rate limits through multiple upstream keys
  5. Single API key: One credential to manage instead of 5+

You still need application-level error handling (context length, content filters, business logic), but the infrastructure-level errors are largely handled for you.

FAQ#

How many retries should I use for AI API calls?#

3-5 retries with exponential backoff is standard. For rate limits, respect the Retry-After header. For server errors, 3 retries with 1s/2s/4s delays works well.

Should I use the same model for retries?#

For transient errors (500, 502, 503), retry the same model. For persistent failures, fall back to an alternative model. Using Crazyrouter makes model switching trivial since all models use the same API format.

How do I handle rate limits across multiple users?#

Implement per-user rate limiting in your application layer, and use a token bucket or sliding window algorithm. Set your per-user limits below your API provider's limits to maintain headroom.
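A minimal token-bucket sketch for per-user limiting, with illustrative names (in production you'd typically back this with Redis rather than in-process state):

```python
import time

class TokenBucket:
    """Per-user token bucket: `capacity` burst requests, refilled at `rate` tokens/sec."""

    def __init__(self, capacity=10, rate=1.0):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # user_id -> (tokens_remaining, last_refill_time)

    def allow(self, user_id, now=None):
        """Return True if the user may make a request right now."""
        now = now if now is not None else time.time()
        tokens, last = self.buckets.get(user_id, (self.capacity, now))
        # Refill based on elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, now)
            return True
        self.buckets[user_id] = (tokens, now)
        return False
```

Call `allow(user_id)` before each upstream request and return a 429 to your own users when it denies, so your aggregate traffic stays below the provider's limit.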

What's the best way to monitor AI API errors in production?#

Log every API call with structured data (model, latency, tokens, error type). Use dashboards to track error rates, latency percentiles, and cost. Alert on error rate spikes and circuit breaker activations.

Summary#

Production AI applications need layered error handling: retries with backoff for transient errors, fallback chains for provider outages, circuit breakers to prevent cascade failures, and structured logging for visibility.

Crazyrouter handles much of the infrastructure-level complexity — automatic retries, unified error formats, and access to 300+ models through one endpoint. Focus your error handling on application logic, and let the infrastructure handle the rest. Start at crazyrouter.com.
