"Error Handling for AI APIs: A Developer's Complete Guide"

"Error Handling for AI APIs: A Developer's Complete Guide"

C
Crazyrouter Team
February 20, 2026

AI APIs fail. They fail in creative, unpredictable ways — rate limits at peak hours, random 500 errors, context length exceeded, content filters triggered, and the occasional provider outage that takes your entire application down. If you're building anything production-grade on top of AI APIs, robust error handling isn't optional.

This guide covers every error type you'll encounter, battle-tested retry strategies, and production patterns that keep your application running even when providers don't.

Common AI API Error Types#

Here's every error you'll realistically encounter, organized by HTTP status code:

| Status Code | Error | Cause | Recovery |
|---|---|---|---|
| 400 | Bad Request | Malformed request, invalid parameters | Fix request, don't retry |
| 401 | Unauthorized | Invalid or expired API key | Check/refresh credentials |
| 403 | Forbidden | Insufficient permissions or content policy | Check access level |
| 404 | Not Found | Wrong endpoint or model name | Fix URL/model |
| 429 | Rate Limited | Too many requests | Retry with backoff |
| 500 | Server Error | Provider-side issue | Retry with backoff |
| 502 | Bad Gateway | Infrastructure issue | Retry with backoff |
| 503 | Service Unavailable | Provider overloaded | Retry with backoff, consider fallback |
| 504 | Gateway Timeout | Request took too long | Retry, reduce input size |

The retryable errors (429, 500, 502, 503, 504) are the ones you need automated handling for. The rest require code fixes.
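That classification can be captured in a small helper you call before deciding how to respond to a failure. This is a sketch; the function name and strategy labels are illustrative, not part of any SDK:

```python
# Status codes worth retrying automatically, per the table above
RETRYABLE = {429, 500, 502, 503, 504}

def classify_error(status_code):
    """Map an HTTP status code to a handling strategy.

    Returns one of: "retry", "check_credentials", "fix_request".
    """
    if status_code in RETRYABLE:
        return "retry"
    if status_code in (401, 403):
        return "check_credentials"
    # 400, 404, and other 4xx errors need a code fix, not a retry
    return "fix_request"
```

Routing every caught exception through a single classifier like this keeps the retry policy in one place instead of scattered across `except` blocks.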

Retry Strategy: Exponential Backoff with Jitter#

The standard pattern for retrying failed API calls:

Python Implementation#

python
import time
import random
from openai import OpenAI, APIError, RateLimitError, APITimeoutError

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

def call_with_retry(
    messages,
    model="gpt-4.1",
    max_retries=5,
    base_delay=1.0,
    max_delay=60.0
):
    """Call AI API with exponential backoff and jitter."""
    
    for attempt in range(max_retries + 1):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response
            
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Honor the Retry-After header if the provider sent one
            retry_after = e.response.headers.get("retry-after") if e.response is not None else None
            delay = float(retry_after) if retry_after else min(base_delay * (2 ** attempt), max_delay)
            # Add jitter (±25%) to prevent thundering herd
            delay *= (0.75 + random.random() * 0.5)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
            
        except APITimeoutError:
            if attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            print(f"Timeout. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
            
        except APIError as e:
            # Not all APIError subclasses carry a status code, so read it defensively
            status = getattr(e, "status_code", None)
            if status in (500, 502, 503, 504):
                if attempt == max_retries:
                    raise
                delay = min(base_delay * (2 ** attempt), max_delay)
                delay *= (0.75 + random.random() * 0.5)  # jitter here too
                time.sleep(delay)
            else:
                raise  # Non-retryable error

# Usage
response = call_with_retry(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4.1"
)

Node.js Implementation#

javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-api-key',
  baseURL: 'https://crazyrouter.com/v1'
});

async function callWithRetry(messages, model = 'gpt-4.1', maxRetries = 5) {
  const baseDelay = 1000; // 1 second
  const maxDelay = 60000; // 60 seconds

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model, messages });
    } catch (error) {
      const isRetryable = error.status && [429, 500, 502, 503, 504].includes(error.status);
      
      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }

      // Exponential backoff with jitter
      const delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
      const jitteredDelay = delay * (0.75 + Math.random() * 0.5);
      
      console.log(`Retry ${attempt + 1}/${maxRetries} in ${(jitteredDelay / 1000).toFixed(1)}s`);
      await new Promise(resolve => setTimeout(resolve, jitteredDelay));
    }
  }
}

Provider Fallback Pattern#

When one provider goes down, automatically switch to another. This is where using an API aggregator like Crazyrouter shines — but you can also implement it yourself:

Multi-Provider Fallback (Python)#

python
FALLBACK_CHAIN = [
    {"model": "gpt-4.1", "provider": "openai"},
    {"model": "claude-sonnet-4-5", "provider": "anthropic"},
    {"model": "gemini-2.5-flash", "provider": "google"},
    {"model": "deepseek-v3", "provider": "deepseek"},
]

def call_with_fallback(messages, max_retries_per_provider=2):
    """Try each provider in the fallback chain."""
    errors = []
    
    for provider_config in FALLBACK_CHAIN:
        try:
            response = call_with_retry(
                messages=messages,
                model=provider_config["model"],
                max_retries=max_retries_per_provider
            )
            return response
        except Exception as e:
            errors.append({
                "provider": provider_config["provider"],
                "model": provider_config["model"],
                "error": str(e)
            })
            print(f"Provider {provider_config['provider']} failed, trying next...")
            continue
    
    # All providers failed
    raise Exception(f"All providers failed: {errors}")

# With Crazyrouter, you get automatic routing — but explicit fallback
# gives you control over the priority order

Circuit Breaker Pattern#

Prevent hammering a failing provider:

python
import time
from collections import defaultdict

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = defaultdict(int)
        self.last_failure_time = defaultdict(float)
        self.state = defaultdict(lambda: "closed")  # closed, open, half-open
    
    def can_execute(self, provider):
        if self.state[provider] == "closed":
            return True
        if self.state[provider] == "open":
            if time.time() - self.last_failure_time[provider] > self.recovery_timeout:
                self.state[provider] = "half-open"
                return True
            return False
        return True  # half-open: allow one test request
    
    def record_success(self, provider):
        self.failures[provider] = 0
        self.state[provider] = "closed"
    
    def record_failure(self, provider):
        self.failures[provider] += 1
        self.last_failure_time[provider] = time.time()
        if self.failures[provider] >= self.failure_threshold:
            self.state[provider] = "open"
            print(f"Circuit OPEN for {provider} — skipping for {self.recovery_timeout}s")

breaker = CircuitBreaker()

def call_with_circuit_breaker(messages):
    for config in FALLBACK_CHAIN:
        provider = config["provider"]
        if not breaker.can_execute(provider):
            continue
        try:
            response = call_with_retry(messages, model=config["model"], max_retries=2)
            breaker.record_success(provider)
            return response
        except Exception:
            breaker.record_failure(provider)
    raise Exception("All circuits open — no available providers")

Handling Specific Error Scenarios#

Context Length Exceeded#

When your input is too long for the model:

python
def smart_truncate(messages, model="gpt-4.1"):
    """Truncate conversation history to fit the model's context window."""
    MAX_TOKENS = {
        "gpt-4.1": 128000,
        "claude-sonnet-4-5": 200000,
        "gemini-2.5-flash": 1000000,
    }
    
    max_ctx = MAX_TOKENS.get(model, 128000)
    
    try:
        return client.chat.completions.create(model=model, messages=messages)
    except Exception as e:
        if "context_length_exceeded" in str(e).lower():
            print(f"Input exceeds {model}'s ~{max_ctx}-token window; truncating history")
            # Keep system messages; progressively drop the oldest half of the rest
            system_msgs = [m for m in messages if m["role"] == "system"]
            other_msgs = [m for m in messages if m["role"] != "system"]
            
            while len(other_msgs) > 1:
                other_msgs = other_msgs[len(other_msgs) // 2:]
                try:
                    return client.chat.completions.create(
                        model=model,
                        messages=system_msgs + other_msgs
                    )
                except Exception:
                    continue
            
            # Last resort: try a model with a larger context window
            return client.chat.completions.create(
                model="gemini-2.5-flash",  # 1M-token context
                messages=messages
            )
        raise

Content Filter Errors#

When the AI refuses to generate content:

python
def handle_content_filter(messages, model="gpt-4.1"):
    try:
        response = client.chat.completions.create(model=model, messages=messages)
        
        # Check for content filter in response
        if response.choices[0].finish_reason == "content_filter":
            return {
                "success": False,
                "reason": "content_filter",
                "message": "The request was flagged by content moderation."
            }
        
        return {"success": True, "content": response.choices[0].message.content}
        
    except Exception as e:
        if "content_policy" in str(e).lower() or "content_filter" in str(e).lower():
            return {
                "success": False,
                "reason": "content_policy",
                "message": "Request violates content policy."
            }
        raise

Timeout Handling#

python
from openai import OpenAI

# Set appropriate timeouts
client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1",
    timeout=60.0,        # Total request timeout
    max_retries=0        # We handle retries ourselves
)

# For long-running requests (e.g., complex reasoning)
def call_with_timeout(messages, model, timeout=120):
    """Adjust timeout based on expected response length."""
    try:
        response = client.with_options(timeout=timeout).chat.completions.create(
            model=model,
            messages=messages
        )
        return response
    except Exception as e:
        if "timeout" in str(e).lower():
            # Try a faster model
            return client.chat.completions.create(
                model="gpt-4.1-mini",
                messages=messages
            )
        raise

Production Error Monitoring#

Structured Logging#

python
import logging
import json
from datetime import datetime, timezone

logger = logging.getLogger("ai_api")

def log_api_call(model, messages, response=None, error=None, duration_ms=None):
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "input_messages": len(messages),
        "duration_ms": duration_ms,
    }
    
    if response:
        log_entry["status"] = "success"
        log_entry["tokens"] = {
            "prompt": response.usage.prompt_tokens,
            "completion": response.usage.completion_tokens,
            "total": response.usage.total_tokens
        }
        log_entry["finish_reason"] = response.choices[0].finish_reason
        logger.info(json.dumps(log_entry))
    
    if error:
        log_entry["status"] = "error"
        log_entry["error_type"] = type(error).__name__
        log_entry["error_message"] = str(error)
        log_entry["status_code"] = getattr(error, "status_code", None)
        logger.error(json.dumps(log_entry))

Key Metrics to Track#

| Metric | Why It Matters |
|---|---|
| Error rate by provider | Detect provider degradation early |
| P95 latency | Catch slowdowns before users complain |
| Rate limit hits/hour | Know when to scale or add providers |
| Fallback activation rate | Understand provider reliability |
| Token usage per request | Cost monitoring and optimization |
| Circuit breaker state changes | Track provider health over time |
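The per-provider error rate in the first row can be computed with a simple sliding window. This is a sketch, not a library API; the class and method names are illustrative:

```python
import time
from collections import deque

class ErrorRateTracker:
    """Sliding-window error rate per provider."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # (timestamp, provider, is_error) tuples

    def record(self, provider, is_error, now=None):
        """Record one API call outcome for a provider."""
        now = now if now is not None else time.time()
        self.events.append((now, provider, is_error))
        self._evict(now)

    def error_rate(self, provider, now=None):
        """Fraction of calls to `provider` that failed within the window."""
        now = now if now is not None else time.time()
        self._evict(now)
        outcomes = [is_err for (_, p, is_err) in self.events if p == provider]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def _evict(self, now):
        # Drop events older than the window
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
```

Feed `record()` from the same place you emit structured logs, and alert when `error_rate()` for any provider crosses a threshold.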

Why Crazyrouter Simplifies Error Handling#

Using an API aggregator like Crazyrouter reduces the error handling burden significantly:

  1. Automatic retries: Built-in retry logic for transient errors
  2. Unified error format: All providers return consistent error structures
  3. Load balancing: Requests distributed across healthy endpoints
  4. Rate limit pooling: Higher effective rate limits through multiple upstream keys
  5. Single API key: One credential to manage instead of 5+

You still need application-level error handling (context length, content filters, business logic), but the infrastructure-level errors are largely handled for you.

FAQ#

How many retries should I use for AI API calls?#

3-5 retries with exponential backoff is standard. For rate limits, respect the Retry-After header. For server errors, 3 retries with 1s/2s/4s delays works well.

Should I use the same model for retries?#

For transient errors (500, 502, 503), retry the same model. For persistent failures, fall back to an alternative model. Using Crazyrouter makes model switching trivial since all models use the same API format.

How do I handle rate limits across multiple users?#

Implement per-user rate limiting in your application layer, and use a token bucket or sliding window algorithm. Set your per-user limits below your API provider's limits to maintain headroom.
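A minimal token-bucket sketch for per-user limiting, with illustrative names (in production you'd typically back this with Redis rather than in-process state):

```python
import time

class TokenBucket:
    """Per-user token bucket: `capacity` burst requests, refilled at `rate` tokens/sec."""

    def __init__(self, capacity=10, rate=1.0):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # user_id -> (tokens_remaining, last_refill_time)

    def allow(self, user_id, now=None):
        """Return True if the user may make a request right now."""
        now = now if now is not None else time.time()
        tokens, last = self.buckets.get(user_id, (self.capacity, now))
        # Refill based on elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[user_id] = (tokens - 1, now)
            return True
        self.buckets[user_id] = (tokens, now)
        return False
```

Call `allow(user_id)` before each upstream request and return a 429 to your own users when it denies, so your aggregate traffic stays below the provider's limit.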

What's the best way to monitor AI API errors in production?#

Log every API call with structured data (model, latency, tokens, error type). Use dashboards to track error rates, latency percentiles, and cost. Alert on error rate spikes and circuit breaker activations.

Summary#

Production AI applications need layered error handling: retries with backoff for transient errors, fallback chains for provider outages, circuit breakers to prevent cascade failures, and structured logging for visibility.

Crazyrouter handles much of the infrastructure-level complexity — automatic retries, unified error formats, and access to 300+ models through one endpoint. Focus your error handling on application logic, and let the infrastructure handle the rest. Start at crazyrouter.com.
