
# Error Handling for AI APIs: A Developer's Complete Guide
AI APIs fail. They fail in creative, unpredictable ways — rate limits at peak hours, random 500 errors, context length exceeded, content filters triggered, and the occasional provider outage that takes your entire application down. If you're building anything production-grade on top of AI APIs, robust error handling isn't optional.
This guide covers every error type you'll encounter, battle-tested retry strategies, and production patterns that keep your application running even when providers don't.
## Common AI API Error Types
Here's every error you'll realistically encounter, organized by HTTP status code:
| Status Code | Error | Cause | Recovery |
|---|---|---|---|
| 400 | Bad Request | Malformed request, invalid parameters | Fix request, don't retry |
| 401 | Unauthorized | Invalid or expired API key | Check/refresh credentials |
| 403 | Forbidden | Insufficient permissions or content policy | Check access level |
| 404 | Not Found | Wrong endpoint or model name | Fix URL/model |
| 429 | Rate Limited | Too many requests | Retry with backoff |
| 500 | Server Error | Provider-side issue | Retry with backoff |
| 502 | Bad Gateway | Infrastructure issue | Retry with backoff |
| 503 | Service Unavailable | Provider overloaded | Retry with backoff, consider fallback |
| 504 | Gateway Timeout | Request took too long | Retry, reduce input size |
The retryable errors (429, 500, 502, 503, 504) are the ones you need automated handling for. The rest require code fixes.
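The table above boils down to one membership check. Here's a minimal helper (name and structure are illustrative, not from any SDK) that captures the retryable/non-retryable split:

```python
# Transient errors worth retrying automatically, per the table above
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def is_retryable(status_code: int) -> bool:
    """Return True for transient errors that warrant an automatic retry."""
    return status_code in RETRYABLE_STATUS_CODES
```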
## Retry Strategy: Exponential Backoff with Jitter

The standard pattern for retrying failed API calls:

### Python Implementation

```python
import time
import random
from openai import OpenAI, APIError, RateLimitError, APITimeoutError

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

def call_with_retry(
    messages,
    model="gpt-4.1",
    max_retries=5,
    base_delay=1.0,
    max_delay=60.0
):
    """Call AI API with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Use Retry-After header if available
            retry_after = getattr(e, "retry_after", None)
            delay = retry_after or min(base_delay * (2 ** attempt), max_delay)
            # Add jitter (±25%) to prevent thundering herd
            delay *= 0.75 + random.random() * 0.5
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
        except APITimeoutError:
            if attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            print(f"Timeout. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)
        except APIError as e:
            if e.status_code in (500, 502, 503, 504):
                if attempt == max_retries:
                    raise
                delay = min(base_delay * (2 ** attempt), max_delay)
                time.sleep(delay)
            else:
                raise  # Non-retryable error

# Usage
response = call_with_retry(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4.1"
)
```
### Node.js Implementation

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-api-key',
  baseURL: 'https://crazyrouter.com/v1'
});

async function callWithRetry(messages, model = 'gpt-4.1', maxRetries = 5) {
  const baseDelay = 1000;   // 1 second
  const maxDelay = 60000;   // 60 seconds

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model, messages });
    } catch (error) {
      const isRetryable = error.status && [429, 500, 502, 503, 504].includes(error.status);
      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }
      // Exponential backoff with jitter
      const delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
      const jitteredDelay = delay * (0.75 + Math.random() * 0.5);
      console.log(`Retry ${attempt + 1}/${maxRetries} in ${(jitteredDelay / 1000).toFixed(1)}s`);
      await new Promise(resolve => setTimeout(resolve, jitteredDelay));
    }
  }
}
```
## Provider Fallback Pattern

When one provider goes down, automatically switch to another. This is where using an API aggregator like Crazyrouter shines — but you can also implement it yourself:

### Multi-Provider Fallback (Python)

```python
FALLBACK_CHAIN = [
    {"model": "gpt-4.1", "provider": "openai"},
    {"model": "claude-sonnet-4-5", "provider": "anthropic"},
    {"model": "gemini-2.5-flash", "provider": "google"},
    {"model": "deepseek-v3", "provider": "deepseek"},
]

def call_with_fallback(messages, max_retries_per_provider=2):
    """Try each provider in the fallback chain."""
    errors = []
    for provider_config in FALLBACK_CHAIN:
        try:
            response = call_with_retry(
                messages=messages,
                model=provider_config["model"],
                max_retries=max_retries_per_provider
            )
            return response
        except Exception as e:
            errors.append({
                "provider": provider_config["provider"],
                "model": provider_config["model"],
                "error": str(e)
            })
            print(f"Provider {provider_config['provider']} failed, trying next...")
            continue

    # All providers failed
    raise Exception(f"All providers failed: {errors}")

# With Crazyrouter, you get automatic routing — but explicit fallback
# gives you control over the priority order
```
## Circuit Breaker Pattern

Prevent hammering a failing provider:

```python
import time
from collections import defaultdict

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = defaultdict(int)
        self.last_failure_time = defaultdict(float)
        self.state = defaultdict(lambda: "closed")  # closed, open, half-open

    def can_execute(self, provider):
        if self.state[provider] == "closed":
            return True
        if self.state[provider] == "open":
            if time.time() - self.last_failure_time[provider] > self.recovery_timeout:
                self.state[provider] = "half-open"
                return True
            return False
        return True  # half-open: allow one test request

    def record_success(self, provider):
        self.failures[provider] = 0
        self.state[provider] = "closed"

    def record_failure(self, provider):
        self.failures[provider] += 1
        self.last_failure_time[provider] = time.time()
        if self.failures[provider] >= self.failure_threshold:
            self.state[provider] = "open"
            print(f"Circuit OPEN for {provider} — skipping for {self.recovery_timeout}s")

breaker = CircuitBreaker()

def call_with_circuit_breaker(messages):
    for config in FALLBACK_CHAIN:
        provider = config["provider"]
        if not breaker.can_execute(provider):
            continue
        try:
            response = call_with_retry(messages, model=config["model"], max_retries=2)
            breaker.record_success(provider)
            return response
        except Exception:
            breaker.record_failure(provider)
    raise Exception("All circuits open — no available providers")
## Handling Specific Error Scenarios

### Context Length Exceeded

When your input is too long for the model:

```python
def smart_truncate(messages, model="gpt-4.1"):
    """Truncate conversation history to fit the context window."""
    MAX_TOKENS = {
        "gpt-4.1": 128000,
        "claude-sonnet-4-5": 200000,
        "gemini-2.5-flash": 1000000,
    }
    max_ctx = MAX_TOKENS.get(model, 128000)
    try:
        return client.chat.completions.create(model=model, messages=messages)
    except Exception as e:
        if "context_length_exceeded" in str(e).lower():
            # Keep system messages + the most recent messages
            system_msgs = [m for m in messages if m["role"] == "system"]
            other_msgs = [m for m in messages if m["role"] != "system"]
            # Progressively drop the older half of the history
            while len(other_msgs) > 1:
                other_msgs = other_msgs[len(other_msgs) // 2:]
                try:
                    return client.chat.completions.create(
                        model=model,
                        messages=system_msgs + other_msgs
                    )
                except Exception:
                    continue
            # Last resort: try a model with a larger context window
            return client.chat.completions.create(
                model="gemini-2.5-flash",  # 1M-token context
                messages=messages
            )
        raise
```
### Content Filter Errors

When the AI refuses to generate content:

```python
def handle_content_filter(messages, model="gpt-4.1"):
    try:
        response = client.chat.completions.create(model=model, messages=messages)
        # Check for content filter in response
        if response.choices[0].finish_reason == "content_filter":
            return {
                "success": False,
                "reason": "content_filter",
                "message": "The request was flagged by content moderation."
            }
        return {"success": True, "content": response.choices[0].message.content}
    except Exception as e:
        if "content_policy" in str(e).lower() or "content_filter" in str(e).lower():
            return {
                "success": False,
                "reason": "content_policy",
                "message": "Request violates content policy."
            }
        raise
```
### Timeout Handling

```python
from openai import OpenAI

# Set appropriate timeouts
client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1",
    timeout=60.0,    # Total request timeout in seconds
    max_retries=0    # We handle retries ourselves
)

# For long-running requests (e.g., complex reasoning)
def call_with_timeout(messages, model, timeout=120):
    """Adjust timeout based on expected response length."""
    try:
        response = client.with_options(timeout=timeout).chat.completions.create(
            model=model,
            messages=messages
        )
        return response
    except Exception as e:
        if "timeout" in str(e).lower():
            # Fall back to a faster model
            return client.chat.completions.create(
                model="gpt-4.1-mini",
                messages=messages
            )
        raise
```
## Production Error Monitoring

### Structured Logging

```python
import logging
import json
from datetime import datetime, timezone

logger = logging.getLogger("ai_api")

def log_api_call(model, messages, response=None, error=None, duration_ms=None):
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "input_messages": len(messages),
        "duration_ms": duration_ms,
    }
    if response:
        log_entry["status"] = "success"
        log_entry["tokens"] = {
            "prompt": response.usage.prompt_tokens,
            "completion": response.usage.completion_tokens,
            "total": response.usage.total_tokens
        }
        log_entry["finish_reason"] = response.choices[0].finish_reason
        logger.info(json.dumps(log_entry))
    if error:
        log_entry["status"] = "error"
        log_entry["error_type"] = type(error).__name__
        log_entry["error_message"] = str(error)
        log_entry["status_code"] = getattr(error, "status_code", None)
        logger.error(json.dumps(log_entry))
```
### Key Metrics to Track
| Metric | Why It Matters |
|---|---|
| Error rate by provider | Detect provider degradation early |
| P95 latency | Catch slowdowns before users complain |
| Rate limit hits/hour | Know when to scale or add providers |
| Fallback activation rate | Understand provider reliability |
| Token usage per request | Cost monitoring and optimization |
| Circuit breaker state changes | Track provider health over time |
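As a starting point, error rate by provider can be tracked with a small in-process counter like the sketch below (the class name is illustrative); in production you would export these counts to a metrics system such as Prometheus or Datadog rather than keep them in memory:

```python
from collections import defaultdict

class ErrorRateTracker:
    """Minimal in-process tracker for per-provider error rates (illustrative)."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)

    def record(self, provider: str, success: bool):
        """Record one completed API call for a provider."""
        self.requests[provider] += 1
        if not success:
            self.errors[provider] += 1

    def error_rate(self, provider: str) -> float:
        """Fraction of failed calls for this provider (0.0 if none recorded)."""
        total = self.requests[provider]
        return self.errors[provider] / total if total else 0.0
```

Pairing this with the circuit breaker above gives you both the alerting signal (error rate) and the automatic reaction (opening the circuit).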
## Why Crazyrouter Simplifies Error Handling
Using an API aggregator like Crazyrouter reduces the error handling burden significantly:
- Automatic retries: Built-in retry logic for transient errors
- Unified error format: All providers return consistent error structures
- Load balancing: Requests distributed across healthy endpoints
- Rate limit pooling: Higher effective rate limits through multiple upstream keys
- Single API key: One credential to manage instead of 5+
You still need application-level error handling (context length, content filters, business logic), but the infrastructure-level errors are largely handled for you.
## FAQ
### How many retries should I use for AI API calls?
3-5 retries with exponential backoff is standard. For rate limits, respect the Retry-After header. For server errors, 3 retries with 1s/2s/4s delays works well.
### Should I use the same model for retries?
For transient errors (500, 502, 503), retry the same model. For persistent failures, fall back to an alternative model. Using Crazyrouter makes model switching trivial since all models use the same API format.
### How do I handle rate limits across multiple users?
Implement per-user rate limiting in your application layer, and use a token bucket or sliding window algorithm. Set your per-user limits below your API provider's limits to maintain headroom.
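As a sketch of that advice, here is a minimal in-memory token bucket (the class name and defaults are illustrative; a real deployment would typically back this with a shared store like Redis so limits hold across processes):

```python
import time

class TokenBucket:
    """Per-user token bucket: holds up to `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: float = 10, refill_rate: float = 1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

You would keep one bucket per user (e.g., in a dict keyed by user ID) and call `allow()` before each upstream API request.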
### What's the best way to monitor AI API errors in production?
Log every API call with structured data (model, latency, tokens, error type). Use dashboards to track error rates, latency percentiles, and cost. Alert on error rate spikes and circuit breaker activations.
## Summary
Production AI applications need layered error handling: retries with backoff for transient errors, fallback chains for provider outages, circuit breakers to prevent cascade failures, and structured logging for visibility.
Crazyrouter handles much of the infrastructure-level complexity — automatic retries, unified error formats, and access to 300+ models through one endpoint. Focus your error handling on application logic, and let the infrastructure handle the rest. Start at crazyrouter.com.


