
# AI Batch Processing API Guide 2026: Process Millions of Requests Efficiently
When you need to process thousands or millions of AI requests—whether for data classification, content generation, document analysis, or embeddings—real-time API calls become impractical and expensive. Batch processing APIs offer a solution: submit large volumes of requests, get results within hours, and save up to 50% on costs.
## What is AI Batch Processing?
AI batch processing lets you submit many requests at once instead of making individual API calls. The provider processes them asynchronously, typically within a 24-hour window, and returns all results together.
Key benefits:
- 50% cost savings — Most providers offer significant discounts for batch jobs
- Higher throughput — Queue millions of requests without hitting real-time rate limits
- Simplified infrastructure — No need to build your own retry logic or queue systems
- Better resource utilization — Providers can schedule batch jobs during off-peak hours
Common use cases:
- Classifying millions of customer support tickets
- Generating product descriptions for e-commerce catalogs
- Analyzing financial documents at scale
- Creating embeddings for large document collections
- Translating content libraries
- Evaluating LLM outputs for quality assessment
## OpenAI Batch API: Complete Tutorial
OpenAI's Batch API is the most mature batch processing solution, offering 50% cost reduction for requests processed within 24 hours.
### Step 1: Prepare Your Input File (JSONL)

Each line in the JSONL file is an independent request:

```python
import json

# Create batch input file
requests_data = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [
                {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
                {"role": "user", "content": text}
            ],
            "max_tokens": 10
        }
    }
    for i, text in enumerate(customer_reviews)  # Your data
]

# Write JSONL file
with open("batch_input.jsonl", "w") as f:
    for req in requests_data:
        f.write(json.dumps(req) + "\n")

print(f"Created batch with {len(requests_data)} requests")
```
### Step 2: Upload and Submit the Batch

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Upload the input file
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch"
)
print(f"Uploaded file: {batch_file.id}")

# Create the batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Sentiment classification batch"}
)
print(f"Batch job created: {batch_job.id}")
print(f"Status: {batch_job.status}")
```
### Step 3: Monitor Progress

```python
import time

def wait_for_batch(client, batch_id, poll_interval=60):
    """Poll batch status until completion."""
    while True:
        batch = client.batches.retrieve(batch_id)
        print(f"Status: {batch.status} | "
              f"Completed: {batch.request_counts.completed}/{batch.request_counts.total} | "
              f"Failed: {batch.request_counts.failed}")
        if batch.status == "completed":
            return batch
        elif batch.status in ["failed", "expired", "cancelled"]:
            raise Exception(f"Batch {batch.status}: {batch.errors}")
        time.sleep(poll_interval)

completed_batch = wait_for_batch(client, batch_job.id)
print(f"Output file: {completed_batch.output_file_id}")
```
### Step 4: Download and Process Results

```python
# Download results
result_file = client.files.content(completed_batch.output_file_id)
results = result_file.text

# Parse results
processed = []
for line in results.strip().split("\n"):
    result = json.loads(line)
    custom_id = result["custom_id"]
    response = result["response"]["body"]["choices"][0]["message"]["content"]
    processed.append({"id": custom_id, "sentiment": response})

print(f"Processed {len(processed)} results")

# Show sample
for item in processed[:5]:
    print(f"  {item['id']}: {item['sentiment']}")
```
## Complete Batch Processing Pipeline

Here's a production-ready pipeline:

```python
import json
import time
from openai import OpenAI

class BatchProcessor:
    def __init__(self, api_key: str, base_url: str = "https://api.crazyrouter.com/v1"):
        self.client = OpenAI(api_key=api_key, base_url=base_url)

    def create_batch_file(self, items: list, system_prompt: str,
                          model: str = "gpt-5-mini", output_path: str = "batch_input.jsonl"):
        """Create JSONL input file from a list of items."""
        with open(output_path, "w") as f:
            for i, item in enumerate(items):
                request = {
                    "custom_id": f"req-{i:06d}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        "model": model,
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": str(item)}
                        ],
                        "max_tokens": 256,
                        "temperature": 0
                    }
                }
                f.write(json.dumps(request) + "\n")
        print(f"Created {output_path} with {len(items)} requests")
        return output_path

    def submit_batch(self, input_path: str, description: str = "") -> str:
        """Upload file and create batch job."""
        # Upload
        with open(input_path, "rb") as f:
            uploaded = self.client.files.create(file=f, purpose="batch")
        # Create batch
        batch = self.client.batches.create(
            input_file_id=uploaded.id,
            endpoint="/v1/chat/completions",
            completion_window="24h",
            metadata={"description": description}
        )
        print(f"Batch submitted: {batch.id}")
        return batch.id

    def wait_and_download(self, batch_id: str, poll_interval: int = 60) -> list:
        """Wait for completion and download results."""
        while True:
            batch = self.client.batches.retrieve(batch_id)
            completed = batch.request_counts.completed
            total = batch.request_counts.total
            failed = batch.request_counts.failed
            print(f"\r  Progress: {completed}/{total} ({failed} failed)", end="", flush=True)
            if batch.status == "completed":
                print("\n  ✅ Batch completed!")
                break
            elif batch.status in ["failed", "expired", "cancelled"]:
                print(f"\n  ❌ Batch {batch.status}")
                raise Exception(f"Batch failed: {batch.errors}")
            time.sleep(poll_interval)
        # Download results
        content = self.client.files.content(batch.output_file_id)
        results = []
        for line in content.text.strip().split("\n"):
            data = json.loads(line)
            results.append({
                "id": data["custom_id"],
                "response": data["response"]["body"]["choices"][0]["message"]["content"],
                "tokens": data["response"]["body"]["usage"]["total_tokens"]
            })
        return sorted(results, key=lambda x: x["id"])

# Usage example
processor = BatchProcessor(api_key="your-crazyrouter-key")

# Classify 10,000 customer reviews
reviews = ["Great product, love it!", "Terrible service, never again.", ...]  # Your data

processor.create_batch_file(
    items=reviews,
    system_prompt="Classify sentiment as: positive, negative, or neutral. Reply with one word only.",
    model="gpt-5-mini"
)
batch_id = processor.submit_batch("batch_input.jsonl", "Customer review sentiment analysis")
results = processor.wait_and_download(batch_id)

for r in results[:10]:
    print(f"{r['id']}: {r['response']} ({r['tokens']} tokens)")
```
## Batch Embeddings

Processing embeddings in batch is ideal for building search indexes:

```python
# Create embedding batch file
documents = ["Document text 1...", "Document text 2...", ...]

with open("embedding_batch.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        request = {
            "custom_id": f"emb-{i:06d}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {
                "model": "text-embedding-3-small",
                "input": doc
            }
        }
        f.write(json.dumps(request) + "\n")

# Submit and process same as above
```
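Downstream you'll need to pull the vectors back out of the output file. A minimal parsing sketch, assuming the standard batch output shape where each line nests the `/v1/embeddings` response under `response.body` (`parse_embedding_results` is an illustrative helper, not an SDK call):

```python
import json

def parse_embedding_results(jsonl_text: str) -> dict:
    """Map each custom_id to its embedding vector."""
    vectors = {}
    for line in jsonl_text.strip().split("\n"):
        record = json.loads(line)
        body = record["response"]["body"]
        vectors[record["custom_id"]] = body["data"][0]["embedding"]
    return vectors

# Demo with a mocked output line (a real file has one line per request)
sample = json.dumps({
    "custom_id": "emb-000000",
    "response": {"body": {"data": [{"embedding": [0.1, 0.2, 0.3]}]}}
})
print(parse_embedding_results(sample)["emb-000000"])  # [0.1, 0.2, 0.3]
```

Because output lines may arrive in any order, keying by `custom_id` (rather than line position) keeps vectors aligned with their source documents.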
## DIY Async Batch Processing

If a provider doesn't offer a native batch API, build your own with async:

```python
import asyncio
import aiohttp
from typing import List

async def process_batch_async(
    items: List[str],
    api_key: str,
    model: str = "gpt-5-mini",
    max_concurrent: int = 50,
    base_url: str = "https://api.crazyrouter.com/v1"
):
    """Process items concurrently with rate limiting."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results = [None] * len(items)

    async def process_one(session, index, text):
        async with semaphore:
            async with session.post(
                f"{base_url}/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": model,
                    "messages": [{"role": "user", "content": text}],
                    "max_tokens": 256
                }
            ) as resp:
                data = await resp.json()
                results[index] = data["choices"][0]["message"]["content"]

    async with aiohttp.ClientSession() as session:
        tasks = [process_one(session, i, item) for i, item in enumerate(items)]
        await asyncio.gather(*tasks)
    return results

# Run
results = asyncio.run(process_batch_async(
    items=["Summarize: ...", "Classify: ...", ...],
    api_key="your-api-key",
    max_concurrent=50
))
```
## Pricing: Batch vs Real-Time

| Provider | Model | Real-Time (per 1M tokens) | Batch (per 1M tokens) | Savings |
|---|---|---|---|---|
| Crazyrouter | GPT-5-mini | $0.60 | $0.30 | 50% |
| Crazyrouter | GPT-5 | $9.00 | $4.50 | 50% |
| OpenAI | GPT-5-mini | $1.00 | $0.50 | 50% |
| OpenAI | GPT-5 | $15.00 | $7.50 | 50% |
| Crazyrouter | Claude Sonnet | $5.00 | $2.50 | 50% |
With Crazyrouter, you save on both the base price (20-40% below official list prices) and the batch discount (an additional 50%), so the savings compound.
## When to Use Batch vs Real-Time
| Scenario | Batch | Real-Time |
|---|---|---|
| Latency requirement | Hours OK | Seconds needed |
| Volume | 1,000+ requests | 1-100 requests |
| Cost sensitivity | High | Low |
| Data classification | ✅ Best choice | Overkill |
| User-facing chatbot | ❌ Too slow | ✅ Required |
| Nightly data pipeline | ✅ Perfect | ❌ Wasteful |
| Content generation (bulk) | ✅ Best choice | Can work |
| Embedding large corpus | ✅ Best choice | Expensive |
## Best Practices

### 1. Optimize Prompt Length

In batch processing, every extra token multiplied by millions of requests adds up. Keep system prompts concise:

```python
# ❌ Verbose (wastes tokens × millions of requests)
system = "You are a helpful assistant. Your job is to classify the sentiment of customer reviews. Please analyze the text carefully and determine whether the overall sentiment is positive, negative, or neutral. Respond with just the classification."

# ✅ Concise (saves millions of tokens)
system = "Classify sentiment: positive/negative/neutral. One word only."
```
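To see why this matters at scale, a back-of-the-envelope calculation (the token counts and price below are rough, illustrative assumptions):

```python
# Rough illustration: assume the verbose prompt costs ~45 tokens
# and the concise one ~12 tokens per request
verbose_tokens, concise_tokens = 45, 12
requests = 5_000_000
price_per_million = 0.30  # $ per 1M input tokens (illustrative batch rate)

saved_tokens = (verbose_tokens - concise_tokens) * requests
saved_dollars = saved_tokens / 1_000_000 * price_per_million
print(f"Saved {saved_tokens:,} tokens ≈ ${saved_dollars:.2f}")
```

A 33-token trim that looks trivial on one request saves 165 million tokens across five million requests.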
### 2. Use the Cheapest Adequate Model
For simple classification tasks, GPT-5-mini or Claude Haiku is often sufficient:
| Task | Recommended Model | Why |
|---|---|---|
| Sentiment classification | GPT-5-mini | Simple task, cheapest |
| Content summarization | Claude Sonnet | Good quality/price |
| Complex analysis | GPT-5 or Claude Opus | Accuracy matters |
| Embeddings | text-embedding-3-small | Cheapest, good quality |
### 3. Handle Failures Gracefully

```python
# Check for failed requests and retry them
failed_requests = []
for line in results_text.strip().split("\n"):
    result = json.loads(line)
    if result.get("error"):
        failed_requests.append(result["custom_id"])

if failed_requests:
    print(f"Retrying {len(failed_requests)} failed requests...")
    # Rebuild JSONL with only failed requests and resubmit
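One way to do that rebuild, assuming you kept the original input JSONL on disk (`build_retry_file` is an illustrative helper, not an SDK call):

```python
import json

def build_retry_file(original_path: str, failed_ids: set, retry_path: str) -> int:
    """Copy only the failed requests from the original input JSONL."""
    count = 0
    with open(original_path) as src, open(retry_path, "w") as dst:
        for line in src:
            if json.loads(line)["custom_id"] in failed_ids:
                dst.write(line)
                count += 1
    return count
```

Because each line already carries its `custom_id`, the retry file can be submitted exactly like the original batch, and the results merged back by ID.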
## Frequently Asked Questions

### How long does batch processing take?
Most providers guarantee completion within 24 hours, but typical turnaround is 1-6 hours depending on queue depth. OpenAI's batch API usually completes within 2-4 hours for most workloads.
### Is batch processing cheaper than real-time?
Yes, significantly. OpenAI offers 50% discount for batch requests. Through Crazyrouter, you save an additional 20-40% on base pricing, making batch processing extremely cost-effective.
### What's the maximum batch size?
OpenAI's Batch API supports up to 50,000 requests per batch and 200MB per input file. For larger workloads, split into multiple batches and run them concurrently.
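A simple chunking helper covers the split (an illustrative sketch; the default cap mirrors the 50,000-request limit mentioned above):

```python
def chunk_requests(requests: list, max_per_batch: int = 50_000) -> list:
    """Split a request list into batch-sized chunks."""
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]

chunks = chunk_requests(list(range(120_000)))
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

Each chunk then gets its own JSONL file and batch job; keep `custom_id` values unique across chunks so merged results stay unambiguous.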
### Can I cancel a batch job?
Yes, most batch APIs support cancellation. Already-completed requests within the batch will still be billed, but remaining requests will be cancelled.
### Does batch processing work with all models?
Most text-based models support batch processing. Image generation and audio models typically don't have batch APIs—use async concurrent requests instead.
### How do I handle rate limits in async batch processing?
Use a semaphore (shown in the async example above) to limit concurrent requests. Start with 50 concurrent requests and adjust based on the provider's rate limits. Crazyrouter offers higher rate limits for batch workloads.
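Beyond the semaphore, transient rate-limit errors (HTTP 429) are worth retrying with exponential backoff. A minimal sketch with a simulated flaky call (`with_backoff` and `flaky` are illustrative names, not library APIs):

```python
import asyncio
import random

async def with_backoff(coro_factory, max_retries: int = 5, base_delay: float = 1.0):
    """Retry an async call with exponential backoff and jitter.

    coro_factory is a zero-argument callable returning a fresh coroutine;
    it should raise on rate-limit errors so the call is retried.
    """
    for attempt in range(max_retries):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo with a fake call that fails twice, then succeeds
attempts = {"n": 0}

async def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"

print(asyncio.run(with_backoff(flaky, base_delay=0.01)))  # ok
```

Wrapping each `process_one` call from the async example in such a helper keeps slow-but-recoverable requests from leaving `None` holes in the results list.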
## Summary
Batch processing is essential for any application that needs to process large volumes of AI requests efficiently. Whether you use OpenAI's native Batch API for 50% cost savings or build your own async pipeline, the key is matching the right approach to your latency and cost requirements.
Crazyrouter makes batch processing even more affordable by offering competitive base pricing plus support for batch APIs across multiple providers—all through a single API key. Process millions of requests across GPT-5, Claude, Gemini, and 300+ other models without managing multiple accounts.


