
"AI API Token Cost Calculator: How to Estimate and Optimize Your AI Spending"
AI API costs can spiral quickly if you're not tracking token usage carefully. Whether you're building a chatbot, coding assistant, or document processing pipeline, understanding how tokens translate to dollars is essential for budgeting and profitability.
This guide covers everything you need to know about calculating AI API costs — from token counting basics to advanced optimization strategies that can cut your bill by 50% or more.
## What Are Tokens and How Are They Counted?
Tokens are the fundamental unit of text that AI models process. They're not exactly words — they're subword units that the model's tokenizer produces.
### Token Rules of Thumb
| Language | Approximate Ratio |
|---|---|
| English | 1 token ≈ 0.75 words |
| Chinese | 1 token ≈ 0.5-1 character |
| Code | 1 token ≈ 3-4 characters |
| JSON | Higher token density (brackets, keys) |
### Quick Estimates
| Content Type | ~Words | ~Tokens |
|---|---|---|
| Short prompt | 50 | 67 |
| — | 200 | 267 |
| Blog post | 1,000 | 1,333 |
| Technical doc | 5,000 | 6,667 |
| Book chapter | 10,000 | 13,333 |
| Full codebase | 50,000 | 75,000+ |
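The rules of thumb above can be rolled into a quick estimator. This is a heuristic sketch only (real tokenizers such as tiktoken vary by model), and `estimate_tokens` is an illustrative helper, not a library function:

```python
def estimate_tokens(text: str, content_type: str = "english") -> int:
    """Rough token estimate from the rules of thumb above.

    english: ~1 token per 0.75 words
    code:    ~1 token per 3.5 characters
    """
    if content_type == "code":
        return round(len(text) / 3.5)
    words = len(text.split())
    return round(words / 0.75)

# A 1,000-word blog post lands on the table's estimate:
print(estimate_tokens(" ".join(["word"] * 1000)))  # 1333
```

For anything billing-critical, count with the model's actual tokenizer instead of a heuristic.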
## AI API Pricing Comparison 2026

### Text Models (per 1M tokens)
| Model | Input | Output | Cached Input |
|---|---|---|---|
| GPT-5.2 | $10.00 | $30.00 | $2.50 |
| GPT-5-mini | $0.40 | $1.60 | $0.10 |
| Claude Opus 4.6 | $15.00 | $75.00 | $3.75 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.75 |
| Claude Haiku 4.5 | $0.25 | $1.25 | $0.06 |
| Gemini 3 Pro | $7.00 | $21.00 | $1.75 |
| Gemini 2.5 Flash | $0.15 | $0.60 | $0.04 |
| DeepSeek V3.2 | $0.27 | $1.10 | $0.07 |
| Grok 4.1 Fast | $3.00 | $15.00 | — |
### Crazyrouter Pricing (20-30% Savings)
| Model | Input | Output | Savings |
|---|---|---|---|
| GPT-5.2 | $7.00 | $21.00 | 30% |
| Claude Opus 4.6 | $10.50 | $52.50 | 30% |
| Claude Sonnet 4.5 | $2.10 | $10.50 | 30% |
| Gemini 3 Pro | $5.60 | $16.80 | 20% |
| DeepSeek V3.2 | $0.19 | $0.77 | 30% |
Access all models through Crazyrouter with a single API key.
## How to Calculate Your API Costs

### The Basic Formula

Since prices are quoted per 1M tokens:

Cost = (Input Tokens ÷ 1,000,000 × Input Price) + (Output Tokens ÷ 1,000,000 × Output Price)

### Python Cost Calculator
```python
# AI API Cost Calculator (official prices in USD per 1M tokens)
MODEL_PRICING = {
    "gpt-5.2": {"input": 10.0, "output": 30.0},
    "gpt-5-mini": {"input": 0.4, "output": 1.6},
    "claude-opus-4-6": {"input": 15.0, "output": 75.0},
    "claude-sonnet-4-5": {"input": 3.0, "output": 15.0},
    "claude-haiku-4-5": {"input": 0.25, "output": 1.25},
    "gemini-3-pro": {"input": 7.0, "output": 21.0},
    "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
    "deepseek-v3.2": {"input": 0.27, "output": 1.10},
}

# Crazyrouter discount rates
CRAZYROUTER_DISCOUNT = {
    "gpt-5.2": 0.30,
    "claude-opus-4-6": 0.30,
    "claude-sonnet-4-5": 0.30,
    "gemini-3-pro": 0.20,
    "deepseek-v3.2": 0.30,
}

def calculate_cost(model: str, input_tokens: int, output_tokens: int,
                   use_crazyrouter: bool = False) -> dict:
    """Calculate API cost for a given model and token usage."""
    pricing = MODEL_PRICING[model]
    input_cost = (input_tokens / 1_000_000) * pricing["input"]
    output_cost = (output_tokens / 1_000_000) * pricing["output"]
    total = input_cost + output_cost
    result = {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "input_cost": round(input_cost, 6),
        "output_cost": round(output_cost, 6),
        "total_cost": round(total, 6),
    }
    if use_crazyrouter and model in CRAZYROUTER_DISCOUNT:
        discount = CRAZYROUTER_DISCOUNT[model]
        cr_total = total * (1 - discount)
        result["crazyrouter_cost"] = round(cr_total, 6)
        result["savings"] = round(total - cr_total, 6)
    return result

# Example: calculate cost for a coding assistant session
session = calculate_cost(
    model="claude-opus-4-6",
    input_tokens=50_000,   # ~37K words of context
    output_tokens=10_000,  # ~7.5K words of output
    use_crazyrouter=True,
)
print(f"Official cost: ${session['total_cost']:.4f}")
print(f"Crazyrouter cost: ${session['crazyrouter_cost']:.4f}")
print(f"Savings: ${session['savings']:.4f}")
# Official cost: $1.5000
# Crazyrouter cost: $1.0500
# Savings: $0.4500
```
### Monthly Cost Estimator
```python
def estimate_monthly_cost(model: str, requests_per_day: int,
                          avg_input_tokens: int, avg_output_tokens: int,
                          use_crazyrouter: bool = False) -> dict:
    """Estimate monthly API costs (assumes a 30-day month)."""
    monthly_requests = requests_per_day * 30
    total_input = monthly_requests * avg_input_tokens
    total_output = monthly_requests * avg_output_tokens
    result = calculate_cost(model, total_input, total_output, use_crazyrouter)
    result["monthly_requests"] = monthly_requests
    result["total_input_tokens"] = total_input
    result["total_output_tokens"] = total_output
    return result

# Estimate for a SaaS product with 1,000 daily API calls
estimate = estimate_monthly_cost(
    model="claude-sonnet-4-5",
    requests_per_day=1000,
    avg_input_tokens=2000,
    avg_output_tokens=500,
    use_crazyrouter=True,
)
print(f"Monthly requests: {estimate['monthly_requests']:,}")
print(f"Official monthly cost: ${estimate['total_cost']:.2f}")
print(f"Crazyrouter monthly cost: ${estimate['crazyrouter_cost']:.2f}")
print(f"Monthly savings: ${estimate['savings']:.2f}")
# Monthly requests: 30,000
# Official monthly cost: $405.00
# Crazyrouter monthly cost: $283.50
# Monthly savings: $121.50
```
## 7 Strategies to Optimize AI API Costs

### 1. Model Routing — Use the Right Model for Each Task
Not every request needs a frontier model. Route simple tasks to cheaper models:
```python
def smart_route(task_complexity: str, messages: list) -> str:
    """Route to the most cost-effective model based on task complexity."""
    routing_map = {
        "simple": "gemini-2.5-flash",    # $0.15/$0.60 per 1M
        "medium": "claude-sonnet-4-5",   # $3/$15 per 1M
        "complex": "claude-opus-4-6",    # $15/$75 per 1M
        "long_context": "gemini-3-pro",  # $7/$21 per 1M, 2M context
    }
    return routing_map.get(task_complexity, "claude-sonnet-4-5")
```
Potential savings: 60-80% on mixed workloads.
### 2. Prompt Caching — Reuse Common Context
Most providers price cached input at roughly a 75% discount:

```python
# Instead of resending the full system prompt on every call, mark the
# repeated context for caching. cache_control follows Anthropic's
# content-block format; exact field placement varies by provider.
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": long_system_prompt,  # this prefix gets cached
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": user_query},
    ],
)
# Cached input: $0.75/1M instead of $3.00/1M = 75% savings on the system prompt
```
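What caching is actually worth depends on your hit rate. The blended input cost can be sketched as follows (`blended_input_cost` is a hypothetical helper; prices are USD per 1M tokens):

```python
def blended_input_cost(tokens_per_call: int, calls: int, cache_hit_rate: float,
                       input_price: float, cached_price: float) -> float:
    """Blended input cost in USD when a fraction of calls hit the prompt cache."""
    cached = calls * cache_hit_rate
    fresh = calls - cached
    return (fresh * tokens_per_call / 1e6) * input_price + \
           (cached * tokens_per_call / 1e6) * cached_price

# 10,000 calls with a 2,000-token cached system prompt on Claude Sonnet 4.5
# ($3.00 fresh, $0.75 cached) at a 90% cache hit rate:
print(blended_input_cost(2_000, 10_000, 0.9, 3.00, 0.75))  # 19.5 (vs $60 uncached)
```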
### 3. Token Optimization — Reduce Waste
```python
# BAD: verbose prompt (wastes tokens)
prompt_bad = """
I would like you to please help me write a Python function.
The function should take a list of numbers as input and return
the sum of all even numbers in the list. Please make sure to
include proper error handling and type hints. Thank you!
"""

# GOOD: concise prompt (saves ~40% of the tokens)
prompt_good = """
Write a Python function: sum of even numbers from a list.
Include type hints and error handling.
"""
```
### 4. Batch Processing — Reduce Overhead
```python
# Instead of 100 individual API calls, batch related items
items_to_analyze = ["item1", "item2", "item3", ...]

# BAD: one call per item
for item in items_to_analyze:
    response = client.chat.completions.create(
        model="claude-sonnet-4-5",
        messages=[{"role": "user", "content": f"Analyze: {item}"}],
    )

# GOOD: batch multiple items in one call
batch_prompt = "Analyze each item and return a JSON array:\n" + "\n".join(items_to_analyze)
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": batch_prompt}],
    response_format={"type": "json_object"},
)
```
### 5. Response Length Control
```python
# Set max_tokens to prevent runaway responses
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Summarize this article."}],
    max_tokens=500,  # cap output at ~375 words
)
```
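A cap also puts a hard ceiling on worst-case output spend per call. A minimal sketch of that bound (`max_output_cost` is a hypothetical helper; prices are USD per 1M tokens):

```python
def max_output_cost(max_tokens: int, output_price_per_1m: float) -> float:
    """Worst-case output cost in USD for one call with a max_tokens cap."""
    return (max_tokens / 1e6) * output_price_per_1m

# GPT-5.2 output at $30 per 1M tokens, capped at 500 tokens:
print(f"${max_output_cost(500, 30.0):.4f}")  # $0.0150
```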
### 6. Caching Responses Locally
```python
import hashlib
import json
import os

def cached_completion(client, model, messages, **kwargs):
    """Cache API responses locally to avoid paying for duplicate calls."""
    # Sort keys so identical requests always produce the same hash
    cache_key = hashlib.md5(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    os.makedirs(".cache", exist_ok=True)
    cache_file = f".cache/{cache_key}.json"
    try:
        with open(cache_file) as f:
            return json.load(f)
    except FileNotFoundError:
        response = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
        result = response.choices[0].message.content
        with open(cache_file, "w") as f:
            json.dump(result, f)
        return result
```
### 7. Use Crazyrouter for Automatic Savings

The simplest optimization: route all API calls through Crazyrouter for automatic 20-30% savings with a one-line change:
```python
# Just change the base URL — everything else stays the same
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1",
)
# Instant 20-30% savings on every API call
```
## Real-World Cost Scenarios

### Scenario 1: AI Chatbot (B2C SaaS)
| Metric | Value |
|---|---|
| Daily active users | 5,000 |
| Messages per user/day | 10 |
| Avg input tokens | 1,500 |
| Avg output tokens | 400 |
| Model | Claude Sonnet 4.5 |
Monthly cost (official): $2,700
Monthly cost (Crazyrouter): $1,890
Annual savings: $9,720
### Scenario 2: Code Review Tool (Developer Tool)
| Metric | Value |
|---|---|
| Daily reviews | 500 |
| Avg input tokens | 8,000 (code context) |
| Avg output tokens | 2,000 (review comments) |
| Model | Claude Opus 4.6 |
Monthly cost (official): $4,050
Monthly cost (Crazyrouter): $2,835
Annual savings: $14,580
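Scenario 2's figures follow directly from the basic formula. A self-contained check, using the Claude Opus 4.6 prices from the table above:

```python
# Reproduce Scenario 2 (code review tool) from the basic cost formula
OPUS_INPUT, OPUS_OUTPUT = 15.0, 75.0  # USD per 1M tokens, Claude Opus 4.6

reviews = 500 * 30                    # monthly reviews
input_tokens = reviews * 8_000        # code context
output_tokens = reviews * 2_000       # review comments

official = (input_tokens / 1e6) * OPUS_INPUT + (output_tokens / 1e6) * OPUS_OUTPUT
crazyrouter = official * 0.70         # 30% discount
print(f"Official: ${official:,.2f}")                             # Official: $4,050.00
print(f"Crazyrouter: ${crazyrouter:,.2f}")                       # Crazyrouter: $2,835.00
print(f"Annual savings: ${(official - crazyrouter) * 12:,.2f}")  # Annual savings: $14,580.00
```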
### Scenario 3: Document Processing Pipeline
| Metric | Value |
|---|---|
| Documents per day | 500 |
| Avg input tokens | 20,000 |
| Avg output tokens | 1,000 |
| Model | Gemini 2.5 Flash |
Monthly cost (official): $54.00
Monthly cost (Crazyrouter): $37.80
Annual savings: $194.40
## Frequently Asked Questions

### How do I count tokens before making an API call?
Use the tiktoken library for OpenAI models or Anthropic's token counting API. For a quick estimate, divide your character count by 4 (English) or 2 (Chinese).
### Which AI model gives the best value for money?
For most tasks, Gemini 2.5 Flash ($0.15 input / $0.60 output per 1M tokens) offers the best price-to-performance ratio. For complex tasks requiring frontier intelligence, Claude Sonnet 4.5 at $3/$15 per 1M tokens is the sweet spot.
### How can I reduce AI API costs without sacrificing quality?
Use model routing (cheap models for simple tasks, expensive models for complex ones), prompt caching, and an API gateway like Crazyrouter for automatic discounts.
### What's the cheapest way to access GPT-5 and Claude?
Through Crazyrouter, which offers 20-30% discounts on all major models with a single API key and OpenAI-compatible format.
### How much does it cost to run an AI chatbot?
It depends on traffic and model choice. A chatbot with 5,000 daily users on Claude Sonnet 4.5 runs into the low thousands of dollars per month at official prices (see Scenario 1 above).
## Summary
Understanding and optimizing AI API costs is crucial for building sustainable AI products. The key strategies are: use model routing for mixed workloads, leverage prompt caching, optimize prompts for conciseness, and use Crazyrouter for automatic 20-30% savings across 300+ models.
Start optimizing today: Sign up at Crazyrouter and cut your AI API costs immediately.


