
Claude Sonnet 4.5 Pricing Explained — Caching, Batch API, and How to Save 45% with Crazyrouter
Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) is Anthropic's previous-generation Sonnet model. Even after the release of Claude Sonnet 4.6, Sonnet 4.5 remains one of the most widely deployed models in production — powering everything from customer support chatbots to complex code generation pipelines. It shares the same pricing tier as its successor, making it a reliable and well-understood choice for teams that haven't yet migrated.
But understanding the full pricing picture goes far beyond the base per-token rates. Between prompt caching tiers, Batch API discounts, data residency multipliers, and third-party routing options, the actual cost of running Sonnet 4.5 can vary dramatically depending on how you use it.
This guide breaks down every pricing dimension of Claude Sonnet 4.5, walks through real-world cost scenarios, and shows you how to cut your API bill by up to 45% using Crazyrouter.
Last updated: April 27, 2026.
Base Pricing#
Claude Sonnet 4.5 follows Anthropic's standard Sonnet-tier pricing:
| Component | Price per Million Tokens (MTok) |
|---|---|
| Input tokens | $3.00 |
| Output tokens | $15.00 |
A few things to note:
- Input tokens include your system prompt, user messages, tool definitions, and any conversation history you pass in.
- Output tokens are everything the model generates — the assistant's response, tool calls, and any chain-of-thought reasoning.
- Output tokens cost 5× more than input tokens, which means optimizing your output length has an outsized impact on your bill.
For context, 1 million tokens is roughly 750,000 words — about 10 full-length novels. A typical API call with a moderate system prompt and a few-turn conversation might use 2,000–5,000 input tokens and 500–2,000 output tokens.
Quick cost example: A single request with 3,000 input tokens and 1,000 output tokens costs:
Input: 3,000 / 1,000,000 × $3.00 = $0.009
Output: 1,000 / 1,000,000 × $15.00 = $0.015
Total: $0.024 per request
At 10,000 requests per day, that's $7,200/month before any optimizations.
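The same arithmetic as a small Python helper, so you can plug in your own token counts. The prices are the Sonnet 4.5 list rates from the table above. The function names and the 30-day month are illustrative assumptions.

```python
# Sonnet 4.5 list prices in USD per million tokens (MTok), from the table above.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at base rates (no caching, no batch discount)."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    )


def monthly_cost(input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """USD cost for a month of identical requests (30-day month assumed)."""
    return request_cost(input_tokens, output_tokens) * requests_per_day * days


print(f"${request_cost(3_000, 1_000):.3f} per request")         # prints "$0.024 per request"
print(f"${monthly_cost(3_000, 1_000, 10_000):,.0f} per month")  # prints "$7,200 per month"
```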
Prompt Caching — The Biggest Cost Lever#
Prompt caching is where the real savings happen. If you're sending the same system prompt, tool definitions, or few-shot examples across multiple requests, you're paying full price for identical tokens every single time — unless you enable caching.
Anthropic offers two caching tiers for Claude Sonnet 4.5:

Cache Pricing Breakdown#
| Operation | Price per MTok | Multiplier vs Base Input |
|---|---|---|
| Base input (no cache) | $3.00 | 1.0× |
| 5-minute cache write | $3.75 | 1.25× |
| 1-hour cache write | $6.00 | 2.0× |
| Cache hit (read) | $0.30 | 0.1× |
Here's how it works:
- Cache write — The first request with a cacheable prefix pays a write premium (1.25× for 5-minute TTL, or 2.0× for 1-hour TTL).
- Cache hit — Subsequent requests that match the cached prefix pay only $0.30/MTok — a 90% discount on input tokens.
- Cache miss — If the cache expires or the prefix doesn't match, you pay the full write cost again.
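In API responses, these cases appear as separate counters on the `usage` object. `input_tokens` counts tokens after the last cache breakpoint, billed at the base rate. `cache_creation_input_tokens` counts cache writes, and `cache_read_input_tokens` counts cache hits. The following sketch prices one request from those counters using the rates in the table above. `billed_cost` is an illustrative helper, not part of the SDK.

```python
# Sonnet 4.5 prices in USD per million tokens (MTok), from the cache pricing table.
PRICES = {
    "input": 3.00,
    "cache_write_5m": 3.75,
    "cache_write_1h": 6.00,
    "cache_read": 0.30,
    "output": 15.00,
}


def billed_cost(input_tokens: int,
                cache_creation_input_tokens: int,
                cache_read_input_tokens: int,
                output_tokens: int,
                ttl: str = "5m") -> float:
    """USD cost of one request, computed from its usage counters.

    The arguments mirror the Messages API `usage` fields. `ttl` ("5m" or "1h")
    picks the write price; this sketch assumes one TTL per request.
    """
    write_price = PRICES[f"cache_write_{ttl}"]
    return (
        input_tokens * PRICES["input"]
        + cache_creation_input_tokens * write_price
        + cache_read_input_tokens * PRICES["cache_read"]
        + output_tokens * PRICES["output"]
    ) / 1_000_000


# Same request twice: the first writes a 4,000-token prefix, the second hits it.
first = billed_cost(500, 4_000, 0, 800)
second = billed_cost(500, 0, 4_000, 800)
print(f"cache write: ${first:.4f}   cache hit: ${second:.4f}")
```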
Which Cache TTL Should You Choose?#
- 5-minute cache ($3.75/MTok write): Best for bursty workloads — chatbots handling multiple concurrent users, real-time coding assistants, or any scenario where requests come in clusters.
- 1-hour cache ($6.00/MTok write): Best for steady, continuous workloads — background processing pipelines, scheduled tasks, or applications with consistent traffic throughout the hour.
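This simplified model shows why traffic shape matters. It covers one hour of evenly spaced requests that share a cached prefix. Anthropic documents that each cache hit refreshes the TTL at no extra cost, so in this model the cache lapses only when the gap between requests exceeds the TTL. The even spacing and the function name are illustrative assumptions.

```python
from typing import Optional

# Sonnet 4.5 prices in USD per MTok, from the cache pricing table above.
BASE_INPUT = 3.00
CACHE_READ = 0.30
CACHE_WRITE = {"5m": 3.75, "1h": 6.00}
TTL_MINUTES = {"5m": 5, "1h": 60}


def prefix_cost_per_hour(prefix_tokens: int, gap_minutes: float,
                         ttl: Optional[str]) -> float:
    """USD spent on the shared prefix over one hour of evenly spaced requests.

    ttl=None means caching is disabled. Assumes 0 < gap_minutes <= 60.
    The first request writes the cache; a hit refreshes the TTL, so the cache
    only expires when the gap between requests is at least the TTL.
    """
    requests = int(60 // gap_minutes)
    mtok = prefix_tokens / 1_000_000
    if ttl is None:
        return requests * mtok * BASE_INPUT
    if gap_minutes < TTL_MINUTES[ttl]:
        writes, reads = 1, requests - 1   # one write, then every request hits
    else:
        writes, reads = requests, 0       # cache expires between requests
    return mtok * (writes * CACHE_WRITE[ttl] + reads * CACHE_READ)


for gap in (2, 10):
    costs = {ttl or "none": prefix_cost_per_hour(4_000, gap, ttl)
             for ttl in (None, "5m", "1h")}
    print(f"every {gap} min:", {k: round(v, 4) for k, v in costs.items()})
```

With a request every 2 minutes, the 5-minute cache is cheapest. At one request every 10 minutes, the 5-minute cache expires between calls and costs more than no caching at all, while the 1-hour cache still pays off.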
Break-Even Math#
The key question: How many cache hits do you need to break even on the cache write cost?
For 5-minute cache:
Break-even: cache_write_premium / savings_per_hit
Premium per MTok: $3.75 - $3.00 = $0.75
Savings per hit: $3.00 - $0.30 = $2.70
Break-even = $0.75 / $2.70 ≈ 0.28 hits
You break even after just 1 additional cache hit within the 5-minute window. Prefix size doesn't affect the ratio: as long as the prefix meets the 1,024-token minimum, making 2 or more requests with it within 5 minutes saves money.
For 1-hour cache:
Premium per MTok: $6.00 - $3.00 = $3.00
Savings per hit: $3.00 - $0.30 = $2.70
Break-even = $3.00 / $2.70 ≈ 1.11 hits
You need 2 cache hits within the hour to break even. For any application making more than 2 requests per hour with the same prefix, the 1-hour cache pays for itself.
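This helper applies the same break-even arithmetic to either write price. It returns the fractional break-even point and the smallest whole number of hits that actually saves money. `break_even` is an illustrative name, and the prices are the Sonnet 4.5 rates above.

```python
import math

BASE_INPUT = 3.00   # USD per MTok
CACHE_READ = 0.30   # USD per MTok


def break_even(write_price: float) -> tuple[float, int]:
    """Return (fractional break-even hits, minimum whole hits that save money)."""
    premium = write_price - BASE_INPUT          # extra paid on the cache write
    savings_per_hit = BASE_INPUT - CACHE_READ   # saved on every later hit
    ratio = premium / savings_per_hit
    # Savings must strictly exceed the premium, so take the next whole number above ratio.
    return ratio, math.floor(ratio) + 1


for label, price in (("5-minute", 3.75), ("1-hour", 6.00)):
    ratio, hits = break_even(price)
    print(f"{label}: break-even at {ratio:.2f} hits, so {hits} hit(s) needed")
```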
Caching Code Example (Python)#
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Note: Sonnet models only cache prefixes of at least 1,024 tokens. This short
# system prompt is a placeholder; a real cached prompt must exceed that minimum,
# or the request is processed normally without caching.
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a senior software engineer assistant. You help with code reviews, debugging, and architecture decisions. Always provide specific, actionable feedback with code examples.",
            "cache_control": {"type": "ephemeral"},  # 5-minute TTL (default)
        }
    ],
    messages=[
        {"role": "user", "content": "Review this Python function for potential issues..."}
    ],
)

# Check cache performance in the response
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache creation: {usage.cache_creation_input_tokens}")
print(f"Cache read (hits): {usage.cache_read_input_tokens}")
```
For 1-hour caching, add a `ttl` field to the cache control block:
```python
"cache_control": {"type": "ephemeral", "ttl": "1h"}  # 1-hour TTL
```
Caching Best Practices#
- Put stable content first. System prompts, tool definitions, and few-shot examples belong at the beginning of your request. The cache matches from the start of the prompt, which is assembled in the order tools, system, then messages.
- Minimum cacheable length is 1,024 tokens for Sonnet models. Shorter prefixes won't be cached.
- Monitor your cache hit rate. If it's below 50%, reconsider your caching strategy or switch TTL tiers.
- Combine with conversation management. For multi-turn chats, cache the system prompt + tool definitions, and append new turns outside the cached prefix.
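One way to act on the hit-rate advice above is to aggregate the cache counters from each response's `usage` across many requests. This sketch defines hit rate as cache-read tokens divided by all tokens routed through the cache (reads plus writes). That definition and the `UsageRecord` type are illustrative assumptions, and you may prefer a per-request measure. With real responses, treat a missing or `None` counter as 0.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageRecord:
    """Hypothetical stand-in for the two cache counters on a `usage` object."""
    cache_creation_input_tokens: int
    cache_read_input_tokens: int


def cache_hit_rate(records: Iterable[UsageRecord]) -> float:
    """Share of cache-routed prefix tokens that were read from cache, not written."""
    reads = writes = 0
    for record in records:
        reads += record.cache_read_input_tokens
        writes += record.cache_creation_input_tokens
    total = reads + writes
    return reads / total if total else 0.0


log = [
    UsageRecord(2_000, 0),  # cold start: prefix written
    UsageRecord(0, 2_000),  # hit
    UsageRecord(0, 2_000),  # hit
    UsageRecord(2_000, 0),  # cache expired: prefix written again
]
rate = cache_hit_rate(log)
print(f"cache hit rate: {rate:.0%}")
if rate < 0.5:
    print("Below 50%: revisit the prefix layout or the TTL choice.")
```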
Batch API — 50% Off for Async Workloads#
If your workload doesn't need real-time responses, the Batch API cuts your costs in half:
| Component | Standard Price | Batch API Price | Savings |
|---|---|---|---|
| Input tokens | $3.00/MTok | $1.50/MTok | 50% |
| Output tokens | $15.00/MTok | $7.50/MTok | 50% |
The Batch API processes requests asynchronously within a 24-hour window. Requests not completed within 24 hours expire, though most batches finish much faster. It's ideal for:
- Data processing and classification — Categorizing thousands of support tickets or documents.
- Content generation — Generating product descriptions, summaries, or translations in bulk.
- Evaluation and testing — Running model evaluations across large test sets.
- Large-scale analysis — Processing large datasets where latency doesn't matter.
Batch API Example#
```python
import anthropic

client = anthropic.Anthropic()

# Create a batch (Message Batches API)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-001",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ],
            },
        },
        {
            "custom_id": "request-002",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Classify this support ticket: ..."}
                ],
            },
        },
        # ... up to 100,000 requests per batch
    ]
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
```
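In practice, you will usually generate the `requests` list rather than write it by hand. This minimal sketch assumes your prompts are keyed by your own IDs. It builds entries in the shape used above and splits them into chunks of at most 100,000 requests, the per-batch count cited above. Anthropic also enforces a total size limit per batch, which this sketch does not check. The helper names are hypothetical.

```python
from typing import Dict, Iterator, List

MODEL = "claude-sonnet-4-5-20250929"
MAX_REQUESTS_PER_BATCH = 100_000


def build_requests(prompts: Dict[str, str], max_tokens: int = 512) -> List[dict]:
    """Turn {custom_id: prompt} into batch request entries."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": MODEL,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


def chunk(requests: List[dict], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` requests, one slice per batch."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


prompts = {f"ticket-{i:03d}": "Classify this support ticket: ..." for i in range(5)}
batches = list(chunk(build_requests(prompts), size=2))  # tiny size just for the demo
print([len(b) for b in batches])  # prints [2, 2, 1]
```

You can then submit each chunk with `client.messages.batches.create(requests=...)`.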
Pro tip: You can combine Batch API with prompt caching for even deeper savings. Cached input tokens in a batch cost just **$0.15/MTok** (50% off the $0.30 cache hit price).
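This sketch shows how the modifiers combine. It assumes that the Batch API's 50% discount multiplies every token category's price, as the tip above describes for cache hits. The multipliers come from the tables in this article, and the helper name is illustrative.

```python
BASE_INPUT = 3.00  # USD per MTok, Sonnet 4.5

# Multipliers relative to the base input price, from the pricing tables above.
MULTIPLIERS = {
    "input": 1.0,
    "cache_write_5m": 1.25,
    "cache_write_1h": 2.0,
    "cache_read": 0.1,
    "output": 5.0,  # $15.00 output = 5x the $3.00 input price
}
BATCH_MULTIPLIER = 0.5  # assumed to apply to every category


def price_per_mtok(category: str, batch: bool = False) -> float:
    """USD per MTok for a token category, optionally via the Batch API."""
    price = BASE_INPUT * MULTIPLIERS[category]
    return price * BATCH_MULTIPLIER if batch else price


for category in MULTIPLIERS:
    print(f"{category:15} standard ${price_per_mtok(category):6.3f}  "
          f"batch ${price_per_mtok(category, batch=True):6.3f}")
```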
Data Residency — US-Only Processing#
For organizations with data sovereignty requirements, Anthropic offers a US data residency option that guarantees all processing occurs within the United States.
| Component | Standard | US Data Residency |
|---|---|---|
| Input tokens | $3.00/MTok | $3.30/MTok |
| Output tokens | $15.00/MTok | $16.50/MTok |
| Multiplier | 1.0× | 1.1× |
The 10% premium applies to all token types, including cached tokens. This is primarily relevant for healthcare, finance, and government applications subject to US data handling regulations.
Crazyrouter Pricing — Save 45% on Every Request#
Here's where it gets interesting. Crazyrouter offers Claude Sonnet 4.5 at 55% of Anthropic's official pricing — a flat 45% discount on both input and output tokens.

| Component | Anthropic Direct | Crazyrouter | You Save |
|---|---|---|---|
| Input tokens | $3.00/MTok | $1.65/MTok | 45% |
| Output tokens | $15.00/MTok | $8.25/MTok | 45% |
Crazyrouter is a unified API gateway that's fully compatible with both the OpenAI SDK format and Anthropic's native SDK. You switch by changing your base URL and API key; no code rewrite is needed.
OpenAI-Compatible SDK (Python)#
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```
Anthropic-Native SDK (Python)#
```python
import anthropic

# The Anthropic SDK appends /v1/messages itself, so this base URL has no /v1 suffix.
client = anthropic.Anthropic(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com",
)

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
)

print(response.content[0].text)
```
cURL#
```bash
curl -X POST https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 1024
  }'
```
Why Crazyrouter?#
- No code changes — Drop-in replacement. Change the base URL and API key, keep everything else.
- OpenAI-compatible — Works with any tool or framework that supports the OpenAI API format (LangChain, LlamaIndex, Vercel AI SDK, etc.).
- All Claude models — Access Opus, Sonnet, and Haiku across all generations.
- Streaming support — Full SSE streaming, function calling, and tool use.
- Pay-as-you-go — No minimums, no commitments. Top up and start saving.
Real-World Cost Comparison — 3 Scenarios#
Let's put real numbers to three common use cases and compare Anthropic direct pricing vs. Crazyrouter.
Scenario 1: Customer Support Chatbot#
A mid-size SaaS company handling 5,000 conversations/day, averaging 4 turns each.
| Metric | Value |
|---|---|
| Requests/day | 20,000 |
| Avg input tokens/request | 3,500 (with system prompt + history) |
| Avg output tokens/request | 800 |
| Cache hit rate | 70% (system prompt cached) |
| Cacheable prefix | 2,000 tokens |
Monthly cost (Anthropic direct):
Non-cached input: 20,000 × 30 × 1,500 / 1M × $3.00 = $2,700
Cached input: 20,000 × 30 × 0.70 × 2,000 / 1M × $0.30 = $252
Cache writes: 20,000 × 30 × 0.30 × 2,000 / 1M × $3.75 = $1,350
Output: 20,000 × 30 × 800 / 1M × $15.00 = $7,200
Total: $11,502/month
Monthly cost (Crazyrouter at 45% off):
$11,502 × 0.55 = ~$6,326/month
Savings: ~$5,176/month (~$62,111/year)
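Here is the Scenario 1 calculation as a reusable sketch, so you can substitute your own traffic numbers. It assumes a 30-day month and applies the hit rate to the cacheable prefix only: hits pay the read price, and misses pay the 5-minute write price. It also assumes Crazyrouter's 45% discount applies flatly to the whole bill. The `Workload` type and function names are illustrative.

```python
from dataclasses import dataclass

# Sonnet 4.5 list prices in USD per MTok, from the tables above.
INPUT, OUTPUT = 3.00, 15.00
CACHE_WRITE_5M, CACHE_READ = 3.75, 0.30


@dataclass
class Workload:
    requests_per_day: int
    input_tokens: int    # average per request, including the cacheable prefix
    output_tokens: int   # average per request
    prefix_tokens: int   # cacheable prefix per request
    hit_rate: float      # fraction of requests that read the prefix from cache


def monthly_cost(w: Workload, days: int = 30, discount: float = 0.0) -> float:
    """Estimated USD per month; `discount` is a flat fraction taken off the total."""
    requests = w.requests_per_day * days

    def mtok(tokens_per_request: float) -> float:
        # Millions of tokens across the whole month.
        return requests * tokens_per_request / 1_000_000

    cost = (
        mtok(w.input_tokens - w.prefix_tokens) * INPUT               # uncached input
        + mtok(w.prefix_tokens * w.hit_rate) * CACHE_READ            # cache hits
        + mtok(w.prefix_tokens * (1 - w.hit_rate)) * CACHE_WRITE_5M  # misses rewrite the cache
        + mtok(w.output_tokens) * OUTPUT
    )
    return cost * (1 - discount)


support_bot = Workload(20_000, 3_500, 800, 2_000, 0.70)  # Scenario 1 inputs
direct = monthly_cost(support_bot)
routed = monthly_cost(support_bot, discount=0.45)
print(f"direct ${direct:,.0f}  crazyrouter ${routed:,.0f}  saved ${direct - routed:,.0f}")
```

You can model Scenario 3 in the same way: `Workload(2_000, 12_000, 3_000, 8_000, 0.90)`.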
Scenario 2: Code Review Pipeline (Batch API)#
A development team running nightly batch code reviews on 500 pull requests.
| Metric | Value |
|---|---|
| Batch requests/night | 500 |
| Avg input tokens/request | 8,000 (code + context) |
| Avg output tokens/request | 2,000 (detailed review) |
| Frequency | Nightly (30×/month) |
Monthly cost (Anthropic Batch API):
Input: 500 × 30 × 8,000 / 1M × $1.50 = $180
Output: 500 × 30 × 2,000 / 1M × $7.50 = $225
Total: $405/month
Monthly cost (Crazyrouter, standard API at 45% off):
Input: 500 × 30 × 8,000 / 1M × $1.65 = $198
Output: 500 × 30 × 2,000 / 1M × $8.25 = $247.50
Total: $445.50/month
In this case, Anthropic's Batch API is slightly cheaper ($405 vs. $445.50), but Crazyrouter gives you real-time responses instead of waiting up to 24 hours. If latency matters at all, Crazyrouter wins.
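Both options are flat discounts off the same list prices, so this comparison doesn't depend on the workload. For uncached tokens, Batch API pricing is 0.50 / 0.55 ≈ 91% of Crazyrouter's standard price, which makes Batch about 9% cheaper for any token mix. The sketch below checks this with the Scenario 2 numbers (the function name is illustrative):

```python
INPUT, OUTPUT = 3.00, 15.00   # Anthropic list prices, USD per MTok
BATCH = 0.50                  # Batch API: 50% of list price
ROUTER = 0.55                 # Crazyrouter: 55% of list price (standard, real-time)


def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 multiplier: float) -> float:
    """USD cost of `requests` calls at `multiplier` x list price (no caching)."""
    mtok_in = requests * input_tokens / 1_000_000
    mtok_out = requests * output_tokens / 1_000_000
    return (mtok_in * INPUT + mtok_out * OUTPUT) * multiplier


requests = 500 * 30   # 500 pull requests per night for 30 nights
batch = monthly_cost(requests, 8_000, 2_000, BATCH)
router = monthly_cost(requests, 8_000, 2_000, ROUTER)
print(f"batch ${batch:,.2f}  crazyrouter ${router:,.2f}  ratio {batch / router:.3f}")
```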
Scenario 3: High-Volume Content Generation#
A content platform generating 2,000 articles/day with heavy system prompts and few-shot examples.
| Metric | Value |
|---|---|
| Requests/day | 2,000 |
| Avg input tokens/request | 12,000 (system + examples + instructions) |
| Avg output tokens/request | 3,000 (full article) |
| Cache hit rate | 90% (stable system prompt) |
| Cacheable prefix | 8,000 tokens |
Monthly cost (Anthropic direct with caching):
Non-cached input: 2,000 × 30 × 4,000 / 1M × $3.00 = $720
Cached input: 2,000 × 30 × 0.90 × 8,000 / 1M × $0.30 = $129.60
Cache writes: 2,000 × 30 × 0.10 × 8,000 / 1M × $3.75 = $180
Output: 2,000 × 30 × 3,000 / 1M × $15.00 = $2,700
Total: $3,729.60/month
Monthly cost (Crazyrouter at 45% off):
$3,729.60 × 0.55 = ~$2,051/month
Savings: ~$1,678/month (~$20,140/year)
Claude Sonnet 4.5 vs Sonnet 4.6 — Which Should You Use?#
Here's the straightforward comparison:
| Aspect | Sonnet 4.5 | Sonnet 4.6 |
|---|---|---|
| Model ID | claude-sonnet-4-5-20250929 | claude-sonnet-4-6-20250514 |
| Input price | $3.00/MTok | $3.00/MTok |
| Output price | $15.00/MTok | $15.00/MTok |
| Cache pricing | Same | Same |
| Context window | 200K tokens | 200K tokens |
| Max output | 64,000 tokens | 64,000 tokens |
| Status | Previous generation | Current generation |
The pricing is identical. Sonnet 4.6 is the newer model with improved reasoning, better instruction following, and reduced hallucination rates. Unless you have a specific regression concern or a production pipeline that's been extensively validated on 4.5, we recommend upgrading to Sonnet 4.6.
The migration is a one-line change:
```python
# Before
model = "claude-sonnet-4-5-20250929"

# After
model = "claude-sonnet-4-6-20250514"
```
Both models are available on Crazyrouter at the same 45% discount.
Key Takeaways#
- Base pricing is $3 / $15 per MTok (input/output). Output tokens cost 5× as much as input tokens, so optimize response length first.
- Prompt caching is a must for any production workload. The 5-minute cache breaks even after just 1 hit; the 1-hour cache after 2 hits. At a 70–90% hit rate on the 5-minute cache, the cacheable prefix costs roughly 55–79% less than uncached input.
- Batch API saves 50% but adds latency (up to 24 hours). Use it for offline processing where real-time responses aren't needed.
- Data residency adds 10%. Only pay this if you have a regulatory requirement.
- Crazyrouter saves 45% across the board with only a base URL and API key change. For a typical production workload, that's $60,000+ in annual savings.
- Sonnet 4.5 and 4.6 are priced identically. Upgrade to 4.6 for better performance at no extra cost.
Start Saving Today#
Getting started with Crazyrouter takes about 2 minutes:
- Sign up at crazyrouter.com
- Get your API key from the dashboard
- Change your base URL to https://crazyrouter.com/v1 (or https://crazyrouter.com when using the Anthropic SDK)
- That's it. Every Claude API call now costs 45% less.
No contracts. No minimums. No vendor lock-in. Just cheaper tokens.
→ Get your API key at crazyrouter.com
Disclaimer: Pricing information is accurate as of April 27, 2026. Anthropic may update their pricing at any time. Always verify current rates on Anthropic's official pricing page and Crazyrouter's pricing page. The cost scenarios presented are estimates based on typical usage patterns and may vary depending on your specific implementation. Crazyrouter is an independent API gateway and is not affiliated with Anthropic.





