Claude Sonnet 4.5 Pricing Explained — Caching, Batch API, and How to Save 45% with Crazyrouter

Crazyrouter Team
April 27, 2026

Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) is Anthropic's previous-generation Sonnet model. Even after the release of Claude Sonnet 4.6, Sonnet 4.5 remains one of the most widely deployed models in production — powering everything from customer support chatbots to complex code generation pipelines. It shares the same pricing tier as its successor, making it a reliable and well-understood choice for teams that haven't yet migrated.

But understanding the full pricing picture goes far beyond the base per-token rates. Between prompt caching tiers, Batch API discounts, data residency multipliers, and third-party routing options, the actual cost of running Sonnet 4.5 can vary dramatically depending on how you use it.

This guide breaks down every pricing dimension of Claude Sonnet 4.5, walks through real-world cost scenarios, and shows you how to cut your API bill by up to 45% using Crazyrouter.

Last updated: April 27, 2026.


Base Pricing

Claude Sonnet 4.5 follows Anthropic's standard Sonnet-tier pricing:

| Component | Price per Million Tokens (MTok) |
| --- | --- |
| Input tokens | $3.00 |
| Output tokens | $15.00 |

A few things to note:

  • Input tokens include your system prompt, user messages, tool definitions, and any conversation history you pass in.
  • Output tokens are everything the model generates — the assistant's response, tool calls, and any chain-of-thought reasoning.
  • Output tokens cost 5× more than input tokens, which means optimizing your output length has an outsized impact on your bill.

For context, 1 million tokens is roughly 750,000 words — about 10 full-length novels. A typical API call with a moderate system prompt and a few-turn conversation might use 2,000–5,000 input tokens and 500–2,000 output tokens.

Quick cost example: A single request with 3,000 input tokens and 1,000 output tokens costs:

code
Input:  3,000 / 1,000,000 × $3.00  = $0.009
Output: 1,000 / 1,000,000 × $15.00 = $0.015
Total:  $0.024 per request

At 10,000 requests per day, that's $240/day, or roughly $7,200/month, before any optimizations.
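
If you want to rerun this arithmetic for your own traffic, a minimal sketch might look like the following. The rates come from the base pricing table above; the function names `request_cost` and `monthly_cost` are hypothetical, and the 30-day month is an assumption.

```python
INPUT_PRICE_PER_MTOK = 3.00    # USD, base input rate from the pricing table
OUTPUT_PRICE_PER_MTOK = 15.00  # USD, base output rate from the pricing table


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at base (uncached, non-batch) rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def monthly_cost(input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Scale a per-request cost to a month of steady traffic (assumes 30 days)."""
    return request_cost(input_tokens, output_tokens) * requests_per_day * days


per_request = request_cost(3_000, 1_000)
print(f"Per request: ${per_request:.3f}")
print(f"Per month:   ${monthly_cost(3_000, 1_000, 10_000):,.2f}")
```

Swap in your own token counts and request volume to get a baseline before applying caching or batch discounts.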


Prompt Caching: The Biggest Cost Lever

Prompt caching is where the real savings happen. If you're sending the same system prompt, tool definitions, or few-shot examples across multiple requests, you're paying full price for identical tokens every single time — unless you enable caching.

Anthropic offers two caching tiers for Claude Sonnet 4.5:

[Figure: prompt caching flow diagram showing cache write, cache hit, and expiration]

Cache Pricing Breakdown

| Operation | Price per MTok | Multiplier vs Base Input |
| --- | --- | --- |
| Base input (no cache) | $3.00 | 1.0× |
| 5-minute cache write | $3.75 | 1.25× |
| 1-hour cache write | $6.00 | 2.0× |
| Cache hit (read) | $0.30 | 0.1× |

Here's how it works:

  1. Cache write — The first request with a cacheable prefix pays a write premium (1.25× for 5-minute TTL, or 2.0× for 1-hour TTL).
  2. Cache hit — Subsequent requests that match the cached prefix pay only $0.30/MTok — a 90% discount on input tokens.
  3. Cache miss — If the cache expires or the prefix doesn't match, you pay the full write cost again.
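
The write, hit, and miss cycle above can be sketched as a small simulation. This is a simplified model, not Anthropic's implementation: it assumes a cache entry lives for a fixed TTL and that every hit restarts that timer, which you should confirm against Anthropic's documentation. The function name `simulate_prefix_cost` and the request timestamps are hypothetical.

```python
BASE_INPUT = 3.00   # USD per MTok, uncached input
CACHE_READ = 0.30   # USD per MTok, cache hit
WRITE_PRICE = {300: 3.75, 3600: 6.00}  # TTL in seconds -> write price per MTok


def simulate_prefix_cost(request_times, prefix_tokens, ttl_seconds=300):
    """Return (writes, hits, cost_usd) for the cacheable prefix only.

    Assumption: each use refreshes the TTL, so the entry expires
    ttl_seconds after its most recent request.
    """
    writes = hits = 0
    expires_at = None
    for t in sorted(request_times):
        if expires_at is not None and t < expires_at:
            hits += 1      # prefix found in cache: pay the read rate
        else:
            writes += 1    # first request or expired entry: pay the write rate
        expires_at = t + ttl_seconds
    cost = (writes * WRITE_PRICE[ttl_seconds] + hits * CACHE_READ) * prefix_tokens / 1_000_000
    return writes, hits, cost


# Three requests in a burst, then one after the 5-minute entry has expired.
times = [0, 60, 120, 900]
writes, hits, cost = simulate_prefix_cost(times, prefix_tokens=4_000)
uncached = len(times) * BASE_INPUT * 4_000 / 1_000_000
print(f"writes={writes} hits={hits} cached=${cost:.4f} uncached=${uncached:.4f}")
```

Feeding in timestamps from your own request logs gives a quick feel for how often a prefix would be rewritten under each TTL.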

Which Cache TTL Should You Choose?

  • 5-minute cache ($3.75/MTok write): Best for bursty workloads — chatbots handling multiple concurrent users, real-time coding assistants, or any scenario where requests come in clusters.
  • 1-hour cache ($6.00/MTok write): Best for steady, continuous workloads — background processing pipelines, scheduled tasks, or applications with consistent traffic throughout the hour.

Break-Even Math

The key question: How many cache hits do you need to break even on the cache write cost?

For 5-minute cache:

code
Break-even: cache_write_premium / savings_per_hit
Premium per MTok:    $3.75 - $3.00 = $0.75
Savings per hit:     $3.00 - $0.30 = $2.70

Break-even = $0.75 / $2.70 ≈ 0.28 hits

You break even after just 1 additional cache hit within the 5-minute window. If your cached prefix is 4,000 tokens and you make 2+ requests in 5 minutes, caching saves money.

For 1-hour cache:

code
Premium per MTok:    $6.00 - $3.00 = $3.00
Savings per hit:     $3.00 - $0.30 = $2.70

Break-even = $3.00 / $2.70 ≈ 1.11 hits

You need 2 cache hits within the hour to break even. In other words, the 1-hour cache pays for itself once the same prefix is requested at least 3 times per hour (one write plus two hits).
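
The break-even arithmetic for both tiers can be checked with a short script. This is only a sketch using the prices from the cache pricing table; the helper names `break_even_hits` and `net_savings_per_mtok` are hypothetical.

```python
import math

BASE_INPUT = 3.00   # USD per MTok, uncached input
CACHE_READ = 0.30   # USD per MTok, cache hit


def break_even_hits(write_price: float) -> int:
    """Smallest whole number of cache hits that repays the write premium."""
    premium = write_price - BASE_INPUT
    savings_per_hit = BASE_INPUT - CACHE_READ
    return math.ceil(premium / savings_per_hit)


def net_savings_per_mtok(write_price: float, hits: int) -> float:
    """Savings per MTok of prefix vs. sending it uncached (1 write + `hits` reads)."""
    uncached = (1 + hits) * BASE_INPUT
    cached = write_price + hits * CACHE_READ
    return uncached - cached


for label, write_price in [("5-minute", 3.75), ("1-hour", 6.00)]:
    hits_needed = break_even_hits(write_price)
    print(f"{label}: break even after {hits_needed} hit(s), "
          f"net ${net_savings_per_mtok(write_price, hits_needed):.2f}/MTok at that point")
```

A negative result from `net_savings_per_mtok` means caching costs more than it saves at that hit count.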

Caching Code Example (Python)

python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            # Placeholder text: a real prefix must meet the minimum cacheable
            # length (1,024 tokens for Sonnet models) or it will not be cached.
            "text": "You are a senior software engineer assistant. You help with code reviews, debugging, and architecture decisions. Always provide specific, actionable feedback with code examples.",
            "cache_control": {"type": "ephemeral"}  # 5-minute TTL
        }
    ],
    messages=[
        {"role": "user", "content": "Review this Python function for potential issues..."}
    ]
)

# Check cache performance in the response
usage = response.usage
print(f"Input tokens:        {usage.input_tokens}")
print(f"Cache creation:      {usage.cache_creation_input_tokens}")
print(f"Cache read (hits):   {usage.cache_read_input_tokens}")

For 1-hour caching, add a ttl field to cache_control:

python
"cache_control": {"type": "ephemeral", "ttl": "1h"}  # 1-hour TTL

Caching Best Practices

  • Put stable content first. System prompts, tool definitions, and few-shot examples should be at the beginning of your message — the cache matches from the start of the prefix.
  • Minimum cacheable length is 1,024 tokens for Sonnet models. Shorter prefixes won't be cached.
  • Monitor your cache hit rate. If it's below 50%, reconsider your caching strategy or switch TTL tiers.
  • Combine with conversation management. For multi-turn chats, cache the system prompt + tool definitions, and append new turns outside the cached prefix.
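
To act on the hit-rate advice above, you could aggregate the usage fields returned with each response. The sketch below is one possible definition, not an official metric: it treats the hit rate as cache-read tokens divided by all cacheable tokens (reads plus writes). The function name `cache_hit_rate` is hypothetical, and the sample objects stand in for real `response.usage` values.

```python
from types import SimpleNamespace


def cache_hit_rate(usages) -> float:
    """Share of cacheable prefix tokens that were read from cache.

    Each item needs cache_read_input_tokens and cache_creation_input_tokens,
    the usage fields printed in the caching example.
    """
    read = sum(u.cache_read_input_tokens or 0 for u in usages)
    written = sum(u.cache_creation_input_tokens or 0 for u in usages)
    total = read + written
    return read / total if total else 0.0


# Stand-in usage objects; in practice, collect response.usage from each call.
sample = [
    SimpleNamespace(cache_read_input_tokens=0, cache_creation_input_tokens=2000),
    SimpleNamespace(cache_read_input_tokens=2000, cache_creation_input_tokens=0),
    SimpleNamespace(cache_read_input_tokens=2000, cache_creation_input_tokens=0),
]
rate = cache_hit_rate(sample)
print(f"Cache hit rate: {rate:.0%}")
if rate < 0.5:
    print("Below 50%: revisit prefix stability or TTL choice")
```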

Batch API: 50% Off for Async Workloads

If your workload doesn't need real-time responses, the Batch API cuts your costs in half:

| Component | Standard Price | Batch API Price | Savings |
| --- | --- | --- | --- |
| Input tokens | $3.00/MTok | $1.50/MTok | 50% |
| Output tokens | $15.00/MTok | $7.50/MTok | 50% |

The Batch API processes requests asynchronously, with a completion window of up to 24 hours (though most batches finish much faster). It's ideal for:

  • Data processing and classification: Categorizing thousands of support tickets or documents.
  • Content generation: Generating product descriptions, summaries, or translations in bulk.
  • Evaluation and testing: Running model evaluations across large test sets.
  • Large-scale analysis: Processing large datasets where latency doesn't matter.

Batch API Example

python
import anthropic

client = anthropic.Anthropic()

# Create a batch (the Message Batches API lives under client.messages)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-001",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ]
            }
        },
        {
            "custom_id": "request-002",
            "params": {
                "model": "claude-sonnet-4-5-20250929",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": "Classify this support ticket: ..."}
                ]
            }
        }
        # ... up to 100,000 requests per batch
    ]
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
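
For more than a handful of requests, you would normally build the request list programmatically rather than by hand. A minimal sketch, assuming each item is a single user prompt: the helper name `build_batch_requests` is hypothetical, and the entry layout simply mirrors the example above.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-5-20250929", max_tokens=512):
    """Turn a list of prompt strings into batch request entries.

    custom_id values follow the request-001 pattern from the example so
    results can be matched back to their inputs.
    """
    return [
        {
            "custom_id": f"request-{i:03d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts, start=1)
    ]


batch_requests = build_batch_requests([
    "Summarize this article: ...",
    "Classify this support ticket: ...",
])
print([r["custom_id"] for r in batch_requests])
```

The resulting list can be passed as the `requests` argument when creating the batch.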

Pro tip: You can combine Batch API with prompt caching for even deeper savings. Cached input tokens in a batch cost just $0.15/MTok (50% off the $0.30 cache hit price).
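
To see how the two discounts stack, here is a rough cost sketch for a batch job. It uses the rates quoted above and, as a simplifying assumption, ignores cache-write charges; the names `batch_price` and `batch_job_cost` are hypothetical.

```python
BATCH_DISCOUNT = 0.50  # Batch API: 50% off standard rates

STANDARD = {"input": 3.00, "output": 15.00, "cache_read": 0.30}  # USD per MTok


def batch_price(component: str) -> float:
    """Per-MTok price for a token type when the request runs in a batch."""
    return STANDARD[component] * (1 - BATCH_DISCOUNT)


def batch_job_cost(input_tokens, output_tokens, cached_tokens=0):
    """Cost in USD of a batch job; cached_tokens are input tokens read from cache.

    Simplification: cache-write charges are not modelled.
    """
    uncached = input_tokens - cached_tokens
    return (uncached * batch_price("input")
            + cached_tokens * batch_price("cache_read")
            + output_tokens * batch_price("output")) / 1_000_000


print(f"Cache read in batch: ${batch_price('cache_read'):.2f}/MTok")
print(f"No caching:   ${batch_job_cost(4_000_000, 1_000_000):.2f}")
print(f"Half cached:  ${batch_job_cost(4_000_000, 1_000_000, cached_tokens=2_000_000):.2f}")
```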


Data Residency: US-Only Processing

For organizations with data sovereignty requirements, Anthropic offers a US data residency option that guarantees all processing occurs within the United States.

| Component | Standard | US Data Residency |
| --- | --- | --- |
| Input tokens | $3.00/MTok | $3.30/MTok |
| Output tokens | $15.00/MTok | $16.50/MTok |
| Multiplier | 1.0× | 1.1× |

The 10% premium applies to all token types, including cached tokens. This is primarily relevant for healthcare, finance, and government applications subject to US data handling regulations.
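
Because the premium applies to every token type, the full residency price list follows from one multiplier. A small sketch, using the rates from the tables in this guide; the dictionary keys and the function name `effective_prices` are hypothetical.

```python
US_RESIDENCY_MULTIPLIER = 1.1  # 10% premium from the table above

BASE_PRICES = {  # USD per MTok, from the pricing tables in this guide
    "input": 3.00,
    "output": 15.00,
    "cache_write_5m": 3.75,
    "cache_write_1h": 6.00,
    "cache_read": 0.30,
}


def effective_prices(us_only: bool = False) -> dict:
    """Per-MTok prices, with the residency premium applied to every token type."""
    factor = US_RESIDENCY_MULTIPLIER if us_only else 1.0
    return {name: round(price * factor, 4) for name, price in BASE_PRICES.items()}


for name, price in effective_prices(us_only=True).items():
    print(f"{name:15s} ${price:.3f}/MTok")
```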


Crazyrouter Pricing: Save 45% on Every Request

Here's where it gets interesting. Crazyrouter offers Claude Sonnet 4.5 at 55% of Anthropic's official pricing: a flat 45% discount on both input and output tokens.

[Figure: cost comparison chart showing Anthropic direct vs. Crazyrouter pricing]

| Component | Anthropic Direct | Crazyrouter | You Save |
| --- | --- | --- | --- |
| Input tokens | $3.00/MTok | $1.65/MTok | 45% |
| Output tokens | $15.00/MTok | $8.25/MTok | 45% |

Crazyrouter is a unified API gateway that's fully compatible with both the OpenAI SDK format and Anthropic's native SDK. You switch by changing your base URL — no code rewrite needed.

OpenAI-Compatible SDK (Python)

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)

Anthropic-Native SDK (Python)

python
import anthropic

client = anthropic.Anthropic(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com"
)

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.content[0].text)

cURL

bash
curl -X POST https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 1024
  }'

Why Crazyrouter?

  • No code changes — Drop-in replacement. Change the base URL and API key, keep everything else.
  • OpenAI-compatible — Works with any tool or framework that supports the OpenAI API format (LangChain, LlamaIndex, Vercel AI SDK, etc.).
  • All Claude models — Access Opus, Sonnet, and Haiku across all generations.
  • Streaming support — Full SSE streaming, function calling, and tool use.
  • Pay-as-you-go — No minimums, no commitments. Top up and start saving.

Real-World Cost Comparison: 3 Scenarios

Let's put real numbers to three common use cases and compare Anthropic direct pricing vs. Crazyrouter.

Scenario 1: Customer Support Chatbot

A mid-size SaaS company handling 5,000 conversations/day, averaging 4 turns each.

| Metric | Value |
| --- | --- |
| Requests/day | 20,000 |
| Avg input tokens/request | 3,500 (with system prompt + history) |
| Avg output tokens/request | 800 |
| Cache hit rate | 70% (system prompt cached) |
| Cacheable prefix | 2,000 tokens |

Monthly cost (Anthropic direct), with cache reads weighted by the 70% hit rate and cache writes by the remaining 30%:

code
Non-cached input:  20,000 × 30 × 1,500 / 1M × $3.00          = $2,700
Cached input:      20,000 × 30 × 0.70 × 2,000 / 1M × $0.30   = $252
Cache writes:      20,000 × 30 × 0.30 × 2,000 / 1M × $3.75   = $1,350
Output:            20,000 × 30 × 800 / 1M × $15.00           = $7,200
Total: $11,502/month

Monthly cost (Crazyrouter at 45% off):

code
$11,502 × 0.55 = ~$6,326/month
Savings: ~$5,176/month (~$62,111/year)

Scenario 2: Code Review Pipeline (Batch API)

A development team running nightly batch code reviews on 500 pull requests.

| Metric | Value |
| --- | --- |
| Batch requests/night | 500 |
| Avg input tokens/request | 8,000 (code + context) |
| Avg output tokens/request | 2,000 (detailed review) |
| Frequency | Nightly (30×/month) |

Monthly cost (Anthropic Batch API):

code
Input:  500 × 30 × 8,000 / 1M × $1.50  = $180
Output: 500 × 30 × 2,000 / 1M × $7.50  = $225
Total: $405/month

Monthly cost (Crazyrouter, standard API at 45% off):

code
Input:  500 × 30 × 8,000 / 1M × $1.65  = $198
Output: 500 × 30 × 2,000 / 1M × $8.25  = $247.50
Total: $445.50/month

In this case, Anthropic's Batch API is slightly cheaper ($405 vs. $445.50), but Crazyrouter gives you real-time responses instead of waiting up to 24 hours. If latency matters at all, Crazyrouter wins.

Scenario 3: High-Volume Content Generation

A content platform generating 2,000 articles/day with heavy system prompts and few-shot examples.

| Metric | Value |
| --- | --- |
| Requests/day | 2,000 |
| Avg input tokens/request | 12,000 (system + examples + instructions) |
| Avg output tokens/request | 3,000 (full article) |
| Cache hit rate | 90% (stable system prompt) |
| Cacheable prefix | 8,000 tokens |

Monthly cost (Anthropic direct with caching), with cache reads weighted by the 90% hit rate and cache writes by the remaining 10%:

code
Non-cached input:  2,000 × 30 × 4,000 / 1M × $3.00           = $720
Cached input:      2,000 × 30 × 0.90 × 8,000 / 1M × $0.30    = $129.60
Cache writes:      2,000 × 30 × 0.10 × 8,000 / 1M × $3.75    = $180
Output:            2,000 × 30 × 3,000 / 1M × $15.00          = $2,700
Total: $3,729.60/month

Monthly cost (Crazyrouter at 45% off):

code
$3,729.60 × 0.55 = ~$2,051/month
Savings: ~$1,678/month (~$20,140/year)
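
To run the same kind of estimate on your own workload, the cached scenarios can be generalized into one function. This is a sketch under stated assumptions, not a billing tool: it assumes the 5-minute cache tier, a 30-day month, and that the hit rate is the share of requests whose prefix is read from cache while the rest pay the write rate. The function name `monthly_cost_with_cache` and its parameters are hypothetical, and the `discount` argument assumes a flat percentage off the total.

```python
PRICES = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}


def monthly_cost_with_cache(requests_per_day, input_tokens, output_tokens,
                            prefix_tokens, hit_rate, days=30, discount=0.0):
    """Estimated monthly cost in USD with a 5-minute cache on the prefix.

    hit_rate: share of requests whose prefix is read from cache; the
    remaining requests pay the cache-write rate for the prefix.
    discount: flat reduction applied to the total (0.45 for 45% off).
    """
    per_request = (
        (input_tokens - prefix_tokens) * PRICES["input"]
        + prefix_tokens * hit_rate * PRICES["cache_read"]
        + prefix_tokens * (1 - hit_rate) * PRICES["cache_write"]
        + output_tokens * PRICES["output"]
    ) / 1_000_000
    return requests_per_day * days * per_request * (1 - discount)


chatbot = dict(requests_per_day=20_000, input_tokens=3_500, output_tokens=800,
               prefix_tokens=2_000, hit_rate=0.70)
direct = monthly_cost_with_cache(**chatbot)
routed = monthly_cost_with_cache(**chatbot, discount=0.45)
print(f"Direct: ${direct:,.2f}/month, with 45% off: ${routed:,.2f}/month")
```

Changing `hit_rate` is a quick way to see how sensitive your bill is to cache misses.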

Claude Sonnet 4.5 vs Sonnet 4.6: Which Should You Use?

Here's the straightforward comparison:

| Aspect | Sonnet 4.5 | Sonnet 4.6 |
| --- | --- | --- |
| Model ID | claude-sonnet-4-5-20250929 | claude-sonnet-4-6 |
| Input price | $3.00/MTok | $3.00/MTok |
| Output price | $15.00/MTok | $15.00/MTok |
| Cache pricing | Same | Same |
| Context window | 200K tokens | 200K tokens |
| Max output | 64,000 tokens | 64,000 tokens |
| Status | Previous generation | Current generation |

The pricing is identical. Sonnet 4.6 is the newer model with improved reasoning, better instruction following, and reduced hallucination rates. Unless you have a specific regression concern or a production pipeline that's been extensively validated on 4.5, we recommend upgrading to Sonnet 4.6.

The migration is a one-line change:

python
# Before
model = "claude-sonnet-4-5-20250929"

# After
model = "claude-sonnet-4-6"
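
If you expect to switch models again, one option is to read the model ID from configuration instead of hard-coding it. A minimal sketch: the environment variable name `CLAUDE_MODEL` and the helper `resolve_model` are hypothetical choices for illustration, not part of any SDK.

```python
import os

DEFAULT_MODEL = "claude-sonnet-4-5-20250929"  # fallback when nothing is configured


def resolve_model(env=os.environ) -> str:
    """Return the model ID from CLAUDE_MODEL (hypothetical variable), else the default."""
    return env.get("CLAUDE_MODEL", DEFAULT_MODEL)


print(resolve_model({}))                                  # falls back to the default
print(resolve_model({"CLAUDE_MODEL": "your-new-model-id"}))  # picks up the override
```

With this in place, a migration or rollback becomes a configuration change rather than a code change.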

Both models are available on Crazyrouter at the same 45% discount.


Key Takeaways

  1. Base pricing is $3/$15 per MTok (input/output). Output tokens are 5× more expensive, so optimize response length first.

  2. Prompt caching is a must for any production workload. The 5-minute cache breaks even after just 1 hit; the 1-hour cache after 2 hits. With the 5-minute tier, a 70% hit rate cuts the cost of the cached prefix by about 55%, and a 90% hit rate by about 78%.

  3. Batch API saves 50% but adds latency (up to 24 hours). Use it for offline processing where real-time responses aren't needed.

  4. Data residency adds 10% — only pay this if you have a regulatory requirement.

  5. Crazyrouter saves 45% across the board with zero code changes. For a typical production workload, that's $5,000–$60,000+ in annual savings.

  6. Sonnet 4.5 and 4.6 are priced identically. Upgrade to 4.6 for better performance at no extra cost.


Start Saving Today

Getting started with Crazyrouter takes about 2 minutes:

  1. Sign up at crazyrouter.com
  2. Get your API key from the dashboard
  3. Change your base URL to https://crazyrouter.com/v1 (OpenAI-compatible SDK) or https://crazyrouter.com (Anthropic SDK), as in the examples above
  4. That's it. Every Claude API call now costs 45% less.

No contracts. No minimums. No vendor lock-in. Just cheaper tokens.

Get your API key at crazyrouter.com


Disclaimer: Pricing information is accurate as of April 27, 2026. Anthropic may update their pricing at any time. Always verify current rates on Anthropic's official pricing page and Crazyrouter's pricing page. The cost scenarios presented are estimates based on typical usage patterns and may vary depending on your specific implementation. Crazyrouter is an independent API gateway and is not affiliated with Anthropic.
