Claude Sonnet 4.6 Pricing Explained — Caching, Tiers, and How to Save 45% with Crazyrouter


Crazyrouter Team
April 27, 2026


Claude Sonnet 4.6 Pricing Guide

Claude Sonnet 4.6 is Anthropic's latest mid-range model — released in February 2026. It sits between the budget-friendly Haiku line and the premium Opus tier, making it the default choice for most production workloads: coding, chat, document analysis, and tool use.

But Anthropic's pricing isn't just "input + output." There's a layered caching system with two TTL tiers, a Batch API discount, and a data residency surcharge that can all stack on top of each other. This guide breaks down every component so you know exactly what you're paying — and how to pay less.

Last updated: April 27, 2026

Disclaimer: Prices in this article are accurate as of the publication date. Anthropic may adjust pricing at any time. Always verify on the official Anthropic pricing page before making production decisions.


Base Token Pricing#

The foundation of Claude Sonnet 4.6 pricing is straightforward:

| Token Type | Price per 1M Tokens |
|---|---|
| Input (base) | $3.00 |
| Output | $15.00 |

That's a 5:1 output-to-input ratio. For every dollar you spend on input tokens, you'd spend five dollars on the same number of output tokens. This ratio matters — if your workload is output-heavy (code generation, long-form writing), output costs will dominate your bill.

Quick reference: what does this actually cost?#

| Workload | Tokens | Cost |
|---|---|---|
| 1 short chat turn (500 in / 200 out) | 700 total | $0.0045 |
| 1 code review (2K in / 1K out) | 3K total | $0.021 |
| 1 document summary (10K in / 2K out) | 12K total | $0.06 |
| 1 hour of chatbot traffic (500K in / 200K out) | 700K total | $4.50 |
| 1 day of heavy API usage (5M in / 2M out) | 7M total | $45.00 |
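The figures in this table follow directly from the two base rates. A minimal sketch of that arithmetic, assuming the $3.00 and $15.00 per-million prices above still apply (the name `base_cost` is illustrative and not part of any SDK):

```python
INPUT_PER_M = 3.00    # USD per 1M input tokens (base rate from the table above)
OUTPUT_PER_M = 15.00  # USD per 1M output tokens


def base_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the uncached, non-batch cost in USD for one workload."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


# Reproduce a few rows of the quick-reference table.
for label, tokens_in, tokens_out in [
    ("short chat turn", 500, 200),
    ("code review", 2_000, 1_000),
    ("document summary", 10_000, 2_000),
]:
    print(f"{label}: ${base_cost(tokens_in, tokens_out):.4f}")
```

Swapping in your own token counts gives a quick first estimate before caching or batch discounts are considered.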

Prompt Caching: The Biggest Cost Lever#

Prompt caching is where Anthropic's pricing gets interesting — and where the real savings live.

How Claude prompt caching works — write once, read cheap

How it works#

When you send a request with cache_control enabled, Anthropic stores the computed state of your prompt prefix. On subsequent requests that start with the same bytes (same system prompt, same few-shot examples, same preamble), those tokens are served from cache instead of being reprocessed.

There are two cache duration tiers:

| Cache Operation | Price per 1M Tokens | Multiplier vs Base Input | Duration |
|---|---|---|---|
| 5-minute cache write | $3.75 | 1.25x | 5 minutes |
| 1-hour cache write | $6.00 | 2.0x | 1 hour |
| Cache hit (read) | $0.30 | 0.1x | |

The math: when does caching pay off?#

5-minute cache (1.25x write):

  • Write cost: $3.75/M (you pay 25% more than base input on the first request)
  • Read cost: $0.30/M (you pay 90% less on every subsequent request)
  • Break-even: 1 cache read. After just one cache hit, you've saved money.
    • Write: $3.75 → Read: $0.30 → Total for 2 requests: $4.05
    • Without cache: $3.00 × 2 = $6.00
    • Savings: $1.95 (32.5%)

1-hour cache (2.0x write):

  • Write cost: $6.00/M (you pay double on the first request)
  • Read cost: $0.30/M (same 90% discount on reads)
  • Break-even: 2 cache reads. After two hits, you're ahead.
    • Write: $6.00 → 2 Reads: $0.60 → Total for 3 requests: $6.60
    • Without cache: $3.00 × 3 = $9.00
    • Savings: $2.40 (26.7%)
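The break-even counts above can be checked with a few lines of code. This is a sketch under one simplifying assumption: the cache stays warm, so every request after the first is a cache read. The function names are illustrative only.

```python
BASE_INPUT = 3.00   # USD per 1M tokens, uncached input
CACHE_READ = 0.30   # USD per 1M tokens, cache hit
CACHE_WRITE = {"5m": 3.75, "1h": 6.00}  # USD per 1M tokens, by cache tier


def cached_cost(tier: str, requests: int) -> float:
    """Cost per 1M prefix tokens: one cache write, then (requests - 1) reads."""
    return CACHE_WRITE[tier] + (requests - 1) * CACHE_READ


def uncached_cost(requests: int) -> float:
    """Cost per 1M prefix tokens when every request pays the base input rate."""
    return requests * BASE_INPUT


def break_even_reads(tier: str) -> int:
    """Smallest number of cache reads at which caching is strictly cheaper."""
    reads = 0
    while cached_cost(tier, reads + 1) >= uncached_cost(reads + 1):
        reads += 1
    return reads


for tier in ("5m", "1h"):
    print(tier, "breaks even after", break_even_reads(tier), "read(s)")
```

If requests arrive further apart than the cache duration, each one pays the write price again and the savings disappear, which is why the traffic pattern matters when picking a tier.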

When to use which cache tier#

| Scenario | Recommended Cache | Why |
|---|---|---|
| Real-time chatbot (many requests/minute) | 5-minute | High request frequency, cache stays warm |
| Batch processing (bursts every few minutes) | 5-minute | Requests cluster within 5-min windows |
| Long-running agent sessions | 1-hour | Requests spread over 10-60 minutes |
| Scheduled jobs (hourly reports) | 1-hour | Predictable hourly pattern |
| One-off requests | No cache | No reuse opportunity |

How to enable caching#

Automatic caching (recommended for most cases):

python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6-20260213",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},  # Top-level field: the cache breakpoint is placed automatically
    system="You are a senior code reviewer. Review the following code for bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": "Review this Python function:\n\ndef process_data(items):\n    results = []\n    for item in items:\n        if item.get('status') == 'active':\n            results.append(transform(item))\n    return results"}
    ]
)
print(response.usage)
# Look for cache_creation_input_tokens and cache_read_input_tokens

Explicit cache breakpoints (fine-grained control):

python
response = client.messages.create(
    model="claude-sonnet-4-6-20260213",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a senior code reviewer...",
            "cache_control": {"type": "ephemeral"}  # Cache this block
        }
    ],
    messages=[
        {"role": "user", "content": "Review this code..."}
    ]
)

Reading your cache usage in the response#

Every response includes cache metrics in the usage object:

json
{
  "usage": {
    "input_tokens": 50,
    "output_tokens": 320,
    "cache_creation_input_tokens": 1200,
    "cache_read_input_tokens": 0
  }
}
  • cache_creation_input_tokens: tokens written to cache (charged at 1.25x or 2x)
  • cache_read_input_tokens: tokens read from cache (charged at 0.1x)
  • input_tokens: tokens processed normally (charged at base rate)

Your total cost for a request (input and output combined) is:

code
cost = (input_tokens × $3.00/M)
     + (cache_creation_input_tokens × $3.75/M or $6.00/M)
     + (cache_read_input_tokens × $0.30/M)
     + (output_tokens × $15.00/M)
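The formula above translates into a small helper. This sketch assumes the usage data is available as a plain dict shaped like the JSON example in this section; with the Python SDK, `response.usage` is an object rather than a dict, so it would need converting first. `request_cost` is a hypothetical name.

```python
PRICES_PER_M = {
    "input_tokens": 3.00,             # base input rate
    "output_tokens": 15.00,           # output rate
    "cache_read_input_tokens": 0.30,  # cache hit rate
}
CACHE_WRITE_PER_M = {"5m": 3.75, "1h": 6.00}  # cache write rate by tier


def request_cost(usage: dict, cache_tier: str = "5m") -> float:
    """Return the USD cost of one request from its usage counters."""
    rates = dict(PRICES_PER_M)
    rates["cache_creation_input_tokens"] = CACHE_WRITE_PER_M[cache_tier]
    # Missing or null counters are treated as zero tokens.
    total = sum((usage.get(field) or 0) * rate for field, rate in rates.items())
    return total / 1_000_000


usage = {
    "input_tokens": 50,
    "output_tokens": 320,
    "cache_creation_input_tokens": 1200,
    "cache_read_input_tokens": 0,
}
print(f"${request_cost(usage):.5f}")
```

Logging this value per request makes it easy to see whether cache writes are actually being repaid by later reads.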

Batch API Discount#

Anthropic offers a Batch API for asynchronous processing. You submit requests in bulk, and results are returned within 24 hours. The tradeoff: no real-time responses, but you get a 50% discount on all token types.

| Token Type | Standard | Batch API |
|---|---|---|
| Input | $3.00/M | $1.50/M |
| Output | $15.00/M | $7.50/M |
| 5-min cache write | $3.75/M | $1.875/M |
| 1-hour cache write | $6.00/M | $3.00/M |
| Cache hit | $0.30/M | $0.15/M |

The Batch discount stacks with caching. If you're running a nightly batch job with a consistent system prompt, you get both the 50% batch discount AND the 0.1x cache read discount on repeated prefixes. That's $0.15/M for cached input tokens in batch mode — 95% cheaper than standard base input.
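To get a feel for how much the two discounts save together, here is a rough estimate for a nightly job. It assumes every request after the first hits the 5-minute cache, which is an optimistic upper bound, and the job size used in the example is made up for illustration.

```python
INPUT, OUTPUT = 3.00, 15.00          # USD per 1M tokens, standard rates
CACHE_WRITE_5M, CACHE_HIT = 3.75, 0.30
BATCH_MULTIPLIER = 0.5               # 50% off every token type


def job_cost(requests, prefix_tokens, unique_tokens, output_tokens,
             batch=True, cache=True):
    """Estimate the USD cost of a job whose requests share one prompt prefix."""
    if cache:
        # One cache write for the shared prefix, then cache reads.
        prefix = prefix_tokens * (CACHE_WRITE_5M + (requests - 1) * CACHE_HIT)
    else:
        prefix = requests * prefix_tokens * INPUT
    # Per-request tokens that are never cached, plus output.
    rest = requests * (unique_tokens * INPUT + output_tokens * OUTPUT)
    multiplier = BATCH_MULTIPLIER if batch else 1.0
    return multiplier * (prefix + rest) / 1_000_000


# Hypothetical job: 1,000 requests, 4K shared prefix, 500 unique input, 300 output.
print(f"standard, no cache: ${job_cost(1000, 4000, 500, 300, batch=False, cache=False):.2f}")
print(f"batch + cache:      ${job_cost(1000, 4000, 500, 300):.2f}")
```

In this example the shared prefix dominates the input bill, so caching does most of the work and the batch discount halves what is left.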

When to use Batch API#

  • Bulk content generation (product descriptions, summaries)
  • Large-scale data extraction or classification
  • Evaluation runs across hundreds of test prompts
  • Any workload where latency doesn't matter

Data Residency Surcharge#

Starting with Claude Sonnet 4.5 and newer models (including Sonnet 4.6), Anthropic charges a 1.1x multiplier if you specify US-only inference via the inference_geo parameter.

| Token Type | Global (default) | US-only (1.1x) |
|---|---|---|
| Input | $3.00/M | $3.30/M |
| Output | $15.00/M | $16.50/M |
| Cache write (5min) | $3.75/M | $4.125/M |
| Cache hit | $0.30/M | $0.33/M |

This surcharge stacks with everything else. If you use US-only + Batch + caching, all multipliers apply.

Most users don't need this — global routing is the default and has no surcharge. Only enable inference_geo if you have strict data residency requirements.
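The stacking rule can be expressed as a product of multipliers. The sketch below assumes the multipliers combine multiplicatively, which is how the tables in this guide are derived; `effective_price` and its parameter names are illustrative, not part of any API.

```python
BASE_PER_M = {"input": 3.00, "output": 15.00}  # USD per 1M tokens
CACHE_MULTIPLIER = {"none": 1.0, "write_5m": 1.25, "write_1h": 2.0, "hit": 0.1}
BATCH_MULTIPLIER = 0.5
US_ONLY_MULTIPLIER = 1.1


def effective_price(token_type, cache="none", batch=False, us_only=False):
    """Return the USD price per 1M tokens after all applicable multipliers."""
    if token_type == "output" and cache != "none":
        raise ValueError("cache multipliers apply to input tokens only")
    price = BASE_PER_M[token_type] * CACHE_MULTIPLIER[cache]
    if batch:
        price *= BATCH_MULTIPLIER
    if us_only:
        price *= US_ONLY_MULTIPLIER
    return price


# A cached read in a US-only batch job: 3.00 x 0.1 x 0.5 x 1.1
print(f"${effective_price('input', cache='hit', batch=True, us_only=True):.3f}/M")
```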


Crazyrouter Pricing: 45% Off#

Comparing direct Anthropic pricing vs Crazyrouter discounted pricing

Through Crazyrouter, Claude Sonnet 4.6 is available at 55% of official pricing — a 45% discount on base token rates.

| Token Type | Anthropic Official | Crazyrouter (55%) |
|---|---|---|
| Input | $3.00/M | $1.65/M |
| Output | $15.00/M | $8.25/M |

Crazyrouter supports both OpenAI-compatible and native Anthropic API formats, so you can use whichever SDK you prefer.

Code example: using Claude Sonnet 4.6 via Crazyrouter#

OpenAI-compatible format:

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
)
print(response.choices[0].message.content)

Anthropic-native format:

python
import anthropic

client = anthropic.Anthropic(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com"
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
)
print(response.content[0].text)

curl:

bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
  }'

Real-World Cost Comparison#

Let's compare costs for three common workloads: Anthropic direct vs Crazyrouter.

Scenario 1: Chatbot — 1M input + 500K output tokens per day#

| | Anthropic Direct | Crazyrouter |
|---|---|---|
| Input cost | $3.00 | $1.65 |
| Output cost | $7.50 | $4.125 |
| Daily total | $10.50 | $5.775 |
| Monthly (30 days) | $315.00 | $173.25 |
| Monthly savings | | $141.75 (45%) |

Scenario 2: Code generation — 500K input + 2M output tokens per day#

| | Anthropic Direct | Crazyrouter |
|---|---|---|
| Input cost | $1.50 | $0.825 |
| Output cost | $30.00 | $16.50 |
| Daily total | $31.50 | $17.325 |
| Monthly (30 days) | $945.00 | $519.75 |
| Monthly savings | | $425.25 (45%) |

Scenario 3: Chatbot with 60% cache hit rate — 1M input + 500K output per day#

With caching on Anthropic direct:

  • 400K cache write tokens (5min): 400K × $3.75/M = $1.50
  • 600K cache hit tokens: 600K × $0.30/M = $0.18
  • 500K output tokens: 500K × $15.00/M = $7.50
  • Daily total: $9.18

With Crazyrouter (no native cache, but 45% off base):

  • 1M input tokens: 1M × $1.65/M = $1.65
  • 500K output tokens: 500K × $8.25/M = $4.125
  • Daily total: $5.775

Even with Anthropic's caching at 60% hit rate, Crazyrouter's flat 45% discount still comes out cheaper for this workload. The gap narrows at very high cache hit rates (80%+), where Anthropic's cache reads at $0.30/M become extremely cheap.
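To see how the gap moves with the hit rate, the two daily totals can be computed side by side. This sketch follows the same simplification as the scenario: every input token is either a 5-minute cache write or a cache hit. The function names are illustrative.

```python
def anthropic_cached_daily(input_tokens, output_tokens, hit_rate):
    """Daily USD cost going direct, with 5-minute caching at the given hit rate."""
    writes = input_tokens * (1 - hit_rate)  # tokens billed at the cache write rate
    hits = input_tokens * hit_rate          # tokens billed at the cache hit rate
    return (writes * 3.75 + hits * 0.30 + output_tokens * 15.00) / 1_000_000


def crazyrouter_daily(input_tokens, output_tokens):
    """Daily USD cost at the flat discounted rates, with no caching."""
    return (input_tokens * 1.65 + output_tokens * 8.25) / 1_000_000


for hit_rate in (0.6, 0.8, 0.95):
    direct = anthropic_cached_daily(1_000_000, 500_000, hit_rate)
    routed = crazyrouter_daily(1_000_000, 500_000)
    print(f"hit rate {hit_rate:.0%}: direct ${direct:.2f} vs Crazyrouter ${routed:.3f}")
```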

Break-even analysis: when is Anthropic direct + caching cheaper?#

At what cache hit rate does going direct to Anthropic beat Crazyrouter?

For a pure input workload (ignoring output for simplicity):

  • Crazyrouter cost per 1M input: $1.65
  • Anthropic with cache: (1 - hit_rate) × $3.75 + hit_rate × $0.30

Solving: $1.65 = (1 - x) × $3.75 + x × $0.30

  • $1.65 = $3.75 - $3.45x
  • $3.45x = $2.10
  • x = 60.9%

At cache hit rates above ~61%, going direct to Anthropic with 5-minute caching is cheaper for input tokens. But remember: output tokens have no caching discount, and Crazyrouter's 45% off applies to output too. For output-heavy workloads, Crazyrouter wins at any cache hit rate.
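The same equation can be extended to include output tokens, which puts a number on "output-heavy". The sketch below solves for the break-even hit rate given the ratio of output to input tokens, using only the prices in this guide; `break_even_hit_rate` is an illustrative name, and the model keeps the earlier simplification that every uncached input token is a 5-minute cache write.

```python
CACHE_WRITE, CACHE_HIT, OUTPUT = 3.75, 0.30, 15.00  # Anthropic direct, USD per 1M
CR_INPUT, CR_OUTPUT = 1.65, 8.25                    # Crazyrouter, USD per 1M


def break_even_hit_rate(output_per_input: float):
    """Hit rate above which going direct is cheaper, or None if it never is.

    Solves (1 - x) * CACHE_WRITE + x * CACHE_HIT + r * OUTPUT
           = CR_INPUT + r * CR_OUTPUT   for x, where r = output_per_input.
    """
    numerator = (CACHE_WRITE - CR_INPUT) + output_per_input * (OUTPUT - CR_OUTPUT)
    x = numerator / (CACHE_WRITE - CACHE_HIT)
    return x if x <= 1 else None


for ratio in (0.0, 0.1, 0.5):
    print(f"output/input = {ratio}: break-even hit rate = {break_even_hit_rate(ratio)}")
```

Under these assumptions the break-even hit rate reaches 100% when output tokens are about 20% of input tokens, so for any workload with a higher output share no cache hit rate makes going direct cheaper.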


Pricing Summary Table#

| Component | Anthropic Official | Crazyrouter |
|---|---|---|
| Base input | $3.00/M | $1.65/M |
| Base output | $15.00/M | $8.25/M |
| 5-min cache write | $3.75/M (1.25x) | |
| 1-hour cache write | $6.00/M (2.0x) | |
| Cache hit | $0.30/M (0.1x) | |
| Batch input | $1.50/M (50% off) | |
| Batch output | $7.50/M (50% off) | |
| US-only surcharge | 1.1x all prices | |
| Supported formats | Anthropic API | OpenAI + Anthropic |

Key Takeaways#

  1. Base pricing is $3/M input, $15/M output. Output is 5x more expensive, so optimize for shorter outputs when possible.

  2. Prompt caching saves up to 90% on input tokens. The 5-minute cache pays for itself after just 1 reuse. The 1-hour cache needs 2 reuses.

  3. Batch API cuts everything by 50%. Stack it with caching for up to 95% savings on cached input tokens.

  4. Crazyrouter offers a flat 45% discount on base token rates, with no caching complexity to manage. For output-heavy workloads, this is often the better deal.

  5. The optimal strategy depends on your workload. High cache hit rates + input-heavy = go direct. Output-heavy or unpredictable traffic = Crazyrouter wins.


Get Started#

Sign up at crazyrouter.com to access Claude Sonnet 4.6 at 45% off — along with 300+ other models from OpenAI, Google, xAI, and more, all through one API key.

Try Claude Sonnet 4.6 on Crazyrouter →
