Claude Sonnet 4.6 Pricing Explained: Caching, Tiers, and How to Save 45% with Crazyrouter

title: Claude Sonnet 4.6 Pricing Explained: Caching, Tiers, and How to Save 45% with Crazyrouter
slug: claude-sonnet-4-6-pricing
summary: A complete breakdown of Claude Sonnet 4.6 API pricing: base tokens, 5-minute and 1-hour prompt caching, the Batch API discount, the data residency surcharge, and how Crazyrouter can save you 45%.
tag: Pricing
language: zh-Hans
cover_image_url: https://media.crazyrouter.com/task-artifacts/playground/user-1/images/2026/04/30/blog-image-34ec391d18e0ad2e-9c9dd1a824f6.webp
meta_title: Claude Sonnet 4.6 Pricing 2026: Caching, Batch API, and the Crazyrouter Discount
meta_description: A complete Claude Sonnet 4.6 pricing guide. Learn how base tokens, prompt caching (5-minute and 1-hour), the Batch API, and data residency affect your costs, and how to save 45% with Crazyrouter.
meta_keywords: Claude Sonnet 4.6 pricing, Claude API cost, Anthropic pricing 2026, prompt caching cost, Claude Sonnet API, Crazyrouter Claude discount


Figure: Claude Sonnet 4.6 Pricing Guide

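Claude Sonnet 4.6 is Anthropic's latest mid-tier model, released in February 2026. It sits between the budget-friendly Haiku family and the premium Opus tier, which makes it the default choice for most production workloads: coding, chat, document analysis, and tool use.

But Anthropic's pricing is more than "input + output". It also includes a prompt caching system with two TTL tiers, a Batch API discount, and a data residency surcharge, all of which can stack. This guide breaks down each component so you know exactly what you are paying for and how to pay less.

Last updated: April 27, 2026

Disclaimer: Prices in this article were accurate as of the publication date. Anthropic may change its pricing at any time. Always verify against Anthropic's official pricing page before making production decisions.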

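Base Token Pricing#

The base pricing for Claude Sonnet 4.6 is straightforward:

Token type	Price per 1M tokens
Input (base)	$3.00
Output	$15.00

That is a 5:1 output-to-input ratio: for the same number of tokens, output costs five times as much as input. The ratio matters because if your workload is output-heavy (code generation, long-form writing), output costs will dominate your bill.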

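Quick Reference: What Does This Actually Cost?#

Workload	Tokens	Cost
1 short chat (500 input / 200 output)	700 total	$0.0045
1 code review (2K input / 1K output)	3K total	$0.021
1 document summary (10K input / 2K output)	12K total	$0.06
1 hour of chatbot traffic (500K input / 200K output)	700K total	$4.50
1 day of heavy API usage (5M input / 2M output)	7M total	$45.00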
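
Every row in the table above follows from the two base rates. Here is a minimal sketch of that arithmetic, assuming the $3.00 and $15.00 per-million rates quoted in this article; the function name `base_cost` is illustrative and not part of any SDK.

```python
# Base rates quoted in this article, in USD per 1M tokens.
# These are assumptions taken from the table above; verify them before relying on them.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00


def base_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the uncached, non-batch cost in USD for one workload."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


if __name__ == "__main__":
    workloads = [
        ("short chat", 500, 200),
        ("code review", 2_000, 1_000),
        ("document summary", 10_000, 2_000),
    ]
    for label, tokens_in, tokens_out in workloads:
        print(f"{label}: ${base_cost(tokens_in, tokens_out):.4f}")
```

Plugging in your own daily token counts gives a first-order estimate before any caching, batch, or residency multipliers are applied.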

Prompt Caching: The Biggest Cost Lever#

Prompt caching is where Anthropic's pricing gets interesting, and where the real savings are.

Figure: How Claude prompt caching works: write once, read cheap

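How It Works#

When you send a request with cache_control, Anthropic stores the computed state of your prompt prefix. On subsequent requests that begin with the same bytes (same system prompt, same few-shot examples, same preamble), those tokens are served from the cache instead of being reprocessed.

Cache lifetimes come in two tiers:

Cache operation	Price per 1M tokens	Multiplier vs. base input	Duration
5-minute cache write	$3.75	1.25x	5 minutes
1-hour cache write	$6.00	2.0x	1 hour
Cache hit (read)	$0.30	0.1x	—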

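The Math: When Does Caching Pay Off?#

All figures below are per 1M cached prefix tokens.

5-minute cache (1.25x write):

Write cost: $3.75/M (25% more than base input on the first request)
Read cost: $0.30/M (90% savings on every subsequent request)
Break-even: 1 cache read. A single cache hit already saves you money.
- Write: $3.75 → Read: $0.30 → Total for 2 requests: $4.05
- Without caching: $3.00 × 2 = $6.00
- Savings: $1.95 (32.5%)

1-hour cache (2.0x write):

Write cost: $6.00/M (double the base rate on the first request)
Read cost: $0.30/M (the same 90% savings on reads)
Break-even: 2 cache reads. After one hit you are still behind ($6.30 vs. $6.00); after two you are ahead.
- Write: $6.00 → 2 reads: $0.60 → Total for 3 requests: $6.60
- Without caching: $3.00 × 3 = $9.00
- Savings: $2.40 (26.7%)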
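
The two break-even points can be checked mechanically. The sketch below assumes the rates quoted above and assumes every read arrives before the cache entry expires; the helper names are illustrative only.

```python
BASE = 3.00                        # USD per 1M input tokens, uncached
READ = 0.30                        # USD per 1M tokens read from cache
WRITE = {"5m": 3.75, "1h": 6.00}   # USD per 1M tokens written, keyed by TTL tier


def cached_total(ttl: str, reads: int) -> float:
    """Cost per 1M prefix tokens for one cache write followed by `reads` hits."""
    return WRITE[ttl] + reads * READ


def uncached_total(reads: int) -> float:
    """Cost of sending the same 1M-token prefix uncached for 1 + reads requests."""
    return BASE * (1 + reads)


def break_even_reads(ttl: str) -> int:
    """Smallest number of cache hits at which caching is strictly cheaper."""
    reads = 0
    while cached_total(ttl, reads) >= uncached_total(reads):
        reads += 1
    return reads


if __name__ == "__main__":
    for ttl in ("5m", "1h"):
        print(ttl, "breaks even after", break_even_reads(ttl), "read(s)")
```

If some reads miss because the entry has expired, each miss costs another write, so the real break-even point moves later than this idealized count.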

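When to Use Which Cache Tier#

Scenario	Recommended cache	Why
Real-time chatbot (many requests per minute)	5-minute	High request frequency keeps the cache warm
Batch processing (bursts every few minutes)	5-minute	Requests cluster within the 5-minute window
Long-running agent sessions	1-hour	Requests are spread over 10 to 60 minutes
Scheduled jobs (hourly reports)	1-hour	Predictable hourly pattern
One-off requests	No caching	Nothing to reuse

How to Enable Caching#

Automatic caching (works for most cases):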

python

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},  # Top-level field: the API places the cache breakpoint automatically
    system="You are a senior code reviewer. Review the following code for bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": "Review this Python function:\n\ndef process_data(items):\n    results = []\n    for item in items:\n        if item.get('status') == 'active':\n            results.append(transform(item))\n    return results"}
    ]
)
print(response.usage)
# Look for cache_creation_input_tokens and cache_read_input_tokens.
# Both stay at 0 when the prompt is shorter than the model's minimum cacheable length.

Explicit cache breakpoints (fine-grained control):

python

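import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a senior code reviewer...",
            # Cache the prefix up to and including this block.
            # The TTL is 5 minutes by default; add "ttl": "1h" for the 1-hour tier.
            "cache_control": {"type": "ephemeral"}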
        }
    ],
    messages=[
        {"role": "user", "content": "Review this code..."}
    ]
)

Reading Cache Usage from the Response#

Every response includes cache metrics in its usage object:

json

{
  "usage": {
    "input_tokens": 50,
    "output_tokens": 320,
    "cache_creation_input_tokens": 1200,
    "cache_read_input_tokens": 0
  }
}

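cache_creation_input_tokens: tokens written to the cache (billed at 1.25x or 2x)
cache_read_input_tokens: tokens read from the cache (billed at 0.1x)
input_tokens: tokens processed normally (billed at the base rate)

The total cost of a request is therefore: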

code

cost = (input_tokens × $3.00/M)
     + (cache_creation_input_tokens × $3.75/M or $6.00/M)
     + (cache_read_input_tokens × $0.30/M)
     + (output_tokens × $15.00/M)
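
The formula above translates directly into a small helper that takes the usage object from a response. This is a sketch under the rates quoted in this article; `request_cost` and the `ttl` argument are hypothetical names, and the caller has to know which cache tier the request used because the usage fields shown above do not say.

```python
# Rates in USD per 1M tokens, as quoted in this article (assumptions, not live prices).
RATES_PER_M = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write_5m": 3.75,
    "cache_write_1h": 6.00,
}


def request_cost(usage: dict, ttl: str = "5m") -> float:
    """Return the cost in USD of one request, given its usage counters."""
    write_rate = RATES_PER_M[f"cache_write_{ttl}"]
    total = (
        usage.get("input_tokens", 0) * RATES_PER_M["input"]
        + usage.get("cache_creation_input_tokens", 0) * write_rate
        + usage.get("cache_read_input_tokens", 0) * RATES_PER_M["cache_read"]
        + usage.get("output_tokens", 0) * RATES_PER_M["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # The usage object from the JSON example in this section.
    usage = {
        "input_tokens": 50,
        "output_tokens": 320,
        "cache_creation_input_tokens": 1200,
        "cache_read_input_tokens": 0,
    }
    print(f"${request_cost(usage):.5f}")
```

Summing `request_cost` over logged responses gives an actual spend figure to compare against the estimates in the rest of this guide.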

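Batch API Discount#

Anthropic offers a Batch API for asynchronous processing. You submit requests in bulk and results come back within 24 hours. The trade-off: you give up real-time responses, but you get a 50% discount on all token types.

Token type	Standard	Batch API
Input	$3.00/M	$1.50/M
Output	$15.00/M	$7.50/M
5-minute cache write	$3.75/M	$1.875/M
1-hour cache write	$6.00/M	$3.00/M
Cache hit	$0.30/M	$0.15/M

The batch discount stacks with caching. If you run a nightly batch job with a consistent system prompt, you get the 50% batch discount on top of the 0.1x cache read rate for the repeated prefix. Cached input tokens in batch mode therefore cost $0.15/M, which is 95% cheaper than standard base input.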

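When to Use the Batch API#

Bulk content generation (product descriptions, summaries)
Large-scale data extraction or classification
Evaluation runs over hundreds of test prompts
Any workload that is not latency-sensitive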
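
For workloads like these, the request list for a batch can be assembled ahead of time. The sketch below only builds the payload and makes no network calls. It assumes the `custom_id` / `params` entry shape used by Anthropic's Message Batches API; the function name, the `item-N` ID scheme, and the `max_tokens` value are illustrative choices.

```python
def build_batch_requests(
    prompts: list[str],
    system_prompt: str,
    model: str = "claude-sonnet-4-6",
) -> list[dict]:
    """Build one batch entry per prompt, all sharing one cacheable system prompt."""
    shared_system = [
        {
            "type": "text",
            "text": system_prompt,
            # Mark the shared prefix as cacheable so repeated entries can reuse it.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return [
        {
            "custom_id": f"item-{index}",  # Hypothetical ID scheme for matching results later
            "params": {
                "model": model,
                "max_tokens": 512,
                "system": shared_system,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for index, prompt in enumerate(prompts)
    ]


if __name__ == "__main__":
    requests = build_batch_requests(
        ["Summarize product A.", "Summarize product B."],
        "You write concise product summaries.",
    )
    print(len(requests), "requests ready")
    # With the anthropic SDK installed and an API key configured, the list could
    # then be submitted with: anthropic.Anthropic().messages.batches.create(requests=requests)
```

Keeping the system prompt identical across entries is what lets the batch discount and the cache read rate combine on the shared prefix.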

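Data Residency Surcharge#

For Claude Sonnet 4.5 and newer models (including Sonnet 4.6), Anthropic applies a 1.1x multiplier if you use the inference_geo parameter to restrict inference to the United States.

Token type	Global (default)	US-only (1.1x)
Input	$3.00/M	$3.30/M
Output	$15.00/M	$16.50/M
Cache write (5-minute)	$3.75/M	$4.125/M
Cache hit	$0.30/M	$0.33/M

This surcharge stacks with everything else. If you combine US-only + batch + caching, all of the multipliers apply.

Most users do not need this. Global routing is the default and carries no surcharge. Enable inference_geo only if you have strict data residency requirements.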
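
To see how the multipliers combine, here is a minimal sketch. It assumes they compose multiplicatively, which matches the US-only table above ($3.75 × 1.1 = $4.125) and the batch table earlier ($0.30 × 0.5 = $0.15); `effective_rate` is an illustrative name, not an API.

```python
def effective_rate(
    base_per_m: float,
    cache_multiplier: float = 1.0,
    batch: bool = False,
    us_only: bool = False,
) -> float:
    """Return the USD price per 1M tokens after stacking the applicable multipliers."""
    rate = base_per_m * cache_multiplier  # 1.25 or 2.0 for writes, 0.1 for reads
    if batch:
        rate *= 0.5                       # Batch API discount
    if us_only:
        rate *= 1.1                       # Data residency surcharge
    return rate


if __name__ == "__main__":
    # Cache read, in batch mode, restricted to US-only inference.
    print(f"${effective_rate(3.00, 0.1, batch=True, us_only=True):.3f}/M")
```

Under this assumption, a cached read in a US-only batch costs $0.165/M, still far below the $3.00/M base input rate.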

Crazyrouter Pricing: 45% Off#

Figure: Comparing direct Anthropic pricing vs Crazyrouter discounted pricing

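Through Crazyrouter, Claude Sonnet 4.6 is billed at 55% of the official price, which is a 45% discount on the base token rates.

Token type	Anthropic official	Crazyrouter (55%)
Input	$3.00/M	$1.65/M
Output	$15.00/M	$8.25/M

Crazyrouter supports both the OpenAI-compatible and the native Anthropic API formats, so you can use whichever SDK you prefer.

Code Examples: Using Claude Sonnet 4.6 via Crazyrouter#

OpenAI-compatible format: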

python

from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
)
print(response.choices[0].message.content)

Anthropic native format:

python

import anthropic

client = anthropic.Anthropic(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com"
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
)
print(response.content[0].text)

curl:

bash

curl https://crazyrouter.com/v1/chat/completions \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ]
  }'

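Real-World Cost Comparison#

Let's compare costs for three common workloads: Anthropic direct vs. Crazyrouter.

Scenario 1: Chatbot, 1M input + 500K output tokens per day#

	Anthropic direct	Crazyrouter
Input cost	$3.00	$1.65
Output cost	$7.50	$4.125
Daily total	$10.50	$5.775
Monthly (30 days)	$315.00	$173.25
Monthly savings	—	$141.75 (45%)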

Scenario 2: Code generation, 500K input + 2M output tokens per day#

	Anthropic direct	Crazyrouter
Input cost	$1.50	$0.825
Output cost	$30.00	$16.50
Daily total	$31.50	$17.325
Monthly (30 days)	$945.00	$519.75
Monthly savings	—	$425.25 (45%)

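Scenario 3: Chatbot with a 60% cache hit rate, 1M input + 500K output per day#

With Anthropic direct and caching:

400K cache write tokens (5-minute): 400K × $3.75/M = $1.50
600K cache hit tokens: 600K × $0.30/M = $0.18
500K output tokens: 500K × $15.00/M = $7.50
Daily total: $9.18

With Crazyrouter (no native caching, but a 45% base discount):

1M input tokens: 1M × $1.65/M = $1.65
500K output tokens: 500K × $8.25/M = $4.125
Daily total: $5.775

Even against a 60% cache hit rate on Anthropic, Crazyrouter's flat 45% discount is still cheaper for this workload. At very high hit rates (80% and above) the gap narrows, because Anthropic's $0.30/M cache read price is extremely cheap, but for this input-to-output mix it never closes: even at a 100% hit rate, the direct cost is $0.30 + $7.50 = $7.80 per day.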
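
The comparison in Scenario 3 can be reproduced for any hit rate. The sketch below follows the same simplification as the figures above: every input token that is not a cache hit is billed as a 5-minute cache write, and the discounted route applies a flat 0.55 price factor with no caching. Both function names are illustrative.

```python
def direct_daily_cost(input_tokens: float, output_tokens: float, hit_rate: float) -> float:
    """Daily USD cost with direct access and 5-minute caching at the given hit rate."""
    hits = input_tokens * hit_rate
    writes = input_tokens - hits  # Simplification: every miss is billed as a cache write
    return (writes * 3.75 + hits * 0.30 + output_tokens * 15.00) / 1_000_000


def discounted_daily_cost(
    input_tokens: float, output_tokens: float, price_factor: float = 0.55
) -> float:
    """Daily USD cost at a flat fraction of the base rates, with no caching."""
    return price_factor * (input_tokens * 3.00 + output_tokens * 15.00) / 1_000_000


if __name__ == "__main__":
    for hit_rate in (0.6, 0.8, 1.0):
        direct = direct_daily_cost(1_000_000, 500_000, hit_rate)
        flat = discounted_daily_cost(1_000_000, 500_000)
        print(f"hit rate {hit_rate:.0%}: direct ${direct:.2f} vs discounted ${flat:.3f}")
```

Changing the token mix shows how quickly the picture shifts: the more input-heavy the workload, the more a high hit rate favors direct caching.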

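Break-Even Analysis: When Is Anthropic Direct + Caching Cheaper?#

At what cache hit rate does going direct to Anthropic beat Crazyrouter?

For an input-only workload (ignoring output for simplicity):

Crazyrouter cost per 1M input: $1.65
Anthropic with caching: (1 - hit rate) × $3.75 + hit rate × $0.30

Solve: $1.65 = (1 - x) × $3.75 + x × $0.30

$1.65 = $3.75 - $3.45x
$3.45x = $2.10
x ≈ 60.9%

Above a cache hit rate of roughly 61%, Anthropic direct with 5-minute caching is cheaper for input tokens. But remember: output tokens get no caching discount, while Crazyrouter's 45% discount applies to output as well. For output-heavy workloads, Crazyrouter comes out ahead at any cache hit rate.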
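
The same algebra extends to workloads that include output. The sketch below solves for the break-even hit rate given the ratio of output tokens to input tokens. It uses the article's rates and the same simplification that every miss is billed as a 5-minute cache write; the function and parameter names are hypothetical.

```python
def break_even_hit_rate(
    output_ratio: float = 0.0,   # Output tokens per input token
    price_factor: float = 0.55,  # Flat fraction of base price on the discounted route
    base_in: float = 3.00,
    base_out: float = 15.00,
    write: float = 3.75,
    read: float = 0.30,
) -> float:
    """Hit rate at which direct caching matches the flat discount.

    A result above 1.0 means no hit rate is high enough under this model.
    """
    discounted = price_factor * (base_in + output_ratio * base_out)
    direct_at_zero_hits = write + output_ratio * base_out
    # Each point of hit rate replaces a write with a read, saving (write - read).
    return (direct_at_zero_hits - discounted) / (write - read)


if __name__ == "__main__":
    for ratio in (0.0, 0.1, 0.2, 0.5):
        print(f"output/input = {ratio}: break-even hit rate {break_even_hit_rate(ratio):.1%}")
```

Under these assumptions the break-even hit rate reaches 100% once output tokens are about 20% of input tokens, which puts a number on what "output-heavy" means here.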

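Pricing Summary Table#

Component	Anthropic official	Crazyrouter
Base input	$3.00/M	$1.65/M
Base output	$15.00/M	$8.25/M
5-minute cache write	$3.75/M (1.25x)	—
1-hour cache write	$6.00/M (2.0x)	—
Cache hit	$0.30/M (0.1x)	—
Batch input	$1.50/M (50% off)	—
Batch output	$7.50/M (50% off)	—
US-only surcharge	1.1x on all prices	—
Supported formats	Anthropic API	OpenAI + Anthropic

Key Takeaways#

Base pricing is $3/M input and $15/M output. Output costs 5x as much as input, so keep outputs short wherever you can.
Prompt caching saves up to 90% on input tokens. The 5-minute cache pays for itself after 1 reuse; the 1-hour cache needs 2.
The Batch API cuts every rate by 50%. Stacked with caching, it saves up to 95% on cached input tokens.
Crazyrouter offers a flat 45% discount with no cache management to worry about. For output-heavy workloads, this is usually the better choice.
The best strategy depends on your workload. High cache hit rate + input-heavy = go direct to Anthropic. Output-heavy or unpredictable traffic = Crazyrouter wins.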