
Grok 4.1 Thinking Pricing Explained: Reasoning Tokens, Caching, and How to Save with Crazyrouter
xAI's Grok 4.1 Thinking is the reasoning-enhanced variant of the Grok 4.1 model family. It extends the already capable Grok 4.1 base model with chain-of-thought reasoning — the model "thinks" through problems step by step before producing a final answer. This makes it exceptionally strong for math, code generation, logic puzzles, multi-step planning, and any task where deliberate reasoning outperforms pattern matching.
But reasoning comes at a cost. Grok 4.1 Thinking generates reasoning tokens — internal chain-of-thought tokens that are billed at the output token rate but never appear in the final response. If you're not careful, a simple prompt can quietly consume 5–10x more tokens than you expected.
This guide breaks down every component of Grok 4.1 Thinking pricing, explains how reasoning tokens work, shows you how to control costs with caching and the reasoning_effort parameter, and demonstrates how to save an additional 10% by routing through Crazyrouter.
Last updated: April 27, 2026.
Base Pricing
Here's the official Grok 4.1 Thinking pricing from xAI:
| Component | Price per Million Tokens |
|---|---|
| Input tokens | $0.20 |
| Cached input tokens | $0.05 |
| Output tokens | $0.50 |
| Reasoning tokens | $0.50 (same as output) |
At first glance, these rates look extremely competitive. Input at $0.20/MTok and output at $0.50/MTok undercut GPT-5 and Claude Opus 4 significantly. But the real cost story is in the reasoning tokens, covered below.
Context Window
Grok 4.1 Thinking supports a 131,072-token context window — the same as the base Grok 4.1 model. The output limit is 65,536 tokens, which includes both visible output tokens and invisible reasoning tokens. This means heavy reasoning can eat into your available output space.
Reasoning Tokens: The Hidden Cost Multiplier
What Are Reasoning Tokens?
When you send a prompt to Grok 4.1 Thinking, the model doesn't jump straight to an answer. It first generates an internal chain of thought — a sequence of reasoning steps that help it work through the problem. These intermediate steps are called reasoning tokens.
Reasoning tokens are:
- Generated by the model as part of its thinking process
- Billed as output tokens at $0.50 per million tokens
- Not returned in the API response: you don't see them in the content field
- Reported in the usage object under completion_tokens_details.reasoning_tokens
How Are They Billed?
Reasoning tokens are billed at the same rate as output tokens: $0.50/MTok. They count toward your total completion_tokens in the usage response.
Here's what a typical usage response looks like:
```json
{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 8500,
    "total_tokens": 9700,
    "completion_tokens_details": {
      "reasoning_tokens": 7000,
      "text_tokens": 1500
    }
  }
}
```
In this example, the model generated 7,000 reasoning tokens and 1,500 visible output tokens. You're billed for all 8,500 completion tokens at the output rate. The reasoning tokens account for 82% of the output cost — and you never see them.
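To see where the money goes in a response like this, a small helper can split the usage payload into input, reasoning, and visible-output costs. This is a minimal sketch assuming the list prices above and the field names shown in the example; split_cost is a hypothetical helper, and it treats all prompt tokens as uncached.

```python
# Minimal sketch: split a usage payload like the one above into cost components.
# Rates are the list prices quoted in this article (USD per million tokens);
# cached input is ignored here, so all prompt tokens use the full input rate.
INPUT_RATE = 0.20
OUTPUT_RATE = 0.50  # reasoning tokens are billed at the output rate


def split_cost(usage: dict) -> dict:
    """Hypothetical helper: per-component cost in USD for one response."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage["completion_tokens"]
    visible = completion - reasoning
    input_cost = usage["prompt_tokens"] * INPUT_RATE / 1_000_000
    reasoning_cost = reasoning * OUTPUT_RATE / 1_000_000
    visible_cost = visible * OUTPUT_RATE / 1_000_000
    return {
        "input": input_cost,
        "reasoning": reasoning_cost,
        "visible_output": visible_cost,
        "total": input_cost + reasoning_cost + visible_cost,
        "reasoning_share_of_output": reasoning / completion if completion else 0.0,
    }


usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 8500,
    "total_tokens": 9700,
    "completion_tokens_details": {"reasoning_tokens": 7000, "text_tokens": 1500},
}
breakdown = split_cost(usage)
print(f"Reasoning share of output cost: {breakdown['reasoning_share_of_output']:.0%}")
print(f"Total cost for this response: ${breakdown['total']:.5f}")
```

With the example payload, the reasoning share prints as 82%, matching the figure above.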
Why Are Reasoning Tokens So Costly?
The issue isn't the per-token rate; $0.50/MTok is reasonable. The issue is volume. Reasoning tokens typically outnumber visible output tokens by a factor of 2 to 10, depending on task complexity:
| Task Type | Typical Reasoning:Output Ratio | Example |
|---|---|---|
| Simple Q&A | 2:1 | "What's the capital of France?" |
| Code generation | 3–5:1 | "Write a Python function to merge two sorted lists" |
| Math/logic problems | 5–8:1 | "Prove that √2 is irrational" |
| Complex multi-step reasoning | 8–10:1 | "Analyze this codebase and find the bug" |
A prompt that generates 500 visible output tokens might silently produce 3,000–5,000 reasoning tokens. Your effective output cost is then not $0.50/MTok but roughly $3.50–5.50 per million visible output tokens.
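The arithmetic behind that range is simple: every visible token carries its share of the hidden reasoning tokens, all billed at the output rate. A minimal sketch, assuming the $0.50/MTok list rate; effective_output_rate is a hypothetical helper name.

```python
# Sketch: effective price per million *visible* output tokens once hidden
# reasoning tokens are billed at the same output rate.
OUTPUT_RATE = 0.50  # USD per million tokens, list price quoted above


def effective_output_rate(visible_tokens: int, reasoning_tokens: int,
                          rate: float = OUTPUT_RATE) -> float:
    """USD per million visible output tokens, counting reasoning overhead."""
    if visible_tokens <= 0:
        raise ValueError("visible_tokens must be positive")
    billed = visible_tokens + reasoning_tokens
    return rate * billed / visible_tokens


# The example from the paragraph above: 500 visible tokens,
# 3,000-5,000 hidden reasoning tokens.
for reasoning in (3_000, 5_000):
    print(reasoning, f"${effective_output_rate(500, reasoning):.2f}/MTok")
```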
Controlling Costs with reasoning_effort
xAI provides a reasoning_effort parameter that lets you control how much thinking the model does. This directly impacts the number of reasoning tokens generated:
| Value | Behavior | Reasoning Token Reduction |
|---|---|---|
| high | Full reasoning (default) | Baseline |
| medium | Balanced reasoning | ~40–60% fewer reasoning tokens |
| low | Minimal reasoning | ~70–80% fewer reasoning tokens |
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4.1-thinking",
    reasoning_effort="medium",
    messages=[
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ],
)
```
When to use each level:
- high: Math proofs, complex debugging, multi-step logic, competitive programming
- medium: General coding tasks, analysis, summarization with nuance
- low: Simple Q&A, classification, extraction, formatting tasks
Using low for simple tasks can cut your total cost by 60–70% compared to the default high setting. This is the single most impactful cost optimization available.
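One way to apply this guidance consistently is to route each request through a small lookup that picks the effort level. This is a hypothetical sketch: the task categories are illustrative, and the reduction factors are assumed midpoints of the ranges in the table above (50% for medium, 75% for low), not measured values.

```python
# Hypothetical helper: map a task category to a reasoning_effort value,
# following the guidance above, and estimate the reasoning-token budget.
# The reduction factors are ASSUMED midpoints of the ranges in the table
# (medium: 40-60% fewer -> 50%; low: 70-80% fewer -> 75%).
EFFORT_BY_TASK = {
    "proof": "high",
    "debugging": "high",
    "coding": "medium",
    "analysis": "medium",
    "qa": "low",
    "classification": "low",
    "extraction": "low",
}
REDUCTION = {"high": 0.0, "medium": 0.50, "low": 0.75}


def pick_effort(task: str) -> str:
    """Default to medium for task types not listed above."""
    return EFFORT_BY_TASK.get(task, "medium")


def estimate_reasoning_tokens(high_baseline: int, effort: str) -> int:
    """Rough reasoning-token estimate relative to a measured 'high' baseline."""
    return round(high_baseline * (1 - REDUCTION[effort]))


effort = pick_effort("classification")
print(effort, estimate_reasoning_tokens(4_000, effort))
```

The returned string can be passed as the reasoning_effort argument in a request like the one shown earlier in this section.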
Caching: Automatic 75% Input Discount
Grok 4.1 Thinking supports automatic prompt caching. When you send repeated or overlapping prompts, xAI's infrastructure automatically caches the common prefix and charges cached tokens at a reduced rate:
- Standard input: $0.20/MTok
- Cached input: $0.05/MTok (75% discount)
Caching is automatic — you don't need to enable it or manage cache keys. The system detects when a new request shares a prefix with a recent request and applies the cached rate.
When Caching Helps Most
Caching is most effective for:
- System prompts: If you use the same system prompt across requests, it gets cached after the first call
- Multi-turn conversations: The conversation history from previous turns is cached
- Few-shot examples: Static examples in your prompt are cached
- Document analysis: When asking multiple questions about the same document
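Because the cache matches on a shared prefix, the order of a prompt's parts matters: anything that changes per request should come last. A minimal sketch, assuming prefix matching works as described above (the exact matching granularity is an assumption); build_messages and the example strings are hypothetical.

```python
# Sketch: assemble messages so the static parts (system prompt, few-shot
# examples) always come first in the same order. Under a prefix-based
# cache, as described above, only the tail changes between calls.
from typing import Dict, List

Message = Dict[str, str]


def build_messages(system_prompt: str,
                   few_shot: List[Message],
                   history: List[Message],
                   user_text: str) -> List[Message]:
    """Static prefix first, per-request content last (hypothetical helper)."""
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    messages.extend(few_shot)   # fixed examples: identical on every call
    messages.extend(history)    # earlier turns: grow but never change
    messages.append({"role": "user", "content": user_text})
    return messages


SYSTEM = "You are a support assistant for Example Corp."
EXAMPLES = [
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Use the 'Forgot password' link on the login page."},
]

a = build_messages(SYSTEM, EXAMPLES, [], "What are your opening hours?")
b = build_messages(SYSTEM, EXAMPLES, [], "Do you ship internationally?")

# Count how many leading messages the two requests share.
shared = 0
for x, y in zip(a, b):
    if x != y:
        break
    shared += 1
print(f"{shared} of {len(a)} messages are an identical prefix")
```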
Caching Example
Suppose you have a 10,000-token system prompt and send 50 requests with different user messages:
Without caching:
- 50 × 10,000 = 500,000 input tokens × $0.20/MTok = $0.10
With caching (first request uncached, 49 cached):
- 1 × 10,000 = 10,000 tokens × $0.20/MTok = $0.002
- 49 × 10,000 = 490,000 tokens × $0.05/MTok = $0.0245
- Total: $0.0265 (73.5% savings)
For high-volume applications with consistent system prompts, caching alone can reduce input costs by 70%+.
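The arithmetic in this example can be checked in a few lines. A minimal sketch that assumes, as the example does, that the first request is a cache miss and every later request hits the cache in full; prefix_cost is a hypothetical helper.

```python
# Reproduce the caching example above: a 10,000-token system prompt
# sent 50 times, with the first request uncached and the rest cached.
INPUT_RATE = 0.20   # USD per million uncached input tokens
CACHED_RATE = 0.05  # USD per million cached input tokens


def prefix_cost(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Cost of the repeated prefix only; user messages are ignored."""
    if not cached:
        return requests * prefix_tokens * INPUT_RATE / 1_000_000
    first = prefix_tokens * INPUT_RATE / 1_000_000                   # cache miss
    rest = (requests - 1) * prefix_tokens * CACHED_RATE / 1_000_000  # cache hits
    return first + rest


without = prefix_cost(10_000, 50, cached=False)
with_cache = prefix_cost(10_000, 50, cached=True)
print(f"without: ${without:.4f}  with: ${with_cache:.4f}  "
      f"savings: {1 - with_cache / without:.1%}")
```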
Tool Costs
Grok 4.1 Thinking supports the same tool/function calling capabilities as the base Grok 4.1 model. There is no additional surcharge for tool use — you pay the standard input and output token rates.
However, tool definitions do consume input tokens. Each tool definition in your request adds to the prompt token count. If you define 20 tools with detailed descriptions, that could add 2,000–5,000 tokens to every request.
Cost optimization tips for tools:
- Only include tools relevant to the current request
- Keep tool descriptions concise but clear
- Use caching to offset the cost of repeated tool definitions
- Consider whether reasoning_effort="low" is sufficient for tool-routing decisions
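The first tip can be implemented by keeping every tool schema in one registry and attaching only the tools a request needs. A minimal sketch, assuming the OpenAI-compatible function-calling shape ({"type": "function", "function": {...}}); the tool names and schemas are hypothetical examples.

```python
# Sketch: keep every tool schema in one registry and attach only the ones
# a request needs, so unused definitions don't add to prompt_tokens.
# Tool names and schemas here are hypothetical examples.
from typing import Any, Dict, Iterable, List

TOOL_REGISTRY: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    "search_orders": {
        "type": "function",
        "function": {
            "name": "search_orders",
            "description": "Find a customer's orders by email.",
            "parameters": {
                "type": "object",
                "properties": {"email": {"type": "string"}},
                "required": ["email"],
            },
        },
    },
}


def select_tools(names: Iterable[str]) -> List[Dict[str, Any]]:
    """Return schemas for the named tools in a stable (sorted) order."""
    return [TOOL_REGISTRY[n] for n in sorted(set(names))]


tools = select_tools(["search_orders"])
print([t["function"]["name"] for t in tools])
```

Sorting the selected names keeps the tool list in the same order on every call, which also keeps that part of the prompt prefix stable for caching.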
Batch API: 50% Off
xAI offers a Batch API for asynchronous processing at half the standard price:
| Component | Standard | Batch (50% off) |
|---|---|---|
| Input tokens | $0.20/MTok | $0.10/MTok |
| Cached input | $0.05/MTok | $0.025/MTok |
| Output tokens | $0.50/MTok | $0.25/MTok |
| Reasoning tokens | $0.50/MTok | $0.25/MTok |
Batch requests are processed within a 24-hour window. You submit a JSONL file of requests and poll for results. This is ideal for:
- Bulk content generation
- Large-scale data analysis
- Evaluation and benchmarking
- Any workload that doesn't need real-time responses
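The 50% discount applies to all token types, including reasoning tokens. For reasoning-heavy workloads, the Batch API halves your effective cost: at a 5:1 reasoning-to-output ratio, for example, the cost per million visible output tokens drops from about $3.00 to about $1.50.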
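A quick way to compare the two modes is to price the same request at both rate cards. A minimal sketch using the list prices in the table above; request_cost and cost_per_visible_mtok are hypothetical helpers, and the token counts are illustrative.

```python
# Sketch: per-request cost at standard vs. Batch API rates, using the
# list prices in the table above (USD per million tokens).
STANDARD = {"input": 0.20, "cached": 0.05, "output": 0.50}
BATCH = {k: v * 0.5 for k, v in STANDARD.items()}  # 50% off every token type


def request_cost(rates, input_tokens, cached_tokens, output_tokens):
    """output_tokens must include reasoning tokens (billed at the output rate)."""
    return (input_tokens * rates["input"]
            + cached_tokens * rates["cached"]
            + output_tokens * rates["output"]) / 1_000_000


def cost_per_visible_mtok(rates, visible, reasoning):
    """Output-side cost per million visible tokens, reasoning included."""
    return (visible + reasoning) * rates["output"] / visible


# A reasoning-heavy request: 1,000 visible tokens plus 5,000 reasoning tokens.
std = request_cost(STANDARD, 3_000, 0, 6_000)
bat = request_cost(BATCH, 3_000, 0, 6_000)
print(f"standard ${std:.4f}  batch ${bat:.4f}")
print(f"per visible MTok: ${cost_per_visible_mtok(STANDARD, 1_000, 5_000):.2f} "
      f"vs ${cost_per_visible_mtok(BATCH, 1_000, 5_000):.2f}")
```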
Save More with Crazyrouter
Crazyrouter is an OpenAI-compatible API gateway that provides access to Grok 4.1 Thinking (and 200+ other models) at 90% of official pricing — a flat 10% discount on all token costs.
Crazyrouter Pricing for Grok 4.1 Thinking
| Component | Official | Crazyrouter (10% off) |
|---|---|---|
| Input tokens | $0.20/MTok | $0.18/MTok |
| Cached input | $0.05/MTok | $0.045/MTok |
| Output tokens | $0.50/MTok | $0.45/MTok |
| Reasoning tokens | $0.50/MTok | $0.45/MTok |
Why Crazyrouter?
- OpenAI-compatible API: Drop-in replacement; just change the base_url
- 200+ models: Access Grok, GPT, Claude, Gemini, DeepSeek, and more from a single API key
- 10% discount: On every model, every token, every request
- No rate limit surprises: Generous rate limits across all models
- Single billing: One account, one invoice, all providers
Integration: OpenAI Python SDK
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1",
)

response = client.chat.completions.create(
    model="grok-4.1-thinking",
    reasoning_effort="medium",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful coding assistant.",
        },
        {
            "role": "user",
            "content": "Write a Python function to find the longest palindromic substring.",
        },
    ],
)

print(response.choices[0].message.content)

# Check reasoning token usage
usage = response.usage
print(f"Input tokens: {usage.prompt_tokens}")
print(f"Output tokens: {usage.completion_tokens}")
# completion_tokens_details can be None, and text_tokens may not be
# present, so read both defensively.
details = getattr(usage, "completion_tokens_details", None)
if details is not None:
    print(f"Reasoning tokens: {details.reasoning_tokens}")
    print(f"Text tokens: {getattr(details, 'text_tokens', 'n/a')}")
```
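Printing usage per call is a start; tracking it across calls shows whether reasoning is dominating your bill. A minimal sketch that works with any object exposing the usage fields shown above; ReasoningMonitor and its alert threshold of 5 reasoning tokens per visible token are hypothetical choices, and the demo uses stand-in objects so it runs offline.

```python
# Sketch: accumulate reasoning-token usage across responses and flag when
# reasoning dominates. Missing details are treated as zero reasoning tokens.
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class ReasoningMonitor:
    completion_tokens: int = 0
    reasoning_tokens: int = 0
    alert_ratio: float = 5.0  # hypothetical threshold: reasoning / visible

    def record(self, usage) -> None:
        details = getattr(usage, "completion_tokens_details", None)
        reasoning = getattr(details, "reasoning_tokens", None) or 0
        self.completion_tokens += usage.completion_tokens
        self.reasoning_tokens += reasoning

    @property
    def ratio(self) -> float:
        """Reasoning tokens per visible output token so far."""
        if self.completion_tokens == 0:
            return 0.0
        visible = self.completion_tokens - self.reasoning_tokens
        return self.reasoning_tokens / visible if visible else float("inf")

    def should_alert(self) -> bool:
        return self.ratio > self.alert_ratio


# Offline demo with stand-in objects; in practice pass response.usage.
monitor = ReasoningMonitor()
monitor.record(SimpleNamespace(
    completion_tokens=8_500,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=7_000)))
monitor.record(SimpleNamespace(completion_tokens=1_200,
                               completion_tokens_details=None))
print(f"reasoning:visible = {monitor.ratio:.2f}, alert = {monitor.should_alert()}")
```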
Integration: cURL
```bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -d '{
    "model": "grok-4.1-thinking",
    "reasoning_effort": "medium",
    "messages": [
      {
        "role": "user",
        "content": "Explain how B-trees work and why databases use them."
      }
    ]
  }'
```
That's it. Change the base URL, use your Crazyrouter API key, and you're saving 10% on every call.
Real-World Cost Scenarios
Let's walk through three realistic scenarios to see how reasoning tokens, caching, and Crazyrouter affect your bill.
Scenario 1: Simple Chatbot (Low Reasoning)
Use case: Customer support bot answering FAQ-style questions.
| Parameter | Value |
|---|---|
| Reasoning effort | low |
| Avg input tokens per request | 800 |
| Avg reasoning tokens per request | 300 |
| Avg output tokens per request | 200 |
| Requests per day | 10,000 |
| Caching hit rate | 70% (system prompt cached) |
Monthly cost calculation (30 days):
- Input: 10,000 × 800 = 8M tokens/day → 240M tokens/month
- Uncached (30%): 72M × $0.20/MTok = $14.40
- Cached (70%): 168M × $0.05/MTok = $8.40
- Output + Reasoning: 10,000 × 500 = 5M tokens/day → 150M tokens/month
- 150M × $0.50/MTok = $75.00
Total (official): $97.80/month. Crazyrouter: $88.02/month, a saving of $9.78/month.
Scenario 2: Code Assistant (Medium Reasoning)
Use case: Developer tool that generates and explains code.
| Parameter | Value |
|---|---|
| Reasoning effort | medium |
| Avg input tokens per request | 3,000 |
| Avg reasoning tokens per request | 4,000 |
| Avg output tokens per request | 1,200 |
| Requests per day | 2,000 |
| Caching hit rate | 50% |
Monthly cost calculation (30 days):
- Input: 2,000 × 3,000 = 6M tokens/day → 180M tokens/month
- Uncached (50%): 90M × $0.20/MTok = $18.00
- Cached (50%): 90M × $0.05/MTok = $4.50
- Output + Reasoning: 2,000 × 5,200 = 10.4M tokens/day → 312M tokens/month
- 312M × $0.50/MTok = $156.00
Total (official): $178.50/month. Crazyrouter: $160.65/month, a saving of $17.85/month.
Notice how the reasoning tokens (4,000) dwarf the visible output (1,200): they outnumber it 3.3 to 1, so the output line is about 4.3x what you'd expect from visible tokens alone.
Scenario 3: Research Agent (High Reasoning)
Use case: Autonomous agent solving complex multi-step problems with tool use.
| Parameter | Value |
|---|---|
| Reasoning effort | high |
| Avg input tokens per request | 8,000 |
| Avg reasoning tokens per request | 15,000 |
| Avg output tokens per request | 2,000 |
| Requests per day | 500 |
| Caching hit rate | 40% |
Monthly cost calculation (30 days):
- Input: 500 × 8,000 = 4M tokens/day → 120M tokens/month
- Uncached (60%): 72M × $0.20/MTok = $14.40
- Cached (40%): 48M × $0.05/MTok = $2.40
- Output + Reasoning: 500 × 17,000 = 8.5M tokens/day → 255M tokens/month
- 255M × $0.50/MTok = $127.50
Total (official): $144.30/month. Crazyrouter: $129.87/month, a saving of $14.43/month.
Here, reasoning tokens are 7.5x the visible output. The model is doing serious thinking, and you're paying for every step. Reasoning alone accounts for $112.50 of the monthly bill (225M tokens × $0.50/MTok). Switching to medium reasoning effort, which could cut reasoning tokens roughly in half, would save about $56/month at official rates.
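All three estimates follow the same formula, so one function can reproduce them and rerun them with your own traffic numbers. A minimal sketch: the token averages and cache-hit rates are the scenario assumptions above, not measurements, monthly_cost is a hypothetical helper, and Crazyrouter is modeled as a flat 90% of the official price.

```python
# Sketch: reproduce the three monthly estimates above from their inputs.
# Rates are the official list prices; Crazyrouter is modeled as a flat 90%.
RATES = {"input": 0.20, "cached": 0.05, "output": 0.50}  # USD per MTok
CRAZYROUTER_FACTOR = 0.90
DAYS = 30


def monthly_cost(requests_per_day, input_tokens, reasoning_tokens,
                 output_tokens, cache_hit_rate):
    """Official monthly cost in USD; reasoning is billed at the output rate."""
    monthly_input = requests_per_day * input_tokens * DAYS
    cached = monthly_input * cache_hit_rate
    uncached = monthly_input - cached
    monthly_output = requests_per_day * (reasoning_tokens + output_tokens) * DAYS
    return (uncached * RATES["input"] + cached * RATES["cached"]
            + monthly_output * RATES["output"]) / 1_000_000


scenarios = {  # (requests/day, input, reasoning, visible output, cache hit rate)
    "chatbot (low)": (10_000, 800, 300, 200, 0.70),
    "code assistant (medium)": (2_000, 3_000, 4_000, 1_200, 0.50),
    "research agent (high)": (500, 8_000, 15_000, 2_000, 0.40),
}
for name, params in scenarios.items():
    official = monthly_cost(*params)
    print(f"{name}: ${official:.2f} official, "
          f"${official * CRAZYROUTER_FACTOR:.2f} via Crazyrouter")
```

Calling monthly_cost(500, 8_000, 7_500, 2_000, 0.40) models the research agent with reasoning tokens halved; it comes out at $88.05, about $56 below the $144.30 high-effort estimate.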
Grok 4.1 Thinking vs. GPT-5 vs. Claude Opus 4 Reasoning
How does Grok 4.1 Thinking stack up against other reasoning models?
| Model | Input $/MTok | Output $/MTok | Reasoning Rate | Batch Discount |
|---|---|---|---|---|
| Grok 4.1 Thinking | $0.20 | $0.50 | Same as output ($0.50) | 50% off |
| GPT-5 | $2.00 | $8.00 | Same as output ($8.00) | 50% off |
| Claude Opus 4 | $15.00 | $75.00 | Same as output ($75.00, extended thinking) | 50% off (Message Batches API) |
The pricing gap is dramatic:
- Grok 4.1 Thinking is 10x cheaper on input and 16x cheaper on output than GPT-5
- Grok 4.1 Thinking is 75x cheaper on input and 150x cheaper on output than Claude Opus 4
Of course, pricing isn't everything: benchmark performance, latency, and output quality all matter. But for cost-sensitive reasoning workloads, Grok 4.1 Thinking offers extraordinary value; at list prices it is by far the cheapest of the three models compared here.
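To make the gap concrete, the same request can be priced on each model at the list rates in the table. A minimal sketch under a loud assumption: every model is given identical token counts, which is unrealistic because reasoning-token volume differs between models. model_request_cost is a hypothetical helper, and the token counts are borrowed from the code-assistant scenario above.

```python
# Sketch: price one identical request on each model using the list prices
# in the table above. ASSUMPTION: every model uses the same token counts,
# which is unrealistic; reasoning-token volume differs between models.
PRICES = {  # USD per million tokens: (input, output incl. reasoning)
    "Grok 4.1 Thinking": (0.20, 0.50),
    "GPT-5": (2.00, 8.00),
    "Claude Opus 4": (15.00, 75.00),
}


def model_request_cost(model, input_tokens, output_tokens):
    """Uncached cost in USD for one request on the given model."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


# 3,000 input tokens; 4,000 reasoning + 1,200 visible output tokens.
costs = {m: model_request_cost(m, 3_000, 5_200) for m in PRICES}
base = costs["Grok 4.1 Thinking"]
for model, cost in costs.items():
    print(f"{model}: ${cost:.4f} per request ({cost / base:.1f}x)")
```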
When to choose each:
- Grok 4.1 Thinking: Best value for reasoning tasks, especially at scale. Strong on math, code, and logic.
- GPT-5: Broader general knowledge, stronger on creative and nuanced tasks. Worth the premium for customer-facing applications.
- Claude Opus 4: Best-in-class for long-context analysis, complex writing, and tasks requiring deep understanding. Premium pricing reflects premium capability.
Key Takeaways
- Base rates are cheap, but reasoning tokens multiply your costs. At 2–10 reasoning tokens per visible token, the $0.50/MTok output rate works out to roughly $1.50–5.50 per million visible output tokens.
- Use reasoning_effort aggressively. Set it to low for simple tasks and medium for most workloads. Reserve high for genuinely complex problems.
- Caching is free money. Consistent system prompts and multi-turn conversations automatically benefit from the 75% input discount.
- Batch API halves everything. If you can tolerate async processing, the 50% discount applies to all token types, including reasoning.
- Crazyrouter saves 10% on top. It is an OpenAI-compatible drop-in that only requires changing the base URL and API key.
- Monitor reasoning_tokens in your usage data. If you're not tracking this field, you're flying blind on costs.
- Grok 4.1 Thinking is the cheapest reasoning model in this comparison. At 10–150x below GPT-5 and Claude Opus 4 list prices, it's a strong choice for budget-conscious reasoning workloads.
Get Started with Crazyrouter
Ready to use Grok 4.1 Thinking at 10% off?
- Sign up at crazyrouter.com
- Get your API key from the dashboard
- Change your base URL to https://crazyrouter.com/v1
- Start saving on every request
Crazyrouter supports 200+ models from xAI, OpenAI, Anthropic, Google, DeepSeek, and more — all through a single OpenAI-compatible API. One key, one bill, every model.
👉 Get your API key at crazyrouter.com
Disclaimer: Pricing information is accurate as of April 27, 2026 and is based on publicly available data from xAI. Prices may change without notice. Crazyrouter is an independent API gateway and is not affiliated with xAI. Always verify current pricing on the official xAI pricing page before making purchasing decisions. Token usage estimates in the scenarios above are approximations and actual usage will vary based on prompt complexity, model behavior, and other factors.





