Crazyrouter Team
April 27, 2026

Grok 4.1 Thinking Pricing Explained — Reasoning Tokens, Caching, and How to Save with Crazyrouter#

xAI's Grok 4.1 Thinking is the reasoning-enhanced variant of the Grok 4.1 model family. It extends the already capable Grok 4.1 base model with chain-of-thought reasoning — the model "thinks" through problems step by step before producing a final answer. This makes it exceptionally strong for math, code generation, logic puzzles, multi-step planning, and any task where deliberate reasoning outperforms pattern matching.

But reasoning comes at a cost. Grok 4.1 Thinking generates reasoning tokens — internal chain-of-thought tokens that are billed at the output token rate but never appear in the final response. If you're not careful, a simple prompt can quietly consume 5–10x more tokens than you expected.

This guide breaks down every component of Grok 4.1 Thinking pricing, explains how reasoning tokens work, shows you how to control costs with caching and the reasoning_effort parameter, and demonstrates how to save an additional 10% by routing through Crazyrouter.

Last updated: April 27, 2026.


Base Pricing#

Here's the official Grok 4.1 Thinking pricing from xAI:

| Component | Price per Million Tokens |
| --- | --- |
| Input tokens | $0.20 |
| Cached input tokens | $0.05 |
| Output tokens | $0.50 |
| Reasoning tokens | $0.50 (same as output) |

At first glance, these rates look extremely competitive. Input at $0.20/MTok is cheaper than most frontier models, and output at $0.50/MTok undercuts GPT-5 and Claude Opus 4 significantly. But the real cost story is in the reasoning tokens — more on that below.

Context Window#

Grok 4.1 Thinking supports a 131,072-token context window — the same as the base Grok 4.1 model. The output limit is 65,536 tokens, which includes both visible output tokens and invisible reasoning tokens. This means heavy reasoning can eat into your available output space.


Reasoning Tokens: The Hidden Cost Multiplier#

What Are Reasoning Tokens?#

When you send a prompt to Grok 4.1 Thinking, the model doesn't jump straight to an answer. It first generates an internal chain of thought — a sequence of reasoning steps that help it work through the problem. These intermediate steps are called reasoning tokens.

Reasoning tokens are:

  • Generated by the model as part of its thinking process
  • Billed as output tokens at $0.50 per million tokens
  • Not returned in the API response — you don't see them in the content field
  • Reported in the usage object under completion_tokens_details.reasoning_tokens

How Are They Billed?#

Reasoning tokens are billed at the same rate as output tokens: $0.50/MTok. They count toward your total completion_tokens in the usage response.

Here's what a typical usage response looks like:

json
{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 8500,
    "total_tokens": 9700,
    "completion_tokens_details": {
      "reasoning_tokens": 7000,
      "text_tokens": 1500
    }
  }
}

In this example, the model generated 7,000 reasoning tokens and 1,500 visible output tokens. You're billed for all 8,500 completion tokens at the output rate. The reasoning tokens account for 82% of the output cost — and you never see them.
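The arithmetic above is easy to script. Here's a minimal sketch that computes the billed cost of a single response from its usage payload, using the list prices quoted in this article; the field names follow the usage object shown above.

```python
# Sketch: compute the billed cost of one Grok 4.1 Thinking response from
# its usage payload. Rates are the official list prices quoted above.

INPUT_RATE = 0.20 / 1_000_000    # dollars per input token
OUTPUT_RATE = 0.50 / 1_000_000   # dollars per output token (reasoning included)

def response_cost(usage: dict) -> float:
    """Return the dollar cost of one request, reasoning tokens included."""
    input_cost = usage["prompt_tokens"] * INPUT_RATE
    # completion_tokens already includes reasoning_tokens, so bill it whole.
    output_cost = usage["completion_tokens"] * OUTPUT_RATE
    return input_cost + output_cost

usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 8500,
    "completion_tokens_details": {"reasoning_tokens": 7000, "text_tokens": 1500},
}
print(f"${response_cost(usage):.5f}")  # $0.00449 for the example above
```

For the example usage block above, that works out to $0.00024 of input plus $0.00425 of output — and $0.0035 of that output cost is invisible reasoning.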

Why Are Reasoning Tokens So Costly?#

The issue isn't the per-token rate — $0.50/MTok is reasonable. The issue is volume. Reasoning tokens typically outnumber visible output tokens by a factor of 2x to 10x, depending on task complexity:

| Task Type | Typical Reasoning:Output Ratio | Example |
| --- | --- | --- |
| Simple Q&A | 2:1 | "What's the capital of France?" |
| Code generation | 3–5:1 | "Write a Python function to merge two sorted lists" |
| Math/logic problems | 5–8:1 | "Prove that √2 is irrational" |
| Complex multi-step reasoning | 8–10:1 | "Analyze this codebase and find the bug" |

A prompt that generates 500 visible output tokens might silently produce 3,000–5,000 reasoning tokens. Your effective output cost isn't $0.50/MTok — it's closer to $2–3/MTok when reasoning is factored in.

Controlling Costs with reasoning_effort#

xAI provides a reasoning_effort parameter that lets you control how much thinking the model does. This directly impacts the number of reasoning tokens generated:

| Value | Behavior | Reasoning Token Reduction |
| --- | --- | --- |
| high | Full reasoning (default) | Baseline |
| medium | Balanced reasoning | ~40–60% fewer reasoning tokens |
| low | Minimal reasoning | ~70–80% fewer reasoning tokens |

python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.x.ai/v1"
)

response = client.chat.completions.create(
    model="grok-4.1-thinking",
    reasoning_effort="medium",
    messages=[
        {"role": "user", "content": "Explain the difference between TCP and UDP."}
    ]
)

When to use each level:

  • high: Math proofs, complex debugging, multi-step logic, competitive programming
  • medium: General coding tasks, analysis, summarization with nuance
  • low: Simple Q&A, classification, extraction, formatting tasks

Using low for simple tasks can cut your total cost by 60–70% compared to the default high setting. This is the single most impactful cost optimization available.
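One way to operationalize this guidance is to pick the effort level from a coarse task label before each call. The mapping below mirrors the recommendations above; the task labels themselves are our own convention for illustration, not an xAI API concept.

```python
# Sketch: choose a reasoning_effort level from a coarse task label.
# The task labels are a hypothetical convention, not part of the xAI API.

EFFORT_BY_TASK = {
    "qa": "low",             # simple Q&A, classification, extraction
    "formatting": "low",
    "coding": "medium",      # general code generation and analysis
    "summarization": "medium",
    "math": "high",          # proofs, multi-step logic, hard debugging
    "debugging": "high",
}

def pick_effort(task: str) -> str:
    # Default to "medium" for unknown task types: a reasonable middle ground.
    return EFFORT_BY_TASK.get(task, "medium")
```

The returned value would then be passed as the `reasoning_effort` parameter in the request, as in the SDK example above.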


Caching: Automatic 75% Input Discount#

Grok 4.1 Thinking supports automatic prompt caching. When you send repeated or overlapping prompts, xAI's infrastructure automatically caches the common prefix and charges cached tokens at a reduced rate:

  • Standard input: $0.20/MTok
  • Cached input: $0.05/MTok (75% discount)

Caching is automatic — you don't need to enable it or manage cache keys. The system detects when a new request shares a prefix with a recent request and applies the cached rate.

When Caching Helps Most#

Caching is most effective for:

  • System prompts: If you use the same system prompt across requests, it gets cached after the first call
  • Multi-turn conversations: The conversation history from previous turns is cached
  • Few-shot examples: Static examples in your prompt are cached
  • Document analysis: When asking multiple questions about the same document

Caching Example#

Suppose you have a 10,000-token system prompt and send 50 requests with different user messages:

Without caching:

  • 50 × 10,000 = 500,000 input tokens × $0.20/MTok = $0.10

With caching (first request uncached, 49 cached):

  • 1 × 10,000 = 10,000 tokens × $0.20/MTok = $0.002
  • 49 × 10,000 = 490,000 tokens × $0.05/MTok = $0.0245
  • Total: $0.0265 (73.5% savings)

For high-volume applications with consistent system prompts, caching alone can reduce input costs by 70%+.
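The caching arithmetic generalizes. Here's a small sketch that reproduces the example above, assuming the first request is uncached and every later request hits the cached prefix:

```python
# Sketch: estimate input cost with and without prompt caching, using the
# rates quoted above ($0.20/MTok standard, $0.05/MTok cached). Assumes the
# first request is uncached and all later requests hit the cached prefix.

STANDARD = 0.20 / 1_000_000
CACHED = 0.05 / 1_000_000

def input_cost(prefix_tokens: int, requests: int, caching: bool) -> float:
    if not caching:
        return requests * prefix_tokens * STANDARD
    first = prefix_tokens * STANDARD            # first request pays full rate
    rest = (requests - 1) * prefix_tokens * CACHED
    return first + rest

without = input_cost(10_000, 50, caching=False)   # $0.10
with_cache = input_cost(10_000, 50, caching=True) # $0.0265
print(f"savings: {1 - with_cache / without:.1%}") # savings: 73.5%
```

Note that real-world hit rates are lower than this idealized 49-of-50, which is why the scenarios later in this article assume 40–70% instead.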


Tool Costs#

Grok 4.1 Thinking supports the same tool/function calling capabilities as the base Grok 4.1 model. There is no additional surcharge for tool use — you pay the standard input and output token rates.

However, tool definitions do consume input tokens. Each tool definition in your request adds to the prompt token count. If you define 20 tools with detailed descriptions, that could add 2,000–5,000 tokens to every request.

Cost optimization tips for tools:

  • Only include tools relevant to the current request
  • Keep tool descriptions concise but clear
  • Use caching to offset the cost of repeated tool definitions
  • Consider whether reasoning_effort="low" is sufficient for tool-routing decisions
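The first tip — only sending relevant tools — can be sketched as a pre-filter over your tool catalog. The keyword routing below is a naive illustration of the idea, not anything the xAI API provides:

```python
# Sketch: trim the tools list per request so unused tool definitions don't
# consume input tokens. The keyword routing is a naive illustration only.

ALL_TOOLS = {
    "get_weather": {"type": "function", "function": {
        "name": "get_weather", "description": "Look up current weather."}},
    "search_docs": {"type": "function", "function": {
        "name": "search_docs", "description": "Search internal documentation."}},
    "run_sql": {"type": "function", "function": {
        "name": "run_sql", "description": "Execute a read-only SQL query."}},
}

KEYWORDS = {
    "get_weather": ["weather", "temperature", "forecast"],
    "search_docs": ["docs", "documentation", "how do i"],
    "run_sql": ["sql", "query", "table"],
}

def relevant_tools(user_message: str) -> list:
    msg = user_message.lower()
    return [ALL_TOOLS[name] for name, words in KEYWORDS.items()
            if any(w in msg for w in words)]

tools = relevant_tools("What's the weather forecast for Austin?")
# Only the get_weather definition is sent, instead of all three.
```

A production version might use embeddings or a cheap classifier for routing, but even crude filtering keeps 20-tool catalogs from adding thousands of tokens to every request.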

Batch API: 50% Off#

xAI offers a Batch API for asynchronous processing at half the standard price:

| Component | Standard | Batch (50% off) |
| --- | --- | --- |
| Input tokens | $0.20/MTok | $0.10/MTok |
| Cached input | $0.05/MTok | $0.025/MTok |
| Output tokens | $0.50/MTok | $0.25/MTok |
| Reasoning tokens | $0.50/MTok | $0.25/MTok |

Batch requests are processed within a 24-hour window. You submit a JSONL file of requests and poll for results. This is ideal for:

  • Bulk content generation
  • Large-scale data analysis
  • Evaluation and benchmarking
  • Any workload that doesn't need real-time responses

The 50% discount applies to all token types, including reasoning tokens. For reasoning-heavy workloads, the Batch API can reduce your effective cost from ~$3/MTok to ~$1.50/MTok.
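Building the JSONL file is mechanical. The sketch below assumes an OpenAI-style batch line format (`custom_id` + `method` + `url` + `body`); check xAI's Batch API documentation for the exact schema before relying on it.

```python
# Sketch: build a JSONL file of batch requests. Assumes an OpenAI-style
# batch line schema (custom_id/method/url/body) -- verify against xAI's
# Batch API docs, which may differ.

import json

prompts = [
    "Summarize the plot of Hamlet in two sentences.",
    "List three use cases for B-trees.",
]

lines = []
for i, prompt in enumerate(prompts):
    lines.append(json.dumps({
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "grok-4.1-thinking",
            # Batch jobs are also a good place to dial reasoning_effort down.
            "reasoning_effort": "low",
            "messages": [{"role": "user", "content": prompt}],
        },
    }))

with open("batch_requests.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Each line is an independent request, so failures are isolated and results can be matched back to inputs via `custom_id`.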


Save More with Crazyrouter#

Crazyrouter is an OpenAI-compatible API gateway that provides access to Grok 4.1 Thinking (and 200+ other models) at 90% of official pricing — a flat 10% discount on all token costs.

Crazyrouter Pricing for Grok 4.1 Thinking#

| Component | Official | Crazyrouter (10% off) |
| --- | --- | --- |
| Input tokens | $0.20/MTok | $0.18/MTok |
| Cached input | $0.05/MTok | $0.045/MTok |
| Output tokens | $0.50/MTok | $0.45/MTok |
| Reasoning tokens | $0.50/MTok | $0.45/MTok |

Why Crazyrouter?#

  • OpenAI-compatible API: Drop-in replacement — just change the base_url
  • 200+ models: Access Grok, GPT, Claude, Gemini, DeepSeek, and more from a single API key
  • 10% discount: On every model, every token, every request
  • No rate limit surprises: Generous rate limits across all models
  • Single billing: One account, one invoice, all providers

Integration: OpenAI Python SDK#

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="grok-4.1-thinking",
    reasoning_effort="medium",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful coding assistant."
        },
        {
            "role": "user",
            "content": "Write a Python function to find the longest palindromic substring."
        }
    ]
)

print(response.choices[0].message.content)

# Check reasoning token usage
usage = response.usage
print(f"Input tokens: {usage.prompt_tokens}")
print(f"Output tokens: {usage.completion_tokens}")
if hasattr(usage, 'completion_tokens_details'):
    details = usage.completion_tokens_details
    print(f"Reasoning tokens: {details.reasoning_tokens}")
    print(f"Text tokens: {details.text_tokens}")

Integration: cURL#

bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -d '{
    "model": "grok-4.1-thinking",
    "reasoning_effort": "medium",
    "messages": [
      {
        "role": "user",
        "content": "Explain how B-trees work and why databases use them."
      }
    ]
  }'

That's it. Change the base URL, use your Crazyrouter API key, and you're saving 10% on every call.


Real-World Cost Scenarios#

Let's walk through three realistic scenarios to see how reasoning tokens, caching, and Crazyrouter affect your bill.

Scenario 1: Simple Chatbot (Low Reasoning)#

Use case: Customer support bot answering FAQ-style questions.

| Parameter | Value |
| --- | --- |
| Reasoning effort | low |
| Avg input tokens per request | 800 |
| Avg reasoning tokens per request | 300 |
| Avg output tokens per request | 200 |
| Requests per day | 10,000 |
| Caching hit rate | 70% (system prompt cached) |

Monthly cost calculation (30 days):

  • Input: 10,000 × 800 = 8M tokens/day → 240M tokens/month
    • Uncached (30%): 72M × $0.20/MTok = $14.40
    • Cached (70%): 168M × $0.05/MTok = $8.40
  • Output + Reasoning: 10,000 × 500 = 5M tokens/day → 150M tokens/month
    • 150M × $0.50/MTok = $75.00

**Total (official)**: $97.80/month. **Total (Crazyrouter)**: $88.02/month — save $9.78/month.

Scenario 2: Code Assistant (Medium Reasoning)#

Use case: Developer tool that generates and explains code.

| Parameter | Value |
| --- | --- |
| Reasoning effort | medium |
| Avg input tokens per request | 3,000 |
| Avg reasoning tokens per request | 4,000 |
| Avg output tokens per request | 1,200 |
| Requests per day | 2,000 |
| Caching hit rate | 50% |

Monthly cost calculation (30 days):

  • Input: 2,000 × 3,000 = 6M tokens/day → 180M tokens/month
    • Uncached (50%): 90M × $0.20/MTok = $18.00
    • Cached (50%): 90M × $0.05/MTok = $4.50
  • Output + Reasoning: 2,000 × 5,200 = 10.4M tokens/day → 312M tokens/month
    • 312M × $0.50/MTok = $156.00

**Total (official)**: $178.50/month. **Total (Crazyrouter)**: $160.65/month — save $17.85/month.

Notice how reasoning tokens (4,000) dwarf the visible output (1,200). The output line is over 4x what you'd expect from visible tokens alone.

Scenario 3: Research Agent (High Reasoning)#

Use case: Autonomous agent solving complex multi-step problems with tool use.

| Parameter | Value |
| --- | --- |
| Reasoning effort | high |
| Avg input tokens per request | 8,000 |
| Avg reasoning tokens per request | 15,000 |
| Avg output tokens per request | 2,000 |
| Requests per day | 500 |
| Caching hit rate | 40% |

Monthly cost calculation (30 days):

  • Input: 500 × 8,000 = 4M tokens/day → 120M tokens/month
    • Uncached (60%): 72M × $0.20/MTok = $14.40
    • Cached (40%): 48M × $0.05/MTok = $2.40
  • Output + Reasoning: 500 × 17,000 = 8.5M tokens/day → 255M tokens/month
    • 255M × $0.50/MTok = $127.50

**Total (official)**: $144.30/month. **Total (Crazyrouter)**: $129.87/month — save $14.43/month.

Here, reasoning tokens are 7.5x the visible output. The model is doing serious thinking — and you're paying for every step. If you switched to medium reasoning effort, you could cut the reasoning tokens roughly in half and save ~$60/month.
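The three scenarios above all follow the same formula, so here is a small sketch of a reusable estimator that reproduces their totals. Rates are the official list prices from this article; Crazyrouter's flat 10% discount is modeled as a simple multiplier.

```python
# Sketch: monthly cost estimator reproducing the scenario math above.
# Rates are the official list prices; discount=0.10 models Crazyrouter.

INPUT, CACHED, OUTPUT = 0.20, 0.05, 0.50  # dollars per million tokens

def monthly_cost(reqs_per_day, in_tok, reasoning_tok, out_tok,
                 cache_hit_rate, days=30, discount=0.0):
    m = 1_000_000
    input_total = reqs_per_day * in_tok * days
    completion_total = reqs_per_day * (reasoning_tok + out_tok) * days
    cost = (input_total * (1 - cache_hit_rate) * INPUT / m
            + input_total * cache_hit_rate * CACHED / m
            + completion_total * OUTPUT / m)  # reasoning billed at output rate
    return round(cost * (1 - discount), 2)

# Scenario 2: code assistant, medium reasoning
official = monthly_cost(2_000, 3_000, 4_000, 1_200, cache_hit_rate=0.5)
via_router = monthly_cost(2_000, 3_000, 4_000, 1_200, cache_hit_rate=0.5,
                          discount=0.10)
print(official, via_router)  # 178.5 160.65
```

Plugging in the other two scenarios yields $97.80 and $144.30, matching the walkthroughs above. Swapping in your own traffic numbers is the fastest way to see which lever — effort level, caching, batching, or routing — moves your bill most.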


Grok 4.1 Thinking vs. GPT-5 vs. Claude Opus 4 Reasoning#

How does Grok 4.1 Thinking stack up against other reasoning models?

| Model | Input $/MTok | Output $/MTok | Reasoning Rate | Batch Discount |
| --- | --- | --- | --- | --- |
| Grok 4.1 Thinking | $0.20 | $0.50 | Same as output ($0.50) | 50% off |
| GPT-5 | $2.00 | $8.00 | Same as output ($8.00) | 50% off |
| Claude Opus 4 | $15.00 | $75.00 | N/A (extended thinking billed at output) | Not available |

The pricing gap is dramatic:

  • Grok 4.1 Thinking is 10x cheaper on input and 16x cheaper on output than GPT-5
  • Grok 4.1 Thinking is 75x cheaper on input and 150x cheaper on output than Claude Opus 4

Of course, pricing isn't everything — benchmark performance, latency, and output quality all matter. But for cost-sensitive reasoning workloads, Grok 4.1 Thinking offers an extraordinary value proposition. It's the most affordable frontier reasoning model available today.

When to choose each:

  • Grok 4.1 Thinking: Best value for reasoning tasks, especially at scale. Strong on math, code, and logic.
  • GPT-5: Broader general knowledge, stronger on creative and nuanced tasks. Worth the premium for customer-facing applications.
  • Claude Opus 4: Best-in-class for long-context analysis, complex writing, and tasks requiring deep understanding. Premium pricing reflects premium capability.

Key Takeaways#

  1. Base rates are cheap, but reasoning tokens multiply your costs. A $0.50/MTok output rate can effectively become $2–5/MTok when reasoning tokens are factored in.

  2. Use reasoning_effort aggressively. Set it to low for simple tasks and medium for most workloads. Reserve high for genuinely complex problems.

  3. Caching is free money. Consistent system prompts and multi-turn conversations automatically benefit from 75% input discounts.

  4. Batch API halves everything. If you can tolerate async processing, the 50% discount applies to all token types including reasoning.

  5. Crazyrouter saves 10% on top. An OpenAI-compatible drop-in that requires changing one line of code.

  6. Monitor reasoning_tokens in your usage data. If you're not tracking this field, you're flying blind on costs.

  7. Grok 4.1 Thinking is the most cost-effective reasoning model available. At 10–75x cheaper than GPT-5 and Claude Opus 4, it's the clear choice for budget-conscious reasoning workloads.


Get Started with Crazyrouter#

Ready to use Grok 4.1 Thinking at 10% off?

  1. Sign up at crazyrouter.com
  2. Get your API key from the dashboard
  3. Change your base URL to https://crazyrouter.com/v1
  4. Start saving on every request

Crazyrouter supports 200+ models from xAI, OpenAI, Anthropic, Google, DeepSeek, and more — all through a single OpenAI-compatible API. One key, one bill, every model.

👉 Get your API key at crazyrouter.com


Disclaimer: Pricing information is accurate as of April 27, 2026 and is based on publicly available data from xAI. Prices may change without notice. Crazyrouter is an independent API gateway and is not affiliated with xAI. Always verify current pricing on the official xAI pricing page before making purchasing decisions. Token usage estimates in the scenarios above are approximations and actual usage will vary based on prompt complexity, model behavior, and other factors.
