
Gemini 3.1 Pro Pricing Explained — Context Tiers, Caching, and How to Save with Crazyrouter
Google's Gemini 3.1 Pro Preview is one of the most capable large language models available today, offering a massive 1 million token context window, multimodal input support, and strong reasoning performance. But with great capability comes the question every developer and team asks first: how much does it actually cost?
Unlike many competing models that charge a flat per-token rate, Gemini 3.1 Pro introduces a context-tiered pricing structure — meaning the cost per token changes depending on how much of that 1M context window you use. This is a meaningful shift in how API pricing works, and understanding it can save you hundreds or even thousands of dollars per month.
In this guide, we break down every aspect of Gemini 3.1 Pro Preview pricing: base rates, context tiers, caching discounts, the free tier, grounding costs, and how to access it at a 10% discount through Crazyrouter. We also run through real-world cost scenarios and compare pricing head-to-head with GPT-5.4 and Claude Sonnet 4.6.
Let's get into it.
Base Pricing: Context-Tiered Input and Output#
The defining feature of Gemini 3.1 Pro's pricing model is its two-tier context structure. Google splits pricing based on whether your total prompt (input) falls within 200K tokens or exceeds it.
Tier 1: Prompts ≤ 200K Tokens#
| Component | Price per Million Tokens (MTok) |
|---|---|
| Input | $2.00 |
| Output | $12.00 |
Tier 2: Prompts > 200K Tokens#
| Component | Price per Million Tokens (MTok) |
|---|---|
| Input | $4.00 |
| Output | $18.00 |
Audio Input#
| Component | Price per Million Tokens (MTok) |
|---|---|
| Audio | $1.00 |
The tier boundary is 200,000 tokens of input context. Once your prompt crosses that threshold, both input and output pricing jump: input doubles from $2.00 to $4.00 per MTok, and output increases by 50%, from $12.00 to $18.00 per MTok.
This matters because the 1M context window is one of Gemini 3.1 Pro's headline features. If you're using it to process entire codebases, long documents, or extended conversation histories, you'll likely cross the 200K boundary regularly. Planning your prompt architecture around this tier can yield significant savings.
For most standard API use cases — chatbots, summarization, code generation with moderate context — you'll stay comfortably within the ≤200K tier. The higher tier is designed for power users who genuinely need deep context: legal document analysis, full-repository code review, or multi-document research synthesis.
Audio input is priced separately at $1.00 per MTok regardless of context length, making Gemini 3.1 Pro a competitive option for voice and audio processing workloads.
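As a rough illustration of the tier rule, the sketch below estimates the cost of a single text request from the rates in the tables above. It assumes the higher rate applies to every token in a request once the prompt exceeds 200K tokens (not only to the tokens past the boundary); the function and constant names are hypothetical and not part of any Google SDK.

```python
# Rates in USD per million tokens (MTok), copied from the pricing tables above.
TIER_THRESHOLD = 200_000  # input tokens; larger prompts fall into the higher tier
RATES = {
    "standard": {"input": 2.00, "output": 12.00},  # prompts <= 200K tokens
    "long": {"input": 4.00, "output": 18.00},      # prompts > 200K tokens
}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the two-tier pricing.

    Assumption: the tier is chosen by prompt size alone and applies to
    all input and output tokens of that request.
    """
    tier = "long" if input_tokens > TIER_THRESHOLD else "standard"
    rates = RATES[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Two prompts one token apart, on either side of the boundary.
    print(request_cost(200_000, 1_000))
    print(request_cost(200_001, 1_000))
```

Under this assumption, a prompt just over the boundary costs roughly twice as much as one just under it, which is why trimming context near 200K tokens is worth checking.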
Context Caching: The Real Cost Saver#
Context caching is where Gemini 3.1 Pro's pricing gets genuinely interesting. If you're sending the same large context repeatedly — a system prompt, a reference document, a codebase — you can cache it and pay dramatically less on subsequent requests.
Cache Pricing#
| Cache Tier | Price per Million Tokens (MTok) |
|---|---|
| ≤ 200K context | $0.20 |
| > 200K context | $0.40 |
| Cache storage | $4.50 per MTok per hour |
Cached input tokens cost just $0.20 per MTok in the ≤200K tier, a 90% discount off the $2.00 input rate. For the >200K tier, cached tokens cost $0.40 per MTok, 90% off the $4.00 standard rate.
The trade-off is the storage cost of $4.50 per MTok per hour. This means caching is most cost-effective when you're making frequent requests against the same context within a short time window.
When Caching Makes Sense#
Context caching is ideal for:
- Chatbots with large system prompts: If your system prompt is 50K tokens and you're handling hundreds of conversations per hour, caching that prompt saves massively.
- Document Q&A systems: Upload a document once, cache it, and run multiple queries against it.
- Code assistants: Cache a repository's codebase and let users ask questions about it throughout a session.
- Batch processing: When running the same analysis across many inputs with a shared context.
When Caching Doesn't Make Sense#
If you're sending unique, one-off prompts with no repeated context, caching adds storage cost without benefit. The break-even point depends on your request frequency — generally, if you're reusing the same context more than a few times per hour, caching pays for itself.
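The break-even point can be worked out from the ≤200K-tier rates. The sketch below assumes a simplified billing model: the first request in each hour pays the full input rate, every later request pays the cached rate, and storage is billed for the whole hour. Only the shared context is counted, since fresh input and output cost the same either way. The names are hypothetical.

```python
INPUT_RATE = 2.00    # USD per MTok, uncached input (<=200K tier)
CACHED_RATE = 0.20   # USD per MTok, cached input (<=200K tier)
STORAGE_RATE = 4.50  # USD per MTok per hour of cache storage


def break_even_requests_per_hour() -> float:
    """Requests per hour at which caching costs the same as not caching.

    Per MTok of shared context (the context size cancels out):
      without cache: n * INPUT_RATE
      with cache:    INPUT_RATE + (n - 1) * CACHED_RATE + STORAGE_RATE
    Setting the two equal and solving for n gives
      n = 1 + STORAGE_RATE / (INPUT_RATE - CACHED_RATE)
    """
    return 1 + STORAGE_RATE / (INPUT_RATE - CACHED_RATE)


def caching_pays_off(requests_per_hour: int) -> bool:
    """True when caching is cheaper under the simplified model above."""
    return requests_per_hour > break_even_requests_per_hour()


if __name__ == "__main__":
    print(break_even_requests_per_hour())
    print(caching_pays_off(3), caching_pays_off(4))
```

With these rates the break-even lands at about 3.5 requests per hour, so four or more requests per hour against the same context favor caching under this model.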
Quick Math: Caching ROI#
Say you have a 100K token context that you query 50 times in an hour:
Without caching:
- 50 requests × 100K input tokens = 5M input tokens
- Cost: 5 MTok × $2.00 = $10.00
With caching:
- 1 uncached request: 0.1 MTok × $2.00 = $0.20
- 49 cached requests: 4.9 MTok × $0.20 = $0.98
- Storage: 0.1 MTok × $4.50/hr × 1 hr = $0.45
- Total: $1.63
That's an 84% savings in this scenario. The more requests you make against cached context, the better the economics get.
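The same arithmetic can be parameterized to try other context sizes and request counts. This is a minimal sketch using the ≤200K-tier rates from the tables above and the same accounting as the example (one uncached request, the rest cached, storage billed per hour); the function names are hypothetical.

```python
def cost_without_cache(context_tokens: int, requests: int,
                       input_rate: float = 2.00) -> float:
    """USD cost of re-sending the full context on every request."""
    return context_tokens / 1_000_000 * requests * input_rate


def cost_with_cache(context_tokens: int, requests: int, hours: float = 1.0,
                    input_rate: float = 2.00, cached_rate: float = 0.20,
                    storage_rate: float = 4.50) -> float:
    """USD cost when the context is cached after the first request."""
    mtok = context_tokens / 1_000_000
    first_request = mtok * input_rate                  # uncached
    cached_requests = mtok * (requests - 1) * cached_rate
    storage = mtok * storage_rate * hours              # per MTok per hour
    return first_request + cached_requests + storage


if __name__ == "__main__":
    plain = cost_without_cache(100_000, 50)
    cached = cost_with_cache(100_000, 50)
    print(f"without cache: ${plain:.2f}")
    print(f"with cache:    ${cached:.2f}")
    print(f"savings:       {1 - cached / plain:.0%}")
```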
Free Tier: Try Before You Pay#
Google offers a free tier for Gemini 3.1 Pro Preview, which is generous enough for prototyping and light development:
- Rate limits: Lower than paid tier (specific RPM/TPM limits apply)
- Access: Available through Google AI Studio and the Gemini API
- Restrictions: Free tier requests may be used for model improvement; not recommended for production or sensitive data
The free tier lets you test Gemini 3.1 Pro's capabilities — its reasoning, code generation, multimodal understanding, and that massive context window — without committing any budget. It's a solid way to benchmark the model against your specific use case before scaling up.
For production workloads, you'll want to move to the paid tier for higher rate limits, data privacy guarantees, and SLA coverage.
Grounding with Google Search#
Gemini 3.1 Pro supports grounding with Google Search, which lets the model pull in real-time web information to improve factual accuracy and provide up-to-date responses.
Grounding Pricing#
| Component | Price |
|---|---|
| Grounding requests | $35.00 per 1,000 requests |
| Free daily allowance | 1,500 requests per day (RPD) |
At $0.035 per grounded request, this adds a meaningful cost layer if you're using it at scale. However, the 1,500 free requests per day provide a decent buffer for moderate usage.
Grounding is particularly valuable for:
- News and current events: Queries about recent developments that fall outside the model's training data
- Fact-checking: Verifying claims against live web sources
- Research assistants: Pulling in the latest papers, articles, or data points
If your application doesn't need real-time information, you can skip grounding entirely and avoid this cost. For applications that do, budget approximately $35 per 1,000 grounded queries on top of your token costs.
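For budgeting, the free allowance can be netted out before applying the per-request price. The sketch below assumes the 1,500 free requests reset each day and that unused allowance does not carry over; both are assumptions to verify against Google's current terms, and the names are hypothetical.

```python
GROUNDING_PRICE_PER_1K = 35.00  # USD per 1,000 grounded requests
FREE_REQUESTS_PER_DAY = 1_500   # free daily allowance


def grounding_cost(requests_per_day: int, days: int = 30) -> float:
    """Estimate grounding charges over a period, in USD.

    Assumption: the free allowance applies per day and does not roll over.
    """
    billable_per_day = max(0, requests_per_day - FREE_REQUESTS_PER_DAY)
    return billable_per_day * days * GROUNDING_PRICE_PER_1K / 1_000


if __name__ == "__main__":
    for rpd in (1_000, 2_500, 10_000):
        print(rpd, grounding_cost(rpd))
```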
Access Gemini 3.1 Pro at 10% Off with Crazyrouter#
Crazyrouter provides access to Gemini 3.1 Pro Preview at 90% of Google's official pricing — a flat 10% discount on all token costs.
Crazyrouter Pricing for Gemini 3.1 Pro#
| Component | Google Official | Crazyrouter (10% off) |
|---|---|---|
| Input ≤200K | $2.00/MTok | $1.80/MTok |
| Output ≤200K | $12.00/MTok | $10.80/MTok |
| Input >200K | $4.00/MTok | $3.60/MTok |
| Output >200K | $18.00/MTok | $16.20/MTok |
The integration is seamless — Crazyrouter uses the OpenAI-compatible API format, so you can switch over by changing just two lines in your code.
Python (OpenAI SDK)#
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1",
)

response = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain context-tiered pricing in AI APIs."},
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)
```
cURL#
```shell
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gemini-3.1-pro-preview",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain context-tiered pricing in AI APIs."}
    ],
    "max_tokens": 2048
  }'
```
No SDK changes, no new libraries, no migration headaches. If your app already uses the OpenAI SDK format, you're two lines away from a 10% discount on every Gemini 3.1 Pro request.
Real-World Cost Scenarios#
Let's put these numbers into context with three practical scenarios.
Scenario 1: Customer Support Chatbot#
Setup: 10K token system prompt, average 2K token user message, 1K token response, 5,000 conversations per day.
Monthly token usage:
- Input: (10K + 2K) × 5,000 × 30 = 1.8B tokens = 1,800 MTok
- Output: 1K × 5,000 × 30 = 150M tokens = 150 MTok
Google direct:
- Input: 1,800 MTok × $2.00 = $3,600
- Output: 150 MTok × $12.00 = $1,800
- Total: $5,400/month
Crazyrouter (10% off):
- Input: 1,800 MTok × $1.80 = $3,240
- Output: 150 MTok × $10.80 = $1,620
- Total: $4,860/month (saves $540/month)
With context caching on the system prompt, input costs drop further: the 10K system prompt, cached across all 150K monthly requests, would cost a fraction of re-sending it at the full input rate each time.
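The monthly totals in this scenario follow from a simple volume-times-rate calculation. The sketch below reproduces them for the ≤200K tier; the `discount` parameter models a flat percentage off token costs, and all names are hypothetical.

```python
def monthly_cost(input_tokens_per_request: int, output_tokens_per_request: int,
                 requests_per_day: int, days: int = 30,
                 input_rate: float = 2.00, output_rate: float = 12.00,
                 discount: float = 0.0) -> float:
    """Estimate monthly token spend in USD (no caching, single tier)."""
    requests = requests_per_day * days
    input_mtok = input_tokens_per_request * requests / 1_000_000
    output_mtok = output_tokens_per_request * requests / 1_000_000
    list_price = input_mtok * input_rate + output_mtok * output_rate
    return list_price * (1 - discount)


if __name__ == "__main__":
    # Scenario 1: 10K system prompt + 2K user message in, 1K out, 5,000/day.
    direct = monthly_cost(12_000, 1_000, 5_000)
    discounted = monthly_cost(12_000, 1_000, 5_000, discount=0.10)
    print(f"direct:     ${direct:,.2f}")
    print(f"discounted: ${discounted:,.2f}")
```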
Scenario 2: Legal Document Analyzer (Long Context)#
Setup: 500K token legal documents (>200K tier), 5K token queries, 10K token analysis outputs, 200 analyses per day.
Monthly token usage:
- Input: 505K × 200 × 30 = 3.03B tokens = 3,030 MTok (>200K tier)
- Output: 10K × 200 × 30 = 60M tokens = 60 MTok
Google direct:
- Input: 3,030 MTok × $4.00 = $12,120
- Output: 60 MTok × $18.00 = $1,080
- Total: $13,200/month
Crazyrouter (10% off):
- Input: 3,030 MTok × $3.60 = $10,908
- Output: 60 MTok × $16.20 = $972
- Total: $11,880/month (saves $1,320/month)
This scenario highlights why the context tier matters. At 500K tokens per document, you're firmly in the >200K pricing tier. Caching the document for repeated queries within a session would dramatically reduce costs here.
Scenario 3: Developer Tool with Code Context#
Setup: 80K token codebase context (cached), 3K token queries, 2K token responses, 1,000 requests per day.
Monthly token usage:
- Cached input: 80K × 1,000 × 30 = 2.4B tokens = 2,400 MTok
- Fresh input: 3K × 1,000 × 30 = 90M tokens = 90 MTok
- Output: 2K × 1,000 × 30 = 60M tokens = 60 MTok
Google direct (with caching):
- Cached input: 2,400 MTok × $0.20 = $480
- Fresh input: 90 MTok × $2.00 = $180
- Output: 60 MTok × $12.00 = $720
- Cache storage: 0.08 MTok × $4.50/hr × 720 hrs = $259.20
- Total: ~$1,639/month
Crazyrouter (10% off on token costs):
- Cached input: 2,400 MTok × $0.18 = $432
- Fresh input: 90 MTok × $1.80 = $162
- Output: 60 MTok × $10.80 = $648
- Cache storage: ~$259.20
- Total: ~$1,501/month (saves ~$138/month)
Even with caching doing the heavy lifting, the 10% Crazyrouter discount still adds up over time.
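The Scenario 3 totals can be reproduced with a short estimator. This sketch follows the scenario's own accounting: every request is treated as a cache hit, the cache is kept alive 24 hours a day, and the discount applies to token costs only, not to storage. Those are simplifying assumptions, and the names are hypothetical.

```python
CACHED_RATE = 0.20   # USD per MTok, cached input (<=200K tier)
INPUT_RATE = 2.00    # USD per MTok, fresh input
OUTPUT_RATE = 12.00  # USD per MTok, output
STORAGE_RATE = 4.50  # USD per MTok per hour


def cached_workload_monthly(cached_tokens: int, fresh_tokens: int,
                            output_tokens: int, requests_per_day: int,
                            days: int = 30,
                            token_discount: float = 0.0) -> float:
    """Estimate monthly USD cost for a workload with a cached shared context."""
    requests = requests_per_day * days
    token_cost = (
        cached_tokens * requests / 1_000_000 * CACHED_RATE
        + fresh_tokens * requests / 1_000_000 * INPUT_RATE
        + output_tokens * requests / 1_000_000 * OUTPUT_RATE
    )
    # Storage is billed on the cached context size for every hour it is kept.
    storage = cached_tokens / 1_000_000 * STORAGE_RATE * 24 * days
    return token_cost * (1 - token_discount) + storage


if __name__ == "__main__":
    direct = cached_workload_monthly(80_000, 3_000, 2_000, 1_000)
    discounted = cached_workload_monthly(80_000, 3_000, 2_000, 1_000,
                                         token_discount=0.10)
    print(f"direct:     ${direct:,.2f}")
    print(f"discounted: ${discounted:,.2f}")
```

Storage is about 16% of the direct total here, so shortening how long the cache is kept alive is another lever worth testing alongside the token discount.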
Gemini 3.1 Pro vs GPT-5.4 vs Claude Sonnet 4.6#
How does Gemini 3.1 Pro stack up against the other leading models on price?
| Model | Input Price (MTok) | Output Price (MTok) | Context Window | Notes |
|---|---|---|---|---|
| Gemini 3.1 Pro (≤200K) | $2.00 | $12.00 | 1M tokens | Tiered pricing, caching available |
| Gemini 3.1 Pro (>200K) | $4.00 | $18.00 | 1M tokens | Higher tier for deep context |
| GPT-5.4 | $2.50 | $10.00 | 256K tokens | Flat pricing, no context tiers |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K tokens | Extended thinking available |
Key Comparisons#
Gemini 3.1 Pro vs GPT-5.4: Gemini wins on input cost ($2.00 vs $2.50 per MTok) but loses on output ($12.00 vs $10.00). If your workload is input-heavy (large context, short responses), Gemini is cheaper. If you generate long outputs, GPT-5.4 has the edge. Gemini's 1M context window is roughly 4× the size of GPT-5.4's 256K, which is a decisive advantage for long-document workloads.
Gemini 3.1 Pro vs Claude Sonnet 4.6: Gemini is cheaper on both input ($2.00 vs $3.00) and output ($12.00 vs $15.00) within the ≤200K tier. Claude Sonnet 4.6 offers extended thinking capabilities that may justify the premium for complex reasoning tasks, but on pure price-per-token, Gemini 3.1 Pro is the more economical choice.
The context window factor: Gemini's 1M token context is unmatched among these three models. If your use case requires processing documents longer than 200K-256K tokens, Gemini 3.1 Pro is effectively your only option here, and even at the >200K tier pricing ($4.00 input, $18.00 output), it enables workloads that aren't possible with the other two.
Caching advantage: Gemini's context caching at $0.20/MTok has no direct equivalent in GPT-5.4 or Claude Sonnet 4.6's standard pricing. For repetitive-context workloads, this can make Gemini 3.1 Pro dramatically cheaper than the headline rates suggest.
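To see where the input-heavy versus output-heavy crossover falls, the list prices from the comparison table can be plugged into a small calculator. This sketch covers only the ≤200K tier and ignores caching, grounding, and any discounts; the function names are hypothetical. Comparing $2.00·I + $12.00·O with $2.50·I + $10.00·O shows Gemini is cheaper than GPT-5.4 when input tokens exceed four times the output tokens.

```python
# (input, output) list prices in USD per MTok, from the comparison table.
PRICES = {
    "Gemini 3.1 Pro (<=200K)": (2.00, 12.00),
    "GPT-5.4": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price for the given model."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Name of the lowest-cost model for this input/output mix."""
    return min(PRICES, key=lambda m: cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    print(cheapest(150_000, 10_000))   # input-heavy request
    print(cheapest(100_000, 100_000))  # output-heavy request
```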
Key Takeaways#
- Context tiers matter: Stay under 200K input tokens when possible to get the $2.00/$12.00 rates instead of $4.00/$18.00. Architect your prompts accordingly.
- Caching is a game-changer: At $0.20/MTok for cached input (90% off the standard rate), caching sharply cuts the cost of repeated context. The storage cost ($4.50/MTok/hr) means you should cache strategically: high-frequency, short-duration sessions benefit most.
- The 1M context window is unique: No other major model offers this much context. If you need it, Gemini 3.1 Pro is the clear choice, and the tiered pricing means you only pay the premium when you actually use deep context.
- Free tier for prototyping: Test everything before committing budget. Google's free tier is generous enough for meaningful evaluation.
- Grounding adds up: At $35 per 1,000 requests, grounding with Google Search is powerful but not cheap. Use the 1,500 free daily requests wisely, and only enable grounding where real-time information genuinely improves output quality.
- Crazyrouter saves 10%: A flat 10% discount with zero integration friction. For teams spending $5,000+ per month, that's $500+ back in your pocket every month.
Start Building with Gemini 3.1 Pro#
Gemini 3.1 Pro Preview delivers frontier-level performance with a pricing structure that rewards smart architecture. The context-tiered model, combined with aggressive caching discounts, means developers who understand the pricing can build powerful applications at surprisingly reasonable costs.
Ready to get started? Sign up for Crazyrouter to access Gemini 3.1 Pro at 10% off official pricing — no contracts, no minimums, just a better rate on every API call. Your existing OpenAI SDK code works out of the box. Change your base_url, and you're live.
Last updated: April 27, 2026
Disclaimer: Pricing information is based on publicly available data from Google as of the publication date. Prices may change without notice. Crazyrouter discount rates are subject to Crazyrouter's current terms and pricing policy. Always verify current pricing on the official Google AI and Crazyrouter websites before making purchasing decisions. This article is for informational purposes only and does not constitute financial advice.





