Crazyrouter Team
April 27, 2026
Gemini 3.1 Pro Pricing Explained — Context Tiers, Caching, and How to Save with Crazyrouter#

Google's Gemini 3.1 Pro Preview is one of the most capable large language models available today, offering a massive 1 million token context window, multimodal input support, and strong reasoning performance. But with great capability comes the question every developer and team asks first: how much does it actually cost?

Unlike many competing models that charge a flat per-token rate, Gemini 3.1 Pro introduces a context-tiered pricing structure — meaning the cost per token changes depending on how much of that 1M context window you use. This is a meaningful shift in how API pricing works, and understanding it can save you hundreds or even thousands of dollars per month.

In this guide, we break down every aspect of Gemini 3.1 Pro Preview pricing: base rates, context tiers, caching discounts, the free tier, grounding costs, and how to access it at a 10% discount through Crazyrouter. We also run through real-world cost scenarios and compare pricing head-to-head with GPT-5.4 and Claude Sonnet 4.6.

Let's get into it.

Base Pricing: Context-Tiered Input and Output#

The defining feature of Gemini 3.1 Pro's pricing model is its two-tier context structure. Google splits pricing based on whether your total prompt (input) falls within 200K tokens or exceeds it.

Tier 1: Prompts ≤ 200K Tokens#

| Component | Price per Million Tokens (MTok) |
| --- | --- |
| Input | $2.00 |
| Output | $12.00 |

Tier 2: Prompts > 200K Tokens#

| Component | Price per Million Tokens (MTok) |
| --- | --- |
| Input | $4.00 |
| Output | $18.00 |

Audio Input#

| Component | Price per Million Tokens (MTok) |
| --- | --- |
| Audio | $1.00 |

The tier boundary is 200,000 tokens of input context. Once your prompt crosses that threshold, both input and output pricing jump — input doubles from $2 to $4 per MTok, and output increases by 50% from $12 to $18 per MTok.

This matters because the 1M context window is one of Gemini 3.1 Pro's headline features. If you're using it to process entire codebases, long documents, or extended conversation histories, you'll likely cross the 200K boundary regularly. Planning your prompt architecture around this tier can yield significant savings.

For most standard API use cases — chatbots, summarization, code generation with moderate context — you'll stay comfortably within the ≤200K tier. The higher tier is designed for power users who genuinely need deep context: legal document analysis, full-repository code review, or multi-document research synthesis.

Audio input is priced separately at $1.00 per MTok regardless of context length, making Gemini 3.1 Pro a competitive option for voice and audio processing workloads.
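To make the tier logic concrete, here is a minimal cost estimator. It is a sketch, not an official calculator: the rates are hard-coded from the tables above and the function name is our own — always verify against Google's current pricing page.

```python
# Hypothetical cost estimator for Gemini 3.1 Pro Preview text pricing.
# Rates are copied from the tables above and may change without notice.

TIER_BOUNDARY = 200_000  # input tokens; the >200K tier starts above this

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD, applying the context tier."""
    if input_tokens <= TIER_BOUNDARY:
        input_rate, output_rate = 2.00, 12.00   # $/MTok, <=200K tier
    else:
        input_rate, output_rate = 4.00, 18.00   # $/MTok, >200K tier
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# A 150K-token prompt with a 2K-token reply stays in the cheaper tier:
print(round(estimate_cost(150_000, 2_000), 3))  # 0.324
# A 500K-token prompt pays the higher rate on both input and output:
print(round(estimate_cost(500_000, 2_000), 3))  # 2.036
```

Note how the same 2K-token output costs 50% more in the second call purely because the prompt crossed the 200K boundary.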

Context Caching: The Real Cost Saver#

Context caching is where Gemini 3.1 Pro's pricing gets genuinely interesting. If you're sending the same large context repeatedly — a system prompt, a reference document, a codebase — you can cache it and pay dramatically less on subsequent requests.

Cache Pricing#

| Cache Tier | Price |
| --- | --- |
| ≤ 200K context | $0.20 per MTok |
| > 200K context | $0.40 per MTok |
| Cache storage | $4.50 per MTok per hour |

Cached input tokens cost just **$0.20 per MTok** for prompts within the 200K tier — that's a **90% discount** compared to the standard $2.00 input rate. For the >200K tier, cached tokens cost $0.40 per MTok, also a 90% reduction from the $4.00 standard rate.

The trade-off is the storage cost of $4.50 per MTok per hour. This means caching is most cost-effective when you're making frequent requests against the same context within a short time window.

When Caching Makes Sense#

Context caching is ideal for:

  • Chatbots with large system prompts: If your system prompt is 50K tokens and you're handling hundreds of conversations per hour, caching that prompt saves massively.
  • Document Q&A systems: Upload a document once, cache it, and run multiple queries against it.
  • Code assistants: Cache a repository's codebase and let users ask questions about it throughout a session.
  • Batch processing: When running the same analysis across many inputs with a shared context.

When Caching Doesn't Make Sense#

If you're sending unique, one-off prompts with no repeated context, caching adds storage cost without benefit. The break-even point depends on your request frequency — generally, if you're reusing the same context more than a few times per hour, caching pays for itself.

Quick Math: Caching ROI#

Say you have a 100K token context that you query 50 times in an hour:

Without caching:

  • 50 requests × 100K input tokens = 5M input tokens
  • Cost: 5 × $2.00 = $10.00

With caching:

  • 1 uncached request: 100K tokens × $2.00/MTok = $0.20
  • 49 cached requests: 4.9M tokens × $0.20/MTok = $0.98
  • Storage: 0.1 MTok × $4.50/MTok/hr × 1 hr = $0.45
  • Total: $1.63

That's an 84% savings in this scenario. The more requests you make against cached context, the better the economics get.
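The arithmetic above generalizes into a quick break-even check. This sketch compares the hourly cost of cached versus uncached access to a reused context; the rates come from the ≤200K-tier tables above, and the function name and structure are our own illustration.

```python
# Cached vs. uncached hourly cost for a context reused across requests.
# Rates from the <=200K tier above; all figures are estimates.

INPUT_RATE = 2.00    # $/MTok, standard input
CACHED_RATE = 0.20   # $/MTok, cached input
STORAGE_RATE = 4.50  # $/MTok per hour of cache storage

def hourly_cost(context_mtok: float, requests_per_hour: int, cached: bool) -> float:
    if not cached:
        return context_mtok * requests_per_hour * INPUT_RATE
    first = context_mtok * INPUT_RATE                          # initial uncached send
    rest = context_mtok * (requests_per_hour - 1) * CACHED_RATE  # cached reads
    storage = context_mtok * STORAGE_RATE                      # one hour of storage
    return first + rest + storage

# The 100K-token / 50-requests-per-hour example from above:
print(round(hourly_cost(0.1, 50, cached=False), 2))  # 10.0
print(round(hourly_cost(0.1, 50, cached=True), 2))   # 1.63
```

Sweeping `requests_per_hour` through small values shows the break-even point: with this context size, caching already wins at just a handful of requests per hour, because the $0.45 storage charge is quickly offset by the 90% per-request discount.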

Free Tier: Try Before You Pay#

Google offers a free tier for Gemini 3.1 Pro Preview, which is generous enough for prototyping and light development:

  • Rate limits: Lower than paid tier (specific RPM/TPM limits apply)
  • Access: Available through Google AI Studio and the Gemini API
  • Restrictions: Free tier requests may be used for model improvement; not recommended for production or sensitive data

The free tier lets you test Gemini 3.1 Pro's capabilities — its reasoning, code generation, multimodal understanding, and that massive context window — without committing any budget. It's a solid way to benchmark the model against your specific use case before scaling up.

For production workloads, you'll want to move to the paid tier for higher rate limits, data privacy guarantees, and SLA coverage.

Grounding with Google Search#

Gemini 3.1 Pro supports grounding with Google Search, which lets the model pull in real-time web information to improve factual accuracy and provide up-to-date responses.

Grounding Pricing#

| Component | Price |
| --- | --- |
| Grounding requests | $35.00 per 1,000 requests |
| Free daily allowance | 1,500 requests per day (RPD) |

At $0.035 per grounded request, this adds a meaningful cost layer if you're using it at scale. However, the 1,500 free requests per day provide a decent buffer for moderate usage.

Grounding is particularly valuable for:

  • News and current events: Queries about recent developments that fall outside the model's training data
  • Fact-checking: Verifying claims against live web sources
  • Research assistants: Pulling in the latest papers, articles, or data points

If your application doesn't need real-time information, you can skip grounding entirely and avoid this cost. For applications that do, budget approximately $35 per 1,000 grounded queries on top of your token costs.
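As a rough budgeting aid, the grounding math from this section can be sketched as follows. This assumes the 1,500-request free allowance resets daily and that only requests above it are billed at $35 per 1,000 — a simplification; check the official quota documentation for exact billing behavior.

```python
# Rough monthly grounding budget, assuming the free daily allowance
# applies per calendar day and overage bills at $35 per 1,000 requests.

FREE_RPD = 1_500       # free grounded requests per day
PRICE_PER_1K = 35.00   # $ per 1,000 billable grounded requests

def monthly_grounding_cost(grounded_requests_per_day: int, days: int = 30) -> float:
    billable_per_day = max(0, grounded_requests_per_day - FREE_RPD)
    return billable_per_day * days * PRICE_PER_1K / 1_000

print(monthly_grounding_cost(1_000))  # 0.0  -- fully inside the free allowance
print(monthly_grounding_cost(5_000))  # 3675.0
```

The jump from $0 to thousands of dollars per month is why it pays to gate grounding behind a check for queries that actually need live information.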

Access Gemini 3.1 Pro at 10% Off with Crazyrouter#

Crazyrouter provides access to Gemini 3.1 Pro Preview at 90% of Google's official pricing — a flat 10% discount on all token costs.

Crazyrouter Pricing for Gemini 3.1 Pro#

| Component | Google Official | Crazyrouter (10% off) |
| --- | --- | --- |
| Input ≤200K | $2.00/MTok | $1.80/MTok |
| Output ≤200K | $12.00/MTok | $10.80/MTok |
| Input >200K | $4.00/MTok | $3.60/MTok |
| Output >200K | $18.00/MTok | $16.20/MTok |

The integration is seamless — Crazyrouter uses the OpenAI-compatible API format, so you can switch over by changing just two lines in your code.

Python (OpenAI SDK)#

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gemini-3.1-pro-preview",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain context-tiered pricing in AI APIs."}
    ],
    max_tokens=2048
)

print(response.choices[0].message.content)
```

cURL#

```bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gemini-3.1-pro-preview",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain context-tiered pricing in AI APIs."}
    ],
    "max_tokens": 2048
  }'
```

No SDK changes, no new libraries, no migration headaches. If your app already uses the OpenAI SDK format, you're two lines away from a 10% discount on every Gemini 3.1 Pro request.

Real-World Cost Scenarios#

Let's put these numbers into context with three practical scenarios.

Scenario 1: Customer Support Chatbot#

Setup: 10K token system prompt, average 2K token user message, 1K token response, 5,000 conversations per day.

Monthly token usage:

  • Input: (10K + 2K) × 5,000 × 30 = 1.8B tokens = 1,800 MTok
  • Output: 1K × 5,000 × 30 = 150M tokens = 150 MTok

Google direct:

  • Input: 1,800 × $2.00 = $3,600
  • Output: 150 × $12.00 = $1,800
  • Total: $5,400/month

Crazyrouter (10% off):

  • Input: 1,800 × $1.80 = $3,240
  • Output: 150 × $10.80 = $1,620
  • **Total: $4,860/month** (saves $540/month)

With context caching on the system prompt, input costs drop further — the 10K system prompt, cached across all 150K monthly requests, would cost pennies compared to re-sending it each time.

Scenario 2: Legal Document Analysis#

Setup: 500K token legal documents (>200K tier), 5K token queries, 10K token analysis outputs, 200 analyses per day.

Monthly token usage:

  • Input: 505K × 200 × 30 = 3.03B tokens = 3,030 MTok (>200K tier)
  • Output: 10K × 200 × 30 = 60M tokens = 60 MTok

Google direct:

  • Input: 3,030 × $4.00 = $12,120
  • Output: 60 × $18.00 = $1,080
  • Total: $13,200/month

Crazyrouter (10% off):

  • Input: 3,030 × $3.60 = $10,908
  • Output: 60 × $16.20 = $972
  • **Total: $11,880/month** (saves $1,320/month)

This scenario highlights why the context tier matters. At 500K tokens per document, you're firmly in the >200K pricing tier. Caching the document for repeated queries within a session would dramatically reduce costs here.
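To put a number on that caching claim, here is a hypothetical sketch for a single cached document session. The query count (10 per document, within one hour) is our own assumption, not part of the scenario above; the rates come from the >200K cache tier table.

```python
# Hypothetical: caching one 500K-token document from Scenario 2,
# assuming 10 queries against it within a one-hour session.
# Rates from the >200K cache tier above; all numbers are estimates.

DOC_MTOK = 0.5       # 500K-token document
QUERIES = 10         # queries per cached session (assumption)
INPUT_RATE = 4.00    # $/MTok, >200K standard input
CACHED_RATE = 0.40   # $/MTok, >200K cached input
STORAGE_RATE = 4.50  # $/MTok/hr of cache storage

uncached = DOC_MTOK * QUERIES * INPUT_RATE
cached = (DOC_MTOK * INPUT_RATE                      # first, uncached send
          + DOC_MTOK * (QUERIES - 1) * CACHED_RATE   # 9 cached reads
          + DOC_MTOK * STORAGE_RATE)                 # one hour of storage

print(round(uncached, 2))  # 20.0
print(round(cached, 2))    # 6.05
```

Under these assumptions, the document portion of each session drops from $20.00 to $6.05 — roughly a 70% cut, even after paying the hourly storage charge on a half-million-token cache.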

Scenario 3: Developer Tool with Code Context#

Setup: 80K token codebase context (cached), 3K token queries, 2K token responses, 1,000 requests per day.

Monthly token usage:

  • Cached input: 80K × 1,000 × 30 = 2.4B tokens = 2,400 MTok
  • Fresh input: 3K × 1,000 × 30 = 90M tokens = 90 MTok
  • Output: 2K × 1,000 × 30 = 60M tokens = 60 MTok

Google direct (with caching):

  • Cached input: 2,400 × $0.20 = $480
  • Fresh input: 90 × $2.00 = $180
  • Output: 60 × $12.00 = $720
  • Cache storage: 0.08 MTok × $4.50/MTok/hr × 720 hrs = $259.20
  • Total: ~$1,639/month

Crazyrouter (10% off on token costs):

  • Cached input: 2,400 × $0.18 = $432
  • Fresh input: 90 × $1.80 = $162
  • Output: 60 × $10.80 = $648
  • Cache storage: ~$259.20
  • **Total: ~$1,501/month** (saves ~$138/month)

Even with caching doing the heavy lifting, the 10% Crazyrouter discount still adds up over time.

Gemini 3.1 Pro vs GPT-5.4 vs Claude Sonnet 4.6#

How does Gemini 3.1 Pro stack up against the other leading models on price?

| Model | Input Price (MTok) | Output Price (MTok) | Context Window | Notes |
| --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (≤200K) | $2.00 | $12.00 | 1M tokens | Tiered pricing, caching available |
| Gemini 3.1 Pro (>200K) | $4.00 | $18.00 | 1M tokens | Higher tier for deep context |
| GPT-5.4 | $2.50 | $10.00 | 256K tokens | Flat pricing, no context tiers |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K tokens | Extended thinking available |

Key Comparisons#

Gemini 3.1 Pro vs GPT-5.4: Gemini wins on input cost ($2.00 vs $2.50 per MTok) but loses on output ($12.00 vs $10.00). If your workload is input-heavy (large context, short responses), Gemini is cheaper. If you generate long outputs, GPT-5.4 has the edge. Gemini's 1M context window is 4× larger than GPT-5.4's 256K, which is a decisive advantage for long-document workloads.

Gemini 3.1 Pro vs Claude Sonnet 4.6: Gemini is cheaper on both input ($2.00 vs $3.00) and output ($12.00 vs $15.00) within the ≤200K tier. Claude Sonnet 4.6 offers extended thinking capabilities that may justify the premium for complex reasoning tasks, but on pure price-per-token, Gemini 3.1 Pro is the more economical choice.

The context window factor: Gemini's 1M token context is unmatched. If your use case requires processing documents longer than 200K-256K tokens, Gemini 3.1 Pro is effectively your only option among these three — and even at the >200K tier pricing ($4.00/$18.00), it's enabling workloads that simply aren't possible with the competition.

Caching advantage: Gemini's context caching at $0.20/MTok has no direct equivalent in GPT-5.4 or Claude Sonnet 4.6's standard pricing. For repetitive-context workloads, this can make Gemini 3.1 Pro dramatically cheaper than the headline rates suggest.
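The input-heavy vs output-heavy trade-off in the comparisons above is easy to check numerically. This sketch uses only the headline rates from the table (model keys are our own labels, and rates may change):

```python
# Per-request cost comparison using the headline rates in the table above.
# Rates in $/MTok (input, output); all subject to change.

MODELS = {
    "gemini-3.1-pro (<=200K)": (2.00, 12.00),
    "gpt-5.4":                 (2.50, 10.00),
    "claude-sonnet-4.6":       (3.00, 15.00),
}

def request_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    in_rate, out_rate = MODELS[model]
    return input_mtok * in_rate + output_mtok * out_rate

# Input-heavy workload: 100K tokens in, 1K tokens out.
for name in MODELS:
    print(name, round(request_cost(name, 0.1, 0.001), 4))
# gemini-3.1-pro (<=200K) 0.212
# gpt-5.4 0.26
# claude-sonnet-4.6 0.315
```

Flip the ratio (short prompt, long generation) and GPT-5.4's cheaper output rate pulls ahead — which is exactly the pattern described in the comparison above.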

Key Takeaways#

  1. Context tiers matter: Stay under 200K input tokens when possible to get the $2.00/$12.00 rates instead of $4.00/$18.00. Architect your prompts accordingly.

  2. Caching is a game-changer: At $0.20/MTok (90% off standard input), context caching can slash your bill for any workload with repeated context. The storage cost ($4.50/MTok/hr) means you should cache strategically — high-frequency, short-duration sessions benefit most.

  3. The 1M context window is unique: No other major model offers this much context. If you need it, Gemini 3.1 Pro is the clear choice — and the tiered pricing means you only pay the premium when you actually use deep context.

  4. Free tier for prototyping: Test everything before committing budget. Google's free tier is generous enough for meaningful evaluation.

  5. Grounding adds up: At $35 per 1,000 requests, grounding with Google Search is powerful but not cheap. Use the 1,500 free daily requests wisely, and only enable grounding where real-time information genuinely improves output quality.

  6. Crazyrouter saves 10%: A flat 10% discount with zero integration friction. For teams spending $5K+ per month on Gemini API calls, that's $500+ back in your pocket — every month.

Start Building with Gemini 3.1 Pro#

Gemini 3.1 Pro Preview delivers frontier-level performance with a pricing structure that rewards smart architecture. The context-tiered model, combined with aggressive caching discounts, means developers who understand the pricing can build powerful applications at surprisingly reasonable costs.

Ready to get started? Sign up for Crazyrouter to access Gemini 3.1 Pro at 10% off official pricing — no contracts, no minimums, just a better rate on every API call. Your existing OpenAI SDK code works out of the box. Change your base_url, and you're live.

👉 Get Started with Crazyrouter →


Last updated: April 27, 2026

Disclaimer: Pricing information is based on publicly available data from Google as of the publication date. Prices may change without notice. Crazyrouter discount rates are subject to Crazyrouter's current terms and pricing policy. Always verify current pricing on the official Google AI and Crazyrouter websites before making purchasing decisions. This article is for informational purposes only and does not constitute financial advice.
