
Gemini 3 Flash Pricing Explained — Balanced Speed and Cost with Crazyrouter Savings

Complete breakdown of Gemini 3 Flash Preview API pricing — $0.50/$3.00 per MTok, context caching, free tier, and Crazyrouter savings.

Crazyrouter Team

April 27, 2026



Google's Gemini 3 Flash Preview sits in a sweet spot that many developers have been waiting for: faster than the heavyweight Pro models, smarter than the ultra-cheap Lite tier, and priced to make production workloads genuinely affordable. With input tokens at just $0.50 per million, a generous 1 million token context window, and built-in context caching, Gemini 3 Flash is designed for teams that need strong reasoning without burning through their API budget.

In this guide, we break down every aspect of Gemini 3 Flash Preview pricing — base rates, caching economics, the free tier, and how routing through Crazyrouter can shave an additional 10% off your bill. Whether you're building a chatbot, processing documents at scale, or running multimodal pipelines, you'll walk away knowing exactly what Gemini 3 Flash will cost you.

Last updated: April 27, 2026.

Base Pricing: What You Pay Per Token

Gemini 3 Flash Preview uses a straightforward per-token pricing model. Here's the full rate card:

Category	Price per Million Tokens
Text Input	$0.50
Image Input	$0.50
Video Input	$0.50
Audio Input	$1.00
Text Output	$3.00

A few things stand out immediately:

Text, image, and video inputs share the same rate. At $0.50/MTok, Google isn't charging a premium for multimodal inputs (except audio). This is a significant advantage if your application processes screenshots, diagrams, video frames, or mixed-media documents — you pay the same flat rate regardless of modality.

Audio input costs double. At $1.00/MTok, audio is still very affordable compared to dedicated speech-to-text services, but it's worth noting the 2x multiplier if you're building voice-heavy applications.

Output tokens are 6x the input price. The $3.00/MTok output rate follows the industry pattern where generation costs significantly more than comprehension. This makes prompt engineering and output length management important cost levers.

Context window: 1 million tokens. Gemini 3 Flash supports up to 1M tokens of context, which is enormous for a model at this price point. You can feed entire codebases, lengthy legal documents, or hours of meeting transcripts in a single request.

How This Compares to Raw Numbers

To put these prices in perspective:

1 million input tokens ≈ 750,000 words, or roughly 10 full-length novels
Processing 1M input tokens costs just $0.50
Generating a 2,000-word response (~2,700 tokens) costs about $0.008, which is less than a penny

For most applications, the per-request cost with Gemini 3 Flash is measured in fractions of a cent.
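The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration that assumes the rate card above; the names `RATES_PER_MTOK` and `request_cost` are hypothetical helpers, not part of any SDK.

```python
# Rates in USD per million tokens, copied from the rate card above.
RATES_PER_MTOK = {
    "text_in": 0.50,
    "image_in": 0.50,
    "video_in": 0.50,
    "audio_in": 1.00,
    "text_out": 3.00,
}


def request_cost(tokens_by_category: dict[str, int]) -> float:
    """Return the USD cost of one request given token counts per category."""
    return sum(
        RATES_PER_MTOK[category] * count / 1_000_000
        for category, count in tokens_by_category.items()
    )


# A 2,000-word reply is roughly 2,700 output tokens.
print(f"2,700 output tokens: ${request_cost({'text_out': 2_700}):.4f}")
# One million text input tokens.
print(f"1M input tokens: ${request_cost({'text_in': 1_000_000}):.2f}")
```

Passing several categories at once, for example `{"text_in": 2_000, "text_out": 500}`, gives the combined cost of a typical chat turn.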

Context Caching: Slash Repeat Costs by 90%

One of the most powerful cost-saving features in the Gemini API is context caching, and Gemini 3 Flash supports it fully. If your application repeatedly sends the same large context (system prompts, reference documents, few-shot examples), caching lets you pay for that context once and reuse it at a steep discount.

Caching Rates

Component	Price
Cached Input Tokens	$0.05 / MTok
Cache Storage	$1.00 / MTok / hour

Cached input tokens cost just $0.05/MTok, a 90% discount compared to the standard $0.50/MTok input rate. If you're sending a 200K-token system prompt with every request, caching turns that from $0.10 per call to $0.01 per call.

Cache Storage Economics

The storage cost of $1.00/MTok/hour means you need to think about cache lifetime. Here's a quick calculation:

100K cached tokens stored for 1 hour = $0.10
100K cached tokens used in 50 requests over that hour = saves $2.25 in input costs (50 × 100K tokens × $0.45 savings per MTok)
Net savings: $2.15 for that hour

The breakeven point is low. Each cached request saves $0.45 per MTok of shared context while storage costs $1.00 per MTok per hour, so caching pays for itself from the third request per hour onward, regardless of cache size.
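The storage trade-off above can be checked with a short calculation. This sketch assumes the caching rates listed earlier and follows the same simplification as the worked example, where every request in the hour is billed at the cached rate. The function names are hypothetical.

```python
import math

STANDARD_INPUT = 0.50    # USD per MTok, standard input rate
CACHED_INPUT = 0.05      # USD per MTok, cached input rate
STORAGE_PER_HOUR = 1.00  # USD per MTok per hour of cache storage


def hourly_net_savings(cached_tokens: int, requests_per_hour: int) -> float:
    """Input savings minus storage cost for one hour, in USD."""
    mtok = cached_tokens / 1_000_000
    saved = requests_per_hour * mtok * (STANDARD_INPUT - CACHED_INPUT)
    storage = mtok * STORAGE_PER_HOUR
    return saved - storage


def breakeven_requests_per_hour() -> int:
    """Smallest whole number of requests per hour at which caching saves money.

    The cache size cancels out, so the answer depends only on the rates.
    """
    ratio = STORAGE_PER_HOUR / (STANDARD_INPUT - CACHED_INPUT)
    return math.floor(ratio) + 1


print(f"Net savings, 100K tokens, 50 requests: ${hourly_net_savings(100_000, 50):.2f}")
print(f"Breakeven: {breakeven_requests_per_hour()} requests per hour")
```

Because the cache size cancels out of the breakeven ratio, the same threshold applies whether the shared context is 10K or 500K tokens.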

When to Use Caching

Context caching makes the most sense when:

Your system prompt or reference documents exceed 10K tokens
You're serving multiple users with the same base context
You're running batch processing where every request shares a common prefix
You have RAG pipelines with stable knowledge bases

For applications with highly dynamic, per-request contexts, caching provides less benefit — but for the majority of production use cases, it's a no-brainer.

Free Tier: Experiment Before You Spend

Google offers a free tier for Gemini 3 Flash Preview, making it one of the most accessible frontier models to experiment with. The free tier lets developers:

Test the model's capabilities without entering payment information
Build and iterate on prototypes at zero cost
Run small-scale evaluations against competing models

The free tier comes with rate limits (lower requests per minute and tokens per day compared to paid), but for development and experimentation, it's more than sufficient. This is especially valuable if you're evaluating whether Gemini 3 Flash meets your quality bar before committing to production spend.

Pro tip: Use the free tier to benchmark Gemini 3 Flash against your current model. If quality meets your threshold, the paid tier's economics are hard to beat.

Crazyrouter: Save an Extra 10% on Every Call

If you're already planning to use Gemini 3 Flash in production, routing your API calls through Crazyrouter gives you an automatic 10% discount on all token costs.

Crazyrouter Pricing for Gemini 3 Flash

Category	Official Price	Crazyrouter Price	Savings
Text/Image/Video Input	$0.50/MTok	$0.45/MTok	10%
Audio Input	$1.00/MTok	$0.90/MTok	10%
Output	$3.00/MTok	$2.70/MTok	10%
Cached Input	$0.05/MTok	$0.045/MTok	10%

The discount applies uniformly across all token types, including cached tokens. For high-volume applications, this adds up fast.

Integration: Drop-In Compatible

Crazyrouter is fully compatible with the OpenAI SDK format. You don't need a custom client library — just change your base_url and API key.

Using the OpenAI Python SDK:

```python
from openai import OpenAI

# Point the standard OpenAI client at Crazyrouter's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

# max_tokens caps the response length, which also caps output spend per call.
response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)
```

Using curl:

```bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 1024
  }'
```

That's it. Two lines changed (base URL and API key), and you're saving 10% on every request. Crazyrouter handles routing, load balancing, and billing transparently.
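To track spend per call, one option is to price the token counts reported with each response. The sketch below assumes the router returns the standard OpenAI-style `usage` fields (`prompt_tokens`, `completion_tokens`) for this model, and that all prompt tokens are billed at the text/image/video rate; audio input would need the higher rate. `Usage` is a local stand-in so the example runs without a network call, and `call_cost` is a hypothetical helper.

```python
from dataclasses import dataclass

# Crazyrouter rates from the table above, in USD per million tokens.
CRAZYROUTER_INPUT = 0.45   # text, image, and video input
CRAZYROUTER_OUTPUT = 2.70  # output


@dataclass
class Usage:
    """Stand-in for the `usage` object on a chat completion response."""
    prompt_tokens: int
    completion_tokens: int


def call_cost(usage) -> float:
    """Estimated USD cost of one call from its reported token counts."""
    return (
        usage.prompt_tokens * CRAZYROUTER_INPUT
        + usage.completion_tokens * CRAZYROUTER_OUTPUT
    ) / 1_000_000


# Example: 2,000 prompt tokens and 500 completion tokens.
example = Usage(prompt_tokens=2_000, completion_tokens=500)
print(f"Estimated cost: ${call_cost(example):.5f}")
```

With a live client, `call_cost(response.usage)` would apply the same arithmetic, provided the response actually populates `usage`.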

Real-World Cost Scenarios

Let's walk through three practical scenarios to see what Gemini 3 Flash actually costs in production.

Scenario 1: Customer Support Chatbot

Setup: A chatbot handling 10,000 conversations per day. Each conversation averages 2,000 input tokens (system prompt + user message + history) and 500 output tokens.

Component	Daily Tokens	Daily Cost (Official)	Daily Cost (Crazyrouter)
Input	20M tokens	$10.00	$9.00
Output	5M tokens	$15.00	$13.50
Total		$25.00/day	$22.50/day

Monthly cost: ~$750 official, ~$675 via Crazyrouter. That's $75/month saved just by changing your base URL.

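With context caching (assuming a shared 1,500-token system prompt across all requests):

Cached input savings at official rates: 15M tokens/day × $0.45 per MTok = $6.75/day ($0.405 per MTok, or about $6.08/day, at Crazyrouter rates)
Storage cost: 1.5K tokens cached for 24h ≈ $0.04/day, which is negligible
Monthly cost with caching via Crazyrouter: ~$494 ($675 minus about $182 in cached-input savings, plus about $1 of storage)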
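The caching scenario can also be computed by pricing each token category directly instead of subtracting savings from a total. This sketch assumes a 30-day month, a single shared cache entry kept alive around the clock, and storage billed at the official $1.00/MTok/hour (the Crazyrouter table lists no storage rate, so that part is an assumption). `daily_cost` and `CHATBOT` are hypothetical names.

```python
def daily_cost(requests, cached_in, fresh_in, out_tokens,
               input_rate, cached_rate, output_rate, storage_rate_hour=1.00):
    """USD per day. Token arguments are per request; rates are USD per MTok."""
    mtok = 1_000_000
    fresh = requests * fresh_in * input_rate / mtok
    cached = requests * cached_in * cached_rate / mtok
    output = requests * out_tokens * output_rate / mtok
    # One shared cache entry stored for 24 hours.
    storage = cached_in * storage_rate_hour * 24 / mtok
    return fresh + cached + output + storage


# Scenario 1: 2,000 input tokens per conversation, 1,500 of them cached.
CHATBOT = dict(requests=10_000, cached_in=1_500, fresh_in=500, out_tokens=500)

official_monthly = 30 * daily_cost(
    **CHATBOT, input_rate=0.50, cached_rate=0.05, output_rate=3.00)
crazyrouter_monthly = 30 * daily_cost(
    **CHATBOT, input_rate=0.45, cached_rate=0.045, output_rate=2.70)

print(f"Official with caching:    ${official_monthly:.2f}/month")
print(f"Crazyrouter with caching: ${crazyrouter_monthly:.2f}/month")
```

Under these assumptions the Crazyrouter total comes to roughly $494 per month, with output tokens making up most of the bill.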

Scenario 2: Document Processing Pipeline

Setup: Processing 500 legal documents per day, each averaging 50,000 input tokens. Output is a 1,000-token summary per document.

Component	Daily Tokens	Daily Cost (Official)	Daily Cost (Crazyrouter)
Input	25M tokens	$12.50	$11.25
Output	500K tokens	$1.50	$1.35
Total		$14.00/day	$12.60/day

Monthly cost: ~$420 official, ~$378 via Crazyrouter. For processing 15,000 legal documents a month, that's remarkably affordable.

Scenario 3: Multimodal Content Moderation

Setup: Analyzing 50,000 images per day for content moderation. Each image averages 1,000 tokens, with a 200-token classification output.

Component	Daily Tokens	Daily Cost (Official)	Daily Cost (Crazyrouter)
Image Input	50M tokens	$25.00	$22.50
Output	10M tokens	$30.00	$27.00
Total		$55.00/day	$49.50/day

Monthly cost: ~$1,650 official, ~$1,485 via Crazyrouter. That's $165/month saved, enough to cover other infrastructure costs.
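All three scenarios follow the same arithmetic, which the sketch below reproduces. It assumes a 30-day month, no caching, and the $0.50/$3.00 per MTok rates with a flat 10% Crazyrouter discount; `SCENARIOS` and `monthly_cost` are hypothetical names used only for illustration.

```python
# (requests per day, input tokens per request, output tokens per request)
SCENARIOS = {
    "support chatbot": (10_000, 2_000, 500),
    "document pipeline": (500, 50_000, 1_000),
    "image moderation": (50_000, 1_000, 200),
}

INPUT_RATE = 0.50   # USD per MTok (text, image, video input)
OUTPUT_RATE = 3.00  # USD per MTok


def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 discount=0.0, days=30):
    """USD per month for a uniform workload, with an optional flat discount."""
    daily = requests_per_day * (
        input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return daily * (1 - discount) * days


for name, workload in SCENARIOS.items():
    official = monthly_cost(*workload)
    routed = monthly_cost(*workload, discount=0.10)
    print(f"{name}: ${official:,.2f} official, ${routed:,.2f} via Crazyrouter")
```

Swapping in your own request volume and token averages gives a first-order budget estimate before any caching is applied.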

Gemini 3 Flash vs. 3.1 Pro vs. 2.5 Flash: Where It Fits

Understanding where Gemini 3 Flash sits in Google's model lineup helps you pick the right tool for the job.

Gemini 3.1 Pro: The Heavyweight

Gemini 3.1 Pro is Google's most capable model, designed for complex reasoning, advanced code generation, and tasks where quality is the top priority. It comes at a higher price point and slower inference speed. Choose 3.1 Pro when:

You need the absolute best reasoning quality
Tasks involve complex multi-step logic
Cost is secondary to output quality
You're doing research or high-stakes analysis

Gemini 3 Flash Preview: The Sweet Spot

Gemini 3 Flash occupies the middle ground: strong reasoning capabilities at a fraction of the Pro price, with significantly faster response times. Choose 3 Flash when:

You need a balance of quality, speed, and cost
Production workloads require low latency
Your application handles high request volumes
Multimodal processing is a core requirement

Gemini 2.5 Flash: The Budget Option

The previous-generation Flash model remains available at even lower prices, but with reduced capabilities. Choose 2.5 Flash when:

You're running extremely cost-sensitive workloads
Tasks are relatively simple (classification, extraction, summarization)
You've tested and confirmed 2.5 Flash quality is sufficient
Maximum cost savings outweigh incremental quality gains

Quick Comparison

Aspect	2.5 Flash	3 Flash Preview	3.1 Pro
Input Price	Lower	$0.50/MTok	Higher
Output Price	Lower	$3.00/MTok	Higher
Reasoning	Good	Strong	Best
Speed	Fast	Fast	Moderate
Context Window	1M	1M	1M+
Best For	Simple tasks	Production workloads	Complex reasoning

For most production applications, Gemini 3 Flash Preview hits the optimal price-performance ratio. You get meaningfully better quality than 2.5 Flash without the cost premium of 3.1 Pro.

Key Takeaways

Input is cheap. At $0.50/MTok for text, image, and video, Gemini 3 Flash makes multimodal processing accessible for virtually any budget.
Output is where costs add up. The $3.00/MTok output rate means controlling response length is your biggest cost lever. Use max_tokens wisely.
Context caching is a game-changer. If you're sending repeated context, caching cuts the cost of those repeated input tokens by 90%. Storage fees stay small as long as the cache is reused a few times per hour.
The free tier removes barriers. Test and prototype without spending a dime. Validate quality before committing to production.
Crazyrouter saves 10% across the board. A two-line code change (base URL + API key) gives you an instant discount on every token. For high-volume applications, this compounds into meaningful savings.
Gemini 3 Flash is the production workhorse. It's not the cheapest model and it's not the most powerful — it's the one that makes the most sense for the majority of real-world applications.
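The output-cost lever mentioned above can be turned into a simple budget ceiling. This sketch assumes billed output tokens never exceed `max_tokens`; whether that holds for every token type the model emits is an assumption worth checking against real usage reports. `max_output_cost` is a hypothetical helper.

```python
OUTPUT_RATE = 3.00  # USD per MTok, official output rate


def max_output_cost(max_tokens: int, calls: int = 1) -> float:
    """Upper bound on output spend in USD if every call hits max_tokens."""
    return calls * max_tokens * OUTPUT_RATE / 1_000_000


# Ceiling for a single call and for 10,000 calls per day at max_tokens=1024.
print(f"Per call: ${max_output_cost(1024):.6f}")
print(f"Per day (10,000 calls): ${max_output_cost(1024, calls=10_000):.2f}")
```

Actual spend is usually lower, since most responses stop well before the cap.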

Get Started with Gemini 3 Flash on Crazyrouter

Ready to build with Gemini 3 Flash at discounted rates?

Sign up at crazyrouter.com and grab your API key
Set your base URL to https://crazyrouter.com/v1
Use model gemini-3-flash-preview in your requests
Start saving 10% on every API call — no contracts, no minimums

Crazyrouter supports the full OpenAI-compatible API format, so you can switch from any existing provider in minutes. All Gemini models are available, along with Claude, GPT, and other frontier models — all at discounted rates.

👉 Start using Gemini 3 Flash on Crazyrouter →

Disclaimer: Pricing information is based on publicly available data as of April 27, 2026. Google may update Gemini API pricing at any time. "Preview" models may have different pricing when they reach general availability. Crazyrouter discount rates are subject to change. Always verify current pricing on the official Google AI and Crazyrouter websites before making purchasing decisions. This article is for informational purposes only and does not constitute financial advice.