GPT-5-mini Pricing Explained — Ultra-Low Cost AI with Caching and Batch Discounts

Crazyrouter Team
April 27, 2026

GPT-5-mini Pricing Explained — Ultra-Low Cost AI with Caching and Batch Discounts#

OpenAI's GPT-5 family arrived with a lineup designed to cover every use case and budget. At the bottom of the price sheet — but certainly not at the bottom of capability — sits GPT-5-mini. It's the cheapest model in the GPT-5 generation, and it punches well above its weight class.

If you're running high-volume pipelines, building chatbots, or moderating content at scale, GPT-5-mini is the model that lets you do it without watching your API bill spiral out of control. At just $0.75 per million input tokens and $4.50 per million output tokens, it delivers GPT-5-level reasoning at a fraction of the cost of its bigger siblings.

In this guide, we'll break down every angle of GPT-5-mini pricing — base rates, automatic caching, Batch API discounts, and how Crazyrouter can cut your costs even further. We'll also run through real-world cost scenarios so you can estimate exactly what your workload will cost before you commit a single token.

Base Pricing#

GPT-5-mini keeps things simple with a single pricing tier. There's no separate "long context" rate — you get one price regardless of context length.

| Component | Price per Million Tokens |
| --- | --- |
| Input | $0.75 |
| Cached Input | $0.075 |
| Output | $4.50 |

That's it. No hidden tiers, no surprise multipliers for longer prompts. Whether you're sending 1,000 tokens or 100,000 tokens in a single request, the per-token rate stays the same.
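
To make that concrete, here's a minimal back-of-the-envelope cost estimate in Python using the rates above. The token counts are placeholders for your own workload.

python
# GPT-5-mini standard rates (USD per token), from the table above.
INPUT_RATE = 0.75 / 1_000_000
OUTPUT_RATE = 4.50 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at standard (non-cached) rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# The per-token rate is flat, so a 100x bigger prompt simply costs 100x more on input.
print(f"${request_cost(1_000, 300):.6f}")     # ~$0.002100
print(f"${request_cost(100_000, 300):.6f}")   # ~$0.076350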

For context, here's how GPT-5-mini stacks up against the rest of the GPT-5 family:

| Model | Input (per MTok) | Output (per MTok) |
| --- | --- | --- |
| GPT-5-nano | $0.30 | $1.20 |
| GPT-5-mini | $0.75 | $4.50 |
| GPT-5.4 | $2.50 | $10.00 |

GPT-5-mini sits in the sweet spot — significantly more capable than GPT-5-nano for tasks that need real reasoning, while costing a fraction of what GPT-5.4 charges. For most production workloads, it's the default choice.

Automatic Caching — 90% Off Repeated Input#

One of the most powerful cost-saving features in the OpenAI API is automatic prompt caching, and GPT-5-mini supports it fully. When you send requests that share the same prefix (system prompt, few-shot examples, or any repeated content at the beginning of your messages), OpenAI automatically caches that prefix and charges you only 10% of the standard input price for the cached portion.

That means cached input tokens cost just $0.075 per million tokens — essentially free compared to the base rate.

How It Works#

Caching is automatic. You don't need to enable it, configure it, or change your API calls. OpenAI detects when the beginning of your prompt matches a recently sent request and applies the cached rate automatically.

Here's what qualifies for caching:

  • System prompts — If every request in your app uses the same system prompt, that entire block gets cached after the first request.
  • Few-shot examples — Static examples at the beginning of your prompt are prime caching candidates.
  • Shared context — Any repeated prefix across requests within a short time window.
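
The practical consequence: structure your prompts so the stable content comes first and the per-request content comes last. Here's a minimal sketch of that pattern; the system prompt and few-shot example are placeholders for your own.

python
from openai import OpenAI

client = OpenAI()

# Static prefix: identical across requests, so it becomes cacheable.
STATIC_MESSAGES = [
    {"role": "system", "content": "You are a support-ticket classifier."},
    {"role": "user", "content": "Example ticket: 'My invoice is wrong.' -> billing"},
    {"role": "assistant", "content": "billing"},
]

def classify(ticket_text: str) -> str:
    # Per-request content goes last so it doesn't break the shared prefix.
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=STATIC_MESSAGES + [{"role": "user", "content": ticket_text}],
    )
    return response.choices[0].message.content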

The Math#

Let's say your typical request looks like this:

  • System prompt + few-shot examples: 2,000 tokens (cached after first request)
  • User-specific content: 500 tokens (never cached)
  • Output: 300 tokens

Without caching:

  • Input: 2,500 tokens × $0.75/MTok = $0.001875
  • Output: 300 tokens × $4.50/MTok = $0.00135
  • Total per request: $0.003225

With caching (after first request):

  • Cached input: 2,000 tokens × $0.075/MTok = $0.00015
  • Fresh input: 500 tokens × $0.75/MTok = $0.000375
  • Output: 300 tokens × $4.50/MTok = $0.00135
  • Total per request: $0.001875

That's a 42% reduction in cost per request — and it happens automatically. The more of your prompt that's cacheable, the bigger the savings.
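
To reproduce that arithmetic for your own prompt sizes, here's a quick sketch using the rates above; the token counts are the example values from this section.

python
INPUT_RATE = 0.75 / 1_000_000     # USD per fresh input token
CACHED_RATE = 0.075 / 1_000_000   # USD per cached input token
OUTPUT_RATE = 4.50 / 1_000_000    # USD per output token

def per_request_cost(cached_in: int, fresh_in: int, out: int) -> float:
    return cached_in * CACHED_RATE + fresh_in * INPUT_RATE + out * OUTPUT_RATE

without_cache = per_request_cost(0, 2_500, 300)   # $0.003225
with_cache = per_request_cost(2_000, 500, 300)    # $0.001875
print(f"Savings: {1 - with_cache / without_cache:.0%}")  # ~42%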

Caching + Batch API: The Ultimate Combo#

Here's where it gets really interesting. Caching stacks with the Batch API discount (which we'll cover next). If you're running batch jobs with repeated system prompts, you're looking at cached input tokens at $0.0375 per million tokens — that's 95% off the base input rate.

Batch API — 50% Off Everything#

OpenAI's Batch API is designed for workloads that don't need real-time responses. You submit a batch of requests, and OpenAI processes them within a 24-hour window. In exchange for that flexibility, you get a flat 50% discount on both input and output tokens.

| Component | Standard Price | Batch API Price |
| --- | --- | --- |
| Input | $0.75/MTok | $0.375/MTok |
| Cached Input | $0.075/MTok | $0.0375/MTok |
| Output | $4.50/MTok | $2.25/MTok |

The Batch API is perfect for:

  • Data classification — Label thousands of records overnight.
  • Content generation — Generate product descriptions, summaries, or translations in bulk.
  • Evaluation pipelines — Score model outputs or run quality checks on large datasets.
  • Content moderation — Process flagged content in batches rather than one-by-one.

When to Use Batch vs. Real-Time#

Use the Batch API when latency doesn't matter. If your user is waiting for a response, use the standard API. If you're processing a queue of items that can wait minutes or hours, batch it and save 50%.
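
If batch fits your workload, the flow on OpenAI's API is: write your requests to a JSONL file, upload it, and create a batch job against the chat completions endpoint. A minimal sketch follows; the file name, custom IDs, and prompts are placeholders, and note that this targets OpenAI's Batch API directly (check Crazyrouter's docs for whether batch endpoints are supported there).

python
import json
from openai import OpenAI

client = OpenAI()

# Write one JSON object per line; custom_id lets you match results later.
with open("tickets.jsonl", "w") as f:
    for i, ticket in enumerate(["My invoice is wrong.", "App crashes on login."]):
        f.write(json.dumps({
            "custom_id": f"ticket-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5-mini",
                "messages": [
                    {"role": "system", "content": "Classify the support ticket."},
                    {"role": "user", "content": ticket},
                ],
            },
        }) + "\n")

batch_file = client.files.create(file=open("tickets.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id)  # poll client.batches.retrieve(batch.id) until it completes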

Crazyrouter Pricing — Even Cheaper#

Crazyrouter offers GPT-5-mini at 55% of OpenAI's official pricing. That's a 45% discount on every token, applied on top of the already-low base rates.

| Component | OpenAI Official | Crazyrouter Price | Savings |
| --- | --- | --- | --- |
| Input | $0.75/MTok | $0.4125/MTok | 45% off |
| Output | $4.50/MTok | $2.475/MTok | 45% off |

The API is fully compatible with OpenAI's SDK — you just change the base_url and use your Crazyrouter API key. Everything else stays the same: same request format, same response format, same model names.

Code Example: OpenAI Python SDK#

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Code Example: cURL#

bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gpt-5-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

That's it. Two lines changed (base_url and api_key), and you're saving 45% on every request. No SDK changes, no migration headaches.

Real-World Cost Comparison#

Let's put these numbers into context with three high-volume production scenarios.

Scenario 1: Classification Pipeline#

You're classifying 100,000 customer support tickets per day. Each ticket averages 200 input tokens, with a 50-token classification output. You use a 1,500-token system prompt that gets cached.

Daily token usage:

  • Cached input: 1,500 × 100,000 = 150M tokens
  • Fresh input: 200 × 100,000 = 20M tokens
  • Output: 50 × 100,000 = 5M tokens

| Provider | Cached Input | Fresh Input | Output | Daily Total | Monthly (30d) |
| --- | --- | --- | --- | --- | --- |
| OpenAI Standard | $11.25 | $15.00 | $22.50 | $48.75 | $1,462.50 |
| OpenAI Batch | $5.63 | $7.50 | $11.25 | $24.38 | $731.25 |
| Crazyrouter | $6.19 | $8.25 | $12.38 | $26.81 | $804.38 |

With the Batch API, you're processing 100K tickets daily for under $25. With Crazyrouter's real-time API, you get instant responses at roughly the same cost as OpenAI's batch pricing.
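
If you want to reproduce these scenario tables or plug in your own volumes, a small sketch like the following works. The rates mirror the prices discussed above, and the Crazyrouter row assumes the 45% discount also applies to cached input, as these tables do.

python
# Per-MTok rates (USD): cached input, fresh input, output.
RATES = {
    "OpenAI Standard": (0.075, 0.75, 4.50),
    "OpenAI Batch":    (0.0375, 0.375, 2.25),
    "Crazyrouter":     (0.075 * 0.55, 0.75 * 0.55, 4.50 * 0.55),
}

def daily_cost(provider: str, cached_m: float, fresh_m: float, output_m: float) -> float:
    """Daily cost given token volumes in millions of tokens."""
    cached_rate, fresh_rate, output_rate = RATES[provider]
    return cached_m * cached_rate + fresh_m * fresh_rate + output_m * output_rate

# Scenario 1: 100K tickets/day -> 150M cached, 20M fresh, 5M output tokens.
for provider in RATES:
    d = daily_cost(provider, 150, 20, 5)
    print(f"{provider}: ${d:.2f}/day, ${d * 30:.2f}/month")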

Scenario 2: Customer-Facing Chatbot#

Your chatbot handles 50,000 conversations per day. Average conversation: 800 input tokens (including a 500-token system prompt that gets cached) and 400 output tokens.

Daily token usage:

  • Cached input: 500 × 50,000 = 25M tokens
  • Fresh input: 300 × 50,000 = 15M tokens
  • Output: 400 × 50,000 = 20M tokens

| Provider | Cached Input | Fresh Input | Output | Daily Total | Monthly (30d) |
| --- | --- | --- | --- | --- | --- |
| OpenAI Standard | $1.88 | $11.25 | $90.00 | $103.13 | $3,093.75 |
| Crazyrouter | $1.03 | $6.19 | $49.50 | $56.72 | $1,701.56 |

Crazyrouter saves you $1,392/month on this chatbot workload. Note that Batch API isn't applicable here since chatbots need real-time responses.

Scenario 3: Content Moderation at Scale#

You're moderating 500,000 user-generated posts per day. Each post averages 150 input tokens with a 30-token moderation verdict. System prompt is 1,000 tokens (cached).

Daily token usage:

  • Cached input: 1,000 × 500,000 = 500M tokens
  • Fresh input: 150 × 500,000 = 75M tokens
  • Output: 30 × 500,000 = 15M tokens

| Provider | Cached Input | Fresh Input | Output | Daily Total | Monthly (30d) |
| --- | --- | --- | --- | --- | --- |
| OpenAI Standard | $37.50 | $56.25 | $67.50 | $161.25 | $4,837.50 |
| OpenAI Batch | $18.75 | $28.13 | $33.75 | $80.63 | $2,418.75 |
| Crazyrouter | $20.63 | $30.94 | $37.13 | $88.69 | $2,660.63 |

At half a million posts per day, the Batch API is the clear winner if you can tolerate the latency. For real-time moderation, Crazyrouter keeps you close to batch pricing while delivering instant results.

GPT-5-mini vs GPT-5-nano vs GPT-5.4: Which One Should You Use?#

The GPT-5 family gives you three distinct tiers. Here's how to think about them:

GPT-5-nano ($0.30 / $1.20 per MTok)#

The ultra-lightweight option. GPT-5-nano is built for the simplest tasks where speed and cost matter more than depth. Think: basic classification, entity extraction, simple reformatting, or routing queries to the right model. It's fast and dirt cheap, but it won't handle nuanced reasoning or complex instructions well.

Best for: High-volume, low-complexity tasks. Use it as a router or pre-filter.

GPT-5-mini ($0.75 / $4.50 per MTok)#

The workhorse. GPT-5-mini handles the vast majority of production tasks with solid reasoning, good instruction-following, and reliable output quality. It's where most teams should start — capable enough for real work, cheap enough for scale.

Best for: Chatbots, classification, summarization, content generation, code assistance, and most production workloads.

GPT-5.4 ($2.50 / $10.00 per MTok)#

The heavy hitter. GPT-5.4 is for tasks where quality is non-negotiable — complex analysis, creative writing that needs to be genuinely good, multi-step reasoning, or anything where a wrong answer is expensive. Its input tokens cost over 3x and its output tokens over 2x what GPT-5-mini charges, so use it selectively.

Best for: Complex reasoning, high-stakes decisions, premium user-facing experiences, and tasks where GPT-5-mini's output quality isn't sufficient.

The Smart Approach: Tiered Routing#

Many production systems use all three models together:

  1. GPT-5-nano routes incoming requests and handles trivial tasks.
  2. GPT-5-mini handles the bulk of standard workloads.
  3. GPT-5.4 steps in for complex or high-value requests.

This approach keeps your average cost close to GPT-5-mini rates while delivering GPT-5.4 quality where it matters.
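
A minimal sketch of that routing layer is below. The length-based heuristic and thresholds are placeholders (in practice you might use a GPT-5-nano call or task metadata to decide), and the model identifiers simply mirror the names used in this post; check your provider's model list for the exact IDs.

python
from openai import OpenAI

client = OpenAI()

def pick_model(prompt: str) -> str:
    # Placeholder heuristic: route by prompt length. Replace with your own
    # complexity signal, e.g. a GPT-5-nano classification call.
    if len(prompt) < 200:
        return "gpt-5-nano"
    if len(prompt) < 4_000:
        return "gpt-5-mini"
    return "gpt-5.4"  # hypothetical ID for the top-tier model discussed above

def answer(prompt: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content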

Key Takeaways#

  • GPT-5-mini costs $0.75/MTok input and $4.50/MTok output — the cheapest capable model in the GPT-5 family.
  • Automatic caching cuts input costs by 90% for repeated prefixes like system prompts and few-shot examples. No configuration needed.
  • Batch API saves 50% on both input and output for workloads that can tolerate up to 24-hour latency.
  • Caching + Batch stacks — cached input in batch mode costs just $0.0375/MTok, a 95% discount from the base rate.
  • Crazyrouter offers GPT-5-mini at 55% of official pricing — 45% savings with zero migration effort. Just swap the base_url.
  • Single-tier pricing means no surprises. The same rate applies regardless of context length.
  • For most production workloads, GPT-5-mini is the default choice. Use GPT-5-nano for simple routing and GPT-5.4 for complex reasoning.

Get Started with GPT-5-mini on Crazyrouter#

Ready to run GPT-5-mini at 45% off? Crazyrouter gives you full OpenAI API compatibility with lower prices, no rate limit headaches, and zero migration effort.

  1. Sign up at crazyrouter.com
  2. Get your API key from the dashboard
  3. Change your base_url to https://crazyrouter.com/v1
  4. Start saving on every request

No contracts, no minimums. Pay as you go, and only for what you use.

👉 Try Crazyrouter Now →


Last updated: April 27, 2026

Disclaimer: Pricing information is based on publicly available data from OpenAI as of the publication date. Prices are subject to change. Crazyrouter pricing is based on current rates and may be adjusted. Always verify current pricing on the official OpenAI pricing page and Crazyrouter before making purchasing decisions. Token counts in cost scenarios are estimates and actual usage may vary based on your specific implementation.
