Crazyrouter Team · April 27, 2026

GPT-5-nano Pricing Explained — The Cheapest GPT Model for High-Throughput Workloads#

OpenAI's GPT-5-nano is the smallest and cheapest model in the GPT-5 family, purpose-built for high-throughput workloads where speed and cost matter more than deep reasoning. At just $0.20 per million input tokens and $1.25 per million output tokens, it's the most affordable GPT model ever released — and through Crazyrouter, you can access it for even less.

Whether you're running classification pipelines, content filtering at scale, or intent routing for millions of requests per day, GPT-5-nano delivers GPT-level intelligence at a fraction of the cost. This guide breaks down every pricing tier, discount mechanism, and real-world cost scenario so you can plan your budget with confidence.

Base Pricing#

GPT-5-nano follows OpenAI's standard per-token pricing model. Here's the official rate card:

| Tier | Input | Output |
|------|-------|--------|
| Standard | $0.20 / MTok | $1.25 / MTok |
| Cached Input | $0.02 / MTok | n/a |
| Batch API | $0.10 / MTok | $0.625 / MTok |

Key details:

  • No long-context pricing tier — GPT-5-nano has a fixed context window without the premium pricing that larger models charge for extended contexts.
  • Pay-per-token — You're billed only for the tokens you actually use. No minimum commitments, no reserved capacity fees.
  • Same billing infrastructure — Works with your existing OpenAI billing setup, usage tiers, and rate limits.

To put this in perspective: processing 1 million tokens of input costs just $0.20 — that's roughly 750,000 words of text for twenty cents. For classification and routing tasks where outputs are short (a single label or score), your effective cost per request can drop to fractions of a cent.
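The per-request arithmetic can be sketched in a few lines. The rates are hardcoded from the table above; the helper name and the 200-token/5-token request shape are illustrative:

```python
# Effective cost of one GPT-5-nano request at standard (non-cached) rates.
INPUT_PER_MTOK = 0.20    # $ per million input tokens
OUTPUT_PER_MTOK = 1.25   # $ per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at standard per-token rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# A 200-token classification prompt returning a 5-token label:
cost = request_cost(200, 5)  # about $0.000046 — a fraction of a cent
```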

Automatic Caching#

One of GPT-5-nano's most powerful cost-saving features is automatic prompt caching. OpenAI automatically caches frequently-used prompt prefixes and charges only 10% of the standard input rate for cached tokens.

How It Works#

  • Cached input tokens: $0.02 / MTok (90% discount vs. standard input)
  • Automatic — no configuration needed. OpenAI detects repeated prompt prefixes and caches them transparently.
  • Cache hits happen when your requests share a common system prompt or prefix of at least 1,024 tokens.

When Caching Saves You the Most#

Caching is most effective for workloads with:

  1. Long system prompts — If you use a detailed system prompt (classification rules, output schemas, few-shot examples), that prefix gets cached after the first request.
  2. Batch classification — Sending thousands of items through the same classification pipeline means only the first request pays full input price for the system prompt.
  3. Consistent prefixes — Any workflow where the first N tokens of your prompt remain identical across requests.
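Because caching keys on the prompt prefix, the practical pattern is to keep the stable content byte-for-byte identical and first, with the variable content last. A minimal sketch (the prompt text and helper are placeholders):

```python
# Stable prefix first, variable content last, so repeated requests share
# a cacheable prompt prefix.
SYSTEM_PROMPT = (
    "You are a classifier. Label each input as positive, negative, or neutral."
    # In practice this would include the full rules and few-shot examples,
    # kept identical across every request.
)

def make_messages(item_text: str) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # identical every time → cacheable
        {"role": "user", "content": item_text},        # varies per request
    ]
```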

Cost Impact Example#

Imagine a classification pipeline with a 2,000-token system prompt processing 100,000 items per day, each with a 200-token user input:

  • Without caching: (2,000 + 200) × 100,000 = 220M input tokens → $44.00/day
  • With caching: 200M cached tokens × $0.02/MTok ($4.00) + 20M fresh tokens × $0.20/MTok ($4.00) = $8.00/day

That's an 82% reduction in input costs just from automatic caching — no code changes required.
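The arithmetic above, as a quick script using the rates from this post:

```python
# Daily input cost with and without prompt caching for the pipeline above:
# 2,000-token cached system prompt, 200-token user input, 100,000 requests/day.
STANDARD = 0.20 / 1_000_000  # $ per standard input token
CACHED = 0.02 / 1_000_000    # $ per cached input token

requests = 100_000
without_cache = (2_000 + 200) * requests * STANDARD
with_cache = 2_000 * requests * CACHED + 200 * requests * STANDARD
print(f"${without_cache:.2f} vs ${with_cache:.2f}, "
      f"{1 - with_cache / without_cache:.0%} saved")
# → $44.00 vs $8.00, 82% saved
```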

Batch API#

For workloads that don't require real-time responses, OpenAI's Batch API offers a flat 50% discount on both input and output tokens.

Batch API Pricing#

| Token Type | Standard | Batch API | Savings |
|------------|----------|-----------|---------|
| Input | $0.20 / MTok | $0.10 / MTok | 50% off |
| Output | $1.25 / MTok | $0.625 / MTok | 50% off |

How Batch API Works#

  1. Submit a batch — Upload a JSONL file containing up to 50,000 requests.
  2. Wait for processing — Batches complete within 24 hours (typically much faster).
  3. Retrieve results — Download the completed batch with all responses.
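As a sketch, the first two steps might look like this with the OpenAI Python SDK. The file path, prompt, and helper names are illustrative, and whether Crazyrouter proxies the Batch API endpoints is an assumption to verify:

```python
# Build a JSONL batch of GPT-5-nano classification requests and submit it.
import json

def build_batch_lines(texts, system_prompt, model="gpt-5-nano"):
    """One Batch API request object (as a JSON line) per input text."""
    lines = []
    for i, text in enumerate(texts):
        lines.append(json.dumps({
            "custom_id": f"item-{i}",   # used to match results to inputs later
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text},
                ],
                "max_tokens": 10,
            },
        }))
    return lines

def submit_batch(jsonl_path):
    """Upload the JSONL file and create a batch job with a 24h window."""
    from openai import OpenAI  # deferred so the builder above stays stdlib-only
    client = OpenAI()
    batch_file = client.files.create(file=open(jsonl_path, "rb"), purpose="batch")
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
```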

Ideal Use Cases for Batch + GPT-5-nano#

  • Nightly content moderation — Process the day's user-generated content in one batch overnight.
  • Bulk classification — Categorize millions of products, tickets, or documents.
  • Data enrichment — Add labels, summaries, or metadata to large datasets.
  • Evaluation pipelines — Score model outputs or run quality checks on training data.

Combining Batch API + Caching#

Yes, caching works with Batch API too. If your batch requests share a common prefix, you get both discounts stacked:

  • Cached input in Batch mode: effectively $0.01 / MTok (the 50% batch discount applied to the $0.02 cached rate)
  • That's 95% cheaper than standard input pricing.
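The stacked rate is just the two discounts multiplied together (assuming, as above, that the batch discount applies on top of the cached rate):

```python
# Effective $/MTok for cached input tokens inside a batch job.
standard_input = 0.20
cached = standard_input * 0.10   # 90% caching discount → $0.02
batch_cached = cached * 0.50     # 50% batch discount on top → $0.01
discount_vs_standard = 1 - batch_cached / standard_input
print(f"${batch_cached:.2f}/MTok, {discount_vs_standard:.0%} off standard")
# → $0.01/MTok, 95% off standard
```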

Crazyrouter Pricing#

Through Crazyrouter, you can access GPT-5-nano at 55% of OpenAI's official pricing — a 45% discount with no rate limit reductions or feature limitations.

Crazyrouter Rates#

| Token Type | OpenAI Official | Crazyrouter | You Save |
|------------|-----------------|-------------|----------|
| Input | $0.20 / MTok | $0.11 / MTok | 45% |
| Output | $1.25 / MTok | $0.6875 / MTok | 45% |

Why Crazyrouter Is Cheaper#

Crazyrouter aggregates demand across thousands of developers, negotiates volume pricing with OpenAI, and passes the savings directly to you. You get:

  • Same model, same quality — identical outputs to calling OpenAI directly
  • OpenAI-compatible API — drop-in replacement, no code changes needed
  • No rate limit penalties — same throughput as direct access
  • Pay-as-you-go — no minimums, no commitments

Code Example: OpenAI Python SDK#

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "Classify the following text into one of: positive, negative, neutral."},
        {"role": "user", "content": "This product exceeded my expectations in every way!"}
    ],
    max_tokens=10
)

print(response.choices[0].message.content)
# Output: positive
```

Code Example: cURL#

```bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gpt-5-nano",
    "messages": [
      {"role": "system", "content": "Classify sentiment: positive, negative, or neutral."},
      {"role": "user", "content": "Terrible experience, would not recommend."}
    ],
    "max_tokens": 10
  }'
```

Code Example: Node.js#

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-crazyrouter-api-key',
  baseURL: 'https://crazyrouter.com/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [
    { role: 'system', content: 'Extract the intent: greeting, question, complaint, or other.' },
    { role: 'user', content: 'Hey, can you help me reset my password?' },
  ],
  max_tokens: 10,
});

console.log(response.choices[0].message.content);
// Output: question
```

Real-World Cost Comparison#

Let's look at three ultra-high-volume scenarios to see how GPT-5-nano pricing plays out in production.

Scenario 1: 100M Tokens/Day — Text Classification#

Use case: E-commerce platform classifying 500,000 product reviews daily into sentiment categories.

  • Average input: 150 tokens (review) + 50 tokens (system prompt) = 200 tokens/request
  • Average output: 5 tokens (label)
  • Daily volume: 500,000 requests → 100M input tokens, 2.5M output tokens

| Provider | Daily Input Cost | Daily Output Cost | Monthly Total |
|----------|------------------|-------------------|---------------|
| OpenAI Direct | $20.00 | $3.13 | $694 |
| OpenAI + Caching | $15.50 | $3.13 | $559 |
| Crazyrouter | $11.00 | $1.72 | $382 |
| Crazyrouter + Caching | $8.53 | $1.72 | $307 |
| Batch API (OpenAI) | $10.00 | $1.56 | $347 |
| Batch API (Crazyrouter) | $5.50 | $0.86 | $191 |

Best option: Batch API through Crazyrouter at $191/month. Caching helps only modestly here, since just 50 of the 200 input tokens (the system prompt) are shared across requests.
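As a sanity check on the table, the OpenAI-direct row is easy to reproduce (a 30-day month is assumed):

```python
# Scenario 1, OpenAI direct: 500,000 requests/day, 200 input + 5 output tokens each.
requests_per_day = 500_000
input_tokens, output_tokens = 200, 5

daily_input = requests_per_day * input_tokens * 0.20 / 1_000_000    # $20.00
daily_output = requests_per_day * output_tokens * 1.25 / 1_000_000  # $3.125
monthly = (daily_input + daily_output) * 30
print(f"${monthly:,.0f}/month")
# → $694/month
```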

Scenario 2: 1.2B Tokens/Day — Content Filtering#

Use case: Social media platform filtering 2 million posts daily for policy violations.

  • Average input: 80 tokens (post) + 500 tokens (policy rules) = 580 tokens/request (but 500 cached)
  • Average output: 20 tokens (verdict + reason)
  • Daily volume: 2,000,000 requests → ~160M uncached + 1B cached input tokens, 40M output tokens

| Provider | Daily Cost | Monthly Cost |
|----------|------------|--------------|
| OpenAI Direct | $282.00 | $8,460 |
| OpenAI + Caching | $102.00 | $3,060 |
| Crazyrouter + Caching | $56.10 | $1,683 |
| Batch + Crazyrouter | $28.05 | $842 |

Best option: If latency allows, Batch API through Crazyrouter brings this down to about $842/month for 2M daily content moderation decisions.

Scenario 3: 230M Tokens/Day — Intent Routing#

Use case: Customer service platform routing 1 million incoming messages to the correct department.

  • Average input: 30 tokens (message) + 200 tokens (routing rules) = 230 tokens/request (200 cached)
  • Average output: 10 tokens (department + confidence)
  • Daily volume: 1,000,000 requests → 30M fresh + 200M cached input tokens, 10M output tokens

| Provider | Daily Cost | Monthly Cost |
|----------|------------|--------------|
| OpenAI Direct | $58.50 | $1,755 |
| OpenAI + Caching | $22.50 | $675 |
| Crazyrouter + Caching | $12.38 | $371 |

Best option: Crazyrouter + Caching at about $371/month for 1M daily routing decisions with sub-second latency.

GPT-5-nano vs Gemini 2.5 Flash-Lite vs Claude Haiku#

How does GPT-5-nano stack up against other budget-tier models from competing providers?

| Feature | GPT-5-nano | Gemini 2.5 Flash-Lite | Claude 3.5 Haiku |
|---------|------------|------------------------|-------------------|
| Input Price | $0.20 / MTok | $0.075 / MTok | $0.80 / MTok |
| Output Price | $1.25 / MTok | $0.30 / MTok | $4.00 / MTok |
| Cached Input | $0.02 / MTok | $0.01875 / MTok | $0.08 / MTok |
| Batch Discount | 50% off | Not available | Not available |
| Context Window | 128K | 1M | 200K |
| Speed | Very fast | Very fast | Fast |
| Best For | Classification, routing | Long-context cheap tasks | Balanced quality/cost |

Key Takeaways from the Comparison#

  • Gemini 2.5 Flash-Lite is cheaper on paper for raw token costs, but lacks a Batch API discount and has less predictable latency for high-throughput workloads.
  • Claude 3.5 Haiku offers better reasoning quality but costs 4× more on input and 3.2× more on output — overkill for simple classification tasks.
  • GPT-5-nano hits the sweet spot for OpenAI-ecosystem users: cheapest GPT model, excellent Batch API support, automatic caching, and proven reliability at scale.

When to Choose GPT-5-nano#

  • You're already in the OpenAI ecosystem and want the cheapest option
  • Your tasks are simple: classification, routing, extraction, filtering
  • You need Batch API for offline processing
  • You want automatic caching without configuration
  • You value the OpenAI API's reliability and tooling

When to Consider Alternatives#

  • You need 1M+ context windows → Gemini 2.5 Flash-Lite
  • You need stronger reasoning at budget prices → Claude 3.5 Haiku
  • You're optimizing purely on token cost with no ecosystem preference → Gemini 2.5 Flash-Lite

Key Takeaways#

  1. $0.20/MTok input, $1.25/MTok output — GPT-5-nano is the cheapest GPT model available, period.

  2. Automatic caching cuts input costs by 90% — No configuration needed. Repeated prefixes are cached at $0.02/MTok automatically.

  3. Batch API saves 50% — For non-real-time workloads, submit batches and pay half price on both input and output.

  4. Crazyrouter saves an additional 45% — Access GPT-5-nano at $0.11/MTok input through Crazyrouter's volume pricing, with zero feature limitations.

  5. Stack the discounts — Caching + Batch + Crazyrouter can bring effective costs below $0.01/MTok for cached inputs in batch mode.

  6. Purpose-built for volume — GPT-5-nano isn't trying to be the smartest model. It's trying to be the fastest and cheapest for tasks that don't need deep reasoning.

Get Started with GPT-5-nano on Crazyrouter#

Ready to run GPT-5-nano at 45% off? Getting started takes less than a minute:

  1. Sign up at crazyrouter.com
  2. Get your API key from the dashboard
  3. Point your SDK to https://crazyrouter.com/v1
  4. Use model gpt-5-nano — that's it. Same API, same responses, lower bill.

No contracts. No minimums. Pay only for what you use.

→ Get Your API Key


Last updated: April 27, 2026

Disclaimer: Pricing information is based on publicly available data from OpenAI as of the publication date. Prices may change without notice. Crazyrouter pricing is subject to the terms and conditions on crazyrouter.com. Always verify current rates on the respective provider's pricing page before making purchasing decisions.
