GPT-5 Pricing Explained — Reasoning Tokens, Caching, Batch API, and How to Save with Crazyrouter


Crazyrouter Team
April 27, 2026


GPT-5 is OpenAI's most powerful model to date — a unified reasoning model that succeeds the o3 and o4-mini series while folding in the conversational fluency of GPT-4o. It ships with a 400K context window, 128K max output tokens, and built-in chain-of-thought reasoning that can tackle everything from multi-step math proofs to complex code generation.

But power comes at a price. GPT-5's pricing structure introduces a concept that catches many developers off guard: reasoning tokens. These invisible tokens are generated during the model's internal thinking process, and they're billed at the output rate. If you don't understand how they work, your API bill can balloon to 5–10x what you expected.

This guide breaks down every aspect of GPT-5 API pricing — base rates, reasoning token mechanics, automatic caching, the Batch API discount, and how to cut your costs by 45% using Crazyrouter as an API proxy. Whether you're building a production app or experimenting with GPT-5 for the first time, this is the pricing reference you need.

Last updated: April 27, 2026.


Base Pricing#

GPT-5's pricing follows OpenAI's standard per-token model, but with rates that reflect its position as a frontier reasoning model.

| Component | Price per Million Tokens |
| --- | --- |
| Input tokens | $1.25 |
| Cached input tokens | $0.125 (90% discount) |
| Output tokens | $10.00 |

Key specs#

  • Context window: 400,000 tokens
  • Max output tokens: 128,000 tokens
  • Knowledge cutoff: Early 2026
  • Supported modalities: Text, image, audio input; text and audio output

At first glance, the input price looks reasonable — $1.25 per million tokens is competitive with other frontier models. The output price of $10.00/MTok is where costs add up, especially once you factor in reasoning tokens (more on that below).

For comparison, here's how GPT-5 stacks up against other OpenAI models:

| Model | Input ($/MTok) | Output ($/MTok) | Context |
| --- | --- | --- | --- |
| GPT-5 | $1.25 | $10.00 | 400K |
| o3 | $2.00 | $8.00 | 200K |
| o4-mini | $0.40 | $1.60 | 200K |
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4.1 | $2.00 | $8.00 | 1M |

GPT-5 actually has cheaper input tokens than GPT-4o and GPT-4.1, while matching or exceeding their capabilities. The 400K context window is double what o3 offered. On paper, it's a strong value proposition — until reasoning tokens enter the picture.
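To get a feel for how these rates combine, here's a small cost estimator — a sketch only, with the rates hardcoded from the tables above (they may change, so verify against current pricing):

```python
# Rough GPT-5 per-request cost estimator.
# Rates are hardcoded from the pricing table above and may change.
INPUT_RATE = 1.25 / 1_000_000     # $ per uncached input token
CACHED_RATE = 0.125 / 1_000_000   # $ per cached input token
OUTPUT_RATE = 10.00 / 1_000_000   # $ per output token (incl. reasoning)

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one GPT-5 request."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_RATE
            + cached_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE)

# 10K input (8K of it cached), 5K total output (visible + reasoning)
print(round(estimate_cost(10_000, 5_000, cached_tokens=8_000), 4))  # 0.0535
```

Note that `output_tokens` here means *total* output tokens, including the reasoning tokens covered next — not just the visible response.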


Reasoning Tokens: The Hidden Cost Multiplier#

This is the single most important concept to understand about GPT-5 pricing. Get this wrong, and your costs will be unpredictable.

What are reasoning tokens?#

When GPT-5 processes a complex request, it doesn't jump straight to an answer. It thinks first. The model generates an internal chain of thought — breaking down the problem, considering approaches, checking its work — before producing the visible response you see in the API output.

These internal thinking steps consume reasoning tokens. You don't see them in the response content (they're hidden by default), but they absolutely show up on your bill.

How are reasoning tokens billed?#

Reasoning tokens are billed at the output token rate — $10.00 per million tokens. This is the critical detail. Even though you never see these tokens, they cost the same as the visible output.

Here's what a typical API response looks like:

json
{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 8500,
    "completion_tokens_details": {
      "reasoning_tokens": 6400,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}

In this example, the total completion tokens are 8,500 — but only 2,100 of those are the visible response. The remaining 6,400 are reasoning tokens. You're paying for 8,500 output tokens, not 2,100.
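You can recover the visible-output count and the real billed cost directly from that usage block. A minimal sketch, using the response shape shown above:

```python
# Usage block copied from the example API response above.
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 8500,
    "completion_tokens_details": {"reasoning_tokens": 6400},
}

reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning              # tokens you actually see
billed_cost = usage["completion_tokens"] * 10.00 / 1_000_000  # all 8,500 billed at $10/MTok
naive_cost = visible * 10.00 / 1_000_000                      # what you might have expected

print(visible, billed_cost, naive_cost)  # 2100 tokens; $0.085 billed vs $0.021 expected
```

Logging this delta per request is the simplest way to spot workloads where reasoning tokens are quietly dominating your bill.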

Why reasoning tokens can be 2–10x the visible output#

The ratio of reasoning tokens to visible output depends on the complexity of the task:

  • Simple Q&A or text generation: Reasoning tokens might be 0.5–1x the visible output. The model doesn't need to think hard.
  • Multi-step math or logic: Reasoning tokens can be 3–5x the visible output. The model is working through steps internally.
  • Complex code generation or debugging: Reasoning tokens can hit 5–10x the visible output. The model is planning, writing, reviewing, and revising internally before showing you the final answer.

This means a request that produces 1,000 visible output tokens might actually consume 5,000–10,000 total output tokens. At $10/MTok, that's the difference between the $0.01 you expected and the $0.10 you're billed for a single request.

Controlling costs with reasoning_effort#

OpenAI provides a reasoning_effort parameter that lets you control how much thinking GPT-5 does. This is your primary lever for managing reasoning token costs.

| Level | Behavior | Reasoning Token Impact |
| --- | --- | --- |
| minimal | Bare minimum reasoning | ~0.5x visible output |
| low | Light reasoning | ~1–2x visible output |
| medium | Balanced (default for many tasks) | ~2–5x visible output |
| high | Deep reasoning, maximum accuracy | ~5–10x visible output |
python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5",
    reasoning_effort="low",  # Reduce reasoning for simpler tasks
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ]
)

Rule of thumb: Use low or minimal for straightforward tasks (summarization, translation, simple Q&A). Reserve medium and high for tasks where accuracy matters — math, code, complex analysis. This single parameter can cut your costs by 50–80% on routine requests.
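The table above translates into concrete dollar amounts. A sketch of the per-request output cost at each effort level, using rough midpoints of the multiplier ranges (approximations, not guaranteed ratios):

```python
# Approximate output cost per request at each reasoning_effort level.
# Multipliers are rough midpoints of the ranges in the table above.
MULTIPLIER = {"minimal": 0.5, "low": 1.5, "medium": 3.5, "high": 7.5}
OUTPUT_RATE = 10.00 / 1_000_000  # $ per output token

def output_cost(visible_tokens: int, effort: str) -> float:
    total_tokens = visible_tokens * (1 + MULTIPLIER[effort])  # visible + reasoning
    return total_tokens * OUTPUT_RATE

for effort in ("minimal", "low", "medium", "high"):
    print(f"{effort:>8}: ${output_cost(1_000, effort):.3f} per 1K visible tokens")
```

For 1,000 visible tokens, that's roughly $0.015 at minimal versus $0.085 at high — nearly a 6x spread from one parameter.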


Automatic Caching: 90% Off Repeated Input#

GPT-5 supports automatic prompt caching — and unlike previous OpenAI models, you don't need to do anything to enable it. It just works.

How it works#

When you send a request to GPT-5, OpenAI automatically caches the prompt prefix. If a subsequent request shares the same prefix (system prompt, few-shot examples, or any repeated content at the beginning of the prompt), the cached portion is billed at the cached input rate: $0.125/MTok instead of $1.25/MTok — a 90% discount.

Cache retention#

Cached prompts can be retained for up to 24 hours, though the exact duration depends on usage patterns: frequently accessed caches are kept longer, while infrequently used ones may expire sooner.

When caching saves the most#

Caching is most valuable when you have:

  • Long system prompts that stay the same across requests (e.g., a 5,000-token system prompt for a customer support bot)
  • Few-shot examples prepended to every request
  • Document context that multiple users query against (e.g., RAG pipelines where the retrieved context is the same document)

Practical example#

Suppose your system prompt is 10,000 tokens and you make 1,000 requests per day:

  • Without caching: 10,000 × 1,000 = 10M input tokens × $1.25/MTok = $12.50/day
  • With caching: 10,000 × 1,000 = 10M input tokens × $0.125/MTok = $1.25/day

That's $11.25 saved per day — $337.50 per month — just from automatic caching on the system prompt alone.

Tips to maximize cache hits#

  1. Put static content first. The cache matches from the beginning of the prompt. Your system prompt and few-shot examples should come before any dynamic user content.
  2. Keep system prompts consistent. Even a single character change invalidates the cache for everything after that point.
  3. Batch similar requests. If multiple users are querying the same document, route them through the same prompt structure.
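Tip 1 looks like this in practice. A sketch with a hypothetical support-bot prompt; the `cached_tokens` field mentioned in the comment assumes OpenAI's `prompt_tokens_details` usage block:

```python
# Static content first: the system prompt and few-shot examples are
# byte-identical across requests, so the automatic cache can match them.
STATIC_PREFIX = [
    {"role": "system", "content": "You are a support bot for AcmeCo. ..."},
    {"role": "user", "content": "Example question: how do I reset my password?"},
    {"role": "assistant", "content": "Example answer: go to Settings > Security."},
]

def build_messages(question: str) -> list[dict]:
    """Dynamic user content goes last, so it never invalidates the cached prefix."""
    return STATIC_PREFIX + [{"role": "user", "content": question}]

messages = build_messages("Where is my invoice?")
# After the API call, response.usage.prompt_tokens_details.cached_tokens
# reports how many prompt tokens were served from cache (0 on a cold cache).
```

Checking `cached_tokens` on a few live responses is the quickest way to confirm your prompt structure is actually getting cache hits.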

Batch API: 50% Off for Async Workloads#

OpenAI's Batch API lets you submit large batches of requests and receive results within 24 hours. The tradeoff: you give up real-time responses in exchange for a 50% discount on all token costs.

| Component | Standard Price | Batch API Price |
| --- | --- | --- |
| Input tokens | $1.25/MTok | $0.625/MTok |
| Cached input | $0.125/MTok | $0.0625/MTok |
| Output tokens | $10.00/MTok | $5.00/MTok |

When to use Batch API#

The Batch API is ideal for:

  • Content generation pipelines — generating hundreds of product descriptions, blog drafts, or translations
  • Data processing — classifying, extracting, or summarizing large datasets
  • Evaluation and testing — running model evaluations across thousands of test cases
  • Nightly jobs — any workload where results don't need to be instant

How it works#

  1. Upload a .jsonl file with your requests
  2. Create a batch job
  3. Poll for completion (typically within 24 hours)
  4. Download results
python
from openai import OpenAI

client = OpenAI()

# Upload the batch file
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch"
)

# Create the batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
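Steps 3 and 4 — polling and downloading — can be sketched as a small helper. A minimal loop, assuming the `client` and `batch` objects from the snippet above; the terminal status names follow OpenAI's batch lifecycle:

```python
import time

# Statuses after which the batch will not change further.
TERMINAL = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(client, batch_id: str, poll_seconds: int = 60):
    """Poll client.batches.retrieve until the batch reaches a terminal status."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL:
            return batch
        time.sleep(poll_seconds)

# Usage, with the client and batch from the snippet above:
#   done = wait_for_batch(client, batch.id)
#   if done.status == "completed":
#       results_jsonl = client.files.content(done.output_file_id).text
```

Each line of the downloaded output file is one JSON result, matched to your request by `custom_id`.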

Combining Batch API with automatic caching can yield dramatic savings. If your batch requests share common prefixes, you get the 50% batch discount on top of the 90% caching discount on input tokens — effectively paying $0.0625/MTok for cached input in batch mode.


Save 45% with Crazyrouter#

Crazyrouter is an API proxy that gives you access to GPT-5 (and 200+ other models) at significantly reduced prices. It's fully compatible with the OpenAI SDK — you just change the base_url and API key.

Crazyrouter GPT-5 Pricing#

| Component | OpenAI Official | Crazyrouter (55%) | You Save |
| --- | --- | --- | --- |
| Input tokens | $1.25/MTok | $0.6875/MTok | 45% |
| Output tokens | $10.00/MTok | $5.50/MTok | 45% |

Crazyrouter charges 55% of OpenAI's official pricing, which means you save 45% on every token — input and output, including reasoning tokens.

Setup with OpenAI Python SDK#

Switching to Crazyrouter takes two lines of code:

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5",
    reasoning_effort="medium",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ]
)

print(response.choices[0].message.content)

Setup with curl#

bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-key" \
  -d '{
    "model": "gpt-5",
    "reasoning_effort": "medium",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ]
  }'

Why Crazyrouter?#

  • Drop-in replacement — same OpenAI SDK, same API format, same response structure
  • 200+ models — access GPT-5, Claude, Gemini, DeepSeek, and more from a single API key
  • No rate limit surprises — Crazyrouter handles load balancing across multiple upstream keys
  • Pay-as-you-go — no subscriptions, no minimums

Real-World Cost Scenarios#

Let's walk through three realistic scenarios to see how GPT-5 costs play out in practice — and how reasoning tokens, caching, and Crazyrouter affect the bottom line.

Scenario 1: Customer Support Chatbot#

Setup: 5,000-token system prompt, average 500-token user message, 800-token visible response, low reasoning effort.

  • Reasoning tokens: ~1x visible output = 800 tokens
  • Total output tokens: 1,600 per request
  • Requests per day: 10,000

| Cost Component | Tokens/Day | OpenAI Price | Crazyrouter Price |
| --- | --- | --- | --- |
| Input (uncached: user messages + first full prompt) | 5.5M | $6.88 | $3.78 |
| Input (cached system prompt, 9,999 requests) | ~50M | $6.25 | $3.44 |
| Output (incl. reasoning) | 16M | $160.00 | $88.00 |
| Daily total | | $173.13 | $95.22 |
| Monthly total | | $5,193.90 | $2,856.60 |

Monthly savings with Crazyrouter: $2,337.30

Note how output costs dominate even with low reasoning effort. The 5,000-token system prompt is almost free after the first request thanks to caching.
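The table's arithmetic can be reproduced in a few lines — a sketch using the rates from the pricing tables above, with component costs rounded to cents as in the table:

```python
# Reproducing the Scenario 1 daily arithmetic (rates from the tables above).
requests = 10_000
uncached_tokens = 5_500_000              # user messages + first full prompt
cached_tokens = 5_000 * (requests - 1)   # system prompt served from cache
output_tokens = 1_600 * requests         # visible + reasoning

input_uncached = round(uncached_tokens * 1.25 / 1e6, 2)   # $6.88
input_cached = round(cached_tokens * 0.125 / 1e6, 2)      # $6.25
output_cost = round(output_tokens * 10.00 / 1e6, 2)       # $160.00
daily = input_uncached + input_cached + output_cost
crazyrouter_daily = round(daily * 0.55, 2)                # 55% of official pricing

print(f"OpenAI: ${daily:.2f}/day, Crazyrouter: ${crazyrouter_daily:.2f}/day")
```

Swapping in your own token counts and request volume gives a quick first-order estimate for any similar chatbot workload.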

Scenario 2: Code Generation Pipeline#

Setup: 2,000-token system prompt, 3,000-token code context, 2,000-token visible output, high reasoning effort for maximum code quality.

  • Reasoning tokens: ~8x visible output = 16,000 tokens
  • Total output tokens: 18,000 per request
  • Requests per day: 500

| Cost Component | Tokens/Day | OpenAI Price | Crazyrouter Price |
| --- | --- | --- | --- |
| Input (mostly cached) | 2.5M | $0.63 | $0.35 |
| Output (incl. reasoning) | 9M | $90.00 | $49.50 |
| Daily total | | $90.63 | $49.85 |
| Monthly total | | $2,718.90 | $1,495.50 |

Monthly savings with Crazyrouter: $1,223.40

This scenario shows the reasoning token multiplier in action. The visible output is only 2,000 tokens, but you're paying for 18,000 tokens of output per request. At high reasoning effort, the model is doing extensive internal planning and code review — great for quality, expensive for your wallet.

Cost optimization tip: Use high reasoning for complex algorithmic tasks and low for boilerplate code generation. A smart routing layer that adjusts reasoning_effort based on task complexity can cut costs by 60%+ without sacrificing quality where it matters.
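Such a routing layer can start as simple as a keyword heuristic. This is purely illustrative — the hint list, word-count threshold, and function name are assumptions, not a shipped feature of any API:

```python
# Crude complexity router: request deep reasoning only when the task hints at it.
# The hint list and thresholds below are illustrative assumptions; tune per workload.
HARD_HINTS = ("algorithm", "optimize", "concurrency", "race condition", "prove")

def pick_reasoning_effort(task: str) -> str:
    text = task.lower()
    if any(hint in text for hint in HARD_HINTS):
        return "high"        # algorithmic / correctness-critical work
    if len(text.split()) > 100:
        return "medium"      # long specs get a middle ground
    return "low"             # boilerplate and simple edits

# pick_reasoning_effort("Write a CRUD endpoint for users")          -> "low"
# pick_reasoning_effort("Optimize this O(n^2) algorithm")           -> "high"
```

In production you'd likely replace the keyword check with a cheap classifier model, but even this crude rule stops boilerplate requests from paying the high-effort reasoning multiplier.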

Scenario 3: Batch Data Processing#

Setup: Processing 50,000 product descriptions. 200-token input per item, 500-token output, minimal reasoning effort. Using Batch API.

  • Reasoning tokens: ~0.5x visible output = 250 tokens
  • Total output tokens: 750 per request
  • Batch discount: 50% off

| Cost Component | Total Tokens | OpenAI Batch Price | Crazyrouter Price |
| --- | --- | --- | --- |
| Input | 10M | $6.25 | $3.44 |
| Output (incl. reasoning) | 37.5M | $187.50 | $103.13 |
| Total | | $193.75 | $106.57 |

Savings with Crazyrouter: $87.18 on a single batch run

For batch workloads, combining the Batch API's 50% discount with Crazyrouter's 45% discount yields massive savings. The same job at standard OpenAI rates would cost $387.50 — you're paying just $106.57 through Crazyrouter, a 72% total reduction.


Key Takeaways#

  1. Reasoning tokens are the biggest cost driver. They're billed at the output rate ($10/MTok) and can be 2–10x the visible output. Always check completion_tokens_details.reasoning_tokens in your API responses to understand your actual costs.

  2. Use reasoning_effort strategically. Not every request needs deep thinking. Set low or minimal for simple tasks, medium for general use, and high only when accuracy is critical. This single parameter can reduce output costs by 50–80%.

  3. Automatic caching is free money. Structure your prompts with static content first (system prompt, few-shot examples) and dynamic content last. The 90% discount on cached input tokens adds up fast at scale.

  4. Batch API for async workloads. If you don't need real-time responses, the 50% discount on all tokens is too good to ignore. Content pipelines, data processing, and evaluation runs should always use batch.

  5. Crazyrouter saves 45% on everything. Same API, same SDK, same response format — just cheaper. At scale, this translates to thousands of dollars per month in savings.


Start Saving on GPT-5 Today#

GPT-5 is a remarkable model, but its costs can escalate quickly if you're not paying attention to reasoning tokens. The good news: between reasoning_effort tuning, automatic caching, Batch API, and Crazyrouter's 45% discount, you have multiple levers to keep costs under control.

Ready to cut your GPT-5 API costs by 45%?

👉 Get started with Crazyrouter — create an account, grab your API key, and swap your base_url. It takes less than a minute.

No subscriptions. No minimums. Just cheaper tokens.


Disclaimer: Pricing information is accurate as of April 27, 2026. OpenAI may change pricing at any time. Crazyrouter pricing is based on current rates and subject to change. Always verify current pricing on the official OpenAI and Crazyrouter websites before making purchasing decisions. Token usage estimates in the scenarios above are approximations — actual reasoning token consumption varies by task complexity, prompt structure, and model behavior.
