GPT-5.4 Pricing Explained — Cached Input, Context Tiers, Batch API, and How to Save with Crazyrouter

Crazyrouter Team
April 27, 2026

GPT-5.4 is OpenAI's current flagship model — the successor to GPT-5 that pushed the boundaries of reasoning, coding, and multimodal understanding even further. If you're building applications on top of the OpenAI API, understanding GPT-5.4's pricing structure is essential for managing costs and making smart architectural decisions.

This guide breaks down every aspect of GPT-5.4 pricing: the dual context tiers, automatic caching that slashes input costs by 90%, the Batch API for async workloads, data residency options, and how routing through Crazyrouter can save you 45% on every API call.

What Makes GPT-5.4 Worth the Price#

Before diving into numbers, let's talk about what you're paying for. GPT-5.4 represents a significant leap over GPT-5 in several key areas:

  • Advanced reasoning: Multi-step logical reasoning with improved accuracy on complex tasks; benchmarks show consistent gains over GPT-5 on MATH, GPQA, and ARC-AGI evaluations.
  • Superior coding: Stronger code generation, debugging, and refactoring capabilities across dozens of programming languages.
  • Longer context window: Supports up to 270K tokens in standard mode, with a long-context tier that extends well beyond that.
  • Multimodal fluency: Seamless handling of text, images, and structured data in a single conversation.
  • Instruction following: Tighter adherence to system prompts and complex multi-constraint instructions.

For production applications that demand top-tier intelligence, GPT-5.4 is the model to beat. The question isn't whether it's capable — it's how to use it cost-effectively.

Base Pricing: Short Context vs. Long Context#

GPT-5.4 uses a two-tier pricing model based on context length. This is important to understand because the prices differ significantly between the two tiers.

Short Context (Standard)#

For requests that fit within the standard 270K token context window:

| Component | Price per MTok |
| --- | --- |
| Input tokens | $2.50 |
| Cached input tokens | $0.25 |
| Output tokens | $15.00 |

Long Context (>270K tokens)#

When your request exceeds 270K tokens, the long-context tier kicks in with higher prices:

| Component | Price per MTok |
| --- | --- |
| Input tokens | $5.00 |
| Cached input tokens | $0.50 |
| Output tokens | $22.50 |

What This Means in Practice#

The long-context tier costs 2x for input and 1.5x for output compared to short context. This pricing structure incentivizes you to keep requests under 270K tokens when possible.

For most applications — chatbots, code assistants, content generation, data extraction — you'll comfortably stay within the short-context tier. The long-context tier is designed for specialized use cases like analyzing entire codebases, processing lengthy legal documents, or working with large datasets in a single pass.

Pro tip: If you're regularly hitting the long-context tier, consider whether you can split your workload into smaller chunks. Processing two 200K-token requests is cheaper than one 400K-token request.
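
To make the tier math concrete, here is a minimal cost estimator using the list prices above (illustrative only; no caching or batch discount applied, and the token counts are hypothetical):

python
# Rough per-request cost estimate using the list prices from the tables above.
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > 270_000:
        input_rate, output_rate = 5.00, 22.50   # long-context tier, $/MTok
    else:
        input_rate, output_rate = 2.50, 15.00   # short-context tier, $/MTok
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# One 400K-token request lands in the long-context tier...
one_big = estimate_cost(400_000, 2_000)
# ...while two 200K-token requests stay in the short-context tier.
two_small = 2 * estimate_cost(200_000, 1_000)

print(f"One 400K request:  ${one_big:.4f}")    # ~$2.0450
print(f"Two 200K requests: ${two_small:.4f}")  # ~$1.0300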

Automatic Caching: 90% Off Repeated Input#

This is where GPT-5.4 pricing gets interesting — and where you can save the most money without changing a single line of code.

How OpenAI's Automatic Caching Works#

Unlike Anthropic's Claude, which requires you to manually set cache_control breakpoints in your prompts, OpenAI's caching is fully automatic. Here's how it works:

  1. Prefix matching: OpenAI's infrastructure automatically detects when the beginning of your prompt matches a recently sent prompt.
  2. Automatic caching: When a match is found, the cached portion is served at the cached input price — just 10% of the standard input cost.
  3. No TTL management: You don't need to worry about cache expiration, cache keys, or cache invalidation. OpenAI handles everything server-side.
  4. No code changes: There's no special parameter to set, no API flag to enable. It just works.
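
Since there is nothing to configure, the only thing you control is prompt structure: keep the static parts first and the per-user content last so the shared prefix can match the cache. A minimal sketch (the support-assistant prompt and the answer helper are illustrative):

python
from openai import OpenAI

client = OpenAI()

# Identical on every call, so it forms a cacheable prefix.
STATIC_SYSTEM_PROMPT = "You are a support assistant for Acme Corp. ..."

def answer(user_question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # served at the cached rate after the first request
            {"role": "user", "content": user_question},           # varies per request, billed at the full input rate
        ],
    )
    return response.choices[0].message.content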

The Math Behind Caching#

Let's say you have a system prompt of 5,000 tokens that you send with every request. Without caching, that's:

  • 5,000 tokens × $2.50/MTok = $0.0125 per request (just for the system prompt)

With automatic caching (after the first request):

  • 5,000 tokens × $0.25/MTok = $0.00125 per request

That's a 90% reduction on the cached portion. Over thousands of requests, this adds up fast.
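
The same arithmetic, scaled to a month of traffic (the 100,000-request volume is a hypothetical):

python
# System-prompt cost over a month, with and without caching.
PROMPT_TOKENS = 5_000          # static system prompt, cached after the first request
REQUESTS_PER_MONTH = 100_000   # hypothetical volume

uncached = PROMPT_TOKENS * REQUESTS_PER_MONTH * 2.50 / 1_000_000   # $2.50/MTok
cached   = PROMPT_TOKENS * REQUESTS_PER_MONTH * 0.25 / 1_000_000   # $0.25/MTok

print(f"Without caching: ${uncached:,.2f}/month")  # $1,250.00
print(f"With caching:    ${cached:,.2f}/month")    # $125.00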

When Caching Kicks In#

Caching is most effective when you have:

  • Consistent system prompts: The same instructions sent with every request (the most common scenario).
  • Few-shot examples: Static examples that precede the user's actual query.
  • Document context: When multiple queries reference the same uploaded document or context block.
  • Multi-turn conversations: Earlier turns in a conversation are automatically cached for subsequent turns.

Caching in the Long-Context Tier#

Caching applies to the long-context tier as well:

  • Standard long-context input: $5.00/MTok
  • Cached long-context input: $0.50/MTok

Same 90% discount. If you're working with large documents and making multiple queries against them, caching can dramatically reduce your costs even in the long-context tier.

OpenAI Caching vs. Anthropic Caching#

| Feature | OpenAI (GPT-5.4) | Anthropic (Claude) |
| --- | --- | --- |
| Activation | Automatic | Manual (cache_control) |
| Code changes needed | None | Yes |
| TTL management | Automatic | Developer-managed |
| Cache write cost | None | Additional charge |
| Discount on cached tokens | 90% off | 90% off |

OpenAI's approach is simpler — you get the savings without any implementation overhead. Anthropic's approach gives you more control but requires explicit cache management in your code.
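
For contrast, here is roughly what the manual approach looks like with Anthropic's Python SDK; the model name and prompt text are placeholders, and the exact fields may vary by SDK version:

python
import anthropic

client = anthropic.Anthropic()

# With Claude, the long-lived prefix must be marked explicitly with
# cache_control; OpenAI needs none of this.
response = client.messages.create(
    model="claude-sonnet-4-6",  # illustrative model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant. <long static instructions here>",
            "cache_control": {"type": "ephemeral"},  # marks this block as cacheable
        }
    ],
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
)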

Batch API: 50% Off for Async Workloads#

If your workload doesn't need real-time responses, the Batch API is the single biggest cost lever available to you.

How the Batch API Works#

  1. Submit a batch: Upload a JSONL file containing multiple requests.
  2. Async processing: OpenAI processes your batch within a 24-hour window.
  3. Retrieve results: Download the completed results when ready.
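
For step 1, each line of the JSONL file is one self-contained request. A sketch of how that file might be built (the documents and custom IDs are placeholders); the submission example later in this section uploads this same file:

python
import json

# Each line of batch_requests.jsonl is one independent chat completion request.
# A shared system prompt at the start of every "messages" array still benefits
# from automatic prefix caching.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5.4",
            "messages": [
                {"role": "system", "content": "Summarize the following document."},
                {"role": "user", "content": doc},
            ],
        },
    }
    for i, doc in enumerate(["First document text...", "Second document text..."])
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")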

Batch API Pricing#

The Batch API gives you a flat 50% discount on all token prices:

| Component | Standard Price | Batch Price |
| --- | --- | --- |
| Short input | $2.50/MTok | $1.25/MTok |
| Short cached input | $0.25/MTok | $0.125/MTok |
| Short output | $15.00/MTok | $7.50/MTok |
| Long input | $5.00/MTok | $2.50/MTok |
| Long cached input | $0.50/MTok | $0.25/MTok |
| Long output | $22.50/MTok | $11.25/MTok |

When to Use the Batch API#

The Batch API is ideal for:

  • Content generation at scale: Generating product descriptions, blog posts, or marketing copy in bulk.
  • Data processing pipelines: Extracting structured data from documents, classifying text, or summarizing large datasets.
  • Evaluation and testing: Running model evaluations across hundreds or thousands of test cases.
  • Nightly jobs: Any processing that can wait until the next business day.

Batch API Example#

python
from openai import OpenAI

client = OpenAI()

# Create a batch input file
batch_input = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch"
)

# Submit the batch
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# Check status later
status = client.batches.retrieve(batch.id)
print(f"Status: {status.status}")

Combining the Batch API with caching can yield extraordinary savings. If your batch requests share common prefixes (like a system prompt), you get the 50% batch discount on top of the 90% caching discount on the cached portion.
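
To see how the two discounts compound, here is a quick back-of-the-envelope calculation; the 80/20 cached-to-fresh split is an assumption for illustration:

python
# Effective blended input price for a batch request whose prompt is mostly
# a cached shared prefix (hypothetical 80/20 split), short-context tier.
CACHED_SHARE = 0.80     # portion of input tokens served from cache
BATCH_CACHED = 0.125    # $/MTok, cached input under the Batch API
BATCH_INPUT  = 1.25     # $/MTok, fresh input under the Batch API

blended = CACHED_SHARE * BATCH_CACHED + (1 - CACHED_SHARE) * BATCH_INPUT
print(f"Blended input price: ${blended:.3f}/MTok")  # $0.350/MTok vs. $2.50 standard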

Data Residency: 10% Uplift#

For organizations with data sovereignty requirements, OpenAI offers data residency options that guarantee your data is processed and stored within specific geographic regions.

Cost: A 10% uplift on all standard prices.

| Component | Standard | Data Residency |
| --- | --- | --- |
| Short input | $2.50/MTok | $2.75/MTok |
| Short output | $15.00/MTok | $16.50/MTok |
| Long input | $5.00/MTok | $5.50/MTok |
| Long output | $22.50/MTok | $24.75/MTok |

Data residency is typically required for:

  • Healthcare applications handling PHI under HIPAA
  • Financial services with regulatory data requirements
  • Government and public sector applications
  • EU-based companies needing GDPR-compliant processing

For most developers and startups, standard processing is sufficient. Only opt into data residency if your compliance requirements demand it.

Crazyrouter Pricing: 45% Savings on Every Call#

Here's where it gets really good. Crazyrouter offers GPT-5.4 at 55% of OpenAI's official pricing — that's a 45% discount on every single API call.

Crazyrouter GPT-5.4 Prices#

| Component | OpenAI Official | Crazyrouter (55%) | You Save |
| --- | --- | --- | --- |
| Short input | $2.50/MTok | $1.375/MTok | $1.125/MTok |
| Short cached input | $0.25/MTok | $0.1375/MTok | $0.1125/MTok |
| Short output | $15.00/MTok | $8.25/MTok | $6.75/MTok |
| Long input | $5.00/MTok | $2.75/MTok | $2.25/MTok |
| Long cached input | $0.50/MTok | $0.275/MTok | $0.225/MTok |
| Long output | $22.50/MTok | $12.375/MTok | $10.125/MTok |

How to Use GPT-5.4 via Crazyrouter#

Switching to Crazyrouter takes about 30 seconds. You just change the base URL — everything else stays the same.

Python (OpenAI SDK)#

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Node.js (OpenAI SDK)#

javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-crazyrouter-api-key",
  baseURL: "https://crazyrouter.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms." },
  ],
});

console.log(response.choices[0].message.content);

cURL#

bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gpt-5.4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

Why Crazyrouter Can Offer Lower Prices#

Crazyrouter is an OpenAI-compatible API gateway that aggregates demand across thousands of developers. By routing traffic efficiently and negotiating volume pricing, Crazyrouter passes the savings directly to you. You get the same GPT-5.4 model, the same API compatibility, and the same response quality — just at a lower price.

Key benefits:

  • Full OpenAI API compatibility: Drop-in replacement. No code changes beyond the base URL.
  • Same model, same quality: Requests are routed to OpenAI's infrastructure. You get the real GPT-5.4.
  • Automatic caching still works: OpenAI's server-side caching applies regardless of how you access the API.
  • No commitment required: Pay as you go, no minimum spend.

Real-World Cost Comparison: 3 Scenarios#

Let's put these numbers into context with three realistic usage scenarios.

Scenario 1: SaaS Chatbot (Customer Support)#

A customer support chatbot handling 50,000 conversations per month.

Assumptions:

  • System prompt: 2,000 tokens (cached after first request)
  • Average user message: 200 tokens
  • Average response: 500 tokens
  • 3 turns per conversation on average

Monthly token usage:

  • Input tokens: 50,000 × 3 × 200 = 30M tokens (user messages)
  • Cached input: 50,000 × 3 × 2,000 = 300M tokens (system prompt, cached)
  • Output tokens: 50,000 × 3 × 500 = 75M tokens

| Provider | Input Cost | Cached Cost | Output Cost | Total/Month |
| --- | --- | --- | --- | --- |
| OpenAI Direct | $75.00 | $75.00 | $1,125.00 | $1,275.00 |
| Crazyrouter | $41.25 | $41.25 | $618.75 | $701.25 |

Savings with Crazyrouter: $573.75/month ($6,885/year)
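
If you want to sanity-check these figures or model your own traffic, the arithmetic is straightforward; the helper below reproduces the Scenario 1 numbers:

python
# Reproducing the Scenario 1 numbers from the assumptions above.
CONVERSATIONS, TURNS = 50_000, 3
SYSTEM_TOKENS, USER_TOKENS, OUTPUT_TOKENS = 2_000, 200, 500

input_mtok  = CONVERSATIONS * TURNS * USER_TOKENS   / 1e6   # 30 MTok of user messages
cached_mtok = CONVERSATIONS * TURNS * SYSTEM_TOKENS / 1e6   # 300 MTok of cached system prompt
output_mtok = CONVERSATIONS * TURNS * OUTPUT_TOKENS / 1e6   # 75 MTok of responses

def monthly_cost(multiplier: float) -> float:
    # multiplier = 1.0 for OpenAI list prices, 0.55 for Crazyrouter
    return multiplier * (input_mtok * 2.50 + cached_mtok * 0.25 + output_mtok * 15.00)

print(f"OpenAI direct: ${monthly_cost(1.0):,.2f}")   # $1,275.00
print(f"Crazyrouter:   ${monthly_cost(0.55):,.2f}")  # $701.25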

Scenario 2: Code Review Pipeline (Batch API)#

An engineering team running nightly code reviews on 500 pull requests.

Assumptions:

  • Average PR context: 8,000 tokens
  • System prompt: 3,000 tokens (cached)
  • Average review output: 1,500 tokens
  • Using Batch API (50% off)

Monthly token usage (22 working days):

  • Input tokens: 500 × 22 × 8,000 = 88M tokens
  • Cached input: 500 × 22 × 3,000 = 33M tokens
  • Output tokens: 500 × 22 × 1,500 = 16.5M tokens

| Provider | Input Cost | Cached Cost | Output Cost | Total/Month |
| --- | --- | --- | --- | --- |
| OpenAI Batch | $110.00 | $4.13 | $123.75 | $237.88 |
| Crazyrouter + Batch | $60.50 | $2.27 | $68.06 | $130.83 |

Savings with Crazyrouter: $107.05/month ($1,284.60/year)

Scenario 3: Document Analysis (Long Context)#

A legal tech company analyzing 200 contracts per month, each requiring the long-context tier.

Assumptions:

  • Average document: 300K tokens (long-context tier)
  • System prompt: 5,000 tokens (cached)
  • Average analysis output: 3,000 tokens
  • Multiple queries per document: 5 queries each

Monthly token usage:

  • Input tokens: 200 × 300,000 = 60M tokens (first query per doc)
  • Cached input: 200 × 4 × 300,000 = 240M tokens (subsequent queries)
  • Cached system prompt: 200 × 5 × 5,000 = 5M tokens
  • Output tokens: 200 × 5 × 3,000 = 3M tokens

| Provider | Input Cost | Cached Cost | Output Cost | Total/Month |
| --- | --- | --- | --- | --- |
| OpenAI Direct | $300.00 | $122.50 | $67.50 | $490.00 |
| Crazyrouter | $165.00 | $67.38 | $37.13 | $269.50 |

Savings with Crazyrouter: $220.50/month ($2,646/year)

GPT-5.4 vs. Competitors: Pricing Comparison#

How does GPT-5.4 stack up against other frontier models on price?

GPT-5.4 vs. Claude Sonnet 4.6#

| | GPT-5.4 | Claude Sonnet 4.6 |
| --- | --- | --- |
| Input | $2.50/MTok | $3.00/MTok |
| Output | $15.00/MTok | $15.00/MTok |
| Cached input | $0.25/MTok | $0.30/MTok |
| Caching method | Automatic | Manual (cache_control) |
| Max context | 270K+ (tiered) | 200K |
| Batch API | 50% off | 50% off |

Verdict: GPT-5.4 is slightly cheaper on input tokens and offers automatic caching, which is simpler to implement. Claude Sonnet 4.6 gives you more granular cache control but requires code changes. Output pricing is identical. For pure cost optimization, GPT-5.4 has a slight edge — especially if you value the zero-effort caching.

GPT-5.4 vs. Gemini 3.1 Pro#

| | GPT-5.4 | Gemini 3.1 Pro |
| --- | --- | --- |
| Input | $2.50/MTok | $1.25/MTok |
| Output | $15.00/MTok | $10.00/MTok |
| Cached input | $0.25/MTok | $0.3125/MTok |
| Max context | 270K+ (tiered) | 1M+ |
| Batch API | 50% off | N/A |

Verdict: Gemini 3.1 Pro is cheaper on base pricing and offers a massive context window. However, GPT-5.4 generally outperforms on complex reasoning tasks, coding benchmarks, and instruction following. The Batch API discount also gives GPT-5.4 an edge for async workloads. Choose based on your quality requirements — if Gemini 3.1 Pro meets your quality bar, it's the more economical choice. If you need peak performance, GPT-5.4 justifies the premium.

The Crazyrouter Advantage Across Models#

Worth noting: Crazyrouter offers discounted pricing across all major models, not just GPT-5.4. If you're using multiple models in your stack, routing everything through Crazyrouter simplifies billing and maximizes savings across the board.

Key Takeaways#

  1. Two context tiers matter: Keep requests under 270K tokens when possible. The long-context tier costs 1.5–2x more.

  2. Caching is free money: OpenAI's automatic caching gives you a 90% discount on repeated prompt prefixes with zero code changes. Design your prompts with consistent prefixes to maximize cache hits.

  3. Batch API for async work: If you don't need real-time responses, the Batch API cuts all prices in half. Combine with caching for maximum savings.

  4. Data residency only if required: The 10% uplift is worth it for compliance, but skip it if you don't need it.

  5. Crazyrouter saves 45%: Same model, same API, same quality — just change the base URL and save on every call.

  6. Stack your discounts: Caching + Batch API + Crazyrouter can reduce your effective cost by 70–80% compared to standard OpenAI pricing.
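
As a rough illustration of how the discounts in point 6 multiply (assuming they stack as the tables above suggest):

python
# Stacked discounts on input tokens, relative to the standard $2.50/MTok list price.
LIST_PRICE = 2.50
batch  = 0.50   # Batch API: pay 50% of list
router = 0.55   # Crazyrouter: pay 55% of list
cache  = 0.10   # cached tokens: pay 10% of the applicable input price

uncached_batch_routed = LIST_PRICE * batch * router          # $0.6875/MTok (~72.5% off)
cached_batch_routed   = LIST_PRICE * cache * batch * router  # $0.06875/MTok (~97% off)

print(f"Batch + Crazyrouter:          ${uncached_batch_routed:.4f}/MTok")
print(f"Cached + Batch + Crazyrouter: ${cached_batch_routed:.5f}/MTok")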

Start Saving on GPT-5.4 Today#

Ready to cut your GPT-5.4 costs by 45%? Getting started with Crazyrouter takes less than a minute:

  1. Sign up at crazyrouter.com and get your API key.
  2. Change your base URL to https://crazyrouter.com/v1.
  3. That's it. Your existing code works as-is. No SDK changes, no migration headaches.

Crazyrouter supports GPT-5.4 and 200+ other models from OpenAI, Anthropic, Google, and more — all through a single, unified API. Pay-as-you-go pricing, no minimums, no commitments.

👉 Get your Crazyrouter API key →


Last updated: April 27, 2026

Disclaimer: Pricing information is based on publicly available data from OpenAI as of the publication date. Prices are subject to change. Crazyrouter pricing is based on current rates and may be adjusted. Always verify current pricing on the respective provider's website before making purchasing decisions. This article is for informational purposes only and does not constitute financial advice.
