GPT-4o Pricing Explained — The Legacy Flagship That's Still Worth Using

Crazyrouter Team
April 27, 2026

GPT-4o launched in May 2024 as OpenAI's flagship multimodal model — fast, capable, and significantly cheaper than GPT-4 Turbo. It dominated the API landscape for the better part of a year. Now, with the GPT-5 family taking center stage, GPT-4o has settled into a different role: the reliable, battle-tested workhorse that millions of developers still depend on every day.

And honestly? For a lot of use cases, it's still the smart choice.

This guide breaks down everything you need to know about GPT-4o API pricing in 2026 — base rates, caching discounts, Batch API savings, and how Crazyrouter can cut your costs even further.

Last updated: April 27, 2026

Base Pricing#

GPT-4o uses a straightforward per-token pricing model. Here's what you're looking at on the standard OpenAI API:

| Component | Price |
| --- | --- |
| Input tokens | $2.50 / 1M tokens |
| Cached input tokens | $1.25 / 1M tokens |
| Output tokens | $10.00 / 1M tokens |

For context, when GPT-4o first launched, these prices represented a massive drop from GPT-4 Turbo ($10 input / $30 output per MTok). Even now, they remain competitive — especially when you factor in caching and batch discounts.

What does this look like in practice?#

A typical API call with ~1,000 input tokens and ~500 output tokens costs roughly:

  • Input cost: 1,000 tokens × ($2.50 / 1,000,000) = $0.0025
  • Output cost: 500 tokens × ($10.00 / 1,000,000) = $0.005
  • Total per call: ~$0.0075

That's less than a cent per request for most conversational interactions. For a chatbot handling 10,000 conversations per day with similar token counts, you're looking at about $75/day, or ~$2,250/month at list price.
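If you want to estimate this for your own traffic, the arithmetic is simple enough to script. A minimal sketch (the rates are OpenAI's list prices from the table above; the function name is illustrative):

```python
INPUT_RATE = 2.50 / 1_000_000    # USD per input token, GPT-4o list price
OUTPUT_RATE = 10.00 / 1_000_000  # USD per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one GPT-4o API call at standard list price."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# The example from the text: ~1,000 input + ~500 output tokens
print(f"${call_cost(1_000, 500):.4f}")   # $0.0075

# Scale up to 10,000 such calls per day, ~30 days per month
daily = call_cost(1_000, 500) * 10_000   # ≈ $75/day
monthly = daily * 30                     # ≈ $2,250/month
```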

128K Context Window#

GPT-4o supports a 128K token context window — that's roughly 96,000 words or about 300 pages of text in a single prompt. This is the same context length as GPT-4 Turbo, and it remains one of the largest context windows available at this price point.

What can you fit in 128K tokens?

  • An entire novel (most novels are 60K–100K words)
  • A full codebase for a medium-sized project
  • Hundreds of pages of documentation
  • Long conversation histories without truncation

The key advantage: GPT-4o charges the same per-token rate regardless of how much of that 128K window you use. There's no "long context surcharge" like some newer models implement. Whether you send 1K tokens or 120K tokens, the input rate stays at $2.50/MTok.

This makes GPT-4o particularly cost-effective for tasks that require large context — document analysis, code review, long-form summarization — where newer models with tiered pricing might actually cost more.
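For a quick feasibility check before sending a large document, the common rough heuristic of ~4 characters per token for English text is often good enough (an accurate count requires a real tokenizer; the 4:1 ratio, the function name, and the reserved-output default here are illustrative assumptions):

```python
CONTEXT_WINDOW = 128_000   # GPT-4o's context length in tokens
CHARS_PER_TOKEN = 4        # rough heuristic for English prose

def fits_in_context(text: str, reserved_for_output: int = 16_384) -> bool:
    """Rough check: does this text fit in GPT-4o's window, leaving
    room for the model's response (default: the 16,384-token max output)?"""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

# ~400K characters ≈ 100K tokens: fits alongside a full-length reply
print(fits_in_context("a" * 400_000))   # True
# ~500K characters ≈ 125K tokens: no room left for output
print(fits_in_context("a" * 500_000))   # False
```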

Automatic Caching (Prompt Caching)#

One of the most impactful cost-saving features for GPT-4o is OpenAI's automatic prompt caching. Introduced in late 2024, this feature requires zero code changes — it just works.

How it works#

When you send a request to the API, OpenAI automatically caches the prefix of your prompt. If a subsequent request shares the same prefix (at least 1,024 tokens), the cached portion is served at a 50% discount:

  • Regular input: $2.50 / MTok
  • Cached input: $1.25 / MTok

Caching is automatic. You don't need to set any flags, manage cache keys, or change your API calls. OpenAI handles it server-side.

When caching kicks in#

Caching is most effective when you have:

  • System prompts that stay consistent across requests
  • Few-shot examples that you include in every call
  • Document context that multiple queries reference
  • Conversation history where earlier messages remain unchanged

Real savings example#

Imagine you're building a customer support bot with a 2,000-token system prompt and 3,000 tokens of product documentation included in every request. For 10,000 daily requests:

Without caching:

  • 5,000 tokens × 10,000 requests = 50M input tokens
  • Cost: 50 MTok × $2.50 = $125/day

With caching (system prompt + docs cached):

  • 5,000 cached tokens × 10,000 requests = 50M cached tokens
  • Cost: 50 MTok × $1.25 = $62.50/day
  • Savings: $62.50/day (~$1,875/month)

The cache has a lifetime of 5–10 minutes of inactivity, so it works best for applications with steady traffic. For bursty workloads, you'll see partial caching benefits.

Batch API — 50% Off#

For workloads that don't need real-time responses, OpenAI's Batch API is a game-changer. It offers a flat 50% discount on all token costs:

| Component | Standard | Batch API |
| --- | --- | --- |
| Input tokens | $2.50 / MTok | $1.25 / MTok |
| Output tokens | $10.00 / MTok | $5.00 / MTok |

How Batch API works#

Instead of sending individual requests, you upload a JSONL file containing multiple requests. OpenAI processes them asynchronously and returns results within 24 hours (usually much faster).

```python
from openai import OpenAI

client = OpenAI()

# 1. Create a batch input file
batch_input = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch"
)

# 2. Submit the batch
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# 3. Check status and retrieve results
batch_status = client.batches.retrieve(batch.id)
print(batch_status.status)  # "completed"
```

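The batch_requests.jsonl file referenced above contains one JSON object per line in the Batch API's request format: a unique custom_id, a method, a url, and the request body. A minimal sketch for generating it (the two prompts are placeholders):

```python
import json

# Each line is one independent request; custom_id lets you match
# results back to inputs when the batch completes.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize document A", "Summarize document B"])
]

with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```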
Best use cases for Batch API#

  • Content generation — blog posts, product descriptions, translations
  • Data processing — classification, extraction, summarization of large datasets
  • Evaluation pipelines — running test suites against your prompts
  • Embedding generation — processing large document collections
  • Offline analytics — sentiment analysis, categorization

Combining Batch + Caching#

Here's where it gets interesting. Batch API and prompt caching can stack. If your batch requests share common prefixes, you get:

  • 50% off from Batch API
  • Additional 50% off cached input tokens

That means cached input in a batch costs just $0.625/MTok — a 75% reduction from the standard $2.50 rate.
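The stacking arithmetic is just two successive 50% discounts, easy to verify:

```python
BASE_INPUT = 2.50                        # USD per MTok, standard GPT-4o input rate

batch_input = BASE_INPUT * 0.5           # Batch API: 50% off -> $1.25/MTok
batch_cached_input = batch_input * 0.5   # cached prefix: another 50% off -> $0.625/MTok

reduction = 1 - batch_cached_input / BASE_INPUT
print(batch_cached_input, reduction)     # 0.625 0.75
```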

Crazyrouter Pricing — 45% Off Official Rates#

If you're already optimizing with caching and batching, there's one more lever to pull: routing your API calls through Crazyrouter.

Crazyrouter offers GPT-4o at 55% of OpenAI's official pricing — that's a 45% discount on every token:

| Component | OpenAI Official | Crazyrouter | Savings |
| --- | --- | --- | --- |
| Input tokens | $2.50 / MTok | $1.375 / MTok | 45% off |
| Cached input | $1.25 / MTok | $0.6875 / MTok | 45% off |
| Output tokens | $10.00 / MTok | $5.50 / MTok | 45% off |

How to use Crazyrouter#

Switching is dead simple. You just change the base_url — your existing code works as-is.

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)
```

curl:

```bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-crazyrouter-api-key",
  baseURL: "https://crazyrouter.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms." },
  ],
});

console.log(response.choices[0].message.content);
```

That's it. Same SDK, same parameters, same response format. Just a different URL and API key.

Why Crazyrouter?#

  • Full OpenAI API compatibility — drop-in replacement, no code changes beyond the base URL
  • All models available — GPT-4o, GPT-5.4, o3, o4-mini, and more
  • Pay-as-you-go — no minimums, no commitments
  • Transparent pricing — what you see is what you pay

Real-World Cost Comparison#

Let's put all the savings together with a realistic scenario. Imagine a SaaS product using GPT-4o for customer support, processing 50,000 requests per day with an average of 2,000 input tokens (1,500 cached) and 800 output tokens per request.

Monthly token volumes:

  • Total input: 50,000 × 2,000 × 30 = 3B tokens (3,000 MTok)
  • Cached input: 50,000 × 1,500 × 30 = 2.25B tokens (2,250 MTok)
  • Non-cached input: 750 MTok
  • Output: 50,000 × 800 × 30 = 1.2B tokens (1,200 MTok)

| Scenario | Input Cost | Output Cost | Monthly Total |
| --- | --- | --- | --- |
| OpenAI standard (no caching) | 3,000 × $2.50 = $7,500 | 1,200 × $10.00 = $12,000 | $19,500 |
| OpenAI with caching | (750 × $2.50) + (2,250 × $1.25) = $4,687.50 | 1,200 × $10.00 = $12,000 | $16,687.50 |
| Crazyrouter with caching | (750 × $1.375) + (2,250 × $0.6875) = $2,578.13 | 1,200 × $5.50 = $6,600 | $9,178.13 |
| Crazyrouter + Batch API | (750 × $0.6875) + (2,250 × $0.34375) = $1,289.06 | 1,200 × $2.75 = $3,300 | $4,589.06 |

From $19,500 down to $4,589 — that's a 76% reduction by stacking caching, Batch API, and Crazyrouter together.

Even without Batch API (which requires async processing), Crazyrouter with caching saves you 53% compared to standard OpenAI pricing.
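The table's totals can be reproduced with a few lines of arithmetic (rates in USD per MTok, volumes in MTok, all taken from the scenario above; the function name is illustrative):

```python
# Monthly volumes from the scenario: 50K requests/day × 30 days
CACHED = 2_250     # MTok of cached input (1,500 tokens/request)
UNCACHED = 750     # MTok of non-cached input (500 tokens/request)
OUTPUT = 1_200     # MTok of output (800 tokens/request)

def monthly_cost(input_rate: float, cached_rate: float, output_rate: float) -> float:
    """Total monthly cost given per-MTok rates for each token class."""
    return UNCACHED * input_rate + CACHED * cached_rate + OUTPUT * output_rate

print(monthly_cost(2.50, 2.50, 10.00))        # 19500.0   OpenAI, no caching
print(monthly_cost(2.50, 1.25, 10.00))        # 16687.5   OpenAI with caching
print(monthly_cost(1.375, 0.6875, 5.50))      # 9178.125  Crazyrouter with caching
print(monthly_cost(0.6875, 0.34375, 2.75))    # 4589.0625 Crazyrouter + Batch API
```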

Should You Upgrade to GPT-5.4?#

GPT-5.4 is OpenAI's current flagship, and it's undeniably more capable than GPT-4o. But "more capable" doesn't always mean "better value." Here's how they compare:

| Feature | GPT-4o | GPT-5.4 |
| --- | --- | --- |
| Input price | $2.50 / MTok | $2.50 / MTok |
| Output price | $10.00 / MTok | $10.00 / MTok |
| Context window | 128K | 1M |
| Max output | 16,384 tokens | 64K tokens |
| Reasoning | Good | Excellent |
| Coding | Strong | Stronger |
| Multimodal | Text + Vision | Text + Vision + Audio |
| Speed | Fast | Comparable |
| Reliability | Battle-tested | Newer, still stabilizing |
| Crazyrouter price (input) | $1.375 / MTok | $1.375 / MTok |
| Crazyrouter price (output) | $5.50 / MTok | $5.50 / MTok |

When to stick with GPT-4o#

  • Your prompts work well already. If GPT-4o handles your use case reliably, switching introduces risk with minimal upside.
  • You need predictability. GPT-4o has been in production for nearly two years. Its behavior is well-understood and stable.
  • 128K context is enough. Most applications don't need a 1M context window.
  • You're cost-sensitive on output. At the same price point, GPT-4o's shorter, more concise outputs can actually save money if you don't need GPT-5.4's verbosity.

When to upgrade to GPT-5.4#

  • Complex reasoning tasks where GPT-4o falls short
  • Very long documents that exceed 128K tokens
  • Audio processing requirements
  • Coding tasks where the quality difference matters
  • You need the latest capabilities and can absorb the migration effort

The honest take: if GPT-4o is working for you, there's no rush to migrate. It's not going anywhere, and the price-to-performance ratio is still excellent.

Key Takeaways#

  1. GPT-4o remains a strong value proposition at $2.50 / $10 per MTok — especially for applications that don't need cutting-edge reasoning.

  2. Automatic caching is free money. Design your prompts with consistent prefixes and you'll save 50% on cached input tokens with zero code changes.

  3. Batch API halves everything for async workloads. If you can tolerate up to 24-hour turnaround, there's no reason not to use it.

  4. Crazyrouter stacks on top with 45% savings across the board. Combined with caching and batching, you can reduce costs by up to 76%.

  5. Don't upgrade just because something newer exists. GPT-4o is battle-tested, fast, and reliable. Upgrade when your use case demands it, not because of FOMO.

Get Started with Crazyrouter#

Ready to cut your GPT-4o costs by 45%? Getting started takes about 30 seconds:

  1. Sign up at crazyrouter.com
  2. Get your API key from the dashboard
  3. Change your base URL to https://crazyrouter.com/v1
  4. That's it. Your existing code works immediately.

No contracts. No minimums. Pay only for what you use.

Get Your API Key →


Pricing information is accurate as of April 27, 2026. OpenAI may adjust pricing at any time. Crazyrouter pricing is subject to change — check crazyrouter.com for the latest rates. This article is for informational purposes only and does not constitute financial advice. Always verify current pricing on the official provider websites before making purchasing decisions.
