Crazyrouter Team
April 27, 2026

GPT-5.2 Pricing Explained — Caching, Batch API, and How to Save with Crazyrouter#

OpenAI's GPT-5 family has reshaped the landscape of large language model APIs, and GPT-5.2 sits right in the sweet spot. Positioned between the lightweight GPT-5-mini and the flagship GPT-5.4, GPT-5.2 delivers strong reasoning, excellent instruction-following, and broad multimodal capabilities — all at a price point that makes it genuinely practical for production workloads.

If you've been running GPT-4o or GPT-4.1 in production and wondering whether to upgrade, GPT-5.2 is likely where you'll land. It offers a significant intelligence boost over the GPT-4 generation without the premium price tag of GPT-5.4. And with built-in automatic caching, Batch API discounts, and third-party routing options like Crazyrouter, the effective cost can drop dramatically.

In this guide, we'll break down every aspect of GPT-5.2 pricing — base rates, caching mechanics, batch discounts, and how to stack savings. Whether you're building a chatbot, processing documents at scale, or running agentic workflows, you'll walk away knowing exactly what GPT-5.2 will cost you and how to minimize that number.

Last updated: April 27, 2026.

GPT-5.2 Base Pricing#

Let's start with the official numbers straight from OpenAI's pricing page:

| Component | Price per Million Tokens |
| --- | --- |
| Input tokens | $1.50 / MTok |
| Cached input tokens | $0.15 / MTok |
| Output tokens | $10.00 / MTok |

A few things jump out immediately.

First, the input-to-output price ratio is roughly 1:7. This is a common pattern across the GPT-5 family — output tokens are significantly more expensive than input tokens because they require sequential generation (each token depends on the previous one), while input tokens can be processed in parallel. This ratio matters a lot for your cost optimization strategy: if your application is output-heavy (long-form generation, code writing, detailed analysis), output tokens will dominate your bill.

Second, cached input tokens cost just 10% of the standard input price. That's a 90% discount on repeated context, and it happens automatically. We'll dig into this in the next section.

Third, compared to the previous generation, GPT-5.2 offers substantially more capability per dollar. GPT-4o charged $2.50 / MTok for input and $10.00 / MTok for output, while GPT-5.2 comes in at $1.50 and $10.00, a 40% reduction on input costs with meaningfully better performance across benchmarks.

Context Window and Rate Limits#

GPT-5.2 supports a context window of up to 128K tokens with standard API access. OpenAI's rate limits vary by usage tier, but most production accounts (Tier 3+) get generous throughput. The model supports text, image, and audio inputs, though image and audio tokens are priced according to their respective token conversion rates.

For most text-based applications, you can estimate token counts at roughly 1 token per 4 characters in English, or about 750 words per 1,000 tokens.
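That rule of thumb translates into a trivial estimator. This is a heuristic sketch only; for exact counts you would use a real tokenizer (such as OpenAI's tiktoken library), and the helper names here are illustrative:

```python
# Rough token estimator for English text: ~4 characters per token.
# Heuristic only; use a real tokenizer (e.g. tiktoken) for exact counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def estimate_input_cost(text: str, price_per_mtok: float = 1.50) -> float:
    """Approximate input cost in dollars at the standard GPT-5.2 rate."""
    return estimate_tokens(text) / 1_000_000 * price_per_mtok

# A 4,000-character document is ~1,000 tokens, i.e. about $0.0015 of input
print(estimate_tokens("a" * 4000), f"${estimate_input_cost('a' * 4000):.4f}")
```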

Automatic Caching — The Hidden Cost Saver#

One of the most impactful pricing features in the GPT-5 family is automatic prompt caching, and GPT-5.2 benefits from it fully.

How It Works#

OpenAI automatically caches the prefix of your prompt. When you send a request, the system checks whether the beginning of your input matches a recently cached prompt prefix. If it does, those matching tokens are billed at the cached rate ($0.15 / MTok) instead of the standard input rate ($1.50 / MTok).

Key details:

  • Caching is automatic. You don't need to enable it, configure it, or change your API calls. It just works.
  • Minimum prefix length is 1,024 tokens. Shorter prompts won't benefit from caching.
  • Cache matches are prefix-based. The system matches from the beginning of your prompt forward. If your system prompt is 2,000 tokens and it matches a cached version, those 2,000 tokens get the cached rate even if the user message that follows is different.
  • Cache lifetime is typically 5–10 minutes for active prompts, though heavily-used prefixes may persist longer.
  • Cache hits are reported in the API response via the usage object, so you can track exactly how much you're saving.
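A small helper makes that usage data actionable. This sketch assumes the usage object shape from recent OpenAI SDK versions, where cached prefix tokens appear under `usage.prompt_tokens_details.cached_tokens`; the `SimpleNamespace` stub stands in for a real response:

```python
from types import SimpleNamespace

def cache_hit_rate(usage) -> float:
    """Fraction of input tokens billed at the cached rate.

    Assumes the usage shape from recent OpenAI SDKs, where cached prefix
    tokens are reported under usage.prompt_tokens_details.cached_tokens;
    getattr guards against the field being absent.
    """
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) if details else 0
    total = getattr(usage, "prompt_tokens", 0)
    return cached / total if total else 0.0

# Stubbed usage object; a real one comes from
# client.chat.completions.create(...).usage
usage = SimpleNamespace(
    prompt_tokens=3200,
    prompt_tokens_details=SimpleNamespace(cached_tokens=2500),
)
print(f"cache hit rate: {cache_hit_rate(usage):.0%}")  # → 78%
```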

Why This Matters#

For most production applications, a significant portion of every request is identical: system prompts, few-shot examples, tool definitions, conversation history prefixes. With automatic caching, you're only paying full price for this shared context once per cache window.

Consider a typical chatbot with a 3,000-token system prompt. Without caching, every request pays $1.50 / MTok for those 3,000 tokens. With caching (after the first request), those tokens drop to $0.15 / MTok, saving you about $4.05 per thousand requests on the system prompt alone.

To maximize cache hits:

  1. Put static content first. System prompts, tool definitions, and few-shot examples should come before dynamic content (user messages, variable context).
  2. Keep your prompt prefix stable. Don't randomize or reorder the beginning of your prompts between requests.
  3. Batch similar requests together in time. Cache entries persist for minutes, so bursts of similar requests benefit more than spread-out ones.
  4. Use longer system prompts confidently. The caching discount means that detailed system prompts with comprehensive instructions are much cheaper than you'd expect: the marginal cost of adding 1,000 tokens to a cached system prompt is just $0.15 per thousand requests.
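The first two rules can be sketched in code: keep a byte-identical static prefix and only append dynamic content at the end. The message contents here are placeholders:

```python
# Sketch: keep the static prefix (system prompt, few-shot examples)
# identical across requests so it lands in the prompt cache; append
# dynamic content last. Contents below are placeholders.
STATIC_PREFIX = [
    {"role": "system", "content": "You are a support assistant. ..."},
    {"role": "user", "content": "Example question ..."},
    {"role": "assistant", "content": "Example answer ..."},
]

def build_messages(user_message: str) -> list[dict]:
    # Cache matching is prefix-based, so never reorder STATIC_PREFIX or
    # interpolate per-request values (timestamps, user IDs) into it.
    return STATIC_PREFIX + [{"role": "user", "content": user_message}]
```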

Estimating Your Caching Savings#

Here's a quick formula:

```
Effective input cost = (uncached_tokens × $1.50 + cached_tokens × $0.15) / total_input_tokens
```

If 80% of your input tokens hit the cache (common for chatbots with stable system prompts), your effective input rate drops to:

```
(0.2 × $1.50) + (0.8 × $0.15) = $0.30 + $0.12 = $0.42 / MTok
```

That's a 72% reduction from the base input price. Combined with the already-competitive base rate, GPT-5.2 becomes remarkably affordable for high-volume applications.
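The blended-rate formula is a one-liner if you want to plug in your own cache hit rate (the defaults are GPT-5.2's list prices):

```python
def effective_input_rate(cache_hit_fraction: float,
                         base: float = 1.50,
                         cached: float = 0.15) -> float:
    """Blended input price ($/MTok) for a given cache hit fraction."""
    return (1 - cache_hit_fraction) * base + cache_hit_fraction * cached

# 80% cache hit rate -> $0.42/MTok, matching the worked example above
print(f"${effective_input_rate(0.8):.2f} / MTok")
```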

Batch API — 50% Off for Async Workloads#

OpenAI's Batch API offers a flat 50% discount on both input and output tokens for workloads that don't need real-time responses.

Batch API Pricing for GPT-5.2#

| Component | Standard Price | Batch API Price |
| --- | --- | --- |
| Input tokens | $1.50 / MTok | $0.75 / MTok |
| Cached input tokens | $0.15 / MTok | $0.075 / MTok |
| Output tokens | $10.00 / MTok | $5.00 / MTok |

Yes, automatic caching still applies within Batch API requests. If your batch contains many requests with shared prefixes, you get both the caching discount and the batch discount — they stack.

When to Use Batch API#

The Batch API is designed for workloads where you can tolerate a completion window of up to 24 hours (though most batches complete much faster). Ideal use cases include:

  • Document processing and classification — Categorizing thousands of support tickets, extracting data from contracts, summarizing research papers.
  • Content generation pipelines — Generating product descriptions, blog drafts, email templates at scale.
  • Evaluation and testing — Running model evaluations across large datasets, A/B testing prompt variations.
  • Data enrichment — Adding AI-generated metadata, tags, or summaries to existing databases.
  • Offline analytics — Sentiment analysis, entity extraction, or topic modeling on historical data.

How to Submit a Batch#

You prepare a JSONL file where each line is a standard chat completion request, upload it, and create a batch. OpenAI processes the requests and returns results when complete.

```python
from openai import OpenAI

client = OpenAI()

# Upload your JSONL file
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch"
)

# Create the batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

# Check status later
status = client.batches.retrieve(batch.id)
print(status.status)  # "completed", "in_progress", etc.
```

Each line in requests.jsonl looks like:

```json
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5.2", "messages": [{"role": "user", "content": "Summarize this article..."}], "max_tokens": 1000}}
```
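One way to produce that file is a short Python script; the `documents` dict and the summarization prompt here are illustrative placeholders for your own data:

```python
import json

# Sketch: build requests.jsonl for a Batch API job. The documents dict
# and the prompt are placeholders.
documents = {
    "doc-1": "First article text ...",
    "doc-2": "Second article text ...",
}

with open("requests.jsonl", "w") as f:
    for doc_id, text in documents.items():
        request = {
            "custom_id": doc_id,  # lets you match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5.2",
                "messages": [
                    {"role": "user", "content": f"Summarize: {text}"}
                ],
                "max_tokens": 1000,
            },
        }
        f.write(json.dumps(request) + "\n")
```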

For workloads that can tolerate async processing, the Batch API is essentially free money — same model, same quality, half the price.

Crazyrouter — 55% of Official Pricing#

If you want real-time API access (not batch) but still want significant savings, Crazyrouter offers GPT-5.2 at 55% of OpenAI's official pricing.

Crazyrouter Pricing for GPT-5.2#

| Component | OpenAI Official | Crazyrouter (55%) |
| --- | --- | --- |
| Input tokens | $1.50 / MTok | $0.825 / MTok |
| Output tokens | $10.00 / MTok | $5.50 / MTok |

That's a 45% discount on every token, with no change to the model, no quality degradation, and no batch delays. You get the same GPT-5.2 model with real-time streaming responses.

How It Works#

Crazyrouter is an OpenAI-compatible API proxy. You use the standard OpenAI SDK or any HTTP client — just change the base URL. Your existing code works with a one-line change.

Integration with OpenAI Python SDK#

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)
```

Integration with cURL#

```bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gpt-5.2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
```

Integration with Node.js / TypeScript#

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-crazyrouter-api-key",
  baseURL: "https://crazyrouter.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-5.2",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in simple terms." },
  ],
});

console.log(response.choices[0].message.content);
```

Crazyrouter supports streaming, function calling, JSON mode, vision inputs, and all other GPT-5.2 features. It's a drop-in replacement — the only difference is the price.

Real-World Cost Scenarios#

Let's put these numbers into context with three practical scenarios.

Scenario 1: Customer Support Chatbot#

Setup: A chatbot handling 50,000 conversations per day. Each conversation averages 5 turns. System prompt is 2,500 tokens (cached after first request). Average user message is 150 tokens, average response is 300 tokens.

Token math per conversation:

  • Input: 2,500 (system, cached on turns 2-5) + 5 × 150 (user messages) + accumulated history ≈ 5,000 input tokens total
  • Of which ~80% hits cache ≈ 4,000 cached, 1,000 uncached
  • Output: 5 × 300 = 1,500 tokens

Daily cost with OpenAI direct:

  • Input: (1,000 × $1.50 + 4,000 × $0.15) / 1M × 50,000 = ($0.0015 + $0.0006) × 50,000 = **$105/day**
  • Output: 1,500 × $10.00 / 1M × 50,000 = **$750/day**
  • Total: **$855/day** (~$25,650/month)

With Crazyrouter (55%):

  • Total: **$470/day** (~$14,108/month)
  • Savings: $11,542/month
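The arithmetic above can be wrapped in a small helper, useful for plugging in your own traffic numbers. Rates default to GPT-5.2's list prices; `discount` scales the whole bill (0.55 for Crazyrouter, 0.5 for the Batch API):

```python
def daily_cost(uncached_in: int, cached_in: int, out_tokens: int,
               volume: int, in_rate: float = 1.50,
               cached_rate: float = 0.15, out_rate: float = 10.00,
               discount: float = 1.0) -> float:
    """Daily spend in dollars for `volume` requests or conversations.

    Token counts are per request; rates are $/MTok; `discount` scales
    the total (e.g. 0.55 for Crazyrouter, 0.5 for the Batch API).
    """
    per_request = (uncached_in * in_rate + cached_in * cached_rate
                   + out_tokens * out_rate) / 1_000_000
    return per_request * volume * discount

# Scenario 1: 50,000 conversations/day, 1,000 uncached + 4,000 cached
# input tokens and 1,500 output tokens per conversation
print(round(daily_cost(1_000, 4_000, 1_500, 50_000)))                  # → 855
print(round(daily_cost(1_000, 4_000, 1_500, 50_000, discount=0.55)))   # → 470
```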

Scenario 2: Document Processing Pipeline#

Setup: Processing 10,000 legal documents per day through a classification and extraction pipeline. Each document averages 8,000 tokens input, 2,000 tokens output. Using Batch API.

Daily cost with OpenAI Batch API (50% off):

  • Input: 8,000 × $0.75 / 1M × 10,000 = **$60/day**
  • Output: 2,000 × $5.00 / 1M × 10,000 = **$100/day**
  • Total: **$160/day** (~$4,800/month)

Compare to standard pricing without batch: $320/day ($9,600/month). The Batch API cuts your bill in half for the same results.

Scenario 3: AI-Powered Content Generation#

Setup: A content platform generating 500 articles per day. Each article requires a 3,000-token system prompt (cached), 1,000-token brief, and produces 4,000 tokens of output.

Daily cost with OpenAI direct:

  • Input: (1,000 × $1.50 + 3,000 × $0.15) / 1M × 500 = ($0.0015 + $0.00045) × 500 = **$0.98/day**
  • Output: 4,000 × $10.00 / 1M × 500 = **$20.00/day**
  • Total: **~$20.98/day** (~$629/month)

With Crazyrouter:

  • Total: **~$11.54/day** (~$346/month)
  • Savings: $283/month

Even at moderate scale, the savings add up. And notice how output tokens dominate the cost in every scenario — that 1:7 input-to-output ratio means optimizing output length (using concise instructions, setting appropriate max_tokens) has the biggest impact on your bill.

GPT-5.2 vs GPT-5.4 vs GPT-5-mini — Where Does It Fit?#

Understanding GPT-5.2's position in the GPT-5 family helps you choose the right model for your workload.

GPT-5-mini — The Budget Option#

| GPT-5-mini | Price |
| --- | --- |
| Input | $0.40 / MTok |
| Cached Input | $0.04 / MTok |
| Output | $1.60 / MTok |

GPT-5-mini is OpenAI's most affordable GPT-5 model. It's fast, cheap, and surprisingly capable for straightforward tasks. Use it for classification, simple extraction, routing, and any task where raw intelligence isn't the bottleneck. It's the natural successor to GPT-4o-mini and handles high-volume, low-complexity workloads efficiently.

Best for: High-volume simple tasks, classification, routing, basic Q&A, cost-sensitive applications.

GPT-5.2 — The Balanced Choice#

| GPT-5.2 | Price |
| --- | --- |
| Input | $1.50 / MTok |
| Cached Input | $0.15 / MTok |
| Output | $10.00 / MTok |

GPT-5.2 is the workhorse of the family. It handles complex reasoning, nuanced writing, code generation, and multi-step analysis with strong reliability. For most production applications that need more than basic capabilities, GPT-5.2 offers the best balance of performance and cost. It's roughly 4x the price of GPT-5-mini but delivers substantially better results on complex tasks.

Best for: Production chatbots, content generation, code assistance, document analysis, agentic workflows, any task requiring strong reasoning.

GPT-5.4 — The Flagship#

| GPT-5.4 | Price |
| --- | --- |
| Input | $2.50 / MTok |
| Cached Input | $0.25 / MTok |
| Output | $15.00 / MTok |

GPT-5.4 is OpenAI's most capable model. It excels at the hardest tasks — complex mathematical reasoning, PhD-level scientific analysis, intricate code architecture, and creative writing that requires deep understanding. The price premium over GPT-5.2 is about 50-67%, so it's worth reserving for tasks where that extra capability actually matters.

Best for: Research, complex reasoning chains, high-stakes content, tasks where GPT-5.2 falls short.

Choosing the Right Model#

A practical approach: start with GPT-5-mini, upgrade to GPT-5.2 where needed, reserve GPT-5.4 for the hard stuff. Many production systems use a tiered approach — routing simple queries to GPT-5-mini and complex ones to GPT-5.2, with GPT-5.4 as a fallback for edge cases.

GPT-5.2 is the default choice for most teams because it handles 90%+ of real-world tasks well. You only need GPT-5-mini if cost is your primary constraint, and you only need GPT-5.4 if you're hitting GPT-5.2's capability ceiling on specific tasks.
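A tiered router can be as simple as a dispatch function. The routing heuristic below is purely illustrative; real systems typically use a cheap classifier model or request metadata rather than string length:

```python
# Illustrative sketch of tiered model routing. The length/question-mark
# heuristic is a placeholder, not a recommended production classifier.
def pick_model(query: str, needs_deep_reasoning: bool = False) -> str:
    if needs_deep_reasoning:
        return "gpt-5.4"       # flagship, reserved for the hardest tasks
    if len(query) < 200 and "?" in query:
        return "gpt-5-mini"    # short factual questions -> cheapest tier
    return "gpt-5.2"           # default workhorse for everything else
```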

Key Takeaways#

  1. GPT-5.2 base pricing is $1.50 / MTok input and $10.00 / MTok output, competitive for its capability tier and a meaningful upgrade from GPT-4o.

  2. Automatic caching can reduce input costs by up to 90%. Structure your prompts with static content first to maximize cache hits. No configuration needed — it just works.

  3. Batch API gives you 50% off everything for workloads that don't need real-time responses. Caching discounts stack on top.

  4. Crazyrouter offers 45% savings (55% of official pricing) on real-time API access with zero code changes beyond updating the base URL.

  5. Output tokens dominate your costs. Focus optimization efforts on output length — use concise system prompts, set appropriate max_tokens, and consider whether you really need 2,000 tokens of output or if 500 would do.

  6. GPT-5.2 is the sweet spot in the GPT-5 family for most production use cases. It's significantly more capable than GPT-5-mini and significantly cheaper than GPT-5.4.

  7. Stack your discounts. Use caching (automatic) + Crazyrouter (45% off) for real-time workloads, or caching + Batch API (50% off) for async workloads. Either combination makes GPT-5.2 remarkably cost-effective.

Get Started with Crazyrouter#

Ready to cut your GPT-5.2 costs by 45%? Getting started with Crazyrouter takes about 30 seconds:

  1. Sign up at crazyrouter.com and grab your API key.
  2. Change one line in your code — set base_url="https://crazyrouter.com/v1".
  3. That's it. Same model, same features, same quality. Lower price.

Crazyrouter supports the full OpenAI API surface — chat completions, streaming, function calling, vision, JSON mode, and more. It works with the official OpenAI SDKs for Python, Node.js, and any HTTP client. All GPT-5 family models are available, along with Claude, Gemini, and other leading models.

Browse all available models and pricing at crazyrouter.com/pricing.


Disclaimer: Pricing information is accurate as of April 27, 2026. OpenAI may change pricing at any time — always verify current rates on OpenAI's official pricing page. Crazyrouter pricing is subject to its own terms and may be adjusted independently. Token counts and cost estimates in this article are approximations for illustrative purposes. This article is for informational purposes only and does not constitute financial advice.
