
GPT-5.5 Pricing Explained — OpenAI's Latest Flagship, Reasoning Tokens, and How to Save with Crazyrouter

Crazyrouter Team
April 27, 2026


OpenAI just dropped GPT-5.5 — their newest flagship model — and it's a serious step forward. Released in April 2026, GPT-5.5 sits in a sweet spot between the premium GPT-5.4 and the cost-efficient GPT-5, offering a massive 1 million token context window, built-in reasoning capabilities, and competitive pricing that makes it attractive for production workloads.

Whether you're building AI-powered applications, running large-scale data processing, or just trying to figure out which OpenAI model fits your budget, this guide breaks down everything you need to know about GPT-5.5 API pricing — including how to cut costs by up to 45% through Crazyrouter.

GPT-5.5 Base Pricing

Here's the straightforward pricing for GPT-5.5 through the OpenAI API:

Component             Price per 1M Tokens
Input tokens          $2.00
Cached input tokens   $0.50 (75% discount)
Output tokens         $8.00
Reasoning tokens      $8.00 (billed at output rate)

Key specs:

  • Context window: 1,000,000 tokens (1M)
  • Max output tokens: 100,000 tokens (100K)
  • Knowledge cutoff: March 2026
  • Reasoning effort levels: low, medium, high

Compared to its predecessor GPT-5.4 at $2.50/$10.00 (input/output), GPT-5.5 delivers a 20% reduction in both input and output pricing while adding a larger context window and improved reasoning performance. That's not a minor upgrade — it's a meaningful cost reduction for teams processing millions of tokens daily.

Understanding Reasoning Tokens

GPT-5.5 is a reasoning model, which means it can "think" through complex problems before generating its final response. This is powerful, but it comes with a pricing nuance you need to understand.

What Are Reasoning Tokens?

When GPT-5.5 encounters a complex task — multi-step math, code debugging, logical analysis — it generates internal reasoning tokens before producing the visible output. These reasoning tokens represent the model's chain-of-thought process. You don't see them in the response (unless you request them via the reasoning parameter), but they still count toward your bill.

How Are Reasoning Tokens Billed?

Reasoning tokens are billed at the output token rate of $8.00 per million tokens. This is critical to understand because a single API call might generate significantly more tokens than what appears in the response.

For example, if you ask GPT-5.5 to solve a complex coding problem:

  • Input tokens: 500 (your prompt)
  • Reasoning tokens: 3,000 (internal thinking)
  • Output tokens: 1,500 (visible response)
  • Total billed output: 4,500 tokens (reasoning + output)

Your actual cost for output in this case is based on 4,500 tokens, not just the 1,500 you see.
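
To see what this means in dollars, here's a quick back-of-the-envelope calculation at the rates above (a sketch; the token counts are just the example's assumptions):

python
# Cost of the example request above, at GPT-5.5's published rates
INPUT_RATE = 2.00 / 1_000_000   # $ per input token
OUTPUT_RATE = 8.00 / 1_000_000  # $ per output token; reasoning tokens bill here too

billed_output = 3_000 + 1_500   # reasoning + visible output = 4,500 tokens
cost = 500 * INPUT_RATE + billed_output * OUTPUT_RATE
print(f"${cost:.4f}")           # $0.0370 for the call; output tokens dominate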

Controlling Costs with reasoning_effort

OpenAI provides a reasoning_effort parameter that lets you control how much thinking the model does:

  • low — Minimal reasoning. Best for straightforward tasks like text generation, summarization, or simple Q&A. Uses fewer reasoning tokens, keeping costs down.
  • medium — Balanced reasoning. Good for moderately complex tasks like code generation, data analysis, or structured extraction.
  • high — Maximum reasoning. Use for complex math, multi-step logic, advanced code debugging, or tasks where accuracy is paramount.
python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5.5",
    reasoning_effort="low",  # "low", "medium", or "high"
    messages=[
        {"role": "user", "content": "Summarize this article in 3 bullet points."}
    ]
)
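
To verify how much thinking a call actually did, you can inspect the usage object on the response. Recent versions of the OpenAI Python SDK break reasoning tokens out under completion_tokens_details (a sketch; field names may differ across SDK versions, so check against your installed version):

python
# completion_tokens includes reasoning tokens; the details object breaks them out
details = response.usage.completion_tokens_details
print("reasoning tokens:", details.reasoning_tokens)
print("visible output:  ", response.usage.completion_tokens - details.reasoning_tokens)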

Pro tip: Default to low or medium for most production workloads. Reserve high for tasks where you've verified that increased reasoning actually improves output quality. Many developers over-spend by leaving reasoning at high for tasks that don't benefit from it.

The 1M Context Window Advantage

GPT-5.5's 1 million token context window is one of its standout features. To put that in perspective, 1M tokens is roughly equivalent to:

  • ~750,000 words of text
  • ~15 full-length novels
  • An entire medium-sized codebase
  • Hundreds of pages of documentation

This opens up use cases that were previously impractical or required complex chunking strategies:

  • Full codebase analysis — Feed an entire repository into a single prompt for comprehensive code review or refactoring suggestions.
  • Long document processing — Analyze complete legal contracts, research papers, or financial reports without splitting them up.
  • Extended conversations — Maintain context across very long multi-turn conversations without losing earlier context.
  • RAG with large retrieval sets — Include more retrieved documents in your prompt for better-informed responses.

The 100K max output limit is equally generous, allowing GPT-5.5 to generate substantial responses — complete reports, lengthy code files, or detailed analyses — in a single API call.
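
If you want to check whether a document actually fits before sending it, you can count tokens locally with the tiktoken library (a sketch; "o200k_base" is the encoding used by recent OpenAI models, assumed here to apply to GPT-5.5):

python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # assumed encoding for GPT-5.5

with open("contract.txt") as f:  # placeholder path
    n_tokens = len(enc.encode(f.read()))

print(n_tokens, "tokens;", "fits" if n_tokens <= 1_000_000 else "does not fit")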

Automatic Caching: 75% Off Repeated Input

OpenAI's automatic caching is a game-changer for GPT-5.5 pricing, and it requires zero effort on your part.

How It Works

When you send a request to the API, OpenAI automatically caches the input tokens. If a subsequent request shares the same prefix (the beginning of the prompt matches), those cached tokens are billed at just $0.50 per million tokens — a 75% discount from the standard $2.00 rate.

This happens automatically. No special API parameters, no cache management, no configuration. OpenAI handles it behind the scenes.

When Caching Kicks In

Caching is most effective when you have:

  • System prompts — If you use the same system prompt across multiple requests (which most applications do), those tokens get cached after the first call.
  • Few-shot examples — Static examples in your prompt are cached automatically.
  • Shared context — Any repeated prefix across requests benefits from caching.
  • Multi-turn conversations — Earlier messages in the conversation history that remain unchanged are cached.
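
The practical rule that falls out of this: keep everything static at the front of the prompt and append the variable content last, so every request shares the longest possible prefix. A minimal sketch (the system prompt and few-shot examples here are placeholders):

python
# Static, cacheable prefix first; variable content last
STATIC_SYSTEM = "You are a support assistant for Acme Corp. ..."  # placeholder
FEW_SHOT = [
    {"role": "user", "content": "Example question"},       # placeholder examples
    {"role": "assistant", "content": "Example answer"},
]

def build_messages(user_message: str) -> list[dict]:
    # Identical prefix on every request, so it's cached after the first call;
    # only the final user message is billed at the full input rate.
    return [
        {"role": "system", "content": STATIC_SYSTEM},
        *FEW_SHOT,
        {"role": "user", "content": user_message},
    ]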

Real Impact on Costs

Consider an application with a 2,000-token system prompt that handles 10,000 requests per day:

  • Without caching: 2,000 × 10,000 = 20M input tokens × $2.00/M = $40.00/day
  • With caching: 2,000 × 10,000 = 20M cached tokens × $0.50/M = $10.00/day

That's $30 saved per day — $900 per month — just from automatic caching of the system prompt. The savings scale linearly with volume.

Batch API: 50% Off for Async Workloads

If your workload doesn't require real-time responses, OpenAI's Batch API offers a flat 50% discount on GPT-5.5 pricing:

Component       Standard Price   Batch API Price
Input tokens    $2.00/MTok       $1.00/MTok
Output tokens   $8.00/MTok       $4.00/MTok

When to Use Batch API

The Batch API processes requests asynchronously with a completion window of up to 24 hours (though most batches complete much faster). It's ideal for:

  • Content generation at scale — Generating product descriptions, blog drafts, or marketing copy in bulk.
  • Data extraction and classification — Processing large datasets where real-time response isn't needed.
  • Evaluation and testing — Running model evaluations across thousands of test cases.
  • Embedding generation — Batch processing documents for search or RAG pipelines.

You can combine Batch API with automatic caching for even deeper savings — cached input tokens in batch mode cost just $0.25/MTok (50% of the already-discounted $0.50 cached rate).
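
Submitting a batch is a two-step flow in the OpenAI SDK: upload a JSONL file of requests, then create the batch job. A minimal sketch (the file path and request bodies are placeholders; verify batch-mode availability for the model you're using):

python
from openai import OpenAI

client = OpenAI()

# requests.jsonl holds one request per line, e.g.:
# {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-5.5", "messages": [{"role": "user", "content": "..."}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until completed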

Save 45% with Crazyrouter

Here's where it gets interesting. Crazyrouter offers GPT-5.5 at 55% of OpenAI's official pricing — that's a 45% discount with zero compromises on quality or reliability.

Crazyrouter GPT-5.5 Pricing

Component       OpenAI Official   Crazyrouter   Savings
Input tokens    $2.00/MTok        $1.10/MTok    45% off
Output tokens   $8.00/MTok        $4.40/MTok    45% off

How to Switch

Switching to Crazyrouter takes about 30 seconds. You just change the base_url in your existing OpenAI SDK setup:

Python (OpenAI SDK):

python
from openai import OpenAI

client = OpenAI(
    api_key="your-crazyrouter-api-key",
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.5",
    reasoning_effort="medium",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

cURL:

bash
curl https://crazyrouter.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-crazyrouter-api-key" \
  -d '{
    "model": "gpt-5.5",
    "reasoning_effort": "medium",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

Node.js:

javascript
import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "your-crazyrouter-api-key",
    baseURL: "https://crazyrouter.com/v1"
});

const response = await client.chat.completions.create({
    model: "gpt-5.5",
    reasoning_effort: "medium",
    messages: [
        { role: "user", content: "Explain quantum computing in simple terms." }
    ]
});

The API is fully compatible with OpenAI's specification — same request format, same response format, same streaming support. Your existing code works as-is with just the URL change.
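
Streaming, for instance, works the same as it does against OpenAI directly. A minimal sketch using the Python client configured above:

python
# Streaming through Crazyrouter: same SDK call, same chunk format
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)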

Cost Scenarios: Real-World Examples

Let's walk through three realistic scenarios to see how GPT-5.5 costs play out in practice, including the impact of reasoning tokens.

Scenario 1: Customer Support Chatbot

Setup: 1,000-token system prompt, average 500-token user message, 800-token response, low reasoning effort (~200 reasoning tokens), 50,000 requests/day.

Component               Tokens/Request   Daily Tokens   OpenAI Cost   Crazyrouter Cost
Input (cached system)   1,000            50M            $25.00        $13.75
Input (user message)    500              25M            $50.00        $27.50
Reasoning (low)         200              10M            $80.00        $44.00
Output                  800              40M            $320.00       $176.00
Daily Total                                             $475.00       $261.25
Monthly Total                                           $14,250       $7,837

Monthly savings with Crazyrouter: $6,413
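
To sanity-check these numbers against your own traffic, a small estimator like this reproduces the table (a sketch; the Crazyrouter rates assume the flat 45% discount also applies to cached input, which is what the table above implies):

python
def daily_cost(tokens_per_request: dict, requests_per_day: int, rates: dict) -> float:
    """Daily cost in dollars; token counts are per request, rates are $/1M tokens."""
    return sum(tokens_per_request[k] * requests_per_day / 1e6 * rates[k]
               for k in tokens_per_request)

usage = {"cached_input": 1_000, "input": 500, "reasoning": 200, "output": 800}
openai_rates = {"cached_input": 0.50, "input": 2.00, "reasoning": 8.00, "output": 8.00}
cr_rates = {k: v * 0.55 for k, v in openai_rates.items()}  # assumed flat 45% off

print(f"${daily_cost(usage, 50_000, openai_rates):.2f}")  # $475.00
print(f"${daily_cost(usage, 50_000, cr_rates):.2f}")      # $261.25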

Scenario 2: Code Review Pipeline (Batch API)

Setup: Analyzing pull requests in batch mode. Average 10,000-token code input, high reasoning effort (~5,000 reasoning tokens), 2,000-token review output, 500 PRs/day.

Component          Tokens/Request   Daily Tokens   Batch API Cost   Crazyrouter Cost
Input              10,000           5M             $5.00            $5.50
Reasoning (high)   5,000            2.5M           $10.00           $11.00
Output             2,000            1M             $4.00            $4.40
Daily Total                                        $19.00           $20.90

Note: For batch workloads, OpenAI's native Batch API (50% off) can be cheaper than Crazyrouter's standard pricing. Choose based on your latency requirements — Batch API is async, Crazyrouter is real-time.

Scenario 3: Document Analysis with Large Context

Setup: Analyzing 200,000-token legal documents, medium reasoning effort (~8,000 reasoning tokens), 5,000-token summary output, 100 documents/day. Same document template means ~50,000 tokens cached.

Component                Tokens/Request   Daily Tokens   OpenAI Cost   Crazyrouter Cost
Input (cached prefix)    50,000           5M             $2.50         $1.38
Input (unique content)   150,000          15M            $30.00        $16.50
Reasoning (medium)       8,000            0.8M           $6.40         $3.52
Output                   5,000            0.5M           $4.00         $2.20
Daily Total                                              $42.90        $23.60
Monthly Total                                            $1,287        $708

Monthly savings with Crazyrouter: $579

GPT-5.5 vs GPT-5.4 vs GPT-5: Which Model Should You Choose?

Here's how GPT-5.5 stacks up against its siblings:

Feature          GPT-5                      GPT-5.5                GPT-5.4
Input price      $1.25/MTok                 $2.00/MTok             $2.50/MTok
Output price     $10.00/MTok                $8.00/MTok             $10.00/MTok
Context window   256K                       1M                     256K
Max output       32K                        100K                   32K
Reasoning        Basic                      Advanced               Advanced
Best for         Cost-sensitive workloads   Balanced performance   Maximum capability

When to Choose GPT-5.5

  • You need the 1M context window for large documents or codebases
  • You want strong reasoning at a lower output price than GPT-5.4
  • You need 100K max output for generating long-form content
  • You want the best price-to-performance ratio in the GPT-5 family

When to Choose GPT-5

  • Cost is the primary concern and you don't need the extended context
  • Your tasks are straightforward and don't require heavy reasoning
  • Input volume is high but output is minimal (GPT-5's $1.25 input rate wins)

When to Choose GPT-5.4

  • You need absolute peak performance regardless of cost
  • Your tasks require the most advanced reasoning capabilities
  • You're working on research or complex analysis where marginal quality improvements matter

GPT-5.5 hits the sweet spot for most production applications — it's 20% cheaper on output than GPT-5.4 while offering a 4x larger context window and comparable reasoning quality.

Key Takeaways

  1. GPT-5.5 pricing is $2.00/MTok input, $8.00/MTok output — 20% cheaper than GPT-5.4 on both dimensions.

  2. Reasoning tokens are billed at the output rate ($8.00/MTok). Use reasoning_effort to control costs — default to low or medium unless you need deep reasoning.

  3. Automatic caching saves 75% on repeated input tokens with zero configuration. Design your prompts with stable prefixes to maximize cache hits.

  4. Batch API cuts costs by 50% for async workloads. Combine with caching for maximum savings.

  5. Crazyrouter offers GPT-5.5 at 55% of official pricing ($1.10/$4.40 per MTok) — switch with a single line of code by changing base_url.

  6. The 1M context window and 100K max output make GPT-5.5 uniquely suited for large document processing, full codebase analysis, and long-form generation.

Get Started with GPT-5.5

Ready to start building with GPT-5.5? Here's how:

  • OpenAI direct: Sign up at platform.openai.com and use model name gpt-5.5
  • Via Crazyrouter (45% off): Sign up at crazyrouter.com, get your API key, and set base_url="https://crazyrouter.com/v1" in your OpenAI SDK

Crazyrouter supports all GPT-5.5 features including streaming, function calling, reasoning effort control, and the full 1M context window. No feature compromises — just lower prices.

👉 Get your Crazyrouter API key →


Last updated: April 27, 2026

Disclaimer: Pricing information is based on publicly available data from OpenAI as of the publication date. Prices may change without notice. Crazyrouter pricing is subject to Crazyrouter's terms of service. Always verify current pricing on the respective provider's website before making purchasing decisions. Token counts in scenarios are estimates and actual usage may vary based on specific inputs and model behavior.
