
"GPT-5 API Complete Guide: Features, Pricing, and Code Examples"
GPT-5 represents OpenAI's biggest leap since GPT-4. Longer context, better reasoning, native tool use, and multimodal capabilities that actually work in production. If you're building with the OpenAI API, here's what you need to know.
What's New in GPT-5
Key Improvements Over GPT-4
| Feature | GPT-4o | GPT-5 | GPT-5.2 |
|---|---|---|---|
| Context Window | 128K | 256K | 256K |
| Output Tokens | 16K | 32K | 64K |
| Reasoning | Good | Excellent | Best-in-class |
| Tool Use | Basic | Native | Advanced |
| Vision | ✅ | ✅ Enhanced | ✅ Enhanced |
| Audio | ✅ | ✅ Native | ✅ Native |
| Code Generation | Good | Very Good | Excellent |
| Instruction Following | Good | Excellent | Excellent |
| Latency (TTFT) | ~300ms | ~400ms | ~350ms |
What Makes GPT-5 Different
- 256K Context Window — Process entire codebases, long documents, or extended conversations without truncation
- Native Tool Use — Function calling is deeply integrated, not bolted on. Fewer hallucinated tool calls, better parameter extraction
- Improved Reasoning — Chain-of-thought is built into the model, not just prompted. Complex multi-step problems see 30-40% accuracy improvement
- Better Code — Significant improvements in code generation, debugging, and understanding large codebases
- Multimodal Native — Vision and audio aren't separate models anymore; they're part of the core architecture
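To make the 256K figure concrete, here is a rough pre-flight check that estimates whether a prompt still leaves room for the reply. The function name, the default limits (taken from the table above), and the 4-characters-per-token ratio are assumptions for illustration; the ratio is only a rule of thumb for English text, and a real tokenizer would give exact counts.

```python
def fits_in_context(text: str,
                    context_window: int = 256_000,
                    reserved_output: int = 32_000,
                    chars_per_token: float = 4.0) -> bool:
    """Roughly estimate whether `text` fits next to the reserved output budget."""
    # Heuristic estimate only: real token counts depend on the tokenizer.
    estimated_tokens = int(len(text) / chars_per_token) + 1
    return estimated_tokens + reserved_output <= context_window


sample = "word " * 1000  # 5,000 characters, roughly 1,250 tokens
print(fits_in_context(sample))  # prints True
```

A check like this is cheap enough to run before every request; anything that fails it is a candidate for chunking or summarizing first.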
Getting Started with GPT-5 API
Python Setup
from openai import OpenAI
# Direct via OpenAI
client = OpenAI(api_key="sk-your-openai-key")
# Or via Crazyrouter (recommended for cost savings)
client = OpenAI(
    api_key="your-crazyrouter-key",
    base_url="https://api.crazyrouter.com/v1"
)
Basic Chat Completion
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "system",
            "content": "You are a senior software engineer. Be concise and practical."
        },
        {
            "role": "user",
            "content": "Explain the difference between async/await and Promises in JavaScript. When should I use each?"
        }
    ],
    temperature=0.7,
    max_tokens=1024
)
print(response.choices[0].message.content)
Streaming Response
stream = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "Write a Python async web scraper with rate limiting"}
    ],
    stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
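If the full text is needed after streaming (for logging or caching, say), the deltas can be accumulated as they arrive. The sketch below is one way to do that; `collect_stream` and `fake_chunk` are hypothetical helper names, and the fake chunks only mimic the shape of the SDK's streamed objects so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        # Some chunks (for example a trailing usage chunk) can carry no choices.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (demo helper)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("lo")]
print(collect_stream(demo))  # prints "Hello"
```

With a real client, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `demo`.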
Node.js Setup
import OpenAI from 'openai';
const client = new OpenAI({
  apiKey: 'your-crazyrouter-key',
  baseURL: 'https://api.crazyrouter.com/v1'
});
async function chat(prompt) {
  const response = await client.chat.completions.create({
    model: 'gpt-5',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.7
  });
  return response.choices[0].message.content;
}
const answer = await chat('Design a database schema for a multi-tenant SaaS app');
console.log(answer);
Function Calling (Tool Use)
GPT-5's function calling is significantly more reliable:
import json
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search for products in the catalog",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {
                        "type": "string",
                        "enum": ["electronics", "clothing", "books", "home"],
                        "description": "Product category filter"
                    },
                    "max_price": {"type": "number", "description": "Maximum price in USD"},
                    "in_stock": {"type": "boolean", "description": "Only show in-stock items"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Check the status of an order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID"}
                },
                "required": ["order_id"]
            }
        }
    }
]
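response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "Find me wireless headphones under $100 that are in stock"}
    ],
    tools=tools,
    tool_choice="auto"
)
# tool_calls is None when the model answers in plain text instead of calling a tool
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    tool_call = tool_calls[0]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    print(f"Function: {tool_call.function.name}")
    print(f"Args: {args}")
    # Example output (actual arguments may vary):
    # Function: search_products
    # Args: {'query': 'wireless headphones', 'category': 'electronics', 'max_price': 100, 'in_stock': True}
else:
    print(response.choices[0].message.content)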
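The example above stops once the model has chosen a tool. A minimal sketch of the next step, running the function locally and packaging its result as a `tool` message, might look like the following. `search_products` here is a hypothetical stand-in for a real catalog lookup, and `TOOL_REGISTRY` and `run_tool_call` are illustrative names, not part of the SDK.

```python
import json


def search_products(query, category=None, max_price=None, in_stock=False):
    """Hypothetical stand-in for a real catalog lookup."""
    return [{"name": "Demo Headphones", "price": 79.0, "query": query}]


# Map tool names from the schema to local Python callables.
TOOL_REGISTRY = {"search_products": search_products}


def run_tool_call(tool_call_id: str, name: str, arguments_json: str,
                  registry=TOOL_REGISTRY) -> dict:
    """Execute one tool call and build the message to send back to the model."""
    func = registry.get(name)
    if func is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            result = func(**json.loads(arguments_json))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or arguments the function does not accept.
            result = {"error": f"bad arguments: {exc}"}
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


message = run_tool_call("call_1", "search_products",
                        '{"query": "wireless headphones", "max_price": 100}')
print(message["role"], message["tool_call_id"])  # prints "tool call_1"
```

In a real exchange, the assistant message containing the tool call and this `tool` message are both appended to `messages` before calling `client.chat.completions.create` again, so the model can turn the result into a final answer. Returning errors as content, rather than raising, gives the model a chance to correct its own call.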
Vision (Image Analysis)
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this architecture diagram? List all services and their connections."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/architecture.png"}
                }
            ]
        }
    ],
    max_tokens=2048
)
print(response.choices[0].message.content)
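The request above points at a public URL. For images that only exist locally, the same `image_url` field also accepts a base64 data URL. A small sketch, where `image_part_from_bytes` is an illustrative helper name and the three-byte payload is a placeholder rather than a real image:

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an `image_url` content part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


part = image_part_from_bytes(b"abc")  # placeholder bytes, not a real image
print(part["image_url"]["url"])  # prints "data:image/png;base64,YWJj"
```

In practice the bytes would come from something like `open("diagram.png", "rb").read()`, and the returned dict replaces the URL-based entry in the `content` list.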
GPT-5 vs GPT-4o vs Claude Opus 4.5
Performance Benchmarks
| Benchmark | GPT-5 | GPT-5.2 | GPT-4o | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|---|
| MMLU | 90.2% | 92.1% | 87.5% | 91.8% | 89.5% |
| HumanEval | 93.5% | 95.2% | 90.2% | 94.1% | 88.7% |
| MATH | 78.3% | 82.1% | 72.6% | 80.5% | 76.2% |
| GPQA | 65.8% | 69.2% | 58.3% | 67.1% | 62.4% |
Practical Comparison
| Use Case | Best Model | Why |
|---|---|---|
| General chat | GPT-5 | Best balance of quality and speed |
| Complex reasoning | GPT-5.2 or Claude Opus 4.5 | Highest accuracy on hard problems |
| Code generation | GPT-5.2 | Best HumanEval scores |
| Long documents | Claude Opus 4.5 | 200K context with better recall |
| Cost-sensitive | GPT-5-mini | 90% quality at 10% of GPT-5's list price |
| Real-time apps | Gemini 2.5 Flash | Lowest latency |
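One simple way to apply a table like this in code is a lookup that routes each request type to a model. The sketch below covers only the OpenAI rows; the use-case keys are made up for illustration, and the model identifier strings (in particular `gpt-5.2` and `gpt-5-mini`) are assumed to match what your provider exposes.

```python
# Routing table derived from the comparison above (OpenAI rows only).
MODEL_BY_USE_CASE = {
    "general_chat": "gpt-5",
    "complex_reasoning": "gpt-5.2",
    "code_generation": "gpt-5.2",
    "cost_sensitive": "gpt-5-mini",
}


def pick_model(use_case: str, default: str = "gpt-5") -> str:
    """Return the model for a use case, falling back to a general default."""
    return MODEL_BY_USE_CASE.get(use_case, default)


print(pick_model("cost_sensitive"))  # prints "gpt-5-mini"
print(pick_model("something_else"))  # prints "gpt-5"
```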
Pricing
Official OpenAI Pricing
| Model | Input (1M tokens) | Output (1M tokens) | Context |
|---|---|---|---|
| GPT-5 | $5.00 | $15.00 | 256K |
| GPT-5.2 | $10.00 | $30.00 | 256K |
| GPT-5-mini | $0.50 | $1.50 | 128K |
| GPT-4o | $2.50 | $10.00 | 128K |
Crazyrouter Pricing (Save 30%)
| Model | Input (1M tokens) | Output (1M tokens) | Savings vs Official |
|---|---|---|---|
| GPT-5 | $3.50 | $10.50 | 30% |
| GPT-5.2 | $7.00 | $21.00 | 30% |
| GPT-5-mini | $0.35 | $1.05 | 30% |
| GPT-4o | $1.75 | $7.00 | 30% |
Monthly Cost Estimates
| Usage Level | GPT-5 (Official) | GPT-5 (Crazyrouter) | Savings |
|---|---|---|---|
| Light (1M input + 1M output tokens/day) | ~$600/mo | ~$420/mo | $180/mo |
| Medium (10M input + 10M output tokens/day) | ~$6,000/mo | ~$4,200/mo | $1,800/mo |
| Heavy (100M input + 100M output tokens/day) | ~$60,000/mo | ~$42,000/mo | $18,000/mo |
At scale, switching to Crazyrouter saves thousands per month, and the only change required is swapping the base URL and API key.
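The estimates in the table can be reproduced if each usage level is read as that many input tokens plus the same number of output tokens per day, over a 30-day month. That reading is an inference from the numbers rather than something the table states, so treat the sketch below as a way to plug in your own traffic mix; `monthly_cost` is an illustrative helper name.

```python
def monthly_cost(input_tokens_per_day: float,
                 output_tokens_per_day: float,
                 input_price_per_m: float,
                 output_price_per_m: float,
                 days: int = 30) -> float:
    """Estimate monthly spend in USD from daily token volume and per-1M prices."""
    daily = (input_tokens_per_day / 1_000_000) * input_price_per_m \
          + (output_tokens_per_day / 1_000_000) * output_price_per_m
    return daily * days


# "Light" row, assuming 1M input and 1M output tokens per day.
official = monthly_cost(1_000_000, 1_000_000, 5.00, 15.00)
routed = monthly_cost(1_000_000, 1_000_000, 3.50, 10.50)
print(official, routed, official - routed)  # prints "600.0 420.0 180.0"
```

Most real workloads are input-heavy, so a split such as 900K input and 100K output per day would come out well below the table's figure.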
Migration from GPT-4
What Changes
# Before (GPT-4o)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    max_tokens=4096
)
# After (GPT-5): just change the model name
response = client.chat.completions.create(
    model="gpt-5",
    messages=[...],
    max_tokens=4096  # Can now go up to 32K
)
The API is backward compatible, so in most cases you only change the model string. There are a few things to watch:
- Output format may differ: GPT-5 tends to be more structured in its responses. If you're parsing output with regex, test thoroughly.
- Tool calling is stricter: GPT-5 follows function schemas more precisely. Loose schemas that worked with GPT-4 might need tightening.
- System prompts matter more: GPT-5 follows system instructions more faithfully. Vague prompts get vague results.
- Cost increase: per the pricing table above, GPT-5 costs 2x GPT-4o on input ($5.00 vs $2.50) and 1.5x on output ($15.00 vs $10.00). Consider GPT-5-mini for cost-sensitive workloads.
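One way to act on the tool-calling point during migration is to check the arguments each model returns against the schema you declared, and compare failure rates before and after the switch. The sketch below is a deliberately small, hand-rolled check (required fields, unknown fields, enum values), not a full JSON Schema validator; `validate_arguments` is an illustrative name and the trimmed schema mirrors the `search_products` example earlier in this guide.

```python
import json


def validate_arguments(arguments_json: str, parameters: dict) -> list:
    """Return a list of problems found in model-supplied tool arguments."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
    return problems


schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {"type": "string", "enum": ["electronics", "clothing"]},
    },
    "required": ["query"],
}
print(validate_arguments('{"query": "headphones", "category": "toys"}', schema))
# prints ["category: 'toys' is not an allowed value"]
```

Running a fixed set of prompts through both models and logging the returned problem lists gives a quick regression signal before production traffic moves over.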
Migration Checklist
- Update model string to gpt-5
- Test all function calling schemas
- Review system prompts for clarity
- Update max_tokens limits if needed
- Run regression tests on output parsing
- Monitor costs for the first week
- Consider GPT-5-mini for non-critical paths
Best Practices
- Use GPT-5-mini for simple tasks: classification, extraction, and summarization don't need the full model
- Stream everything: GPT-5's TTFT is slightly higher; streaming masks the latency
- Leverage the 256K context, but be strategic: put important info at the beginning and end
- Use structured outputs: response_format: { type: "json_object" } returns syntactically valid JSON for reliable parsing
- Cache aggressively: at temperature 0, identical inputs usually produce identical outputs, so repeated requests can be served from a cache
- Batch non-urgent requests: OpenAI's batch API gives a 50% discount
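The caching advice above can be sketched as a thin wrapper that keys on the full request. Everything here is illustrative: `cache_key` and `cached_completion` are made-up names, the in-memory dict stands in for whatever store you actually use, and `fake_call_model` replaces the real API call so the example runs offline.

```python
import hashlib
import json


def cache_key(model: str, messages: list, **params) -> str:
    """Derive a stable key from everything that influences the response."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_completion(call_model, cache: dict, model: str, messages: list, **params):
    """Return a cached response when the exact same request was seen before."""
    key = cache_key(model, messages, **params)
    if key not in cache:
        cache[key] = call_model(model=model, messages=messages, **params)
    return cache[key]


calls = []


def fake_call_model(**kwargs):
    """Stand-in for the real API call; records each invocation."""
    calls.append(kwargs)
    return f"response #{len(calls)}"


cache = {}
msgs = [{"role": "user", "content": "Classify: great product"}]
first = cached_completion(fake_call_model, cache, "gpt-5-mini", msgs, temperature=0)
second = cached_completion(fake_call_model, cache, "gpt-5-mini", msgs, temperature=0)
print(first == second, len(calls))  # prints "True 1"
```

Including the model name and sampling parameters in the key matters: the same prompt sent to a different model or at a different temperature should not share a cache entry.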
FAQ
Is GPT-5 worth the upgrade from GPT-4o?
For complex reasoning, code generation, and tool use — yes. For simple chat and classification, GPT-5-mini or even GPT-4o-mini is more cost-effective.
Can I use GPT-5 for free?
ChatGPT Free tier includes limited GPT-5 access. For API usage, there's no free tier, but Crazyrouter offers pay-as-you-go with no minimum.
What's the difference between GPT-5 and GPT-5.2?
GPT-5.2 is the latest iteration with improved reasoning and code generation. It costs twice as much as GPT-5. Use it when accuracy on hard problems justifies the cost.
Does GPT-5 support fine-tuning?
Not yet for GPT-5. Fine-tuning is available for GPT-4o and GPT-4o-mini. OpenAI has indicated GPT-5 fine-tuning is coming.
How does GPT-5 handle rate limits?
Same tier system as GPT-4. Tier 1 starts at 500 RPM. Through Crazyrouter, rate limits are pooled across providers, giving you effectively higher throughput.
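Whatever the limit turns out to be, requests that exceed it are typically retried with exponential backoff. The sketch below is a generic version under stated assumptions: `with_backoff` is an illustrative helper, the retry count and delays are arbitrary defaults, and the demo uses a simulated failure instead of a real rate-limit response. With the OpenAI Python SDK you would likely pass its rate-limit exception class (`openai.RateLimitError`) as `retry_on`.

```python
import random
import time


def with_backoff(func, retry_on=(Exception,), max_retries=5,
                 base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


attempts = {"count": 0}


def flaky():
    """Simulated call that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


delays = []  # record delays instead of actually sleeping in the demo
result = with_backoff(flaky, retry_on=(RuntimeError,), sleep=delays.append)
print(result, len(delays))  # prints "ok 2"
```

The `sleep` parameter exists only so the wait can be swapped out in tests; production code can leave it at the default.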
Summary
GPT-5 is a meaningful upgrade for developers building AI-powered applications. The improved reasoning, native tool use, and 256K context make it the go-to model for complex tasks. For cost-sensitive workloads, GPT-5-mini delivers most of the capability at a fraction of the price.
Get started with GPT-5 through Crazyrouter — same OpenAI SDK, 30% lower costs, and access to Claude, Gemini, and 300+ other models with the same API key.


