GPT-5 API Parameters Guide 2026: max_tokens, reasoning_effort, verbosity, and Unsupported Parameter Fixes

Crazyrouter Team
June 8, 2026

If you are moving an app from older chat models to GPT-5, the hardest part is often not the prompt. It is the request body.

Fields that worked for older OpenAI-style chat models can behave differently on newer reasoning models. Some providers accept them. Some ignore them. Some reject the whole request with an unsupported parameter error. That is painful when your production app depends on retries, streaming, tools, and stable latency.

This guide explains the GPT-5 API parameters that matter in 2026: max_tokens, max_completion_tokens, reasoning_effort, verbosity, presence_penalty, frequency_penalty, logprobs, seed, stop, and tool-related fields. It is based on API validation runs against the GPT-5 models exposed by a Crazyrouter-compatible endpoint on June 8, 2026.

The short version: keep GPT-5 requests simple, prefer max_completion_tokens, use reasoning_effort and verbosity intentionally, and strip legacy parameters before routing to strict upstream providers.

[Figure: GPT-5 API parameters guide architecture]

[Figure: GPT-5 unsupported parameter cleanup flow]

Why GPT-5 API parameters break older apps

Older chat apps often send the same default payload to every model:

json
{
  "model": "gpt-5-mini",
  "messages": [{"role": "user", "content": "Summarize this."}],
  "temperature": 0.7,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "max_tokens": 512
}

That looks harmless. The problem is that reasoning models are not always parameter-compatible with classic chat models. A field with a default value can still be treated as present. If a strict upstream does not support that field, it may reject the request even when the value is 0.

This matters for three common cases:

  1. SDK defaults: a client library may add fields automatically.
  2. Gateway routing: one request may route to different upstream providers.
  3. Model fallback: a fallback model may not support the same parameter set.

A good GPT-5 API strategy is not “send everything and hope.” It is “send the smallest payload that preserves intent.”
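
One way to shrink a payload is to drop fields that only restate a default. The sketch below is a minimal illustration, not a definitive rule set. The default values in `ASSUMED_DEFAULTS` and the helper name `dropDefaultValuedFields` are assumptions made for this example; check them against your provider before relying on them.

```javascript
// Assumed defaults for classic Chat Completions fields.
// These values are an assumption; verify them for your provider.
const ASSUMED_DEFAULTS = {
  temperature: 1,
  top_p: 1,
  presence_penalty: 0,
  frequency_penalty: 0,
  n: 1,
};

// Return a copy of the body without fields that merely restate a default.
function dropDefaultValuedFields(body) {
  const output = {};
  for (const [key, value] of Object.entries(body)) {
    const restatesDefault =
      key in ASSUMED_DEFAULTS && ASSUMED_DEFAULTS[key] === value;
    if (!restatesDefault) output[key] = value;
  }
  return output;
}

const legacy = {
  model: "gpt-5-mini",
  messages: [{ role: "user", content: "Summarize this." }],
  temperature: 0.7,
  top_p: 1,
  presence_penalty: 0,
  frequency_penalty: 0,
  max_tokens: 512,
};

const slim = dropDefaultValuedFields(legacy);
console.log(Object.keys(slim)); // model, messages, temperature, max_tokens
```

The non-default temperature survives because it carries intent. The zero-valued penalties do not, so they are safe to remove before a strict route sees them.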

Tested GPT-5 models and practical result

In a June 2026 validation run, the endpoint returned these GPT-5-related models:

  • gpt-5-nano
  • gpt-5-mini
  • gpt-5
  • gpt-5.1
  • gpt-5.4
  • gpt-5.5
  • gpt-5-codex
  • gpt-5.1-codex-mini
  • gpt-5.1-codex
  • gpt-5.1-codex-max

Each model was tested with a minimal Chat Completions request and then with one extra parameter at a time. The tested fields included presence_penalty, frequency_penalty, max_tokens, temperature, top_p, stop, n, seed, logprobs, top_logprobs, response_format, reasoning_effort, verbosity, tools, and tool_choice.

The endpoint accepted most tested parameters. No unsupported parameter failure reproduced consistently in that run. The failures that did occur looked like temporary availability or routing noise, not parameter rejection.

That sounds comforting, but it should not make teams careless. A compatibility layer may accept parameters today while another upstream route rejects them tomorrow. Production gateways should still normalize GPT-5 payloads before sending requests upstream.

GPT-5 parameter compatibility table

| Parameter | Recommended status | Why |
| --- | --- | --- |
| model | Keep | Required |
| messages | Keep | Required for Chat Completions |
| stream | Keep | Common and useful |
| max_completion_tokens | Prefer | Better fit for reasoning models |
| max_tokens | Convert | Legacy field; may fail on strict reasoning routes |
| reasoning_effort | Keep | Important GPT-5 reasoning control |
| verbosity | Keep | Useful output-length/style control |
| temperature | Keep with caution | Usually supported, but not always meaningful for reasoning |
| top_p | Keep with caution | Same as temperature |
| presence_penalty | Strip | Low value for reasoning; can trigger strict upstream errors |
| frequency_penalty | Strip | Same risk as presence penalty |
| logprobs | Strip unless required | Often unsupported or expensive |
| top_logprobs | Strip unless required | Same as logprobs |
| seed | Strip | Determinism is not guaranteed across routes |
| best_of | Strip | Legacy/completions-style parameter |
| stop | Greylist | Useful, but can be model-specific |
| n | Greylist | Can multiply cost and may be unsupported |
| tools | Keep | Needed for tool calling |
| tool_choice | Keep | Needed for tool control |
| response_format | Keep with validation | Useful for JSON output |
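
The table can also live in code as data, so the policy is reviewed in one place. This is a rough sketch under stated assumptions: the `strictRoute` option and the names `GPT5_POLICY` and `applyGpt5Policy` are hypothetical, "Strip unless required" is simplified to "strip", and the max_tokens conversion is left out on purpose.

```javascript
// Policy per field, transcribed from the table above.
// Fields that are not listed are treated as "keep".
const GPT5_POLICY = {
  presence_penalty: "strip",
  frequency_penalty: "strip",
  logprobs: "strip",
  top_logprobs: "strip",
  seed: "strip",
  best_of: "strip",
  stop: "greylist",
  n: "greylist",
};

// `strictRoute` is a hypothetical flag: when true, greylist fields are
// removed as well; when false, they pass through unchanged.
function applyGpt5Policy(body, { strictRoute = false } = {}) {
  const output = {};
  for (const [key, value] of Object.entries(body)) {
    const policy = GPT5_POLICY[key] ?? "keep";
    if (policy === "strip") continue;
    if (policy === "greylist" && strictRoute) continue;
    output[key] = value;
  }
  return output;
}

const request = { model: "gpt-5", seed: 1, stop: ["END"], n: 2 };
console.log(Object.keys(applyGpt5Policy(request)));
console.log(Object.keys(applyGpt5Policy(request, { strictRoute: true })));
```

Keeping greylist handling behind a flag lets a relaxed route keep stop and n while a strict route drops them, without two separate functions.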

max_tokens vs max_completion_tokens

For older chat models, many apps use max_tokens to cap output length. GPT-5-style reasoning models can spend hidden tokens on reasoning before producing visible output. That is why newer APIs often prefer max_completion_tokens, which caps reasoning tokens and visible output tokens together.

A safe gateway rule is simple:

js
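function normalizeTokenLimit(body) {
  // Copy so the caller's object is not mutated.
  const normalized = { ...body };
  // Explicit null checks: a truthiness test would skip a value of 0.
  if (normalized.max_tokens != null && normalized.max_completion_tokens == null) {
    normalized.max_completion_tokens = normalized.max_tokens;
  }
  delete normalized.max_tokens;
  return normalized;
}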

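This keeps the user intent (“do not let this request run forever”) while avoiding a legacy field on a strict model route. Because reasoning tokens count against the budget, a limit copied unchanged may leave less room for visible output than it did on a classic model.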

If you operate a gateway, log both values during migration. Many teams discover that old SDKs, plugins, or wrappers still send max_tokens even after the app code was updated.
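
A small helper can produce the log record for that migration check. This is only a sketch: the function name `describeTokenLimits`, the `source` label, and the record shape are assumptions for illustration.

```javascript
// Build a log record describing which token-limit fields a request carries.
// `source` is a hypothetical label for the caller (SDK, plugin, wrapper).
function describeTokenLimits(body, source = "unknown") {
  const hasLegacy = body.max_tokens != null;
  const hasCurrent = body.max_completion_tokens != null;
  return {
    source,
    max_tokens: hasLegacy ? body.max_tokens : null,
    max_completion_tokens: hasCurrent ? body.max_completion_tokens : null,
    // True while any caller still sends the legacy field.
    needsMigration: hasLegacy,
    // True when both fields are present but disagree.
    conflicting:
      hasLegacy && hasCurrent && body.max_tokens !== body.max_completion_tokens,
  };
}

console.log(JSON.stringify(describeTokenLimits({ max_tokens: 512 }, "legacy-plugin")));
```

Counting records with `needsMigration` set per source shows which clients still need an update.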

reasoning_effort: when to use low, medium, or high

reasoning_effort tells a reasoning model how hard it should think before answering. It is not just a quality toggle. It affects latency, cost, and answer depth.

Use this practical rule:

| reasoning_effort | Best for | Avoid for |
| --- | --- | --- |
| low | classification, short rewrite, simple extraction, routing | deep debugging, complex math, architecture decisions |
| medium | normal chat, coding help, product Q&A, support agents | ultra-low-latency tasks |
| high | hard coding tasks, multi-step analysis, legal/financial review drafts, complex planning | high-volume cheap traffic |

Example:

js
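import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment by default.
const client = new OpenAI();

const response = await client.chat.completions.create({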
  model: "gpt-5-mini",
  messages: [
    { role: "system", content: "Be concise and accurate." },
    { role: "user", content: "Find the bug in this retry function." }
  ],
  reasoning_effort: "medium",
  max_completion_tokens: 700
});

Do not set high everywhere. It can improve hard tasks, but it may waste budget on simple requests.
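
To avoid setting high everywhere, the effort level can be picked per task type. The sketch below is one possible mapping. The task labels, the name `pickReasoningEffort`, and the medium fallback are assumptions chosen for this example.

```javascript
// Hypothetical task labels mapped to an effort level, following the
// low / medium / high guidance above.
const EFFORT_BY_TASK = {
  classification: "low",
  extraction: "low",
  routing: "low",
  chat: "medium",
  coding_help: "medium",
  support: "medium",
  hard_coding: "high",
  multi_step_analysis: "high",
  planning: "high",
};

// Unknown task types fall back to "medium" (an assumed default).
function pickReasoningEffort(taskType) {
  const known = Object.prototype.hasOwnProperty.call(EFFORT_BY_TASK, taskType);
  return known ? EFFORT_BY_TASK[taskType] : "medium";
}

console.log(pickReasoningEffort("routing"));
console.log(pickReasoningEffort("hard_coding"));
console.log(pickReasoningEffort("something_else"));
```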

verbosity: the underrated GPT-5 parameter

verbosity controls how much detail the model should include. It is different from max_completion_tokens.

  • max_completion_tokens is a hard budget limit.
  • verbosity is a style and detail preference.

For support bots, use low or medium. For tutorials, reports, and code reviews, use medium or high.

json
{
  "model": "gpt-5",
  "messages": [
    {"role": "user", "content": "Explain this API migration plan."}
  ],
  "reasoning_effort": "medium",
  "verbosity": "high",
  "max_completion_tokens": 1200
}

A useful pattern is to map product surfaces to verbosity:

| Surface | verbosity |
| --- | --- |
| Search snippet | low |
| Chat support | medium |
| Developer docs | high |
| Internal logs summary | low |
| Code review explanation | medium or high |
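
The surface mapping can be enforced in code so a caller cannot request more detail than a surface allows. This is a sketch, not a fixed design: the surface keys, the name `resolveVerbosity`, and the medium fallback for unknown surfaces are assumptions.

```javascript
// Allowed verbosity values per product surface, from the table above.
// The first entry is treated as that surface's default.
const VERBOSITY_BY_SURFACE = {
  search_snippet: ["low"],
  chat_support: ["medium"],
  developer_docs: ["high"],
  internal_logs_summary: ["low"],
  code_review_explanation: ["medium", "high"],
};

// Honor the requested value only when the surface allows it.
function resolveVerbosity(surface, requested) {
  const allowed = VERBOSITY_BY_SURFACE[surface];
  if (!Array.isArray(allowed)) return "medium"; // assumed fallback
  return allowed.includes(requested) ? requested : allowed[0];
}

console.log(resolveVerbosity("code_review_explanation", "high"));
console.log(resolveVerbosity("search_snippet", "high"));
```

The second call returns the surface default because a search snippet does not allow high.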

The safest GPT-5 cleanup function

Here is a production-friendly JavaScript normalizer for GPT-5 requests:

js
function normalizeGpt5Request(body) {
  const normalized = { ...body };

  // Prefer the newer completion budget field.
  if (normalized.max_tokens != null && normalized.max_completion_tokens == null) {
    normalized.max_completion_tokens = normalized.max_tokens;
  }
  delete normalized.max_tokens;

  // Strip legacy or high-risk fields for strict reasoning routes.
  delete normalized.presence_penalty;
  delete normalized.frequency_penalty;
  delete normalized.logprobs;
  delete normalized.top_logprobs;
  delete normalized.best_of;
  delete normalized.seed;

  // Optional: remove risky fields for stricter routes.
  // delete normalized.stop;
  // delete normalized.n;

  return normalized;
}

If you support both GPT-5 and classic models, apply this only to GPT-5-like model names:

js
function isGpt5Model(model) {
  return /^gpt-5(\.|-|$)/.test(model);
}

function normalizeByModel(body) {
  if (isGpt5Model(body.model)) {
    return normalizeGpt5Request(body);
  }
  return body;
}
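
It is worth checking the name pattern against the model names you actually route. The quick check below reuses the same regular expression. The names `gpt-50`, `my-gpt-5`, and `openai/gpt-5` are made-up examples, not real model IDs.

```javascript
// Same pattern as isGpt5Model above.
const GPT5_NAME = /^gpt-5(\.|-|$)/;

const samples = [
  "gpt-5",
  "gpt-5.1",
  "gpt-5-mini",
  "gpt-5.1-codex-max",
  "gpt-4o",
  "gpt-50", // hypothetical: must not match
  "my-gpt-5", // hypothetical: the pattern is anchored at the start
  "openai/gpt-5", // hypothetical provider-prefixed name
];

const results = Object.fromEntries(
  samples.map((name) => [name, GPT5_NAME.test(name)])
);
console.log(results);
```

Only the first four names match. If your gateway prefixes model names with a provider, strip the prefix first or the normalizer is skipped without warning.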

How to use GPT-5 through a unified API gateway

With a unified gateway, you can call GPT-5-style models through one OpenAI-compatible client. This helps when you need fallback, cost control, or multi-region routing.

With Crazyrouter, you can access many AI models through one API key while keeping an OpenAI-compatible code path. Use the base URL below in code. Do not add UTM parameters to API endpoints.

python
import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["CRAZYROUTER_API_KEY"],
    base_url="https://crazyrouter.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are a careful API migration assistant."},
        {"role": "user", "content": "Convert this old GPT request body to a GPT-5-safe payload."}
    ],
    reasoning_effort="medium",
    verbosity="medium",
    max_completion_tokens=800
)

print(response.choices[0].message.content)

UTM tracking belongs on browser-facing links only.

How to debug unsupported parameter errors

When you see an unsupported parameter error, do not start by changing the prompt. Start by shrinking the payload.

Use this checklist:

  1. Send only model, messages, and max_completion_tokens.
  2. Add reasoning_effort.
  3. Add verbosity.
  4. Add temperature or top_p only if needed.
  5. Add tools only after the basic call works.
  6. Avoid presence_penalty, frequency_penalty, logprobs, top_logprobs, seed, and best_of.
  7. If max_tokens exists, convert it to max_completion_tokens.

A minimal safe request looks like this:

json
{
  "model": "gpt-5-mini",
  "messages": [
    {"role": "user", "content": "Return a short migration checklist."}
  ],
  "reasoning_effort": "low",
  "verbosity": "medium",
  "max_completion_tokens": 500
}

If that works, add fields one by one. This makes the failing parameter obvious.
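
The add-one-field-at-a-time loop can be automated. The sketch below assumes a caller-supplied `send` function that resolves to true when the upstream accepts a payload and false when it rejects it. That function, and the name `findRejectedParams`, are hypothetical. No real network call is made here.

```javascript
// Add extra fields one at a time and record which ones the upstream rejects.
// `send` is a hypothetical async function: payload -> boolean (accepted?).
async function findRejectedParams(base, extras, send) {
  const rejected = [];
  let current = { ...base };
  for (const [key, value] of Object.entries(extras)) {
    const candidate = { ...current, [key]: value };
    if (await send(candidate)) {
      current = candidate; // keep the accepted field and build on it
    } else {
      rejected.push(key); // leave the field out and continue
    }
  }
  return { accepted: current, rejected };
}

// Stand-in sender for demonstration: rejects any payload with presence_penalty.
const fakeSend = async (payload) => !("presence_penalty" in payload);

findRejectedParams(
  { model: "gpt-5-mini", max_completion_tokens: 500 },
  { reasoning_effort: "low", presence_penalty: 0, verbosity: "medium" },
  fakeSend
).then((report) => console.log(report.rejected));
```

In a real gateway, `send` would wrap an actual request and separate parameter rejections from transient failures before returning false.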

For most production apps, this whitelist is enough:

js
const GPT5_ALLOWED = new Set([
  "model",
  "messages",
  "stream",
  "max_completion_tokens",
  "temperature",
  "top_p",
  "reasoning_effort",
  "verbosity",
  "tools",
  "tool_choice",
  "parallel_tool_calls",
  "response_format",
  "metadata"
]);

A strict gateway can remove anything outside the whitelist:

js
function whitelistGpt5Params(body) {
  const output = {};
  for (const [key, value] of Object.entries(body)) {
    if (GPT5_ALLOWED.has(key)) output[key] = value;
  }
  return output;
}

This approach is less flexible, but safer for enterprise traffic.
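
A whitelist that drops fields silently is hard to debug, so it can help to return the dropped keys as well. The sketch below is a variant under that assumption. The names `ALLOWED_FIELDS` and `splitByWhitelist` are made up for this example.

```javascript
// Same field list as GPT5_ALLOWED above.
const ALLOWED_FIELDS = new Set([
  "model", "messages", "stream", "max_completion_tokens",
  "temperature", "top_p", "reasoning_effort", "verbosity",
  "tools", "tool_choice", "parallel_tool_calls",
  "response_format", "metadata",
]);

// Split a request into the fields that pass and the names that were dropped.
function splitByWhitelist(body, allowed = ALLOWED_FIELDS) {
  const kept = {};
  const dropped = [];
  for (const [key, value] of Object.entries(body)) {
    if (allowed.has(key)) kept[key] = value;
    else dropped.push(key);
  }
  return { kept, dropped };
}

const report = splitByWhitelist({
  model: "gpt-5",
  max_tokens: 512,
  seed: 7,
  verbosity: "low",
});
console.log(report.dropped); // max_tokens and seed
```

Note that max_tokens is not on the list. Run the max_tokens conversion before the whitelist, or the caller's token limit is dropped along with the field.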

FAQ: GPT-5 API parameters

1. Should I use max_tokens or max_completion_tokens for GPT-5?

Use max_completion_tokens for GPT-5-style reasoning models. If your old app sends max_tokens, convert it to max_completion_tokens before routing the request.

2. Does GPT-5 support presence_penalty and frequency_penalty?

Some compatible routes may accept them, but production gateways should strip them for GPT-5. They are low-value for reasoning tasks and can trigger unsupported parameter errors on strict upstream routes.

3. What does reasoning_effort do in the GPT-5 API?

reasoning_effort controls how much reasoning the model should spend before answering. Use low for simple tasks, medium for normal work, and high for hard coding or analysis tasks.

4. What does verbosity do in GPT-5?

verbosity controls how detailed the answer should be. It is a style control, not a hard token cap. Use it together with max_completion_tokens.

5. How do I fix an unsupported parameter error with GPT-5?

Start with a minimal payload. Remove legacy fields such as presence_penalty, frequency_penalty, logprobs, top_logprobs, seed, and best_of. Convert max_tokens to max_completion_tokens. Then add fields back one at a time.
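
When the error body names the offending field, a gateway can remove it and retry. The sketch below assumes an OpenAI-style error envelope with `error.code`, `error.param`, and `error.message`. Other providers may shape errors differently. The sample error is a made-up illustration, not a captured log, and both function names are hypothetical.

```javascript
// Extract the rejected field name from an error body, or return null.
// Assumes an OpenAI-style envelope: { error: { code, param, message } }.
function unsupportedParamFrom(errorBody) {
  const err = errorBody && errorBody.error;
  if (!err || typeof err !== "object") return null;
  if (err.code === "unsupported_parameter" && typeof err.param === "string") {
    return err.param;
  }
  // Fallback: look for a quoted field name in the message text.
  const match = /unsupported parameter:?\s*'([^']+)'/i.exec(
    String(err.message ?? "")
  );
  return match ? match[1] : null;
}

// Return a copy of the body without the named field.
function withoutParam(body, param) {
  const { [param]: _removed, ...rest } = body;
  return rest;
}

// Illustrative error body (made up for this example).
const sampleError = {
  error: {
    code: "unsupported_parameter",
    param: "presence_penalty",
    message: "Unsupported parameter: 'presence_penalty' is not supported.",
  },
};

const badParam = unsupportedParamFrom(sampleError);
const retryBody = withoutParam({ model: "gpt-5", presence_penalty: 0 }, badParam);
console.log(badParam, Object.keys(retryBody));
```

Cap the number of strip-and-retry rounds, and log each removed field so the caller can fix the payload at the source.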

Final recommendation

GPT-5 API migration is mostly a request-normalization problem. The model can be powerful, but your production app needs stable payload rules.

Use this default strategy:

  • Prefer max_completion_tokens over max_tokens.
  • Keep reasoning_effort and verbosity.
  • Strip legacy penalty and logprob fields.
  • Treat stop and n as greylist fields, and keep temperature and top_p only with caution.
  • Test the smallest payload first.
  • Use a gateway when you need fallback, cost control, and multi-model routing.

That gives you the best balance: fewer unsupported parameter errors, cleaner GPT-5 requests, and more predictable production behavior.
