
Can Claude Code Build a World Cup 2026 Match Predictor? A Real Crazyrouter API Test
Can Claude Code Build a World Cup 2026 Match Predictor? A Real Crazyrouter API Test#
Claude Code is most interesting when it builds a working project, not when it only generates snippets. For this experiment, I used a Claude Code-style workflow to build a small World Cup 2026 match predictor: fixture data, a transparent Elo/Poisson prediction model, charts, and real LLM-generated match previews through Crazyrouter.
This is an analytics and developer workflow demo. It is not betting advice.
The live API layer was tested through:
Base URL: https://cn.crazyrouter.com/v1
Date: 2026-06-12 UTC
Endpoints tested:
- GET /v1/models
- POST /v1/chat/completions

Why this is a good Claude Code project#
A sports prediction dashboard is a useful coding-agent demo because it forces the agent to do more than write UI code:
- collect and normalize messy event data;
- define a transparent model;
- generate reproducible outputs;
- create charts;
- call an LLM API for structured explanations;
- handle model failure modes like malformed JSON.
That makes it a better article than a generic “Claude Code project ideas” list. The project produces real files, real numbers and real API traces.
Public data signals used#
For this demo, I used public fixture information available on World Cup schedule pages and focused on early tournament matches around June 11-18, 2026. The fetched pages confirmed the tournament format: 48 teams, 104 matches, 16 stadiums, group-stage fixtures, and early fixtures such as Mexico vs South Africa, South Korea vs Czechia, USA vs Paraguay and Brazil vs Morocco.
The local demo data is intentionally small and reproducible:
fixtures.json— nine early fixtures;team_ratings_seed.json— seed Elo-style ratings for demo purposes;predictions.json— model output for each match;crazyrouter_test_results.json— raw LLM API test summary.
The model is deliberately simple. A production system should replace the seed ratings with live Elo/FIFA ranking data, injury data, lineups, odds, travel, rest days and result updates.
Prediction method: transparent beats mystical#
The predictor combines three simple ideas:
- Elo-style rating difference for relative team strength;
- host boost for host-nation matches in relevant venues;
- Poisson expected goals for scoreline probabilities.
The core idea looks like this:
def expected_goals(rating_for, rating_against, host_boost=0):
diff = (rating_for + host_boost) - rating_against
return max(0.45, min(2.65, 1.28 + diff / 520))
Then the script enumerates likely scorelines, estimates home/draw/away probabilities, and saves a prediction object for each fixture.
Sample predictions#
Here are the first seven predictions from the demo run:
| Date | Match | Group | xG | Home / Draw / Away | Pick |
|---|---|---|---|---|---|
| 2026-06-11 | Mexico vs South Africa | A | 1.68-0.98 | 55.8% / 24.2% / 19.9% | Mexico |
| 2026-06-11 | South Korea vs Czechia | A | 1.35-1.21 | 40.1% / 26.6% / 33.3% | South Korea |
| 2026-06-12 | USA vs Paraguay | D | 1.53-1.14 | 48.2% / 25.5% / 26.3% | USA |
| 2026-06-13 | Brazil vs Morocco | C | 1.64-0.92 | 54.9% / 24.7% / 20.4% | Brazil |
| 2026-06-13 | Qatar vs Canada | B | 1.1-1.46 | 27.8% / 26.1% / 46.1% | Canada |
| 2026-06-14 | Germany vs Curaçao | E | 2.08-0.48 | 75.1% / 17.7% / 7.2% | Germany |
| 2026-06-14 | Netherlands vs Japan | F | 1.53-1.03 | 49.5% / 25.7% / 24.8% | Netherlands |

Notice the USA vs Paraguay example: the model gives the USA a slight edge, but not a dominant one. The home win probability is 48.2%, while draw plus Paraguay win is 51.8%. That is exactly the kind of uncertainty a good demo should preserve.
Crazyrouter real API test#
After generating model probabilities, the script asked four model routes to turn the USA vs Paraguay model output into a structured JSON match preview.
Task:
Generate structured match preview JSON for USA vs Paraguay using model output.
Do not give betting advice.
Return valid compact JSON only.
The Crazyrouter model-list endpoint also worked:
GET /v1/models
HTTP status: 200
Latency: 502 ms
Models returned: 262
The chat-completion test results:
| Model | HTTP | Latency | Prompt tokens | Completion tokens | Total tokens | Valid JSON |
|---|---|---|---|---|---|---|
gpt-4o-mini | 200 | 4.53s | 305 | 292 | 597 | True |
gpt-5.5 | 200 | 6.28s | 600 | 278 | 878 | True |
gemini-2.5-flash | 200 | 11.33s | 327 | 356 | 683 | False |
qwen-plus | 200 | 8.13s | 325 | 177 | 502 | True |

What the model outputs showed#
Three routes returned valid JSON:
gpt-4o-minigpt-5.5qwen-plus
One route returned HTTP 200 but failed JSON validation:
gemini-2.5-flash
That failure is not a problem for the article; it is the article. In production AI workflows, HTTP 200 is not enough. A coding agent should build validators, retries and fallback routes.
A valid output example from gpt-4o-mini:
{
"match": {
"date": "2026-06-12",
"group": "D",
"home": "USA",
"away": "Paraguay",
"venue": "Los Angeles Stadium"
},
"predicted_edge": "USA",
"probability_summary": {
"home_win": 0.482,
"draw": 0.255,
"away_win": 0.263
},
"key_factors": {
"home_rating": 1785,
"away_rating": 1710,
"home_host_boost": 55,
"away_host_boost": 0,
"xg_home": 1.53,
"xg_away": 1.14
},
"uncertainty": {
"top_scores": [
{"score": "1-1", "probability": 0.121},
{"score": "1-0", "probability": 0.107},
{"score": "2-1", "probability": 0.093},
{"score": "2-0", "probability": 0.081},
{"score": "0-1", "probability": 0.079}
]
},
"disclaimer": "Analytics demo only; not betting advice."
}
What Claude Code should build around the LLM call#
A robust Claude Code workflow should not simply call an LLM and trust the text. It should build an execution pipeline:
- read fixture and rating data;
- calculate deterministic probabilities;
- send compact model output to Crazyrouter;
- require JSON schema;
- validate JSON;
- retry or fallback when invalid;
- save raw responses for audit;
- render dashboard and article assets.
This is the practical reason to use an AI API gateway. You can route the same task through multiple models while keeping one API base URL and one client shape.
Minimal reproduction structure#
The demo folder contains:
generated/worldcup_predictor_claude_code_20260612/
├── build_worldcup_predictor.py
├── fixtures.json
├── team_ratings_seed.json
├── predictions.json
├── crazyrouter_test_results.json
├── crazyrouter_raw_gpt-4o-mini.json
├── crazyrouter_raw_gpt-5.5.json
├── crazyrouter_raw_gemini-2.5-flash.json
├── crazyrouter_raw_qwen-plus.json
├── image_urls.json
└── claude-code-world-cup-2026-match-predictor-crazyrouter.md
The key API client configuration is standard OpenAI-compatible code:
from openai import OpenAI
client = OpenAI(
api_key="YOUR_CRAZYROUTER_API_KEY",
base_url="https://cn.crazyrouter.com/v1",
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "Return only valid JSON. No betting advice."},
{"role": "user", "content": prompt},
],
temperature=0.2,
max_tokens=360,
)
Do not add UTM parameters to API base URLs. UTM belongs on human-facing links only.
Engineering lessons#
1. Claude Code is good at project scaffolding#
This is a strong Claude Code project because the deliverable is not one function. It is a pipeline with data files, modeling code, API calls, raw logs and visual assets.
2. Simple models are better for explainability#
For an article/demo, a transparent Elo/Poisson model is better than a black-box “AI says Brazil wins” answer. The LLM should explain the model, not replace the model.
3. JSON validation is mandatory#
The Gemini route returned HTTP 200 but invalid JSON in this constrained test. That is a real production lesson: validate the output and fallback automatically.
4. A gateway makes experiments easier#
The same request shape worked across GPT, Gemini and Qwen-style routes. That makes model comparison and fallback easier than wiring every provider directly.
Limitations#
This demo intentionally keeps the model small. It does not include:
- live injury reports;
- confirmed lineups;
- rest days and travel fatigue;
- bookmaker odds;
- live xG;
- player-level form;
- calibrated historical backtesting.
So the predictions should be read as a coding and data workflow demonstration, not as betting recommendations.
Next improvements#
A stronger version could add:
- automatic fixture updates from a World Cup API;
- real Elo/FIFA ranking ingestion;
- Monte Carlo group-stage simulation;
- odds movement monitoring;
- automatic fallback when JSON validation fails;
- a static web dashboard;
- daily World Cup prediction articles generated from the same pipeline.
Final verdict#
Yes: Claude Code can build a credible World Cup 2026 match predictor demo, as long as the project is framed correctly.
The right architecture is:
Claude Code builds the pipeline.
Deterministic code calculates probabilities.
Crazyrouter provides multi-model explanations and structured output.
Validators decide whether the LLM output is usable.
In this run, Crazyrouter returned 262 models from /v1/models, and three of four tested model routes produced valid JSON match previews. The failed JSON route is useful evidence too: real AI projects need validation and fallback, not just prompts.
If you want to build Claude Code projects that compare multiple AI models through one API layer, start with Crazyrouter.




