What Actually Predicts a Deep World Cup Run
The predictors that hold up, and the ones that fool everyone.
Every World Cup produces the same genre of pre-tournament writing: a team has momentum, a team is peaking at the right time, a team has the feel-good factor. Most of it is noise. When you ask the harder question — what features measured before a tournament actually correlate with how far a team goes — the list is short, a little boring, and mostly about quality you could have measured years earlier. Here is what holds up, and what reliably fools people.
The single best predictor is also the dullest
If you are allowed one number, take the team's pre-tournament power rating. Across past tournaments, the sides that go deep are overwhelmingly the sides that were strong going in: the ones with the best players and a record built over a long sample of results against good opposition. This is not a profound insight, but it is the one most often abandoned in favour of a story. A well-built Elo- or SPI-style rating already compresses squad quality, the recent results that actually matter, and strength of opposition into a single defensible estimate, which is exactly why the forecasting pipeline starts there; see how models rate the field before a World Cup is played and the mechanics in soccer power ratings: Elo, SPI and why they disagree.
Everything else on the list of genuine predictors is, in effect, a refinement of "is this team good?" — picking up the specific kinds of good that a tournament rewards.
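To make the rating idea concrete, here is a minimal sketch of the standard Elo expected-score formula and a single-match update. The ratings and the K value of 20 are hypothetical, chosen only for illustration; real systems add refinements (home advantage, margin of victory, match importance) that are left out here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score for team A (win = 1, draw = 0.5, loss = 0),
    on the conventional 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a: float, rating_b: float,
                   score_a: float, k: float = 20.0) -> float:
    """Team A's rating after one match. k is an illustrative assumption."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Hypothetical ratings: a contender against a mid-table side.
contender, outsider = 2000.0, 1800.0
p_contender = expected_score(contender, outsider)

# Beating a weaker side moves the rating only a little, because the
# result was already expected.
after_win = updated_rating(contender, outsider, score_a=1.0)
```

The point of the sketch is the shape of the update: the rating moves in proportion to surprise, so a long sample of results against good opposition is what builds a high number.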
Depth, because a tournament is a war of attrition
A World Cup is not one match; the winner now plays eight, increasingly in heat and on short rest, with a longer bracket than past editions. That structure rewards the bench as much as the starting eleven. A side whose drop-off from first-choice to replacement is gentle can absorb a suspension, a hamstring, or a third match in nine days without collapsing; a top-heavy side cannot. The case is strong enough that it has its own article: squad depth and the five-sub era. The five-substitute rule, now permanent, amplifies the effect: managers can change half of the outfield, so having a useful twelfth through sixteenth man is worth more than it used to be, a shift traced in the five-substitutions era.
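One rough way to put a number on "drop-off" is to compare the average quality of the best eleven with the average of the next five, matching the five-substitute allowance. The sketch below assumes you already have some per-player quality score; the function name, the 0-100 scale, and both squads are hypothetical.

```python
def depth_dropoff(player_ratings, starters=11, bench=5):
    """Mean rating of the best `starters` players minus the mean rating
    of the next `bench` players. Smaller means a gentler drop-off."""
    ranked = sorted(player_ratings, reverse=True)
    if len(ranked) < starters + bench:
        raise ValueError("need at least starters + bench player ratings")
    first_choice = ranked[:starters]
    cover = ranked[starters:starters + bench]
    return sum(first_choice) / starters - sum(cover) / bench


# Hypothetical squads on an arbitrary 0-100 quality scale.
deep_squad = [80] * 11 + [78] * 5        # slightly weaker XI, strong cover
top_heavy_squad = [84] * 11 + [70] * 5   # better XI, thin behind it

deep_gap = depth_dropoff(deep_squad)            # 2.0
top_heavy_gap = depth_dropoff(top_heavy_squad)  # 14.0
```

A single gap figure is crude (it ignores positions entirely), but it is enough to separate a side that can rotate from one that cannot.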
A goalkeeper who steals matches
Knockout football is low-scoring and high-variance, which hands an outsized role to the one player who can win a tie almost single-handedly. A goalkeeper having a hot tournament, conceding fewer goals than the post-shot xG of the shots faced says they "should", is a recurring feature of deep runs and shoot-out survival. It is partly skill and partly a hot streak that will not repeat, but over a six-or-seven-match run a keeper a notch above expectation can be the difference between the quarter-finals and the final. The metric that captures this is explained in our coverage of post-shot xG and goalkeeping; in shoot-outs specifically, keeper preparation is its own edge, covered in preparing for penalties.
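The arithmetic behind the post-shot xG measure is simple: add up the post-shot xG of the on-target shots a keeper faced and subtract the goals actually conceded. A minimal sketch follows; the `ShotFaced` type and the shot values are hypothetical, not real match data.

```python
from dataclasses import dataclass


@dataclass
class ShotFaced:
    psxg: float   # post-shot xG: estimated chance this on-target shot scores
    goal: bool    # whether it actually went in


def goals_prevented(shots):
    """Post-shot xG faced minus goals conceded.
    Positive means the keeper conceded fewer than expected."""
    expected = sum(s.psxg for s in shots)
    conceded = sum(1 for s in shots if s.goal)
    return expected - conceded


# Hypothetical knockout match: five shots on target, one goal conceded.
shots = [
    ShotFaced(0.30, False),
    ShotFaced(0.70, True),
    ShotFaced(0.45, False),
    ShotFaced(0.15, False),
    ShotFaced(0.60, False),
]
prevented = goals_prevented(shots)  # about 1.2 goals better than expected
```

Over a handful of matches this number is noisy, which is exactly the "partly skill, partly hot streak" caveat above.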
The draw: luck you can measure
Two equally strong teams can have very different expected finishes purely because of the bracket they land in. A favourable group and a soft side of the knockout tree can add a real chunk to a team's title and semi-final probabilities, while an equally good side buried among other contenders sees its odds cut. This is not a flaw in the tournament; it is variance you can quantify, and it is precisely why forecasters simulate the specific bracket rather than reason about it in the abstract — the logic is laid out in how a World Cup simulation works. The practical lesson: when comparing two contenders, look at the path, not just the rating.
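The effect of the path can be shown with a small Monte Carlo sketch: two teams with identical ratings, one placed in a soft half of an eight-team knockout and one in a hard half. Everything here is an illustrative assumption: the team names, the ratings, the Elo-style win probability, and the simplification that every tie has a winner with no separate shoot-out model.

```python
import random


def win_prob(rating_a, rating_b):
    """Elo-style probability that A beats B in a tie that must have a winner."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_bracket(bracket, ratings, rng):
    """bracket: team names in bracket order (length a power of two).
    Adjacent teams meet each round; returns the champion."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            a_wins = rng.random() < win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        alive = next_round
    return alive[0]


def title_odds(bracket, ratings, n_sims=20000, seed=0):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    wins = {team: 0 for team in bracket}
    for _ in range(n_sims):
        wins[play_bracket(bracket, ratings, rng)] += 1
    return {team: w / n_sims for team, w in wins.items()}


# Hypothetical field: A and B are equally rated, but A's half is soft.
ratings = {"A": 1900, "B": 1900,
           "W1": 1600, "W2": 1600, "W3": 1600,
           "S1": 1850, "S2": 1850, "S3": 1850}
bracket = ["A", "W1", "W2", "W3", "B", "S1", "S2", "S3"]
odds = title_odds(bracket, ratings)
```

With these made-up numbers, A's simulated title share comes out well above B's even though the two ratings are identical, which is the whole argument for simulating the specific bracket.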
What does not predict a deep run
The mirror image of the list above is the set of features that grab attention and carry little signal. Friendlies and warm-up form top the list. Pre-tournament fixtures are played at half-throttle, with experimental line-ups and managers hiding their hand; the results are a poor guide to anything, a point our work on whether pre-season form predicts anything makes at the club level and which transfers directly. A 4–0 win over a weak opponent the week before a tournament shifts public opinion far more than it shifts a sensible rating.
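One common way a rating system encodes "friendlies matter less" is to scale the update by match importance. The sketch below uses an Elo-style update with a different K weight per match type; the specific weights and ratings are assumptions for illustration, and the margin of victory is deliberately ignored.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score for team A on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical match-importance weights, assumed for this example only.
K_BY_MATCH_TYPE = {"friendly": 20.0, "qualifier": 40.0, "world_cup": 60.0}


def rating_change(rating, opp_rating, score, match_type):
    """Points gained (or lost) from one result, scaled by match importance."""
    k = K_BY_MATCH_TYPE[match_type]
    return k * (score - expected_score(rating, opp_rating))


# A strong side (1900) beats a weak one (1600): same result, different stakes.
friendly_gain = rating_change(1900, 1600, 1.0, "friendly")
finals_gain = rating_change(1900, 1600, 1.0, "world_cup")
```

Because the win was already heavily expected and the friendly weight is low, the rating gains only about three points here, against a gap of 300 between the two sides. That is the sense in which a warm-up thrashing shifts opinion more than it shifts a sensible rating.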
Momentum is the next culprit — the idea that a team arriving "in form" carries that form into the tournament. The honest reading is that most of what looks like momentum is either pre-existing quality (already in the rating) or a hot streak that regresses. A run of wins against modest qualifying opposition tells you less than the breathless coverage implies. And recency in general — the last vivid result, the latest squad headline — is weighted by human observers far above its true predictive value, which is one reason a disciplined model, indifferent to narrative, often disagrees with the consensus in useful ways. For the broader point about why models and intuition diverge, see why league projection models disagree.
Putting it together without fooling yourself
A defensible pre-tournament read on any team is roughly: start from the power rating, adjust upward for genuine depth and a goalkeeper who can win ties, adjust for the kindness or cruelty of the draw, and then resist almost every temptation to move further on the basis of friendlies, narrative, or the last thing you saw. None of this names a winner — and it should not, because a short tournament is variance-soaked by design. What it does is keep your expectations anchored to the features that have actually mattered, rather than the ones that merely feel like they should. The dark-horse case, which is really the search for a team whose quality the consensus has under-weighted, builds directly on this foundation in how to spot a dark horse.
Sources & further reading
- Free textbook: Chapter 20: Predictive Modeling — the theory behind this, at DataField.dev.
- StatsBomb — event data and research on chance quality, goalkeeping (post-shot xG) and squad-level performance.
- FBref — international xG, xGA, squad usage and minutes, useful for assessing depth and keeper performance.
- ClubElo — a worked example of a results-based rating system, the backbone of any "how good are they really" estimate.
- FIFA — tournament format, scheduling structure and the official world ranking.


