Expected Points from xG in Python: Build an xPts Table to Find Over- and Under-Achievers

Convert a season of match xG into the points each team deserved.

A league table is a record of what happened. It rewards the goal that took a deflection, the penalty won in stoppage time, the save that should not have been made. An expected-points table asks a different and often more useful question: given the chances each team created and conceded, how many points should they have taken? Build one in Python and you get a clean view of who is riding their luck and who is quietly being robbed by it — and the whole thing is about thirty lines of code.

What expected points means

Expected points, or xPts, converts a match's expected goals into the points a team would earn on average if that match were replayed many times. The logic runs in three steps. First, take the two xG totals — home and away. Second, turn them into the probability of each result: a home win, a draw, or an away win. Third, weight those probabilities by the points each result is worth:

Expected points for one match
xPts(home) = 3 × P(home win) + 1 × P(draw) + 0 × P(away win)
xPts(away) = 3 × P(away win) + 1 × P(draw) + 0 × P(home win)

A team that creates 1.8 xG against an opponent's 0.4 is very likely to win, so its xPts for that match sits well above 2 (about 2.4 under the Poisson model used below). A genuinely even game spreads the probability across all three results and lands each team at roughly 1.3 to 1.4 for typical xG levels. Sum a team's match xPts across a whole season and you have its expected-points total, the points its underlying performances deserved. Subtract that total from its real points and the gap is the story: positive means over-achieving, negative means under-achieving.
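To make the weighting concrete, here is a minimal sketch that applies the two formulas to a set of made-up result probabilities. The helper name `xpts_from_probs` and the probabilities themselves are illustrative assumptions, not part of the tutorial's script.

```python
def xpts_from_probs(p_home_win, p_draw, p_away_win):
    """Hypothetical helper: (home, away) expected points from result probabilities."""
    total = p_home_win + p_draw + p_away_win
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    home = 3 * p_home_win + 1 * p_draw   # a loss contributes 0 points
    away = 3 * p_away_win + 1 * p_draw
    return home, away


# Illustrative probabilities, not taken from any real match.
home, away = xpts_from_probs(0.50, 0.25, 0.25)
print(home, away)    # 1.75 1.0
print(home + away)   # 2.75, i.e. 3 minus the draw probability
```

Because a draw hands out two points in total rather than three, the two teams' xPts for a match always add up to 3 minus the draw probability. That makes a quick invariant to check whenever you change the probability engine.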

The one piece we still need is the middle step — turning two xG numbers into win, draw, and loss probabilities. For that we borrow the Poisson goals model.

From two xG numbers to match probabilities

The standard trick is to treat each team's xG as the mean of a Poisson distribution over goals, then build the matrix of every scoreline's probability. If you have not seen this machinery before, we derive it from scratch in "build a Poisson goals model in Python"; here we apply it directly to a single match's expected goals.

import numpy as np
from scipy.stats import poisson


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Win/draw/loss probabilities from a match's home and away xG.

    Treats each team's xG as a Poisson mean and assumes the two
    scores are independent (the simple model's key assumption).
    """
    goals = np.arange(0, max_goals + 1)
    home_probs = poisson.pmf(goals, home_xg)   # P(0), P(1), ... home goals
    away_probs = poisson.pmf(goals, away_xg)   # ... and away goals

    # Joint scoreline matrix: entry [i, j] = P(home i, away j).
    score_matrix = np.outer(home_probs, away_probs)

    home_win = np.tril(score_matrix, -1).sum()  # home goals > away goals
    draw     = np.trace(score_matrix)           # equal goals
    away_win = np.triu(score_matrix,  1).sum()  # away goals > home goals
    return home_win, draw, away_win


def match_xpts(home_xg, away_xg):
    """Expected points for the home and away team in one match."""
    hw, dr, aw = match_probabilities(home_xg, away_xg)
    home_xpts = 3 * hw + 1 * dr
    away_xpts = 3 * aw + 1 * dr
    return round(home_xpts, 3), round(away_xpts, 3)


# Sanity check: an even game should give both teams the same xPts.
print(match_xpts(1.4, 1.4))   # symmetric, each about 1.37
print(match_xpts(2.2, 0.5))   # home heavily favoured, home xPts about 2.46

The three outcome probabilities are read straight off the scoreline matrix: everything below the diagonal is a home win, the diagonal is a draw, everything above is an away win. Feed in 1.4 against 1.4 and the two xPts values come out identical, at about 1.37 each; feed in a lopsided 2.2 against 0.5 and the home figure climbs to roughly 2.5. That is the entire engine. Everything from here is bookkeeping over a season of matches.
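As an independent cross-check on the matrix arithmetic, the sketch below recomputes the same probabilities with explicit loops and only the standard library. The names `poisson_pmf` and `match_probabilities_plain` are hypothetical. It also reports how much probability falls outside the 0 to `max_goals` grid, which a truncated scoreline matrix leaves out.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(exactly k goals) for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def match_probabilities_plain(home_xg, away_xg, max_goals=10):
    """Win/draw/loss probabilities by looping over every scoreline h-a."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


hw, dr, aw = match_probabilities_plain(2.2, 0.5)
covered = hw + dr + aw           # probability mass inside the truncated grid
print(round(3 * hw + dr, 3))     # home xPts for the lopsided example
print(1 - covered)               # mass lost to truncation at max_goals
```

At typical match xG the lost mass is far too small to move an xPts figure. If you ever feed in very large means, raising `max_goals` is the simple remedy.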

A simulation alternative

If you would rather not lean on the Poisson closed form — or you want to extend the model later with correlated scores or extra-time rules — you can get the same probabilities by simulating. Draw a large number of random scorelines from the two Poisson means and count how often each result occurs.

def match_xpts_sim(home_xg, away_xg, n=200_000, seed=0):
    """Expected points by Monte Carlo simulation of scorelines."""
    rng = np.random.default_rng(seed)
    home_goals = rng.poisson(home_xg, n)
    away_goals = rng.poisson(away_xg, n)

    home_win = np.mean(home_goals > away_goals)
    draw     = np.mean(home_goals == away_goals)
    away_win = np.mean(home_goals < away_goals)

    home_xpts = 3 * home_win + 1 * draw
    away_xpts = 3 * away_win + 1 * draw
    return round(home_xpts, 3), round(away_xpts, 3)


print(match_xpts_sim(2.2, 0.5))   # should closely match the Poisson version

With a couple of hundred thousand simulated matches the answer typically lands within about 0.01 of the analytic one, though not always identical to the third decimal. The simulation is slower and slightly noisy, but it is easier to extend, and seeing the two methods agree is a reassuring check that neither has a bug.
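One way to put a number on that noise is to report the Monte Carlo standard error alongside the estimate. The sketch below is an illustration under the same independent-Poisson assumption; the function name `home_xpts_with_error` is hypothetical.

```python
import numpy as np


def home_xpts_with_error(home_xg, away_xg, n=200_000, seed=0):
    """Simulated home xPts plus its Monte Carlo standard error."""
    rng = np.random.default_rng(seed)
    home_goals = rng.poisson(home_xg, n)
    away_goals = rng.poisson(away_xg, n)

    # Points the home side earns in each simulated match: 3, 1 or 0.
    points = np.where(home_goals > away_goals, 3,
                      np.where(home_goals == away_goals, 1, 0))

    estimate = points.mean()
    std_error = points.std(ddof=1) / np.sqrt(n)
    return estimate, std_error


for n in (2_000, 20_000, 200_000):
    est, se = home_xpts_with_error(2.2, 0.5, n=n)
    print(f"n={n:>7,}  xPts={est:.3f}  +/- {se:.4f}")
```

The standard error shrinks with the square root of `n`, so ten times as many simulated matches cuts the noise by a factor of about three.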

Reading your own season of match xG

To build a season table you need one row per match with four pieces of information: the home team, the away team, the home xG, and the away xG. Export that from your own data source into a CSV that looks like the columns below, and the rest is automatic. The toy table here is invented purely so the script runs end to end — replace it with your real file and nothing else changes.

import pandas as pd

# Expected CSV columns: home, away, home_xg, away_xg  (one row per match).
# In practice:  matches = pd.read_csv("season_xg.csv")
#
# --- TOY DATA (invented, for illustration only) -------------------
matches = pd.DataFrame([
    # home,    away,     home_xg, away_xg
    ("Reds",   "Blues",  2.1, 0.6),
    ("Greens", "Reds",   0.9, 1.7),
    ("Blues",  "Greens", 1.3, 1.2),
    ("Reds",   "Greens", 2.4, 0.5),
    ("Blues",  "Reds",   0.8, 1.5),
    ("Greens", "Blues",  1.1, 1.0),
], columns=["home", "away", "home_xg", "away_xg"])

One honesty note before we run it: these team names and xG values are fictional, picked only to make the code executable. Do not read them as a real league. When you point the script at genuine match xG — your own export from a public source — the numbers become meaningful; until then they are scaffolding.
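Before pointing the script at a real export, it can help to check the file's shape up front. The following is a minimal sketch of such a check; `load_matches` and the specific validation rules are assumptions for illustration, and the CSV text is invented.

```python
import io

import pandas as pd

REQUIRED = ["home", "away", "home_xg", "away_xg"]


def load_matches(source):
    """Hypothetical loader: read match xG and fail early on malformed input."""
    matches = pd.read_csv(source)

    missing = [col for col in REQUIRED if col not in matches.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    xg = matches[["home_xg", "away_xg"]]
    if xg.isna().any().any() or (xg < 0).any().any():
        raise ValueError("xG values must be present and non-negative")

    if (matches["home"] == matches["away"]).any():
        raise ValueError("a team cannot play itself")

    return matches[REQUIRED]


# Invented rows standing in for a real season_xg.csv export.
toy_csv = io.StringIO(
    "home,away,home_xg,away_xg\n"
    "Reds,Blues,2.1,0.6\n"
    "Greens,Reds,0.9,1.7\n"
)
matches = load_matches(toy_csv)
print(matches)
```

In practice you would pass a file path such as `"season_xg.csv"` instead of the in-memory text; `pd.read_csv` accepts either.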

Summing into an xPts table

Now walk every match, compute both teams' xPts, and accumulate per team. The same loop can also tally actual points from the scorelines if you have them, so that the final table puts expected and real points side by side.

from collections import defaultdict

xpts_total = defaultdict(float)

for row in matches.itertuples(index=False):
    h_xpts, a_xpts = match_xpts(row.home_xg, row.away_xg)
    xpts_total[row.home] += h_xpts
    xpts_total[row.away] += a_xpts

table = (
    pd.Series(xpts_total, name="xpts")
      .sort_values(ascending=False)
      .round(2)
      .rename_axis("team")
      .reset_index()
)
print(table)

That prints an expected-points league table: each team and the points its chances created and conceded say it deserved. If your data source also gives you final scores, add a parallel tally of real points in the same loop and merge it in, so you can compute the column that actually matters — the gap.
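A sketch of that parallel tally follows. It assumes your data also carries final scores in two columns named `home_goals` and `away_goals` (hypothetical names, as are the invented scorelines), and it restates `match_xpts` compactly, without the per-match rounding, so the block runs on its own.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from scipy.stats import poisson


def match_xpts(home_xg, away_xg, max_goals=10):
    """Compact restatement of the Poisson xPts calculation from earlier."""
    goals = np.arange(max_goals + 1)
    m = np.outer(poisson.pmf(goals, home_xg), poisson.pmf(goals, away_xg))
    home_win, draw, away_win = np.tril(m, -1).sum(), np.trace(m), np.triu(m, 1).sum()
    return 3 * home_win + draw, 3 * away_win + draw


# Invented matches; home_goals / away_goals are assumed extra columns.
matches = pd.DataFrame(
    [("Reds", "Blues", 2.1, 0.6, 1, 1),
     ("Greens", "Reds", 0.9, 1.7, 0, 2),
     ("Blues", "Greens", 1.3, 1.2, 2, 1)],
    columns=["home", "away", "home_xg", "away_xg", "home_goals", "away_goals"],
)

xpts_total, pts_total = defaultdict(float), defaultdict(int)
for row in matches.itertuples(index=False):
    h_xpts, a_xpts = match_xpts(row.home_xg, row.away_xg)
    xpts_total[row.home] += h_xpts
    xpts_total[row.away] += a_xpts

    # Real points: 3 for a win, 1 each for a draw, 0 for a defeat.
    if row.home_goals > row.away_goals:
        h_pts, a_pts = 3, 0
    elif row.home_goals == row.away_goals:
        h_pts, a_pts = 1, 1
    else:
        h_pts, a_pts = 0, 3
    pts_total[row.home] += h_pts
    pts_total[row.away] += a_pts

# Both Series are indexed by team name, so they align automatically.
table = (
    pd.DataFrame({"xpts": pd.Series(xpts_total), "pts": pd.Series(pts_total)})
      .round({"xpts": 2})
      .rename_axis("team")
      .reset_index()
      .sort_values("xpts", ascending=False)
)
print(table)
```

Building both columns from Series keyed by team name avoids a separate merge step. If your real points come from a different file, `DataFrame.merge` on the team column does the same job.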

Finding the over- and under-achievers

The interesting number is real points minus expected points. A large positive gap flags a team taking more from its matches than the underlying play warrants — clinical finishing, hot goalkeeping, a run of won coin-flips — and such teams tend to regress. A large negative gap flags the reverse: a side performing better than its results, with reason to expect improvement. Assuming you have merged a pts column of real points, the comparison is one line.

# Assumes `table` now also has a real-points column named "pts".
table["pts_minus_xpts"] = (table["pts"] - table["xpts"]).round(2)
table = table.sort_values("pts_minus_xpts", ascending=False)

print("Over-achievers (luckiest):")
print(table.head())
print("\nUnder-achievers (most unlucky):")
print(table.tail())

Sorted this way, the top of the table is the teams living on borrowed luck and the bottom is the teams being punished by it. Both are flags rather than verdicts (a positive gap can also reflect genuine, repeatable finishing skill), but they are the right place to start asking questions. This is the same diagnostic we build without code in "build an xG-difference league table"; the Python version simply scales to a full season of match-level files and lets you swap the probability engine.

Where xPts can mislead

Two caveats keep the table honest. First, it inherits every limitation of the xG model feeding it: if the underlying expected-goals figures are shaky, so is everything downstream. Second, the gap between real and expected points is not pure luck. Part of it is finishing skill, part is goalkeeping quality, part is game state, and only the remainder is noise. A persistent over-performance across several seasons is more likely to be a real edge than a fluke. Different modellers split that gap differently, which is one concrete reason two reputable projections of the same league can disagree; we lay out those tensions in "why league projection models disagree". Treat the xPts table as a sharper lens on the season, not a replacement for the one that is actually played.

Sources & further reading

  • Free textbook: Chapter 9: Expected Threat (xT) and Ball Progression — the theory behind this, at DataField.dev.
  • Understat — match-level xG for the major European leagues, the natural source for the CSV this tutorial reads.
  • FBref — match logs with xG (via Opta) for a wide range of competitions.
  • StatsBomb open data — event data you can aggregate into per-match xG totals yourself.
  • StatsBomb — background on xG modelling and what shot features drive chance quality.