Chapter 8: Expected Assists & Chance Creation

Measuring Creativity

Most goals are set up by a teammate. While xG transformed how we evaluate finishers, Expected Assists (xA) and related metrics have reshaped how we assess creative players. This chapter explores the full spectrum of chance creation analytics.

Assists are even noisier than goals—they depend on both the pass quality AND the teammate's finishing. A perfect through ball converted into a tap-in goal earns the same "assist" as a simple square pass that a striker bangs in from 25 yards. xA separates the quality of the chance created from the quality of the finish.
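A minimal simulation makes this noise concrete. It holds a creator's chances fixed and lets only the finishing vary. It assumes each chance is scored with probability equal to its xG, whoever shoots. The 80 chances worth 0.12 xG each are hypothetical.

```python
import random


def simulate_assists(chance_xg, n_seasons=10_000, seed=0):
    """Simulate assist totals for one fixed set of created chances.

    Assumption: each chance is converted independently with probability
    equal to its xG, so finisher quality is ignored.
    Returns one simulated assist total per season.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_seasons):
        # random() is in [0, 1), so a chance with xG p scores with probability p
        totals.append(sum(1 for xg in chance_xg if rng.random() < xg))
    return totals


# Hypothetical creator: 80 chances at 0.12 xG each, i.e. 9.6 xA
chances = [0.12] * 80
totals = sorted(simulate_assists(chances))
xa = sum(chances)
mean = sum(totals) / len(totals)
low = totals[len(totals) // 20]        # roughly the 5th percentile
high = totals[-(len(totals) // 20)]    # roughly the 95th percentile

print(f"xA = {xa:.1f}, mean simulated assists = {mean:.2f}")
print(f"Middle ~90% of seasons: {low} to {high} assists")
```

The gap between `low` and `high` is the range of assist totals that identical passing could plausibly produce. This finishing-driven spread is what xA removes.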

Why Chance Creation Metrics Matter

Fair Creator Evaluation: Credit creators for pass quality, not teammate finishing
Identify Playmakers: Find players who create chances even without assist totals
Scouting: Discover undervalued creators whose teammates don't finish well
Tactical Analysis: Understand how teams create chances

Expected Assists (xA)

xA is the xG value of shots resulting from a player's passes. If you play a pass that leads to a 0.3 xG shot, you earn 0.3 xA—regardless of whether the shot goes in.
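This definition maps directly onto a join. In StatsBomb data, each shot stores the id of the pass that set it up. statsbombpy exposes this as `shot_key_pass_id` in its flattened events. The sketch below assumes those column names, but the rows are made up.

```python
import pandas as pd

# Hypothetical toy events: three passes and three shots
passes = pd.DataFrame({
    "id": ["p1", "p2", "p3"],
    "player": ["Creator A", "Creator A", "Creator B"],
})
shots = pd.DataFrame({
    "id": ["s1", "s2", "s3"],
    "shot_key_pass_id": ["p1", "p2", None],  # s3 was unassisted
    "shot_statsbomb_xg": [0.30, 0.05, 0.40],
    "shot_outcome": ["Saved", "Goal", "Goal"],
})


def expected_assists(passes, shots):
    """Credit each assisted shot's xG to the player who made its key pass."""
    linked = shots.dropna(subset=["shot_key_pass_id"]).merge(
        passes, left_on="shot_key_pass_id", right_on="id",
        suffixes=("_shot", "_pass"),
    )
    return linked.groupby("player").agg(
        xA=("shot_statsbomb_xg", "sum"),
        assists=("shot_outcome", lambda s: (s == "Goal").sum()),
    )


print(expected_assists(passes, shots))
```

Creator A earns 0.35 xA and 1 assist from two chances. The unassisted shot `s3` adds to nobody's xA, which is why a team's xA is usually below its xG.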

xA vs. Traditional Assists

Traditional Assists - Problems

Binary outcome (0 or 1)
Depends on teammate finishing
Simple tap-in pass = brilliant through ball
High variance, low predictive power

xA - Advantages

Continuous scale (0.00 to ~0.95)
Independent of finishing quality
Values pass quality appropriately
More stable, better predictor

# Calculate Expected Assists
from statsbombpy import sb
import pandas as pd

# Load all World Cup 2022 matches
matches = sb.matches(competition_id=43, season_id=106)

all_events = []
for mid in matches["match_id"]:
    events = sb.events(match_id=mid)
    events["match_id"] = mid
    all_events.append(events)
events_df = pd.concat(all_events, ignore_index=True)

# Get shots with xG
shots = events_df[events_df["type"] == "Shot"].copy()

# Key passes are passes that led to shots.
# StatsBomb flags them with pass_shot_assist, or pass_goal_assist
# when the shot was scored.
key_passes = events_df[
    (events_df["type"] == "Pass")
    & ((events_df["pass_shot_assist"] == True)
       | (events_df["pass_goal_assist"] == True))
].copy()

print(f"Total key passes (created shots): {len(key_passes)}")
print(f"Total assists (led to goals): {key_passes['pass_goal_assist'].eq(True).sum()}")

# Player-level xA needs each key pass matched to the xG of the shot it
# created (pass id == the shot's shot_key_pass_id).
# Simplified here: team-level chance creation from shots
team_xa = shots.groupby("team").agg(
    shots=("shot_statsbomb_xg", "count"),
    total_xG=("shot_statsbomb_xg", "sum"),
    goals=("shot_outcome", lambda x: (x == "Goal").sum()),
).reset_index()

# Team xG is an upper bound on the xA its players generated:
# unassisted shots (penalties, rebounds, solo runs) have no key pass
print("\nTeam Chance Creation:")
print(team_xa.sort_values("total_xG", ascending=False))

# Calculate Expected Assists
library(StatsBombR)
library(dplyr)

# Load World Cup data
comps <- FreeCompetitions() %>%
  filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)
events <- allclean(events)  # splits location lists into location.x, location.y, etc.

# Find passes that led to shots
# Each shot stores the id of the pass that created it (shot.key_pass_id),
# which links the shot back to its passer

# Get shots with their xG
shots <- events %>%
  filter(type.name == "Shot") %>%
  select(id, match_id, player.name, team.name, shot.statsbomb_xg,
         shot.outcome.name, shot.key_pass_id)

# Get key passes (passes leading to shots)
key_passes <- events %>%
  filter(type.name == "Pass") %>%
  select(id, match_id, player.name, team.name,
         pass.goal_assist, pass.shot_assist,
         location.x, location.y,
         pass.end_location.x, pass.end_location.y)

# Link key passes to shots
# pass.shot_assist is TRUE for passes leading to shots
shot_assists <- key_passes %>%
  filter(pass.shot_assist == TRUE | pass.goal_assist == TRUE)

# Team-level xA: total xG of shots that were set up by a key pass.
# Crediting individual passers requires joining shot.key_pass_id to the
# pass id (as in the Exercise 8.1 solution).
team_xa <- shots %>%
  filter(!is.na(shot.key_pass_id)) %>%
  group_by(team.name) %>%
  summarise(
    chances_created = n(),
    total_xA = sum(shot.statsbomb_xg, na.rm = TRUE),
    goals_from_chances = sum(shot.outcome.name == "Goal")
  )

print("Team Chance Creation (xA):")
print(team_xa %>% arrange(desc(total_xA)))

Player xA Analysis

# Detailed player chance-creation analysis (key passes by type)
passes = events_df[events_df["type"] == "Pass"].copy()

# Key passes only
key_passes = passes[
    (passes["pass_shot_assist"] == True)
    | (passes["pass_goal_assist"] == True)
].copy()

# Player creativity stats (flag columns hold True or NaN)
player_creativity = key_passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),  # matches with at least one key pass
    key_passes=("type", "count"),
    assists=("pass_goal_assist", lambda x: x.fillna(False).sum()),
    crosses_to_shots=("pass_cross", lambda x: x.fillna(False).sum()),
    through_balls=("pass_through_ball", lambda x: x.fillna(False).sum()),
    cutbacks=("pass_cut_back", lambda x: x.fillna(False).sum()),
).reset_index()

player_creativity = player_creativity[player_creativity["key_passes"] >= 3].copy()
player_creativity["kp_per_match"] = (
    player_creativity["key_passes"] / player_creativity["matches"]).round(2)
player_creativity["assist_conv"] = (
    player_creativity["assists"] / player_creativity["key_passes"] * 100).round(1)

print("Top Chance Creators (Key Passes):")
print(player_creativity.sort_values("key_passes", ascending=False).head(15))

# Detailed player chance-creation analysis
# Counts key passes by type; full xA needs the shot.key_pass_id join

# Identify all pass -> shot connections
passes_to_shots <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  select(pass_id = id, passer = player.name, passer_team = team.name,
         match_id, minute,
         pass.goal_assist, pass.shot_assist,
         pass.cross, pass.through_ball, pass.cut_back)

# Match with shot xG (would need shot.key_pass_id linkage)
# For demo, we calculate key pass stats

player_creativity <- passes_to_shots %>%
  group_by(passer, passer_team) %>%
  summarise(
    matches = n_distinct(match_id),
    key_passes = n(),
    assists = sum(pass.goal_assist, na.rm = TRUE),
    crosses_to_shots = sum(pass.cross, na.rm = TRUE),
    through_balls_to_shots = sum(pass.through_ball, na.rm = TRUE),
    cutbacks_to_shots = sum(pass.cut_back, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(key_passes >= 3) %>%
  mutate(
    key_passes_per_match = round(key_passes / matches, 2),
    assist_conversion = round(assists / key_passes * 100, 1)
  ) %>%
  arrange(desc(key_passes))

print("Top Chance Creators (Key Passes):")
print(head(player_creativity, 15))

xA Overperformance

Just like with xG, players can over- or underperform their xA. Assist totals far above xA usually regress toward it over time, and totals far below it tend to recover:

# xA vs Actual Assists analysis
import pandas as pd

# Illustrative season data (use actual xA from FBref/StatsBomb)
creator_performance = pd.DataFrame({
    "player": ["De Bruyne", "Bruno Fernandes", "Odegaard", "Saka", "Maddison", "Trent AA"],
    "team": ["Man City", "Man United", "Arsenal", "Arsenal", "Tottenham", "Liverpool"],
    "assists": [16, 8, 10, 11, 12, 4],
    "xA": [14.2, 10.5, 8.8, 7.2, 9.1, 7.8],
    "key_passes": [115, 95, 78, 68, 82, 92],
})

creator_performance["assists_minus_xA"] = (
    creator_performance["assists"] - creator_performance["xA"])
creator_performance["xa_per_kp"] = (
    creator_performance["xA"] / creator_performance["key_passes"]).round(3)
creator_performance["assist_rate"] = (
    creator_performance["assists"] / creator_performance["key_passes"] * 100).round(1)

print("Creator Performance vs xA:")
print(creator_performance.sort_values("assists_minus_xA", ascending=False)[
    ["player", "assists", "xA", "assists_minus_xA", "assist_rate"]])

print("\nInterpretation:")
print("- Saka: +3.8 over xA - teammates finishing well, may regress")
print("- De Bruyne: +1.8 - slight overperformance, sustainable")
print("- Bruno: -2.5 under xA - unlucky, should bounce back")
print("- Trent: -3.8 under xA - teammates missing his chances")

# xA vs Actual Assists analysis
# When Assists >> xA, likely to regress
# When Assists << xA, bounce-back candidate

# Simulated season data (use actual xA from FBref/StatsBomb)
creator_performance <- data.frame(
  player = c("De Bruyne", "Bruno Fernandes", "Odegaard",
             "Saka", "Maddison", "Trent AA"),
  team = c("Man City", "Man United", "Arsenal",
           "Arsenal", "Tottenham", "Liverpool"),
  assists = c(16, 8, 10, 11, 12, 4),
  xA = c(14.2, 10.5, 8.8, 7.2, 9.1, 7.8),
  key_passes = c(115, 95, 78, 68, 82, 92)
)

creator_performance <- creator_performance %>%
  mutate(
    assists_minus_xA = assists - xA,
    xa_per_kp = round(xA / key_passes, 3),
    assist_rate = round(assists / key_passes * 100, 1)
  )

print("Creator Performance vs xA:")
print(creator_performance %>%
        select(player, assists, xA, assists_minus_xA, assist_rate) %>%
        arrange(desc(assists_minus_xA)))

cat("\nInterpretation:\n")
cat("- Saka: +3.8 over xA - teammates finishing well, may regress\n")
cat("- De Bruyne: +1.8 - slight overperformance, sustainable\n")
cat("- Bruno: -2.5 under xA - unlucky, should bounce back\n")
cat("- Trent: -3.8 under xA - teammates missing his chances\n")

Shot-Creating Actions (SCA)

Key passes only capture the final pass. But what about the dribble that beat a defender before the pass? Or the ball recovery that started the attack? Shot-Creating Actions capture more of the creative process.

What Counts as an SCA?

Shot-Creating Action (SCA): The two offensive actions directly leading to a shot.

Pass leading to shot (key pass)
Pass leading to the key pass
Dribble leading to shot
Foul won leading to shot
Defensive action that starts the sequence
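The definition above can be sketched as a walk backwards from each shot. This is a minimal illustration on hypothetical event dicts, not FBref's implementation. The set of qualifying event types is an assumption, and success filters (such as counting only completed dribbles) are omitted.

```python
# Action types that can count as a creating action. This set is an
# assumption, loosely modeled on FBref's SCA categories (passes, take-ons,
# fouls drawn, shots leading to rebounds, defensive actions) and using
# StatsBomb-style event names.
CREATING_TYPES = {"Pass", "Dribble", "Foul Won", "Shot",
                  "Interception", "Ball Recovery"}


def creating_actions(events, goals_only=False):
    """Count SCA (or GCA with goals_only=True) per player.

    `events` is a chronologically ordered list of dicts with keys
    "type", "team", "possession", "player" and, for shots, "outcome".
    For each shot, walk backwards and credit up to two creating actions
    by the shooting team within the same possession.
    """
    credit = {}
    for i, ev in enumerate(events):
        if ev["type"] != "Shot":
            continue
        if goals_only and ev.get("outcome") != "Goal":
            continue
        found = 0
        for prev in reversed(events[:i]):
            if found == 2 or prev["possession"] != ev["possession"]:
                break
            if prev["team"] == ev["team"] and prev["type"] in CREATING_TYPES:
                credit[prev["player"]] = credit.get(prev["player"], 0) + 1
                found += 1
    return credit


# Hypothetical sequence: A3's carry is not a creating type, and A1's
# recovery would be a third action, so neither is credited
events = [
    {"type": "Ball Recovery", "team": "A", "possession": 1, "player": "A1"},
    {"type": "Pass", "team": "A", "possession": 1, "player": "A1"},
    {"type": "Pass", "team": "A", "possession": 1, "player": "A2"},
    {"type": "Carry", "team": "A", "possession": 1, "player": "A3"},
    {"type": "Shot", "team": "A", "possession": 1, "player": "A3", "outcome": "Goal"},
    {"type": "Pass", "team": "B", "possession": 2, "player": "B1"},
    {"type": "Shot", "team": "B", "possession": 2, "player": "B2", "outcome": "Saved"},
]

print("SCA:", creating_actions(events))
print("GCA:", creating_actions(events, goals_only=True))
```

With `goals_only=True`, the same walk-back yields GCA. Every goal is also a shot, so a player's GCA count can never exceed their SCA count.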

# Calculate Shot-Creating Actions (simplified)
# True SCA = the two offensive actions before each shot. This version
# approximates it with key passes plus all completed dribbles.

# Passes that created shots (goal assists are flagged separately)
sca_passes = events_df[
    (events_df["type"] == "Pass")
    & ((events_df["pass_shot_assist"] == True)
       | (events_df["pass_goal_assist"] == True))
].copy()

# Completed dribbles (not all of these lead to shots)
sca_dribbles = events_df[
    (events_df["type"] == "Dribble") & (events_df["dribble_outcome"] == "Complete")
].copy()

# Combine for SCA totals
sca_by_player = pd.concat([
    sca_passes.assign(sca_type="pass"),
    sca_dribbles.assign(sca_type="dribble"),
]).groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    sca_passes=("sca_type", lambda x: (x == "pass").sum()),
    sca_dribbles=("sca_type", lambda x: (x == "dribble").sum()),
).reset_index()

# Weight: passes count fully, dribbles count 1/3 (an arbitrary discount,
# since most completed dribbles do not end in a shot)
sca_by_player["total_sca"] = (
    sca_by_player["sca_passes"] + sca_by_player["sca_dribbles"] / 3)
# Rate per match with at least one counted action; a true per-90 rate
# needs minutes played
sca_by_player["sca_per_90"] = (
    sca_by_player["total_sca"] / sca_by_player["matches"]).round(2)

print("Shot-Creating Actions Leaders:")
print(sca_by_player.sort_values("sca_per_90", ascending=False).head(15))

# Calculate Shot-Creating Actions
library(dplyr)

# For each shot, identify the two preceding offensive actions
# This requires possession sequence tracking

# Get shots and their preceding events
shots_with_context <- events %>%
  filter(type.name == "Shot") %>%
  select(shot_id = id, match_id, minute, second, team.name,
         shot.statsbomb_xg, shot.outcome.name)

# For each shot, look at the 2 prior events from same team
# (Simplified - actual SCA requires possession tracking)

# Alternative: Use pass.shot_assist and pass.goal_assist
# along with dribble info

sca_by_player <- events %>%
  filter(
    # Passes leading to shots (goal assists are flagged separately)
    (type.name == "Pass" & (pass.shot_assist %in% TRUE | pass.goal_assist %in% TRUE)) |
    # Completed dribbles (not all of these lead to shots)
    (type.name == "Dribble" & dribble.outcome.name == "Complete")
  ) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    sca_passes = sum(type.name == "Pass"),
    sca_dribbles = sum(type.name == "Dribble"),
    .groups = "drop"
  ) %>%
  mutate(
    total_sca = sca_passes + sca_dribbles / 3,  # Arbitrary discount for dribbles
    sca_per_90 = round(total_sca / matches, 2)  # Per match, a rough stand-in for per 90
  ) %>%
  arrange(desc(sca_per_90))

print("Shot-Creating Actions Leaders:")
print(head(sca_by_player, 15))

Goal-Creating Actions (GCA)

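GCA applies the same idea to goals: the two offensive actions directly leading to a goal. Goals are rarer than shots, so GCA is also noisier. The simplified code below counts only goal assists (the final pass), so it understates full GCA: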

# Goal-Creating Actions (simplified: counts goal assists only)
gca_passes = events_df[
    (events_df["type"] == "Pass") & (events_df["pass_goal_assist"] == True)
].copy()

gca_by_player = gca_passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),  # matches with at least one assist
    assists=("type", "count"),
).reset_index()

# Rate per match with an assist; a true per-90 rate needs minutes played
gca_by_player["gca_per_90"] = (
    gca_by_player["assists"] / gca_by_player["matches"]).round(2)

print("Goal-Creating Actions (Assists) Leaders:")
print(gca_by_player.sort_values("assists", ascending=False).head(10))

# Goal-Creating Actions
gca_by_player <- events %>%
  filter(
    (type.name == "Pass" & pass.goal_assist == TRUE)
  ) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    assists = n(),
    .groups = "drop"
  ) %>%
  mutate(
    gca_per_90 = round(assists / matches, 2)
  ) %>%
  arrange(desc(assists))

print("Goal-Creating Actions (Assists) Leaders:")
print(head(gca_by_player, 10))

xGChain and xGBuildup

StatsBomb introduced xGChain and xGBuildup to credit players involved earlier in attacking sequences.

Definitions

Metric	Definition	What It Measures
xGChain	Total xG of all possessions a player was involved in	Overall attacking involvement
xGBuildup	Total xG of all possessions a player was involved in, ignoring their key passes and shots	Contribution to build-up play (not final actions)

Why xGBuildup Matters

Some players contribute heavily to attacks without getting shots or assists. Deep-lying playmakers, progressive ball carriers, and pivot players add value that xG and xA miss. xGBuildup captures this.
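Understat's glossary, for example, defines xGBuildup by exclusion rather than subtraction: the xG of every possession a player is involved in, ignoring key passes and shots. Below is a minimal sketch of both metrics on hypothetical possession data. The input structure and the "Key Pass"/"Shot" labels are assumptions for illustration.

```python
def xg_chain_and_buildup(possessions):
    """Compute xGChain and xGBuildup per player from shot-ending possessions.

    `possessions` is a list of dicts with "xg" (total shot xG of the
    possession) and "actions", a list of (player, action_type) pairs for
    the attacking team. "Key Pass" and "Shot" are hypothetical labels.
    """
    chain, buildup = {}, {}
    for poss in possessions:
        involved = {player for player, _ in poss["actions"]}
        builders = {player for player, action in poss["actions"]
                    if action not in ("Key Pass", "Shot")}
        # Each player is credited at most once per possession
        for player in involved:
            chain[player] = chain.get(player, 0.0) + poss["xg"]
        for player in builders:
            buildup[player] = buildup.get(player, 0.0) + poss["xg"]
    return chain, buildup


# Hypothetical possessions
possessions = [
    {"xg": 0.30, "actions": [("CB", "Pass"), ("DM", "Pass"),
                             ("AM", "Key Pass"), ("ST", "Shot")]},
    {"xg": 0.10, "actions": [("DM", "Pass"), ("AM", "Dribble"),
                             ("AM", "Shot")]},
]

chain, buildup = xg_chain_and_buildup(possessions)
for player in sorted(chain):
    print(f"{player}: xGChain={chain[player]:.2f}, "
          f"xGBuildup={buildup.get(player, 0.0):.2f}")
```

AM shows where the two readings differ. The key pass in the first possession is ignored, but the dribble before AM's own shot in the second possession still counts as build-up, giving xGBuildup = 0.10. Computing "xGChain minus xG minus xA" instead would give 0.40 - 0.10 - 0.30 = 0.00.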

# Calculate xGChain and xGBuildup (simplified)
# StatsBomb events already number possessions within each match
# ("possession" column), so we group on that directly.

# Keep possessions that contain at least one shot
poss_with_shots = events_df.groupby(["match_id", "possession"]).filter(
    lambda x: (x["type"] == "Shot").any()
)

# Total xG of each possession
poss_xg = poss_with_shots.groupby(["match_id", "possession"]).agg(
    poss_xg=("shot_statsbomb_xg", "sum")
).reset_index()

# One row per attacking player per possession, so repeated touches
# are not counted more than once
involvement = (
    poss_with_shots[poss_with_shots["team"] == poss_with_shots["possession_team"]]
    .dropna(subset=["player"])
    .drop_duplicates(["match_id", "possession", "player"])
    .merge(poss_xg, on=["match_id", "possession"])
)

# xGChain: total xG from possessions the player was involved in
xgchain = involvement.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    possessions=("possession", "count"),
    xGChain=("poss_xg", "sum"),
).reset_index()

# Subtract the player's own shot xG for a rough xGBuildup
# (a fuller version would also exclude their key passes)
player_xg = (
    events_df[events_df["type"] == "Shot"]
    .groupby("player")["shot_statsbomb_xg"].sum()
    .rename("xG").reset_index()
)
xgchain = xgchain.merge(player_xg, on="player", how="left")
xgchain["xG"] = xgchain["xG"].fillna(0)
xgchain["xGBuildup"] = xgchain["xGChain"] - xgchain["xG"]

print("xGChain and xGBuildup Leaders:")
print(xgchain.sort_values("xGBuildup", ascending=False).head(15))

# Calculate xGChain and xGBuildup (simplified)
# StatsBomb events already number possessions within each match
# (the `possession` column), so we group on that directly.

# For each possession sequence ending in a shot:
# 1. Identify all attacking players involved
# 2. Credit each with the possession's xG once (xGChain)
# 3. Subtract the player's own xG as a rough xGBuildup

# Find possessions containing shots and total their xG
shot_possessions <- events %>%
  group_by(match_id, possession) %>%
  filter(any(type.name == "Shot")) %>%
  mutate(
    possession_xg = sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE)
  ) %>%
  ungroup()

# Credit each attacking player once per possession with xGChain
xgchain_by_player <- shot_possessions %>%
  filter(!is.na(player.name), team.name == possession_team.name) %>%
  distinct(match_id, possession, player.name, team.name, possession_xg) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    possessions_involved = n(),
    xGChain = sum(possession_xg),
    .groups = "drop"
  )

# Player's own shot xG
player_xg <- events %>%
  filter(type.name == "Shot") %>%
  group_by(player.name) %>%
  summarise(xG = sum(shot.statsbomb_xg, na.rm = TRUE))

# Join and calculate
xgchain_final <- xgchain_by_player %>%
  left_join(player_xg, by = "player.name") %>%
  mutate(
    xG = ifelse(is.na(xG), 0, xG),
    xGBuildup = xGChain - xG  # Rough: a fuller version also excludes key passes
  ) %>%
  arrange(desc(xGBuildup))

print("xGChain and xGBuildup Leaders:")
print(head(xgchain_final, 15))

Creator Profiles

Different creators have different styles. Understanding these profiles helps with scouting and tactical analysis.

Types of Creators

Through Ball Specialist

Example: Kevin De Bruyne

High through ball %
Creates 1v1 chances
Requires fast runners
High xA per key pass

Crossing Specialist

Example: Trent Alexander-Arnold

High cross volume
Creates headed chances
Requires aerial threat
Lower xA per key pass

Cutback Specialist

Example: Bukayo Saka

Gets to byline
Pulls ball back across goal
High quality chances
Requires late runners
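The profiles above make checkable claims, such as high xA per key pass for through balls and lower xA per key pass for crosses. The sketch below shows how to check them: label each key pass by type, then compare counts and mean xA per key pass. The flag columns mirror statsbombpy's `pass_through_ball`, `pass_cross` and `pass_cut_back`. The rows and per-pass xA values are invented, so the output demonstrates the mechanics only, not the claims.

```python
import numpy as np
import pandas as pd

# Hypothetical key passes for one creator (flags are True or missing)
kp = pd.DataFrame({
    "pass_through_ball": [True, None, None, True, None, None],
    "pass_cross":        [None, True, True, None, None, None],
    "pass_cut_back":     [None, None, None, None, True, None],
    "xA":                [0.35, 0.04, 0.07, 0.25, 0.20, 0.10],
})

# Label each key pass; the first matching flag wins (an assumption)
kp["kind"] = np.select(
    [kp["pass_through_ball"].eq(True),
     kp["pass_cross"].eq(True),
     kp["pass_cut_back"].eq(True)],
    ["Through ball", "Cross", "Cutback"],
    default="Ground pass",
)

summary = kp.groupby("kind")["xA"].agg(key_passes="count", xA_per_kp="mean")
summary["share_pct"] = (
    summary["key_passes"] / summary["key_passes"].sum() * 100).round(1)
print(summary)
```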

# Analyze creator profiles by pass type
key_passes = events_df[
    (events_df["type"] == "Pass")
    & ((events_df["pass_shot_assist"] == True)
       | (events_df["pass_goal_assist"] == True))
].copy()

creator_profiles = key_passes.groupby(["player", "team"]).agg(
    key_passes=("type", "count"),
    through_balls=("pass_through_ball", lambda x: x.fillna(False).sum()),
    crosses=("pass_cross", lambda x: x.fillna(False).sum()),
    cutbacks=("pass_cut_back", lambda x: x.fillna(False).sum()),
).reset_index()
creator_profiles = creator_profiles[creator_profiles["key_passes"] >= 5].copy()

# Calculate percentages
creator_profiles["through_ball_pct"] = (
    creator_profiles["through_balls"] / creator_profiles["key_passes"] * 100).round(1)
creator_profiles["cross_pct"] = (
    creator_profiles["crosses"] / creator_profiles["key_passes"] * 100).round(1)
creator_profiles["cutback_pct"] = (
    creator_profiles["cutbacks"] / creator_profiles["key_passes"] * 100).round(1)

# Classify style
def classify_style(row):
    if row["through_ball_pct"] > 30:
        return "Through Ball"
    elif row["cross_pct"] > 50:
        return "Crosser"
    elif row["cutback_pct"] > 20:
        return "Cutback"
    return "Ground Pass"

creator_profiles["style"] = creator_profiles.apply(classify_style, axis=1)

print("Creator Profiles:")
print(creator_profiles.sort_values("key_passes", ascending=False)[
    ["player", "key_passes", "through_ball_pct", "cross_pct", "cutback_pct", "style"]
].head(15))

# Analyze creator profiles by pass type
creator_profiles <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  group_by(player.name, team.name) %>%
  summarise(
    key_passes = n(),
    through_balls = sum(pass.through_ball, na.rm = TRUE),
    crosses = sum(pass.cross, na.rm = TRUE),
    cutbacks = sum(pass.cut_back, na.rm = TRUE),
    # Flags are TRUE or NA, so test membership rather than negating NA
    ground_passes = sum(!(pass.cross %in% TRUE | pass.through_ball %in% TRUE |
                          pass.cut_back %in% TRUE)),
    .groups = "drop"
  ) %>%
  filter(key_passes >= 5) %>%
  mutate(
    through_ball_pct = round(through_balls / key_passes * 100, 1),
    cross_pct = round(crosses / key_passes * 100, 1),
    cutback_pct = round(cutbacks / key_passes * 100, 1),

    # Classify style
    style = case_when(
      through_ball_pct > 30 ~ "Through Ball",
      cross_pct > 50 ~ "Crosser",
      cutback_pct > 20 ~ "Cutback",
      TRUE ~ "Ground Pass"
    )
  )

print("Creator Profiles:")
print(creator_profiles %>%
        select(player.name, key_passes, through_ball_pct,
               cross_pct, cutback_pct, style) %>%
        arrange(desc(key_passes)) %>%
        head(15))

Chapter Summary

Key Takeaways

xA = xG of shots from player's passes - measures pass quality, not teammate finishing
Key Passes - passes directly leading to shots
Shot-Creating Actions (SCA) - the two offensive actions before each shot
Goal-Creating Actions (GCA) - the two offensive actions before each goal
xGChain - total xG from possessions a player was involved in
xGBuildup - xGChain with the player's own key passes and shots ignored (build-up contribution)
Creator profiles - through ball vs. cross vs. cutback specialists

Creativity Metrics Hierarchy

Metric	Scope	Best For
xA	Final pass only	Evaluating assist providers
Key Passes	Passes to shots	Chance creation volume
SCA	Last 2 actions before shot	Overall offensive contribution
xGBuildup	Entire possession	Evaluating deep playmakers

Practice Exercises

Apply what you've learned about chance creation metrics with these hands-on exercises.

Exercise 8.1: Calculate Player xA Leaders

Task: Find the top 10 players by Expected Assists (xA) in a tournament. Calculate both total xA and xA per 90 minutes. Identify which players are overperforming or underperforming their xA.

Steps:

Load StatsBomb free data for World Cup 2022
Find all key passes (passes that led to shots)
Match key passes to the resulting shot's xG value
Sum xG values by passer to get xA
Compare actual assists vs xA to find over/underperformers

# Exercise 8.1 Solution: xA Leaders Analysis
from statsbombpy import sb
import pandas as pd

# Load World Cup 2022 data
matches = sb.matches(competition_id=43, season_id=106)
all_events = pd.concat(
    [sb.events(match_id=mid).assign(match_id=mid) for mid in matches["match_id"]],
    ignore_index=True,
)

# Get shots with xG
shots = all_events[all_events["type"] == "Shot"][
    ["id", "match_id", "player", "shot_statsbomb_xg", "shot_outcome", "shot_key_pass_id"]
].copy()
shots.columns = ["shot_id", "match_id", "shooter", "xG", "outcome", "key_pass_id"]

# Get key passes (passes leading to shots)
key_passes = all_events[
    (all_events["type"] == "Pass")
    & ((all_events["pass_shot_assist"] == True)
       | (all_events["pass_goal_assist"] == True))
][["id", "match_id", "player", "team", "pass_goal_assist"]].copy()
key_passes.columns = ["pass_id", "match_id", "passer", "team", "is_assist"]
key_passes["is_assist"] = key_passes["is_assist"].eq(True)  # True/NaN -> bool

# Join each key pass to the xG of the shot it created
merged = key_passes.merge(
    shots[["key_pass_id", "xG"]], left_on="pass_id", right_on="key_pass_id", how="left"
)

# Aggregate by player
player_xa = merged.groupby(["passer", "team"]).agg(
    matches=("match_id", "nunique"),  # matches with at least one key pass
    key_passes=("pass_id", "count"),
    assists=("is_assist", "sum"),
    total_xA=("xG", "sum"),
).reset_index()

# Per match with a key pass; true per-90 values need minutes played
player_xa["xA_per_90"] = (player_xa["total_xA"] / player_xa["matches"]).round(2)
player_xa["assists_minus_xA"] = (player_xa["assists"] - player_xa["total_xA"]).round(2)
player_xa["performance"] = player_xa["assists_minus_xA"].apply(
    lambda x: "Overperforming" if x > 1 else ("Underperforming" if x < -1 else "As Expected")
)
player_xa = player_xa[player_xa["key_passes"] >= 3]

print("Top 10 Players by xA:")
print(player_xa.sort_values("total_xA", ascending=False).head(10))
print("\nOver/Underperformers:")
print(player_xa[player_xa["performance"] != "As Expected"][
    ["passer", "assists", "total_xA", "assists_minus_xA", "performance"]])

# Exercise 8.1 Solution: xA Leaders Analysis
library(StatsBombR)
library(dplyr)

# Load World Cup 2022 data
comps <- FreeCompetitions() %>%
  filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)

# Get shots with their xG and key pass info
shots <- events %>%
  filter(type.name == "Shot") %>%
  select(shot_id = id, match_id, shooter = player.name,
         xG = shot.statsbomb_xg, outcome = shot.outcome.name,
         key_pass_id = shot.key_pass_id)

# Get key passes
key_passes <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  select(pass_id = id, match_id, passer = player.name,
         team = team.name, is_assist = pass.goal_assist)

# Join to get passer xA (xG of resulting shots)
player_xa <- key_passes %>%
  left_join(shots, by = c("pass_id" = "key_pass_id", "match_id")) %>%
  group_by(passer, team) %>%
  summarise(
    matches = n_distinct(match_id),
    key_passes = n(),
    assists = sum(is_assist, na.rm = TRUE),
    total_xA = sum(xG, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    xA_per_90 = round(total_xA / matches, 2),
    assists_minus_xA = round(assists - total_xA, 2),
    performance = case_when(
      assists_minus_xA > 1 ~ "Overperforming",
      assists_minus_xA < -1 ~ "Underperforming",
      TRUE ~ "As Expected"
    )
  ) %>%
  filter(key_passes >= 3) %>%
  arrange(desc(total_xA))

print("Top 10 Players by xA:")
print(head(player_xa, 10))

print("\nOver/Underperformers:")
print(player_xa %>%
        filter(performance != "As Expected") %>%
        select(passer, assists, total_xA, assists_minus_xA, performance))

Exercise 8.2: Creator Profile Classification

Task: Classify creative players by their passing style. Determine whether each player is primarily a through ball specialist, crosser, cutback specialist, or ground passer based on their key pass types.

Requirements:

Through Ball Specialist: >25% through balls
Crosser: >40% crosses
Cutback Specialist: >15% cutbacks
Ground Passer: Default category

# Exercise 8.2 Solution: Creator Profile Classification
import pandas as pd

# Assumes all_events was loaded as in the Exercise 8.1 solution

# Get key passes with type info
key_passes = all_events[
    (all_events["type"] == "Pass")
    & ((all_events["pass_shot_assist"] == True)
       | (all_events["pass_goal_assist"] == True))
][["player", "team", "match_id", "pass_through_ball", "pass_cross", "pass_cut_back"]].copy()

# Aggregate by player
profiles = key_passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    total_kp=("match_id", "count"),
    through_balls=("pass_through_ball", lambda x: x.fillna(False).sum()),
    crosses=("pass_cross", lambda x: x.fillna(False).sum()),
    cutbacks=("pass_cut_back", lambda x: x.fillna(False).sum()),
).reset_index()
profiles = profiles[profiles["total_kp"] >= 5].copy()

# Calculate percentages
profiles["through_pct"] = (profiles["through_balls"] / profiles["total_kp"] * 100).round(1)
profiles["cross_pct"] = (profiles["crosses"] / profiles["total_kp"] * 100).round(1)
profiles["cutback_pct"] = (profiles["cutbacks"] / profiles["total_kp"] * 100).round(1)

# Classify
def classify_profile(row):
    if row["through_pct"] > 25:
        return "Through Ball Specialist"
    elif row["cross_pct"] > 40:
        return "Crosser"
    elif row["cutback_pct"] > 15:
        return "Cutback Specialist"
    return "Ground Passer"

profiles["profile"] = profiles.apply(classify_profile, axis=1)

print("Creator Profile Distribution:")
print(profiles["profile"].value_counts())

print("\nTop Players by Profile:")
for profile in profiles["profile"].unique():
    subset = profiles[profiles["profile"] == profile].nlargest(3, "total_kp")
    print(f"\n{profile}:")
    print(subset[["player", "total_kp", "through_pct", "cross_pct", "cutback_pct"]])

# Exercise 8.2 Solution: Creator Profile Classification
library(StatsBombR)
library(dplyr)

# Get key passes with pass type info
key_passes <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  select(player.name, team.name, match_id,
         through_ball = pass.through_ball,
         cross = pass.cross,
         cutback = pass.cut_back)

# Calculate pass type percentages
creator_profiles <- key_passes %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    total_key_passes = n(),
    through_balls = sum(through_ball == TRUE, na.rm = TRUE),
    crosses = sum(cross == TRUE, na.rm = TRUE),
    cutbacks = sum(cutback == TRUE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_key_passes >= 5) %>%
  mutate(
    through_pct = round(through_balls / total_key_passes * 100, 1),
    cross_pct = round(crosses / total_key_passes * 100, 1),
    cutback_pct = round(cutbacks / total_key_passes * 100, 1),
    ground_pct = round(100 - through_pct - cross_pct - cutback_pct, 1),

    # Classify profile
    profile = case_when(
      through_pct > 25 ~ "Through Ball Specialist",
      cross_pct > 40 ~ "Crosser",
      cutback_pct > 15 ~ "Cutback Specialist",
      TRUE ~ "Ground Passer"
    )
  )

# Summary by profile type
print("Creator Profile Distribution:")
print(table(creator_profiles$profile))

print("\nTop Players by Profile:")
creator_profiles %>%
  group_by(profile) %>%
  slice_max(total_key_passes, n = 3) %>%
  select(player.name, profile, total_key_passes, through_pct, cross_pct, cutback_pct) %>%
  print()

Exercise 8.3: Shot-Creating Actions Visualization

Task: Create a visualization comparing SCA and GCA for the top creative players. Build a scatter plot with SCA per 90 on one axis and GCA per 90 on the other.

Bonus: Add player labels and size points by total minutes played.

# Exercise 8.3 Solution: SCA vs GCA Visualization
import matplotlib.pyplot as plt
import pandas as pd

# Assumes all_events was loaded as in the Exercise 8.1 solution

# Calculate SCA (simplified: key passes + completed dribbles)
sca = all_events[
    ((all_events["type"] == "Pass")
     & ((all_events["pass_shot_assist"] == True)
        | (all_events["pass_goal_assist"] == True)))
    | ((all_events["type"] == "Dribble") & (all_events["dribble_outcome"] == "Complete"))
].groupby("player").size().reset_index(name="sca")

# Calculate GCA (goal assists only)
gca = all_events[
    (all_events["type"] == "Pass") & (all_events["pass_goal_assist"] == True)
].groupby("player").size().reset_index(name="gca")

# Get matches per player
matches_played = all_events.groupby("player")["match_id"].nunique().reset_index(name="matches")

# Merge
creativity = matches_played.merge(sca, on="player", how="left")
creativity = creativity.merge(gca, on="player", how="left")
creativity = creativity.fillna(0)
creativity["sca_per_90"] = (creativity["sca"] / creativity["matches"]).round(2)
creativity["gca_per_90"] = (creativity["gca"] / creativity["matches"]).round(2)

# Filter
creativity = creativity[(creativity["matches"] >= 4) & (creativity["sca"] >= 5)]

# Create plot
fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(
    creativity["sca_per_90"],
    creativity["gca_per_90"],
    s=creativity["matches"] * 30,
    alpha=0.6,
    c="#1B5E20",
)

# Add labels for top creators
top_creators = creativity[(creativity["sca_per_90"] > 2) | (creativity["gca_per_90"] > 0.5)]
for _, row in top_creators.iterrows():
    ax.annotate(
        row["player"].split()[-1],  # Last name only
        (row["sca_per_90"], row["gca_per_90"]),
        fontsize=8,
        alpha=0.8,
    )

ax.set_xlabel("SCA per 90", fontsize=12)
ax.set_ylabel("GCA per 90", fontsize=12)
ax.set_title("Shot-Creating vs Goal-Creating Actions\nWorld Cup 2022",
             fontsize=14, fontweight="bold")
plt.tight_layout()
plt.savefig("sca_vs_gca.png", dpi=150)
plt.show()

# Exercise 8.3 Solution: SCA vs GCA Visualization
library(StatsBombR)
library(dplyr)
library(ggplot2)
library(ggrepel)

# Calculate SCA (simplified: key passes + completed dribbles)
sca_data <- events %>%
  filter(
    (type.name == "Pass" & (pass.shot_assist %in% TRUE | pass.goal_assist %in% TRUE)) |
    (type.name == "Dribble" & dribble.outcome.name == "Complete")
  ) %>%
  mutate(is_sca = TRUE)

# Calculate GCA (passes to goals)
gca_data <- events %>%
  filter(type.name == "Pass" & pass.goal_assist == TRUE) %>%
  mutate(is_gca = TRUE)

# Combine and aggregate
player_creativity <- events %>%
  filter(!is.na(player.name)) %>%
  group_by(player.name, team.name) %>%
  summarise(matches = n_distinct(match_id), .groups = "drop") %>%
  left_join(
    sca_data %>%
      group_by(player.name) %>%
      summarise(sca = n()),
    by = "player.name"
  ) %>%
  left_join(
    gca_data %>%
      group_by(player.name) %>%
      summarise(gca = n()),
    by = "player.name"
  ) %>%
  mutate(
    sca = ifelse(is.na(sca), 0, sca),
    gca = ifelse(is.na(gca), 0, gca),
    sca_per_90 = round(sca / matches, 2),
    gca_per_90 = round(gca / matches, 2)
  ) %>%
  filter(matches >= 4, sca >= 5)

# Create scatter plot
ggplot(player_creativity, aes(x = sca_per_90, y = gca_per_90)) +
  geom_point(aes(size = matches), alpha = 0.6, color = "#1B5E20") +
  geom_text_repel(
    data = filter(player_creativity, sca_per_90 > 2 | gca_per_90 > 0.5),
    aes(label = player.name),
    size = 3, max.overlaps = 15
  ) +
  geom_smooth(method = "lm", se = FALSE, color = "#FFA000", linetype = "dashed") +
  labs(
    title = "Shot-Creating vs Goal-Creating Actions",
    subtitle = "World Cup 2022 | Min. 4 matches, 5 SCA",
    x = "SCA per 90",
    y = "GCA per 90",
    size = "Matches"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

ggsave("sca_vs_gca.png", width = 10, height = 8)

Next: Passing Analytics

Go deeper into passing metrics - progressive passes, pass networks, centrality, and pass value models.
