Chapter 60

Capstone - Complete Analytics System


Measuring Creativity

Most goals begin with a chance that someone else created. While xG transformed how we evaluate finishers, Expected Assists (xA) and related metrics have changed how we assess creative players. This chapter explores the full spectrum of chance creation analytics.

Assists are even noisier than goals—they depend on both the pass quality AND the teammate's finishing. A perfect through ball converted into a tap-in goal earns the same "assist" as a simple square pass that a striker bangs in from 25 yards. xA separates the quality of the chance created from the quality of the finish.

Why Chance Creation Metrics Matter
  • Fair Creator Evaluation: Credit creators for pass quality, not teammate finishing
  • Identify Playmakers: Find players who create chances even without assist totals
  • Scouting: Discover undervalued creators whose teammates don't finish well
  • Tactical Analysis: Understand how teams create chances

Expected Assists (xA)

xA is the xG value of shots resulting from a player's passes. If you play a pass that leads to a 0.3 xG shot, you earn 0.3 xA—regardless of whether the shot goes in.
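The definition above can be sketched on toy data. The player names, pass ids, and xG values below are hypothetical, and the helper `xa_by_passer` is an illustration rather than part of any library; it links each key pass to the shot it produced and sums that shot's xG per passer.

```python
from collections import defaultdict

# Hypothetical toy data: each key pass and the shot it produced.
key_passes = [
    {"pass_id": "p1", "passer": "Player A"},
    {"pass_id": "p2", "passer": "Player A"},
    {"pass_id": "p3", "passer": "Player B"},
]
shots = [
    {"key_pass_id": "p1", "xg": 0.30, "is_goal": False},
    {"key_pass_id": "p2", "xg": 0.05, "is_goal": True},
    {"key_pass_id": "p3", "xg": 0.12, "is_goal": False},
]


def xa_by_passer(key_passes, shots):
    """Sum the xG of each resulting shot per passer; count assists separately."""
    shot_by_pass = {s["key_pass_id"]: s for s in shots}
    totals = defaultdict(lambda: {"key_passes": 0, "assists": 0, "xA": 0.0})
    for kp in key_passes:
        shot = shot_by_pass.get(kp["pass_id"])
        if shot is None:
            continue  # pass could not be linked to a shot
        row = totals[kp["passer"]]
        row["key_passes"] += 1
        row["assists"] += int(shot["is_goal"])
        row["xA"] += shot["xg"]
    return dict(totals)


result = xa_by_passer(key_passes, shots)
for passer, row in result.items():
    print(passer, row["key_passes"], row["assists"], round(row["xA"], 2))
```

In this toy case Player A earns 0.35 xA from two passes. The single assist came from the 0.05 chance, while the better 0.30 chance was missed, which is exactly the noise xA is meant to strip out.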

xA vs. Traditional Assists

Traditional Assists - Problems
  • Binary outcome (0 or 1)
  • Depends on teammate finishing
  • Simple tap-in pass = brilliant through ball
  • High variance, low predictive power

xA - Advantages
  • Continuous scale (0.00 to ~0.95)
  • Independent of finishing quality
  • Values pass quality appropriately
  • More stable, better predictor
# Calculate Expected Assists
from statsbombpy import sb
import pandas as pd

# Load all World Cup 2022 matches
matches = sb.matches(competition_id=43, season_id=106)
all_events = []
for mid in matches["match_id"]:
    events = sb.events(match_id=mid)
    events["match_id"] = mid
    all_events.append(events)
events_df = pd.concat(all_events, ignore_index=True)

# Get shots with xG
shots = events_df[events_df["type"] == "Shot"].copy()

# Key passes are passes that led to shots
# In StatsBomb, we can identify these via pass_shot_assist / pass_goal_assist
key_passes = events_df[
    (events_df["type"] == "Pass")
    & ((events_df["pass_shot_assist"] == True)
       | (events_df["pass_goal_assist"] == True))
].copy()

# The flag is True or missing, so count explicit True values only
n_assists = key_passes["pass_goal_assist"].eq(True).sum()
print(f"Total key passes (created shots): {len(key_passes)}")
print(f"Total assists (led to goals): {n_assists}")

# Player-level xA requires matching each pass id to the shot's
# shot_key_pass_id. Simplified here: team-level xA from assisted shots only
assisted_shots = shots[shots["shot_key_pass_id"].notna()]
team_xa = assisted_shots.groupby("team").agg(
    chances_created=("shot_statsbomb_xg", "count"),
    total_xA=("shot_statsbomb_xg", "sum"),
    goals_from_chances=("shot_outcome", lambda x: (x == "Goal").sum())
).reset_index()

print("\nTeam Chance Creation (xA):")
print(team_xa.sort_values("total_xA", ascending=False))

# Calculate Expected Assists
library(StatsBombR)
library(dplyr)

# Load World Cup data
comps <- FreeCompetitions() %>%
  filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)

# Find passes that led to shots
# In StatsBomb, shots have pass info; we need to link back

# Get shots with their xG
shots <- events %>%
  filter(type.name == "Shot") %>%
  select(id, match_id, player.name, team.name, shot.statsbomb_xg,
         shot.outcome.name, shot.key_pass_id)

# Get key passes (passes leading to shots)
# Location columns are left out: they are not needed here and are
# only available after the raw events have been cleaned
key_passes <- events %>%
  filter(type.name == "Pass") %>%
  select(id, match_id, player.name, team.name,
         pass.goal_assist, pass.shot_assist)

# Link key passes to shots
# pass.shot_assist is TRUE for passes leading to shots
shot_assists <- key_passes %>%
  filter(pass.shot_assist == TRUE | pass.goal_assist == TRUE)

# Calculate team-level xA (using shots data)
# xA = sum of xG from shots created by a teammate's pass
# A player-level table would join shot.key_pass_id back to the pass id
team_xa <- shots %>%
  filter(!is.na(shot.key_pass_id)) %>%
  group_by(team.name) %>%
  summarise(
    chances_created = n(),
    total_xA = sum(shot.statsbomb_xg, na.rm = TRUE),
    goals_from_chances = sum(shot.outcome.name == "Goal")
  )

print("Team Chance Creation (xA):")
print(team_xa %>% arrange(desc(total_xA)))
Calculating Expected Assists basics

Player xA Analysis

# Detailed player xA analysis
# Key passes only: passes that created a shot or a goal
passes = events_df[events_df["type"] == "Pass"].copy()
key_passes = passes[
    (passes["pass_shot_assist"] == True)
    | (passes["pass_goal_assist"] == True)
].copy()


# StatsBomb flags are True or missing, so count explicit True values only
def count_true(x):
    return x.eq(True).sum()


# Player creativity stats
player_creativity = key_passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    key_passes=("type", "count"),
    assists=("pass_goal_assist", count_true),
    crosses_to_shots=("pass_cross", count_true),
    through_balls=("pass_through_ball", count_true),
    cutbacks=("pass_cut_back", count_true)
).reset_index()

player_creativity = player_creativity[player_creativity["key_passes"] >= 3].copy()
player_creativity["kp_per_match"] = (
    player_creativity["key_passes"] / player_creativity["matches"]).round(2)
player_creativity["assist_conv"] = (
    player_creativity["assists"] / player_creativity["key_passes"] * 100).round(1)

print("Top Chance Creators (Key Passes):")
print(player_creativity.sort_values("key_passes", ascending=False).head(15))

# Detailed player xA analysis
# Using FBref-style calculation where xA = xG of shots from player passes

# Identify all pass -> shot connections
passes_to_shots <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  select(pass_id = id, passer = player.name, passer_team = team.name,
         match_id, minute,
         pass.goal_assist, pass.shot_assist,
         pass.cross, pass.through_ball, pass.cut_back)

# Match with shot xG (would need shot.key_pass_id linkage)
# For demo, we calculate key pass stats

player_creativity <- passes_to_shots %>%
  group_by(passer, passer_team) %>%
  summarise(
    matches = n_distinct(match_id),
    key_passes = n(),
    assists = sum(pass.goal_assist, na.rm = TRUE),
    crosses_to_shots = sum(pass.cross, na.rm = TRUE),
    through_balls_to_shots = sum(pass.through_ball, na.rm = TRUE),
    cutbacks_to_shots = sum(pass.cut_back, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(key_passes >= 3) %>%
  mutate(
    key_passes_per_match = round(key_passes / matches, 2),
    assist_conversion = round(assists / key_passes * 100, 1)
  ) %>%
  arrange(desc(key_passes))

print("Top Chance Creators (Key Passes):")
print(head(player_creativity, 15))
Analyzing player chance creation

xA Overperformance

Just as with xG, players can over- or underperform their xA. An assist total well above xA usually regresses toward it:

# xA vs Actual Assists analysis
# Simulated season data (use actual xA from FBref/StatsBomb)
creator_performance = pd.DataFrame({
    "player": ["De Bruyne", "Bruno Fernandes", "Odegaard",
               "Saka", "Maddison", "Trent AA"],
    "team": ["Man City", "Man United", "Arsenal",
             "Arsenal", "Tottenham", "Liverpool"],
    "assists": [16, 8, 10, 11, 12, 4],
    "xA": [14.2, 10.5, 8.8, 7.2, 9.1, 7.8],
    "key_passes": [115, 95, 78, 68, 82, 92]
})

creator_performance["assists_minus_xA"] = (
    creator_performance["assists"] - creator_performance["xA"]).round(1)
creator_performance["xa_per_kp"] = (
    creator_performance["xA"] / creator_performance["key_passes"]).round(3)
creator_performance["assist_rate"] = (
    creator_performance["assists"] / creator_performance["key_passes"] * 100).round(1)

print("Creator Performance vs xA:")
print(creator_performance.sort_values("assists_minus_xA", ascending=False)[
    ["player", "assists", "xA", "assists_minus_xA", "assist_rate"]])

print("\nInterpretation:")
print("- Saka: +3.8 over xA - teammates finishing well, may regress")
print("- De Bruyne: +1.8 - slight overperformance, sustainable")
print("- Bruno: -2.5 under xA - unlucky, should bounce back")
print("- Trent: -3.8 under xA - teammates missing his chances")

# xA vs Actual Assists analysis
# When Assists >> xA, likely to regress
# When Assists << xA, bounce-back candidate

# Simulated season data (use actual xA from FBref/StatsBomb)
creator_performance <- data.frame(
  player = c("De Bruyne", "Bruno Fernandes", "Odegaard",
             "Saka", "Maddison", "Trent AA"),
  team = c("Man City", "Man United", "Arsenal",
           "Arsenal", "Tottenham", "Liverpool"),
  assists = c(16, 8, 10, 11, 12, 4),
  xA = c(14.2, 10.5, 8.8, 7.2, 9.1, 7.8),
  key_passes = c(115, 95, 78, 68, 82, 92)
)

creator_performance <- creator_performance %>%
  mutate(
    assists_minus_xA = assists - xA,
    xa_per_kp = round(xA / key_passes, 3),
    assist_rate = round(assists / key_passes * 100, 1)
  )

print("Creator Performance vs xA:")
print(creator_performance %>%
        select(player, assists, xA, assists_minus_xA, assist_rate) %>%
        arrange(desc(assists_minus_xA)))

cat("\nInterpretation:\n")
cat("- Saka: +3.8 over xA - teammates finishing well, may regress\n")
cat("- De Bruyne: +1.8 - slight overperformance, sustainable\n")
cat("- Bruno: -2.5 under xA - unlucky, should bounce back\n")
cat("- Trent: -3.8 under xA - teammates missing his chances\n")
Analyzing xA performance and regression candidates
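To judge how large a gap between assists and xA has to be before it means anything, it helps to estimate the noise around xA. The sketch below treats each created shot as an independent Bernoulli trial with success probability equal to its xG, so expected assists are the sum of p and the variance is the sum of p(1 - p). The helper name `assist_noise` and the assumption that all 68 key passes carry equal xG are illustrative simplifications, not part of the simulated data above.

```python
import math


def assist_noise(shot_xgs):
    """Expected assists and their standard deviation, treating each created
    shot as an independent Bernoulli trial with probability equal to its xG."""
    expected = sum(shot_xgs)
    variance = sum(p * (1 - p) for p in shot_xgs)
    return expected, math.sqrt(variance)


# Assumption: 68 key passes of equal quality that add up to 7.2 xA.
shot_xgs = [7.2 / 68] * 68
expected, sd = assist_noise(shot_xgs)

# How many standard deviations is an 11-assist season above expectation?
z = (11 - expected) / sd
print(f"expected={expected:.1f}, sd={sd:.2f}, z={z:.2f}")
```

Under these assumptions one standard deviation is roughly 2.5 assists, so a +3.8 gap sits about 1.5 standard deviations above expectation: notable, but well within what finishing luck alone can produce over a single season.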

Shot-Creating Actions (SCA)

Key passes only capture the final pass. But what about the dribble that beat a defender before the pass? Or the ball recovery that started the attack? Shot-Creating Actions capture more of the creative process.

What Counts as an SCA?

Shot-Creating Action (SCA): The two offensive actions directly leading to a shot.

  • Pass leading to shot (key pass)
  • Pass leading to the key pass
  • Dribble leading to shot
  • Foul won leading to shot
  • Defensive action that starts the sequence

# Calculate Shot-Creating Actions
# SCA = the two offensive actions before each shot
# Simplified proxy: key passes plus completed dribbles

# Passes that created shots (goal assists carry a separate flag)
sca_passes = events_df[
    (events_df["type"] == "Pass")
    & ((events_df["pass_shot_assist"] == True)
       | (events_df["pass_goal_assist"] == True))
].copy()

# Successful dribbles (often precede chances, but not linked to shots here)
sca_dribbles = events_df[
    (events_df["type"] == "Dribble")
    & (events_df["dribble_outcome"] == "Complete")
].copy()

# Combine for SCA totals
sca_by_player = pd.concat([
    sca_passes.assign(sca_type="pass"),
    sca_dribbles.assign(sca_type="dribble")
]).groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    sca_passes=("sca_type", lambda x: (x == "pass").sum()),
    sca_dribbles=("sca_type", lambda x: (x == "dribble").sum())
).reset_index()

# Weight: passes count full, dribbles count 1/3 (simplified)
sca_by_player["total_sca"] = (
    sca_by_player["sca_passes"] + sca_by_player["sca_dribbles"] / 3)
# Per match played, used as a rough stand-in for per 90 minutes
sca_by_player["sca_per_90"] = (
    sca_by_player["total_sca"] / sca_by_player["matches"]).round(2)

print("Shot-Creating Actions Leaders:")
print(sca_by_player.sort_values("sca_per_90", ascending=False).head(15))

# Calculate Shot-Creating Actions
library(dplyr)

# For each shot, identify the two preceding offensive actions
# This requires possession sequence tracking

# Get shots and their preceding events
shots_with_context <- events %>%
  filter(type.name == "Shot") %>%
  select(shot_id = id, match_id, minute, second, team.name,
         shot.statsbomb_xg, shot.outcome.name)

# For each shot, look at the 2 prior events from same team
# (Simplified - actual SCA requires possession tracking)

# Alternative: Use pass.shot_assist and pass.goal_assist
# along with dribble info

sca_by_player <- events %>%
  filter(
    # Passes leading to shots (goal assists carry a separate flag)
    (type.name == "Pass" &
       (pass.shot_assist == TRUE | pass.goal_assist == TRUE)) |
    # Completed dribbles (not linked to specific shots here)
    (type.name == "Dribble" & dribble.outcome.name == "Complete")
  ) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    sca_passes = sum(type.name == "Pass"),
    sca_dribbles = sum(type.name == "Dribble"),
    .groups = "drop"
  ) %>%
  mutate(
    total_sca = sca_passes + sca_dribbles / 3,  # Weight dribbles less
    sca_per_90 = round(total_sca / matches, 2)  # Per match, a proxy for per 90
  ) %>%
  arrange(desc(sca_per_90))

print("Shot-Creating Actions Leaders:")
print(head(sca_by_player, 15))
Calculating Shot-Creating Actions
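The code above approximates SCA from pass flags and dribble counts. A closer match to the definition walks back from each shot and credits the last two offensive actions in the same possession. The sketch below does that on a hypothetical, time-ordered event list; the field names, the `OFFENSIVE` set, and the helper `shot_creating_actions` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical event stream, already ordered in time.
events = [
    {"possession": 1, "player": "Midfielder", "type": "Pass"},
    {"possession": 1, "player": "Winger", "type": "Dribble"},
    {"possession": 1, "player": "Winger", "type": "Pass"},
    {"possession": 1, "player": "Striker", "type": "Shot"},
    {"possession": 2, "player": "Fullback", "type": "Pass"},
    {"possession": 2, "player": "Midfielder", "type": "Shot"},
]

# Assumed set of event types that count as offensive actions.
OFFENSIVE = {"Pass", "Dribble", "Foul Won"}


def shot_creating_actions(events, n_actions=2):
    """Credit the last n_actions offensive actions before each shot,
    looking only at earlier events in the same possession."""
    sca = Counter()
    for i, ev in enumerate(events):
        if ev["type"] != "Shot":
            continue
        credited = 0
        j = i - 1
        while (j >= 0 and credited < n_actions
               and events[j]["possession"] == ev["possession"]):
            if events[j]["type"] in OFFENSIVE:
                sca[events[j]["player"]] += 1
                credited += 1
            j -= 1
    return sca


sca = shot_creating_actions(events)
for player, count in sca.most_common():
    print(player, count)
```

Here the winger earns two SCAs from one shot (the dribble and the pass), the fullback earns one, and the midfielder's earlier pass falls outside the two-action window.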

Goal-Creating Actions (GCA)

GCA is the same concept but for goals instead of shots—more valuable but rarer:

# Goal-Creating Actions
# Simplified: counts only the final pass before a goal (the assist)
gca_passes = events_df[
    (events_df["type"] == "Pass")
    & (events_df["pass_goal_assist"] == True)
].copy()

gca_by_player = gca_passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    assists=("type", "count")
).reset_index()

gca_by_player["gca_per_90"] = (
    gca_by_player["assists"] / gca_by_player["matches"]).round(2)

print("Goal-Creating Actions (Assists) Leaders:")
print(gca_by_player.sort_values("assists", ascending=False).head(10))

# Goal-Creating Actions
gca_by_player <- events %>%
  filter(
    (type.name == "Pass" & pass.goal_assist == TRUE)
  ) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    assists = n(),
    .groups = "drop"
  ) %>%
  mutate(
    gca_per_90 = round(assists / matches, 2)
  ) %>%
  arrange(desc(assists))

print("Goal-Creating Actions (Assists) Leaders:")
print(head(gca_by_player, 10))
Calculating Goal-Creating Actions

xGChain and xGBuildup

StatsBomb introduced xGChain and xGBuildup to credit players involved earlier in attacking sequences.

Definitions

Metric | Definition | What It Measures
xGChain | Total xG of all possessions a player was involved in | Overall attacking involvement
xGBuildup | xGChain minus xG from shots and xA from key passes | Contribution to build-up play (not final actions)

Why xGBuildup Matters

Some players contribute heavily to attacks without getting shots or assists. Deep-lying playmakers, progressive ball carriers, and pivot players add value that xG and xA miss. xGBuildup captures this.

# Calculate xGChain and xGBuildup (conceptual)
# Full implementation requires possession sequence tracking
# Simplified: credit players for being in possessions that end in shots

# Mark possession changes (period keeps stoppage-time events in order)
events_sorted = events_df.sort_values(
    ["match_id", "period", "minute", "second"]).copy()
events_sorted["new_poss"] = (
    events_sorted["possession_team"] != events_sorted["possession_team"].shift(1))
events_sorted["possession_id"] = (
    events_sorted.groupby("match_id")["new_poss"].cumsum())

# Find possessions with shots
poss_with_shots = events_sorted.groupby(["match_id", "possession_id"]).filter(
    lambda x: (x["type"] == "Shot").any()
)

# Get xG of each possession
poss_xg = poss_with_shots.groupby(["match_id", "possession_id"]).agg(
    poss_xg=("shot_statsbomb_xg", "sum")
).reset_index()

# Keep the attacking team's players, one row per player per possession,
# so repeated touches and defensive events are not credited
attacking = poss_with_shots[
    poss_with_shots["team"] == poss_with_shots["possession_team"]]
involvement = attacking.dropna(subset=["player"]).drop_duplicates(
    ["match_id", "possession_id", "player"]
).merge(poss_xg, on=["match_id", "possession_id"])

# xGChain: total xG from possessions player touched
xgchain = involvement.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    possessions=("possession_id", "count"),
    xGChain=("poss_xg", "sum")
).reset_index()

# Get player xG to calculate xGBuildup (a full version would also subtract xA)
player_xg = events_df[events_df["type"] == "Shot"].groupby("player")[
    "shot_statsbomb_xg"].sum().reset_index()
player_xg.columns = ["player", "xG"]

xgchain = xgchain.merge(player_xg, on="player", how="left")
xgchain["xG"] = xgchain["xG"].fillna(0)
xgchain["xGBuildup"] = xgchain["xGChain"] - xgchain["xG"]

print("xGChain and xGBuildup Leaders:")
print(xgchain.sort_values("xGBuildup", ascending=False).head(15))

# Calculate xGChain and xGBuildup
# Requires tracking possession sequences

# For each possession sequence ending in a shot:
# 1. Identify all players involved
# 2. Credit each with the shot's xG (for xGChain)
# 3. Exclude shooter and assister for xGBuildup

# Simplified example using possession tracking
possession_sequences <- events %>%
  arrange(match_id, period, minute, second) %>%
  group_by(match_id) %>%
  mutate(
    # New possession when the team in possession changes
    new_possession = possession_team.name !=
      lag(possession_team.name, default = ""),
    possession_id = cumsum(new_possession)
  ) %>%
  ungroup()

# Find possessions ending in shots
shot_possessions <- possession_sequences %>%
  group_by(match_id, possession_id) %>%
  filter(any(type.name == "Shot")) %>%
  mutate(
    possession_xg = sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE)
  ) %>%
  ungroup()

# Credit each attacking player once per possession with xGChain
xgchain_by_player <- shot_possessions %>%
  filter(!is.na(player.name), team.name == possession_team.name) %>%
  distinct(match_id, possession_id, player.name, team.name, possession_xg) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    possessions_involved = n(),
    xGChain = sum(possession_xg, na.rm = TRUE),
    .groups = "drop"
  )

# Subtract xG and xA for xGBuildup
player_xg <- events %>%
  filter(type.name == "Shot") %>%
  group_by(player.name) %>%
  summarise(xG = sum(shot.statsbomb_xg, na.rm = TRUE))

# Join and calculate
xgchain_final <- xgchain_by_player %>%
  left_join(player_xg, by = "player.name") %>%
  mutate(
    xG = ifelse(is.na(xG), 0, xG),
    xGBuildup = xGChain - xG  # Would also subtract xA
  ) %>%
  arrange(desc(xGBuildup))

print("xGChain and xGBuildup Leaders:")
print(head(xgchain_final, 15))
Calculating xGChain and xGBuildup
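A small worked example makes the two definitions concrete. The sketch below uses two hypothetical possessions and an illustrative helper, `chain_and_buildup`. It assumes one common reading of xGBuildup: a player gets no build-up credit for a possession in which they took the shot or played the key pass. With one shot per possession this matches "xGChain minus xG and xA".

```python
from collections import defaultdict

# Hypothetical possessions: ordered events, with xG stored on the shot row.
possessions = {
    1: [
        {"player": "Pivot", "type": "Pass"},
        {"player": "Pivot", "type": "Pass"},
        {"player": "Playmaker", "type": "Pass", "key_pass": True},
        {"player": "Striker", "type": "Shot", "xg": 0.40},
    ],
    2: [
        {"player": "Pivot", "type": "Pass"},
        {"player": "Striker", "type": "Pass", "key_pass": True},
        {"player": "Playmaker", "type": "Shot", "xg": 0.10},
    ],
}


def chain_and_buildup(possessions):
    """Return (xGChain, xGBuildup) dicts keyed by player."""
    chain = defaultdict(float)
    buildup = defaultdict(float)
    for events in possessions.values():
        poss_xg = sum(e.get("xg", 0.0) for e in events if e["type"] == "Shot")
        involved = {e["player"] for e in events}
        final_actors = {e["player"] for e in events
                        if e["type"] == "Shot" or e.get("key_pass")}
        for player in involved:
            chain[player] += poss_xg  # credited once per possession
            if player not in final_actors:
                buildup[player] += poss_xg
    return dict(chain), dict(buildup)


chain, buildup = chain_and_buildup(possessions)
for player in sorted(chain):
    print(player, round(chain[player], 2), round(buildup.get(player, 0.0), 2))
```

All three players end up with the same xGChain because each touched both possessions, but only the pivot, who never shot or played the final pass, keeps that value as xGBuildup. The pivot's two passes in the first possession are credited once, not twice.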

Creator Profiles

Different creators have different styles. Understanding these profiles helps with scouting and tactical analysis.

Types of Creators

Through Ball Specialist

Example: Kevin De Bruyne

  • High through ball %
  • Creates 1v1 chances
  • Requires fast runners
  • High xA per key pass

Crossing Specialist

Example: Trent Alexander-Arnold

  • High cross volume
  • Creates headed chances
  • Requires aerial threat
  • Lower xA per key pass

Cutback Specialist

Example: Bukayo Saka

  • Gets to byline
  • Pulls ball back across goal
  • High quality chances
  • Requires late runners

# Analyze creator profiles by pass type
key_passes = events_df[
    (events_df["type"] == "Pass")
    & ((events_df["pass_shot_assist"] == True)
       | (events_df["pass_goal_assist"] == True))
].copy()

creator_profiles = key_passes.groupby(["player", "team"]).agg(
    key_passes=("type", "count"),
    through_balls=("pass_through_ball", lambda x: x.eq(True).sum()),
    crosses=("pass_cross", lambda x: x.eq(True).sum()),
    cutbacks=("pass_cut_back", lambda x: x.eq(True).sum())
).reset_index()

creator_profiles = creator_profiles[creator_profiles["key_passes"] >= 5].copy()

# Calculate percentages
creator_profiles["through_ball_pct"] = (
    creator_profiles["through_balls"] / creator_profiles["key_passes"] * 100).round(1)
creator_profiles["cross_pct"] = (
    creator_profiles["crosses"] / creator_profiles["key_passes"] * 100).round(1)
creator_profiles["cutback_pct"] = (
    creator_profiles["cutbacks"] / creator_profiles["key_passes"] * 100).round(1)


# Classify style (first matching rule wins)
def classify_style(row):
    if row["through_ball_pct"] > 30:
        return "Through Ball"
    elif row["cross_pct"] > 50:
        return "Crosser"
    elif row["cutback_pct"] > 20:
        return "Cutback"
    return "Ground Pass"


creator_profiles["style"] = creator_profiles.apply(classify_style, axis=1)

print("Creator Profiles:")
print(creator_profiles.sort_values("key_passes", ascending=False)[
    ["player", "key_passes", "through_ball_pct", "cross_pct", "cutback_pct", "style"]
].head(15))

# Analyze creator profiles by pass type
creator_profiles <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  group_by(player.name, team.name) %>%
  summarise(
    key_passes = n(),
    through_balls = sum(pass.through_ball, na.rm = TRUE),
    crosses = sum(pass.cross, na.rm = TRUE),
    cutbacks = sum(pass.cut_back, na.rm = TRUE),
    # Flags are TRUE or NA, so treat NA as FALSE before negating
    ground_passes = sum(!coalesce(pass.cross, FALSE) &
                          !coalesce(pass.through_ball, FALSE) &
                          !coalesce(pass.cut_back, FALSE)),
    .groups = "drop"
  ) %>%
  filter(key_passes >= 5) %>%
  mutate(
    through_ball_pct = round(through_balls / key_passes * 100, 1),
    cross_pct = round(crosses / key_passes * 100, 1),
    cutback_pct = round(cutbacks / key_passes * 100, 1),

    # Classify style
    style = case_when(
      through_ball_pct > 30 ~ "Through Ball",
      cross_pct > 50 ~ "Crosser",
      cutback_pct > 20 ~ "Cutback",
      TRUE ~ "Ground Pass"
    )
  )

print("Creator Profiles:")
print(creator_profiles %>%
        select(player.name, key_passes, through_ball_pct,
               cross_pct, cutback_pct, style) %>%
        arrange(desc(key_passes)) %>%
        head(15))
Analyzing creator profiles and styles

Chapter Summary

Key Takeaways
  • xA = xG of shots from player's passes - measures pass quality, not teammate finishing
  • Key Passes - passes directly leading to shots
  • Shot-Creating Actions (SCA) - the two offensive actions before each shot
  • Goal-Creating Actions (GCA) - the two offensive actions before each goal
  • xGChain - total xG from possessions a player was involved in
  • xGBuildup - xGChain minus xG and xA (build-up contribution)
  • Creator profiles - through ball vs. cross vs. cutback specialists

Creativity Metrics Hierarchy

Metric | Scope | Best For
xA | Final pass only | Evaluating assist providers
Key Passes | Passes to shots | Chance creation volume
SCA | Last 2 actions before shot | Overall offensive contribution
xGBuildup | Entire possession | Evaluating deep playmakers

Practice Exercises

Apply what you've learned about chance creation metrics with these hands-on exercises.

Exercise 8.1: Calculate Player xA Leaders

Task: Find the top 10 players by Expected Assists (xA) in a tournament. Calculate both total xA and xA per 90 minutes. Identify which players are overperforming or underperforming their xA.

Steps:

  1. Load StatsBomb free data for World Cup 2022
  2. Find all key passes (passes that led to shots)
  3. Match key passes to the resulting shot's xG value
  4. Sum xG values by passer to get xA
  5. Compare actual assists vs xA to find over/underperformers

# Exercise 8.1 Solution: xA Leaders Analysis
from statsbombpy import sb
import pandas as pd

# Load World Cup 2022 data
matches = sb.matches(competition_id=43, season_id=106)
all_events = pd.concat(
    [sb.events(mid).assign(match_id=mid) for mid in matches["match_id"]],
    ignore_index=True
)

# Get shots with xG
shots = all_events[all_events["type"] == "Shot"][
    ["id", "match_id", "player", "shot_statsbomb_xg",
     "shot_outcome", "shot_key_pass_id"]
].copy()
shots.columns = ["shot_id", "match_id", "shooter", "xG", "outcome", "key_pass_id"]

# Get key passes (passes leading to shots)
key_passes = all_events[
    (all_events["type"] == "Pass")
    & ((all_events["pass_shot_assist"] == True)
       | (all_events["pass_goal_assist"] == True))
][["id", "match_id", "player", "team", "pass_goal_assist"]].copy()
key_passes.columns = ["pass_id", "match_id", "passer", "team", "is_assist"]
# The flag is True or missing; make it a clean boolean before summing
key_passes["is_assist"] = key_passes["is_assist"].eq(True)

# Join to calculate xA
merged = key_passes.merge(
    shots[["key_pass_id", "xG"]],
    left_on="pass_id", right_on="key_pass_id", how="left"
)

# Aggregate by player
player_xa = merged.groupby(["passer", "team"]).agg(
    matches=("match_id", "nunique"),
    key_passes=("pass_id", "count"),
    assists=("is_assist", "sum"),
    total_xA=("xG", "sum")
).reset_index()

# Per match played, used as a rough stand-in for per 90 minutes
player_xa["xA_per_90"] = (player_xa["total_xA"] / player_xa["matches"]).round(2)
player_xa["assists_minus_xA"] = (player_xa["assists"] - player_xa["total_xA"]).round(2)
player_xa["performance"] = player_xa["assists_minus_xA"].apply(
    lambda x: "Overperforming" if x > 1
    else ("Underperforming" if x < -1 else "As Expected")
)
player_xa = player_xa[player_xa["key_passes"] >= 3]

print("Top 10 Players by xA:")
print(player_xa.sort_values("total_xA", ascending=False).head(10))

print("\nOver/Underperformers:")
print(player_xa[player_xa["performance"] != "As Expected"][
    ["passer", "assists", "total_xA", "assists_minus_xA", "performance"]])

# Exercise 8.1 Solution: xA Leaders Analysis
library(StatsBombR)
library(dplyr)

# Load World Cup 2022 data
comps <- FreeCompetitions() %>%
  filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)

# Get shots with their xG and key pass info
shots <- events %>%
  filter(type.name == "Shot") %>%
  select(shot_id = id, match_id, shooter = player.name,
         xG = shot.statsbomb_xg, outcome = shot.outcome.name,
         key_pass_id = shot.key_pass_id)

# Get key passes
key_passes <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  select(pass_id = id, match_id, passer = player.name,
         team = team.name, is_assist = pass.goal_assist)

# Join to get passer xA (xG of resulting shots)
player_xa <- key_passes %>%
  left_join(shots, by = c("pass_id" = "key_pass_id", "match_id")) %>%
  group_by(passer, team) %>%
  summarise(
    matches = n_distinct(match_id),
    key_passes = n(),
    assists = sum(is_assist, na.rm = TRUE),
    total_xA = sum(xG, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    xA_per_90 = round(total_xA / matches, 2),
    assists_minus_xA = round(assists - total_xA, 2),
    performance = case_when(
      assists_minus_xA > 1 ~ "Overperforming",
      assists_minus_xA < -1 ~ "Underperforming",
      TRUE ~ "As Expected"
    )
  ) %>%
  filter(key_passes >= 3) %>%
  arrange(desc(total_xA))

print("Top 10 Players by xA:")
print(head(player_xa, 10))

cat("\nOver/Underperformers:\n")
print(player_xa %>%
        filter(performance != "As Expected") %>%
        select(passer, assists, total_xA, assists_minus_xA, performance))
Exercise 8.1: Calculate player xA and find over/underperformers

Exercise 8.2: Creator Profile Classification

Task: Classify creative players by their passing style. Determine whether each player is primarily a through ball specialist, crosser, cutback specialist, or ground passer based on their key pass types.

Requirements:

  • Through Ball Specialist: >25% through balls
  • Crosser: >40% crosses
  • Cutback Specialist: >15% cutbacks
  • Ground Passer: Default category

# Exercise 8.2 Solution: Creator Profile Classification
# Reuses all_events loaded in the Exercise 8.1 solution
import pandas as pd

# Get key passes with type info
key_passes = all_events[
    (all_events["type"] == "Pass")
    & ((all_events["pass_shot_assist"] == True)
       | (all_events["pass_goal_assist"] == True))
][["player", "team", "match_id",
   "pass_through_ball", "pass_cross", "pass_cut_back"]].copy()

# Aggregate by player (flags are True or missing)
profiles = key_passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    total_kp=("player", "count"),
    through_balls=("pass_through_ball", lambda x: x.eq(True).sum()),
    crosses=("pass_cross", lambda x: x.eq(True).sum()),
    cutbacks=("pass_cut_back", lambda x: x.eq(True).sum())
).reset_index()

profiles = profiles[profiles["total_kp"] >= 5].copy()

# Calculate percentages
profiles["through_pct"] = (profiles["through_balls"] / profiles["total_kp"] * 100).round(1)
profiles["cross_pct"] = (profiles["crosses"] / profiles["total_kp"] * 100).round(1)
profiles["cutback_pct"] = (profiles["cutbacks"] / profiles["total_kp"] * 100).round(1)


# Classify (first matching rule wins)
def classify_profile(row):
    if row["through_pct"] > 25:
        return "Through Ball Specialist"
    elif row["cross_pct"] > 40:
        return "Crosser"
    elif row["cutback_pct"] > 15:
        return "Cutback Specialist"
    return "Ground Passer"


profiles["profile"] = profiles.apply(classify_profile, axis=1)

print("Creator Profile Distribution:")
print(profiles["profile"].value_counts())

print("\nTop Players by Profile:")
for profile in profiles["profile"].unique():
    subset = profiles[profiles["profile"] == profile].nlargest(3, "total_kp")
    print(f"\n{profile}:")
    print(subset[["player", "total_kp", "through_pct", "cross_pct", "cutback_pct"]])

# Exercise 8.2 Solution: Creator Profile Classification
library(StatsBombR)
library(dplyr)

# Get key passes with pass type info
key_passes <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  select(player.name, team.name, match_id,
         through_ball = pass.through_ball,
         cross = pass.cross,
         cutback = pass.cut_back)

# Calculate pass type percentages
creator_profiles <- key_passes %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    total_key_passes = n(),
    through_balls = sum(through_ball == TRUE, na.rm = TRUE),
    crosses = sum(cross == TRUE, na.rm = TRUE),
    cutbacks = sum(cutback == TRUE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_key_passes >= 5) %>%
  mutate(
    through_pct = round(through_balls / total_key_passes * 100, 1),
    cross_pct = round(crosses / total_key_passes * 100, 1),
    cutback_pct = round(cutbacks / total_key_passes * 100, 1),
    ground_pct = round(100 - through_pct - cross_pct - cutback_pct, 1),

    # Classify profile
    profile = case_when(
      through_pct > 25 ~ "Through Ball Specialist",
      cross_pct > 40 ~ "Crosser",
      cutback_pct > 15 ~ "Cutback Specialist",
      TRUE ~ "Ground Passer"
    )
  )

# Summary by profile type
print("Creator Profile Distribution:")
print(table(creator_profiles$profile))

cat("\nTop Players by Profile:\n")
creator_profiles %>%
  group_by(profile) %>%
  slice_max(total_key_passes, n = 3) %>%
  select(player.name, profile, total_key_passes, through_pct, cross_pct, cutback_pct) %>%
  print()
Exercise 8.2: Classify creative players by passing style

Exercise 8.3: Shot-Creating Actions Visualization

Task: Create a visualization comparing SCA and GCA for the top creative players. Build a scatter plot with SCA per 90 on one axis and GCA per 90 on the other.

Bonus: Add player labels and size points by total minutes played.

# Exercise 8.3 Solution: SCA vs GCA Visualization
# Reuses all_events loaded in the Exercise 8.1 solution
import matplotlib.pyplot as plt
import pandas as pd

# Calculate SCA (key passes, including goal assists, plus completed dribbles)
sca = all_events[
    ((all_events["type"] == "Pass")
     & ((all_events["pass_shot_assist"] == True)
        | (all_events["pass_goal_assist"] == True)))
    | ((all_events["type"] == "Dribble")
       & (all_events["dribble_outcome"] == "Complete"))
].groupby("player").size().reset_index(name="sca")

# Calculate GCA (passes to goals)
gca = all_events[
    (all_events["type"] == "Pass")
    & (all_events["pass_goal_assist"] == True)
].groupby("player").size().reset_index(name="gca")

# Get matches per player
matches_played = all_events.groupby("player")["match_id"].nunique().reset_index(
    name="matches")

# Merge
creativity = matches_played.merge(sca, on="player", how="left")
creativity = creativity.merge(gca, on="player", how="left")
creativity = creativity.fillna(0)

# Per match played, used as a rough stand-in for per 90 minutes
creativity["sca_per_90"] = (creativity["sca"] / creativity["matches"]).round(2)
creativity["gca_per_90"] = (creativity["gca"] / creativity["matches"]).round(2)

# Filter
creativity = creativity[(creativity["matches"] >= 4) & (creativity["sca"] >= 5)]

# Create plot (point size reflects matches played)
fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(
    creativity["sca_per_90"],
    creativity["gca_per_90"],
    s=creativity["matches"] * 30,
    alpha=0.6,
    c="#1B5E20"
)

# Add labels for top creators
top_creators = creativity[
    (creativity["sca_per_90"] > 2) | (creativity["gca_per_90"] > 0.5)]
for _, row in top_creators.iterrows():
    ax.annotate(
        row["player"].split()[-1],  # Last name only
        (row["sca_per_90"], row["gca_per_90"]),
        fontsize=8, alpha=0.8
    )

ax.set_xlabel("SCA per 90", fontsize=12)
ax.set_ylabel("GCA per 90", fontsize=12)
ax.set_title("Shot-Creating vs Goal-Creating Actions\nWorld Cup 2022",
             fontsize=14, fontweight="bold")

plt.tight_layout()
plt.savefig("sca_vs_gca.png", dpi=150)
plt.show()

# Exercise 8.3 Solution: SCA vs GCA Visualization
library(StatsBombR)
library(dplyr)
library(ggplot2)
library(ggrepel)

# Calculate SCA (key passes, including goal assists, plus completed dribbles)
sca_data <- events %>%
  filter(
    (type.name == "Pass" &
       (pass.shot_assist == TRUE | pass.goal_assist == TRUE)) |
    (type.name == "Dribble" & dribble.outcome.name == "Complete")
  ) %>%
  mutate(is_sca = TRUE)

# Calculate GCA (passes to goals)
gca_data <- events %>%
  filter(type.name == "Pass" & pass.goal_assist == TRUE) %>%
  mutate(is_gca = TRUE)

# Combine and aggregate
player_creativity <- events %>%
  filter(!is.na(player.name)) %>%
  group_by(player.name, team.name) %>%
  summarise(matches = n_distinct(match_id), .groups = "drop") %>%
  left_join(
    sca_data %>%
      group_by(player.name) %>%
      summarise(sca = n()),
    by = "player.name"
  ) %>%
  left_join(
    gca_data %>%
      group_by(player.name) %>%
      summarise(gca = n()),
    by = "player.name"
  ) %>%
  mutate(
    sca = ifelse(is.na(sca), 0, sca),
    gca = ifelse(is.na(gca), 0, gca),
    sca_per_90 = round(sca / matches, 2),
    gca_per_90 = round(gca / matches, 2)
  ) %>%
  filter(matches >= 4, sca >= 5)

# Create scatter plot
ggplot(player_creativity, aes(x = sca_per_90, y = gca_per_90)) +
  geom_point(aes(size = matches), alpha = 0.6, color = "#1B5E20") +
  geom_text_repel(
    data = filter(player_creativity, sca_per_90 > 2 | gca_per_90 > 0.5),
    aes(label = player.name),
    size = 3, max.overlaps = 15
  ) +
  geom_smooth(method = "lm", se = FALSE, color = "#FFA000", linetype = "dashed") +
  labs(
    title = "Shot-Creating vs Goal-Creating Actions",
    subtitle = "World Cup 2022 | Min. 4 matches, 5 SCA",
    x = "SCA per 90",
    y = "GCA per 90",
    size = "Matches"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  )

ggsave("sca_vs_gca.png", width = 10, height = 8)
Exercise 8.3: Visualize SCA vs GCA relationship

Next: Passing Analytics

Go deeper into passing metrics - progressive passes, pass networks, centrality, and pass value models.
