Measuring Creativity
Goals require assists. While xG transformed how we evaluate finishers, Expected Assists (xA) and related metrics have revolutionized how we assess creative players. This chapter explores the full spectrum of chance creation analytics.
Assists are even noisier than goals—they depend on both the pass quality AND the teammate's finishing. A perfect through ball converted into a tap-in goal earns the same "assist" as a simple square pass that a striker bangs in from 25 yards. xA separates the quality of the chance created from the quality of the finish.
Why Chance Creation Metrics Matter
- Fair Creator Evaluation: Credit creators for pass quality, not teammate finishing
- Identify Playmakers: Find players who create chances even when their assist totals are low
- Scouting: Discover undervalued creators whose teammates don't finish well
- Tactical Analysis: Understand how teams create chances
Expected Assists (xA)
xA is the xG value of shots resulting from a player's passes. If you play a pass that leads to a 0.3 xG shot, you earn 0.3 xA—regardless of whether the shot goes in.
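To make the definition concrete, here is a minimal sketch in base R on a made-up table of key passes. The player labels, column names, and xG values are illustrative assumptions, not StatsBomb data. Each row is a pass that led to a shot: xA sums the xG of those shots, while assists count only the shots that were scored.

```r
# Toy key-pass table: every value here is invented for illustration
key_passes <- data.frame(
  passer  = c("Player A", "Player A", "Player A", "Player B", "Player B"),
  shot_xg = c(0.30, 0.45, 0.05, 0.03, 0.04),
  goal    = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# xA = sum of the xG of shots that followed a player's passes
xa <- tapply(key_passes$shot_xg, key_passes$passer, sum)

# Traditional assists = number of those shots that were scored
assists <- tapply(key_passes$goal, key_passes$passer, sum)

summary_tbl <- data.frame(
  passer  = names(xa),
  xA      = as.numeric(xa),
  assists = as.numeric(assists)
)
print(summary_tbl)
```

In this toy table Player A creates 0.80 xA without registering an assist, while Player B collects an assist from only 0.07 xA.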
xA vs. Traditional Assists
| Traditional Assists | Expected Assists (xA) |
|---|---|
| Binary outcome (0 or 1) | Continuous scale (0.00 to ~0.95) |
| Depends on teammate finishing | Independent of finishing quality |
| Simple tap-in pass = brilliant through ball | Values pass quality appropriately |
| High variance, low predictive power | More stable, better predictor |
# Calculate Expected Assists
library(StatsBombR)
library(dplyr)
# Load World Cup data
comps <- FreeCompetitions() %>%
filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)
# Find passes that led to shots
# In StatsBomb, shots have pass info; we need to link back
# Get shots with their xG
shots <- events %>%
filter(type.name == "Shot") %>%
select(id, match_id, player.name, team.name, shot.statsbomb_xg,
shot.outcome.name, shot.key_pass_id)
# Get key passes (passes leading to shots)
key_passes <- events %>%
filter(type.name == "Pass") %>%
select(id, match_id, player.name, team.name,
pass.goal_assist, pass.shot_assist,
location.x, location.y,
pass.end_location.x, pass.end_location.y)
# Link key passes to shots
# pass.shot_assist is TRUE for passes leading to shots
shot_assists <- key_passes %>%
filter(pass.shot_assist == TRUE | pass.goal_assist == TRUE)
# Calculate xA by team (using shots data)
# xA = sum of xG from shots created by passes
# Player-level xA requires joining shot.key_pass_id to the pass id
# to recover the passer's name; for simplicity, we group by the shooting team
team_xa <- shots %>%
  filter(!is.na(shot.key_pass_id)) %>%
  group_by(team.name) %>%
  summarise(
    chances_created = n(),
    total_xA = sum(shot.statsbomb_xg, na.rm = TRUE),
    goals_from_chances = sum(shot.outcome.name == "Goal", na.rm = TRUE)
  )
print("Team Chance Creation (xA):")
print(team_xa %>% arrange(desc(total_xA)))
Player xA Analysis
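Attributing xA to individual players depends on linking each shot back to the pass that created it. The sketch below shows that join in base R on a few invented rows. The column names `pass_id`, `key_pass_id`, `shot_xg`, and `is_goal` are placeholders chosen for the example, standing in for the StatsBomb `id`, `shot.key_pass_id`, `shot.statsbomb_xg`, and shot outcome fields.

```r
# Toy events: ids, names, and xG values are invented for illustration
passes <- data.frame(
  pass_id = c("p1", "p2", "p3", "p4"),
  passer  = c("Player A", "Player B", "Player A", "Player C"),
  stringsAsFactors = FALSE
)
shots <- data.frame(
  shot_id     = c("s1", "s2", "s3", "s4"),
  key_pass_id = c("p1", "p2", "p3", NA),   # s4 was not preceded by a pass
  shot_xg     = c(0.10, 0.25, 0.40, 0.08),
  is_goal     = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Inner join: keeps only passes that are some shot's key pass
linked <- merge(passes, shots, by.x = "pass_id", by.y = "key_pass_id")

# Sum shot xG (xA) and goals (assists) per passer
player_xa_demo <- aggregate(cbind(shot_xg, is_goal) ~ passer,
                            data = linked, FUN = sum)
names(player_xa_demo) <- c("passer", "xA", "assists")
print(player_xa_demo)
```

Pass `p4` drops out because no shot references it, and shot `s4` drops out because it has no key pass, so only genuine pass-to-shot pairs contribute to xA.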
# Detailed player xA analysis
# Using FBref-style calculation where xA = xG of shots from player passes
# Identify all pass -> shot connections
passes_to_shots <- events %>%
filter(type.name == "Pass",
pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
select(pass_id = id, passer = player.name, passer_team = team.name,
match_id, minute,
pass.goal_assist, pass.shot_assist,
pass.cross, pass.through_ball, pass.cut_back)
# Match with shot xG (would need shot.key_pass_id linkage)
# For demo, we calculate key pass stats
player_creativity <- passes_to_shots %>%
group_by(passer, passer_team) %>%
summarise(
matches = n_distinct(match_id),
key_passes = n(),
assists = sum(pass.goal_assist, na.rm = TRUE),
crosses_to_shots = sum(pass.cross, na.rm = TRUE),
through_balls_to_shots = sum(pass.through_ball, na.rm = TRUE),
cutbacks_to_shots = sum(pass.cut_back, na.rm = TRUE),
.groups = "drop"
) %>%
filter(key_passes >= 3) %>%
mutate(
key_passes_per_match = round(key_passes / matches, 2),
assist_conversion = round(assists / key_passes * 100, 1)
) %>%
arrange(desc(key_passes))
print("Top Chance Creators (Key Passes):")
print(head(player_creativity, 15))
xA Overperformance
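As with xG, players can overperform or underperform their xA. A high assist total relative to xA usually regresses toward xA, because the surplus comes from teammates' finishing rather than from the passes themselves.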
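A small simulation illustrates how much of the gap between assists and xA can be pure finishing noise. This is a sketch under strong assumptions: a hypothetical creator whose 80 key passes each produce a 0.10 xG shot, with every shot scored independently with probability equal to its xG.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical creator: 80 key passes, each leading to a 0.10 xG shot
n_key_passes <- 80
shot_xg      <- rep(0.10, n_key_passes)
xa           <- sum(shot_xg)  # about 8 expected assists per season

# Simulate 2000 seasons: each shot is scored with probability equal to its xG
n_seasons <- 2000
assists <- replicate(n_seasons, sum(runif(n_key_passes) < shot_xg))

cat("xA per season:       ", round(xa, 1), "\n")
cat("Mean assists:        ", round(mean(assists), 2), "\n")
cat("SD of assists:       ", round(sd(assists), 2), "\n")
cat("Share with 11+ assists (3 or more above xA):",
    round(mean(assists >= 11), 3), "\n")
```

The passing quality is identical in every simulated season, so any season that finishes well above or below xA does so through finishing variance alone, which is why such gaps tend not to persist.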
# xA vs Actual Assists analysis
# When Assists >> xA, likely to regress
# When Assists << xA, bounce-back candidate
# Simulated season data (use actual xA from FBref/StatsBomb)
creator_performance <- data.frame(
player = c("De Bruyne", "Bruno Fernandes", "Odegaard",
"Saka", "Maddison", "Trent AA"),
team = c("Man City", "Man United", "Arsenal",
"Arsenal", "Tottenham", "Liverpool"),
assists = c(16, 8, 10, 11, 12, 4),
xA = c(14.2, 10.5, 8.8, 7.2, 9.1, 7.8),
key_passes = c(115, 95, 78, 68, 82, 92)
)
creator_performance <- creator_performance %>%
mutate(
assists_minus_xA = assists - xA,
xa_per_kp = round(xA / key_passes, 3),
assist_rate = round(assists / key_passes * 100, 1)
)
print("Creator Performance vs xA:")
print(creator_performance %>%
select(player, assists, xA, assists_minus_xA, assist_rate) %>%
arrange(desc(assists_minus_xA)))
cat("\nInterpretation:\n")
cat("- Saka: +3.8 over xA - teammates finishing well, may regress\n")
cat("- De Bruyne: +1.8 - slight overperformance, sustainable\n")
cat("- Bruno: -2.5 under xA - unlucky, should bounce back\n")
cat("- Trent: -3.8 under xA - teammates missing his chances\n")
Shot-Creating Actions (SCA)
Key passes only capture the final pass. But what about the dribble that beat a defender before the pass? Or the ball recovery that started the attack? Shot-Creating Actions capture more of the creative process.
What Counts as an SCA?
Shot-Creating Action (SCA): The two offensive actions directly leading to a shot.
- Pass leading to shot (key pass)
- Pass leading to the key pass
- Dribble leading to shot
- Foul won leading to shot
- Defensive action that starts the sequence
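The "two actions before each shot" rule can be sketched on a toy event stream. Everything below is an illustrative assumption: the column names, the rows, and the set of action types treated as creditable (`sca_types`). The helper `count_sca` is a hypothetical name, not a StatsBombR function.

```r
# Toy event stream for one match; all rows are invented for illustration
ev <- data.frame(
  index      = 1:9,
  possession = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  player     = c("A", "B", "C", "D", "E", "F", "A", "C", "C"),
  type       = c("Pass", "Dribble", "Pass", "Shot",
                 "Pass", "Shot",
                 "Foul Won", "Pass", "Pass"),
  stringsAsFactors = FALSE
)

# Assumed set of action types that can earn SCA credit
sca_types <- c("Pass", "Dribble", "Foul Won")

# Hypothetical helper: credit the last n_actions creditable actions
# in the same possession before each shot
count_sca <- function(ev, n_actions = 2) {
  ev <- ev[order(ev$index), ]
  credited <- character(0)
  for (i in which(ev$type == "Shot")) {
    prior <- which(ev$possession == ev$possession[i] &
                   ev$index < ev$index[i] &
                   ev$type %in% sca_types)
    credited <- c(credited, ev$player[tail(prior, n_actions)])
  }
  table(credited)
}

sca <- count_sca(ev)
print(sca)
```

Player A's opening pass in possession 1 earns nothing because two later actions sit closer to the shot, and possession 3 earns nothing because it never produces a shot.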
# Calculate Shot-Creating Actions
library(dplyr)
# For each shot, identify the two preceding offensive actions
# This requires possession sequence tracking
# Get shots and their preceding events
shots_with_context <- events %>%
filter(type.name == "Shot") %>%
select(shot_id = id, match_id, minute, second, team.name,
shot.statsbomb_xg, shot.outcome.name)
# For each shot, look at the 2 prior events from same team
# (Simplified - actual SCA requires possession tracking)
# Alternative: use pass.shot_assist and pass.goal_assist
# along with dribble info
sca_by_player <- events %>%
  filter(
    # Passes leading to shots (pass.goal_assist flags the ones that were scored)
    (type.name == "Pass" &
      (coalesce(pass.shot_assist, FALSE) | coalesce(pass.goal_assist, FALSE))) |
    # All completed dribbles, used as a rough proxy: this does not check
    # that the dribble actually preceded a shot
    (type.name == "Dribble" & dribble.outcome.name == "Complete")
  ) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    sca_passes = sum(type.name == "Pass"),
    sca_dribbles = sum(type.name == "Dribble"),
    .groups = "drop"
  ) %>%
  mutate(
    total_sca = sca_passes + sca_dribbles / 3, # Weight dribbles less
    # Matches stand in for 90s played; a true per-90 rate needs minutes
    sca_per_90 = round(total_sca / matches, 2)
  ) %>%
  arrange(desc(sca_per_90))
print("Shot-Creating Actions Leaders:")
print(head(sca_by_player, 15))
Goal-Creating Actions (GCA)
GCA is the same concept but for goals instead of shots—more valuable but rarer:
# Goal-Creating Actions
# (Simplified: counts only the final pass before each goal, i.e. assists)
gca_by_player <- events %>%
  filter(
    (type.name == "Pass" & pass.goal_assist == TRUE)
  ) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    assists = n(),
    .groups = "drop"
  ) %>%
  mutate(
    # Matches stand in for 90s played; a true per-90 rate needs minutes
    gca_per_90 = round(assists / matches, 2)
  ) %>%
  arrange(desc(assists))
print("Goal-Creating Actions (Assists) Leaders:")
print(head(gca_by_player, 10))
xGChain and xGBuildup
StatsBomb introduced xGChain and xGBuildup to credit players involved earlier in attacking sequences.
Definitions
| Metric | Definition | What It Measures |
|---|---|---|
| xGChain | Total xG of all possessions a player was involved in | Overall attacking involvement |
| xGBuildup | xGChain minus xG from shots and xA from key passes | Contribution to build-up play (not final actions) |
Why xGBuildup Matters
Some players contribute heavily to attacks without getting shots or assists. Deep-lying playmakers, progressive ball carriers, and pivot players add value that xG and xA miss. xGBuildup captures this.
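A toy example may help separate the two metrics. The sketch below uses invented possessions and placeholder column names, credits each player once per possession with that possession's xG (xGChain), and gives build-up credit only where the player neither took the shot nor played the key pass. That exclusion rule is one common reading of xGBuildup and is an assumption of this sketch.

```r
# Toy possessions that ended in a shot; all rows are invented for illustration
ev <- data.frame(
  possession = c(1, 1, 1, 1, 1, 2, 2, 2),
  player     = c("DM", "CM", "DM", "AM", "ST", "CM", "AM", "AM"),
  type       = c("Pass", "Pass", "Pass", "Pass", "Shot",
                 "Pass", "Carry", "Shot"),
  key_pass   = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  shot_xg    = c(NA, NA, NA, NA, 0.40, NA, NA, 0.10),
  stringsAsFactors = FALSE
)

# xG of each possession (sum over its shots)
poss_xg <- tapply(ev$shot_xg, ev$possession, sum, na.rm = TRUE)

# One row per player per possession, flagged TRUE if the player
# took the shot or played the key pass in that possession
ev$final_action <- ev$type == "Shot" | ev$key_pass
involvement <- aggregate(final_action ~ possession + player,
                         data = ev, FUN = any)
involvement$xg <- as.numeric(poss_xg[as.character(involvement$possession)])

# xGChain: every involvement counts; xGBuildup: final actions are zeroed out
xgchain   <- tapply(involvement$xg, involvement$player, sum)
buildup_x <- ifelse(involvement$final_action, 0, involvement$xg)
xgbuildup <- tapply(buildup_x, involvement$player, sum)

result <- data.frame(
  player    = names(xgchain),
  xGChain   = as.numeric(xgchain),
  xGBuildup = as.numeric(xgbuildup[names(xgchain)])
)
print(result)
```

The deep midfielder (DM) touches the ball twice in possession 1 but is credited once, and keeps all 0.40 as build-up. The attacking midfielder (AM) has 0.50 xGChain but no xGBuildup, since both involvements were a key pass or a shot.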
# Calculate xGChain and xGBuildup
# Requires tracking possession sequences
# For each possession sequence ending in a shot:
# 1. Identify all players involved
# 2. Credit each with the shot's xG (for xGChain)
# 3. Exclude shooter and assister for xGBuildup
# Simplified example using StatsBomb's own possession counter
# (the `possession` and `possession_team.name` columns) rather than inferring
# possession changes from team.name, which also flips on defensive events
possession_sequences <- events %>%
  arrange(match_id, index) %>%
  # Keep only actions by the team in possession
  filter(team.name == possession_team.name) %>%
  mutate(possession_id = possession)
# Find possessions ending in shots
shot_possessions <- possession_sequences %>%
  group_by(match_id, possession_id) %>%
  filter(any(type.name == "Shot")) %>%
  mutate(
    possession_xg = sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE)
  ) %>%
  ungroup()
# Credit each player once per possession with that possession's xG (xGChain)
xgchain_by_player <- shot_possessions %>%
  filter(!is.na(player.name)) %>%
  distinct(match_id, possession_id, player.name, team.name, possession_xg) %>%
  group_by(player.name, team.name) %>%
  summarise(
    matches = n_distinct(match_id),
    possessions_involved = n(),
    xGChain = sum(possession_xg, na.rm = TRUE),
    .groups = "drop"
  )
# Subtract xG and xA for xGBuildup
player_xg <- events %>%
filter(type.name == "Shot") %>%
group_by(player.name) %>%
summarise(xG = sum(shot.statsbomb_xg, na.rm = TRUE))
# Join and calculate
xgchain_final <- xgchain_by_player %>%
left_join(player_xg, by = "player.name") %>%
mutate(
xG = ifelse(is.na(xG), 0, xG),
xGBuildup = xGChain - xG # Would also subtract xA
) %>%
arrange(desc(xGBuildup))
print("xGChain and xGBuildup Leaders:")
print(head(xgchain_final, 15))
Creator Profiles
Different creators have different styles. Understanding these profiles helps with scouting and tactical analysis.
Types of Creators
Through Ball Specialist
Example: Kevin De Bruyne
- High through ball %
- Creates 1v1 chances
- Requires fast runners
- High xA per key pass
Crosser
Example: Trent Alexander-Arnold
- High cross volume
- Creates headed chances
- Requires aerial threat
- Lower xA per key pass
Cutback Specialist
Example: Bukayo Saka
- Gets to byline
- Pulls ball back across goal
- High quality chances
- Requires late runners
# Analyze creator profiles by pass type
creator_profiles <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
  group_by(player.name, team.name) %>%
  summarise(
    key_passes = n(),
    through_balls = sum(pass.through_ball, na.rm = TRUE),
    crosses = sum(pass.cross, na.rm = TRUE),
    cutbacks = sum(pass.cut_back, na.rm = TRUE),
    # The type flags are TRUE or NA, so treat NA as FALSE before negating;
    # otherwise every comparison is NA and the count collapses to zero
    ground_passes = sum(!coalesce(pass.cross, FALSE) &
                        !coalesce(pass.through_ball, FALSE) &
                        !coalesce(pass.cut_back, FALSE)),
    .groups = "drop"
  ) %>%
  filter(key_passes >= 5) %>%
  mutate(
    through_ball_pct = round(through_balls / key_passes * 100, 1),
    cross_pct = round(crosses / key_passes * 100, 1),
    cutback_pct = round(cutbacks / key_passes * 100, 1),
    # Classify style (rules are checked in order; the first match wins)
    style = case_when(
      through_ball_pct > 30 ~ "Through Ball",
      cross_pct > 50 ~ "Crosser",
      cutback_pct > 20 ~ "Cutback",
      TRUE ~ "Ground Pass"
    )
  )
print("Creator Profiles:")
print(creator_profiles %>%
  select(player.name, key_passes, through_ball_pct,
         cross_pct, cutback_pct, style) %>%
  arrange(desc(key_passes)) %>%
  head(15))
Chapter Summary
Key Takeaways
- xA = xG of shots from player's passes - measures pass quality, not teammate finishing
- Key Passes - passes directly leading to shots
- Shot-Creating Actions (SCA) - the two offensive actions before each shot
- Goal-Creating Actions (GCA) - the two offensive actions before each goal
- xGChain - total xG from possessions a player was involved in
- xGBuildup - xGChain minus xG and xA (build-up contribution)
- Creator profiles - through ball vs. cross vs. cutback specialists
Creativity Metrics Hierarchy
| Metric | Scope | Best For |
|---|---|---|
| xA | Final pass only | Evaluating assist providers |
| Key Passes | Passes to shots | Chance creation volume |
| SCA | Last 2 actions before shot | Overall offensive contribution |
| xGBuildup | Entire possession | Evaluating deep playmakers |
Practice Exercises
Apply what you've learned about chance creation metrics with these hands-on exercises.
Exercise 8.1: Calculate Player xA Leaders
Task: Find the top 10 players by Expected Assists (xA) in a tournament. Calculate both total xA and xA per 90 minutes. Identify which players are overperforming or underperforming their xA.
Steps:
- Load StatsBomb free data for World Cup 2022
- Find all key passes (passes that led to shots)
- Match key passes to the resulting shot's xG value
- Sum xG values by passer to get xA
- Compare actual assists vs xA to find over/underperformers
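The task asks for xA per 90 minutes, which needs minutes played rather than a match count. The sketch below shows the normalisation on invented totals. The `minutes` column is an assumption: it is not part of the event rows used in this chapter and would have to be derived separately from lineup and substitution data.

```r
# Hypothetical per-player totals; all values are invented for illustration
players <- data.frame(
  player  = c("Player A", "Player B", "Player C"),
  xA      = c(2.4, 1.8, 0.9),
  minutes = c(630, 270, 90),
  matches = c(7, 6, 3),
  stringsAsFactors = FALSE
)

# Rate per 90 minutes actually played
per_90 <- function(total, minutes) total / minutes * 90

players$xA_per_90    <- round(per_90(players$xA, players$minutes), 2)
players$xA_per_match <- round(players$xA / players$matches, 2)
print(players)
```

Players B and C look identical per match (0.30), but per 90 minutes they differ sharply (0.60 vs 0.90) because both were used mostly as substitutes. Dividing by matches, as a quick proxy, hides that difference.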
# Exercise 8.1 Solution: xA Leaders Analysis
library(StatsBombR)
library(dplyr)
# Load World Cup 2022 data
comps <- FreeCompetitions() %>%
filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)
# Get shots with their xG and key pass info
shots <- events %>%
filter(type.name == "Shot") %>%
select(shot_id = id, match_id, shooter = player.name,
xG = shot.statsbomb_xg, outcome = shot.outcome.name,
key_pass_id = shot.key_pass_id)
# Get key passes
key_passes <- events %>%
filter(type.name == "Pass",
pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
select(pass_id = id, match_id, passer = player.name,
team = team.name, is_assist = pass.goal_assist)
# Join to get passer xA (xG of resulting shots)
player_xa <- key_passes %>%
  left_join(shots, by = c("pass_id" = "key_pass_id", "match_id")) %>%
  group_by(passer, team) %>%
  summarise(
    matches = n_distinct(match_id),
    key_passes = n(),
    assists = sum(is_assist, na.rm = TRUE),
    total_xA = sum(xG, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    # Matches stand in for 90s played; a true per-90 rate needs minutes
    xA_per_90 = round(total_xA / matches, 2),
    assists_minus_xA = round(assists - total_xA, 2),
    performance = case_when(
      assists_minus_xA > 1 ~ "Overperforming",
      assists_minus_xA < -1 ~ "Underperforming",
      TRUE ~ "As Expected"
    )
  ) %>%
  filter(key_passes >= 3) %>%
  arrange(desc(total_xA))
print("Top 10 Players by xA:")
print(head(player_xa, 10))
# cat() interprets "\n" as a line break; print() would show it literally
cat("\nOver/Underperformers:\n")
print(player_xa %>%
  filter(performance != "As Expected") %>%
  select(passer, assists, total_xA, assists_minus_xA, performance))
Exercise 8.2: Creator Profile Classification
Task: Classify creative players by their passing style. Determine whether each player is primarily a through ball specialist, crosser, cutback specialist, or ground passer based on their key pass types.
Requirements:
- Through Ball Specialist: >25% through balls
- Crosser: >40% crosses
- Cutback Specialist: >15% cutbacks
- Ground Passer: Default category
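Because a player can exceed more than one threshold, the order in which the rules are checked decides the label. The sketch below encodes the four requirements as a small base R function (`classify_creator` is a hypothetical helper name) and applies it to invented percentages.

```r
# Hypothetical helper: rules are checked top to bottom, first match wins
classify_creator <- function(through_pct, cross_pct, cutback_pct) {
  if (through_pct > 25) {
    "Through Ball Specialist"
  } else if (cross_pct > 40) {
    "Crosser"
  } else if (cutback_pct > 15) {
    "Cutback Specialist"
  } else {
    "Ground Passer"
  }
}

# Invented percentages for four illustrative players
profiles <- data.frame(
  player      = c("P1", "P2", "P3", "P4"),
  through_pct = c(30, 10, 10, 5),
  cross_pct   = c(45, 45, 10, 5),
  cutback_pct = c(0, 20, 20, 5),
  stringsAsFactors = FALSE
)
profiles$profile <- mapply(classify_creator,
                           profiles$through_pct,
                           profiles$cross_pct,
                           profiles$cutback_pct)
print(profiles)
```

P1 clears both the through ball and the cross thresholds but is labelled a Through Ball Specialist because that rule is checked first. Reordering the rules would change the label, so the ordering is a modelling choice worth stating.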
# Exercise 8.2 Solution: Creator Profile Classification
library(StatsBombR)
library(dplyr)
# Get key passes with pass type info
key_passes <- events %>%
filter(type.name == "Pass",
pass.shot_assist == TRUE | pass.goal_assist == TRUE) %>%
select(player.name, team.name, match_id,
through_ball = pass.through_ball,
cross = pass.cross,
cutback = pass.cut_back)
# Calculate pass type percentages
creator_profiles <- key_passes %>%
group_by(player.name, team.name) %>%
summarise(
matches = n_distinct(match_id),
total_key_passes = n(),
through_balls = sum(through_ball == TRUE, na.rm = TRUE),
crosses = sum(cross == TRUE, na.rm = TRUE),
cutbacks = sum(cutback == TRUE, na.rm = TRUE),
.groups = "drop"
) %>%
filter(total_key_passes >= 5) %>%
mutate(
through_pct = round(through_balls / total_key_passes * 100, 1),
cross_pct = round(crosses / total_key_passes * 100, 1),
cutback_pct = round(cutbacks / total_key_passes * 100, 1),
ground_pct = round(100 - through_pct - cross_pct - cutback_pct, 1),
# Classify profile
profile = case_when(
through_pct > 25 ~ "Through Ball Specialist",
cross_pct > 40 ~ "Crosser",
cutback_pct > 15 ~ "Cutback Specialist",
TRUE ~ "Ground Passer"
)
)
# Summary by profile type
print("Creator Profile Distribution:")
print(table(creator_profiles$profile))
# cat() interprets "\n" as a line break; print() would show it literally
cat("\nTop Players by Profile:\n")
creator_profiles %>%
  group_by(profile) %>%
  slice_max(total_key_passes, n = 3) %>%
  select(player.name, profile, total_key_passes, through_pct, cross_pct, cutback_pct) %>%
  print()
Exercise 8.3: Shot-Creating Actions Visualization
Task: Create a visualization comparing SCA and GCA for the top creative players. Build a scatter plot with SCA per 90 on one axis and GCA per 90 on the other.
Bonus: Add player labels and size points by total minutes played.
# Exercise 8.3 Solution: SCA vs GCA Visualization
library(StatsBombR)
library(dplyr)
library(ggplot2)
library(ggrepel)
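# Calculate SCA (key passes + completed dribbles as a rough proxy)
# Goal assists are included so that every GCA pass also counts as an SCA
sca_data <- events %>%
  filter(
    (type.name == "Pass" &
      (coalesce(pass.shot_assist, FALSE) | coalesce(pass.goal_assist, FALSE))) |
    (type.name == "Dribble" & dribble.outcome.name == "Complete")
  ) %>%
  mutate(is_sca = TRUE)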
# Calculate GCA (passes to goals)
gca_data <- events %>%
filter(type.name == "Pass" & pass.goal_assist == TRUE) %>%
mutate(is_gca = TRUE)
# Combine and aggregate
player_creativity <- events %>%
filter(!is.na(player.name)) %>%
group_by(player.name, team.name) %>%
summarise(matches = n_distinct(match_id), .groups = "drop") %>%
left_join(
sca_data %>%
group_by(player.name) %>%
summarise(sca = n()),
by = "player.name"
) %>%
left_join(
gca_data %>%
group_by(player.name) %>%
summarise(gca = n()),
by = "player.name"
) %>%
mutate(
sca = ifelse(is.na(sca), 0, sca),
gca = ifelse(is.na(gca), 0, gca),
sca_per_90 = round(sca / matches, 2),
gca_per_90 = round(gca / matches, 2)
) %>%
filter(matches >= 4, sca >= 5)
# Create scatter plot
ggplot(player_creativity, aes(x = sca_per_90, y = gca_per_90)) +
geom_point(aes(size = matches), alpha = 0.6, color = "#1B5E20") +
geom_text_repel(
data = filter(player_creativity, sca_per_90 > 2 | gca_per_90 > 0.5),
aes(label = player.name),
size = 3, max.overlaps = 15
) +
geom_smooth(method = "lm", se = FALSE, color = "#FFA000", linetype = "dashed") +
labs(
title = "Shot-Creating vs Goal-Creating Actions",
subtitle = "World Cup 2022 | Min. 4 matches, 5 SCA",
x = "SCA per 90",
y = "GCA per 90",
size = "Matches"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
legend.position = "bottom"
)
ggsave("sca_vs_gca.png", width = 10, height = 8)
Next: Passing Analytics
Go deeper into passing metrics - progressive passes, pass networks, centrality, and pass value models.