The Hardest Position to Measure
Defensive analytics is football's greatest challenge. Unlike attacking metrics where we measure creation and conversion, defensive value often comes from what doesn't happen—the shot not taken, the pass not completed, the chance prevented.
A defender with zero tackles might be elite (never out of position) or terrible (ball watching). Context is everything. This chapter explores how to properly evaluate defensive contributions.
The Defensive Analytics Challenge
- Selection bias: More tackles might mean worse positioning
- Team context: High-line teams face different challenges
- Off-ball value: Positioning and communication are invisible in data
- Attribution: Who prevented the goal—the blocker or the one who forced the bad shot?
Basic Defensive Actions
Start with the fundamentals: tackles, interceptions, blocks, and clearances.
# Calculate basic defensive statistics
library(StatsBombR)
library(dplyr)
# Load data
comps <- FreeCompetitions() %>%
filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches) %>%
allclean()  # adds location.x / location.y and other cleaned columns used below
# Defensive actions
defensive_stats <- events %>%
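filter(type.name %in% c("Interception", "Clearance",
"Block", "Duel")) %>%
group_by(player.name, team.name) %>%
summarise(
matches = n_distinct(match_id),
# Tackles: StatsBomb records tackles as Duel events with duel.type.name == "Tackle"
tackles = sum(type.name == "Duel" & duel.type.name %in% "Tackle"),
tackles_won = sum(type.name == "Duel" & duel.type.name %in% "Tackle" &
duel.outcome.name %in% c("Won", "Success", "Success In Play", "Success Out")),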
# Interceptions
interceptions = sum(type.name == "Interception"),
# Clearances
clearances = sum(type.name == "Clearance"),
# Blocks
blocks = sum(type.name == "Block"),
# block.offensive is only recorded when TRUE, so treat NA as "not offensive"
shot_blocks = sum(type.name == "Block" &
!(block.offensive %in% TRUE)),
# Duels
duels = sum(type.name == "Duel"),
duels_won = sum(type.name == "Duel" &
duel.outcome.name %in% c("Won", "Success", "Success In Play", "Success Out")),
.groups = "drop"
) %>%
mutate(
# Rates
tackle_success = round(tackles_won / tackles * 100, 1),
duel_success = round(duels_won / duels * 100, 1),
# Per match appearance, used as a rough per-90 proxy (true per-90 rates need minutes played)
tackles_per_90 = round(tackles / matches, 2),
interceptions_per_90 = round(interceptions / matches, 2),
clearances_per_90 = round(clearances / matches, 2),
# Ball-winning actions
ball_winning = tackles_won + interceptions,
ball_winning_per_90 = round(ball_winning / matches, 2)
)
print("Defensive Action Leaders:")
print(defensive_stats %>%
filter(matches >= 3) %>%
arrange(desc(ball_winning_per_90)) %>%
select(player.name, team.name, matches, tackles_won,
interceptions, ball_winning_per_90, tackle_success) %>%
head(15))
Possession-Adjusted Defense
Raw defensive numbers are misleading without context. A team with 70% possession has fewer defensive opportunities than one with 30%.
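To make the effect concrete, here is a minimal sketch with made-up numbers. The two players and their counts are hypothetical; the 50% opponent-possession baseline is the same normalization this chapter uses for its adjusted metrics.

```r
# Toy illustration (invented numbers): two defenders with identical raw
# tackle + interception counts, facing very different opponent possession.
toy <- data.frame(
  player = c("Defender A", "Defender B"),
  raw_actions = c(6, 6),
  opp_possession_pct = c(30, 70)
)

# Scale each count to a 50% opponent-possession baseline
toy$adjusted_actions <- toy$raw_actions * 50 / toy$opp_possession_pct

print(toy)
```

Defender A, whose team rarely had to defend, is scaled up to 10 adjusted actions; Defender B, who had far more chances to defend, is scaled down to roughly 4.3.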
Defensive Actions Per Defensive Action Opportunity
# Possession-adjusted defensive metrics
# Calculate opponent possession to adjust
# First, get team possession by match
match_possession <- events %>%
filter(type.name == "Pass") %>%
group_by(match_id, team.name) %>%
summarise(passes = n(), .groups = "drop") %>%
group_by(match_id) %>%
mutate(
total_passes = sum(passes),
possession_pct = passes / total_passes * 100,
opp_possession_pct = 100 - possession_pct
)
# Join to player defensive stats
player_match_defense <- events %>%
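# Tackles are Duel events with duel.type.name == "Tackle" in StatsBomb data
filter(type.name == "Interception" |
(type.name == "Duel" & duel.type.name %in% "Tackle")) %>%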
group_by(player.name, team.name, match_id) %>%
summarise(
defensive_actions = n(),
.groups = "drop"
) %>%
left_join(match_possession, by = c("match_id", "team.name"))
# Possession-adjusted rate
# More opponent possession = more opportunities to defend
poss_adjusted <- player_match_defense %>%
group_by(player.name, team.name) %>%
summarise(
matches = n(),
total_def_actions = sum(defensive_actions),
avg_opp_possession = mean(opp_possession_pct),
# Raw rate
def_actions_per_match = total_def_actions / matches,
# Adjusted: normalize to 50% opponent possession baseline
adj_def_actions = total_def_actions / (avg_opp_possession / 50),
adj_def_per_match = adj_def_actions / matches,
.groups = "drop"
) %>%
filter(matches >= 3) %>%
mutate(
adjustment_factor = round(50 / avg_opp_possession, 2)
)
print("Possession-Adjusted Defensive Actions:")
print(poss_adjusted %>%
arrange(desc(adj_def_per_match)) %>%
select(player.name, matches, def_actions_per_match,
avg_opp_possession, adjustment_factor, adj_def_per_match) %>%
head(15))
PPDA - Passes Per Defensive Action
PPDA measures team pressing intensity—fewer passes allowed per defensive action means more aggressive pressing:
# Calculate PPDA (Passes Per Defensive Action)
# Lower PPDA = more intense pressing
calculate_ppda <- function(events_df, team_name, match_id) {
match_events <- events_df %>%
filter(match_id == !!match_id)
# Opponent passes in their defensive 60%
opp_passes <- match_events %>%
filter(team.name != team_name,
type.name == "Pass",
location.x <= 72) # Opponent defensive 60%
# Team defensive actions in opponent defensive 60%
team_def_actions <- match_events %>%
filter(team.name == team_name,
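# Tackles are Duel events with duel.type.name == "Tackle" in StatsBomb data
(type.name %in% c("Interception", "Foul Committed") |
(type.name == "Duel" & duel.type.name %in% "Tackle")),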
location.x >= 48) # Same zone, flipped
if (nrow(team_def_actions) == 0) return(NA)
ppda <- nrow(opp_passes) / nrow(team_def_actions)
return(ppda)
}
# Calculate PPDA for every team in every match, then average per team
team_matches <- events %>%
distinct(match_id, team.name)
team_matches$ppda <- mapply(
function(team, mid) calculate_ppda(events, team, mid),
team_matches$team.name, team_matches$match_id,
USE.NAMES = FALSE
)
team_ppda_stats <- team_matches %>%
group_by(team.name) %>%
summarise(
matches = n_distinct(match_id),
avg_ppda = round(mean(ppda, na.rm = TRUE), 2),
.groups = "drop"
)
# Complementary volume measure: pressing actions in the opposition half
high_press <- events %>%
filter(type.name %in% c("Interception", "Pressure") |
(type.name == "Duel" & duel.type.name %in% "Tackle"),
location.x >= 60) %>%
count(team.name, name = "high_press_actions")
print("Team Pressing Intensity:")
print(team_ppda_stats %>%
left_join(high_press, by = "team.name") %>%
mutate(press_actions_per_match = round(high_press_actions / matches, 1)) %>%
arrange(avg_ppda))
Defensive Value Models
Modern analytics attempts to quantify the actual value of defensive actions by measuring their impact on opponent scoring probability.
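As a rough sketch of the idea, a ball win can be valued by how threatening the spot was for the side that lost the ball. Everything in this example is hypothetical: the three-zone threat table, its numbers, the `threat_at()` helper, and the `x_opp` column, which is assumed to measure position along a 120-unit pitch in the opponent's attacking direction.

```r
# Illustrative threat values by pitch third, seen from the attacking side.
# These are invented numbers, not a fitted xT model.
zone_threat <- c(defensive_third = 0.01, middle_third = 0.03, final_third = 0.15)

# Look up the threat value for a position along the pitch (0-120)
threat_at <- function(x_opp) {
  zone <- cut(x_opp, breaks = c(0, 40, 80, 120),
              labels = names(zone_threat), include.lowest = TRUE)
  unname(zone_threat[as.character(zone)])
}

# Two ball wins by the same defender (made-up locations)
stops <- data.frame(
  action = c("Interception", "Tackle"),
  x_opp = c(35, 105)
)
stops$threat_prevented <- threat_at(stops$x_opp)

print(stops)
print(sum(stops$threat_prevented))
```

Under these assumptions the tackle close to goal counts for far more than the interception deep in the opponent's own third, which is the intuition behind threat-based defensive models.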
Expected Threat Prevented (xT Prevented)
# Calculate xT (Expected Threat) prevented by defensive actions
# xT assigns values to pitch zones based on goal probability
# Simple xT grid (8x8 zones: 8 bands along the pitch length x 8 bands across its width)
# Values are illustrative threat scores for each zone, not fitted probabilities
xt_grid <- matrix(c(
0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.02, 0.01, # Row 1 (own goal)
0.01, 0.01, 0.02, 0.02, 0.03, 0.03, 0.02, 0.01,
0.01, 0.02, 0.03, 0.04, 0.05, 0.04, 0.03, 0.02,
0.02, 0.03, 0.04, 0.06, 0.08, 0.06, 0.04, 0.03,
0.03, 0.04, 0.06, 0.10, 0.15, 0.10, 0.06, 0.04,
0.04, 0.06, 0.10, 0.20, 0.30, 0.20, 0.10, 0.06,
0.08, 0.15, 0.25, 0.35, 0.40, 0.35, 0.25, 0.15, # Row 7 (opp box)
0.15, 0.30, 0.40, 0.50, 0.50, 0.50, 0.40, 0.30 # Row 8 (6-yard box)
), nrow = 8, byrow = TRUE)
# Function to get xT value from coordinates (works on scalars or vectors)
get_xt <- function(x, y) {
# Convert StatsBomb coords (120x80) to the 8x8 grid:
# rows = 8 bands of 15 units along the length, columns = 8 bands of 10 units across the width
grid_row <- pmax(1, pmin(8, ceiling(x / 15)))
grid_col <- pmax(1, pmin(8, ceiling(y / 10)))
return(xt_grid[cbind(grid_row, grid_col)])
}
# Calculate xT prevented by defensive actions
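def_xt <- events %>%
# Won tackles (Duel events of type "Tackle"), plus all interceptions and blocks
filter((type.name == "Duel" & duel.type.name %in% "Tackle" &
duel.outcome.name %in% c("Won", "Success", "Success In Play", "Success Out")) |
type.name %in% c("Interception", "Block")) %>%
mutate(
# StatsBomb locations follow the acting team's attacking direction, so mirror
# the defender's location to get the same spot from the opponent's point of view
xt_at_action = mapply(get_xt, 120 - location.x, 80 - location.y),
# Opponent was progressing, so xT prevented is the value of that position
xt_prevented = xt_at_action
)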
# Player xT prevented
xt_prevented_by_player <- def_xt %>%
group_by(player.name, team.name) %>%
summarise(
matches = n_distinct(match_id),
defensive_actions = n(),
total_xt_prevented = sum(xt_prevented),
xt_prevented_per_90 = round(total_xt_prevented / matches, 3),
.groups = "drop"
) %>%
filter(matches >= 3) %>%
arrange(desc(xt_prevented_per_90))
print("xT Prevented Leaders:")
print(head(xt_prevented_by_player, 15))
The Other Side: Defensive Failures
Defensive stats must include failures—being dribbled past, errors, and penalties conceded:
# Track defensive failures
defensive_failures <- events %>%
filter(
# Dribbled past (from dribble events where defender lost)
(type.name == "Dribbled Past") |
# Fouls leading to dangerous free kicks: in the fouling team's own coordinates,
# its defensive third is x <= 40
(type.name == "Foul Committed" & location.x <= 40) |
# Errors
(type.name == "Error")
) %>%
group_by(player.name, team.name) %>%
summarise(
matches = n_distinct(match_id),
dribbled_past = sum(type.name == "Dribbled Past"),
fouls_danger_zone = sum(type.name == "Foul Committed"),
errors = sum(type.name == "Error"),
.groups = "drop"
) %>%
mutate(
failures_per_90 = round((dribbled_past + errors) / matches, 2)
)
# Combine with positive actions for net contribution
net_defensive <- defensive_stats %>%
select(player.name, team.name, matches, ball_winning_per_90) %>%
left_join(
defensive_failures %>%
select(player.name, team.name, failures_per_90),
by = c("player.name", "team.name")
) %>%
mutate(
failures_per_90 = ifelse(is.na(failures_per_90), 0, failures_per_90),
net_defensive = ball_winning_per_90 - failures_per_90
) %>%
filter(matches >= 3) %>%
arrange(desc(net_defensive))
print("Net Defensive Contribution (Wins - Failures):")
print(head(net_defensive, 15))
Aerial Duel Analysis
# Aerial duel analysis
aerial_stats <- events %>%
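# StatsBomb logs a lost aerial as a Duel with duel.type.name == "Aerial Lost";
# a won aerial is flagged on the winner's action instead (here: clearance.aerial_won).
# This is simplified: aerial wins on passes, shots and miscontrols are not counted.
filter((type.name == "Duel" & duel.type.name %in% "Aerial Lost") |
(type.name == "Clearance" & clearance.aerial_won %in% TRUE)) %>%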
group_by(player.name, team.name) %>%
summarise(
matches = n_distinct(match_id),
aerials = n(),
.groups = "drop"
) %>%
mutate(
aerials_per_90 = round(aerials / matches, 2)
) %>%
filter(matches >= 3)
# For proper aerial win rate, need both aerial won and lost events
print("Aerial Activity Leaders:")
print(head(aerial_stats %>% arrange(desc(aerials_per_90)), 10))
Chapter Summary
Key Takeaways
- Raw numbers mislead: Always consider possession and context
- Possession-adjust: More opponent possession = more defensive opportunities
- PPDA measures pressing: Lower = more aggressive press
- Include failures: Dribbled past, errors, fouls matter
- Net contribution: Successes minus failures gives a truer picture than successes alone
- xT prevented: Value actions by how much threat they stopped
Defensive Metrics Reference
| Metric | Good Value | Interpretation |
|---|---|---|
| Tackle Success % | 65%+ | Wins most challenges |
| Aerial Win % | 60%+ | Dominant in the air |
| PPDA | <10 | High pressing intensity |
| Dribbled Past/90 | <1.0 | Rarely beaten 1v1 |
Practice Exercises
Test your understanding of defensive analytics with these practical exercises.
Exercise 10.1: Possession-Adjusted Defensive Actions
Task: Calculate possession-adjusted defensive statistics for players. Normalize defensive actions based on how much possession the opponent had (more opponent possession = more opportunities to defend).
Formula: Adjusted Actions = Raw Actions × (50 / Opponent Possession %)
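One detail worth checking by hand is where the formula is applied. The sketch below, using invented numbers for a single hypothetical player, compares adjusting each match separately with adjusting the season total once using average opponent possession. Both approaches appear in this chapter's code, and they do not generally give the same answer.

```r
# Toy data (invented): one player, two matches
toy_matches <- data.frame(
  match = c(1, 2),
  raw_actions = c(2, 8),
  opp_possession_pct = c(30, 70)
)

# Option 1: apply the formula match by match, then sum
adj_per_match <- sum(toy_matches$raw_actions * 50 / toy_matches$opp_possession_pct)

# Option 2: apply the formula once to the total, using average opponent possession
adj_on_average <- sum(toy_matches$raw_actions) * 50 /
  mean(toy_matches$opp_possession_pct)

print(c(per_match = adj_per_match, on_average = adj_on_average))
```

Here the match-by-match version gives about 9.05 adjusted actions while the averaged version gives exactly 10, because most of this player's actions came in the match where the opponent had more of the ball. The match-by-match version is arguably the safer default, since it ties each count to the conditions it was produced under.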
# Exercise 10.1: Possession-Adjusted Defense
library(StatsBombR)
library(dplyr)
# Load World Cup data
comps <- FreeCompetitions() %>%
filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches) %>%
allclean()  # adds location.x / location.y and other cleaned columns
# Calculate possession by team per match
match_possession <- events %>%
filter(type.name == "Pass") %>%
group_by(match_id, team.name) %>%
summarise(team_passes = n(), .groups = "drop") %>%
group_by(match_id) %>%
mutate(
total_passes = sum(team_passes),
possession_pct = team_passes / total_passes * 100,
opp_possession = 100 - possession_pct
) %>%
ungroup()
# Defensive actions per player per match
player_defense <- events %>%
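# Tackles are Duel events with duel.type.name == "Tackle" in StatsBomb data
filter(type.name %in% c("Interception", "Clearance", "Block") |
(type.name == "Duel" & duel.type.name %in% "Tackle")) %>%
group_by(player.name, team.name, match_id) %>%
summarise(
tackles = sum(type.name == "Duel"),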
interceptions = sum(type.name == "Interception"),
clearances = sum(type.name == "Clearance"),
blocks = sum(type.name == "Block"),
total_actions = n(),
.groups = "drop"
)
# Join possession data
player_defense <- player_defense %>%
left_join(
match_possession %>% select(match_id, team.name, opp_possession),
by = c("match_id", "team.name")
)
# Aggregate and adjust
adjusted_defense <- player_defense %>%
group_by(player.name, team.name) %>%
summarise(
matches = n(),
raw_actions = sum(total_actions),
avg_opp_poss = mean(opp_possession, na.rm = TRUE),
# Possession-adjusted actions
adjusted_actions = sum(total_actions * (50 / opp_possession)),
.groups = "drop"
) %>%
filter(matches >= 3) %>%
mutate(
raw_per_90 = round(raw_actions / matches, 2),
adj_per_90 = round(adjusted_actions / matches, 2),
adjustment_factor = round(50 / avg_opp_poss, 2),
# Difference shows impact of adjustment
adjustment_impact = round(adj_per_90 - raw_per_90, 2)
) %>%
arrange(desc(adj_per_90))
print("Possession-Adjusted Defensive Actions:")
print(head(adjusted_defense, 15))
# Note: players on high-possession teams get boosted (they have few chances to defend),
# while players on low-possession teams get scaled down
cat("\nBiggest Adjustment Impacts:\n")
print(adjusted_defense %>%
arrange(desc(abs(adjustment_impact))) %>%
select(player.name, avg_opp_poss, raw_per_90, adj_per_90, adjustment_impact) %>%
head(10))
Exercise 10.2: Team PPDA Analysis
Task: Calculate PPDA (Passes Per Defensive Action) for all teams in a tournament. PPDA measures pressing intensity - lower values indicate more aggressive pressing.
Definition: PPDA = Opponent passes in their defensive 60% / Your defensive actions in that zone
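The definition can be checked by hand on a toy table before running it on real data. This is a minimal sketch: the `toy_events` data frame, its simplified `type` labels and the `toy_ppda()` helper are hypothetical and do not follow the StatsBomb schema. The zone cut-offs (72 for the passing side, 48 for the pressing side on a 120-unit pitch) are the ones used in this chapter's code, with each team's `x` measured in its own attacking direction.

```r
# Toy events (invented): Blue builds from the back, Red presses
toy_events <- data.frame(
  team = c(rep("Blue", 6), rep("Red", 3)),
  type = c(rep("Pass", 6), "Tackle", "Interception", "Foul Committed"),
  x    = c(10, 25, 40, 60, 70, 90, 55, 80, 30)
)

toy_ppda <- function(ev, pressing_team) {
  # Opponent passes inside their own defensive 60% of the pitch
  opp_passes <- sum(ev$team != pressing_team & ev$type == "Pass" & ev$x <= 72)
  # Pressing team's defensive actions in that same zone (x >= 48 in its own frame)
  def_actions <- sum(ev$team == pressing_team &
                       ev$type %in% c("Tackle", "Interception", "Foul Committed") &
                       ev$x >= 48)
  if (def_actions == 0) return(NA_real_)
  opp_passes / def_actions
}

print(toy_ppda(toy_events, "Red"))
```

Five of Blue's six passes fall inside the zone and two of Red's three defensive actions do, so Red's PPDA is 5 / 2 = 2.5. Blue has no defensive actions in this toy table, so its PPDA is returned as `NA` rather than dividing by zero.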
# Exercise 10.2: Team PPDA Analysis
library(StatsBombR)
library(dplyr)
library(ggplot2)
# Calculate PPDA for each team per match
calculate_ppda <- function(match_events, team) {
# Opponent passes in their defensive 60% (x <= 72)
other_teams <- setdiff(unique(match_events$team.name), team)
opp_passes <- match_events %>%
filter(team.name %in% other_teams,
type.name == "Pass",
location.x <= 72) %>%
nrow()
# Our defensive actions in opponent defensive 60% (x >= 48 for us)
our_def_actions <- match_events %>%
filter(team.name == team,
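# Tackles are Duel events with duel.type.name == "Tackle" in StatsBomb data
(type.name %in% c("Interception", "Foul Committed") |
(type.name == "Duel" & duel.type.name %in% "Tackle")),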
location.x >= 48) %>%
nrow()
if (our_def_actions == 0) return(NA)
return(opp_passes / our_def_actions)
}
# Calculate for all teams across all matches
teams <- unique(events$team.name)
ppda_results <- data.frame()
for (mid in unique(events$match_id)) {
match_events <- events %>% filter(match_id == mid)
match_teams <- unique(match_events$team.name)
for (team in match_teams) {
ppda <- calculate_ppda(match_events, team)
ppda_results <- bind_rows(ppda_results, data.frame(
match_id = mid,
team = team,
ppda = ppda
))
}
}
# Aggregate by team
team_ppda <- ppda_results %>%
filter(!is.na(ppda)) %>%
group_by(team) %>%
summarise(
matches = n(),
avg_ppda = round(mean(ppda), 2),
min_ppda = round(min(ppda), 2),
max_ppda = round(max(ppda), 2),
.groups = "drop"
) %>%
mutate(
press_intensity = case_when(
avg_ppda < 8 ~ "Very High Press",
avg_ppda < 10 ~ "High Press",
avg_ppda < 12 ~ "Medium Press",
TRUE ~ "Low Block"
)
) %>%
arrange(avg_ppda)
print("Team PPDA Rankings (Lower = More Pressing):")
print(team_ppda)
# Visualization
ggplot(team_ppda %>% filter(matches >= 3),
aes(x = reorder(team, avg_ppda), y = avg_ppda, fill = press_intensity)) +
geom_col() +
geom_hline(yintercept = c(8, 10, 12), linetype = "dashed", alpha = 0.5) +
coord_flip() +
scale_fill_manual(values = c("Very High Press" = "#1B5E20",
"High Press" = "#4CAF50",
"Medium Press" = "#FFC107",
"Low Block" = "#F44336")) +
labs(title = "Team Pressing Intensity (PPDA)",
subtitle = "Lower PPDA = More Aggressive Pressing",
x = "", y = "Passes Per Defensive Action",
fill = "Press Style") +
theme_minimal()
ggsave("team_ppda.png", width = 12, height = 10)
Exercise 10.3: Net Defensive Value Index
Task: Create a comprehensive defensive value index that combines successful defensive actions with failures. Calculate a net score that accounts for tackles won, interceptions, and ball recoveries minus times dribbled past, errors, and fouls in dangerous areas.
Formula: Net Defensive Value = (Tackles Won + Interceptions + Recoveries) - (Dribbled Past + Errors + Dangerous Fouls)
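Before building the index from event data, the formula can be sanity-checked on a toy table. The players, counts and the `net_defensive_value()` helper below are hypothetical and only illustrate the arithmetic.

```r
# Direct translation of the formula: positives minus negatives
net_defensive_value <- function(tackles_won, interceptions, recoveries,
                                dribbled_past, errors, dangerous_fouls) {
  (tackles_won + interceptions + recoveries) -
    (dribbled_past + errors + dangerous_fouls)
}

# Toy data (invented): two defenders with similar positive volume
toy <- data.frame(
  player = c("Defender A", "Defender B"),
  matches = c(4, 4),
  tackles_won = c(8, 10),
  interceptions = c(6, 5),
  recoveries = c(14, 9),
  dribbled_past = c(2, 9),
  errors = c(0, 1),
  dangerous_fouls = c(2, 6)
)

toy$net_value <- with(toy, net_defensive_value(
  tackles_won, interceptions, recoveries,
  dribbled_past, errors, dangerous_fouls
))
toy$net_per_match <- toy$net_value / toy$matches

print(toy[, c("player", "net_value", "net_per_match")])
```

Defender A nets 24 (6 per match) and Defender B nets 8 (2 per match), even though their positive totals are 28 and 24. The gap comes almost entirely from the failure side of the ledger.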
# Exercise 10.3: Net Defensive Value Index
library(StatsBombR)
library(dplyr)
library(ggplot2)
# Positive defensive actions
positive_defense <- events %>%
filter(
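# Tackles are Duel events with duel.type.name == "Tackle" in StatsBomb data
(type.name == "Duel" & duel.type.name %in% "Tackle" &
duel.outcome.name %in% c("Won", "Success", "Success In Play", "Success Out")) |
(type.name == "Interception") |
# recovery_failure is only recorded when TRUE, so NA means a successful recovery
(type.name == "Ball Recovery" & is.na(ball_recovery.recovery_failure))
) %>%
group_by(player.name, team.name) %>%
summarise(
matches = n_distinct(match_id),
tackles_won = sum(type.name == "Duel"),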
interceptions = sum(type.name == "Interception"),
recoveries = sum(type.name == "Ball Recovery"),
positive_actions = n(),
.groups = "drop"
)
# Negative defensive events
negative_defense <- events %>%
filter(
(type.name == "Dribbled Past") |
(type.name == "Error") |
(type.name == "Foul Committed" & location.x <= 40) # Dangerous area: the fouling team's own defensive third
) %>%
group_by(player.name, team.name) %>%
summarise(
dribbled_past = sum(type.name == "Dribbled Past"),
errors = sum(type.name == "Error"),
dangerous_fouls = sum(type.name == "Foul Committed"),
negative_actions = n(),
.groups = "drop"
)
# Combine for net value
net_defensive <- positive_defense %>%
left_join(negative_defense, by = c("player.name", "team.name")) %>%
mutate(
across(c(dribbled_past, errors, dangerous_fouls, negative_actions),
~ifelse(is.na(.), 0, .))
) %>%
filter(matches >= 3) %>%
mutate(
# Net value
net_value = positive_actions - negative_actions,
net_per_90 = round(net_value / matches, 2),
# Component breakdown
positive_per_90 = round(positive_actions / matches, 2),
negative_per_90 = round(negative_actions / matches, 2),
# Rating
rating = case_when(
net_per_90 > 5 ~ "Elite",
net_per_90 > 3 ~ "Good",
net_per_90 > 1 ~ "Average",
TRUE ~ "Below Average"
)
) %>%
arrange(desc(net_per_90))
print("Net Defensive Value Index:")
print(head(net_defensive %>%
select(player.name, matches, positive_per_90, negative_per_90,
net_per_90, rating), 20))
# Visualization: Positive vs Negative scatter
ggplot(net_defensive, aes(x = positive_per_90, y = negative_per_90)) +
geom_point(aes(color = rating, size = matches), alpha = 0.7) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
scale_color_manual(values = c("Elite" = "#1B5E20", "Good" = "#4CAF50",
"Average" = "#FFC107", "Below Average" = "#F44336")) +
labs(
title = "Net Defensive Value Analysis",
subtitle = "Players below the line have positive net defensive value",
x = "Positive Actions per 90 (Tackles Won, Interceptions, Recoveries)",
y = "Negative Actions per 90 (Dribbled Past, Errors, Dangerous Fouls)",
color = "Rating", size = "Matches"
) +
theme_minimal()
ggsave("net_defensive_value.png", width = 12, height = 10)
Continue Your Journey
You've completed the core analytics modules! Explore positional analytics next to evaluate players by their specific roles.