The Foundation of Football Analysis
Before expected goals, before tracking data, before machine learning models, there were basic statistics: goals, assists, passes, tackles, possession. These "traditional" metrics remain the backbone of football analysis, and understanding them is essential for any analyst.
Traditional statistics may seem simple compared to modern advanced metrics, but they tell us fundamental truths about the game. A team that dominates possession but can't shoot accurately won't win. A striker with poor shot volume will struggle for goals regardless of quality. The basics matter.
Why Traditional Stats Still Matter
- Foundation: Advanced metrics build upon traditional stats (xG needs shots)
- Communication: Everyone understands goals, assists, and possession
- Data Availability: Basic stats exist for every match; tracking data does not
- Historical Comparison: Compare players across eras using consistent metrics
- Simplicity: Sometimes the simple answer is the right answer
Possession Statistics
Possession is perhaps the most discussed statistic in football, yet it's frequently misunderstood. Let's break down what possession actually measures and its limitations.
How Possession Is Calculated
There are two main methods for calculating possession:
Pass-based formula:
Possession % = Team Passes / Total Passes × 100
Most common method. Used by Opta and most broadcasters.
Limitation: Doesn't account for time on ball, just pass counts.
Time-based formula:
Possession % = Team Time on Ball / Total Time × 100
More accurate but requires tracking data.
Advantage: Captures actual ball control duration.
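The two definitions can disagree for the same match. The sketch below applies both formulas to invented totals; the pass counts, the seconds on the ball, and the `match_totals` name are all hypothetical and only illustrate the arithmetic.

```r
# Hypothetical match totals (illustrative numbers only, not real data)
match_totals <- data.frame(
  team = c("Team A", "Team B"),
  passes = c(620, 380),             # passes attempted
  seconds_on_ball = c(1710, 1290)   # time in control, as tracking data would give
)

# Pass-based possession: each team's share of all passes
match_totals$possession_pass <-
  match_totals$passes / sum(match_totals$passes) * 100

# Time-based possession: each team's share of total time on the ball
match_totals$possession_time <-
  match_totals$seconds_on_ball / sum(match_totals$seconds_on_ball) * 100

print(match_totals)
```

With these numbers Team A has 62% possession by passes but 57% by time, the kind of gap that appears when one side plays many short, quick passes.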
# Calculate possession from pass data
library(dplyr)
# Load match events
events <- read.csv("match_events.csv")
# Pass-based possession
possession_stats <- events %>%
filter(type == "Pass") %>%
group_by(team) %>%
summarise(
total_passes = n(),
completed_passes = sum(outcome == "Complete"),
pass_completion = completed_passes / total_passes * 100
) %>%
mutate(
possession_pct = total_passes / sum(total_passes) * 100
)
print(possession_stats)
# Possession by period (halves)
possession_by_half <- events %>%
filter(type == "Pass") %>%
mutate(half = ifelse(minute <= 45, "1st Half", "2nd Half")) %>%
group_by(team, half) %>%
summarise(passes = n(), .groups = "drop") %>%
group_by(half) %>%
mutate(possession = passes / sum(passes) * 100)
print(possession_by_half)
Possession Context: Quality Over Quantity
High possession doesn't guarantee success. Context matters more:
# Analyze possession with context
possession_analysis <- events %>%
filter(type == "Pass") %>%
mutate(
# Define pitch zones
zone = case_when(
x >= 80 ~ "Final Third",
x >= 40 ~ "Middle Third",
TRUE ~ "Defensive Third"
),
# Progressive pass (moves ball 10+ yards forward)
is_progressive = (x_end - x) >= 10 & x_end >= 60
) %>%
group_by(team, zone) %>%
summarise(
passes = n(),
completion_rate = mean(outcome == "Complete") * 100,
progressive_passes = sum(is_progressive, na.rm = TRUE),
.groups = "drop"
)
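# Possession value by zone (one row per team, one column per zone and metric)
library(tidyr) # provides pivot_wider()
possession_value <- possession_analysis %>%
pivot_wider(names_from = zone,
values_from = c(passes, completion_rate, progressive_passes))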
# Team A might have 60% possession but only 45% in final third
# That context changes the narrative
# Calculate possession with purpose
purposeful_possession <- events %>%
filter(type == "Pass", team == "Team A") %>%
summarise(
total_passes = n(),
final_third_passes = sum(x_end >= 80, na.rm = TRUE),
progressive_passes = sum((x_end - x) >= 10 & x_end >= 60 & outcome == "Complete", na.rm = TRUE),
passes_into_box = sum(x_end >= 102 & y_end >= 18 & y_end <= 62, na.rm = TRUE)
) %>%
mutate(
final_third_pct = final_third_passes / total_passes * 100,
progressive_pct = progressive_passes / total_passes * 100
)
The Possession Paradox
Beyond roughly 55-60% possession, extra possession tends to bring diminishing returns: teams can become "possession-rich but chance-poor." The key metrics to watch:
- Final Third Possession % - Where does your possession occur?
- Progressive Pass Rate - Does possession move forward?
- Shots per Possession - Does possession create chances?
- Opponent Shots Allowed - Does possession protect your goal?
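As a rough illustration of these four checks, the sketch below computes them from invented match totals. Every number and column name here (including `possessions`, a count of distinct possession sequences) is hypothetical; it uses base R only so it runs without extra packages.

```r
# Hypothetical team totals for one match (illustrative numbers only)
team_totals <- data.frame(
  team = c("Team A", "Team B"),
  passes = c(600, 400),
  final_third_passes = c(120, 140),
  progressive_passes = c(48, 52),
  possessions = c(110, 105),   # distinct possession sequences
  shots = c(9, 14)
)

possession_quality <- transform(
  team_totals,
  possession_pct = passes / sum(passes) * 100,
  # Share of all final-third passes in the match made by this team
  final_third_share = final_third_passes / sum(final_third_passes) * 100,
  # How often a pass moves the ball forward meaningfully
  progressive_rate = progressive_passes / passes * 100,
  # Does possession turn into chances?
  shots_per_possession = shots / possessions,
  # With two teams, shots allowed are simply the other team's shots
  opponent_shots = rev(shots)
)

print(possession_quality)
```

In this made-up match Team A has 60% of the ball overall but under half of the final-third passes and fewer shots per possession, which is the pattern the list above is designed to catch.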
Shooting Statistics
Shooting metrics form the core of offensive analysis. While xG has revolutionized shot quality assessment, traditional shot statistics remain essential for understanding volume and efficiency.
Core Shooting Metrics
| Metric | Formula | Description | Benchmark |
|---|---|---|---|
| Shots | Total shot attempts | Raw volume of attempts | ~12-15 per match (team) |
| Shots on Target (SoT) | Shots that would score without keeper | Measures accuracy | ~35-40% of shots |
| SoT% | SoT / Total Shots × 100 | Accuracy percentage | 35-45% is good |
| Goals per Shot | Goals / Total Shots | Conversion efficiency | ~0.10-0.12 (10-12%) |
| Goals per SoT | Goals / Shots on Target | Finishing quality | ~0.30-0.35 |
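A quick worked example of the formulas in the table, using an invented single-match line of 14 shots, 5 on target, and 2 goals (the numbers are hypothetical):

```r
# Hypothetical team shooting line for one match
shots <- 14
shots_on_target <- 5
goals <- 2

shooting_summary <- c(
  sot_pct = shots_on_target / shots * 100,   # accuracy
  goals_per_shot = goals / shots,            # conversion efficiency
  goals_per_sot = goals / shots_on_target    # finishing quality
)

print(round(shooting_summary, 3))
```

Single-match values like these swing widely, so the benchmarks in the table are best compared against totals over many matches.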
# Calculate comprehensive shooting statistics
library(dplyr)
# Player shooting stats
player_shooting <- events %>%
filter(type == "Shot") %>%
group_by(player) %>%
summarise(
matches = n_distinct(match_id),
shots = n(),
shots_on_target = sum(outcome %in% c("Goal", "Saved")),
goals = sum(outcome == "Goal"),
blocked = sum(outcome == "Blocked"),
off_target = sum(outcome %in% c("Off T", "Post")),
# Headers vs Feet
headers = sum(body_part == "Head"),
header_goals = sum(body_part == "Head" & outcome == "Goal"),
# Location analysis
inside_box = sum(x >= 102 & y >= 18 & y <= 62),
outside_box = sum(x < 102 | y < 18 | y > 62)
) %>%
mutate(
# Per-match rates as a rough per-90 proxy (assumes 90 mins per match;
# `matches` only counts games in which the player took a shot)
shots_per_90 = shots / matches,
goals_per_90 = goals / matches,
# Percentages
shot_accuracy = shots_on_target / shots * 100,
conversion_rate = goals / shots * 100,
finishing_rate = goals / shots_on_target * 100,
# Ratios
goals_per_shot = goals / shots,
inside_box_pct = inside_box / shots * 100
) %>%
arrange(desc(goals))
print(player_shooting)
# Team shooting comparison
team_shooting <- events %>%
filter(type == "Shot") %>%
group_by(team) %>%
summarise(
shots = n(),
sot = sum(outcome %in% c("Goal", "Saved")),
goals = sum(outcome == "Goal"),
xg = sum(xg, na.rm = TRUE)
) %>%
mutate(
sot_pct = sot / shots * 100,
conversion = goals / shots * 100,
goals_minus_xg = goals - xg # Over/under performance
)
Shot Quality Analysis Without xG
Even without xG models, you can assess shot quality using location and context:
# Shot quality zones (traditional approach)
shot_zones <- events %>%
filter(type == "Shot") %>%
mutate(
# Define quality zones
shot_zone = case_when(
# 6-yard box
x >= 114 & y >= 30 & y <= 50 ~ "Six Yard Box",
# Penalty area central (any depth not already matched as six-yard box)
x >= 102 & y >= 25 & y <= 55 ~ "Central Box",
# Penalty area wide (everything else level with the box)
x >= 102 ~ "Wide Box",
# Edge of box
x >= 88 ~ "Edge of Box",
# Long range
TRUE ~ "Long Range"
),
# Expected conversion rates by zone (rough benchmarks; calibrate on your own data)
zone_conversion = case_when(
shot_zone == "Six Yard Box" ~ 0.45,
shot_zone == "Central Box" ~ 0.15,
shot_zone == "Wide Box" ~ 0.08,
shot_zone == "Edge of Box" ~ 0.06,
shot_zone == "Long Range" ~ 0.03
)
)
# Analyze by zone
zone_analysis <- shot_zones %>%
group_by(team, shot_zone) %>%
summarise(
shots = n(),
goals = sum(outcome == "Goal"),
expected = sum(zone_conversion),
conversion = goals / shots * 100,
.groups = "drop"
) %>%
mutate(
over_performance = goals - expected
)
# Shot quality score (simple version)
team_shot_quality <- shot_zones %>%
group_by(team) %>%
summarise(
total_shots = n(),
weighted_quality = sum(zone_conversion),
avg_shot_quality = weighted_quality / total_shots,
goals = sum(outcome == "Goal"),
.groups = "drop"
)
Passing Statistics
Passing metrics reveal how teams build attacks and control games. From simple completion rates to progressive passes, these stats illuminate playing style and effectiveness.
Core Passing Metrics
| Metric | Formula | What It Measures |
|---|---|---|
| Pass Completion % | Completed / Attempted × 100 | Basic accuracy |
| Progressive Passes | Passes moving ball 10+ yards toward goal | Attacking intent |
| Key Passes | Passes leading directly to shots | Chance creation |
| Passes into Final Third | Passes entering last 1/3 of pitch | Territory gain |
| Passes into Box | Passes entering penalty area | Dangerous deliveries |
| Through Balls | Passes played behind defense | Line-breaking ability |
| Long Balls | Passes > 30 yards | Direct play style |
| Switches | Passes > 40 yards changing side | Width exploitation |
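To make the geometric definitions concrete, here is a small sketch that classifies three invented passes on a 120 × 80 pitch. The `classify_pass` helper is hypothetical, and two of its cut-offs (the end point at `x_end >= 60` for progressive passes and 35 yards of lateral movement for switches) are assumptions of this sketch rather than universal definitions.

```r
# Hypothetical helper: classify passes from start/end coordinates
# (pitch assumed to be 120 x 80, attacking towards x = 120)
classify_pass <- function(x, y, x_end, y_end) {
  pass_length <- sqrt((x_end - x)^2 + (y_end - y)^2)
  forward <- x_end - x
  lateral <- abs(y_end - y)
  data.frame(
    pass_length = round(pass_length, 1),
    is_progressive = forward >= 10 & x_end >= 60,
    is_into_final_third = x < 80 & x_end >= 80,
    is_into_box = x_end >= 102 & y_end >= 18 & y_end <= 62,
    is_long_ball = pass_length >= 30,
    is_switch = pass_length >= 40 & lateral >= 35
  )
}

# Three invented passes: a midfield carry-forward pass,
# a cross-field diagonal, and a short ball into the box
example_passes <- classify_pass(
  x = c(50, 70, 95),
  y = c(40, 10, 40),
  x_end = c(75, 85, 108),
  y_end = c(38, 70, 44)
)

print(example_passes)
```

The second pass is the only long ball and switch, and the third is the only one that enters the penalty area, even though all three count as progressive.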
# Comprehensive passing analysis
library(dplyr)
passing_stats <- events %>%
filter(type == "Pass") %>%
mutate(
# Pass characteristics
pass_length = sqrt((x_end - x)^2 + (y_end - y)^2),
# Completed if flagged "Complete" or, as in StatsBomb-style data, if outcome is missing
is_completed = outcome == "Complete" | is.na(outcome),
# Direction calculations
forward_distance = x_end - x,
lateral_distance = abs(y_end - y),
# Pass types
is_progressive = forward_distance >= 10 & x_end >= 60,
is_into_final_third = x < 80 & x_end >= 80,
is_into_box = x_end >= 102 & y_end >= 18 & y_end <= 62,
is_long_ball = pass_length >= 30,
is_switch = pass_length >= 40 & lateral_distance >= 35,
is_through_ball = pass_type == "Through Ball",
# Pass direction
direction = case_when(
forward_distance > 5 ~ "Forward",
forward_distance < -5 ~ "Backward",
TRUE ~ "Sideways"
)
)
# Player passing profile
player_passing <- passing_stats %>%
group_by(player, team) %>%
summarise(
matches = n_distinct(match_id),
total_passes = n(),
completed = sum(is_completed),
completion_pct = completed / total_passes * 100,
# Advanced metrics
progressive = sum(is_progressive & is_completed),
into_final_third = sum(is_into_final_third & is_completed),
into_box = sum(is_into_box & is_completed),
long_balls = sum(is_long_ball),
long_ball_completion = sum(is_long_ball & is_completed) / long_balls * 100,
switches = sum(is_switch & is_completed),
# Per 90
passes_per_90 = total_passes / matches,
progressive_per_90 = progressive / matches,
# Direction profile
forward_pct = sum(direction == "Forward") / total_passes * 100,
backward_pct = sum(direction == "Backward") / total_passes * 100,
.groups = "drop"
) %>%
arrange(desc(progressive_per_90))
print(player_passing)
Pass Completion Context
Raw pass completion % can be misleading. A center-back completing 95% of short passes isn't necessarily more valuable than a midfielder completing 75% of progressive passes:
# Pass completion by difficulty
pass_difficulty <- passing_stats %>%
mutate(
difficulty = case_when(
is_through_ball ~ "Very Hard",
is_into_box ~ "Hard",
is_progressive ~ "Medium",
is_long_ball ~ "Medium",
TRUE ~ "Easy"
)
) %>%
group_by(player, difficulty) %>%
summarise(
attempts = n(),
completed = sum(is_completed),
completion = completed / attempts * 100,
.groups = "drop"
)
# Weighted passing score
# Easy passes worth less, difficult passes worth more
weighted_passing <- passing_stats %>%
mutate(
pass_value = case_when(
is_through_ball & is_completed ~ 5,
is_into_box & is_completed ~ 3,
is_progressive & is_completed ~ 2,
is_into_final_third & is_completed ~ 1.5,
is_completed ~ 1,
TRUE ~ 0
)
) %>%
group_by(player) %>%
summarise(
total_passes = n(),
pass_value_total = sum(pass_value),
pass_value_per_90 = pass_value_total / n_distinct(match_id)
) %>%
arrange(desc(pass_value_per_90))
Defensive Statistics
Defensive metrics are notoriously difficult to interpret. A player with many tackles might be excellent defensively, or might be constantly out of position. Context is everything.
Core Defensive Metrics
- Tackles: Successful challenges to win ball
- Tackle %: Tackles / Tackle Attempts
- Interceptions: Reading passes to steal ball
- Recoveries: Regaining loose balls
- Blocks: Stopping shots/passes
- Clearances: Removing ball from danger
- Aerial Duels: Heading contests won
- Fouls: Free kicks conceded
- Pressures: Pressing the opponent
- Pressure Success %: Regains from pressing
- PPDA: Passes allowed per defensive action
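The three ratio metrics in this list reduce to simple divisions. A minimal worked example with invented season counts (all numbers hypothetical):

```r
# Hypothetical counts for one team
tackles_won <- 14
tackles_attempted <- 20
pressures <- 150
pressure_regains <- 42
opp_passes_allowed <- 330   # opponent passes in the pressing zone
defensive_actions <- 30     # tackles, interceptions, fouls in that zone

defensive_ratios <- c(
  tackle_pct = tackles_won / tackles_attempted * 100,
  pressure_success_pct = pressure_regains / pressures * 100,
  ppda = opp_passes_allowed / defensive_actions
)

print(round(defensive_ratios, 1))
```

Note the direction of each metric: higher is better for the two percentages, while a lower PPDA indicates more intense pressing.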
# Comprehensive defensive statistics
library(dplyr)
defensive_stats <- events %>%
filter(type %in% c("Tackle", "Interception", "Clearance",
"Block", "Aerial", "Foul", "Pressure")) %>%
group_by(player, team) %>%
summarise(
matches = n_distinct(match_id),
# Tackles
tackles_attempted = sum(type == "Tackle"),
tackles_won = sum(type == "Tackle" & outcome == "Won"),
tackle_pct = tackles_won / tackles_attempted * 100,
# Interceptions
interceptions = sum(type == "Interception"),
# Blocks
blocks = sum(type == "Block"),
shot_blocks = sum(type == "Block" & block_type == "Shot"),
pass_blocks = sum(type == "Block" & block_type == "Pass"),
# Clearances
clearances = sum(type == "Clearance"),
# Aerials
aerial_duels = sum(type == "Aerial"),
aerials_won = sum(type == "Aerial" & outcome == "Won"),
aerial_pct = aerials_won / aerial_duels * 100,
# Fouls
fouls_committed = sum(type == "Foul" & foul_type == "Committed"),
# Pressures
pressures = sum(type == "Pressure"),
pressure_regains = sum(type == "Pressure" & outcome == "Success"),
.groups = "drop"
) %>%
mutate(
# Per 90 metrics
tackles_per_90 = tackles_won / matches,
interceptions_per_90 = interceptions / matches,
clearances_per_90 = clearances / matches,
pressures_per_90 = pressures / matches,
# Combined metrics
ball_winning = tackles_won + interceptions,
ball_winning_per_90 = ball_winning / matches
)
# Defensive value in different zones
defensive_zones <- events %>%
filter(type %in% c("Tackle", "Interception", "Clearance")) %>%
mutate(
zone = case_when(
x <= 40 ~ "Defensive Third",
x <= 80 ~ "Middle Third",
TRUE ~ "Attacking Third"
)
) %>%
group_by(player, zone) %>%
summarise(defensive_actions = n(), .groups = "drop")
The Defensive Context Problem
Why Raw Defensive Numbers Mislead
Consider two center-backs:
- Player A: 3 tackles, 2 interceptions per game
- Player B: 1 tackle, 1 interception per game
Player A looks better, but what if:
- Player A's team defends deep, facing many more attacks
- Player B's team presses high, preventing attacks from forming
- Player A gets dribbled past 3 times, Player B never
Solution: Always consider defensive actions relative to opportunities and alongside failure rates (dribbled past, errors leading to shots).
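One way to sketch this adjustment is to put the two center-backs side by side with a failure count and an opportunity count. The tackle, interception, and dribbled-past figures come from the example above; `opp_touches` (opponent touches faced per game) is a hypothetical stand-in for "opportunities" and its values are invented.

```r
# Player A and Player B from the example, plus a hypothetical opportunity measure
defenders <- data.frame(
  player = c("Player A", "Player B"),
  tackles = c(3, 1),
  interceptions = c(2, 1),
  dribbled_past = c(3, 0),
  opp_touches = c(700, 350)   # invented: opponent touches faced per game
)

defenders <- transform(
  defenders,
  actions = tackles + interceptions,
  # Tackles won as a share of all one-on-one defensive attempts
  tackle_success = tackles / (tackles + dribbled_past) * 100,
  # Defensive actions relative to how much defending there was to do
  actions_per_100_opp_touches = (tackles + interceptions) / opp_touches * 100
)

print(defenders)
```

On raw counts Player A leads 5 to 2, but Player A wins only half of his duels and the opportunity-adjusted gap is much smaller, so the ranking is far less clear-cut than the raw totals suggest.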
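# PPDA - Passes Per Defensive Action (team pressing intensity)
# team_name / opponent_name are the labels used in the `team` column
calculate_ppda <- function(match_events, team_name, opponent_name) {
# Opposition passes in their own defensive 60% of the pitch
opp_passes <- match_events %>%
filter(type == "Pass",
team == opponent_name,
x <= 72) # 60% of 120
# Pressing team's defensive actions in the same zone
# (x >= 48 in the pressing team's own attacking direction)
def_actions <- match_events %>%
filter(team == team_name,
type %in% c("Tackle", "Interception", "Foul"),
x >= 48) # Mirror zone
# Avoid dividing by zero when no defensive actions were recorded
if (nrow(def_actions) == 0) return(NA_real_)
nrow(opp_passes) / nrow(def_actions)
}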
# Lower PPDA = more intense pressing
# Liverpool typically ~8-9, low-block teams ~15+
# Defensive action success rate
defensive_success <- events %>%
filter(type %in% c("Tackle", "Duel", "Pressure")) %>%
group_by(player) %>%
summarise(
total_actions = n(),
successful = sum(outcome %in% c("Won", "Success")),
success_rate = successful / total_actions * 100,
# Failures
dribbled_past = sum(dribbled_past, na.rm = TRUE),
errors = sum(outcome == "Error")
) %>%
mutate(
# Net defensive contribution
net_actions = successful - dribbled_past - errors
)
Per 90 Minutes Normalization
Raw totals are misleading when comparing players with different playing time. Per 90 minute normalization creates fair comparisons.
# Per 90 calculations
library(dplyr)
# Get player minutes (from separate source)
player_minutes <- data.frame(
player = c("Player A", "Player B", "Player C"),
minutes_played = c(2700, 1800, 900), # Season totals
matches = c(30, 25, 15)
)
# Calculate per 90
player_stats <- events %>%
group_by(player) %>%
summarise(
goals = sum(type == "Shot" & outcome == "Goal"),
assists = sum(type == "Pass" & is_assist, na.rm = TRUE),
shots = sum(type == "Shot"),
key_passes = sum(type == "Pass" & is_key_pass, na.rm = TRUE),
tackles = sum(type == "Tackle" & outcome == "Won"),
interceptions = sum(type == "Interception")
) %>%
left_join(player_minutes, by = "player") %>%
mutate(
# Per 90 normalization
nineties = minutes_played / 90,
goals_per_90 = goals / nineties,
assists_per_90 = assists / nineties,
shots_per_90 = shots / nineties,
key_passes_per_90 = key_passes / nineties,
tackles_per_90 = tackles / nineties,
# Goal contributions
goal_contributions = goals + assists,
gc_per_90 = goal_contributions / nineties,
# Minutes per goal contribution
mins_per_gc = minutes_played / goal_contributions
)
# Filter for minimum playing time (avoid small sample sizes)
qualified_players <- player_stats %>%
filter(minutes_played >= 450) # 5 full matches minimum
Sample Size Considerations
Per 90 stats can be unstable with limited playing time. Guidelines:
| Threshold | Minutes played |
|---|---|
| Minimum for per 90 | 450 minutes (5 full matches) |
| Reliable estimates | 900+ minutes (10 full matches) |
| Stable statistics | 1800+ minutes (20 full matches) |
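These thresholds can be attached to a player table as a label, so unstable per 90 figures are flagged rather than silently ranked. The `sample_tier` helper, the tier names, and the three players below are hypothetical; only the 450 / 900 / 1800 minute cut-offs come from the guidelines above.

```r
# Hypothetical helper: label a minutes total with its sample-size tier
sample_tier <- function(minutes_played) {
  cut(
    minutes_played,
    breaks = c(-Inf, 450, 900, 1800, Inf),
    labels = c("Below minimum", "Minimum", "Reliable", "Stable"),
    right = FALSE   # intervals are [450, 900), [900, 1800), [1800, Inf)
  )
}

# Invented players with very different playing time
players <- data.frame(
  player = c("Sub", "Rotation", "Starter"),
  minutes_played = c(180, 1000, 2700),
  goals = c(2, 5, 12)
)

players$goals_per_90 <- players$goals / (players$minutes_played / 90)
players$sample <- sample_tier(players$minutes_played)

print(players)
```

The substitute tops the list at 1.0 goals per 90, but the label shows that figure rests on only two full matches' worth of minutes.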
Team-Level Statistics
Aggregating individual stats to the team level reveals playing style, strengths, and weaknesses.
# Comprehensive team statistics
team_profile <- events %>%
group_by(team) %>%
summarise(
# Possession
passes = sum(type == "Pass"),
pass_completion = sum(type == "Pass" & outcome == "Complete") / passes * 100,
# Attacking
shots = sum(type == "Shot"),
shots_on_target = sum(type == "Shot" & outcome %in% c("Goal", "Saved")),
goals = sum(type == "Shot" & outcome == "Goal"),
# Count big chances before `xg` is overwritten by the team total below
big_chances = sum(type == "Shot" & xg >= 0.3, na.rm = TRUE),
xg = sum(xg[type == "Shot"], na.rm = TRUE),
# Defensive
tackles = sum(type == "Tackle"),
interceptions = sum(type == "Interception"),
clearances = sum(type == "Clearance"),
blocks = sum(type == "Block"),
# Set pieces
corners = sum(type == "Pass" & pass_type == "Corner"),
free_kicks = sum(type == "Pass" & pass_type == "Free Kick"),
# Discipline
fouls = sum(type == "Foul" & foul_type == "Committed"),
yellow_cards = sum(type == "Card" & card_type == "Yellow"),
red_cards = sum(type == "Card" & card_type == "Red"),
.groups = "drop"
) %>%
mutate(
# Derived metrics
shot_accuracy = shots_on_target / shots * 100,
conversion_rate = goals / shots * 100,
goals_vs_xg = goals - xg,
defensive_actions = tackles + interceptions + blocks
)
# Style indicators
team_style <- team_profile %>%
mutate(
# Possession style (high pass volume, high completion)
possession_score = as.numeric(scale(passes)) + as.numeric(scale(pass_completion)),
# Direct style (lower passes, more long balls)
# Pressing style (high tackles + interceptions in opp half)
# Counter-attacking (low possession, high shot efficiency)
style = case_when(
passes > median(passes) & pass_completion > 85 ~ "Possession",
shots > median(shots) & conversion_rate > 12 ~ "Clinical",
defensive_actions > median(defensive_actions) ~ "Defensive",
TRUE ~ "Balanced"
)
)
print(team_style)
Chapter Summary
Key Takeaways
- Possession needs context - Where, not just how much
- Shot volume and quality both matter - Track both shots and conversion
- Pass completion varies by difficulty - Weight progressive passes higher
- Defensive stats need failure context - Include dribbled past, errors
- Always use per 90 for comparisons - With minimum sample requirements
- Team aggregates reveal style - Possession vs direct, pressing vs deep block
Statistical Benchmarks Reference
| Metric | Elite Level | Good | Average |
|---|---|---|---|
| Pass Completion % | 90%+ | 85-90% | 80-85% |
| Shot Accuracy | 50%+ | 40-50% | 30-40% |
| Conversion Rate | 15%+ | 10-15% | 8-10% |
| Tackle Success % | 75%+ | 65-75% | 55-65% |
| Aerial Duel % | 65%+ | 55-65% | 45-55% |
| PPDA (Pressing) | <9 | 9-12 | 12-15 |
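A small lookup helper can turn a raw value into one of these bands. The function below is a hypothetical sketch: its name, the "Below average" label for values outside the table, and the handling of exact boundary values are all assumptions, since the table leaves boundaries ambiguous.

```r
# Hypothetical helper: place a value into the benchmark bands above.
# For most metrics higher is better; PPDA is the exception.
rate_against_benchmark <- function(value, elite, good, average,
                                   lower_is_better = FALSE) {
  if (lower_is_better) {
    if (value < elite) {
      "Elite"
    } else if (value <= good) {
      "Good"
    } else if (value <= average) {
      "Average"
    } else {
      "Below average"
    }
  } else {
    if (value >= elite) {
      "Elite"
    } else if (value >= good) {
      "Good"
    } else if (value >= average) {
      "Average"
    } else {
      "Below average"
    }
  }
}

# Thresholds taken from the table rows
rate_against_benchmark(87, elite = 90, good = 85, average = 80)   # pass completion %
rate_against_benchmark(8.5, elite = 9, good = 12, average = 15,
                       lower_is_better = TRUE)                     # PPDA
rate_against_benchmark(7, elite = 15, good = 10, average = 8)     # conversion rate %
```

Passing the thresholds as arguments keeps the helper independent of any one league or season, where the appropriate bands may differ.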
Practice Exercises
Exercise 5.1: Build a Player Season Summary
Task: Create a comprehensive statistical profile for a player including shooting, passing, and defensive metrics with per-90 normalization.
# Exercise 5.1: Comprehensive Player Summary
library(StatsBombR)
library(dplyr)
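# Load data (StatsBomb open data)
comps <- FreeCompetitions() %>%
filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
events <- free_allevents(MatchesDF = matches)
# Flatten locations into columns such as location.x and pass.end_location.x
events <- allclean(events)
# Select a player (StatsBomb stores full names, so match on a substring)
player_name <- "Messi"
player_events <- events %>%
filter(grepl(player_name, player.name))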
# Calculate all stats
player_summary <- player_events %>%
summarise(
# Playing time
matches = n_distinct(match_id),
total_actions = n(),
# Shooting
shots = sum(type.name == "Shot"),
shots_on_target = sum(type.name == "Shot" &
shot.outcome.name %in% c("Goal", "Saved")),
goals = sum(type.name == "Shot" & shot.outcome.name == "Goal"),
xG = sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE),
# Passing
passes = sum(type.name == "Pass"),
passes_completed = sum(type.name == "Pass" & is.na(pass.outcome.name)),
key_passes = sum(type.name == "Pass" &
(pass.shot_assist == TRUE | pass.goal_assist == TRUE), na.rm = TRUE),
assists = sum(type.name == "Pass" & pass.goal_assist == TRUE, na.rm = TRUE),
# Progressive
progressive_passes = sum(type.name == "Pass" &
(pass.end_location.x - location.x) >= 10, na.rm = TRUE),
# Defensive (StatsBomb records tackles as Duel events)
tackles = sum(type.name == "Duel" & duel.type.name == "Tackle", na.rm = TRUE),
interceptions = sum(type.name == "Interception"),
ball_recoveries = sum(type.name == "Ball Recovery")
) %>%
mutate(
# Per 90 metrics (approximated per match played; use real minutes for true per-90 values)
goals_per_90 = round(goals / matches, 2),
xG_per_90 = round(xG / matches, 2),
assists_per_90 = round(assists / matches, 2),
shots_per_90 = round(shots / matches, 2),
key_passes_per_90 = round(key_passes / matches, 2),
pass_completion = round(passes_completed / passes * 100, 1),
shot_accuracy = round(shots_on_target / shots * 100, 1),
conversion_rate = round(goals / shots * 100, 1),
goals_minus_xG = round(goals - xG, 2)
)
print(paste("Player Summary:", player_name))
print(t(player_summary))
Exercise 5.2: Compare Two Teams' Styles
Task: Analyze and compare the playing styles of two teams using possession zones, passing patterns, and defensive positioning.
# Exercise 5.2: Team Style Comparison
library(StatsBombR)
library(dplyr)
library(tidyr)
# Select two teams to compare
team1 <- "Argentina"
team2 <- "France"
# Calculate style metrics for each team
calculate_team_style <- function(events, team_name) {
team_events <- events %>% filter(team.name == team_name)
# Possession by zone
passes <- team_events %>% filter(type.name == "Pass")
zone_possession <- passes %>%
mutate(zone = case_when(
location.x >= 80 ~ "Final Third",
location.x >= 40 ~ "Middle Third",
TRUE ~ "Defensive Third"
)) %>%
group_by(zone) %>%
summarise(passes = n())
# Shot profile
shots <- team_events %>% filter(type.name == "Shot")
shot_stats <- shots %>%
summarise(
total_shots = n(),
inside_box = sum(location.x >= 102, na.rm = TRUE),
outside_box = sum(location.x < 102, na.rm = TRUE),
avg_xG = mean(shot.statsbomb_xg, na.rm = TRUE)
)
# Defensive profile
defensive <- team_events %>%
filter(type.name %in% c("Tackle", "Interception", "Pressure")) %>%
mutate(high_press = location.x >= 60) %>%
summarise(
def_actions = n(),
high_press_pct = sum(high_press) / n() * 100
)
# Passing style
pass_style <- passes %>%
mutate(
is_long = sqrt((pass.end_location.x - location.x)^2 +
(pass.end_location.y - location.y)^2) >= 30,
is_forward = (pass.end_location.x - location.x) >= 5
) %>%
summarise(
completion_rate = sum(is.na(pass.outcome.name)) / n() * 100,
long_ball_pct = sum(is_long) / n() * 100,
forward_pct = sum(is_forward) / n() * 100
)
list(
team = team_name,
zone_possession = zone_possession,
shots = shot_stats,
defense = defensive,
passing = pass_style
)
}
# Compare
style1 <- calculate_team_style(events, team1)
style2 <- calculate_team_style(events, team2)
# Print comparison
cat("=== STYLE COMPARISON ===\n\n")
cat(team1, "vs", team2, "\n\n")
cat("Passing Style:\n")
cat(sprintf(" %s: %.1f%% completion, %.1f%% forward, %.1f%% long\n",
team1, style1$passing$completion_rate,
style1$passing$forward_pct, style1$passing$long_ball_pct))
cat(sprintf(" %s: %.1f%% completion, %.1f%% forward, %.1f%% long\n",
team2, style2$passing$completion_rate,
style2$passing$forward_pct, style2$passing$long_ball_pct))
cat("\nDefensive Approach:\n")
cat(sprintf(" %s: %.1f%% high press\n", team1, style1$defense$high_press_pct))
cat(sprintf(" %s: %.1f%% high press\n", team2, style2$defense$high_press_pct))
Exercise 5.3: Statistical Match Report
Task: Generate a statistical match report with key metrics comparison, possession breakdown by thirds, and top performers.
# Exercise 5.3: Statistical Match Report
library(StatsBombR)
library(dplyr)
# Get a single match
match_id <- matches$match_id[1]
match_events <- events %>% filter(match_id == !!match_id)
teams <- unique(match_events$team.name[!is.na(match_events$team.name)])
# Key match statistics
match_stats <- match_events %>%
group_by(team.name) %>%
summarise(
# Possession (pass-based)
Passes = sum(type.name == "Pass"),
`Pass %` = round(sum(type.name == "Pass" & is.na(pass.outcome.name)) /
sum(type.name == "Pass") * 100, 0),
# Shots
Shots = sum(type.name == "Shot"),
`On Target` = sum(type.name == "Shot" &
shot.outcome.name %in% c("Goal", "Saved")),
Goals = sum(type.name == "Shot" & shot.outcome.name == "Goal"),
xG = round(sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE), 2),
# Corners and set pieces
Corners = sum(type.name == "Pass" & pass.type.name == "Corner", na.rm = TRUE),
# Defensive
Tackles = sum(type.name == "Duel" & duel.type.name == "Tackle", na.rm = TRUE),
Interceptions = sum(type.name == "Interception"),
Fouls = sum(type.name == "Foul Committed", na.rm = TRUE),
.groups = "drop"
) %>%
mutate(Possession = round(Passes / sum(Passes) * 100, 0))
# Possession by thirds
possession_thirds <- match_events %>%
filter(type.name == "Pass") %>%
mutate(third = case_when(
location.x >= 80 ~ "Final",
location.x >= 40 ~ "Middle",
TRUE ~ "Defensive"
)) %>%
group_by(team.name, third) %>%
summarise(passes = n(), .groups = "drop") %>%
tidyr::pivot_wider(names_from = third, values_from = passes)
# Top performers
top_passers <- match_events %>%
filter(type.name == "Pass") %>%
group_by(player.name, team.name) %>%
summarise(passes = n(), .groups = "drop") %>%
slice_max(passes, n = 3)
top_tacklers <- match_events %>%
filter(type.name == "Interception" |
(type.name == "Duel" & duel.type.name == "Tackle")) %>%
group_by(player.name, team.name) %>%
summarise(defensive_actions = n(), .groups = "drop") %>%
slice_max(defensive_actions, n = 3)
# Print Report
cat("\n========================================\n")
cat(" MATCH STATISTICAL REPORT \n")
cat(sprintf(" %s vs %s\n", teams[1], teams[2]))
cat("========================================\n\n")
cat("KEY STATISTICS:\n")
print(as.data.frame(match_stats))
cat("\nPOSSESSION BY THIRDS:\n")
print(as.data.frame(possession_thirds))
cat("\nTOP PASSERS:\n")
print(as.data.frame(top_passers))
cat("\nTOP DEFENDERS:\n")
print(as.data.frame(top_tacklers))
Continue Your Journey
Master the fundamentals covered here, then move on to advanced metrics like Expected Goals (xG), Expected Assists (xA), and more sophisticated models.