Chapter 60

Capstone - Complete Analytics System


The Foundation of Football Analysis

Before expected goals, before tracking data, before machine learning models—there were basic statistics. Goals, assists, passes, tackles, possession. These "traditional" metrics remain the backbone of football analysis and understanding them is essential for any analyst.

Traditional statistics may seem simple compared to modern advanced metrics, but they tell us fundamental truths about the game. A team that dominates possession but can't shoot accurately won't win. A striker with poor shot volume will struggle for goals regardless of quality. The basics matter.

Why Traditional Stats Still Matter
  • Foundation: Advanced metrics build upon traditional stats (xG needs shots)
  • Communication: Everyone understands goals, assists, and possession
  • Data Availability: Basic stats exist for every match; tracking data does not
  • Historical Comparison: Compare players across eras using consistent metrics
  • Simplicity: Sometimes the simple answer is the right answer

Possession Statistics

Possession is perhaps the most discussed statistic in football, yet it's frequently misunderstood. Let's break down what possession actually measures and its limitations.

How Possession Is Calculated

There are two main methods for calculating possession:

Pass-Based Possession

Formula:

Possession % = Team Passes / Total Passes × 100

Most common method. Used by Opta and most broadcasters.

Limitation: Doesn't account for time on ball, just pass counts.

Time-Based Possession

Formula:

Possession % = Team Time on Ball / Total Time × 100

More accurate but requires tracking data.

Advantage: Captures actual ball control duration.
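
To see how the two formulas can disagree, here is a minimal sketch on made-up numbers. The function names and the match figures are illustrative assumptions, not values from a real match or a provider's API.

```python
def pass_based_possession(team_passes: int, opponent_passes: int) -> float:
    """Possession % = Team Passes / Total Passes x 100."""
    total = team_passes + opponent_passes
    if total == 0:
        raise ValueError("no passes recorded")
    return team_passes * 100 / total


def time_based_possession(team_seconds: float, opponent_seconds: float) -> float:
    """Possession % = Team Time on Ball / Total Time x 100."""
    total = team_seconds + opponent_seconds
    if total == 0:
        raise ValueError("no time on ball recorded")
    return team_seconds * 100 / total


# Hypothetical match: Team A plays many quick, short passes, while
# Team B plays fewer passes but holds the ball longer each time.
pass_pct = pass_based_possession(team_passes=600, opponent_passes=400)
time_pct = time_based_possession(team_seconds=1650, opponent_seconds=1350)

print(f"Pass-based: {pass_pct:.1f}%")
print(f"Time-based: {time_pct:.1f}%")
```

With these invented inputs the pass-based figure comes out higher than the time-based one, which is the kind of gap to expect when one side circulates the ball in short, quick passes.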

# Calculate possession from pass data
import pandas as pd

# Load match events
events = pd.read_csv("match_events.csv")

# Pass-based possession
passes = events[events["type"] == "Pass"].copy()

possession_stats = passes.groupby("team").agg(
    total_passes=("type", "count"),
    completed_passes=("outcome", lambda x: (x == "Complete").sum())
).reset_index()

possession_stats["pass_completion"] = (
    possession_stats["completed_passes"] / possession_stats["total_passes"] * 100
)
possession_stats["possession_pct"] = (
    possession_stats["total_passes"] / possession_stats["total_passes"].sum() * 100
)

print(possession_stats)

# Possession by period
passes["half"] = passes["minute"].apply(
    lambda x: "1st Half" if x <= 45 else "2nd Half"
)

possession_by_half = passes.groupby(["team", "half"]).size().reset_index(name="passes")
possession_by_half["possession"] = possession_by_half.groupby("half")["passes"].transform(
    lambda x: x / x.sum() * 100
)

print(possession_by_half)

# Calculate possession from pass data
library(dplyr)

# Load match events
events <- read.csv("match_events.csv")

# Pass-based possession
possession_stats <- events %>%
  filter(type == "Pass") %>%
  group_by(team) %>%
  summarise(
    total_passes = n(),
    completed_passes = sum(outcome == "Complete"),
    pass_completion = completed_passes / total_passes * 100
  ) %>%
  mutate(
    possession_pct = total_passes / sum(total_passes) * 100
  )

print(possession_stats)

# Possession by period (halves)
possession_by_half <- events %>%
  filter(type == "Pass") %>%
  mutate(half = ifelse(minute <= 45, "1st Half", "2nd Half")) %>%
  group_by(team, half) %>%
  summarise(passes = n(), .groups = "drop") %>%
  group_by(half) %>%
  mutate(possession = passes / sum(passes) * 100)

print(possession_by_half)
Calculating possession statistics

Possession Context: Quality Over Quantity

High possession doesn't guarantee success. Context matters more:

# Analyze possession with context
import pandas as pd
import numpy as np

passes = events[events["type"] == "Pass"].copy()

# Define pitch zones
def get_zone(x):
    if x >= 80:
        return "Final Third"
    elif x >= 40:
        return "Middle Third"
    else:
        return "Defensive Third"

passes["zone"] = passes["x"].apply(get_zone)

# Progressive pass definition
passes["is_progressive"] = (
    (passes["x_end"] - passes["x"] >= 10) & (passes["x_end"] >= 60)
)

# Analyze by zone
possession_analysis = passes.groupby(["team", "zone"]).agg(
    passes=("type", "count"),
    completion_rate=("outcome", lambda x: (x == "Complete").mean() * 100),
    progressive_passes=("is_progressive", "sum")
).reset_index()

print(possession_analysis)

# Possession with purpose metrics
def analyze_purposeful_possession(team_passes):
    return pd.Series({
        "total_passes": len(team_passes),
        "final_third_passes": (team_passes["x_end"] >= 80).sum(),
        "progressive_passes": (
            (team_passes["x_end"] - team_passes["x"] >= 10)
            & (team_passes["outcome"] == "Complete")
        ).sum(),
        "passes_into_box": (
            (team_passes["x_end"] >= 102)
            & (team_passes["y_end"] >= 18)
            & (team_passes["y_end"] <= 62)
        ).sum()
    })

team_a_possession = analyze_purposeful_possession(
    passes[passes["team"] == "Team A"]
)

print(team_a_possession)

# Analyze possession with context
possession_analysis <- events %>%
  filter(type == "Pass") %>%
  mutate(
    # Define pitch zones
    zone = case_when(
      x >= 80 ~ "Final Third",
      x >= 40 ~ "Middle Third",
      TRUE ~ "Defensive Third"
    ),
    # Progressive pass (moves ball 10+ yards forward and ends at x >= 60)
    is_progressive = (x_end - x) >= 10 & x_end >= 60
  ) %>%
  group_by(team, zone) %>%
  summarise(
    passes = n(),
    completion_rate = mean(outcome == "Complete") * 100,
    progressive_passes = sum(is_progressive, na.rm = TRUE),
    .groups = "drop"
  )

# Possession value by zone
possession_value <- possession_analysis %>%
  tidyr::pivot_wider(names_from = zone,
                     values_from = c(passes, completion_rate, progressive_passes))

# Team A might have 60% possession but only 45% in final third
# That context changes the narrative

# Calculate possession with purpose
purposeful_possession <- events %>%
  filter(type == "Pass", team == "Team A") %>%
  summarise(
    total_passes = n(),
    final_third_passes = sum(x_end >= 80, na.rm = TRUE),
    progressive_passes = sum((x_end - x) >= 10 & outcome == "Complete", na.rm = TRUE),
    passes_into_box = sum(x_end >= 102 & y_end >= 18 & y_end <= 62, na.rm = TRUE)
  ) %>%
  mutate(
    final_third_pct = final_third_passes / total_passes * 100,
    progressive_pct = progressive_passes / total_passes * 100
  )
Analyzing possession quality and context
The Possession Paradox

Research has shown that beyond 55-60% possession, there are diminishing returns. Teams can become "possession-rich but chance-poor." The key metrics to watch:

  • Final Third Possession % - Where does your possession occur?
  • Progressive Pass Rate - Does possession move forward?
  • Shots per Possession - Does possession create chances?
  • Opponent Shots Allowed - Does possession protect your goal?
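
The four metrics above can be sketched on a tiny, invented event table. The column names (`team`, `type`, `x`, `x_end`, `possession_id`) and the helper `possession_quality` are assumptions for illustration; real event feeds name these fields differently, and final-third share is approximated here by where passes start.

```python
import pandas as pd

# Toy data: Team A has most of the passes, Team B most of the shots.
events = pd.DataFrame({
    "team": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "type": ["Pass", "Pass", "Pass", "Shot", "Pass", "Pass",
             "Pass", "Shot", "Pass", "Shot"],
    "x": [30, 50, 85, 100, 20, 45, 40, 95, 70, 105],
    "x_end": [50, 85, 100, None, 45, 40, 95, None, 105, None],
    "possession_id": [1, 1, 1, 1, 2, 2, 3, 3, 4, 4],
})


def possession_quality(events: pd.DataFrame, team: str) -> dict:
    """Summarise where a team's possession happens and what it produces."""
    team_events = events[events["team"] == team]
    passes = team_events[team_events["type"] == "Pass"]
    shots = team_events[team_events["type"] == "Shot"]
    opp_shots = events[(events["team"] != team) & (events["type"] == "Shot")]

    # Same progressive-pass rule as the chapter's code: 10+ forward, ends at x >= 60
    progressive = (passes["x_end"] - passes["x"] >= 10) & (passes["x_end"] >= 60)

    return {
        "final_third_pass_pct": (passes["x"] >= 80).mean() * 100,
        "progressive_pass_rate": progressive.mean() * 100,
        "shots_per_possession": len(shots) / team_events["possession_id"].nunique(),
        "opponent_shots": len(opp_shots),
    }


for team in ["A", "B"]:
    print(team, possession_quality(events, team))
```

In this toy match Team A plays five of the seven passes yet produces fewer shots per possession than Team B, which is the "possession-rich but chance-poor" pattern in miniature.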

Shooting Statistics

Shooting metrics form the core of offensive analysis. While xG has revolutionized shot quality assessment, traditional shot statistics remain essential for understanding volume and efficiency.

Core Shooting Metrics

Metric | Formula | Description | Benchmark
Shots | Total shot attempts | Raw volume of attempts | ~12-15 per match (team)
Shots on Target (SoT) | Shots that would score without keeper | Measures accuracy | ~35-40% of shots
SoT% | SoT / Total Shots × 100 | Accuracy percentage | 35-45% is good
Goals per Shot | Goals / Total Shots | Conversion efficiency | ~0.10-0.12 (10-12%)
Goals per SoT | Goals / Shots on Target | Finishing quality | ~0.30-0.35

# Calculate comprehensive shooting statistics
import pandas as pd

shots = events[events["type"] == "Shot"].copy()

# Player shooting stats
player_shooting = shots.groupby("player").agg(
    matches=("match_id", "nunique"),
    shots=("type", "count"),
    shots_on_target=("outcome", lambda x: x.isin(["Goal", "Saved"]).sum()),
    goals=("outcome", lambda x: (x == "Goal").sum()),
    blocked=("outcome", lambda x: (x == "Blocked").sum()),
    headers=("body_part", lambda x: (x == "Head").sum()),
    header_goals=("outcome", lambda x: (
        (shots.loc[x.index, "body_part"] == "Head") & (x == "Goal")
    ).sum()),
    inside_box=("x", lambda x: (
        (x >= 102)
        & (shots.loc[x.index, "y"] >= 18)
        & (shots.loc[x.index, "y"] <= 62)
    ).sum()),
    xg_total=("xg", "sum")
).reset_index()

# Calculate derived metrics
# Per 90 here is really per match (assumes 90 minutes per match);
# "matches" only counts matches in which the player took a shot
player_shooting["shots_per_90"] = player_shooting["shots"] / player_shooting["matches"]
player_shooting["goals_per_90"] = player_shooting["goals"] / player_shooting["matches"]
player_shooting["shot_accuracy"] = player_shooting["shots_on_target"] / player_shooting["shots"] * 100
player_shooting["conversion_rate"] = player_shooting["goals"] / player_shooting["shots"] * 100
player_shooting["inside_box_pct"] = player_shooting["inside_box"] / player_shooting["shots"] * 100
player_shooting["goals_minus_xg"] = player_shooting["goals"] - player_shooting["xg_total"]

# Sort by goals
player_shooting = player_shooting.sort_values("goals", ascending=False)
print(player_shooting)

# Team comparison
team_shooting = shots.groupby("team").agg(
    shots=("type", "count"),
    sot=("outcome", lambda x: x.isin(["Goal", "Saved"]).sum()),
    goals=("outcome", lambda x: (x == "Goal").sum()),
    xg=("xg", "sum")
).reset_index()

team_shooting["sot_pct"] = team_shooting["sot"] / team_shooting["shots"] * 100
team_shooting["conversion"] = team_shooting["goals"] / team_shooting["shots"] * 100

print(team_shooting)

# Calculate comprehensive shooting statistics
library(dplyr)

# Player shooting stats
player_shooting <- events %>%
  filter(type == "Shot") %>%
  group_by(player) %>%
  summarise(
    matches = n_distinct(match_id),
    shots = n(),
    shots_on_target = sum(outcome %in% c("Goal", "Saved")),
    goals = sum(outcome == "Goal"),
    blocked = sum(outcome == "Blocked"),
    off_target = sum(outcome %in% c("Off T", "Post")),

    # Headers vs Feet
    headers = sum(body_part == "Head"),
    header_goals = sum(body_part == "Head" & outcome == "Goal"),

    # Location analysis
    inside_box = sum(x >= 102 & y >= 18 & y <= 62),
    outside_box = sum(x < 102 | y < 18 | y > 62)
  ) %>%
  mutate(
    # Per 90 calculations (assuming 90 mins per match)
    shots_per_90 = shots / matches,
    goals_per_90 = goals / matches,

    # Percentages
    shot_accuracy = shots_on_target / shots * 100,
    conversion_rate = goals / shots * 100,
    finishing_rate = goals / shots_on_target * 100,

    # Ratios
    goals_per_shot = goals / shots,
    inside_box_pct = inside_box / shots * 100
  ) %>%
  arrange(desc(goals))

print(player_shooting)

# Team shooting comparison
team_shooting <- events %>%
  filter(type == "Shot") %>%
  group_by(team) %>%
  summarise(
    shots = n(),
    sot = sum(outcome %in% c("Goal", "Saved")),
    goals = sum(outcome == "Goal"),
    xg = sum(xg, na.rm = TRUE)
  ) %>%
  mutate(
    sot_pct = sot / shots * 100,
    conversion = goals / shots * 100,
    goals_minus_xg = goals - xg  # Over/under performance
  )
Comprehensive shooting statistics analysis

Shot Quality Analysis Without xG

Even without xG models, you can assess shot quality using location and context:

# Shot quality zones (traditional approach)
shots = events[events["type"] == "Shot"].copy()

def get_shot_zone(row):
    x, y = row["x"], row["y"]
    # 6-yard box
    if x >= 114 and 30 <= y <= 50:
        return "Six Yard Box"
    # Penalty area central (six-yard box already matched above, so no
    # upper bound on x; otherwise shots beside the six-yard box would
    # fall through to "Long Range")
    elif x >= 102 and 25 <= y <= 55:
        return "Central Box"
    # Penalty area wide
    elif x >= 102 and (y < 25 or y > 55):
        return "Wide Box"
    # Edge of box
    elif x >= 88 and x < 102:
        return "Edge of Box"
    else:
        return "Long Range"

# Zone conversion rates (historical benchmarks)
zone_conversions = {
    "Six Yard Box": 0.45,
    "Central Box": 0.15,
    "Wide Box": 0.08,
    "Edge of Box": 0.06,
    "Long Range": 0.03
}

shots["shot_zone"] = shots.apply(get_shot_zone, axis=1)
shots["zone_conversion"] = shots["shot_zone"].map(zone_conversions)

# Analyze by zone
zone_analysis = shots.groupby(["team", "shot_zone"]).agg(
    shots=("type", "count"),
    goals=("outcome", lambda x: (x == "Goal").sum()),
    expected=("zone_conversion", "sum")
).reset_index()

zone_analysis["conversion"] = zone_analysis["goals"] / zone_analysis["shots"] * 100
zone_analysis["over_performance"] = zone_analysis["goals"] - zone_analysis["expected"]

print(zone_analysis)

# Team shot quality
team_quality = shots.groupby("team").agg(
    total_shots=("type", "count"),
    weighted_quality=("zone_conversion", "sum"),
    goals=("outcome", lambda x: (x == "Goal").sum())
).reset_index()

team_quality["avg_shot_quality"] = team_quality["weighted_quality"] / team_quality["total_shots"]

print(team_quality)

# Shot quality zones (traditional approach)
shot_zones <- events %>%
  filter(type == "Shot") %>%
  mutate(
    # Define quality zones
    shot_zone = case_when(
      # 6-yard box
      x >= 114 & y >= 30 & y <= 50 ~ "Six Yard Box",
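      # Penalty area central (six-yard box already matched above, so no
      # upper bound on x; otherwise shots beside the six-yard box would
      # fall through to "Long Range")
      x >= 102 & y >= 25 & y <= 55 ~ "Central Box",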
      # Penalty area wide
      x >= 102 & (y < 25 | y > 55) ~ "Wide Box",
      # Edge of box
      x >= 88 & x < 102 ~ "Edge of Box",
      # Long range
      TRUE ~ "Long Range"
    ),
    # Expected conversion rates by zone (historical benchmarks)
    zone_conversion = case_when(
      shot_zone == "Six Yard Box" ~ 0.45,
      shot_zone == "Central Box" ~ 0.15,
      shot_zone == "Wide Box" ~ 0.08,
      shot_zone == "Edge of Box" ~ 0.06,
      shot_zone == "Long Range" ~ 0.03
    )
  )

# Analyze by zone
zone_analysis <- shot_zones %>%
  group_by(team, shot_zone) %>%
  summarise(
    shots = n(),
    goals = sum(outcome == "Goal"),
    expected = sum(zone_conversion),
    conversion = goals / shots * 100,
    .groups = "drop"
  ) %>%
  mutate(
    over_performance = goals - expected
  )

# Shot quality score (simple version)
team_shot_quality <- shot_zones %>%
  group_by(team) %>%
  summarise(
    total_shots = n(),
    weighted_quality = sum(zone_conversion),
    avg_shot_quality = weighted_quality / total_shots,
    goals = sum(outcome == "Goal"),
    .groups = "drop"
  )
Analyzing shot quality by zone without xG

Passing Statistics

Passing metrics reveal how teams build attacks and control games. From simple completion rates to progressive passes, these stats illuminate playing style and effectiveness.

Core Passing Metrics

Metric | Formula | What It Measures
Pass Completion % | Completed / Attempted × 100 | Basic accuracy
Progressive Passes | Passes moving ball 10+ yards toward goal | Attacking intent
Key Passes | Passes leading directly to shots | Chance creation
Passes into Final Third | Passes entering last 1/3 of pitch | Territory gain
Passes into Box | Passes entering penalty area | Dangerous deliveries
Through Balls | Passes played behind defense | Line-breaking ability
Long Balls | Passes > 30 yards | Direct play style
Switches | Passes > 40 yards changing side | Width exploitation

# Comprehensive passing analysis
import pandas as pd
import numpy as np

passes = events[events["type"] == "Pass"].copy()

# Calculate pass characteristics
passes["pass_length"] = np.sqrt(
    (passes["x_end"] - passes["x"])**2 + (passes["y_end"] - passes["y"])**2
)
passes["is_completed"] = passes["outcome"].isin(["Complete"]) | passes["outcome"].isna()
passes["forward_distance"] = passes["x_end"] - passes["x"]
passes["lateral_distance"] = (passes["y_end"] - passes["y"]).abs()

# Pass type flags
passes["is_progressive"] = (passes["forward_distance"] >= 10) & (passes["x_end"] >= 60)
passes["is_into_final_third"] = (passes["x"] < 80) & (passes["x_end"] >= 80)
passes["is_into_box"] = (
    (passes["x_end"] >= 102)
    & (passes["y_end"] >= 18)
    & (passes["y_end"] <= 62)
)
passes["is_long_ball"] = passes["pass_length"] >= 30
passes["is_switch"] = (passes["pass_length"] >= 40) & (passes["lateral_distance"] >= 35)

# Direction classification
def classify_direction(fd):
    if fd > 5:
        return "Forward"
    elif fd < -5:
        return "Backward"
    return "Sideways"

passes["direction"] = passes["forward_distance"].apply(classify_direction)

# Player passing profile
player_passing = passes.groupby(["player", "team"]).agg(
    matches=("match_id", "nunique"),
    total_passes=("type", "count"),
    completed=("is_completed", "sum"),
    progressive=("is_progressive",
                 lambda x: (x & passes.loc[x.index, "is_completed"]).sum()),
    into_final_third=("is_into_final_third",
                      lambda x: (x & passes.loc[x.index, "is_completed"]).sum()),
    into_box=("is_into_box",
              lambda x: (x & passes.loc[x.index, "is_completed"]).sum()),
    long_balls=("is_long_ball", "sum"),
    switches=("is_switch",
              lambda x: (x & passes.loc[x.index, "is_completed"]).sum()),
    forward_count=("direction", lambda x: (x == "Forward").sum())
).reset_index()

# Derived metrics
player_passing["completion_pct"] = player_passing["completed"] / player_passing["total_passes"] * 100
player_passing["passes_per_90"] = player_passing["total_passes"] / player_passing["matches"]
player_passing["progressive_per_90"] = player_passing["progressive"] / player_passing["matches"]
player_passing["forward_pct"] = player_passing["forward_count"] / player_passing["total_passes"] * 100

print(player_passing.sort_values("progressive_per_90", ascending=False))

# Comprehensive passing analysis
library(dplyr)

passing_stats <- events %>%
  filter(type == "Pass") %>%
  mutate(
    # Pass characteristics
    pass_length = sqrt((x_end - x)^2 + (y_end - y)^2),
    is_completed = outcome == "Complete" | is.na(outcome),

    # Direction calculations
    forward_distance = x_end - x,
    lateral_distance = abs(y_end - y),

    # Pass types
    is_progressive = forward_distance >= 10 & x_end >= 60,
    is_into_final_third = x < 80 & x_end >= 80,
    is_into_box = x_end >= 102 & y_end >= 18 & y_end <= 62,
    is_long_ball = pass_length >= 30,
    is_switch = pass_length >= 40 & lateral_distance >= 35,
    is_through_ball = pass_type == "Through Ball",

    # Pass direction
    direction = case_when(
      forward_distance > 5 ~ "Forward",
      forward_distance < -5 ~ "Backward",
      TRUE ~ "Sideways"
    )
  )

# Player passing profile
player_passing <- passing_stats %>%
  group_by(player, team) %>%
  summarise(
    matches = n_distinct(match_id),
    total_passes = n(),
    completed = sum(is_completed),
    completion_pct = completed / total_passes * 100,

    # Advanced metrics
    progressive = sum(is_progressive & is_completed),
    into_final_third = sum(is_into_final_third & is_completed),
    into_box = sum(is_into_box & is_completed),
    long_balls = sum(is_long_ball),
    long_ball_completion = sum(is_long_ball & is_completed) / long_balls * 100,
    switches = sum(is_switch & is_completed),

    # Per 90
    passes_per_90 = total_passes / matches,
    progressive_per_90 = progressive / matches,

    # Direction profile
    forward_pct = sum(direction == "Forward") / total_passes * 100,
    backward_pct = sum(direction == "Backward") / total_passes * 100,

    .groups = "drop"
  ) %>%
  arrange(desc(progressive_per_90))

print(player_passing)
Comprehensive passing statistics analysis
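
Key passes appear in the metrics table but need sequence information rather than a single row. The sketch below flags a pass as a key pass when the very next event is a shot by the same team. This is a simplifying heuristic, assuming events are sorted chronologically and use `team`, `player`, and `type` columns; where a data provider tags key passes directly, that tag is preferable. The helper name `flag_key_passes` is hypothetical.

```python
import pandas as pd

# Toy, chronologically ordered event sequence (invented for illustration)
events = pd.DataFrame({
    "team":   ["A", "A", "A", "B", "B", "A", "A"],
    "player": ["P1", "P2", "P3", "Q1", "Q2", "P2", "P1"],
    "type":   ["Pass", "Pass", "Shot", "Pass", "Shot", "Pass", "Pass"],
})


def flag_key_passes(events: pd.DataFrame) -> pd.DataFrame:
    """Mark passes that are immediately followed by a same-team shot."""
    out = events.reset_index(drop=True).copy()
    next_type = out["type"].shift(-1)   # type of the following event
    next_team = out["team"].shift(-1)   # team of the following event
    out["is_key_pass"] = (
        (out["type"] == "Pass")
        & (next_type == "Shot")
        & (next_team == out["team"])
    )
    return out


flagged = flag_key_passes(events)
key_passes = flagged[flagged["is_key_pass"]].groupby("player").size()
print(key_passes)
```

Because the rule only looks one event ahead, a touch or dribble between the pass and the shot would hide the key pass; a real pipeline would likely need a more tolerant window.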

Pass Completion Context

Raw pass completion % can be misleading. A center-back completing 95% short passes isn't more valuable than a midfielder completing 75% progressive passes:

# Pass completion by difficulty
def get_difficulty(row):
    if row.get("pass_type") == "Through Ball":
        return "Very Hard"
    elif row["is_into_box"]:
        return "Hard"
    elif row["is_progressive"] or row["is_long_ball"]:
        return "Medium"
    return "Easy"

passes["difficulty"] = passes.apply(get_difficulty, axis=1)

pass_difficulty = passes.groupby(["player", "difficulty"]).agg(
    attempts=("type", "count"),
    completed=("is_completed", "sum")
).reset_index()

pass_difficulty["completion"] = pass_difficulty["completed"] / pass_difficulty["attempts"] * 100

print(pass_difficulty.pivot_table(
    index="player", columns="difficulty", values="completion", aggfunc="first"
))

# Weighted passing score
def calculate_pass_value(row):
    if not row["is_completed"]:
        return 0
    if row.get("pass_type") == "Through Ball":
        return 5
    if row["is_into_box"]:
        return 3
    if row["is_progressive"]:
        return 2
    if row["is_into_final_third"]:
        return 1.5
    return 1

passes["pass_value"] = passes.apply(calculate_pass_value, axis=1)

weighted_passing = passes.groupby("player").agg(
    total_passes=("type", "count"),
    pass_value_total=("pass_value", "sum"),
    matches=("match_id", "nunique")
).reset_index()

weighted_passing["pass_value_per_90"] = (
    weighted_passing["pass_value_total"] / weighted_passing["matches"]
)

print(weighted_passing.sort_values("pass_value_per_90", ascending=False))

# Pass completion by difficulty
pass_difficulty <- passing_stats %>%
  mutate(
    difficulty = case_when(
      is_through_ball ~ "Very Hard",
      is_into_box ~ "Hard",
      is_progressive ~ "Medium",
      is_long_ball ~ "Medium",
      TRUE ~ "Easy"
    )
  ) %>%
  group_by(player, difficulty) %>%
  summarise(
    attempts = n(),
    completed = sum(is_completed),
    completion = completed / attempts * 100,
    .groups = "drop"
  )

# Weighted passing score
# Easy passes worth less, difficult passes worth more
weighted_passing <- passing_stats %>%
  mutate(
    pass_value = case_when(
      is_through_ball & is_completed ~ 5,
      is_into_box & is_completed ~ 3,
      is_progressive & is_completed ~ 2,
      is_into_final_third & is_completed ~ 1.5,
      is_completed ~ 1,
      TRUE ~ 0
    )
  ) %>%
  group_by(player) %>%
  summarise(
    total_passes = n(),
    pass_value_total = sum(pass_value),
    pass_value_per_90 = pass_value_total / n_distinct(match_id)
  ) %>%
  arrange(desc(pass_value_per_90))
Contextualizing pass completion rates

Defensive Statistics

Defensive metrics are notoriously difficult to interpret. A player with many tackles might be excellent defensively—or might be constantly out of position. Context is everything.

Core Defensive Metrics

Ball-Winning
  • Tackles: Successful challenges to win ball
  • Tackle %: Tackles / Tackle Attempts
  • Interceptions: Reading passes to steal ball
  • Recoveries: Regaining loose balls

Defending
  • Blocks: Stopping shots/passes
  • Clearances: Removing ball from danger
  • Aerial Duels: Heading contests won
  • Fouls: Free kicks conceded

Pressing
  • Pressures: Pressing the opponent
  • Pressure Success %: Regains from pressing
  • PPDA: Passes allowed per defensive action

# Comprehensive defensive statistics
import pandas as pd

defensive_events = events[events["type"].isin([
    "Tackle", "Interception", "Clearance", "Block",
    "Aerial", "Foul", "Pressure"
])].copy()

# Aggregate by player
defensive_stats = defensive_events.groupby(["player", "team"]).apply(
    lambda x: pd.Series({
        "matches": x["match_id"].nunique(),
        "tackles_attempted": (x["type"] == "Tackle").sum(),
        "tackles_won": ((x["type"] == "Tackle") & (x["outcome"] == "Won")).sum(),
        "interceptions": (x["type"] == "Interception").sum(),
        "blocks": (x["type"] == "Block").sum(),
        "clearances": (x["type"] == "Clearance").sum(),
        "aerial_duels": (x["type"] == "Aerial").sum(),
        "aerials_won": ((x["type"] == "Aerial") & (x["outcome"] == "Won")).sum(),
        "pressures": (x["type"] == "Pressure").sum(),
        "pressure_regains": (
            (x["type"] == "Pressure") & (x["outcome"] == "Success")
        ).sum()
    })
).reset_index()

# Calculate derived metrics
defensive_stats["tackle_pct"] = (
    defensive_stats["tackles_won"] / defensive_stats["tackles_attempted"] * 100
)
defensive_stats["aerial_pct"] = (
    defensive_stats["aerials_won"] / defensive_stats["aerial_duels"] * 100
)
defensive_stats["tackles_per_90"] = defensive_stats["tackles_won"] / defensive_stats["matches"]
defensive_stats["interceptions_per_90"] = defensive_stats["interceptions"] / defensive_stats["matches"]
defensive_stats["ball_winning_per_90"] = (
    (defensive_stats["tackles_won"] + defensive_stats["interceptions"])
    / defensive_stats["matches"]
)

print(defensive_stats.sort_values("ball_winning_per_90", ascending=False))

# Zone analysis
def get_zone(x):
    if x <= 40:
        return "Defensive Third"
    elif x <= 80:
        return "Middle Third"
    return "Attacking Third"

defensive_events["zone"] = defensive_events["x"].apply(get_zone)

zone_breakdown = defensive_events.groupby(["player", "zone"]).size().unstack(fill_value=0)
print(zone_breakdown)

# Comprehensive defensive statistics
library(dplyr)

defensive_stats <- events %>%
  filter(type %in% c("Tackle", "Interception", "Clearance",
                     "Block", "Aerial", "Foul", "Pressure")) %>%
  group_by(player, team) %>%
  summarise(
    matches = n_distinct(match_id),

    # Tackles
    tackles_attempted = sum(type == "Tackle"),
    tackles_won = sum(type == "Tackle" & outcome == "Won"),
    tackle_pct = tackles_won / tackles_attempted * 100,

    # Interceptions
    interceptions = sum(type == "Interception"),

    # Blocks
    blocks = sum(type == "Block"),
    shot_blocks = sum(type == "Block" & block_type == "Shot"),
    pass_blocks = sum(type == "Block" & block_type == "Pass"),

    # Clearances
    clearances = sum(type == "Clearance"),

    # Aerials
    aerial_duels = sum(type == "Aerial"),
    aerials_won = sum(type == "Aerial" & outcome == "Won"),
    aerial_pct = aerials_won / aerial_duels * 100,

    # Fouls
    fouls_committed = sum(type == "Foul" & foul_type == "Committed"),

    # Pressures
    pressures = sum(type == "Pressure"),
    pressure_regains = sum(type == "Pressure" & outcome == "Success"),

    .groups = "drop"
  ) %>%
  mutate(
    # Per 90 metrics
    tackles_per_90 = tackles_won / matches,
    interceptions_per_90 = interceptions / matches,
    clearances_per_90 = clearances / matches,
    pressures_per_90 = pressures / matches,

    # Combined metrics
    ball_winning = tackles_won + interceptions,
    ball_winning_per_90 = ball_winning / matches
  )

# Defensive value in different zones
defensive_zones <- events %>%
  filter(type %in% c("Tackle", "Interception", "Clearance")) %>%
  mutate(
    zone = case_when(
      x <= 40 ~ "Defensive Third",
      x <= 80 ~ "Middle Third",
      TRUE ~ "Attacking Third"
    )
  ) %>%
  group_by(player, zone) %>%
  summarise(defensive_actions = n(), .groups = "drop")
Comprehensive defensive statistics analysis

The Defensive Context Problem

Why Raw Defensive Numbers Mislead

Consider two center-backs:

  • Player A: 3 tackles, 2 interceptions per game
  • Player B: 1 tackle, 1 interception per game

Player A looks better, but what if:

  • Player A's team defends deep, facing many more attacks
  • Player B's team presses high, preventing attacks from forming
  • Player A gets dribbled past 3 times, Player B never

Solution: Always consider defensive actions relative to opportunities and alongside failure rates (dribbled past, errors leading to shots).
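
A minimal sketch of that adjustment, using the two center-backs above. The opponent pass counts are invented to stand in for "opportunities" (a deep-defending side faces more opposition passes), and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DefenderGame:
    name: str
    tackles: float
    interceptions: float
    dribbled_past: float
    opponent_passes: float  # assumed proxy for defensive opportunities

    def actions_per_100_opp_passes(self) -> float:
        """Tackles plus interceptions, scaled to 100 opponent passes."""
        return (self.tackles + self.interceptions) * 100 / self.opponent_passes

    def duel_success_pct(self) -> float:
        """Tackles as a share of tackles plus times dribbled past."""
        attempts = self.tackles + self.dribbled_past
        return self.tackles * 100 / attempts if attempts else float("nan")


# Per-game figures from the example; opponent_passes values are made up
player_a = DefenderGame("Player A", tackles=3, interceptions=2,
                        dribbled_past=3, opponent_passes=500)
player_b = DefenderGame("Player B", tackles=1, interceptions=1,
                        dribbled_past=0, opponent_passes=250)

for p in (player_a, player_b):
    print(f"{p.name}: {p.actions_per_100_opp_passes():.2f} actions per 100 opp passes, "
          f"{p.duel_success_pct():.0f}% duel success")
```

Under these assumed inputs the raw 5-to-2 gap in actions shrinks considerably once opportunities are accounted for, and Player B's failure rate looks clearly better.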

# PPDA - Passes Per Defensive Action
def calculate_ppda(match_events, team_name):
    """
    Calculate Passes Per Defensive Action
    Lower = more aggressive pressing
    """
    # Opposition passes in their defensive 60%
    opp_passes = match_events[
        (match_events["type"] == "Pass")
        & (match_events["team"] != team_name)
        & (match_events["x"] <= 72)  # 60% of 120
    ]

    # Our defensive actions in the same zone (the opponent's defensive 60%,
    # which is x >= 48 from our attacking perspective)
    def_actions = match_events[
        (match_events["team"] == team_name)
        & (match_events["type"].isin(["Tackle", "Interception", "Foul"]))
        & (match_events["x"] >= 48)  # Mirror zone
    ]

    if len(def_actions) == 0:
        return float("inf")

    return len(opp_passes) / len(def_actions)

# Calculate for match
ppda = calculate_ppda(events, "England")
print(f"PPDA: {ppda:.1f}")  # Lower = more pressing

# Defensive success with failures
defensive_outcomes = events[
    events["type"].isin(["Tackle", "Duel", "Pressure"])
].copy()

defensive_success = defensive_outcomes.groupby("player").agg(
    total_actions=("type", "count"),
    successful=("outcome", lambda x: x.isin(["Won", "Success"]).sum()),
    dribbled_past=("dribbled_past", "sum")
).reset_index()

defensive_success["success_rate"] = (
    defensive_success["successful"] / defensive_success["total_actions"] * 100
)
defensive_success["net_actions"] = (
    defensive_success["successful"] - defensive_success["dribbled_past"]
)

print(defensive_success.sort_values("net_actions", ascending=False))

# PPDA - Passes Per Defensive Action (team pressing intensity)
calculate_ppda <- function(match_events, team_name) {

  # Opposition passes in their defensive 60%
  opp_passes <- match_events %>%
    filter(type == "Pass",
           team != team_name,
           x <= 72)  # 60% of 120

  # Defensive actions in opponent defensive 60%
  def_actions <- match_events %>%
    filter(team == team_name,
           type %in% c("Tackle", "Interception", "Foul"),
           x >= 48)  # Mirror zone

  # PPDA is undefined without defensive actions; return Inf
  if (nrow(def_actions) == 0) return(Inf)

  ppda <- nrow(opp_passes) / nrow(def_actions)
  return(ppda)
}

# Lower PPDA = more intense pressing
# Liverpool typically ~8-9, low-block teams ~15+

# Defensive action success rate
defensive_success <- events %>%
  filter(type %in% c("Tackle", "Duel", "Pressure")) %>%
  group_by(player) %>%
  summarise(
    total_actions = n(),
    successful = sum(outcome %in% c("Won", "Success")),
    success_rate = successful / total_actions * 100,

    # Failures
    dribbled_past = sum(dribbled_past, na.rm = TRUE),
    errors = sum(outcome == "Error", na.rm = TRUE)
  ) %>%
  mutate(
    # Net defensive contribution
    net_actions = successful - dribbled_past - errors
  )
Contextualizing defensive statistics with PPDA

Per 90 Minutes Normalization

Raw totals are misleading when comparing players with different playing time. Per 90 minute normalization creates fair comparisons.
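
The normalization is a single ratio: the raw total divided by minutes played, multiplied by 90. A minimal sketch with two invented players follows; the function name `per_90` and the figures are illustrative assumptions.

```python
def per_90(total: float, minutes_played: float) -> float:
    """Scale a raw total to a rate per 90 minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total * 90 / minutes_played


# Hypothetical season: a regular starter and an impact substitute
starter_goals_p90 = per_90(total=12, minutes_played=2700)
substitute_goals_p90 = per_90(total=6, minutes_played=900)

print(f"Starter:    12 goals, {starter_goals_p90:.2f} per 90")
print(f"Substitute:  6 goals, {substitute_goals_p90:.2f} per 90")
```

The starter has twice the raw goals, yet the substitute scores at the higher rate once playing time is taken into account.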

# Per 90 calculations
import pandas as pd

# Player minutes data
player_minutes = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "minutes_played": [2700, 1800, 900],
    "matches": [30, 25, 15]
})

# Calculate raw stats
player_stats = events.groupby("player").agg(
    goals=("outcome", lambda x: (
        (events.loc[x.index, "type"] == "Shot") & (x == "Goal")
    ).sum()),
    shots=("type", lambda x: (x == "Shot").sum()),
    tackles=("type", lambda x: (
        (x == "Tackle") & (events.loc[x.index, "outcome"] == "Won")
    ).sum()),
    interceptions=("type", lambda x: (x == "Interception").sum())
).reset_index()

# Merge with minutes
player_stats = player_stats.merge(player_minutes, on="player", how="left")

# Calculate per 90
player_stats["nineties"] = player_stats["minutes_played"] / 90

per_90_cols = ["goals", "shots", "tackles", "interceptions"]
for col in per_90_cols:
    player_stats[f"{col}_per_90"] = player_stats[col] / player_stats["nineties"]

# Filter qualified players (minimum 450 minutes)
qualified = player_stats[player_stats["minutes_played"] >= 450]

print(qualified[["player", "minutes_played", "goals", "goals_per_90",
                 "shots", "shots_per_90"]])

# Per 90 calculations
library(dplyr)

# Get player minutes (from separate source)
player_minutes <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  minutes_played = c(2700, 1800, 900),  # Season totals
  matches = c(30, 25, 15)
)

# Calculate per 90
player_stats <- events %>%
  group_by(player) %>%
  summarise(
    goals = sum(type == "Shot" & outcome == "Goal"),
    assists = sum(type == "Pass" & is_assist),
    shots = sum(type == "Shot"),
    key_passes = sum(type == "Pass" & is_key_pass),
    tackles = sum(type == "Tackle" & outcome == "Won"),
    interceptions = sum(type == "Interception")
  ) %>%
  left_join(player_minutes, by = "player") %>%
  mutate(
    # Per 90 normalization
    nineties = minutes_played / 90,

    goals_per_90 = goals / nineties,
    assists_per_90 = assists / nineties,
    shots_per_90 = shots / nineties,
    key_passes_per_90 = key_passes / nineties,
    tackles_per_90 = tackles / nineties,

    # Goal contributions
    goal_contributions = goals + assists,
    gc_per_90 = goal_contributions / nineties,

    # Minutes per goal contribution
    mins_per_gc = minutes_played / goal_contributions
  )

# Filter for minimum playing time (avoid small sample sizes)
qualified_players <- player_stats %>%
  filter(minutes_played >= 450)  # 5 full matches minimum
Normalizing statistics per 90 minutes
Sample Size Considerations

Per 90 stats can be unstable with limited playing time. Guidelines:

Minimum for per 90: 450 minutes (5 full matches)
Reliable estimates: 900+ minutes (10 full matches)
Stable statistics: 1800+ minutes (20 full matches)
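
The thresholds above can be applied mechanically when filtering players. The following is a minimal sketch, not part of the chapter's pipeline: the function name `sample_reliability`, the label strings, and the example players are all hypothetical, and only the minute cut-offs come from the guidelines.

```python
import pandas as pd

def sample_reliability(minutes_played: float) -> str:
    """Label a per-90 sample using the minute thresholds listed above.

    The label names are illustrative; only the cut-offs (450 / 900 / 1800)
    come from the guidelines.
    """
    if minutes_played >= 1800:
        return "stable"
    if minutes_played >= 900:
        return "reliable"
    if minutes_played >= 450:
        return "minimum"
    return "insufficient"

# Hypothetical players with very different sample sizes
players = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "minutes_played": [2700, 900, 450, 90],
    "goals": [15, 5, 3, 1],
})

# Per 90 normalization, then attach the reliability label
players["goals_per_90"] = players["goals"] / (players["minutes_played"] / 90)
players["reliability"] = players["minutes_played"].apply(sample_reliability)

print(players)
```

Player D has the highest goals per 90 in this made-up table, but from a single full match, which is exactly the case the minimum-minutes filter is meant to exclude.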

Team-Level Statistics

Aggregating individual stats to the team level reveals playing style, strengths, and weaknesses.

# Comprehensive team statistics
team_profile = events.groupby("team").agg(
    # Possession
    passes=("type", lambda x: (x == "Pass").sum()),
    passes_complete=("type", lambda x: (
        (x == "Pass") & (events.loc[x.index, "outcome"] == "Complete")
    ).sum()),

    # Attacking
    shots=("type", lambda x: (x == "Shot").sum()),
    shots_on_target=("type", lambda x: (
        (x == "Shot") & (events.loc[x.index, "outcome"].isin(["Goal", "Saved"]))
    ).sum()),
    goals=("type", lambda x: (
        (x == "Shot") & (events.loc[x.index, "outcome"] == "Goal")
    ).sum()),
    xg=("xg", lambda x: x[events.loc[x.index, "type"] == "Shot"].sum()),

    # Defensive
    tackles=("type", lambda x: (x == "Tackle").sum()),
    interceptions=("type", lambda x: (x == "Interception").sum()),
    clearances=("type", lambda x: (x == "Clearance").sum()),
    blocks=("type", lambda x: (x == "Block").sum()),

    # Discipline
    fouls=("type", lambda x: (x == "Foul").sum())
).reset_index()

# Derived metrics
team_profile["pass_completion"] = team_profile["passes_complete"] / team_profile["passes"] * 100
team_profile["shot_accuracy"] = team_profile["shots_on_target"] / team_profile["shots"] * 100
team_profile["conversion_rate"] = team_profile["goals"] / team_profile["shots"] * 100
team_profile["goals_vs_xg"] = team_profile["goals"] - team_profile["xg"]
team_profile["defensive_actions"] = (
    team_profile["tackles"] + team_profile["interceptions"] + team_profile["blocks"]
)

# Style classification
def classify_style(row):
    if row["passes"] > team_profile["passes"].median() and row["pass_completion"] > 85:
        return "Possession"
    elif row["conversion_rate"] > 12:
        return "Clinical"
    elif row["defensive_actions"] > team_profile["defensive_actions"].median():
        return "Defensive"
    return "Balanced"

team_profile["style"] = team_profile.apply(classify_style, axis=1)
print(team_profile)

# Comprehensive team statistics
team_profile <- events %>%
  group_by(team) %>%
  summarise(
    # Possession
    passes = sum(type == "Pass"),
    pass_completion = sum(type == "Pass" & outcome == "Complete") / passes * 100,

    # Attacking
    shots = sum(type == "Shot"),
    shots_on_target = sum(type == "Shot" & outcome %in% c("Goal", "Saved")),
    goals = sum(type == "Shot" & outcome == "Goal"),
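    # Count big chances first: once xg is summarised below, the name refers
    # to the team total rather than the per-shot values
    big_chances = sum(type == "Shot" & xg >= 0.3, na.rm = TRUE),
    xg = sum(xg[type == "Shot"], na.rm = TRUE),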

    # Defensive
    tackles = sum(type == "Tackle"),
    interceptions = sum(type == "Interception"),
    clearances = sum(type == "Clearance"),
    blocks = sum(type == "Block"),

    # Set pieces
    corners = sum(type == "Pass" & pass_type == "Corner"),
    free_kicks = sum(type == "Pass" & pass_type == "Free Kick"),

    # Discipline
    fouls = sum(type == "Foul" & foul_type == "Committed"),
    yellow_cards = sum(type == "Card" & card_type == "Yellow"),
    red_cards = sum(type == "Card" & card_type == "Red"),

    .groups = "drop"
  ) %>%
  mutate(
    # Derived metrics
    shot_accuracy = shots_on_target / shots * 100,
    conversion_rate = goals / shots * 100,
    goals_vs_xg = goals - xg,
    defensive_actions = tackles + interceptions + blocks
  )

# Style indicators
team_style <- team_profile %>%
  mutate(
    # Possession style (high pass volume, high completion)
    # scale() returns a one-column matrix; as.numeric() keeps a plain vector
    possession_score = as.numeric(scale(passes) + scale(pass_completion)),

    # Direct style (lower passes, more long balls)
    # Pressing style (high tackles + interceptions in opp half)
    # Counter-attacking (low possession, high shot efficiency)

    style = case_when(
      passes > median(passes) & pass_completion > 85 ~ "Possession",
      shots > median(shots) & conversion_rate > 12 ~ "Clinical",
      defensive_actions > median(defensive_actions) ~ "Defensive",
      TRUE ~ "Balanced"
    )
  )

print(team_style)
Building comprehensive team statistical profiles
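
The R version above also builds a `possession_score` by adding standardized pass volume and pass completion, which the Python version omits. A minimal Python sketch of the same idea follows; the team totals are made up for illustration, and the helper name `zscore` is hypothetical.

```python
import pandas as pd

# Made-up team totals, for illustration only
team_profile = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "passes": [600, 450, 300],
    "pass_completion": [90.0, 84.0, 78.0],
})

def zscore(series: pd.Series) -> pd.Series:
    """Standardize to mean 0 and sample standard deviation 1.

    ddof=1 gives the sample standard deviation, matching R's scale().
    """
    return (series - series.mean()) / series.std(ddof=1)

# Possession score: high pass volume plus high completion
team_profile["possession_score"] = (
    zscore(team_profile["passes"]) + zscore(team_profile["pass_completion"])
)

print(team_profile.sort_values("possession_score", ascending=False))
```

Standardizing first keeps pass counts (hundreds) from swamping completion percentages (tens) when the two are added.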

Chapter Summary

Key Takeaways
  • Possession needs context - Where, not just how much
  • Shot volume and quality both matter - Track both shots and conversion
  • Pass completion varies by difficulty - Weight progressive passes higher
  • Defensive stats need failure context - Include dribbled past, errors
  • Always use per 90 for comparisons - With minimum sample requirements
  • Team aggregates reveal style - Possession vs direct, pressing vs deep block

Statistical Benchmarks Reference

Metric               Elite Level    Good       Average
Pass Completion %    90%+           85-90%     80-85%
Shot Accuracy        50%+           40-50%     30-40%
Conversion Rate      15%+           10-15%     8-10%
Tackle Success %     75%+           65-75%     55-65%
Aerial Duel %        65%+           55-65%     45-55%
PPDA (Pressing)      <9             9-12       12-15
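
PPDA (passes per defensive action) appears in the table without a definition. It is commonly computed as the opponent's passes divided by the pressing team's defensive actions, both counted in the zone where the opponent builds up, so a lower value means more intense pressing. The zone boundary and the set of counted actions differ between data providers. The sketch below is one reading under stated assumptions: a 120-unit pitch, an `x` column measured in each team's own attacking direction, the opponent's defensive 60% as the zone, and tackles, interceptions, and fouls as the counted actions. The function name and the event rows are hypothetical.

```python
import pandas as pd

PITCH_LENGTH = 120  # assumed pitch length in data units
DEF_ACTIONS = ["Tackle", "Interception", "Foul"]  # assumed set; providers differ

def ppda(events: pd.DataFrame, pressing_team: str) -> float:
    """Opponent passes per defensive action of `pressing_team`.

    Assumes `x` is measured in the acting team's own attacking direction,
    so the opponent's build-up zone (their defensive 60%) is x <= 72 for
    the opponent and x >= 48 for the pressing team.
    """
    zone_limit = 0.6 * PITCH_LENGTH
    opp_passes = events[
        (events["team"] != pressing_team)
        & (events["type"] == "Pass")
        & (events["x"] <= zone_limit)
    ]
    actions = events[
        (events["team"] == pressing_team)
        & (events["type"].isin(DEF_ACTIONS))
        & (events["x"] >= PITCH_LENGTH - zone_limit)
    ]
    if len(actions) == 0:
        return float("inf")  # no pressing actions recorded in the zone
    return len(opp_passes) / len(actions)

# Hypothetical events: Team B builds up, Team A presses
events = pd.DataFrame({
    "team": ["Team B"] * 6 + ["Team A"] * 3,
    "type": ["Pass"] * 6 + ["Tackle", "Interception", "Tackle"],
    "x":    [10, 20, 30, 50, 70, 100, 60, 90, 30],
})

print(ppda(events, "Team A"))
```

In this toy data, five of Team B's passes fall in the zone and two of Team A's actions do, so the ratio is 5 / 2.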

Practice Exercises

Exercise 5.1: Build a Player Season Summary

Task: Create a comprehensive statistical profile for a player including shooting, passing, and defensive metrics with per-90 normalization.

# Exercise 5.1: Comprehensive Player Summary
from statsbombpy import sb
import pandas as pd

# Load data
matches = sb.matches(competition_id=43, season_id=106)
all_events = pd.concat([
    sb.events(mid).assign(match_id=mid)
    for mid in matches["match_id"]
])

# Select player (StatsBomb data uses full registered names)
player_name = "Lionel Andrés Messi Cuccittini"
player_events = all_events[all_events["player"] == player_name].copy()

# Calculate stats
summary = {
    "matches": player_events["match_id"].nunique(),
    "total_actions": len(player_events),

    # Shooting
    "shots": (player_events["type"] == "Shot").sum(),
    "shots_on_target": ((player_events["type"] == "Shot") &
                        (player_events["shot_outcome"].isin(["Goal", "Saved"]))).sum(),
    "goals": ((player_events["type"] == "Shot") &
              (player_events["shot_outcome"] == "Goal")).sum(),
    "xG": player_events[player_events["type"] == "Shot"]["shot_statsbomb_xg"].sum(),

    # Passing (a missing pass_outcome means the pass was completed)
    "passes": (player_events["type"] == "Pass").sum(),
    "passes_completed": ((player_events["type"] == "Pass") &
                         (player_events["pass_outcome"].isna())).sum(),
    "key_passes": ((player_events["type"] == "Pass") &
                   ((player_events["pass_shot_assist"] == True) |
                    (player_events["pass_goal_assist"] == True))).sum(),
    "assists": ((player_events["type"] == "Pass") &
                (player_events["pass_goal_assist"] == True)).sum(),

    # Defensive
    "tackles": (player_events["type"] == "Tackle").sum(),
    "interceptions": (player_events["type"] == "Interception").sum()
}

# Per-match rates, used here as a stand-in for per 90
# (event data alone does not give minutes played)
n_matches = summary["matches"]  # separate name so the matches DataFrame survives
summary["goals_per_90"] = round(summary["goals"] / n_matches, 2)
summary["xG_per_90"] = round(summary["xG"] / n_matches, 2)
summary["assists_per_90"] = round(summary["assists"] / n_matches, 2)
summary["pass_completion"] = round(summary["passes_completed"] / summary["passes"] * 100, 1)
summary["shot_accuracy"] = round(summary["shots_on_target"] / summary["shots"] * 100, 1)
summary["conversion_rate"] = round(summary["goals"] / summary["shots"] * 100, 1)

print(f"Player Summary: {player_name}")
for key, value in summary.items():
    print(f"  {key}: {value}")

# Exercise 5.1: Comprehensive Player Summary
library(StatsBombR)
library(dplyr)

# Load data
comps <- FreeCompetitions() %>%
  filter(competition_id == 43, season_id == 106)
matches <- FreeMatches(comps)
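events <- free_allevents(MatchesDF = matches)
# Expand nested fields into flat columns such as location.x and pass.end_location.x
events <- allclean(events)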

# Select a player
# StatsBomb data uses full registered names
player_name <- "Lionel Andrés Messi Cuccittini"

player_events <- events %>%
  filter(player.name == player_name)

# Calculate all stats
player_summary <- player_events %>%
  summarise(
    # Playing time
    matches = n_distinct(match_id),
    total_actions = n(),

    # Shooting
    shots = sum(type.name == "Shot"),
    shots_on_target = sum(type.name == "Shot" &
                          shot.outcome.name %in% c("Goal", "Saved")),
    goals = sum(type.name == "Shot" & shot.outcome.name == "Goal"),
    xG = sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE),

    # Passing
    passes = sum(type.name == "Pass"),
    passes_completed = sum(type.name == "Pass" & is.na(pass.outcome.name)),
    key_passes = sum(type.name == "Pass" &
                    (pass.shot_assist == TRUE | pass.goal_assist == TRUE), na.rm = TRUE),
    assists = sum(type.name == "Pass" & pass.goal_assist == TRUE, na.rm = TRUE),

    # Progressive
    progressive_passes = sum(type.name == "Pass" &
                            (pass.end_location.x - location.x) >= 10, na.rm = TRUE),

    # Defensive
    tackles = sum(type.name == "Tackle"),
    interceptions = sum(type.name == "Interception"),
    ball_recoveries = sum(type.name == "Ball Recovery")
  ) %>%
  mutate(
    # Per-match rates, used here as a stand-in for per 90
    # (event data alone does not give minutes played)
    goals_per_90 = round(goals / matches, 2),
    xG_per_90 = round(xG / matches, 2),
    assists_per_90 = round(assists / matches, 2),
    shots_per_90 = round(shots / matches, 2),
    key_passes_per_90 = round(key_passes / matches, 2),
    pass_completion = round(passes_completed / passes * 100, 1),
    shot_accuracy = round(shots_on_target / shots * 100, 1),
    conversion_rate = round(goals / shots * 100, 1),
    goals_minus_xG = round(goals - xG, 2)
  )

print(paste("Player Summary:", player_name))
print(t(player_summary))
Exercise 5.1: Build comprehensive player summary
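
Both solutions above divide by matches played, which equals a true per-90 rate only when the player is on the pitch for the full 90 minutes of every match. The sketch below shows how far the two can diverge for a substitute. The numbers and the function names are hypothetical; in practice the minutes would come from a lineup or minutes-played source.

```python
def per_90(total: float, minutes_played: float) -> float:
    """Rate per 90 minutes actually played."""
    return total / (minutes_played / 90)

def per_match(total: float, matches: int) -> float:
    """Rate per appearance, regardless of minutes."""
    return total / matches

# Hypothetical substitute: 3 goals in 6 appearances, 180 minutes in total
goals, matches_played, minutes = 3, 6, 180

print(round(per_match(goals, matches_played), 2))  # 0.5
print(round(per_90(goals, minutes), 2))            # 1.5
```

Per match, this player looks like a one-in-two scorer; per 90, the rate is three times higher, though on a sample far below the 450-minute minimum.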
Exercise 5.2: Compare Two Teams' Styles

Task: Analyze and compare the playing styles of two teams using possession zones, passing patterns, and defensive positioning.

# Exercise 5.2: Team Style Comparison
from statsbombpy import sb
import pandas as pd
import numpy as np

# Select teams (all_events is the DataFrame loaded in Exercise 5.1)
team1, team2 = "Argentina", "France"

def calculate_team_style(events, team_name):
    team = events[events["team"] == team_name].copy()

    # Zone possession
    passes = team[team["type"] == "Pass"].copy()
    passes["x"] = passes["location"].apply(lambda l: l[0] if l else None)

    def get_zone(x):
        if x is None:
            return "Unknown"
        if x >= 80:
            return "Final Third"
        if x >= 40:
            return "Middle Third"
        return "Defensive Third"

    passes["zone"] = passes["x"].apply(get_zone)
    zone_counts = passes["zone"].value_counts()

    # Shot profile
    shots = team[team["type"] == "Shot"].copy()
    shots["x"] = shots["location"].apply(lambda l: l[0] if l else None)
    shot_stats = {
        "total": len(shots),
        "inside_box": (shots["x"] >= 102).sum() if len(shots) > 0 else 0,
        "avg_xG": shots["shot_statsbomb_xg"].mean() if len(shots) > 0 else 0
    }

    # Defensive style
    defense = team[team["type"].isin(["Tackle", "Interception", "Pressure"])].copy()
    defense["x"] = defense["location"].apply(lambda l: l[0] if l else None)
    high_press = (defense["x"] >= 60).sum() / len(defense) * 100 if len(defense) > 0 else 0

    # Passing style
    passes["end_x"] = passes["pass_end_location"].apply(lambda l: l[0] if l else None)
    passes["forward"] = (passes["end_x"] - passes["x"]) >= 5
    passes["completed"] = passes["pass_outcome"].isna()
    pass_style = {
        "completion": passes["completed"].mean() * 100,
        "forward_pct": passes["forward"].mean() * 100 if len(passes) > 0 else 0
    }

    return {
        "zones": zone_counts,
        "shots": shot_stats,
        "high_press_pct": high_press,
        "passing": pass_style
    }

# Compare
style1 = calculate_team_style(all_events, team1)
style2 = calculate_team_style(all_events, team2)

print("=== STYLE COMPARISON ===\n")
print(f"{team1} vs {team2}\n")

print("Passing Style:")
print(f"  {team1}: {style1['passing']['completion']:.1f}% completion, "
      f"{style1['passing']['forward_pct']:.1f}% forward")
print(f"  {team2}: {style2['passing']['completion']:.1f}% completion, "
      f"{style2['passing']['forward_pct']:.1f}% forward")

print("\nDefensive Approach:")
print(f"  {team1}: {style1['high_press_pct']:.1f}% high press")
print(f"  {team2}: {style2['high_press_pct']:.1f}% high press")

# Exercise 5.2: Team Style Comparison
library(StatsBombR)
library(dplyr)
library(tidyr)

# Select two teams to compare
team1 <- "Argentina"
team2 <- "France"

# Calculate style metrics for each team
calculate_team_style <- function(events, team_name) {
  team_events <- events %>% filter(team.name == team_name)

  # Possession by zone
  passes <- team_events %>% filter(type.name == "Pass")
  zone_possession <- passes %>%
    mutate(zone = case_when(
      location.x >= 80 ~ "Final Third",
      location.x >= 40 ~ "Middle Third",
      TRUE ~ "Defensive Third"
    )) %>%
    group_by(zone) %>%
    summarise(passes = n())

  # Shot profile
  shots <- team_events %>% filter(type.name == "Shot")
  shot_stats <- shots %>%
    summarise(
      total_shots = n(),
      inside_box = sum(location.x >= 102, na.rm = TRUE),
      outside_box = sum(location.x < 102, na.rm = TRUE),
      avg_xG = mean(shot.statsbomb_xg, na.rm = TRUE)
    )

  # Defensive profile
  defensive <- team_events %>%
    filter(type.name %in% c("Tackle", "Interception", "Pressure")) %>%
    mutate(high_press = location.x >= 60) %>%
    summarise(
      def_actions = n(),
      high_press_pct = sum(high_press) / n() * 100
    )

  # Passing style
  pass_style <- passes %>%
    mutate(
      is_long = sqrt((pass.end_location.x - location.x)^2 +
                    (pass.end_location.y - location.y)^2) >= 30,
      is_forward = (pass.end_location.x - location.x) >= 5
    ) %>%
    summarise(
      completion_rate = sum(is.na(pass.outcome.name)) / n() * 100,
      long_ball_pct = sum(is_long) / n() * 100,
      forward_pct = sum(is_forward) / n() * 100
    )

  list(
    team = team_name,
    zone_possession = zone_possession,
    shots = shot_stats,
    defense = defensive,
    passing = pass_style
  )
}

# Compare
style1 <- calculate_team_style(events, team1)
style2 <- calculate_team_style(events, team2)

# Print comparison
cat("=== STYLE COMPARISON ===\n\n")
cat(team1, "vs", team2, "\n\n")

cat("Passing Style:\n")
cat(sprintf("  %s: %.1f%% completion, %.1f%% forward, %.1f%% long\n",
           team1, style1$passing$completion_rate,
           style1$passing$forward_pct, style1$passing$long_ball_pct))
cat(sprintf("  %s: %.1f%% completion, %.1f%% forward, %.1f%% long\n",
           team2, style2$passing$completion_rate,
           style2$passing$forward_pct, style2$passing$long_ball_pct))

cat("\nDefensive Approach:\n")
cat(sprintf("  %s: %.1f%% high press\n", team1, style1$defense$high_press_pct))
cat(sprintf("  %s: %.1f%% high press\n", team2, style2$defense$high_press_pct))
Exercise 5.2: Compare team playing styles
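
The `inside_box` counts in the solutions above test only the x coordinate, so a shot from near the touchline at x = 105 is counted as inside the box. A stricter check also bounds y. The sketch below assumes StatsBomb-style coordinates (a 120 x 80 pitch with the attacking goal at x = 120, penalty area spanning 18 <= y <= 62); the function name and the shot locations are hypothetical.

```python
def in_penalty_area(x: float, y: float) -> bool:
    """True if (x, y) lies in the attacking penalty area.

    Assumes a 120 x 80 pitch with the attacking goal at x = 120, where the
    penalty area spans x >= 102 and 18 <= y <= 62.
    """
    return x >= 102 and 18 <= y <= 62

# Hypothetical shot locations: central, wide, and long-range
shots = [(110, 40), (105, 10), (95, 40)]
inside = [in_penalty_area(x, y) for x, y in shots]

print(inside)  # [True, False, False]
```

The second shot passes the x-only test but falls outside the area once y is checked.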
Exercise 5.3: Statistical Match Report

Task: Generate a statistical match report with key metrics comparison, possession breakdown by thirds, and top performers.

# Exercise 5.3: Statistical Match Report
from statsbombpy import sb
import pandas as pd

# Get match (same competition and season as Exercise 5.1)
matches = sb.matches(competition_id=43, season_id=106)
match_id = matches["match_id"].iloc[0]
match_events = sb.events(match_id=match_id)
teams = match_events["team"].dropna().unique()

# Key statistics
stats = []
for team in teams:
    team_events = match_events[match_events["team"] == team]
    passes = team_events[team_events["type"] == "Pass"]
    shots = team_events[team_events["type"] == "Shot"]

    stats.append({
        "Team": team,
        "Passes": len(passes),
        "Pass %": round(passes["pass_outcome"].isna().mean() * 100),
        "Shots": len(shots),
        "On Target": shots["shot_outcome"].isin(["Goal", "Saved"]).sum(),
        "Goals": (shots["shot_outcome"] == "Goal").sum(),
        "xG": round(shots["shot_statsbomb_xg"].sum(), 2),
        "Tackles": (team_events["type"] == "Tackle").sum(),
        "Interceptions": (team_events["type"] == "Interception").sum()
    })

stats_df = pd.DataFrame(stats)
stats_df["Possession"] = round(stats_df["Passes"] / stats_df["Passes"].sum() * 100)

# Possession by thirds
passes = match_events[match_events["type"] == "Pass"].copy()
passes["x"] = passes["location"].apply(lambda l: l[0] if l else None)
passes["third"] = passes["x"].apply(
    lambda x: "Final" if x >= 80 else ("Middle" if x >= 40 else "Defensive")
)
thirds = passes.groupby(["team", "third"]).size().unstack(fill_value=0)

# Top performers
passers = passes.groupby(["player", "team"]).size().reset_index(name="passes")
top_passers = passers.nlargest(3, "passes")

defense = match_events[match_events["type"].isin(["Tackle", "Interception"])]
defenders = defense.groupby(["player", "team"]).size().reset_index(name="actions")
top_defenders = defenders.nlargest(3, "actions")

# Print Report
print("\n" + "="*50)
print("         MATCH STATISTICAL REPORT")
print(f"      {teams[0]} vs {teams[1]}")
print("="*50 + "\n")

print("KEY STATISTICS:")
print(stats_df.to_string(index=False))

print("\nPOSSESSION BY THIRDS:")
print(thirds)

print("\nTOP PASSERS:")
print(top_passers.to_string(index=False))

print("\nTOP DEFENDERS:")
print(top_defenders.to_string(index=False))

# Exercise 5.3: Statistical Match Report
library(StatsBombR)
library(dplyr)

# Get a single match
match_id <- matches$match_id[1]
match_events <- events %>% filter(match_id == !!match_id)
teams <- unique(match_events$team.name[!is.na(match_events$team.name)])

# Key match statistics
match_stats <- match_events %>%
  group_by(team.name) %>%
  summarise(
    # Possession (pass-based)
    Passes = sum(type.name == "Pass"),
    `Pass %` = round(sum(type.name == "Pass" & is.na(pass.outcome.name)) /
                    sum(type.name == "Pass") * 100, 0),

    # Shots
    Shots = sum(type.name == "Shot"),
    `On Target` = sum(type.name == "Shot" &
                     shot.outcome.name %in% c("Goal", "Saved")),
    Goals = sum(type.name == "Shot" & shot.outcome.name == "Goal"),
    xG = round(sum(shot.statsbomb_xg[type.name == "Shot"], na.rm = TRUE), 2),

    # Corners and set pieces
    Corners = sum(type.name == "Pass" & pass.type.name == "Corner", na.rm = TRUE),

    # Defensive
    Tackles = sum(type.name == "Tackle"),
    Interceptions = sum(type.name == "Interception"),
    Fouls = sum(type.name == "Foul Committed", na.rm = TRUE),

    .groups = "drop"
  ) %>%
  mutate(Possession = round(Passes / sum(Passes) * 100, 0))

# Possession by thirds
possession_thirds <- match_events %>%
  filter(type.name == "Pass") %>%
  mutate(third = case_when(
    location.x >= 80 ~ "Final",
    location.x >= 40 ~ "Middle",
    TRUE ~ "Defensive"
  )) %>%
  group_by(team.name, third) %>%
  summarise(passes = n(), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = third, values_from = passes)

# Top performers
top_passers <- match_events %>%
  filter(type.name == "Pass") %>%
  group_by(player.name, team.name) %>%
  summarise(passes = n(), .groups = "drop") %>%
  slice_max(passes, n = 3)

top_tacklers <- match_events %>%
  filter(type.name %in% c("Tackle", "Interception")) %>%
  group_by(player.name, team.name) %>%
  summarise(defensive_actions = n(), .groups = "drop") %>%
  slice_max(defensive_actions, n = 3)

# Print Report
cat("\n========================================\n")
cat("         MATCH STATISTICAL REPORT        \n")
cat(sprintf("      %s vs %s\n", teams[1], teams[2]))
cat("========================================\n\n")

cat("KEY STATISTICS:\n")
print(as.data.frame(match_stats))

cat("\nPOSSESSION BY THIRDS:\n")
print(as.data.frame(possession_thirds))

cat("\nTOP PASSERS:\n")
print(as.data.frame(top_passers))

cat("\nTOP DEFENDERS:\n")
print(as.data.frame(top_tacklers))
Exercise 5.3: Generate statistical match report

Continue Your Journey

Master the fundamentals covered here, then move on to advanced metrics like Expected Goals (xG), Expected Assists (xA), and more sophisticated models.
