Chapter 1: Introduction to Soccer Analytics

1.1 What is Soccer Analytics?

Soccer analytics is the systematic application of data analysis and statistical methods to understand, evaluate, and improve performance in association football.

At its core, soccer analytics answers questions that traditional observation alone cannot. While a scout might say a player "looks good," analytics can quantify how that player compares with their peers and in which specific areas they excel or struggle.

The Three Pillars of Football Analytics

Performance Analysis

Measuring how well players and teams perform through metrics like expected goals (xG), pass completion rates, pressing intensity, and defensive actions.

Recruitment & Scouting

Identifying players who fit specific profiles, finding undervalued talent, and predicting future performance to make better transfer decisions.

Tactical Analysis

Understanding team playing styles, opponent weaknesses, set piece effectiveness, and in-game decision making through data-driven insights.

Why Analytics Matters in Modern Football

The adoption of analytics has transformed how football clubs operate. Here are some key reasons why data-driven decision making has become essential:

Traditional Approach	Analytics Approach	Benefit
"He scores lots of goals"	"His xG outperformance is +3.2 this season"	Distinguishes skill from luck
"Good passer"	"Top 5% for progressive passes per 90"	Quantifiable comparison
"Works hard defensively"	"8.3 pressures per 90, 32% success rate"	Measures actual contribution
"£50m seems reasonable"	"Market value model suggests £35m"	Data-informed negotiations

Key Insight

Analytics doesn't replace traditional scouting and coaching expertise—it enhances it. The best football organizations combine data insights with human judgment to make better decisions.
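The "per 90" and "top 5%" figures in the comparison table rest on two simple calculations: scaling a raw total by minutes played, and ranking the result against a peer group. The sketch below illustrates both under stated assumptions: the players, totals, and minutes are hypothetical, and the percentile definition (share of peers with a strictly lower value) is one common choice rather than a standard used by any particular provider.

```python
from bisect import bisect_left


def per_90(total: float, minutes: float) -> float:
    """Scale a raw season total to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes * 90


def percentile_rank(value: float, peer_values: list[float]) -> float:
    """Percentage of peers whose value is strictly lower than `value`."""
    ordered = sorted(peer_values)
    return bisect_left(ordered, value) / len(ordered) * 100


# Hypothetical midfielders: (progressive passes, minutes played)
peers = {
    "Player A": (210, 2700),
    "Player B": (150, 1800),
    "Player C": (95, 2250),
    "Player D": (60, 900),
}

rates = {name: per_90(passes, mins) for name, (passes, mins) in peers.items()}

for name, rate in rates.items():
    rank = percentile_rank(rate, list(rates.values()))
    print(f"{name}: {rate:.1f} per 90 (ahead of {rank:.0f}% of peers)")
```

Note that Player B has fewer progressive passes than Player A in total but a higher rate once minutes are taken into account, which is why per-90 figures are preferred for comparisons.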

1.2 The Analytics Revolution in Football

Football's analytics revolution began later than in other sports such as baseball (the subject of "Moneyball"), but it has accelerated rapidly over the past decade. Understanding this history helps put today's tools and practices in context.

Timeline of Key Developments

1990s - Early Pioneers

Charles Reep's long-ball theories, which date back to the 1950s and were later debunked, were among the earliest attempts at football analytics. In the 1990s, Opta began collecting basic match statistics. Most analysis was simple: shots, passes, possession.

2000s - Data Collection Expands

ProZone introduced video-based tracking. Clubs like Bolton Wanderers under Sam Allardyce began using data for set-piece analysis. Event data became more detailed but remained proprietary.

2012 - Expected Goals Emerges

Sam Green at Opta and others developed expected goals (xG) models. The metric changed how shots and chances are evaluated. Analytics Twitter began sharing insights publicly.

2017 - StatsBomb Open Data

StatsBomb releases free, detailed event data for select competitions. This democratizes football analytics, enabling students and hobbyists to learn with professional-grade data.

2018-Present - Mainstream Adoption

xG appears in TV broadcasts. Liverpool and Manchester City build world-class analytics departments. Brentford reaches the Premier League largely through data-driven recruitment. Tracking data becomes more accessible.

Case Study: Leicester City's 2015-16 Title

Leicester City's Premier League triumph wasn't just a fairytale—it was partly enabled by smart data use. Under Claudio Ranieri, Leicester identified that:

Counter-attacking efficiency could compete with possession-based football
Jamie Vardy's running statistics made him ideal for their direct style
N'Golo Kanté's ball recovery numbers were elite before he was widely recognized
Defensive compactness could be maintained without dominating possession

leicester_xg_analysis.py

# Analyzing Leicester's 2015-16 season efficiency
import pandas as pd

# Leicester's key stats from that season
leicester_stats = {
    'matches': 38,
    'goals_scored': 68,
    'goals_conceded': 36,
    'xG_for': 55.4,  # Expected goals created
    'xG_against': 42.1,  # Expected goals conceded
    'possession_avg': 42.3,  # Below league average!
    'points': 81
}

# Calculate overperformance
xg_difference = leicester_stats['goals_scored'] - leicester_stats['xG_for']
xga_difference = leicester_stats['xG_against'] - leicester_stats['goals_conceded']

print(f"Goals vs xG: +{xg_difference:.1f} (clinical finishing)")
print(f"Conceded vs xGA: -{xga_difference:.1f} (excellent defending)")
print(f"Net xG overperformance: +{xg_difference + xga_difference:.1f}")

# This shows Leicester massively outperformed their underlying numbers
# They scored 12.6 more goals than expected and conceded 6.1 fewer
            

# Analyzing Leicester's 2015-16 season efficiency
library(tidyverse)

# Leicester's key stats from that season
leicester_stats <- tibble(
  matches = 38,
  goals_scored = 68,
  goals_conceded = 36,
  xG_for = 55.4,  # Expected goals created
  xG_against = 42.1,  # Expected goals conceded
  possession_avg = 42.3,  # Below league average!
  points = 81
)

# Calculate overperformance
leicester_stats <- leicester_stats %>%
  mutate(
    xg_overperformance = goals_scored - xG_for,
    xga_overperformance = xG_against - goals_conceded,
    net_overperformance = xg_overperformance + xga_overperformance
  )

cat(sprintf("Goals vs xG: +%.1f (clinical finishing)\n",
            leicester_stats$xg_overperformance))
cat(sprintf("Conceded vs xGA: -%.1f (excellent defending)\n",
            leicester_stats$xga_overperformance))
cat(sprintf("Net xG overperformance: +%.1f\n",
            leicester_stats$net_overperformance))
            

Output

Goals vs xG: +12.6 (clinical finishing)
Conceded vs xGA: -6.1 (excellent defending)
Net xG overperformance: +18.7

The analysis shows Leicester outperformed their underlying numbers by a net 18.7 goals across the season: 12.6 more goals scored than their xG and 6.1 fewer conceded than their xGA. While some of this is variance (luck), much came from Vardy and Mahrez's clinical finishing and Schmeichel's outstanding goalkeeping.

1.3 Questions Analytics Can Answer

Before diving into technical implementation, let's understand the types of questions football analytics can help answer. This will guide what skills and metrics you'll learn.

Player Evaluation

How efficient is this striker's finishing?
Which midfielder progresses the ball most effectively?
Is this defender actually good, or protected by the system?
How does player X compare to his positional peers?
Is this goalkeeper's save rate sustainable?

Team Analysis

What's this team's playing style?
Where do they create chances from?
How effectively do they press?
What's their set piece effectiveness?
Are their results sustainable?

Recruitment

Who are similar players to our target?
Is this player worth the asking price?
How might they perform in our league?
What's their development trajectory?
Which young players are breakout candidates?

Tactical Insights

Where is the opponent vulnerable?
What formations work best against them?
How should we build up against their press?
Which substitutions would be most impactful?
What set piece routines should we use?

1.4 Types of Football Data

Understanding the different types of football data is crucial before you start analyzing. Each type has different levels of detail, availability, and use cases.

1. Event Data

Event data records every on-ball action in a match: passes, shots, tackles, dribbles, fouls, and more. Each event includes:

Location - x, y coordinates on the pitch
Timestamp - when the event occurred
Player - who performed the action
Outcome - success/failure and additional details
Qualifiers - additional context (body part, technique, etc.)
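To make the list above concrete, here is a minimal sketch of how a single event record could be modelled. The class and field names are illustrative assumptions, not the schema of StatsBomb, Opta, or any other provider, and the sample values are invented. The 120-unit pitch length matches the StatsBomb coordinate system used later in this chapter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """One on-ball action. Field names are illustrative, not a provider schema."""
    event_type: str                      # e.g. "Pass", "Shot", "Tackle"
    minute: int                          # timestamp: match minute
    second: int                          # timestamp: second within the minute
    player: str                          # who performed the action
    team: str
    location: tuple[float, float]        # (x, y) pitch coordinates
    outcome: Optional[str] = None        # success/failure detail, if recorded
    qualifiers: dict[str, str] = field(default_factory=dict)  # extra context


def in_final_third(event: Event, pitch_length: float = 120.0) -> bool:
    """True if the event happened in the attacking third of the pitch."""
    return event.location[0] > pitch_length * 2 / 3


# A hypothetical pass event
sample = Event(
    event_type="Pass",
    minute=23,
    second=14,
    player="Player A",
    team="Team X",
    location=(61.0, 40.5),
    outcome="Complete",
    qualifiers={"body_part": "Right Foot", "height": "Ground Pass"},
)

print(sample.event_type, sample.location, in_final_third(sample))
```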

explore_event_data.py

# Exploring event data structure with StatsBomb
from statsbombpy import sb
import pandas as pd

# Load a match - 2022 World Cup Final
events = sb.events(match_id=3869685)

# See what columns are available
print("Event data columns:")
print(events.columns.tolist()[:20])  # First 20 columns

# Count events by type
print("\nEvent types in the match:")
print(events['type'].value_counts().head(15))

# Example: Look at a single pass event
pass_event = events[events['type'] == 'Pass'].iloc[0]
print("\nSample pass event:")
print(f"  Player: {pass_event['player']}")
print(f"  Team: {pass_event['team']}")
print(f"  Location: {pass_event['location']}")
print(f"  Pass end location: {pass_event['pass_end_location']}")
print(f"  Pass recipient: {pass_event['pass_recipient']}")
print(f"  Minute: {pass_event['minute']}")
            

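# Exploring event data structure with StatsBomb
library(StatsBombR)
library(tidyverse)

# Load a match - 2022 World Cup Final
# cleanlocations() splits the coordinate list-columns into location.x / location.y
events <- get.matchFree(data.frame(match_id = 3869685)) %>%
  cleanlocations()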

# See what columns are available
cat("Event data columns:\n")
print(names(events)[1:20])  # First 20 columns

# Count events by type
cat("\nEvent types in the match:\n")
events %>%
  count(type.name, sort = TRUE) %>%
  head(15) %>%
  print()

# Example: Look at a single pass event
pass_event <- events %>%
  filter(type.name == "Pass") %>%
  slice(1)

cat("\nSample pass event:\n")
cat(sprintf("  Player: %s\n", pass_event$player.name))
cat(sprintf("  Team: %s\n", pass_event$team.name))
cat(sprintf("  Location: %.1f, %.1f\n",
            pass_event$location.x, pass_event$location.y))
cat(sprintf("  Minute: %d\n", pass_event$minute))
            

Output

Event types in the match:
Pass              1247
Ball Receipt*      892
Carry              891
Pressure           298
Ball Recovery      124
Duel               108
Clearance           89
Block               58
Foul Committed      43
Shot                41
Interception        38
...

2. Tracking Data

Tracking data captures the position of all 22 players and the ball, typically 25 times per second. This creates incredibly rich datasets but requires specialized analysis techniques.
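A quick back-of-the-envelope calculation shows why tracking data needs specialized handling. The sketch below uses the figures from the paragraph above (22 players plus the ball, 25 frames per second) and assumes exactly 90 minutes of play with no stoppage time.

```python
PLAYERS = 22
BALL = 1
FRAMES_PER_SECOND = 25
MATCH_MINUTES = 90  # assumption: regulation time only, no stoppage time

frames = FRAMES_PER_SECOND * 60 * MATCH_MINUTES
positions = frames * (PLAYERS + BALL)

print(f"Frames per match: {frames:,}")
print(f"Tracked positions per match: {positions:,}")
```

Under these assumptions a single match yields 135,000 frames and more than three million individual (x, y) positions, compared with a few thousand rows of event data.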

Tracking Data Availability

Tracking data is mostly proprietary and expensive. Providers like Second Spectrum and SkillCorner serve professional clubs. However, some public datasets exist for learning (Metrica Sports, Last Row datasets). We'll cover these in Chapter 21.

3. Aggregate Statistics

The most accessible form of football data. Sites like FBref provide season-level and match-level statistics including:

Goals, assists, minutes played
Shots, shots on target
Pass completion percentages
Tackles, interceptions, clearances
Expected goals and expected assists

aggregate_stats.py

# Accessing aggregate stats from FBref
import soccerdata as sd

# Initialize FBref scraper
fbref = sd.FBref(leagues="ENG-Premier League", seasons="2023-2024")

# Get player season stats
player_stats = fbref.read_player_season_stats(stat_type="standard")

# Look at top scorers
top_scorers = player_stats.nlargest(10, ('Performance', 'Gls'))
print("Top 10 Premier League Scorers 2023-24:")
# Columns are a two-level MultiIndex, so select them as (group, stat) tuples
print(top_scorers[[('Performance', 'Gls'), ('Expected', 'xG'), ('Expected', 'npxG')]])

# Calculate goals vs xG for top scorers
top_scorers['xG_diff'] = top_scorers[('Performance', 'Gls')] - top_scorers[('Expected', 'xG')]
print("\nGoals minus xG (overperformance):")
print(top_scorers[['xG_diff']].head(10))
            

# Accessing aggregate stats from FBref
library(worldfootballR)
library(tidyverse)

# Get Premier League player stats
player_stats <- fb_big5_advanced_season_stats(
  season_end_year = 2024,
  stat_type = "standard",
  team_or_player = "player"
) %>%
  filter(Comp == "Premier League")

# Look at top scorers
top_scorers <- player_stats %>%
  arrange(desc(Gls)) %>%
  head(10) %>%
  select(Player, Squad, Gls, xG, npxG)

print("Top 10 Premier League Scorers 2023-24:")
print(top_scorers)

# Calculate goals vs xG for top scorers
top_scorers <- top_scorers %>%
  mutate(xG_diff = Gls - xG)

print("\nGoals minus xG (overperformance):")
print(select(top_scorers, Player, Gls, xG, xG_diff))
            

Data Comparison Table

Data Type	Granularity	Accessibility	Best For
Event Data	Individual actions	Free (StatsBomb) to expensive (Opta)	Detailed match analysis, xG models
Tracking Data	25 frames/second	Expensive, limited public access	Off-ball analysis, space control
Aggregate Stats	Match/season totals	Widely free (FBref, etc.)	Player comparison, trend analysis

1.5 Setting Up Your Development Environment

Before we can analyze football data, we need to set up our tools. This textbook supports both Python and R—choose whichever you're more comfortable with, or learn both!

Python Setup (Recommended: Anaconda)

Install Anaconda
Download from anaconda.com/download. Anaconda includes Python and many data science packages pre-installed.

Create a virtual environment

# Open Anaconda Prompt or terminal
conda create -n soccer-analytics python=3.10
conda activate soccer-analytics

Install essential packages

# Core data science
pip install pandas numpy matplotlib seaborn

# Soccer-specific
pip install mplsoccer statsbombpy soccerdata

# Machine learning (for later chapters)
pip install scikit-learn xgboost

# Jupyter for interactive analysis
pip install jupyter jupyterlab

Verify installation

# Test that everything works
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch
from statsbombpy import sb

print("All packages installed successfully!")
print(f"Pandas version: {pd.__version__}")
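If the verification script fails, the first import error hides any problems with later packages. As an alternative, the following sketch reports every missing package in one pass. It uses only the standard library; the package list simply mirrors the imports above, and the helper name `missing_packages` is our own.

```python
from importlib.util import find_spec

# Top-level import names used by the verification script above
REQUIRED = ["pandas", "numpy", "matplotlib", "mplsoccer", "statsbombpy"]


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any names it reports can be installed with pip inside the activated soccer-analytics environment.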

R Setup (Recommended: RStudio)

Install R
Download from cran.r-project.org
Install RStudio
Download from posit.co/download/rstudio-desktop

Install essential packages

# Core tidyverse packages
install.packages("tidyverse")
install.packages("lubridate")

# Soccer-specific
install.packages("worldfootballR")
install.packages("ggsoccer")

# StatsBomb package (from GitHub)
install.packages("devtools")
devtools::install_github("statsbomb/StatsBombR")

# Machine learning (for later chapters)
install.packages("tidymodels")
install.packages("xgboost")

Verify installation

# Test that everything works
library(tidyverse)
library(StatsBombR)
library(ggsoccer)

print("All packages installed successfully!")
print(paste("R version:", R.version.string))

Recommended Development Setup

Python: VS Code with Python extension + Jupyter notebooks
R: RStudio with R Markdown for reproducible analysis
Both: Git for version control of your analysis projects

1.6 Your First Football Analysis

Now let's put everything together and perform a real analysis. We'll analyze the 2022 FIFA World Cup Final between Argentina and France—one of the greatest matches ever played.

Step 1: Load the Match Data

world_cup_final.py

from statsbombpy import sb
import pandas as pd
import matplotlib.pyplot as plt

# Load World Cup 2022 matches
competitions = sb.competitions()
world_cup = competitions[
    (competitions['competition_name'] == 'FIFA World Cup') &
    (competitions['season_name'] == '2022')
]

# Get all matches
matches = sb.matches(competition_id=43, season_id=106)
print(f"Total World Cup 2022 matches: {len(matches)}")

# Find the final
final = matches[matches['match_id'] == 3869685].iloc[0]
print(f"\nFinal: {final['home_team']} vs {final['away_team']}")
print(f"Score: {final['home_score']} - {final['away_score']}")

# Load all events from the final
events = sb.events(match_id=3869685)
print(f"\nTotal events in match: {len(events)}")
            

library(StatsBombR)
library(tidyverse)

# Load World Cup 2022 matches
competitions <- FreeCompetitions()
world_cup <- competitions %>%
  filter(competition_name == "FIFA World Cup", season_name == "2022")

# Get all matches
matches <- FreeMatches(world_cup)
cat(sprintf("Total World Cup 2022 matches: %d\n", nrow(matches)))

# Find the final
final <- matches %>% filter(match_id == 3869685)
cat(sprintf("\nFinal: %s vs %s\n", final$home_team.home_team_name,
            final$away_team.away_team_name))
cat(sprintf("Score: %d - %d\n", final$home_score, final$away_score))

# Load all events from the final
# cleanlocations() adds the location.x / location.y columns used for plotting
events <- get.matchFree(final) %>%
  cleanlocations()
cat(sprintf("\nTotal events in match: %d\n", nrow(events)))
            

Output

Total World Cup 2022 matches: 64

Final: Argentina vs France
Score: 3 - 3

Total events in match: 3847

Step 2: Analyze Shots and Expected Goals

shot_analysis.py

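# Filter for shots only, excluding the penalty shootout (period 5),
# whose kicks are also recorded as shots and would inflate the goal counts
shots = events[(events['type'] == 'Shot') & (events['period'] <= 4)].copy()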
print(f"Total shots in the match: {len(shots)}")

# Calculate shot statistics by team
shot_stats = shots.groupby('team').agg(
    total_shots=('type', 'count'),
    goals=('shot_outcome', lambda x: (x == 'Goal').sum()),
    total_xG=('shot_statsbomb_xg', 'sum'),
    shots_on_target=('shot_outcome', lambda x: x.isin(['Goal', 'Saved']).sum()),
    avg_xG_per_shot=('shot_statsbomb_xg', 'mean')
).round(2)

print("\n=== Shot Analysis: World Cup 2022 Final ===")
print(shot_stats)

# Calculate xG difference (goals - xG)
for team in shot_stats.index:
    goals = shot_stats.loc[team, 'goals']
    xG = shot_stats.loc[team, 'total_xG']
    diff = goals - xG
    print(f"\n{team}: {goals} goals from {xG:.2f} xG ({'+' if diff > 0 else ''}{diff:.2f})")
            

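# Filter for shots only, excluding the penalty shootout (period 5),
# whose kicks are also recorded as shots and would inflate the goal counts
shots <- events %>% filter(type.name == "Shot", period <= 4)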
cat(sprintf("Total shots in the match: %d\n", nrow(shots)))

# Calculate shot statistics by team
shot_stats <- shots %>%
  group_by(team.name) %>%
  summarise(
    total_shots = n(),
    goals = sum(shot.outcome.name == "Goal", na.rm = TRUE),
    total_xG = sum(shot.statsbomb_xg, na.rm = TRUE),
    shots_on_target = sum(shot.outcome.name %in% c("Goal", "Saved"), na.rm = TRUE),
    avg_xG_per_shot = mean(shot.statsbomb_xg, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), ~round(., 2)))

cat("\n=== Shot Analysis: World Cup 2022 Final ===\n")
print(shot_stats)

# Calculate xG difference
shot_stats %>%
  mutate(xG_diff = goals - total_xG) %>%
  select(team.name, goals, total_xG, xG_diff) %>%
  print()
            

Output

=== Shot Analysis: World Cup 2022 Final ===
              total_shots  goals  total_xG  shots_on_target  avg_xG_per_shot
team
Argentina              21      3      2.77               10             0.13
France                 20      3      2.44                9             0.12

Argentina: 3 goals from 2.77 xG (+0.23)
France: 3 goals from 2.44 xG (+0.56)

Step 3: Visualize the Shots on a Pitch

shot_map.py

from mplsoccer import VerticalPitch
import matplotlib.pyplot as plt

# Extract shot coordinates
shots['x'] = shots['location'].apply(lambda loc: loc[0])
shots['y'] = shots['location'].apply(lambda loc: loc[1])
shots['is_goal'] = shots['shot_outcome'] == 'Goal'

# Create figure with two pitches (one per team)
fig, axes = plt.subplots(1, 2, figsize=(16, 10))

teams = ['Argentina', 'France']
colors = {'Argentina': '#75AADB', 'France': '#002654'}

for idx, team in enumerate(teams):
    pitch = VerticalPitch(
        pitch_type='statsbomb',
        half=True,
        pitch_color='#22312b',
        line_color='white'
    )
    pitch.draw(ax=axes[idx])

    team_shots = shots[shots['team'] == team]

    # Plot non-goals
    non_goals = team_shots[~team_shots['is_goal']]
    pitch.scatter(
        non_goals['x'], non_goals['y'],
        s=non_goals['shot_statsbomb_xg'] * 500 + 50,
        c=colors[team], alpha=0.5,
        edgecolors='white', linewidth=1,
        ax=axes[idx], label='Shot'
    )

    # Plot goals
    goals = team_shots[team_shots['is_goal']]
    pitch.scatter(
        goals['x'], goals['y'],
        s=goals['shot_statsbomb_xg'] * 500 + 50,
        c='#FFD700', alpha=1,
        edgecolors='white', linewidth=2,
        marker='*', ax=axes[idx], label='Goal'
    )

    # Add title with stats
    team_xg = team_shots['shot_statsbomb_xg'].sum()
    team_goals = team_shots['is_goal'].sum()
    axes[idx].set_title(
        f"{team}\n{team_goals} Goals | {team_xg:.2f} xG",
        fontsize=14, fontweight='bold', color='white'
    )
    axes[idx].legend(loc='lower right')

plt.suptitle('World Cup 2022 Final - Shot Map', fontsize=16, fontweight='bold', y=1.02)
fig.patch.set_facecolor('#22312b')
plt.tight_layout()
plt.savefig('world_cup_final_shots.png', dpi=150, bbox_inches='tight',
            facecolor='#22312b', edgecolor='none')
plt.show()
            

library(ggsoccer)
library(ggplot2)

# Prepare shot data
shots_plot <- shots %>%
  mutate(
    is_goal = shot.outcome.name == "Goal",
    xG = shot.statsbomb_xg
  )

# Create shot map
ggplot(shots_plot) +
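  # StatsBomb coordinates run 0-120 x 0-80; ggsoccer defaults to a 100 x 100 pitch
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#22312b") +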
  geom_point(
    aes(x = location.x, y = location.y,
        size = xG,
        color = is_goal,
        shape = is_goal),
    alpha = 0.7
  ) +
  scale_color_manual(
    values = c("FALSE" = "#75AADB", "TRUE" = "#FFD700"),
    labels = c("Shot", "Goal")
  ) +
  scale_shape_manual(values = c("FALSE" = 16, "TRUE" = 18)) +
  scale_size_continuous(range = c(2, 10)) +
  coord_flip(xlim = c(60, 120)) +
  facet_wrap(~team.name) +
  theme_pitch() +
  theme(
    plot.background = element_rect(fill = "#22312b"),
    strip.text = element_text(color = "white", size = 12, face = "bold"),
    legend.position = "bottom",
    legend.text = element_text(color = "white"),
    legend.title = element_text(color = "white")
  ) +
  labs(
    title = "World Cup 2022 Final - Shot Map",
    subtitle = "Argentina 3-3 France (Argentina wins on penalties)",
    size = "xG",
    color = "Outcome"
  )

ggsave("world_cup_final_shots.png", width = 12, height = 8, dpi = 150)
            

What This Analysis Tells Us

The shot map reveals several insights about the World Cup Final:

Argentina's volume: Marginally more shots (21 vs 20), taken from more central locations
France's efficiency: Mbappé's hat-trick included two penalties, which are among the highest-xG chances in football
The xG story: 2.77 vs 2.44 xG suggests Argentina created slightly better chances overall
Both teams finished close to expectation: Each scored slightly more than their xG (+0.23 and +0.56), a margin too small to separate finishing skill from luck in a single match
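How much should we read into three goals from 2.77 xG? One way to get a feel for it is to compute the full distribution of goals implied by a set of shot probabilities. The sketch below does this exactly (a Poisson-binomial distribution built up shot by shot). It rests on two loud assumptions: shots are independent, and all 21 Argentina shots carry the same xG, which is certainly not true of the real match. With the per-shot values from the events data you would pass the actual list instead.

```python
def goal_distribution(shot_xg):
    """Exact distribution of goals scored, treating shots as independent.

    Returns a list where index k holds the probability of scoring k goals.
    """
    dist = [1.0]  # before any shot: 0 goals with probability 1
    for p in shot_xg:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)   # this shot is missed
            nxt[k + 1] += prob * p     # this shot is scored
        dist = nxt
    return dist


# Assumption: 21 equal-quality shots that sum to 2.77 xG
shot_xg = [2.77 / 21] * 21
dist = goal_distribution(shot_xg)

p_three_or_more = sum(dist[3:])
print(f"P(0 goals)  = {dist[0]:.2f}")
print(f"P(3+ goals) = {p_three_or_more:.2f}")
```

Under these simplified assumptions, scoring three or more comes out at a little over one half, so a three-goal return from this shot profile is an ordinary outcome rather than evidence of exceptional finishing.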

1.7 Creating Advanced Visualizations

Now let's create more sophisticated visualizations that you'll use throughout your analytics career. We'll build an xG timeline, passing statistics chart, and player comparison radar.

xG Timeline Chart

An xG timeline shows how expected goals accumulate throughout a match. This reveals momentum shifts, key moments, and which team controlled the game at different phases.
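The core of the chart is a running total. As a minimal sketch of that idea, independent of any data provider, the following uses a handful of invented shots for one team; the minutes and xG values are hypothetical.

```python
from itertools import accumulate

# Hypothetical shots for one team: (minute, xG)
shots = [(36, 0.31), (12, 0.08), (71, 0.05), (23, 0.76)]

# Order shots chronologically, then accumulate xG from a (0, 0) starting point
shots.sort(key=lambda shot: shot[0])
minutes = [0] + [minute for minute, _ in shots]
cumulative = [0.0] + list(accumulate(xg for _, xg in shots))

for minute, total in zip(minutes, cumulative):
    print(f"{minute:>3}'  {total:.2f}")
```

Each shot raises the line by its own xG, so a large jump marks a big chance and a long flat stretch marks a period without shots. Plotting `minutes` against `cumulative` as a step chart gives one team's line in the timeline.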

script

from statsbombpy import sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load World Cup Final data
events = sb.events(match_id=3869685)
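# Keep open-play and extra-time shots only; shootout kicks (period 5) are excluded
shots = events[(events["type"] == "Shot") & (events["period"] <= 4)].copy()

# Create xG timeline data (sort by period first: stoppage-time minutes overlap)
shots = shots.sort_values(["period", "minute", "second"])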
shots["is_goal"] = shots["shot_outcome"] == "Goal"

# Calculate cumulative xG for each team
argentina_shots = shots[shots["team"] == "Argentina"].copy()
france_shots = shots[shots["team"] == "France"].copy()

argentina_shots["cumulative_xG"] = argentina_shots["shot_statsbomb_xg"].cumsum()
france_shots["cumulative_xG"] = france_shots["shot_statsbomb_xg"].cumsum()

# Create the plot
fig, ax = plt.subplots(figsize=(14, 7))

# Starting points
ax.plot([0], [0], "o", color="#75AADB", markersize=0)
ax.plot([0], [0], "o", color="#002654", markersize=0)

# Argentina xG line (step plot)
arg_minutes = [0] + argentina_shots["minute"].tolist()
arg_xg = [0] + argentina_shots["cumulative_xG"].tolist()
ax.step(arg_minutes, arg_xg, where="post", linewidth=2.5,
        color="#75AADB", label="Argentina", alpha=0.9)

# France xG line
fra_minutes = [0] + france_shots["minute"].tolist()
fra_xg = [0] + france_shots["cumulative_xG"].tolist()
ax.step(fra_minutes, fra_xg, where="post", linewidth=2.5,
        color="#002654", label="France", alpha=0.9)

# Mark goals with stars
arg_goals = argentina_shots[argentina_shots["is_goal"]]
fra_goals = france_shots[france_shots["is_goal"]]

ax.scatter(arg_goals["minute"], arg_goals["cumulative_xG"],
           marker="*", s=300, color="#75AADB", edgecolors="gold",
           linewidth=2, zorder=5)
ax.scatter(fra_goals["minute"], fra_goals["cumulative_xG"],
           marker="*", s=300, color="#002654", edgecolors="gold",
           linewidth=2, zorder=5)

# Add period markers
ax.axvline(x=45, color="gray", linestyle="--", alpha=0.5)
ax.axvline(x=90, color="gray", linestyle="--", alpha=0.5)
ax.axvline(x=105, color="gray", linestyle=":", alpha=0.5)
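# Place labels at the top of the final y-range; ax.set_ylim() is only applied
# further down, so ax.get_ylim() would still return auto-scaled limits here
y_top = max(arg_xg[-1], fra_xg[-1]) + 0.5
ax.text(45, y_top, "HT", ha="center", va="bottom", fontsize=9)
ax.text(90, y_top, "FT", ha="center", va="bottom", fontsize=9)
ax.text(105, y_top, "ET HT", ha="center", va="bottom", fontsize=9)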

# Styling
ax.set_xlabel("Minute", fontsize=12)
ax.set_ylabel("Cumulative xG", fontsize=12)
ax.set_title("xG Timeline: World Cup 2022 Final\nArgentina 3-3 France",
             fontsize=14, fontweight="bold")
ax.legend(loc="upper left", fontsize=11)
ax.set_xlim(0, 125)
ax.set_ylim(0, max(arg_xg[-1], fra_xg[-1]) + 0.5)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig("xg_timeline.png", dpi=150, bbox_inches="tight")
plt.show()

library(tidyverse)
library(StatsBombR)

# Load World Cup Final data
events <- get.matchFree(data.frame(match_id = 3869685))
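# Keep open-play and extra-time shots only; shootout kicks (period 5) are excluded
shots <- events %>% filter(type.name == "Shot", period <= 4)

# Create xG timeline data (sort by period first: stoppage-time minutes overlap)
xg_timeline <- shots %>%
  arrange(period, minute, second) %>%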
  group_by(team.name) %>%
  mutate(
    cumulative_xG = cumsum(shot.statsbomb_xg),
    is_goal = shot.outcome.name == "Goal"
  ) %>%
  ungroup()

# Add starting point (0,0) for each team
start_points <- tibble(
  team.name = c("Argentina", "France"),
  minute = c(0, 0),
  cumulative_xG = c(0, 0),
  is_goal = c(FALSE, FALSE)
)

xg_timeline <- bind_rows(start_points, xg_timeline)

# Create the xG timeline plot
ggplot(xg_timeline, aes(x = minute, y = cumulative_xG, color = team.name)) +
  # xG accumulation lines
  geom_step(linewidth = 1.5, alpha = 0.8) +
  # Goal markers
  geom_point(
    data = filter(xg_timeline, is_goal == TRUE),
    aes(shape = team.name),
    size = 5, stroke = 2
  ) +
  # Styling
  scale_color_manual(values = c("Argentina" = "#75AADB", "France" = "#002654")) +
  scale_x_continuous(breaks = seq(0, 120, 15), limits = c(0, 125)) +
  # Add halftime and fulltime lines
  geom_vline(xintercept = c(45, 90), linetype = "dashed", alpha = 0.5) +
  annotate("text", x = 45, y = max(xg_timeline$cumulative_xG) + 0.2,
           label = "HT", size = 3) +
  annotate("text", x = 90, y = max(xg_timeline$cumulative_xG) + 0.2,
           label = "FT", size = 3) +
  labs(
    title = "xG Timeline: World Cup 2022 Final",
    subtitle = "Argentina 3-3 France (Argentina wins on penalties)",
    x = "Minute",
    y = "Cumulative xG",
    color = "Team",
    shape = "Team"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

ggsave("xg_timeline.png", width = 12, height = 6, dpi = 150)

Passing Statistics Bar Chart

Comparing team passing statistics helps understand playing styles. Let's create a professional bar chart comparing key passing metrics.
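The completion percentage in this section relies on a data convention worth spelling out: in the scripts for this chapter, a pass with no recorded outcome is treated as completed, and only unsuccessful passes carry an outcome label. The sketch below shows that calculation on its own, using an invented list of outcomes in which `None` stands for a missing value.

```python
def completion_pct(pass_outcomes):
    """Percentage of passes with no recorded outcome, i.e. completed passes."""
    if not pass_outcomes:
        return 0.0
    completed = sum(1 for outcome in pass_outcomes if outcome is None)
    return completed / len(pass_outcomes) * 100


# Hypothetical sequence of eight passes; None means the pass was completed
outcomes = [None, None, "Incomplete", None, "Out", None, None, "Incomplete"]

print(f"Completion: {completion_pct(outcomes):.1f}%")
```

With pandas, the same count is `pass_outcome.isna().sum()`, which is the form used in the script for this section.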

script

from statsbombpy import sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load match data
events = sb.events(match_id=3869685)
passes = events[events["type"] == "Pass"].copy()

# Calculate passing statistics by team
def calc_pass_stats(team_passes):
    return {
        "Total Passes": len(team_passes),
        "Completion %": (team_passes["pass_outcome"].isna().sum() / len(team_passes)) * 100,
        "Progressive": team_passes["pass_progressive"].sum() if "pass_progressive" in team_passes else 0,
        "Final Third": (team_passes["pass_end_location"].apply(
            lambda x: x[0] > 80 if isinstance(x, list) else False
        ).sum()),
        "Key Passes": (team_passes["pass_shot_assist"].fillna(False).sum() +
                       team_passes["pass_goal_assist"].fillna(False).sum())
    }

argentina_stats = calc_pass_stats(passes[passes["team"] == "Argentina"])
france_stats = calc_pass_stats(passes[passes["team"] == "France"])

# Prepare data for plotting
metrics = list(argentina_stats.keys())
arg_values = list(argentina_stats.values())
fra_values = list(france_stats.values())

# Create grouped bar chart
x = np.arange(len(metrics))
width = 0.35

fig, ax = plt.subplots(figsize=(14, 7))

bars1 = ax.bar(x - width/2, arg_values, width, label="Argentina",
               color="#75AADB", edgecolor="white", linewidth=1.5)
bars2 = ax.bar(x + width/2, fra_values, width, label="France",
               color="#002654", edgecolor="white", linewidth=1.5)

# Add value labels on bars
def add_labels(bars):
    for bar in bars:
        height = bar.get_height()
        ax.annotate(f"{height:.1f}",
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3),
                    textcoords="offset points",
                    ha="center", va="bottom", fontsize=10)

add_labels(bars1)
add_labels(bars2)

# Styling
ax.set_xlabel("Metric", fontsize=12)
ax.set_ylabel("Value", fontsize=12)
ax.set_title("Passing Comparison: World Cup 2022 Final\n" +
             "Argentina dominated possession but France remained dangerous",
             fontsize=14, fontweight="bold")
ax.set_xticks(x)
ax.set_xticklabels(metrics, fontsize=11)
ax.legend(fontsize=11)
ax.grid(axis="y", alpha=0.3)
ax.set_axisbelow(True)

plt.tight_layout()
plt.savefig("passing_comparison.png", dpi=150, bbox_inches="tight")
plt.show()

library(tidyverse)
library(StatsBombR)

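# Load match data and split coordinate columns (pass.end_location.x etc.)
events <- get.matchFree(data.frame(match_id = 3869685)) %>%
  cleanlocations()

# The data may not carry a progressive-pass flag; default it to NA so the
# summary below counts zero instead of failing on a missing column
if (!"pass.progressive" %in% names(events)) events$pass.progressive <- NA

# Calculate passing statistics by team
pass_stats <- events %>%
  filter(type.name == "Pass") %>%
  group_by(team.name) %>%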
  summarise(
    total_passes = n(),
    successful_passes = sum(is.na(pass.outcome.name) | pass.outcome.name == "Complete"),
    pass_completion = successful_passes / total_passes * 100,
    progressive_passes = sum(pass.progressive == TRUE, na.rm = TRUE),
    passes_final_third = sum(pass.end_location.x > 80, na.rm = TRUE),
    key_passes = sum(pass.shot_assist == TRUE | pass.goal_assist == TRUE, na.rm = TRUE),
    crosses = sum(pass.cross == TRUE, na.rm = TRUE),
    long_balls = sum(pass.length > 30, na.rm = TRUE)
  ) %>%
  pivot_longer(
    cols = c(total_passes, pass_completion, progressive_passes,
             passes_final_third, key_passes),
    names_to = "metric",
    values_to = "value"
  )

# Create comparison bar chart
ggplot(pass_stats, aes(x = metric, y = value, fill = team.name)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
  geom_text(
    aes(label = round(value, 1)),
    position = position_dodge(width = 0.8),
    vjust = -0.5, size = 3.5
  ) +
  scale_fill_manual(values = c("Argentina" = "#75AADB", "France" = "#002654")) +
  scale_x_discrete(
    labels = c(
      "total_passes" = "Total\nPasses",
      "pass_completion" = "Completion\n%",
      "progressive_passes" = "Progressive\nPasses",
      "passes_final_third" = "Final Third\nPasses",
      "key_passes" = "Key\nPasses"
    )
  ) +
  labs(
    title = "Passing Comparison: World Cup 2022 Final",
    subtitle = "Argentina dominated possession but France remained dangerous",
    x = "",
    y = "Value",
    fill = "Team"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.x = element_text(size = 10),
    legend.position = "top",
    panel.grid.major.x = element_blank()
  )

ggsave("passing_comparison.png", width = 12, height = 7, dpi = 150)

Player Performance Radar Chart

Radar charts (also called spider charts) are excellent for comparing players across multiple dimensions simultaneously. Let's compare Messi and Mbappé from the final.
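Two small calculations sit behind every radar chart: rescaling each metric so that different units share one axis, and spacing the metrics evenly around the circle. The sketch below isolates both steps. The function names and the sample numbers are our own, and scaling to the larger of two players' values is just one simple option; percentile ranks against a wider peer group are a common alternative.

```python
from math import pi


def normalise_pair(a, b):
    """Scale two stat lists so the larger value for each metric becomes 100."""
    maxima = [max(x, y) for x, y in zip(a, b)]

    def scale(values):
        return [v / m * 100 if m > 0 else 0.0 for v, m in zip(values, maxima)]

    return scale(a), scale(b)


def radar_angles(n_metrics):
    """Evenly spaced angles in radians, repeating the first to close the shape."""
    angles = [i / n_metrics * 2 * pi for i in range(n_metrics)]
    return angles + angles[:1]


# Hypothetical values for three metrics: shots, key passes, touches
player_one = [4, 2, 50]
player_two = [8, 1, 40]

one_norm, two_norm = normalise_pair(player_one, player_two)
print(one_norm)
print(two_norm)
print([round(angle, 2) for angle in radar_angles(3)])
```

Because each axis is scaled separately, the chart shows relative strengths between the two players rather than absolute quality.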

script

from statsbombpy import sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from math import pi

# Load match data
events = sb.events(match_id=3869685)

# Filter for Messi and Mbappe
players = ["Lionel Andrés Messi Cuccittini", "Kylian Mbappé Lottin"]
player_events = events[events["player"].isin(players)]

# Calculate stats for each player
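def calc_player_stats(player_name, all_events):
    # Keep only this player's events and drop the penalty shootout
    # (StatsBomb period 5) so Goals and xG cover the 120 minutes of play
    pe = all_events[(all_events["player"] == player_name) &
                    (all_events["period"] < 5)]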
    shots = pe[pe["type"] == "Shot"]
    passes = pe[pe["type"] == "Pass"]
    dribbles = pe[pe["type"] == "Dribble"]

    return {
        "Shots": len(shots),
        "xG": shots["shot_statsbomb_xg"].sum(),
        "Goals": (shots["shot_outcome"] == "Goal").sum(),
        "Key Passes": (passes["pass_shot_assist"].fillna(False).sum() +
                       passes["pass_goal_assist"].fillna(False).sum()),
        "Dribbles": (dribbles["dribble_outcome"] == "Complete").sum(),
        # Rough proxy: counts every logged event for the player, not touches as such
        "Touches": len(pe)
    }

messi_stats = calc_player_stats(players[0], events)
mbappe_stats = calc_player_stats(players[1], events)

# Normalize to percentages (max across both players = 100)
categories = list(messi_stats.keys())
messi_values = list(messi_stats.values())
mbappe_values = list(mbappe_stats.values())

# Normalize
max_values = [max(m, mb) for m, mb in zip(messi_values, mbappe_values)]
messi_norm = [v/mx*100 if mx > 0 else 0 for v, mx in zip(messi_values, max_values)]
mbappe_norm = [v/mx*100 if mx > 0 else 0 for v, mx in zip(mbappe_values, max_values)]

# Create radar chart
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))

# Calculate angles for each category
angles = [n / float(len(categories)) * 2 * pi for n in range(len(categories))]
angles += angles[:1]  # Complete the loop

# Add data (closing the loop)
messi_norm += messi_norm[:1]
mbappe_norm += mbappe_norm[:1]

# Plot
ax.plot(angles, messi_norm, "o-", linewidth=2.5, color="#75AADB", label="Messi")
ax.fill(angles, messi_norm, alpha=0.25, color="#75AADB")

ax.plot(angles, mbappe_norm, "o-", linewidth=2.5, color="#002654", label="Mbappé")
ax.fill(angles, mbappe_norm, alpha=0.25, color="#002654")

# Set category labels
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories, fontsize=12)

# Add actual values as annotations
for i, (angle, m_val, mb_val) in enumerate(zip(angles[:-1], messi_values, mbappe_values)):
    ax.annotate(f"{m_val:.1f}", xy=(angle, messi_norm[i]+8),
                ha="center", fontsize=9, color="#75AADB")
    ax.annotate(f"{mb_val:.1f}", xy=(angle, mbappe_norm[i]-12),
                ha="center", fontsize=9, color="#002654")

ax.set_title("Messi vs Mbappé\nWorld Cup 2022 Final Performance",
             fontsize=14, fontweight="bold", y=1.08)
ax.legend(loc="upper right", bbox_to_anchor=(1.15, 1.1), fontsize=11)

plt.tight_layout()
plt.savefig("player_radar.png", dpi=150, bbox_inches="tight")
plt.show()

# Print raw stats
print("\nRaw Statistics:")
print(f"Messi: {messi_stats}")
print(f"Mbappé: {mbappe_stats}")

library(tidyverse)
library(StatsBombR)
library(fmsb)

# Load match data
events <- get.matchFree(data.frame(match_id = 3869685))

# Calculate player stats
player_stats <- events %>%
  filter(period < 5) %>%  # drop the penalty shootout (StatsBomb period 5)
  group_by(player.name) %>%
  summarise(
    shots = sum(type.name == "Shot"),
    xG = sum(shot.statsbomb_xg, na.rm = TRUE),
    goals = sum(shot.outcome.name == "Goal", na.rm = TRUE),
    passes = sum(type.name == "Pass"),
    pass_completion = sum(type.name == "Pass" & is.na(pass.outcome.name)) /
                      sum(type.name == "Pass") * 100,
    key_passes = sum(pass.shot_assist == TRUE | pass.goal_assist == TRUE, na.rm = TRUE),
    dribbles = sum(type.name == "Dribble"),
    successful_dribbles = sum(type.name == "Dribble" & dribble.outcome.name == "Complete", na.rm = TRUE),
    touches = n()
  ) %>%
  filter(player.name %in% c("Lionel Andrés Messi Cuccittini", "Kylian Mbappé Lottin"))

# Prepare radar data - normalize to 0-100 scale
radar_data <- player_stats %>%
  select(player.name, shots, xG, goals, key_passes, successful_dribbles, touches) %>%
  pivot_longer(-player.name, names_to = "metric", values_to = "value") %>%
  group_by(metric) %>%
  mutate(normalized = value / max(value) * 100) %>%
  ungroup() %>%  # drop the grouping before reshaping back to wide format
  select(player.name, metric, normalized) %>%
  pivot_wider(names_from = metric, values_from = normalized)

# Create radar chart with fmsb
# Add max and min rows required by fmsb
radar_df <- rbind(
  rep(100, 6),  # max
  rep(0, 6),    # min
  radar_data %>% filter(str_detect(player.name, "Messi")) %>% select(-player.name),
  radar_data %>% filter(str_detect(player.name, "Mbappé")) %>% select(-player.name)
)

colnames(radar_df) <- c("Shots", "xG", "Goals", "Key Passes", "Dribbles", "Touches")

# Plot
colors <- c("#75AADB", "#002654")
png("player_radar.png", width = 800, height = 600, res = 150)
radarchart(
  radar_df,
  axistype = 1,
  pcol = colors,
  pfcol = alpha(colors, 0.3),
  plwd = 3,
  plty = 1,
  cglcol = "grey",
  cglty = 1,
  axislabcol = "grey40",
  vlcex = 0.9,
  title = "Messi vs Mbappé - World Cup Final Performance"
)
legend("topright", legend = c("Messi", "Mbappé"),
       col = colors, lwd = 3, bty = "n")
dev.off()
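
Both radar charts above rest on the same scaling step: each metric is divided by the larger of the two players' values, so the leader on every axis sits at 100. The sketch below isolates that step. The function name `normalize_to_max` and the sample numbers are made up for illustration and are not real match data.

```python
def normalize_to_max(stats_a, stats_b):
    """Scale two players' raw stats so the larger value per metric is 100."""
    norm_a, norm_b = {}, {}
    for metric in stats_a:
        peak = max(stats_a[metric], stats_b[metric])
        # Guard against 0/0 when neither player registered the metric
        norm_a[metric] = stats_a[metric] / peak * 100 if peak > 0 else 0.0
        norm_b[metric] = stats_b[metric] / peak * 100 if peak > 0 else 0.0
    return norm_a, norm_b


# Hypothetical numbers, for illustration only
player_a = {"Shots": 4, "xG": 1.2, "Dribbles": 0}
player_b = {"Shots": 8, "xG": 0.6, "Dribbles": 0}

norm_a, norm_b = normalize_to_max(player_a, player_b)
print(norm_a)
print(norm_b)
```

Because the scale is set by only two players, the shapes show who led each metric in this match and nothing about how either player compares with a wider population. For that, percentile ranks against a league-wide sample would likely be a better basis.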

Heat Map Visualization

Heat maps show where players or teams concentrate their activity on the pitch. This is crucial for understanding positioning and tactical tendencies.


from statsbombpy import sb
import matplotlib.pyplot as plt
from mplsoccer import Pitch

# Load match data
events = sb.events(match_id=3869685)

# Get Messi events with location
messi_events = events[events["player"].str.contains("Messi", na=False)].copy()
messi_events = messi_events[messi_events["location"].notna()]

# Extract x, y coordinates
messi_events["x"] = messi_events["location"].apply(lambda loc: loc[0])
messi_events["y"] = messi_events["location"].apply(lambda loc: loc[1])

# Create pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#22312b",
              line_color="white", linewidth=1)
fig, ax = pitch.draw(figsize=(12, 8))

# Create heat map using kernel density estimation
# (fill/thresh/levels replace seaborn's deprecated shade/shade_lowest/n_levels)
pitch.kdeplot(
    messi_events["x"], messi_events["y"],
    ax=ax,
    cmap="YlOrRd",
    fill=True,
    thresh=0.05,
    levels=25,
    alpha=0.7
)

# Add scatter points
pitch.scatter(
    messi_events["x"], messi_events["y"],
    ax=ax,
    s=20, color="white", alpha=0.3, edgecolors="none"
)

ax.set_title("Messi Touch Heat Map - World Cup 2022 Final\n" +
             "Density of all touches throughout the match",
             fontsize=14, fontweight="bold", color="white", y=1.02)

fig.patch.set_facecolor("#22312b")
plt.tight_layout()
plt.savefig("messi_heatmap.png", dpi=150, bbox_inches="tight",
            facecolor="#22312b", edgecolor="none")
plt.show()

library(tidyverse)
library(StatsBombR)
library(ggsoccer)

# Load match data
events <- get.matchFree(data.frame(match_id = 3869685)) %>%
  cleanlocations()  # adds the location.x / location.y columns used below

# Get Messi touches
messi_touches <- events %>%
  filter(str_detect(player.name, "Messi")) %>%
  filter(!is.na(location.x))

# Create heat map
ggplot(messi_touches, aes(x = location.x, y = location.y)) +
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#22312b") +
  stat_density_2d(
    aes(fill = after_stat(level)),
    geom = "polygon",
    alpha = 0.7,
    bins = 10
  ) +
  scale_fill_gradient(low = "#75AADB", high = "#FFD700") +
  geom_point(alpha = 0.3, color = "white", size = 1) +
  coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
  theme_pitch() +
  theme(
    plot.background = element_rect(fill = "#22312b"),
    plot.title = element_text(color = "white", face = "bold", size = 14),
    plot.subtitle = element_text(color = "white"),
    legend.position = "none"
  ) +
  labs(
    title = "Messi Touch Heat Map - World Cup 2022 Final",
    subtitle = "Density of all touches throughout the match"
  )

ggsave("messi_heatmap.png", width = 12, height = 8, dpi = 150)
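
The heat maps above smooth event locations with kernel density estimation. A simpler way to see what the picture summarises is to count events per pitch zone. The sketch below does that with `numpy.histogram2d` on a 120 x 80 StatsBomb-sized pitch. The function name `touch_grid`, the 6 x 4 zone layout, and the coordinates are assumptions chosen for illustration.

```python
import numpy as np


def touch_grid(xs, ys, x_bins=6, y_bins=4, length=120.0, width=80.0):
    """Count events per zone on a pitch of the given size."""
    counts, _, _ = np.histogram2d(
        xs, ys,
        bins=[x_bins, y_bins],
        range=[[0.0, length], [0.0, width]],
    )
    return counts  # shape (x_bins, y_bins); rows run along the pitch length


# Hypothetical event coordinates, not real match data
xs = [10, 15, 70, 75, 78, 110]
ys = [40, 42, 10, 12, 70, 40]

grid = touch_grid(xs, ys)
share = grid / grid.sum()  # fraction of all events falling in each zone
print(grid)
print(share.round(2))
```

A zone count is coarser than a KDE surface, but the numbers can be read off directly and compared between players or matches.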

1.8 Practice Exercises

Now it's your turn to practice. Complete these exercises to solidify your understanding of the concepts covered in this chapter.

Exercise 1.1: Load Different Match Data

Task: Load data from a different World Cup 2022 match (e.g., Brazil vs Croatia quarter-final) and calculate basic shot statistics for both teams.

Steps:

Find the match_id for Brazil vs Croatia
Load all events from that match
Filter for shots only
Calculate total shots, shots on target, and total xG for each team


# Exercise 1.1 Solution
from statsbombpy import sb
import pandas as pd

# Load World Cup matches
matches = sb.matches(competition_id=43, season_id=106)

# Find Brazil vs Croatia
bra_cro = matches[
    ((matches["home_team"] == "Brazil") & (matches["away_team"] == "Croatia")) |
    ((matches["home_team"] == "Croatia") & (matches["away_team"] == "Brazil"))
].iloc[0]

print(f"Match ID: {bra_cro['match_id']}")
print(f"Score: {bra_cro['home_score']} - {bra_cro['away_score']}")

# Load events
events = sb.events(match_id=bra_cro["match_id"])

# Shot analysis (exclude the penalty shootout, which StatsBomb logs as period 5)
shots = events[(events["type"] == "Shot") & (events["period"] < 5)]
shot_stats = shots.groupby("team").agg(
    total_shots=("type", "count"),
    shots_on_target=("shot_outcome", lambda x: x.isin(["Goal", "Saved"]).sum()),
    goals=("shot_outcome", lambda x: (x == "Goal").sum()),
    total_xG=("shot_statsbomb_xg", "sum")
).round(2)

print("\nShot Statistics:")
print(shot_stats)

# Exercise 1.1 Solution
library(StatsBombR)
library(tidyverse)

# Load World Cup matches
matches <- FreeMatches(Competitions = FreeCompetitions() %>%
  filter(competition_name == "FIFA World Cup", season_name == "2022"))

# Find Brazil vs Croatia
bra_cro <- matches %>%
  filter((home_team.home_team_name == "Brazil" & away_team.away_team_name == "Croatia") |
         (home_team.home_team_name == "Croatia" & away_team.away_team_name == "Brazil"))

cat(sprintf("Match ID: %d\n", bra_cro$match_id))
cat(sprintf("Score: %d - %d\n", bra_cro$home_score, bra_cro$away_score))

# Load events
events <- get.matchFree(bra_cro)

# Shot analysis (exclude the penalty shootout, which StatsBomb logs as period 5)
shot_stats <- events %>%
  filter(type.name == "Shot", period < 5) %>%
  group_by(team.name) %>%
  summarise(
    total_shots = n(),
    shots_on_target = sum(shot.outcome.name %in% c("Goal", "Saved"), na.rm = TRUE),
    goals = sum(shot.outcome.name == "Goal", na.rm = TRUE),
    total_xG = round(sum(shot.statsbomb_xg, na.rm = TRUE), 2)
  )

print(shot_stats)

Exercise 1.2: Create a Pass Map

Task: Create a pass map showing all successful passes by a specific player (e.g., Enzo Fernández) in the World Cup Final.

Requirements:

Filter for passes by the player
Show only successful passes
Draw arrows from pass start to end location
Color-code by pass type (progressive vs. normal)
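
The last requirement needs a rule for what counts as progressive, and StatsBomb event data does not appear to ship a ready-made flag for it. The sketch below uses one common working definition as an assumption: a pass is progressive if it ends at least 25% closer to the centre of the opponent's goal than where it started. The function name `is_progressive` and the 0.75 ratio are illustrative choices, not an official standard.

```python
import math

# Centre of the opponent's goal in StatsBomb coordinates (120 x 80 pitch)
GOAL_X, GOAL_Y = 120.0, 40.0


def is_progressive(start, end, ratio=0.75):
    """Return True if the pass ends at least 25% closer to goal than it began."""
    start_dist = math.hypot(GOAL_X - start[0], GOAL_Y - start[1])
    end_dist = math.hypot(GOAL_X - end[0], GOAL_Y - end[1])
    return end_dist <= ratio * start_dist


# Hypothetical passes along the centre of the pitch
forward_long = is_progressive((60, 40), (90, 40))   # 60 yards from goal -> 30
forward_short = is_progressive((60, 40), (65, 40))  # 60 yards from goal -> 55
backward = is_progressive((60, 40), (50, 40))       # 60 yards from goal -> 70
print(forward_long, forward_short, backward)
```

Other definitions exist, for example fixed yardage thresholds that vary by pitch third, so the counts you get depend on the rule you pick.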


# Exercise 1.2 Solution
from statsbombpy import sb
import matplotlib.pyplot as plt
from mplsoccer import Pitch

# Load World Cup Final
events = sb.events(match_id=3869685)

# Get Enzo Fernandez passes
enzo_passes = events[
    (events["player"].str.contains("Enzo", na=False)) &
    (events["type"] == "Pass") &
    (events["pass_outcome"].isna())  # Successful passes
].copy()

# Extract coordinates
enzo_passes["x"] = enzo_passes["location"].apply(lambda x: x[0])
enzo_passes["y"] = enzo_passes["location"].apply(lambda x: x[1])
enzo_passes["end_x"] = enzo_passes["pass_end_location"].apply(lambda x: x[0] if isinstance(x, list) else None)
enzo_passes["end_y"] = enzo_passes["pass_end_location"].apply(lambda x: x[1] if isinstance(x, list) else None)

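# StatsBomb events carry no progressive-pass flag, so derive one.
# Assumed working definition: the pass ends at least 25% closer to the
# centre of the opponent's goal (120, 40) than where it started.
start_dist = ((120 - enzo_passes["x"]) ** 2 + (40 - enzo_passes["y"]) ** 2) ** 0.5
end_dist = ((120 - enzo_passes["end_x"]) ** 2 + (40 - enzo_passes["end_y"]) ** 2) ** 0.5
enzo_passes["is_progressive"] = end_dist <= 0.75 * start_dist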

# Create pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#22312b", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# Plot normal passes
normal = enzo_passes[~enzo_passes["is_progressive"]]
pitch.arrows(
    normal["x"], normal["y"], normal["end_x"], normal["end_y"],
    ax=ax, color="#75AADB", width=2, headwidth=6, headlength=5, alpha=0.7
)

# Plot progressive passes
progressive = enzo_passes[enzo_passes["is_progressive"]]
pitch.arrows(
    progressive["x"], progressive["y"], progressive["end_x"], progressive["end_y"],
    ax=ax, color="#FFD700", width=2, headwidth=6, headlength=5, alpha=0.9
)

ax.set_title("Enzo Fernández Pass Map - World Cup 2022 Final",
             fontsize=14, fontweight="bold", color="white")

# Add legend
ax.plot([], [], color="#75AADB", label="Normal Pass", linewidth=3)
ax.plot([], [], color="#FFD700", label="Progressive Pass", linewidth=3)
ax.legend(loc="lower right", facecolor="#22312b", labelcolor="white")

fig.patch.set_facecolor("#22312b")
plt.tight_layout()
plt.savefig("enzo_pass_map.png", dpi=150, bbox_inches="tight", facecolor="#22312b")
plt.show()

# Exercise 1.2 Solution
library(StatsBombR)
library(tidyverse)
library(ggsoccer)

# Load World Cup Final
events <- get.matchFree(data.frame(match_id = 3869685)) %>%
  cleanlocations()  # adds location.x/y and pass.end_location.x/y columns

# Get Enzo Fernandez passes
enzo_passes <- events %>%
  filter(str_detect(player.name, "Enzo")) %>%
  filter(type.name == "Pass") %>%
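  filter(is.na(pass.outcome.name)) %>%  # outcome is empty for completed passes
  mutate(
    # StatsBomb has no progressive-pass flag. Assumed definition: the pass ends
    # at least 25% closer to the centre of the opponent's goal (120, 40)
    start_dist = sqrt((120 - location.x)^2 + (40 - location.y)^2),
    end_dist = sqrt((120 - pass.end_location.x)^2 + (40 - pass.end_location.y)^2),
    is_progressive = end_dist <= 0.75 * start_dist
  )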

# Create pass map
ggplot(enzo_passes) +
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#22312b") +
  geom_segment(
    aes(x = location.x, y = location.y,
        xend = pass.end_location.x, yend = pass.end_location.y,
        color = is_progressive),
    arrow = arrow(length = unit(0.15, "cm"), type = "closed"),
    alpha = 0.7, linewidth = 0.8
  ) +
  scale_color_manual(
    values = c("FALSE" = "#75AADB", "TRUE" = "#FFD700"),
    labels = c("Normal Pass", "Progressive Pass")
  ) +
  coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
  theme_pitch() +
  theme(
    plot.background = element_rect(fill = "#22312b"),
    plot.title = element_text(color = "white", face = "bold"),
    legend.position = "bottom",
    legend.text = element_text(color = "white"),
    legend.title = element_blank()
  ) +
  labs(title = "Enzo Fernández Pass Map - World Cup 2022 Final")

ggsave("enzo_pass_map.png", width = 12, height = 8, dpi = 150)

Exercise 1.3: Team Comparison Dashboard

Task: Create a multi-panel visualization comparing Argentina and France across multiple metrics from the World Cup Final.

Panels to include:

Shot locations for both teams
xG timeline
Passing statistics bar chart
Key player statistics table


# Exercise 1.3 Solution - Multi-panel Dashboard
from statsbombpy import sb
import matplotlib.pyplot as plt
from mplsoccer import Pitch, VerticalPitch
import pandas as pd
import numpy as np

# Load data
events = sb.events(match_id=3869685)
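# Exclude the penalty shootout (StatsBomb period 5) so goals and xG match the 3-3 score
shots = events[(events["type"] == "Shot") & (events["period"] < 5)].copy()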
shots["x"] = shots["location"].apply(lambda x: x[0])
shots["y"] = shots["location"].apply(lambda x: x[1])
shots["is_goal"] = shots["shot_outcome"] == "Goal"

# Create figure with subplots
fig = plt.figure(figsize=(18, 14))

# Panel 1: Argentina Shots
ax1 = fig.add_subplot(2, 3, 1)
pitch = VerticalPitch(pitch_type="statsbomb", half=True, pitch_color="#22312b", line_color="white")
pitch.draw(ax=ax1)
arg_shots = shots[shots["team"] == "Argentina"]
pitch.scatter(arg_shots["x"], arg_shots["y"],
              s=arg_shots["shot_statsbomb_xg"]*500+50,
              c=["#FFD700" if g else "#75AADB" for g in arg_shots["is_goal"]],
              edgecolors="white", ax=ax1, alpha=0.8)
ax1.set_title("Argentina Shots", fontweight="bold")  # dark text: title sits on the light figure background

# Panel 2: France Shots
ax2 = fig.add_subplot(2, 3, 2)
pitch.draw(ax=ax2)
fra_shots = shots[shots["team"] == "France"]
pitch.scatter(fra_shots["x"], fra_shots["y"],
              s=fra_shots["shot_statsbomb_xg"]*500+50,
              c=["#FFD700" if g else "#002654" for g in fra_shots["is_goal"]],
              edgecolors="white", ax=ax2, alpha=0.8)
ax2.set_title("France Shots", fontweight="bold")  # dark text: title sits on the light figure background

# Panel 3: xG Timeline
ax3 = fig.add_subplot(2, 3, 3)
for team, color in [("Argentina", "#75AADB"), ("France", "#002654")]:
    team_shots = shots[shots["team"] == team].sort_values("minute")
    cum_xg = [0] + team_shots["shot_statsbomb_xg"].cumsum().tolist()
    minutes = [0] + team_shots["minute"].tolist()
    ax3.step(minutes, cum_xg, where="post", label=team, color=color, linewidth=2)
    goals = team_shots[team_shots["is_goal"]]
    if len(goals) > 0:
        goal_xg = team_shots["shot_statsbomb_xg"].cumsum()
        ax3.scatter(goals["minute"], goal_xg[goals.index],
                    marker="*", s=200, color=color, edgecolors="gold", linewidth=1.5, zorder=5)
ax3.set_xlabel("Minute")
ax3.set_ylabel("Cumulative xG")
ax3.set_title("xG Timeline", fontweight="bold")
ax3.legend()
ax3.grid(True, alpha=0.3)

# Panel 4-5: Team Statistics
ax4 = fig.add_subplot(2, 3, (4, 5))
stats = []
for team in ["Argentina", "France"]:
    team_events = events[events["team"] == team]
    team_shots = shots[shots["team"] == team]
    team_passes = events[(events["team"] == team) & (events["type"] == "Pass")]
    stats.append({
        "Team": team,
        "Shots": len(team_shots),
        "Goals": team_shots["is_goal"].sum(),
        "xG": round(team_shots["shot_statsbomb_xg"].sum(), 2),
        "Passes": len(team_passes),
        "Pass %": round(team_passes["pass_outcome"].isna().mean() * 100, 1)
    })
stats_df = pd.DataFrame(stats)
ax4.axis("off")
table = ax4.table(cellText=stats_df.values, colLabels=stats_df.columns,
                   cellLoc="center", loc="center")
table.auto_set_font_size(False)
table.set_fontsize(12)
table.scale(1.2, 2)
ax4.set_title("Match Statistics", fontweight="bold", y=0.7)

# Overall title
fig.suptitle("World Cup 2022 Final Dashboard\nArgentina 3-3 France (Argentina wins on penalties)",
             fontsize=16, fontweight="bold", y=0.98)
fig.patch.set_facecolor("#f5f5f5")
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.savefig("match_dashboard.png", dpi=150, bbox_inches="tight")
plt.show()

# Exercise 1.3 Solution - Multi-panel Dashboard
library(StatsBombR)
library(tidyverse)
library(ggsoccer)
library(patchwork)

# Load data
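events <- get.matchFree(data.frame(match_id = 3869685)) %>%
  cleanlocations()  # adds the location.x / location.y columns used below
# Exclude the penalty shootout (StatsBomb period 5) from shot counts and xG
shots <- events %>% filter(type.name == "Shot", period < 5)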
passes <- events %>% filter(type.name == "Pass")

# Panel 1: Shot Map (Argentina)
p1 <- ggplot(filter(shots, team.name == "Argentina")) +
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#22312b") +
  geom_point(
    aes(x = location.x, y = location.y,
        size = shot.statsbomb_xg,
        color = shot.outcome.name == "Goal"),
    alpha = 0.7
  ) +
  scale_color_manual(values = c("FALSE" = "#75AADB", "TRUE" = "#FFD700")) +
  coord_flip(xlim = c(60, 120)) +
  theme_pitch() +
  theme(legend.position = "none", plot.background = element_rect(fill = "#22312b"),
        plot.title = element_text(color = "white", face = "bold", hjust = 0.5)) +
  labs(title = "Argentina Shots")

# Panel 2: Shot Map (France)
p2 <- ggplot(filter(shots, team.name == "France")) +
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#22312b") +
  geom_point(
    aes(x = location.x, y = location.y,
        size = shot.statsbomb_xg,
        color = shot.outcome.name == "Goal"),
    alpha = 0.7
  ) +
  scale_color_manual(values = c("FALSE" = "#002654", "TRUE" = "#FFD700")) +
  coord_flip(xlim = c(60, 120)) +
  theme_pitch() +
  theme(legend.position = "none", plot.background = element_rect(fill = "#22312b"),
        plot.title = element_text(color = "white", face = "bold", hjust = 0.5)) +
  labs(title = "France Shots")

# Panel 3: xG Timeline
xg_data <- shots %>%
  arrange(minute) %>%
  group_by(team.name) %>%
  mutate(cumulative_xG = cumsum(shot.statsbomb_xg)) %>%
  ungroup()

p3 <- ggplot(xg_data, aes(x = minute, y = cumulative_xG, color = team.name)) +
  geom_step(linewidth = 1.5) +
  geom_point(data = filter(xg_data, shot.outcome.name == "Goal"),
             aes(shape = team.name), size = 4) +
  scale_color_manual(values = c("Argentina" = "#75AADB", "France" = "#002654")) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(title = "xG Timeline", x = "Minute", y = "Cumulative xG", color = "")

# Panel 4: Stats Summary
stats_summary <- events %>%
  filter(period < 5) %>%  # drop the penalty shootout (StatsBomb period 5)
  group_by(team.name) %>%
  summarise(
    Shots = sum(type.name == "Shot"),
    Goals = sum(shot.outcome.name == "Goal", na.rm = TRUE),
    xG = round(sum(shot.statsbomb_xg, na.rm = TRUE), 2),
    Passes = sum(type.name == "Pass"),
    `Pass %` = round(sum(type.name == "Pass" & is.na(pass.outcome.name)) /
                     sum(type.name == "Pass") * 100, 1)
  )

library(gridExtra)
p4 <- tableGrob(stats_summary, rows = NULL)

# Combine panels (wrap_elements lets patchwork place the stats table grob)
dashboard <- (p1 | p2) / (p3 | wrap_elements(p4)) + plot_annotation(
  title = "World Cup 2022 Final Dashboard",
  subtitle = "Argentina 3-3 France (Argentina wins on penalties)",
  theme = theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12)
  )
)

ggsave("match_dashboard.png", dashboard, width = 16, height = 12, dpi = 150)
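
The xG timeline panel in both solutions comes down to a running total of shot xG ordered by minute, with a starting point at zero so the step line begins at kick-off. The sketch below shows that calculation on its own. The function name `xg_step_series` and the shot values are hypothetical.

```python
from itertools import accumulate


def xg_step_series(shots):
    """Turn (minute, xg) pairs into step-chart series starting at (0, 0)."""
    ordered = sorted(shots, key=lambda shot: shot[0])
    minutes = [0] + [minute for minute, _ in ordered]
    cum_xg = [0.0] + list(accumulate(xg for _, xg in ordered))
    return minutes, cum_xg


# Hypothetical shots, deliberately out of order
shots = [(55, 0.30), (12, 0.10), (88, 0.76)]

minutes, cum_xg = xg_step_series(shots)
print(minutes)
print([round(value, 2) for value in cum_xg])
```

Passing the two lists to a step plot with the step placed after each point, as the Python solution does with `where="post"`, keeps the line flat until the minute a shot is taken.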

1.9 Summary

In this chapter, you learned:

Key Concepts

What soccer analytics is and its three pillars
The history and evolution of football analytics
Types of football data (event, tracking, aggregate)
How Expected Goals (xG) measures shot quality

Technical Skills

Setting up Python or R for football analytics
Loading data from StatsBomb
Calculating basic shot statistics
Creating a shot map visualization

What's Next

In Chapter 2: Data Wrangling for Football, we'll dive deeper into working with football data—handling different coordinate systems, dealing with missing data, and transforming raw events into analysis-ready datasets.
