Capstone - Complete Analytics System
- Understand spatial analysis concepts in football context
- Create and analyze pitch zone grids of varying granularity
- Build heat maps and kernel density estimates for player actions
- Calculate territorial control and dominance metrics
- Analyze passing patterns using spatial network analysis
- Implement Voronoi diagrams for space control visualization
- Use spatial clustering to identify action hotspots
- Compare spatial profiles across players and teams
The Spatial Dimension of Football
Football is fundamentally a spatial game. Teams compete for territorial control, create and exploit space, and organize defensively to deny opponents space. Spatial analytics provides the tools to quantify these dynamics and uncover patterns invisible to the naked eye.
Why Spatial Analysis?
Traditional statistics count events but ignore where they happen. A pass completion rate of 85% means different things if those passes are all in the defensive third versus the attacking third. Spatial analysis adds the crucial "where" dimension to football analytics.
Zone Analysis
Divide pitch into actionable zonesHeat Maps
Visualize action densityVoronoi
Space control regionsClustering
Identify action hotspots# Python: Introduction to Spatial Data in Football
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt
# Load event data with locations
events = pd.read_csv("match_events.csv")
# Examine spatial distribution
spatial_summary = events.dropna(subset=["location_x", "location_y"]).agg({
"location_x": ["count", "min", "max", "mean"],
"location_y": ["min", "max", "mean"]
})
print("Spatial Summary:")
print(spatial_summary)
# Create GeoDataFrame for spatial operations
events_clean = events.dropna(subset=["location_x", "location_y"])
geometry = [Point(xy) for xy in zip(events_clean["location_x"],
events_clean["location_y"])]
events_gdf = gpd.GeoDataFrame(events_clean, geometry=geometry)
# Basic spatial plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.set_facecolor("darkgreen")
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
# Plot events by type
for event_type in events_gdf["type"].unique()[:5]:
subset = events_gdf[events_gdf["type"] == event_type]
ax.scatter(subset["location_x"], subset["location_y"],
label=event_type, alpha=0.5, s=20)
ax.legend()
ax.set_title("Event Locations")
plt.show()# R: Introduction to Spatial Data in Football
library(tidyverse)
library(sf)
# Load event data with locations
events <- read_csv("match_events.csv")
# Examine spatial distribution
spatial_summary <- events %>%
filter(!is.na(location_x), !is.na(location_y)) %>%
summarize(
n_events = n(),
x_min = min(location_x),
x_max = max(location_x),
y_min = min(location_y),
y_max = max(location_y),
x_mean = mean(location_x),
y_mean = mean(location_y)
)
print(spatial_summary)
# Create spatial points object
events_sf <- events %>%
filter(!is.na(location_x), !is.na(location_y)) %>%
st_as_sf(coords = c("location_x", "location_y"))
# Basic spatial plot
ggplot() +
# Pitch outline
annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
fill = "darkgreen", alpha = 0.3) +
geom_sf(data = events_sf, aes(color = type), alpha = 0.5, size = 1) +
coord_sf() +
theme_minimal() +
labs(title = "Event Locations")Spatial Summary:
location_x location_y
count 1623.000000
min 0.100000 0.100000
max 119.900000 79.900000
mean 60.234000 39.876000Pitch Zone Analysis
Dividing the pitch into zones allows aggregation and comparison of actions across different areas. The choice of zone system depends on your analysis goals.
| Zone System | Grid Size | Best For |
|---|---|---|
| Thirds | 3 zones (Defensive, Middle, Attacking) | High-level territorial analysis |
| 18-Zone | 6x3 grid | Tactical zone analysis |
| Juego de Posición | 5 vertical lanes, 4 horizontal bands | Positional play analysis |
| Fine Grid | 12x8 or finer | Heat maps, xG models |
# Python: Creating Pitch Zone Systems
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
def create_zone_grid(n_x=6, n_y=3, pitch_length=120, pitch_width=80):
"""Create a grid of pitch zones."""
zone_width = pitch_length / n_x
zone_height = pitch_width / n_y
zones = []
zone_id = 1
for x_zone in range(1, n_x + 1):
for y_zone in range(1, n_y + 1):
zones.append({
"zone_id": zone_id,
"x_zone": x_zone,
"y_zone": y_zone,
"x_min": (x_zone - 1) * zone_width,
"x_max": x_zone * zone_width,
"y_min": (y_zone - 1) * zone_height,
"y_max": y_zone * zone_height,
"x_center": (x_zone - 0.5) * zone_width,
"y_center": (y_zone - 0.5) * zone_height,
})
zone_id += 1
return pd.DataFrame(zones)
# Create 18-zone grid (6x3)
zones_18 = create_zone_grid(6, 3)
def assign_zones(events, zones):
"""Assign zone IDs to events based on location."""
n_x = zones["x_zone"].max()
n_y = zones["y_zone"].max()
events = events.copy()
events["x_zone"] = np.ceil(events["location_x"] / (120 / n_x)).clip(1, n_x).astype(int)
events["y_zone"] = np.ceil(events["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
events["zone_id"] = (events["x_zone"] - 1) * n_y + events["y_zone"]
return events
# Assign zones to events
events_zoned = assign_zones(events, zones_18)
# Calculate zone statistics
zone_stats = events_zoned.groupby("zone_id").agg(
n_events=("index", "count"),
n_passes=("type", lambda x: (x == "Pass").sum()),
n_shots=("type", lambda x: (x == "Shot").sum()),
).reset_index()
zone_stats = zone_stats.merge(zones_18, on="zone_id")
print(zone_stats)# R: Creating Pitch Zone Systems
library(tidyverse)
# Define zone systems
create_zone_grid <- function(n_x = 6, n_y = 3, pitch_length = 120, pitch_width = 80) {
zone_width <- pitch_length / n_x
zone_height <- pitch_width / n_y
zones <- expand_grid(
x_zone = 1:n_x,
y_zone = 1:n_y
) %>%
mutate(
zone_id = row_number(),
x_min = (x_zone - 1) * zone_width,
x_max = x_zone * zone_width,
y_min = (y_zone - 1) * zone_height,
y_max = y_zone * zone_height,
x_center = (x_min + x_max) / 2,
y_center = (y_min + y_max) / 2
)
return(zones)
}
# Create 18-zone grid (6x3)
zones_18 <- create_zone_grid(6, 3)
# Assign events to zones
assign_zones <- function(events, zones) {
events %>%
mutate(
x_zone = ceiling(location_x / (120 / max(zones$x_zone))),
y_zone = ceiling(location_y / (80 / max(zones$y_zone))),
zone_id = (x_zone - 1) * max(zones$y_zone) + y_zone
) %>%
# Clamp to valid zone range
mutate(
x_zone = pmin(pmax(x_zone, 1), max(zones$x_zone)),
y_zone = pmin(pmax(y_zone, 1), max(zones$y_zone))
)
}
# Assign zones to events
events_zoned <- assign_zones(events, zones_18)
# Calculate zone statistics
zone_stats <- events_zoned %>%
group_by(zone_id) %>%
summarize(
n_events = n(),
n_passes = sum(type == "Pass"),
n_shots = sum(type == "Shot"),
pass_success_rate = mean(type == "Pass" & pass_outcome == "Complete", na.rm = TRUE),
.groups = "drop"
) %>%
left_join(zones_18, by = "zone_id")
print(zone_stats) zone_id n_events n_passes n_shots x_zone y_zone x_min x_max ...
0 1 45 32 0 1 1 0.0 20.0 ...
1 2 67 48 0 1 2 0.0 20.0 ...
2 3 52 38 0 1 3 0.0 20.0 ...
3 4 89 62 0 2 1 20.0 40.0 ...
...Visualizing Zone Statistics
# Python: Visualize Zone Statistics on Pitch
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
def plot_zone_heatmap(zone_stats, metric="n_events", title="Zone Activity"):
"""Plot zone statistics as a heatmap on the pitch."""
fig, ax = plt.subplots(figsize=(14, 9))
# Color normalization
values = zone_stats[metric].values
vmin, vmax = values.min(), values.max()
# Custom colormap (green to orange)
cmap = LinearSegmentedColormap.from_list("pitch", ["#2E7D32", "#FF5722"])
# Draw zones
for _, zone in zone_stats.iterrows():
color_val = (zone[metric] - vmin) / (vmax - vmin) if vmax > vmin else 0
rect = Rectangle(
(zone["x_min"], zone["y_min"]),
zone["x_max"] - zone["x_min"],
zone["y_max"] - zone["y_min"],
facecolor=cmap(color_val),
edgecolor="white",
linewidth=2
)
ax.add_patch(rect)
# Zone label
ax.text(zone["x_center"], zone["y_center"],
f"{zone[metric]:.0f}",
ha="center", va="center",
color="white", fontweight="bold", fontsize=12)
# Pitch markings
ax.plot([60, 60], [0, 80], "w-", linewidth=1)
ax.add_patch(Rectangle((0, 18), 18, 44, fill=False, edgecolor="white"))
ax.add_patch(Rectangle((102, 18), 18, 44, fill=False, edgecolor="white"))
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
ax.set_aspect("equal")
ax.set_facecolor("#1a472a")
ax.set_title(title, fontsize=14, fontweight="bold", color="white")
ax.axis("off")
# Colorbar
sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin, vmax))
plt.colorbar(sm, ax=ax, label=metric, shrink=0.6)
plt.tight_layout()
return fig
# Plot event density by zone
fig = plot_zone_heatmap(zone_stats, "n_events", "Event Density by Zone")
plt.show()# R: Visualize Zone Statistics on Pitch
library(ggplot2)
plot_zone_heatmap <- function(zone_stats, metric = "n_events", title = "Zone Activity") {
ggplot(zone_stats) +
# Zone rectangles
geom_rect(aes(xmin = x_min, xmax = x_max,
ymin = y_min, ymax = y_max,
fill = !!sym(metric)),
color = "white", size = 0.5) +
# Zone labels
geom_text(aes(x = x_center, y = y_center,
label = round(!!sym(metric), 1)),
color = "white", fontface = "bold", size = 4) +
# Pitch markings
annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
fill = NA, color = "white", size = 1) +
annotate("segment", x = 60, xend = 60, y = 0, yend = 80,
color = "white", size = 0.5) +
# Penalty areas
annotate("rect", xmin = 0, xmax = 18, ymin = 18, ymax = 62,
fill = NA, color = "white", size = 0.5) +
annotate("rect", xmin = 102, xmax = 120, ymin = 18, ymax = 62,
fill = NA, color = "white", size = 0.5) +
scale_fill_gradient(low = "#2E7D32", high = "#FF5722") +
coord_fixed() +
theme_void() +
labs(title = title, fill = metric)
}
# Plot event density by zone
plot_zone_heatmap(zone_stats, "n_events", "Event Density by Zone")Heat Maps and Kernel Density
Heat maps provide continuous representations of action density, revealing patterns that discrete zone analysis might miss.
# Python: Creating Heat Maps with Kernel Density
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from mplsoccer import Pitch
def create_heatmap_data(events, bandwidth=5):
"""Create kernel density estimate for event locations."""
# Filter to events with valid locations
loc_data = events.dropna(subset=["location_x", "location_y"])
x = loc_data["location_x"].values
y = loc_data["location_y"].values
# Create grid
xx, yy = np.mgrid[0:120:100j, 0:80:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
# Kernel density estimation
kernel = stats.gaussian_kde([x, y], bw_method=bandwidth / 100)
density = kernel(positions).reshape(xx.shape)
return xx, yy, density
# Create player touch heatmap
player_touches = events[
(events["player"] == "Lionel Messi") &
(events["type"].isin(["Pass", "Carry", "Shot", "Dribble"]))
]
xx, yy, density = create_heatmap_data(player_touches, bandwidth=8)
# Plot using mplsoccer
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# Add heatmap
heatmap = ax.contourf(xx, yy, density, levels=20, cmap="YlOrRd", alpha=0.8)
plt.colorbar(heatmap, ax=ax, label="Density")
ax.set_title("Lionel Messi - Touch Heatmap", fontsize=14)
plt.show()
# Alternative: Using mplsoccer built-in heatmap
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# mplsoccer heatmap
bin_statistic = pitch.bin_statistic(
player_touches["location_x"],
player_touches["location_y"],
statistic="count",
bins=(12, 8)
)
pitch.heatmap(bin_statistic, ax=ax, cmap="YlOrRd", edgecolors="#1a472a")
plt.show()# R: Creating Heat Maps with Kernel Density
library(tidyverse)
library(MASS)
create_heatmap_data <- function(events, bandwidth = 5) {
# Filter to events with valid locations
loc_data <- events %>%
filter(!is.na(location_x), !is.na(location_y))
# Create kernel density estimate
kde <- kde2d(
x = loc_data$location_x,
y = loc_data$location_y,
n = 100, # Grid resolution
h = bandwidth, # Bandwidth (smoothing)
lims = c(0, 120, 0, 80) # Pitch limits
)
# Convert to dataframe for ggplot
heatmap_df <- expand_grid(
x = kde$x,
y = kde$y
) %>%
mutate(
density = as.vector(kde$z)
)
return(heatmap_df)
}
# Create player touch heatmap
player_touches <- events %>%
filter(player == "Lionel Messi",
type %in% c("Pass", "Carry", "Shot", "Dribble"))
heatmap_data <- create_heatmap_data(player_touches, bandwidth = 8)
# Plot heatmap
ggplot() +
# Pitch background
annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
fill = "#1a472a") +
# Heatmap
geom_raster(data = heatmap_data,
aes(x = x, y = y, fill = density),
interpolate = TRUE) +
scale_fill_gradient2(low = "transparent", mid = "yellow",
high = "red", midpoint = max(heatmap_data$density) / 2) +
# Pitch markings
annotate("segment", x = 60, xend = 60, y = 0, yend = 80, color = "white") +
coord_fixed() +
theme_void() +
labs(title = "Lionel Messi - Touch Heatmap")Comparative Heat Maps
# Python: Comparative Heat Maps (Player vs Team Average)
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch
def create_comparison_heatmap(player_events, team_events, player_name):
"""Create heatmap comparing player to team average."""
# Calculate densities
xx_p, yy_p, player_density = create_heatmap_data(player_events)
xx_t, yy_t, team_density = create_heatmap_data(team_events)
# Normalize team density to player sample size
scale_factor = len(player_events) / len(team_events)
team_density_scaled = team_density * scale_factor
# Calculate difference
diff_density = player_density - team_density_scaled
# Plot
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# Difference heatmap (red = more than team, blue = less than team)
heatmap = ax.contourf(
xx_p, yy_p, diff_density,
levels=np.linspace(diff_density.min(), diff_density.max(), 20),
cmap="RdBu_r",
alpha=0.8
)
plt.colorbar(heatmap, ax=ax, label="Diff from Team Avg")
ax.set_title(f"{player_name} - Deviation from Team Average", fontsize=14)
return fig
# Compare Messi to team average
messi_events = events[events["player"] == "Lionel Messi"]
barcelona_events = events[events["team"] == "Barcelona"]
fig = create_comparison_heatmap(messi_events, barcelona_events, "Lionel Messi")
plt.show()# R: Comparative Heat Maps (Player vs Team Average)
library(tidyverse)
library(patchwork)
create_comparison_heatmap <- function(player_events, team_events, player_name) {
# Get player and team densities
player_kde <- create_heatmap_data(player_events, bandwidth = 8)
team_kde <- create_heatmap_data(team_events, bandwidth = 8)
# Calculate difference (player - team average)
comparison <- player_kde %>%
mutate(
team_density = team_kde$density / nrow(team_events) * nrow(player_events),
diff_density = density - team_density,
pct_diff = (density - team_density) / (team_density + 0.001) * 100
)
# Plot showing where player is more/less active than team average
ggplot() +
annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
fill = "#1a472a") +
geom_raster(data = comparison,
aes(x = x, y = y, fill = diff_density),
interpolate = TRUE) +
scale_fill_gradient2(low = "blue", mid = "white", high = "red",
midpoint = 0, name = "Diff from\nTeam Avg") +
coord_fixed() +
theme_void() +
labs(title = sprintf("%s - Deviation from Team Average", player_name))
}
# Compare Messi to team average
messi_comparison <- create_comparison_heatmap(
player_events = events %>% filter(player == "Lionel Messi"),
team_events = events %>% filter(team == "Barcelona"),
player_name = "Lionel Messi"
)
print(messi_comparison)Territorial Control Metrics
Territorial control measures which team dominates different areas of the pitch. This goes beyond possession percentage to show where a team is actually playing.
# Python: Calculating Territorial Control
import pandas as pd
import numpy as np
def calculate_territorial_control(events, team1, team2, zone_grid):
"""Calculate territorial control metrics by zone."""
# Assign zones
n_x = zone_grid["x_zone"].max()
n_y = zone_grid["y_zone"].max()
events = events.copy()
events["x_zone"] = np.ceil(events["location_x"] / (120 / n_x)).clip(1, n_x).astype(int)
events["y_zone"] = np.ceil(events["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
events["zone_id"] = (events["x_zone"] - 1) * n_y + events["y_zone"]
# Count events by team and zone
zone_counts = events.groupby(["zone_id", "team"]).size().unstack(fill_value=0)
# Calculate control metrics
zone_control = pd.DataFrame({
"zone_id": zone_counts.index,
team1: zone_counts.get(team1, 0),
team2: zone_counts.get(team2, 0)
})
zone_control["total"] = zone_control[team1] + zone_control[team2]
zone_control["team1_share"] = zone_control[team1] / zone_control["total"]
zone_control["team2_share"] = zone_control[team2] / zone_control["total"]
zone_control["dominance"] = zone_control["team1_share"] - zone_control["team2_share"]
zone_control["contested"] = (
(zone_control["team1_share"] > 0.3) & (zone_control["team2_share"] > 0.3)
)
zone_control = zone_control.merge(zone_grid, on="zone_id")
# Overall metrics
overall = {
"team1_territory": zone_control["team1_share"].mean(),
"team2_territory": zone_control["team2_share"].mean(),
"field_tilt_team1": (
zone_control[zone_control["x_zone"] >= 4][team1].sum() /
zone_control[zone_control["x_zone"] >= 4]["total"].sum()
)
}
return {"zones": zone_control, "overall": overall}
# Calculate territorial control
zones = create_zone_grid(6, 3)
territorial = calculate_territorial_control(events, "Barcelona", "Real Madrid", zones)
print("Territorial Control:")
print(f"Barcelona: {territorial['overall']['team1_territory']*100:.1f}%")
print(f"Real Madrid: {territorial['overall']['team2_territory']*100:.1f}%")
print(f"Barcelona Field Tilt: {territorial['overall']['field_tilt_team1']*100:.1f}%")# R: Calculating Territorial Control
library(tidyverse)
calculate_territorial_control <- function(events, team1, team2, zone_grid) {
# Assign zones to events
events_zoned <- events %>%
mutate(
x_zone = ceiling(location_x / (120 / max(zone_grid$x_zone))),
y_zone = ceiling(location_y / (80 / max(zone_grid$y_zone)))
) %>%
mutate(
x_zone = pmin(pmax(x_zone, 1), max(zone_grid$x_zone)),
y_zone = pmin(pmax(y_zone, 1), max(zone_grid$y_zone)),
zone_id = (x_zone - 1) * max(zone_grid$y_zone) + y_zone
)
# Count events by team and zone
zone_control <- events_zoned %>%
count(zone_id, team, name = "n_events") %>%
pivot_wider(
names_from = team,
values_from = n_events,
values_fill = 0
)
# Calculate control metrics
zone_control <- zone_control %>%
mutate(
total = !!sym(team1) + !!sym(team2),
team1_share = !!sym(team1) / total,
team2_share = !!sym(team2) / total,
dominance = team1_share - team2_share, # Positive = team1 dominant
contested = pmin(team1_share, team2_share) > 0.3 # Both teams active
) %>%
left_join(zone_grid, by = "zone_id")
# Overall territorial metrics
overall <- list(
team1_territory = mean(zone_control$team1_share, na.rm = TRUE),
team2_territory = mean(zone_control$team2_share, na.rm = TRUE),
field_tilt_team1 = zone_control %>%
filter(x_zone >= 4) %>% # Attacking half
summarize(tilt = sum(!!sym(team1)) / sum(total)) %>%
pull(tilt)
)
return(list(zones = zone_control, overall = overall))
}
# Calculate territorial control
zones <- create_zone_grid(6, 3)
territorial <- calculate_territorial_control(events, "Barcelona", "Real Madrid", zones)
cat("Territorial Control:\n")
cat(sprintf("Barcelona: %.1f%%\n", territorial$overall$team1_territory * 100))
cat(sprintf("Real Madrid: %.1f%%\n", territorial$overall$team2_territory * 100))
cat(sprintf("Barcelona Field Tilt: %.1f%%\n", territorial$overall$field_tilt_team1 * 100))Territorial Control:
Barcelona: 58.3%
Real Madrid: 41.7%
Barcelona Field Tilt: 62.1%Voronoi Diagrams for Space Control
Voronoi diagrams partition the pitch into regions where each player controls the space closest to them. This is fundamental for understanding space occupation and is used in advanced pitch control models.
# Python: Creating Voronoi Diagrams
import numpy as np
import pandas as pd
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
def create_voronoi_frame(player_positions, pitch_bounds=(0, 120, 0, 80)):
"""Create Voronoi diagram from player positions."""
# Add boundary points to ensure proper tessellation
x_min, x_max, y_min, y_max = pitch_bounds
boundary_points = np.array([
[x_min - 10, y_min - 10], [x_max + 10, y_min - 10],
[x_min - 10, y_max + 10], [x_max + 10, y_max + 10]
])
# Combine player positions with boundary
points = np.array(list(zip(player_positions["x"], player_positions["y"])))
all_points = np.vstack([points, boundary_points])
# Calculate Voronoi
vor = Voronoi(all_points)
return vor, points
def plot_voronoi_pitch(player_positions, pitch_bounds=(0, 120, 0, 80)):
"""Plot Voronoi diagram on pitch."""
vor, points = create_voronoi_frame(player_positions, pitch_bounds)
x_min, x_max, y_min, y_max = pitch_bounds
fig, ax = plt.subplots(figsize=(14, 9))
ax.set_facecolor("#1a472a")
# Plot Voronoi regions
for i, region_idx in enumerate(vor.point_region[:len(player_positions)]):
if region_idx == -1:
continue
region = vor.regions[region_idx]
if -1 in region or len(region) == 0:
continue
# Get vertices and clip to pitch
vertices = np.array([vor.vertices[v] for v in region])
vertices[:, 0] = np.clip(vertices[:, 0], x_min, x_max)
vertices[:, 1] = np.clip(vertices[:, 1], y_min, y_max)
# Color by team
team = player_positions.iloc[i]["team"]
color = "#3498db" if team == "Home" else "#e74c3c"
polygon = Polygon(vertices, alpha=0.3, facecolor=color, edgecolor="white")
ax.add_patch(polygon)
# Plot player positions
for team, color, marker in [("Home", "#3498db", "o"), ("Away", "#e74c3c", "s")]:
team_players = player_positions[player_positions["team"] == team]
ax.scatter(team_players["x"], team_players["y"],
c=color, s=200, marker=marker, edgecolor="white",
linewidth=2, label=team, zorder=5)
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
ax.set_aspect("equal")
ax.legend()
ax.set_title("Voronoi Space Control", fontsize=14)
plt.show()
# Example freeze frame
freeze_frame = pd.DataFrame({
"player": ["GK1", "DEF1", "DEF2", "MID1", "MID2", "FWD1",
"GK2", "DEF3", "DEF4", "MID3", "MID4", "FWD2"],
"x": [5, 25, 30, 45, 50, 75, 115, 90, 85, 70, 65, 45],
"y": [40, 25, 55, 35, 50, 40, 40, 55, 25, 60, 20, 40],
"team": ["Home"]*6 + ["Away"]*6
})
plot_voronoi_pitch(freeze_frame)# R: Creating Voronoi Diagrams
library(tidyverse)
library(ggvoronoi)
library(deldir)
create_voronoi_frame <- function(player_positions, pitch_bounds = c(0, 120, 0, 80)) {
# Player positions at a moment in time
# Input: dataframe with player, x, y, team columns
# Calculate Voronoi tessellation
voronoi <- deldir(
x = player_positions$x,
y = player_positions$y,
rw = pitch_bounds # Bounding rectangle
)
# Extract tiles
tiles <- tile.list(voronoi)
# Convert to dataframe for plotting
tile_df <- map_df(seq_along(tiles), function(i) {
tile <- tiles[[i]]
tibble(
player = player_positions$player[i],
team = player_positions$team[i],
x = tile$x,
y = tile$y,
tile_id = i
)
})
return(tile_df)
}
# Example: Create Voronoi from freeze frame data
freeze_frame <- data.frame(
player = c("GK1", "DEF1", "DEF2", "MID1", "MID2", "FWD1",
"GK2", "DEF3", "DEF4", "MID3", "MID4", "FWD2"),
x = c(5, 25, 30, 45, 50, 75, 115, 90, 85, 70, 65, 45),
y = c(40, 25, 55, 35, 50, 40, 40, 55, 25, 60, 20, 40),
team = c(rep("Home", 6), rep("Away", 6))
)
voronoi_tiles <- create_voronoi_frame(freeze_frame)
# Calculate space controlled by each team
space_control <- voronoi_tiles %>%
group_by(tile_id, team) %>%
summarize(
area = abs(sum(x * lead(y, default = first(y)) -
lead(x, default = first(x)) * y) / 2),
.groups = "drop"
) %>%
group_by(team) %>%
summarize(total_area = sum(area))
print(space_control)Spatial Clustering
Spatial clustering identifies natural groupings of actions on the pitch, revealing tactical patterns like preferred shooting zones, passing lanes, or defensive blocks.
# Python: Spatial Clustering of Actions
import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
import matplotlib.pyplot as plt
def cluster_shots(shots, eps=8, min_samples=3):
"""Cluster shot locations using DBSCAN."""
# Extract coordinates
coords = shots[["location_x", "location_y"]].values
# Run DBSCAN
clustering = DBSCAN(eps=eps, min_samples=min_samples)
shots = shots.copy()
shots["cluster"] = clustering.fit_predict(coords)
# Summarize clusters
cluster_summary = shots[shots["cluster"] >= 0].groupby("cluster").agg(
n_shots=("index", "count"),
n_goals=("shot_outcome", lambda x: (x == "Goal").sum()),
avg_xg=("shot_statsbomb_xg", "mean"),
center_x=("location_x", "mean"),
center_y=("location_y", "mean")
).reset_index()
cluster_summary["conversion_rate"] = (
cluster_summary["n_goals"] / cluster_summary["n_shots"]
)
# Label clusters
def label_cluster(row):
if row["center_x"] > 102:
if 30 < row["center_y"] < 50:
return "Central Box"
elif row["center_y"] <= 30:
return "Right Side Box"
else:
return "Left Side Box"
elif row["center_x"] > 80:
return "Edge of Box"
else:
return "Long Range"
cluster_summary["cluster_label"] = cluster_summary.apply(label_cluster, axis=1)
return {"shots": shots, "clusters": cluster_summary}
# Cluster team shots
shots_data = events[(events["type"] == "Shot") & (events["team"] == "Barcelona")]
shot_clusters = cluster_shots(shots_data, eps=6, min_samples=2)
print("Shot Clusters:")
print(shot_clusters["clusters"])
# Visualize clusters
def plot_shot_clusters(shot_data, cluster_summary):
"""Plot shot clusters on pitch."""
fig, ax = plt.subplots(figsize=(12, 8))
ax.set_facecolor("#1a472a")
# Plot shots colored by cluster
scatter = ax.scatter(
shot_data["location_x"],
shot_data["location_y"],
c=shot_data["cluster"],
cmap="tab10",
s=100,
alpha=0.7,
edgecolors="white"
)
# Plot cluster centers
for _, cluster in cluster_summary.iterrows():
ax.scatter(cluster["center_x"], cluster["center_y"],
marker="X", s=300, c="yellow", edgecolors="black",
linewidths=2, zorder=10)
ax.annotate(cluster["cluster_label"],
(cluster["center_x"], cluster["center_y"]),
textcoords="offset points", xytext=(10, 10),
fontsize=9, color="white")
ax.set_xlim(60, 120)
ax.set_ylim(0, 80)
ax.set_aspect("equal")
ax.set_title("Shot Clusters", fontsize=14)
plt.show()
plot_shot_clusters(shot_clusters["shots"], shot_clusters["clusters"])# R: Spatial Clustering of Actions
library(tidyverse)
library(dbscan)
library(cluster)
# DBSCAN clustering for shot locations
cluster_shots <- function(shots, eps = 8, min_pts = 3) {
# Extract coordinates
coords <- shots %>%
select(location_x, location_y) %>%
as.matrix()
# Run DBSCAN
clustering <- dbscan(coords, eps = eps, minPts = min_pts)
shots$cluster <- clustering$cluster
# Summarize clusters
cluster_summary <- shots %>%
filter(cluster > 0) %>% # Exclude noise (cluster 0)
group_by(cluster) %>%
summarize(
n_shots = n(),
n_goals = sum(outcome == "Goal"),
avg_xg = mean(xg, na.rm = TRUE),
center_x = mean(location_x),
center_y = mean(location_y),
.groups = "drop"
) %>%
mutate(
conversion_rate = n_goals / n_shots,
cluster_label = case_when(
center_x > 102 & center_y > 30 & center_y < 50 ~ "Central Box",
center_x > 102 & center_y <= 30 ~ "Right Side Box",
center_x > 102 & center_y >= 50 ~ "Left Side Box",
center_x <= 102 & center_x > 80 ~ "Edge of Box",
TRUE ~ "Long Range"
)
)
return(list(shots = shots, clusters = cluster_summary))
}
# Cluster team shots
shot_clusters <- cluster_shots(
events %>% filter(type == "Shot", team == "Barcelona"),
eps = 6,
min_pts = 2
)
print(shot_clusters$clusters)Shot Clusters:
cluster n_shots n_goals avg_xg center_x center_y conversion_rate cluster_label
0 0 12 3 0.152 108.2 38.5 0.250 Central Box
1 1 6 1 0.087 105.1 22.3 0.167 Right Side Box
2 2 5 0 0.065 92.4 40.1 0.000 Edge of BoxPractice Exercises
Task: Create a side-by-side comparison of touch heat maps for two players in the same position.
Requirements:
- Load event data for two full-backs from the same league
- Create kernel density heat maps for each player
- Display them side by side with the same color scale
- Calculate and display metrics: average touch position, spread
Task: Analyze passing patterns between pitch zones for a team.
Requirements:
- Create a 6x3 zone grid
- Calculate passes between each zone pair
- Visualize as a flow diagram or chord diagram
- Identify the most common passing lanes
Task: Using tracking data, analyze how space control changes during a possession sequence.
Requirements:
- Load tracking data for a possession sequence
- Calculate Voronoi diagrams at each frame
- Track how each team's controlled area changes
- Identify the moment of maximum space creation
Pitch Control Models
Pitch control models extend Voronoi analysis by incorporating player velocities, reaction times, and physical capabilities to estimate which team is most likely to reach each point on the pitch first. This provides a more realistic representation of space control.
Key Concept: From Static to Dynamic
While Voronoi diagrams assume all players have equal ability to reach any point, pitch control models account for current velocity, maximum speed, and time to intercept. This makes them far more accurate for real-time tactical analysis.
Components of Pitch Control
Player Velocity
Current speed and direction affect how quickly a player can reach different areas of the pitch.
Reaction Time
Time delay before a player can change direction or accelerate towards the ball.
Physical Parameters
Maximum speed, acceleration, and deceleration capabilities vary by player.
# Python: Pitch Control Model Implementation
import numpy as np
import pandas as pd
from scipy.ndimage import gaussian_filter
import matplotlib.pyplot as plt
class PitchControlModel:
"""
Pitch control model based on Spearman et al. (2017).
Estimates probability of each team controlling space.
"""
def __init__(self, max_speed=5.0, reaction_time=0.7,
sigma=0.45, time_to_control=0.5):
self.max_speed = max_speed
self.reaction_time = reaction_time
self.sigma = sigma
self.time_to_control = time_to_control
def time_to_intercept(self, player_x, player_y, player_vx, player_vy,
target_x, target_y):
"""Calculate time for player to reach target location."""
# Distance to target
dx = target_x - player_x
dy = target_y - player_y
distance = np.sqrt(dx**2 + dy**2)
if distance > 0:
# Direction to target
dir_x = dx / distance
dir_y = dy / distance
# Velocity component towards target
velocity_towards = player_vx * dir_x + player_vy * dir_y
else:
velocity_towards = 0
# Effective speed (accounting for current momentum)
effective_speed = max(velocity_towards, 0) + self.max_speed * 0.7
# Total time = reaction + movement
time = self.reaction_time + distance / effective_speed
return time
def control_probability(self, target_x, target_y,
team_players, opponent_players):
"""Calculate control probability at a single point."""
# Get minimum intercept time for each team
team_times = [
self.time_to_intercept(p["x"], p["y"], p["vx"], p["vy"],
target_x, target_y)
for p in team_players
]
opp_times = [
self.time_to_intercept(p["x"], p["y"], p["vx"], p["vy"],
target_x, target_y)
for p in opponent_players
]
min_team_time = min(team_times)
min_opp_time = min(opp_times)
# Logistic function for probability
team_prob = 1 / (1 + np.exp(min_team_time / self.sigma))
opp_prob = 1 / (1 + np.exp(min_opp_time / self.sigma))
# Normalize
total = team_prob + opp_prob
control = team_prob / total if total > 0 else 0.5
return control
def generate_surface(self, team_players, opponent_players,
grid_size=50, pitch_length=120, pitch_width=80):
"""Generate full pitch control surface."""
x_grid = np.linspace(0, pitch_length, grid_size)
y_grid = np.linspace(0, pitch_width, grid_size)
control_surface = np.zeros((grid_size, grid_size))
for i, x in enumerate(x_grid):
for j, y in enumerate(y_grid):
control_surface[j, i] = self.control_probability(
x, y, team_players, opponent_players
)
# Apply smoothing
control_surface = gaussian_filter(control_surface, sigma=1)
return x_grid, y_grid, control_surface
def plot_pitch_control(self, team_players, opponent_players):
"""Visualize pitch control surface."""
x_grid, y_grid, surface = self.generate_surface(
team_players, opponent_players
)
fig, ax = plt.subplots(figsize=(14, 9))
# Plot control surface
im = ax.contourf(x_grid, y_grid, surface, levels=20,
cmap="RdBu", vmin=0, vmax=1, alpha=0.8)
plt.colorbar(im, ax=ax, label="Control Probability (Team)")
# Plot players
for p in team_players:
ax.scatter(p["x"], p["y"], c="blue", s=200,
edgecolors="white", linewidth=2, zorder=5)
ax.arrow(p["x"], p["y"], p["vx"]*2, p["vy"]*2,
head_width=1, head_length=0.5, fc="blue", ec="blue")
for p in opponent_players:
ax.scatter(p["x"], p["y"], c="red", s=200,
edgecolors="white", linewidth=2, zorder=5)
ax.arrow(p["x"], p["y"], p["vx"]*2, p["vy"]*2,
head_width=1, head_length=0.5, fc="red", ec="red")
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
ax.set_aspect("equal")
ax.set_title("Pitch Control Model", fontsize=14)
# Calculate summary stats
team_control = np.mean(surface)
attacking_control = np.mean(surface[:, x_grid > 80])
ax.text(60, -5, f"Overall: {team_control:.1%} | Attacking Third: {attacking_control:.1%}",
ha="center", fontsize=10)
return fig
# Example usage
model = PitchControlModel()
team_players = [
{"x": 5, "y": 40, "vx": 0, "vy": 0},
{"x": 25, "y": 20, "vx": 2, "vy": 1},
{"x": 30, "y": 60, "vx": 1, "vy": -1},
{"x": 45, "y": 40, "vx": 3, "vy": 0},
{"x": 55, "y": 25, "vx": 4, "vy": 2},
{"x": 70, "y": 50, "vx": 3, "vy": -1},
]
opponent_players = [
{"x": 115, "y": 40, "vx": 0, "vy": 0},
{"x": 95, "y": 55, "vx": -3, "vy": 1},
{"x": 90, "y": 25, "vx": -2, "vy": -1},
{"x": 75, "y": 45, "vx": -4, "vy": 0},
{"x": 65, "y": 60, "vx": -2, "vy": 1},
{"x": 50, "y": 35, "vx": -1, "vy": -2},
]
fig = model.plot_pitch_control(team_players, opponent_players)
plt.show()# R: Pitch Control Model Implementation
library(tidyverse)
# Physical parameters
PARAMS <- list(
max_speed = 5.0, # m/s maximum player speed
reaction_time = 0.7, # seconds reaction time
avg_ball_speed = 15.0, # m/s average ball speed
time_to_control = 0.5, # seconds to control ball once reached
sigma = 0.45 # uncertainty parameter
)
# Calculate time for player to reach target location
time_to_intercept <- function(player_x, player_y, player_vx, player_vy,
target_x, target_y, params = PARAMS) {
# Distance to target
dx <- target_x - player_x
dy <- target_y - player_y
distance <- sqrt(dx^2 + dy^2)
# Account for current velocity
# Project velocity onto direction to target
if (distance > 0) {
dir_x <- dx / distance
dir_y <- dy / distance
velocity_towards <- player_vx * dir_x + player_vy * dir_y
} else {
velocity_towards <- 0
}
# Time = reaction_time + time to cover distance
# Using kinematic equation with current velocity
effective_speed <- max(velocity_towards, 0) + params$max_speed * 0.7
time <- params$reaction_time + distance / effective_speed
return(time)
}
# Calculate pitch control probability at a point
pitch_control_at_point <- function(target_x, target_y,
team_positions, opponent_positions,
params = PARAMS) {
# Calculate intercept times for all players
team_times <- map_dbl(1:nrow(team_positions), function(i) {
time_to_intercept(
team_positions$x[i], team_positions$y[i],
team_positions$vx[i], team_positions$vy[i],
target_x, target_y, params
)
})
opponent_times <- map_dbl(1:nrow(opponent_positions), function(i) {
time_to_intercept(
opponent_positions$x[i], opponent_positions$y[i],
opponent_positions$vx[i], opponent_positions$vy[i],
target_x, target_y, params
)
})
# Convert times to probabilities using logistic function
team_prob <- 1 / (1 + exp(min(team_times) / params$sigma))
opponent_prob <- 1 / (1 + exp(min(opponent_times) / params$sigma))
# Normalize
total <- team_prob + opponent_prob
control_prob <- team_prob / total
return(control_prob)
}
# Generate pitch control surface
generate_pitch_control <- function(team_pos, opp_pos,
grid_size = 50, params = PARAMS) {
# Create grid
x_grid <- seq(0, 120, length.out = grid_size)
y_grid <- seq(0, 80, length.out = grid_size)
# Calculate control at each point
control_surface <- expand_grid(x = x_grid, y = y_grid) %>%
rowwise() %>%
mutate(
control = pitch_control_at_point(x, y, team_pos, opp_pos, params)
) %>%
ungroup()
return(control_surface)
}
# Example: Calculate pitch control
team_positions <- tibble(
player = paste0("T", 1:11),
x = c(5, 20, 25, 30, 35, 45, 50, 55, 70, 75, 90),
y = c(40, 20, 60, 35, 50, 40, 25, 55, 30, 50, 40),
vx = c(0, 2, 1, 3, 2, 4, 3, 2, 5, 4, 3),
vy = c(0, 1, -1, 0, 2, -1, 2, -2, 1, 0, 1)
)
opponent_positions <- tibble(
player = paste0("O", 1:11),
x = c(115, 100, 95, 90, 85, 75, 70, 65, 50, 45, 30),
y = c(40, 55, 25, 45, 35, 60, 20, 45, 50, 30, 40),
vx = c(0, -3, -2, -4, -3, -2, -1, -3, -2, -1, 0),
vy = c(0, 1, -1, 0, -2, 1, -1, 2, 0, 1, 0)
)
pitch_control <- generate_pitch_control(team_positions, opponent_positions)
# Summary stats
cat("Team Total Control:", mean(pitch_control$control) * 100, "%\n")
cat("Attacking Third Control:",
mean(pitch_control$control[pitch_control$x > 80]) * 100, "%\n")Team Total Control: 54.2%
Attacking Third Control: 38.7%Expected Possession Value from Pitch Control
Pitch control can be combined with expected possession value (EPV) models to estimate the value of different passing options.
# Python: EPV from Pitch Control
import numpy as np
import pandas as pd
class EPVModel:
"""Expected Possession Value model integrated with pitch control."""
def __init__(self, grid_size=50, pitch_length=120, pitch_width=80):
self.grid_size = grid_size
self.pitch_length = pitch_length
self.pitch_width = pitch_width
self.epv_surface = self._create_epv_surface()
def _create_epv_surface(self):
"""Create base EPV surface."""
x = np.linspace(0, self.pitch_length, self.grid_size)
y = np.linspace(0, self.pitch_width, self.grid_size)
xx, yy = np.meshgrid(x, y)
# Distance to goal
goal_x, goal_y = 120, 40
distance = np.sqrt((goal_x - xx)**2 + (goal_y - yy)**2)
# Centrality (higher in middle)
centrality = 1 - np.abs(yy - 40) / 40
# Base EPV
epv = np.maximum(0, 0.1 - distance / 1500) * (1 + centrality * 0.5)
# Boost in penalty box
in_box = (xx > 102) & (yy > 18) & (yy < 62)
epv[in_box] *= 2
return epv
def get_epv(self, x, y):
"""Get EPV at specific location."""
i = int(np.clip(x / self.pitch_length * (self.grid_size - 1),
0, self.grid_size - 1))
j = int(np.clip(y / self.pitch_width * (self.grid_size - 1),
0, self.grid_size - 1))
return self.epv_surface[j, i]
def calculate_controlled_epv(self, pitch_control_surface):
"""Calculate EPV weighted by pitch control."""
controlled_epv = pitch_control_surface * self.epv_surface
return np.sum(controlled_epv) / controlled_epv.size
def evaluate_pass_options(self, ball_pos, targets,
pitch_control_model, team_players, opp_players):
"""Evaluate potential pass targets using EPV and control."""
results = []
for target in targets:
# Control probability at target
control = pitch_control_model.control_probability(
target["x"], target["y"], team_players, opp_players
)
# EPV at target
epv = self.get_epv(target["x"], target["y"])
# Pass success probability (distance-based)
pass_dist = np.sqrt((target["x"] - ball_pos["x"])**2 +
(target["y"] - ball_pos["y"])**2)
pass_success = 1 / (1 + np.exp((pass_dist - 25) / 10))
# Expected value
expected_value = control * epv * pass_success
results.append({
"target_x": target["x"],
"target_y": target["y"],
"control_prob": control,
"epv": epv,
"pass_success": pass_success,
"expected_value": expected_value
})
return pd.DataFrame(results).sort_values("expected_value", ascending=False)
# Example: Evaluate pass options
epv_model = EPVModel()
pc_model = PitchControlModel()
ball_pos = {"x": 60, "y": 40}
pass_targets = [
{"x": 70, "y": 30},
{"x": 75, "y": 50},
{"x": 85, "y": 40},
{"x": 65, "y": 60},
]
pass_evaluation = epv_model.evaluate_pass_options(
ball_pos, pass_targets, pc_model, team_players, opponent_players
)
print("Pass Options Ranked by Expected Value:")
print(pass_evaluation)# R: EPV from Pitch Control
library(tidyverse)
# Simplified EPV grid (pre-calculated)
create_epv_grid <- function(grid_size = 50) {
x_grid <- seq(0, 120, length.out = grid_size)
y_grid <- seq(0, 80, length.out = grid_size)
expand_grid(x = x_grid, y = y_grid) %>%
mutate(
# EPV increases towards goal, higher in central areas
distance_to_goal = sqrt((120 - x)^2 + (40 - y)^2),
centrality = 1 - abs(y - 40) / 40,
base_epv = pmax(0, 0.1 - distance_to_goal / 1500) * (1 + centrality * 0.5),
# Boost in box
in_box = x > 102 & y > 18 & y < 62,
epv = ifelse(in_box, base_epv * 2, base_epv)
)
}
epv_grid <- create_epv_grid()
# Calculate controlled EPV
calculate_controlled_epv <- function(pitch_control, epv_grid) {
# Join pitch control with EPV
combined <- pitch_control %>%
left_join(epv_grid %>% select(x, y, epv), by = c("x", "y"))
# Controlled EPV = control probability * EPV at that location
controlled_epv <- combined %>%
mutate(controlled_epv = control * epv)
# Total controlled EPV
total_controlled <- sum(controlled_epv$controlled_epv) / nrow(controlled_epv)
return(list(
surface = controlled_epv,
total = total_controlled
))
}
# Evaluate pass options using pitch control
evaluate_pass_options <- function(ball_pos, target_positions,
team_pos, opp_pos, epv_grid) {
pass_values <- map_df(1:nrow(target_positions), function(i) {
target <- target_positions[i, ]
# Estimate control at target after pass
control_at_target <- pitch_control_at_point(
target$x, target$y, team_pos, opp_pos
)
# Get EPV at target
epv_at_target <- epv_grid %>%
filter(abs(x - target$x) < 2, abs(y - target$y) < 2) %>%
summarize(epv = mean(epv)) %>%
pull(epv)
# Pass distance and risk
pass_dist <- sqrt((target$x - ball_pos$x)^2 +
(target$y - ball_pos$y)^2)
pass_success_prob <- 1 / (1 + exp((pass_dist - 25) / 10))
tibble(
target_id = i,
target_x = target$x,
target_y = target$y,
control_prob = control_at_target,
epv_at_target = epv_at_target,
pass_success = pass_success_prob,
expected_value = control_at_target * epv_at_target * pass_success_prob
)
})
return(pass_values %>% arrange(desc(expected_value)))
}Spatial Passing Networks
Combining spatial analysis with network analysis reveals how teams connect different areas of the pitch. Spatial passing networks show not just who passes to whom, but where on the pitch these connections occur.
# Python: Spatial Passing Networks
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
def create_spatial_pass_network(passes, zones):
"""Create a network of passes between pitch zones."""
n_x = zones["x_zone"].max()
n_y = zones["y_zone"].max()
# Assign zones
passes = passes.copy()
passes["origin_zone"] = (
np.ceil(passes["location_x"] / (120 / n_x)).clip(1, n_x).astype(int) * n_y +
np.ceil(passes["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
)
passes["dest_zone"] = (
np.ceil(passes["pass_end_x"] / (120 / n_x)).clip(1, n_x).astype(int) * n_y +
np.ceil(passes["pass_end_y"] / (80 / n_y)).clip(1, n_y).astype(int)
)
# Edge list
edges = passes.groupby(["origin_zone", "dest_zone"]).agg(
n_passes=("id", "count"),
success_rate=("pass_outcome", lambda x: (x.isna()).mean()),
progressive=("pass_end_x", lambda x: (x > passes.loc[x.index, "location_x"]).mean())
).reset_index()
edges = edges[edges["n_passes"] >= 3]
# Create NetworkX graph
G = nx.DiGraph()
for _, row in edges.iterrows():
G.add_edge(
row["origin_zone"], row["dest_zone"],
weight=row["n_passes"],
success_rate=row["success_rate"],
progressive=row["progressive"]
)
# Add node positions
for zone_id in G.nodes():
zone = zones[zones["zone_id"] == zone_id].iloc[0]
G.nodes[zone_id]["x"] = zone["x_center"]
G.nodes[zone_id]["y"] = zone["y_center"]
return G, edges
def plot_spatial_pass_network(G, zones):
"""Visualize spatial passing network on pitch."""
fig, ax = plt.subplots(figsize=(14, 9))
ax.set_facecolor("#1a472a")
# Draw pitch markings
ax.plot([60, 60], [0, 80], "w-", linewidth=1, alpha=0.5)
ax.add_patch(plt.Rectangle((0, 18), 18, 44, fill=False,
edgecolor="white", alpha=0.5))
ax.add_patch(plt.Rectangle((102, 18), 18, 44, fill=False,
edgecolor="white", alpha=0.5))
# Get edge weights for scaling
weights = [G[u][v]["weight"] for u, v in G.edges()]
max_weight = max(weights) if weights else 1
# Draw edges
for u, v in G.edges():
weight = G[u][v]["weight"]
x1, y1 = G.nodes[u]["x"], G.nodes[u]["y"]
x2, y2 = G.nodes[v]["x"], G.nodes[v]["y"]
# Line thickness based on pass count
linewidth = 1 + (weight / max_weight) * 5
# Color based on progressive or not
color = "#FFD700" if G[u][v]["progressive"] > 0.5 else "#87CEEB"
ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
arrowprops=dict(arrowstyle="->", color=color,
lw=linewidth, alpha=0.7))
# Draw nodes
node_sizes = [300 * (G.degree(n) + 1) for n in G.nodes()]
xs = [G.nodes[n]["x"] for n in G.nodes()]
ys = [G.nodes[n]["y"] for n in G.nodes()]
ax.scatter(xs, ys, s=node_sizes, c="#FFFFFF",
edgecolors="#333333", linewidth=2, zorder=5)
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
ax.set_aspect("equal")
ax.set_title("Spatial Passing Network", fontsize=14, color="white")
ax.axis("off")
return fig
# Create and plot network
zones = create_zone_grid(6, 4)
passes = events[(events["type"] == "Pass") & (events["team"] == "Barcelona")]
G, edges = create_spatial_pass_network(passes, zones)
print("Top Zone-to-Zone Connections:")
print(edges.nlargest(10, "n_passes"))
fig = plot_spatial_pass_network(G, zones)
plt.show()# R: Spatial Passing Networks
library(tidyverse)
library(igraph)
create_spatial_pass_network <- function(passes, zones) {
# Assign zones to pass origins and destinations
passes_zoned <- passes %>%
mutate(
origin_zone = ceiling(location_x / (120 / max(zones$x_zone))) *
max(zones$y_zone) +
ceiling(location_y / (80 / max(zones$y_zone))),
dest_zone = ceiling(pass_end_x / (120 / max(zones$x_zone))) *
max(zones$y_zone) +
ceiling(pass_end_y / (80 / max(zones$y_zone)))
)
# Create edge list (zone to zone passes)
zone_edges <- passes_zoned %>%
filter(!is.na(origin_zone), !is.na(dest_zone)) %>%
count(origin_zone, dest_zone, name = "n_passes") %>%
filter(n_passes >= 3) # Minimum passes threshold
# Calculate additional metrics
zone_edges <- zone_edges %>%
left_join(
passes_zoned %>%
group_by(origin_zone, dest_zone) %>%
summarize(
success_rate = mean(pass_outcome == "Complete", na.rm = TRUE),
avg_angle = mean(atan2(pass_end_y - location_y,
pass_end_x - location_x) * 180 / pi),
progressive = mean(pass_end_x > location_x),
.groups = "drop"
),
by = c("origin_zone", "dest_zone")
)
# Create igraph object
g <- graph_from_data_frame(zone_edges, directed = TRUE)
# Add zone attributes
V(g)$zone_id <- as.numeric(V(g)$name)
V(g)$x_center <- zones$x_center[match(V(g)$zone_id, zones$zone_id)]
V(g)$y_center <- zones$y_center[match(V(g)$zone_id, zones$zone_id)]
return(list(
graph = g,
edges = zone_edges,
node_metrics = tibble(
zone_id = as.numeric(V(g)$name),
out_degree = degree(g, mode = "out"),
in_degree = degree(g, mode = "in"),
betweenness = betweenness(g),
pagerank = page_rank(g)$vector
)
))
}
# Create network
zones <- create_zone_grid(6, 4)
passes <- events %>% filter(type == "Pass", team == "Barcelona")
pass_network <- create_spatial_pass_network(passes, zones)
# Top passing connections
top_connections <- pass_network$edges %>%
arrange(desc(n_passes)) %>%
head(10)
print("Top Zone-to-Zone Passing Connections:")
print(top_connections)Zone Transition Analysis
Understanding how teams progress the ball through different zones reveals their build-up patterns and attacking approaches.
# Python: Zone Transition Analysis
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
def analyze_zone_transitions(events, n_zones=6):
"""Analyze how possessions progress through horizontal zones."""
events = events.copy()
# Assign horizontal zones
events["x_zone"] = np.ceil(events["location_x"] / (120 / n_zones)).clip(1, n_zones).astype(int)
# Track zone changes within possessions
events["prev_zone"] = events.groupby("possession")["x_zone"].shift(1)
events["zone_changed"] = events["x_zone"] != events["prev_zone"]
transitions = events[events["zone_changed"] & events["prev_zone"].notna()].copy()
transitions["zone_from"] = transitions["prev_zone"].astype(int)
transitions["zone_to"] = transitions["x_zone"].astype(int)
# Transition matrix
transition_matrix = pd.crosstab(
transitions["zone_from"],
transitions["zone_to"],
margins=True
)
# Direction analysis
transitions["direction"] = np.where(
transitions["zone_to"] > transitions["zone_from"], "Progressive",
np.where(transitions["zone_to"] < transitions["zone_from"], "Regressive", "Lateral")
)
direction_summary = transitions["direction"].value_counts(normalize=True) * 100
return {
"matrix": transition_matrix,
"direction_summary": direction_summary,
"transitions": transitions
}
def plot_transition_sankey(transitions):
"""Plot Sankey diagram of zone transitions."""
from matplotlib.sankey import Sankey
# Simplified flow visualization
fig, ax = plt.subplots(figsize=(14, 8))
# Aggregate flows
flows = transitions["transitions"].groupby(
["zone_from", "zone_to"]
).size().reset_index(name="count")
# Create heatmap of transitions
matrix = transitions["matrix"].iloc[:-1, :-1] # Remove margins
sns.heatmap(matrix, annot=True, fmt="d", cmap="YlOrRd", ax=ax)
ax.set_xlabel("Zone To")
ax.set_ylabel("Zone From")
ax.set_title("Zone Transition Matrix")
return fig
# Analyze transitions
events_team = events[events["team"] == "Barcelona"]
transitions = analyze_zone_transitions(events_team, n_zones=6)
print("Zone Transition Matrix:")
print(transitions["matrix"])
print("\nDirection Summary:")
print(transitions["direction_summary"])
fig = plot_transition_sankey(transitions)
plt.show()# R: Zone Transition Analysis
library(tidyverse)
library(sankey)
analyze_zone_transitions <- function(possessions, zones) {
# Track ball progression through zones during possessions
transitions <- possessions %>%
group_by(possession_id) %>%
arrange(minute, second) %>%
mutate(
x_zone = ceiling(location_x / (120 / max(zones$x_zone))),
zone_change = x_zone != lag(x_zone),
zone_from = lag(x_zone),
zone_to = x_zone
) %>%
filter(!is.na(zone_from), zone_change) %>%
ungroup()
# Summarize transitions
transition_matrix <- transitions %>%
count(zone_from, zone_to, name = "count") %>%
pivot_wider(names_from = zone_to, values_from = count, values_fill = 0)
# Progressive vs regressive transitions
progression_summary <- transitions %>%
mutate(
direction = case_when(
zone_to > zone_from ~ "Progressive",
zone_to < zone_from ~ "Regressive",
TRUE ~ "Lateral"
)
) %>%
count(direction) %>%
mutate(pct = n / sum(n) * 100)
# Success rate by transition type
transition_success <- transitions %>%
mutate(
successful = lead(team) == team, # Team kept possession
direction = case_when(
zone_to > zone_from ~ "Progressive",
zone_to < zone_from ~ "Regressive",
TRUE ~ "Lateral"
)
) %>%
group_by(direction) %>%
summarize(
n = n(),
success_rate = mean(successful, na.rm = TRUE) * 100,
.groups = "drop"
)
return(list(
matrix = transition_matrix,
summary = progression_summary,
success = transition_success
))
}
# Analyze transitions
zones <- create_zone_grid(6, 1) # 6 horizontal zones only
possessions <- events %>% filter(team == "Barcelona")
transitions <- analyze_zone_transitions(possessions, zones)
print("Zone Transition Matrix:")
print(transitions$matrix)
print("\nProgression Summary:")
print(transitions$summary)Advanced Spatial Metrics
Beyond basic spatial analysis, advanced metrics can capture complex spatial patterns like compactness, width, and defensive shape.
Measures how tightly grouped a team's players are. Compact teams are harder to play through but may leave space in wide areas.
- Length: Distance between deepest and highest outfield players
- Width: Distance between widest players
- Surface Area: Convex hull of player positions
Analyzes the organization and positioning of defensive units.
- Line Height: Average position of defensive line
- Line Flatness: Variance in defender positions
- Block Solidity: Gaps between defensive lines
# Python: Advanced Spatial Metrics
import numpy as np
import pandas as pd
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist
class TeamShapeAnalyzer:
"""Analyze team shape and structure from player positions."""
def __init__(self, positions):
"""
positions: DataFrame with columns: player, x, y, position
"""
self.positions = positions
self.outfield = positions[positions["position"] != "GK"]
def calculate_basic_shape(self):
"""Calculate basic shape metrics."""
x = self.outfield["x"].values
y = self.outfield["y"].values
metrics = {
"length": x.max() - x.min(),
"width": y.max() - y.min(),
"centroid_x": x.mean(),
"centroid_y": y.mean()
}
# Convex hull area
if len(self.outfield) >= 3:
points = np.column_stack([x, y])
hull = ConvexHull(points)
metrics["surface_area"] = hull.volume # In 2D, volume is area
else:
metrics["surface_area"] = 0
# Compactness (average distance to centroid)
distances = np.sqrt(
(x - metrics["centroid_x"])**2 +
(y - metrics["centroid_y"])**2
)
metrics["compactness"] = distances.mean()
# Stretch index
metrics["stretch_index"] = (
metrics["length"] / metrics["width"]
if metrics["width"] > 0 else 0
)
return metrics
def calculate_defensive_shape(self):
"""Analyze defensive line metrics."""
defenders = self.positions[
self.positions["position"].isin(["CB", "LB", "RB", "LWB", "RWB"])
].sort_values("y")
if len(defenders) < 2:
return {"error": "Not enough defenders"}
metrics = {
"line_height": defenders["x"].mean(),
"line_flatness": defenders["x"].std(),
}
# Gaps between defenders
gaps = np.diff(defenders["y"].values)
metrics["max_gap"] = gaps.max()
metrics["avg_gap"] = gaps.mean()
# Midfield line
midfielders = self.positions[
self.positions["position"].isin(["CM", "CDM", "CAM", "LM", "RM"])
]
if len(midfielders) > 0:
metrics["midfield_height"] = midfielders["x"].mean()
metrics["lines_distance"] = metrics["midfield_height"] - metrics["line_height"]
return metrics
def calculate_pressing_shape(self, opponent_ball_pos):
"""Analyze team shape relative to ball position."""
x = self.outfield["x"].values
y = self.outfield["y"].values
# Distance to ball for each player
ball_distances = np.sqrt(
(x - opponent_ball_pos["x"])**2 +
(y - opponent_ball_pos["y"])**2
)
metrics = {
"nearest_player_dist": ball_distances.min(),
"avg_ball_distance": ball_distances.mean(),
"players_within_10m": (ball_distances < 10).sum(),
"players_within_20m": (ball_distances < 20).sum(),
}
# Press intensity (more players close = higher press)
metrics["press_intensity"] = np.sum(1 / (ball_distances + 1))
return metrics
# Example usage
positions = pd.DataFrame({
"player": [f"P{i}" for i in range(11)],
"x": [5, 25, 28, 32, 30, 45, 48, 55, 70, 75, 85],
"y": [40, 15, 30, 50, 65, 25, 55, 40, 30, 60, 40],
"position": ["GK", "LB", "CB", "CB", "RB", "CM", "CM", "CAM", "LW", "RW", "ST"]
})
analyzer = TeamShapeAnalyzer(positions)
print("Basic Shape Metrics:")
print(analyzer.calculate_basic_shape())
print("\nDefensive Shape Metrics:")
print(analyzer.calculate_defensive_shape())
print("\nPressing Shape (ball at x=90, y=40):")
print(analyzer.calculate_pressing_shape({"x": 90, "y": 40}))# R: Advanced Spatial Metrics
library(tidyverse)
library(sf)
calculate_team_shape <- function(player_positions) {
# Filter to outfield players
outfield <- player_positions %>%
filter(position != "GK")
# Basic shape metrics
length <- max(outfield$x) - min(outfield$x)
width <- max(outfield$y) - min(outfield$y)
# Centroid
centroid_x <- mean(outfield$x)
centroid_y <- mean(outfield$y)
# Convex hull area (team surface)
hull_points <- outfield %>%
select(x, y) %>%
as.matrix()
hull_indices <- chull(hull_points)
hull_area <- abs(sum(
hull_points[hull_indices, 1] *
c(hull_points[hull_indices[-1], 2], hull_points[hull_indices[1], 2]) -
hull_points[hull_indices, 2] *
c(hull_points[hull_indices[-1], 1], hull_points[hull_indices[1], 1])
)) / 2
# Compactness (smaller = more compact)
avg_distance_to_centroid <- mean(sqrt(
(outfield$x - centroid_x)^2 + (outfield$y - centroid_y)^2
))
# Stretch index (ratio of length to width)
stretch_index <- length / width
return(tibble(
length = length,
width = width,
surface_area = hull_area,
centroid_x = centroid_x,
centroid_y = centroid_y,
compactness = avg_distance_to_centroid,
stretch_index = stretch_index
))
}
calculate_defensive_metrics <- function(player_positions) {
# Identify defensive line (usually back 4 or 5)
defenders <- player_positions %>%
filter(position %in% c("CB", "LB", "RB", "LWB", "RWB")) %>%
arrange(x)
# Line height (average x position)
line_height <- mean(defenders$x)
# Line flatness (lower = flatter line)
line_flatness <- sd(defenders$x)
# Gaps between defenders
defenders_sorted <- defenders %>% arrange(y)
gaps <- diff(defenders_sorted$y)
max_gap <- max(gaps)
avg_gap <- mean(gaps)
# Midfield line metrics
midfielders <- player_positions %>%
filter(position %in% c("CM", "CDM", "CAM", "LM", "RM"))
midfield_height <- mean(midfielders$x)
# Distance between lines
line_distance <- midfield_height - line_height
return(tibble(
defensive_line_height = line_height,
line_flatness = line_flatness,
max_defender_gap = max_gap,
avg_defender_gap = avg_gap,
midfield_height = midfield_height,
lines_distance = line_distance
))
}
# Example calculation
team_shape <- calculate_team_shape(team_positions)
defensive_metrics <- calculate_defensive_metrics(team_positions)
print("Team Shape Metrics:")
print(team_shape)
print("\nDefensive Metrics:")
print(defensive_metrics)Case Study: Analyzing Team Playing Style
This case study demonstrates how to combine multiple spatial analysis techniques to build a comprehensive understanding of a team's playing style.
Objective
Analyze Manchester City's playing style using spatial analytics to understand their territorial dominance, passing patterns, and space creation.
# Python: Complete Spatial Analysis Case Study
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
class TeamSpatialAnalyzer:
"""Complete spatial analysis for a team."""
def __init__(self, team_events, team_name):
self.events = team_events
self.team_name = team_name
self.results = {}
def run_full_analysis(self):
"""Run complete spatial analysis suite."""
print(f"\n=== Spatial Analysis: {self.team_name} ===\n")
self._territorial_analysis()
self._passing_analysis()
self._shot_analysis()
self._width_analysis()
return self.results
def _territorial_analysis(self):
"""Analyze territorial metrics."""
print("1. TERRITORIAL METRICS")
print("-" * 25)
avg_x = self.events["location_x"].mean()
avg_y = self.events["location_y"].mean()
print(f"Average Position: x={avg_x:.1f}, y={avg_y:.1f}")
# Actions by third
self.events["third"] = pd.cut(
self.events["location_x"],
bins=[0, 40, 80, 120],
labels=["Defensive", "Middle", "Attacking"]
)
thirds = self.events["third"].value_counts(normalize=True) * 100
print(f"\nActions by Third:")
print(thirds)
self.results["territorial"] = {
"avg_x": avg_x,
"avg_y": avg_y,
"thirds": thirds.to_dict()
}
def _passing_analysis(self):
"""Analyze passing patterns."""
print("\n2. PASSING PATTERNS")
print("-" * 25)
passes = self.events[self.events["type"] == "Pass"].copy()
passes["pass_length"] = np.sqrt(
(passes["pass_end_x"] - passes["location_x"])**2 +
(passes["pass_end_y"] - passes["location_y"])**2
)
passes["is_forward"] = passes["pass_end_x"] > passes["location_x"]
passes["is_progressive"] = (passes["pass_end_x"] - passes["location_x"]) > 10
print(f"Total Passes: {len(passes)}")
print(f"Avg Pass Length: {passes['pass_length'].mean():.1f}m")
print(f"Forward Pass %: {passes['is_forward'].mean()*100:.1f}%")
print(f"Progressive Pass %: {passes['is_progressive'].mean()*100:.1f}%")
self.results["passing"] = {
"total": len(passes),
"avg_length": passes["pass_length"].mean(),
"forward_pct": passes["is_forward"].mean() * 100,
"progressive_pct": passes["is_progressive"].mean() * 100
}
def _shot_analysis(self):
"""Analyze shot locations."""
print("\n3. SHOT ANALYSIS")
print("-" * 25)
shots = self.events[self.events["type"] == "Shot"]
n_shots = len(shots)
avg_x = shots["location_x"].mean()
in_box = (
(shots["location_x"] > 102) &
(shots["location_y"] > 18) &
(shots["location_y"] < 62)
).sum()
print(f"Total Shots: {n_shots}")
print(f"Average Shot Position: x={avg_x:.1f}")
print(f"Shots in Box: {in_box} ({in_box/n_shots*100:.1f}%)")
self.results["shots"] = {
"total": n_shots,
"avg_x": avg_x,
"in_box": in_box,
"in_box_pct": in_box / n_shots * 100 if n_shots > 0 else 0
}
def _width_analysis(self):
"""Analyze pitch width utilization."""
print("\n4. WIDTH UTILIZATION")
print("-" * 25)
self.events["channel"] = pd.cut(
self.events["location_y"],
bins=[0, 20, 30, 50, 60, 80],
labels=["Left Wide", "Left Half-Space", "Central",
"Right Half-Space", "Right Wide"]
)
width = self.events["channel"].value_counts(normalize=True) * 100
print(width)
self.results["width"] = width.to_dict()
def create_summary_dashboard(self):
"""Create visual summary dashboard."""
fig = plt.figure(figsize=(16, 12))
gs = GridSpec(2, 3, figure=fig)
# 1. Heatmap
ax1 = fig.add_subplot(gs[0, :2])
self._plot_heatmap(ax1)
# 2. Thirds bar chart
ax2 = fig.add_subplot(gs[0, 2])
self._plot_thirds(ax2)
# 3. Pass length distribution
ax3 = fig.add_subplot(gs[1, 0])
self._plot_pass_lengths(ax3)
# 4. Width utilization
ax4 = fig.add_subplot(gs[1, 1])
self._plot_width(ax4)
# 5. Shot map
ax5 = fig.add_subplot(gs[1, 2])
self._plot_shots(ax5)
fig.suptitle(f"{self.team_name} - Spatial Analysis Dashboard",
fontsize=16, fontweight="bold")
plt.tight_layout()
return fig
def _plot_heatmap(self, ax):
"""Plot touch heatmap."""
ax.set_facecolor("#1a472a")
ax.hexbin(
self.events["location_x"],
self.events["location_y"],
gridsize=20,
cmap="YlOrRd",
mincnt=1
)
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)
ax.set_title("Touch Heatmap")
ax.set_aspect("equal")
def _plot_thirds(self, ax):
"""Plot actions by thirds."""
thirds = self.events["third"].value_counts()
ax.bar(thirds.index, thirds.values, color=["#2E7D32", "#FFC107", "#D32F2F"])
ax.set_title("Actions by Third")
ax.set_ylabel("Count")
def _plot_pass_lengths(self, ax):
"""Plot pass length distribution."""
passes = self.events[self.events["type"] == "Pass"]
lengths = np.sqrt(
(passes["pass_end_x"] - passes["location_x"])**2 +
(passes["pass_end_y"] - passes["location_y"])**2
)
ax.hist(lengths, bins=30, color="#1976D2", edgecolor="white")
ax.set_title("Pass Length Distribution")
ax.set_xlabel("Length (m)")
def _plot_width(self, ax):
"""Plot width utilization."""
width = self.events["channel"].value_counts()
ax.barh(width.index, width.values, color="#7B1FA2")
ax.set_title("Width Utilization")
def _plot_shots(self, ax):
"""Plot shot locations."""
shots = self.events[self.events["type"] == "Shot"]
ax.set_facecolor("#1a472a")
ax.scatter(
shots["location_x"],
shots["location_y"],
c="yellow",
s=100,
edgecolors="white"
)
ax.set_xlim(60, 120)
ax.set_ylim(0, 80)
ax.set_title("Shot Locations")
ax.set_aspect("equal")
# Run analysis
man_city_events = events[events["team"] == "Manchester City"]
analyzer = TeamSpatialAnalyzer(man_city_events, "Manchester City")
results = analyzer.run_full_analysis()
dashboard = analyzer.create_summary_dashboard()
plt.show()# R: Complete Spatial Analysis Case Study
library(tidyverse)
analyze_team_spatial_style <- function(team_events, team_name) {
cat(sprintf("\n=== Spatial Analysis: %s ===\n\n", team_name))
# 1. TERRITORIAL ANALYSIS
cat("1. TERRITORIAL METRICS\n")
cat("-----------------------\n")
avg_x <- mean(team_events$location_x, na.rm = TRUE)
avg_y <- mean(team_events$location_y, na.rm = TRUE)
# Actions by third
thirds <- team_events %>%
mutate(
third = case_when(
location_x < 40 ~ "Defensive",
location_x < 80 ~ "Middle",
TRUE ~ "Attacking"
)
) %>%
count(third) %>%
mutate(pct = n / sum(n) * 100)
cat(sprintf("Average Position: x=%.1f, y=%.1f\n", avg_x, avg_y))
print(thirds)
# 2. PASSING PATTERNS
cat("\n2. PASSING PATTERNS\n")
cat("-------------------\n")
passes <- team_events %>% filter(type == "Pass")
# Pass direction analysis
passes <- passes %>%
mutate(
pass_length = sqrt((pass_end_x - location_x)^2 +
(pass_end_y - location_y)^2),
pass_angle = atan2(pass_end_y - location_y,
pass_end_x - location_x) * 180 / pi,
is_forward = pass_end_x > location_x,
is_progressive = (pass_end_x - location_x) > 10
)
cat(sprintf("Total Passes: %d\n", nrow(passes)))
cat(sprintf("Avg Pass Length: %.1f m\n", mean(passes$pass_length, na.rm = TRUE)))
cat(sprintf("Forward Pass %%: %.1f%%\n", mean(passes$is_forward) * 100))
cat(sprintf("Progressive Pass %%: %.1f%%\n", mean(passes$is_progressive) * 100))
# Passes by zone
zones <- create_zone_grid(6, 3)
passes_by_origin <- passes %>%
mutate(
zone = ceiling(location_x / 20) * 3 + ceiling(location_y / 26.67)
) %>%
count(zone, name = "passes") %>%
arrange(desc(passes))
cat("\nTop Passing Zones:\n")
print(head(passes_by_origin, 5))
# 3. SHOT LOCATIONS
cat("\n3. SHOT ANALYSIS\n")
cat("-----------------\n")
shots <- team_events %>% filter(type == "Shot")
shot_summary <- shots %>%
summarize(
n_shots = n(),
avg_x = mean(location_x, na.rm = TRUE),
avg_y = mean(location_y, na.rm = TRUE),
shots_in_box = sum(location_x > 102 & location_y > 18 & location_y < 62),
in_box_pct = shots_in_box / n_shots * 100
)
print(shot_summary)
# 4. WIDTH ANALYSIS
cat("\n4. WIDTH UTILIZATION\n")
cat("---------------------\n")
width_analysis <- team_events %>%
mutate(
channel = case_when(
location_y < 20 ~ "Left Wide",
location_y < 30 ~ "Left Half-Space",
location_y < 50 ~ "Central",
location_y < 60 ~ "Right Half-Space",
TRUE ~ "Right Wide"
)
) %>%
count(channel) %>%
mutate(pct = n / sum(n) * 100)
print(width_analysis)
# 5. CREATE HEATMAP DATA
heatmap <- create_heatmap_data(team_events, bandwidth = 8)
return(list(
territorial = thirds,
passing = passes_by_origin,
shots = shot_summary,
width = width_analysis,
heatmap = heatmap
))
}
# Run analysis
man_city_events <- events %>% filter(team == "Manchester City")
analysis <- analyze_team_spatial_style(man_city_events, "Manchester City")Practice Exercises
Task: Create a side-by-side comparison of touch heat maps for two players in the same position.
Requirements:
- Load event data for two full-backs from the same league
- Create kernel density heat maps for each player
- Display them side by side with the same color scale
- Calculate and display metrics: average touch position, spread
Task: Analyze passing patterns between pitch zones for a team.
Requirements:
- Create a 6x3 zone grid
- Calculate passes between each zone pair
- Visualize as a flow diagram or chord diagram
- Identify the most common passing lanes
Task: Using tracking data, analyze how space control changes during a possession sequence.
Requirements:
- Load tracking data for a possession sequence
- Calculate Voronoi diagrams at each frame
- Track how each team's controlled area changes
- Identify the moment of maximum space creation
Task: Implement a pitch control model that accounts for player velocities.
Requirements:
- Use tracking data with player positions and velocities
- Implement time-to-intercept calculations
- Generate pitch control surfaces for multiple frames
- Compare your model to simple Voronoi diagrams
- Analyze how pitch control changes during a counter-attack
Task: Analyze how a team's shape changes throughout a match.
Requirements:
- Calculate team compactness metrics every 5 minutes
- Track defensive line height over time
- Identify periods of high vs low pressing
- Correlate shape metrics with match events (goals, cards)
- Create a timeline visualization of shape evolution
Chapter Summary
Key Takeaways
- Zone systems: Different granularities serve different purposes—use thirds for overview, fine grids for modeling
- Heat maps: Kernel density estimation reveals continuous action distributions beyond discrete zone counts
- Territorial control: Field tilt and zone dominance show where teams actually play, beyond possession percentage
- Voronoi diagrams: Fundamental for understanding space control and are used in advanced pitch control models
- Pitch control models: Extend Voronoi by incorporating velocity and physical parameters for more realistic space analysis
- Spatial clustering: DBSCAN and similar methods reveal natural action groupings and tactical patterns
- Team shape metrics: Compactness, length, width, and defensive line analysis reveal tactical organization
Spatial Analysis Applications
Tactical Analysis
- Formation detection
- Pressing triggers
- Build-up patterns
- Defensive shape
Player Evaluation
- Role identification
- Movement profiles
- Space creation ability
- Defensive coverage
Model Building
- xG model features
- EPV surfaces
- Pass value models
- Pitch control
Key Libraries and Tools
R Libraries
sf- Spatial features and operationsggvoronoi- Voronoi visualizationMASS- Kernel density estimationdbscan- Spatial clustering
Python Libraries
geopandas- Geospatial operationsscipy.spatial- Voronoi, convex hullmplsoccer- Football pitch visualizationsklearn.cluster- DBSCAN, KMeans