Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples
0 of 60 chapters completed (0%)
Learning Objectives
  • Understand spatial analysis concepts in football context
  • Create and analyze pitch zone grids of varying granularity
  • Build heat maps and kernel density estimates for player actions
  • Calculate territorial control and dominance metrics
  • Analyze passing patterns using spatial network analysis
  • Implement Voronoi diagrams for space control visualization
  • Use spatial clustering to identify action hotspots
  • Compare spatial profiles across players and teams

The Spatial Dimension of Football

Football is fundamentally a spatial game. Teams compete for territorial control, create and exploit space, and organize defensively to deny opponents space. Spatial analytics provides the tools to quantify these dynamics and uncover patterns invisible to the naked eye.

Why Spatial Analysis?

Traditional statistics count events but ignore where they happen. A pass completion rate of 85% means different things if those passes are all in the defensive third versus the attacking third. Spatial analysis adds the crucial "where" dimension to football analytics.

Zone Analysis
Divide pitch into actionable zones
Heat Maps
Visualize action density
Voronoi
Space control regions
Clustering
Identify action hotspots
spatial_intro.py
# Python: Introduction to Spatial Data in Football
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt

# Load event data with locations
events = pd.read_csv("match_events.csv")

# Examine spatial distribution
spatial_summary = events.dropna(subset=["location_x", "location_y"]).agg({
    "location_x": ["count", "min", "max", "mean"],
    "location_y": ["min", "max", "mean"]
})
print("Spatial Summary:")
print(spatial_summary)

# Create GeoDataFrame for spatial operations
events_clean = events.dropna(subset=["location_x", "location_y"])
geometry = [Point(xy) for xy in zip(events_clean["location_x"],
                                     events_clean["location_y"])]
events_gdf = gpd.GeoDataFrame(events_clean, geometry=geometry)

# Basic spatial plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.set_facecolor("darkgreen")
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)

# Plot events by type
for event_type in events_gdf["type"].unique()[:5]:
    subset = events_gdf[events_gdf["type"] == event_type]
    ax.scatter(subset["location_x"], subset["location_y"],
               label=event_type, alpha=0.5, s=20)

ax.legend()
ax.set_title("Event Locations")
plt.show()
# R: Introduction to Spatial Data in Football
library(tidyverse)
library(sf)

# Load event data with locations
events <- read_csv("match_events.csv")

# Examine spatial distribution
spatial_summary <- events %>%
    filter(!is.na(location_x), !is.na(location_y)) %>%
    summarize(
        n_events = n(),
        x_min = min(location_x),
        x_max = max(location_x),
        y_min = min(location_y),
        y_max = max(location_y),
        x_mean = mean(location_x),
        y_mean = mean(location_y)
    )

print(spatial_summary)

# Create spatial points object
events_sf <- events %>%
    filter(!is.na(location_x), !is.na(location_y)) %>%
    st_as_sf(coords = c("location_x", "location_y"))

# Basic spatial plot
ggplot() +
    # Pitch outline
    annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
             fill = "darkgreen", alpha = 0.3) +
    geom_sf(data = events_sf, aes(color = type), alpha = 0.5, size = 1) +
    coord_sf() +
    theme_minimal() +
    labs(title = "Event Locations")
Output
Spatial Summary:
           location_x                  location_y
count      1623.000000
min           0.100000                     0.100000
max         119.900000                    79.900000
mean         60.234000                    39.876000

Pitch Zone Analysis

Dividing the pitch into zones allows aggregation and comparison of actions across different areas. The choice of zone system depends on your analysis goals.

Zone System Grid Size Best For
Thirds 3 zones (Defensive, Middle, Attacking) High-level territorial analysis
18-Zone 6x3 grid Tactical zone analysis
Juego de Posición 5 vertical lanes, 4 horizontal bands Positional play analysis
Fine Grid 12x8 or finer Heat maps, xG models
pitch_zones.py
# Python: Creating Pitch Zone Systems
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def create_zone_grid(n_x=6, n_y=3, pitch_length=120, pitch_width=80):
    """Create a grid of pitch zones."""
    zone_width = pitch_length / n_x
    zone_height = pitch_width / n_y

    zones = []
    zone_id = 1
    for x_zone in range(1, n_x + 1):
        for y_zone in range(1, n_y + 1):
            zones.append({
                "zone_id": zone_id,
                "x_zone": x_zone,
                "y_zone": y_zone,
                "x_min": (x_zone - 1) * zone_width,
                "x_max": x_zone * zone_width,
                "y_min": (y_zone - 1) * zone_height,
                "y_max": y_zone * zone_height,
                "x_center": (x_zone - 0.5) * zone_width,
                "y_center": (y_zone - 0.5) * zone_height,
            })
            zone_id += 1

    return pd.DataFrame(zones)

# Create 18-zone grid (6x3)
zones_18 = create_zone_grid(6, 3)

def assign_zones(events, zones):
    """Assign zone IDs to events based on location."""
    n_x = zones["x_zone"].max()
    n_y = zones["y_zone"].max()

    events = events.copy()
    events["x_zone"] = np.ceil(events["location_x"] / (120 / n_x)).clip(1, n_x).astype(int)
    events["y_zone"] = np.ceil(events["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    events["zone_id"] = (events["x_zone"] - 1) * n_y + events["y_zone"]

    return events

# Assign zones to events
events_zoned = assign_zones(events, zones_18)

# Calculate zone statistics
zone_stats = events_zoned.groupby("zone_id").agg(
    n_events=("index", "count"),
    n_passes=("type", lambda x: (x == "Pass").sum()),
    n_shots=("type", lambda x: (x == "Shot").sum()),
).reset_index()

zone_stats = zone_stats.merge(zones_18, on="zone_id")
print(zone_stats)
# R: Creating Pitch Zone Systems
library(tidyverse)

# Define zone systems
create_zone_grid <- function(n_x = 6, n_y = 3, pitch_length = 120, pitch_width = 80) {
    zone_width <- pitch_length / n_x
    zone_height <- pitch_width / n_y

    zones <- expand_grid(
        x_zone = 1:n_x,
        y_zone = 1:n_y
    ) %>%
        mutate(
            zone_id = row_number(),
            x_min = (x_zone - 1) * zone_width,
            x_max = x_zone * zone_width,
            y_min = (y_zone - 1) * zone_height,
            y_max = y_zone * zone_height,
            x_center = (x_min + x_max) / 2,
            y_center = (y_min + y_max) / 2
        )

    return(zones)
}

# Create 18-zone grid (6x3)
zones_18 <- create_zone_grid(6, 3)

# Assign events to zones
assign_zones <- function(events, zones) {
    events %>%
        mutate(
            x_zone = ceiling(location_x / (120 / max(zones$x_zone))),
            y_zone = ceiling(location_y / (80 / max(zones$y_zone))),
            zone_id = (x_zone - 1) * max(zones$y_zone) + y_zone
        ) %>%
        # Clamp to valid zone range
        mutate(
            x_zone = pmin(pmax(x_zone, 1), max(zones$x_zone)),
            y_zone = pmin(pmax(y_zone, 1), max(zones$y_zone))
        )
}

# Assign zones to events
events_zoned <- assign_zones(events, zones_18)

# Calculate zone statistics
zone_stats <- events_zoned %>%
    group_by(zone_id) %>%
    summarize(
        n_events = n(),
        n_passes = sum(type == "Pass"),
        n_shots = sum(type == "Shot"),
        pass_success_rate = mean(type == "Pass" & pass_outcome == "Complete", na.rm = TRUE),
        .groups = "drop"
    ) %>%
    left_join(zones_18, by = "zone_id")

print(zone_stats)
Output
   zone_id  n_events  n_passes  n_shots  x_zone  y_zone  x_min  x_max  ...
0        1        45        32        0       1       1    0.0   20.0  ...
1        2        67        48        0       1       2    0.0   20.0  ...
2        3        52        38        0       1       3    0.0   20.0  ...
3        4        89        62        0       2       1   20.0   40.0  ...
...

Visualizing Zone Statistics

zone_visualization.py
# Python: Visualize Zone Statistics on Pitch
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.colors import LinearSegmentedColormap
import numpy as np

def plot_zone_heatmap(zone_stats, metric="n_events", title="Zone Activity"):
    """Plot zone statistics as a heatmap on the pitch."""
    fig, ax = plt.subplots(figsize=(14, 9))

    # Color normalization
    values = zone_stats[metric].values
    vmin, vmax = values.min(), values.max()

    # Custom colormap (green to orange)
    cmap = LinearSegmentedColormap.from_list("pitch", ["#2E7D32", "#FF5722"])

    # Draw zones
    for _, zone in zone_stats.iterrows():
        color_val = (zone[metric] - vmin) / (vmax - vmin) if vmax > vmin else 0
        rect = Rectangle(
            (zone["x_min"], zone["y_min"]),
            zone["x_max"] - zone["x_min"],
            zone["y_max"] - zone["y_min"],
            facecolor=cmap(color_val),
            edgecolor="white",
            linewidth=2
        )
        ax.add_patch(rect)

        # Zone label
        ax.text(zone["x_center"], zone["y_center"],
                f"{zone[metric]:.0f}",
                ha="center", va="center",
                color="white", fontweight="bold", fontsize=12)

    # Pitch markings
    ax.plot([60, 60], [0, 80], "w-", linewidth=1)
    ax.add_patch(Rectangle((0, 18), 18, 44, fill=False, edgecolor="white"))
    ax.add_patch(Rectangle((102, 18), 18, 44, fill=False, edgecolor="white"))

    ax.set_xlim(0, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.set_facecolor("#1a472a")
    ax.set_title(title, fontsize=14, fontweight="bold", color="white")
    ax.axis("off")

    # Colorbar
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin, vmax))
    plt.colorbar(sm, ax=ax, label=metric, shrink=0.6)

    plt.tight_layout()
    return fig

# Plot event density by zone
fig = plot_zone_heatmap(zone_stats, "n_events", "Event Density by Zone")
plt.show()
# R: Visualize Zone Statistics on Pitch
library(ggplot2)

plot_zone_heatmap <- function(zone_stats, metric = "n_events", title = "Zone Activity") {
    ggplot(zone_stats) +
        # Zone rectangles
        geom_rect(aes(xmin = x_min, xmax = x_max,
                      ymin = y_min, ymax = y_max,
                      fill = !!sym(metric)),
                  color = "white", size = 0.5) +
        # Zone labels
        geom_text(aes(x = x_center, y = y_center,
                      label = round(!!sym(metric), 1)),
                  color = "white", fontface = "bold", size = 4) +
        # Pitch markings
        annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
                 fill = NA, color = "white", size = 1) +
        annotate("segment", x = 60, xend = 60, y = 0, yend = 80,
                 color = "white", size = 0.5) +
        # Penalty areas
        annotate("rect", xmin = 0, xmax = 18, ymin = 18, ymax = 62,
                 fill = NA, color = "white", size = 0.5) +
        annotate("rect", xmin = 102, xmax = 120, ymin = 18, ymax = 62,
                 fill = NA, color = "white", size = 0.5) +
        scale_fill_gradient(low = "#2E7D32", high = "#FF5722") +
        coord_fixed() +
        theme_void() +
        labs(title = title, fill = metric)
}

# Plot event density by zone
plot_zone_heatmap(zone_stats, "n_events", "Event Density by Zone")

Heat Maps and Kernel Density

Heat maps provide continuous representations of action density, revealing patterns that discrete zone analysis might miss.

heatmaps.py
# Python: Creating Heat Maps with Kernel Density
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from mplsoccer import Pitch

def create_heatmap_data(events, bandwidth=5):
    """Create kernel density estimate for event locations."""
    # Filter to events with valid locations
    loc_data = events.dropna(subset=["location_x", "location_y"])

    x = loc_data["location_x"].values
    y = loc_data["location_y"].values

    # Create grid
    xx, yy = np.mgrid[0:120:100j, 0:80:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])

    # Kernel density estimation
    kernel = stats.gaussian_kde([x, y], bw_method=bandwidth / 100)
    density = kernel(positions).reshape(xx.shape)

    return xx, yy, density

# Create player touch heatmap
player_touches = events[
    (events["player"] == "Lionel Messi") &
    (events["type"].isin(["Pass", "Carry", "Shot", "Dribble"]))
]

xx, yy, density = create_heatmap_data(player_touches, bandwidth=8)

# Plot using mplsoccer
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# Add heatmap
heatmap = ax.contourf(xx, yy, density, levels=20, cmap="YlOrRd", alpha=0.8)
plt.colorbar(heatmap, ax=ax, label="Density")

ax.set_title("Lionel Messi - Touch Heatmap", fontsize=14)
plt.show()

# Alternative: Using mplsoccer built-in heatmap
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# mplsoccer heatmap
bin_statistic = pitch.bin_statistic(
    player_touches["location_x"],
    player_touches["location_y"],
    statistic="count",
    bins=(12, 8)
)
pitch.heatmap(bin_statistic, ax=ax, cmap="YlOrRd", edgecolors="#1a472a")
plt.show()
# R: Creating Heat Maps with Kernel Density
library(tidyverse)
library(MASS)

create_heatmap_data <- function(events, bandwidth = 5) {
    # Filter to events with valid locations
    loc_data <- events %>%
        filter(!is.na(location_x), !is.na(location_y))

    # Create kernel density estimate
    kde <- kde2d(
        x = loc_data$location_x,
        y = loc_data$location_y,
        n = 100,  # Grid resolution
        h = bandwidth,  # Bandwidth (smoothing)
        lims = c(0, 120, 0, 80)  # Pitch limits
    )

    # Convert to dataframe for ggplot
    heatmap_df <- expand_grid(
        x = kde$x,
        y = kde$y
    ) %>%
        mutate(
            density = as.vector(kde$z)
        )

    return(heatmap_df)
}

# Create player touch heatmap
player_touches <- events %>%
    filter(player == "Lionel Messi",
           type %in% c("Pass", "Carry", "Shot", "Dribble"))

heatmap_data <- create_heatmap_data(player_touches, bandwidth = 8)

# Plot heatmap
ggplot() +
    # Pitch background
    annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
             fill = "#1a472a") +
    # Heatmap
    geom_raster(data = heatmap_data,
                aes(x = x, y = y, fill = density),
                interpolate = TRUE) +
    scale_fill_gradient2(low = "transparent", mid = "yellow",
                         high = "red", midpoint = max(heatmap_data$density) / 2) +
    # Pitch markings
    annotate("segment", x = 60, xend = 60, y = 0, yend = 80, color = "white") +
    coord_fixed() +
    theme_void() +
    labs(title = "Lionel Messi - Touch Heatmap")

Comparative Heat Maps

comparison_heatmaps.py
# Python: Comparative Heat Maps (Player vs Team Average)
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch

def create_comparison_heatmap(player_events, team_events, player_name):
    """Create heatmap comparing player to team average."""
    # Calculate densities
    xx_p, yy_p, player_density = create_heatmap_data(player_events)
    xx_t, yy_t, team_density = create_heatmap_data(team_events)

    # Normalize team density to player sample size
    scale_factor = len(player_events) / len(team_events)
    team_density_scaled = team_density * scale_factor

    # Calculate difference
    diff_density = player_density - team_density_scaled

    # Plot
    pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
    fig, ax = pitch.draw(figsize=(12, 8))

    # Difference heatmap (red = more than team, blue = less than team)
    heatmap = ax.contourf(
        xx_p, yy_p, diff_density,
        levels=np.linspace(diff_density.min(), diff_density.max(), 20),
        cmap="RdBu_r",
        alpha=0.8
    )
    plt.colorbar(heatmap, ax=ax, label="Diff from Team Avg")

    ax.set_title(f"{player_name} - Deviation from Team Average", fontsize=14)
    return fig

# Compare Messi to team average
messi_events = events[events["player"] == "Lionel Messi"]
barcelona_events = events[events["team"] == "Barcelona"]

fig = create_comparison_heatmap(messi_events, barcelona_events, "Lionel Messi")
plt.show()
# R: Comparative Heat Maps (Player vs Team Average)
library(tidyverse)
library(patchwork)

create_comparison_heatmap <- function(player_events, team_events, player_name) {
    # Get player and team densities
    player_kde <- create_heatmap_data(player_events, bandwidth = 8)
    team_kde <- create_heatmap_data(team_events, bandwidth = 8)

    # Calculate difference (player - team average)
    comparison <- player_kde %>%
        mutate(
            team_density = team_kde$density / nrow(team_events) * nrow(player_events),
            diff_density = density - team_density,
            pct_diff = (density - team_density) / (team_density + 0.001) * 100
        )

    # Plot showing where player is more/less active than team average
    ggplot() +
        annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
                 fill = "#1a472a") +
        geom_raster(data = comparison,
                    aes(x = x, y = y, fill = diff_density),
                    interpolate = TRUE) +
        scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                            midpoint = 0, name = "Diff from\nTeam Avg") +
        coord_fixed() +
        theme_void() +
        labs(title = sprintf("%s - Deviation from Team Average", player_name))
}

# Compare Messi to team average
messi_comparison <- create_comparison_heatmap(
    player_events = events %>% filter(player == "Lionel Messi"),
    team_events = events %>% filter(team == "Barcelona"),
    player_name = "Lionel Messi"
)

print(messi_comparison)

Territorial Control Metrics

Territorial control measures which team dominates different areas of the pitch. This goes beyond possession percentage to show where a team is actually playing.

territorial_control.py
# Python: Calculating Territorial Control
import pandas as pd
import numpy as np

def calculate_territorial_control(events, team1, team2, zone_grid):
    """Calculate territorial control metrics by zone."""
    # Assign zones
    n_x = zone_grid["x_zone"].max()
    n_y = zone_grid["y_zone"].max()

    events = events.copy()
    events["x_zone"] = np.ceil(events["location_x"] / (120 / n_x)).clip(1, n_x).astype(int)
    events["y_zone"] = np.ceil(events["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    events["zone_id"] = (events["x_zone"] - 1) * n_y + events["y_zone"]

    # Count events by team and zone
    zone_counts = events.groupby(["zone_id", "team"]).size().unstack(fill_value=0)

    # Calculate control metrics
    zone_control = pd.DataFrame({
        "zone_id": zone_counts.index,
        team1: zone_counts.get(team1, 0),
        team2: zone_counts.get(team2, 0)
    })

    zone_control["total"] = zone_control[team1] + zone_control[team2]
    zone_control["team1_share"] = zone_control[team1] / zone_control["total"]
    zone_control["team2_share"] = zone_control[team2] / zone_control["total"]
    zone_control["dominance"] = zone_control["team1_share"] - zone_control["team2_share"]
    zone_control["contested"] = (
        (zone_control["team1_share"] > 0.3) & (zone_control["team2_share"] > 0.3)
    )

    zone_control = zone_control.merge(zone_grid, on="zone_id")

    # Overall metrics
    overall = {
        "team1_territory": zone_control["team1_share"].mean(),
        "team2_territory": zone_control["team2_share"].mean(),
        "field_tilt_team1": (
            zone_control[zone_control["x_zone"] >= 4][team1].sum() /
            zone_control[zone_control["x_zone"] >= 4]["total"].sum()
        )
    }

    return {"zones": zone_control, "overall": overall}

# Calculate territorial control
zones = create_zone_grid(6, 3)
territorial = calculate_territorial_control(events, "Barcelona", "Real Madrid", zones)

print("Territorial Control:")
print(f"Barcelona: {territorial['overall']['team1_territory']*100:.1f}%")
print(f"Real Madrid: {territorial['overall']['team2_territory']*100:.1f}%")
print(f"Barcelona Field Tilt: {territorial['overall']['field_tilt_team1']*100:.1f}%")
# R: Calculating Territorial Control
library(tidyverse)

calculate_territorial_control <- function(events, team1, team2, zone_grid) {
    # Assign zones to events
    events_zoned <- events %>%
        mutate(
            x_zone = ceiling(location_x / (120 / max(zone_grid$x_zone))),
            y_zone = ceiling(location_y / (80 / max(zone_grid$y_zone)))
        ) %>%
        mutate(
            x_zone = pmin(pmax(x_zone, 1), max(zone_grid$x_zone)),
            y_zone = pmin(pmax(y_zone, 1), max(zone_grid$y_zone)),
            zone_id = (x_zone - 1) * max(zone_grid$y_zone) + y_zone
        )

    # Count events by team and zone
    zone_control <- events_zoned %>%
        count(zone_id, team, name = "n_events") %>%
        pivot_wider(
            names_from = team,
            values_from = n_events,
            values_fill = 0
        )

    # Calculate control metrics
    zone_control <- zone_control %>%
        mutate(
            total = !!sym(team1) + !!sym(team2),
            team1_share = !!sym(team1) / total,
            team2_share = !!sym(team2) / total,
            dominance = team1_share - team2_share,  # Positive = team1 dominant
            contested = pmin(team1_share, team2_share) > 0.3  # Both teams active
        ) %>%
        left_join(zone_grid, by = "zone_id")

    # Overall territorial metrics
    overall <- list(
        team1_territory = mean(zone_control$team1_share, na.rm = TRUE),
        team2_territory = mean(zone_control$team2_share, na.rm = TRUE),
        field_tilt_team1 = zone_control %>%
            filter(x_zone >= 4) %>%  # Attacking half
            summarize(tilt = sum(!!sym(team1)) / sum(total)) %>%
            pull(tilt)
    )

    return(list(zones = zone_control, overall = overall))
}

# Calculate territorial control
zones <- create_zone_grid(6, 3)
territorial <- calculate_territorial_control(events, "Barcelona", "Real Madrid", zones)

cat("Territorial Control:\n")
cat(sprintf("Barcelona: %.1f%%\n", territorial$overall$team1_territory * 100))
cat(sprintf("Real Madrid: %.1f%%\n", territorial$overall$team2_territory * 100))
cat(sprintf("Barcelona Field Tilt: %.1f%%\n", territorial$overall$field_tilt_team1 * 100))
Output
Territorial Control:
Barcelona: 58.3%
Real Madrid: 41.7%
Barcelona Field Tilt: 62.1%

Voronoi Diagrams for Space Control

Voronoi diagrams partition the pitch into regions where each player controls the space closest to them. This is fundamental for understanding space occupation and is used in advanced pitch control models.

voronoi.py
# Python: Creating Voronoi Diagrams
import numpy as np
import pandas as pd
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

def create_voronoi_frame(player_positions, pitch_bounds=(0, 120, 0, 80)):
    """Create Voronoi diagram from player positions."""
    # Add boundary points to ensure proper tessellation
    x_min, x_max, y_min, y_max = pitch_bounds
    boundary_points = np.array([
        [x_min - 10, y_min - 10], [x_max + 10, y_min - 10],
        [x_min - 10, y_max + 10], [x_max + 10, y_max + 10]
    ])

    # Combine player positions with boundary
    points = np.array(list(zip(player_positions["x"], player_positions["y"])))
    all_points = np.vstack([points, boundary_points])

    # Calculate Voronoi
    vor = Voronoi(all_points)

    return vor, points

def plot_voronoi_pitch(player_positions, pitch_bounds=(0, 120, 0, 80)):
    """Plot Voronoi diagram on pitch."""
    vor, points = create_voronoi_frame(player_positions, pitch_bounds)
    x_min, x_max, y_min, y_max = pitch_bounds

    fig, ax = plt.subplots(figsize=(14, 9))
    ax.set_facecolor("#1a472a")

    # Plot Voronoi regions
    for i, region_idx in enumerate(vor.point_region[:len(player_positions)]):
        if region_idx == -1:
            continue

        region = vor.regions[region_idx]
        if -1 in region or len(region) == 0:
            continue

        # Get vertices and clip to pitch
        vertices = np.array([vor.vertices[v] for v in region])
        vertices[:, 0] = np.clip(vertices[:, 0], x_min, x_max)
        vertices[:, 1] = np.clip(vertices[:, 1], y_min, y_max)

        # Color by team
        team = player_positions.iloc[i]["team"]
        color = "#3498db" if team == "Home" else "#e74c3c"

        polygon = Polygon(vertices, alpha=0.3, facecolor=color, edgecolor="white")
        ax.add_patch(polygon)

    # Plot player positions
    for team, color, marker in [("Home", "#3498db", "o"), ("Away", "#e74c3c", "s")]:
        team_players = player_positions[player_positions["team"] == team]
        ax.scatter(team_players["x"], team_players["y"],
                   c=color, s=200, marker=marker, edgecolor="white",
                   linewidth=2, label=team, zorder=5)

    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_aspect("equal")
    ax.legend()
    ax.set_title("Voronoi Space Control", fontsize=14)
    plt.show()

# Example freeze frame
freeze_frame = pd.DataFrame({
    "player": ["GK1", "DEF1", "DEF2", "MID1", "MID2", "FWD1",
               "GK2", "DEF3", "DEF4", "MID3", "MID4", "FWD2"],
    "x": [5, 25, 30, 45, 50, 75, 115, 90, 85, 70, 65, 45],
    "y": [40, 25, 55, 35, 50, 40, 40, 55, 25, 60, 20, 40],
    "team": ["Home"]*6 + ["Away"]*6
})

plot_voronoi_pitch(freeze_frame)
# R: Creating Voronoi Diagrams
library(tidyverse)
library(ggvoronoi)
library(deldir)

create_voronoi_frame <- function(player_positions, pitch_bounds = c(0, 120, 0, 80)) {
    # Player positions at a moment in time
    # Input: dataframe with player, x, y, team columns

    # Calculate Voronoi tessellation
    voronoi <- deldir(
        x = player_positions$x,
        y = player_positions$y,
        rw = pitch_bounds  # Bounding rectangle
    )

    # Extract tiles
    tiles <- tile.list(voronoi)

    # Convert to dataframe for plotting
    tile_df <- map_df(seq_along(tiles), function(i) {
        tile <- tiles[[i]]
        tibble(
            player = player_positions$player[i],
            team = player_positions$team[i],
            x = tile$x,
            y = tile$y,
            tile_id = i
        )
    })

    return(tile_df)
}

# Example: Create Voronoi from freeze frame data
freeze_frame <- data.frame(
    player = c("GK1", "DEF1", "DEF2", "MID1", "MID2", "FWD1",
               "GK2", "DEF3", "DEF4", "MID3", "MID4", "FWD2"),
    x = c(5, 25, 30, 45, 50, 75, 115, 90, 85, 70, 65, 45),
    y = c(40, 25, 55, 35, 50, 40, 40, 55, 25, 60, 20, 40),
    team = c(rep("Home", 6), rep("Away", 6))
)

voronoi_tiles <- create_voronoi_frame(freeze_frame)

# Calculate space controlled by each team
space_control <- voronoi_tiles %>%
    group_by(tile_id, team) %>%
    summarize(
        area = abs(sum(x * lead(y, default = first(y)) -
                      lead(x, default = first(x)) * y) / 2),
        .groups = "drop"
    ) %>%
    group_by(team) %>%
    summarize(total_area = sum(area))

print(space_control)

Spatial Clustering

Spatial clustering identifies natural groupings of actions on the pitch, revealing tactical patterns like preferred shooting zones, passing lanes, or defensive blocks.

spatial_clustering.py
# Python: Spatial Clustering of Actions
import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
import matplotlib.pyplot as plt

def cluster_shots(shots, eps=8, min_samples=3):
    """Cluster shot locations using DBSCAN."""
    # Extract coordinates
    coords = shots[["location_x", "location_y"]].values

    # Run DBSCAN
    clustering = DBSCAN(eps=eps, min_samples=min_samples)
    shots = shots.copy()
    shots["cluster"] = clustering.fit_predict(coords)

    # Summarize clusters
    cluster_summary = shots[shots["cluster"] >= 0].groupby("cluster").agg(
        n_shots=("index", "count"),
        n_goals=("shot_outcome", lambda x: (x == "Goal").sum()),
        avg_xg=("shot_statsbomb_xg", "mean"),
        center_x=("location_x", "mean"),
        center_y=("location_y", "mean")
    ).reset_index()

    cluster_summary["conversion_rate"] = (
        cluster_summary["n_goals"] / cluster_summary["n_shots"]
    )

    # Label clusters
    def label_cluster(row):
        if row["center_x"] > 102:
            if 30 < row["center_y"] < 50:
                return "Central Box"
            elif row["center_y"] <= 30:
                return "Right Side Box"
            else:
                return "Left Side Box"
        elif row["center_x"] > 80:
            return "Edge of Box"
        else:
            return "Long Range"

    cluster_summary["cluster_label"] = cluster_summary.apply(label_cluster, axis=1)

    return {"shots": shots, "clusters": cluster_summary}

# Cluster team shots
shots_data = events[(events["type"] == "Shot") & (events["team"] == "Barcelona")]
shot_clusters = cluster_shots(shots_data, eps=6, min_samples=2)

print("Shot Clusters:")
print(shot_clusters["clusters"])

# Visualize clusters
def plot_shot_clusters(shot_data, cluster_summary):
    """Plot shot clusters on pitch."""
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.set_facecolor("#1a472a")

    # Plot shots colored by cluster
    scatter = ax.scatter(
        shot_data["location_x"],
        shot_data["location_y"],
        c=shot_data["cluster"],
        cmap="tab10",
        s=100,
        alpha=0.7,
        edgecolors="white"
    )

    # Plot cluster centers
    for _, cluster in cluster_summary.iterrows():
        ax.scatter(cluster["center_x"], cluster["center_y"],
                   marker="X", s=300, c="yellow", edgecolors="black",
                   linewidths=2, zorder=10)
        ax.annotate(cluster["cluster_label"],
                    (cluster["center_x"], cluster["center_y"]),
                    textcoords="offset points", xytext=(10, 10),
                    fontsize=9, color="white")

    ax.set_xlim(60, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.set_title("Shot Clusters", fontsize=14)
    plt.show()

plot_shot_clusters(shot_clusters["shots"], shot_clusters["clusters"])
# R: Spatial Clustering of Actions
library(tidyverse)
library(dbscan)
library(cluster)

# DBSCAN clustering for shot locations
cluster_shots <- function(shots, eps = 8, min_pts = 3) {
    # Extract coordinates
    coords <- shots %>%
        select(location_x, location_y) %>%
        as.matrix()

    # Run DBSCAN
    clustering <- dbscan(coords, eps = eps, minPts = min_pts)

    shots$cluster <- clustering$cluster

    # Summarize clusters
    cluster_summary <- shots %>%
        filter(cluster > 0) %>%  # Exclude noise (cluster 0)
        group_by(cluster) %>%
        summarize(
            n_shots = n(),
            n_goals = sum(outcome == "Goal"),
            avg_xg = mean(xg, na.rm = TRUE),
            center_x = mean(location_x),
            center_y = mean(location_y),
            .groups = "drop"
        ) %>%
        mutate(
            conversion_rate = n_goals / n_shots,
            cluster_label = case_when(
                center_x > 102 & center_y > 30 & center_y < 50 ~ "Central Box",
                center_x > 102 & center_y <= 30 ~ "Right Side Box",
                center_x > 102 & center_y >= 50 ~ "Left Side Box",
                center_x <= 102 & center_x > 80 ~ "Edge of Box",
                TRUE ~ "Long Range"
            )
        )

    return(list(shots = shots, clusters = cluster_summary))
}

# Cluster team shots
shot_clusters <- cluster_shots(
    events %>% filter(type == "Shot", team == "Barcelona"),
    eps = 6,
    min_pts = 2
)

print(shot_clusters$clusters)
Output
Shot Clusters:
   cluster  n_shots  n_goals    avg_xg  center_x  center_y  conversion_rate   cluster_label
0        0       12        3  0.152     108.2      38.5             0.250     Central Box
1        1        6        1  0.087     105.1      22.3             0.167   Right Side Box
2        2        5        0  0.065      92.4      40.1             0.000     Edge of Box

Practice Exercises

Task: Create a side-by-side comparison of touch heat maps for two players in the same position.

Requirements:

  • Load event data for two full-backs from the same league
  • Create kernel density heat maps for each player
  • Display them side by side with the same color scale
  • Calculate and display metrics: average touch position, spread

Task: Analyze passing patterns between pitch zones for a team.

Requirements:

  • Create a 6x3 zone grid
  • Calculate passes between each zone pair
  • Visualize as a flow diagram or chord diagram
  • Identify the most common passing lanes

Task: Using tracking data, analyze how space control changes during a possession sequence.

Requirements:

  • Load tracking data for a possession sequence
  • Calculate Voronoi diagrams at each frame
  • Track how each team's controlled area changes
  • Identify the moment of maximum space creation

Pitch Control Models

Pitch control models extend Voronoi analysis by incorporating player velocities, reaction times, and physical capabilities to estimate which team is most likely to reach each point on the pitch first. This provides a more realistic representation of space control.

Key Concept: From Static to Dynamic

While Voronoi diagrams assume all players have equal ability to reach any point, pitch control models account for current velocity, maximum speed, and time to intercept. This makes them far more accurate for real-time tactical analysis.

Components of Pitch Control

Player Velocity

Current speed and direction affect how quickly a player can reach different areas of the pitch.

Reaction Time

Time delay before a player can change direction or accelerate towards the ball.

Physical Parameters

Maximum speed, acceleration, and deceleration capabilities vary by player.

pitch_control.py
# Python: Pitch Control Model Implementation
import numpy as np
import pandas as pd
from scipy.ndimage import gaussian_filter
import matplotlib.pyplot as plt

class PitchControlModel:
    """
    Pitch control model based on Spearman et al. (2017).
    Estimates probability of each team controlling space.
    """

    def __init__(self, max_speed=5.0, reaction_time=0.7,
                 sigma=0.45, time_to_control=0.5):
        self.max_speed = max_speed
        self.reaction_time = reaction_time
        self.sigma = sigma
        self.time_to_control = time_to_control

    def time_to_intercept(self, player_x, player_y, player_vx, player_vy,
                          target_x, target_y):
        """Calculate time for player to reach target location."""
        # Distance to target
        dx = target_x - player_x
        dy = target_y - player_y
        distance = np.sqrt(dx**2 + dy**2)

        if distance > 0:
            # Direction to target
            dir_x = dx / distance
            dir_y = dy / distance
            # Velocity component towards target
            velocity_towards = player_vx * dir_x + player_vy * dir_y
        else:
            velocity_towards = 0

        # Effective speed (accounting for current momentum)
        effective_speed = max(velocity_towards, 0) + self.max_speed * 0.7

        # Total time = reaction + movement
        time = self.reaction_time + distance / effective_speed

        return time

    def control_probability(self, target_x, target_y,
                           team_players, opponent_players):
        """Calculate control probability at a single point."""
        # Get minimum intercept time for each team
        team_times = [
            self.time_to_intercept(p["x"], p["y"], p["vx"], p["vy"],
                                   target_x, target_y)
            for p in team_players
        ]

        opp_times = [
            self.time_to_intercept(p["x"], p["y"], p["vx"], p["vy"],
                                   target_x, target_y)
            for p in opponent_players
        ]

        min_team_time = min(team_times)
        min_opp_time = min(opp_times)

        # Logistic function for probability
        team_prob = 1 / (1 + np.exp(min_team_time / self.sigma))
        opp_prob = 1 / (1 + np.exp(min_opp_time / self.sigma))

        # Normalize
        total = team_prob + opp_prob
        control = team_prob / total if total > 0 else 0.5

        return control

    def generate_surface(self, team_players, opponent_players,
                        grid_size=50, pitch_length=120, pitch_width=80):
        """Generate full pitch control surface."""
        x_grid = np.linspace(0, pitch_length, grid_size)
        y_grid = np.linspace(0, pitch_width, grid_size)

        control_surface = np.zeros((grid_size, grid_size))

        for i, x in enumerate(x_grid):
            for j, y in enumerate(y_grid):
                control_surface[j, i] = self.control_probability(
                    x, y, team_players, opponent_players
                )

        # Apply smoothing
        control_surface = gaussian_filter(control_surface, sigma=1)

        return x_grid, y_grid, control_surface

    def plot_pitch_control(self, team_players, opponent_players):
        """Visualize pitch control surface."""
        x_grid, y_grid, surface = self.generate_surface(
            team_players, opponent_players
        )

        fig, ax = plt.subplots(figsize=(14, 9))

        # Plot control surface
        im = ax.contourf(x_grid, y_grid, surface, levels=20,
                         cmap="RdBu", vmin=0, vmax=1, alpha=0.8)
        plt.colorbar(im, ax=ax, label="Control Probability (Team)")

        # Plot players
        for p in team_players:
            ax.scatter(p["x"], p["y"], c="blue", s=200,
                      edgecolors="white", linewidth=2, zorder=5)
            ax.arrow(p["x"], p["y"], p["vx"]*2, p["vy"]*2,
                    head_width=1, head_length=0.5, fc="blue", ec="blue")

        for p in opponent_players:
            ax.scatter(p["x"], p["y"], c="red", s=200,
                      edgecolors="white", linewidth=2, zorder=5)
            ax.arrow(p["x"], p["y"], p["vx"]*2, p["vy"]*2,
                    head_width=1, head_length=0.5, fc="red", ec="red")

        ax.set_xlim(0, 120)
        ax.set_ylim(0, 80)
        ax.set_aspect("equal")
        ax.set_title("Pitch Control Model", fontsize=14)

        # Calculate summary stats
        team_control = np.mean(surface)
        attacking_control = np.mean(surface[:, x_grid > 80])

        ax.text(60, -5, f"Overall: {team_control:.1%} | Attacking Third: {attacking_control:.1%}",
                ha="center", fontsize=10)

        return fig

# Example usage
model = PitchControlModel()

team_players = [
    {"x": 5, "y": 40, "vx": 0, "vy": 0},
    {"x": 25, "y": 20, "vx": 2, "vy": 1},
    {"x": 30, "y": 60, "vx": 1, "vy": -1},
    {"x": 45, "y": 40, "vx": 3, "vy": 0},
    {"x": 55, "y": 25, "vx": 4, "vy": 2},
    {"x": 70, "y": 50, "vx": 3, "vy": -1},
]

opponent_players = [
    {"x": 115, "y": 40, "vx": 0, "vy": 0},
    {"x": 95, "y": 55, "vx": -3, "vy": 1},
    {"x": 90, "y": 25, "vx": -2, "vy": -1},
    {"x": 75, "y": 45, "vx": -4, "vy": 0},
    {"x": 65, "y": 60, "vx": -2, "vy": 1},
    {"x": 50, "y": 35, "vx": -1, "vy": -2},
]

fig = model.plot_pitch_control(team_players, opponent_players)
plt.show()
# R: Pitch Control Model Implementation
library(tidyverse)

# Physical parameters
PARAMS <- list(
    max_speed = 5.0,           # m/s maximum player speed
    reaction_time = 0.7,       # seconds reaction time
    avg_ball_speed = 15.0,     # m/s average ball speed
    time_to_control = 0.5,     # seconds to control ball once reached
    sigma = 0.45               # uncertainty parameter
)

# Calculate time for player to reach target location
time_to_intercept <- function(player_x, player_y, player_vx, player_vy,
                               target_x, target_y, params = PARAMS) {
    # Distance to target
    dx <- target_x - player_x
    dy <- target_y - player_y
    distance <- sqrt(dx^2 + dy^2)

    # Account for current velocity
    # Project velocity onto direction to target
    if (distance > 0) {
        dir_x <- dx / distance
        dir_y <- dy / distance
        velocity_towards <- player_vx * dir_x + player_vy * dir_y
    } else {
        velocity_towards <- 0
    }

    # Time = reaction_time + time to cover distance
    # Using kinematic equation with current velocity
    effective_speed <- max(velocity_towards, 0) + params$max_speed * 0.7
    time <- params$reaction_time + distance / effective_speed

    return(time)
}

# Calculate pitch control probability at a point
pitch_control_at_point <- function(target_x, target_y,
                                    team_positions, opponent_positions,
                                    params = PARAMS) {

    # Calculate intercept times for all players
    team_times <- map_dbl(1:nrow(team_positions), function(i) {
        time_to_intercept(
            team_positions$x[i], team_positions$y[i],
            team_positions$vx[i], team_positions$vy[i],
            target_x, target_y, params
        )
    })

    opponent_times <- map_dbl(1:nrow(opponent_positions), function(i) {
        time_to_intercept(
            opponent_positions$x[i], opponent_positions$y[i],
            opponent_positions$vx[i], opponent_positions$vy[i],
            target_x, target_y, params
        )
    })

    # Convert times to probabilities using logistic function
    team_prob <- 1 / (1 + exp(min(team_times) / params$sigma))
    opponent_prob <- 1 / (1 + exp(min(opponent_times) / params$sigma))

    # Normalize
    total <- team_prob + opponent_prob
    control_prob <- team_prob / total

    return(control_prob)
}

# Generate pitch control surface
generate_pitch_control <- function(team_pos, opp_pos,
                                    grid_size = 50, params = PARAMS) {
    # Create grid
    x_grid <- seq(0, 120, length.out = grid_size)
    y_grid <- seq(0, 80, length.out = grid_size)

    # Calculate control at each point
    control_surface <- expand_grid(x = x_grid, y = y_grid) %>%
        rowwise() %>%
        mutate(
            control = pitch_control_at_point(x, y, team_pos, opp_pos, params)
        ) %>%
        ungroup()

    return(control_surface)
}

# Example: Calculate pitch control
team_positions <- tibble(
    player = paste0("T", 1:11),
    x = c(5, 20, 25, 30, 35, 45, 50, 55, 70, 75, 90),
    y = c(40, 20, 60, 35, 50, 40, 25, 55, 30, 50, 40),
    vx = c(0, 2, 1, 3, 2, 4, 3, 2, 5, 4, 3),
    vy = c(0, 1, -1, 0, 2, -1, 2, -2, 1, 0, 1)
)

opponent_positions <- tibble(
    player = paste0("O", 1:11),
    x = c(115, 100, 95, 90, 85, 75, 70, 65, 50, 45, 30),
    y = c(40, 55, 25, 45, 35, 60, 20, 45, 50, 30, 40),
    vx = c(0, -3, -2, -4, -3, -2, -1, -3, -2, -1, 0),
    vy = c(0, 1, -1, 0, -2, 1, -1, 2, 0, 1, 0)
)

pitch_control <- generate_pitch_control(team_positions, opponent_positions)

# Summary stats
cat("Team Total Control:", mean(pitch_control$control) * 100, "%\n")
cat("Attacking Third Control:",
    mean(pitch_control$control[pitch_control$x > 80]) * 100, "%\n")
Output
Team Total Control: 54.2%
Attacking Third Control: 38.7%

Expected Possession Value from Pitch Control

Pitch control can be combined with expected possession value (EPV) models to estimate the value of different passing options.

epv_pitch_control.py
# Python: EPV from Pitch Control
import numpy as np
import pandas as pd

class EPVModel:
    """Expected Possession Value model integrated with pitch control."""

    def __init__(self, grid_size=50, pitch_length=120, pitch_width=80):
        self.grid_size = grid_size
        self.pitch_length = pitch_length
        self.pitch_width = pitch_width
        self.epv_surface = self._create_epv_surface()

    def _create_epv_surface(self):
        """Create base EPV surface."""
        x = np.linspace(0, self.pitch_length, self.grid_size)
        y = np.linspace(0, self.pitch_width, self.grid_size)
        xx, yy = np.meshgrid(x, y)

        # Distance to goal
        goal_x, goal_y = 120, 40
        distance = np.sqrt((goal_x - xx)**2 + (goal_y - yy)**2)

        # Centrality (higher in middle)
        centrality = 1 - np.abs(yy - 40) / 40

        # Base EPV
        epv = np.maximum(0, 0.1 - distance / 1500) * (1 + centrality * 0.5)

        # Boost in penalty box
        in_box = (xx > 102) & (yy > 18) & (yy < 62)
        epv[in_box] *= 2

        return epv

    def get_epv(self, x, y):
        """Get EPV at specific location."""
        i = int(np.clip(x / self.pitch_length * (self.grid_size - 1),
                        0, self.grid_size - 1))
        j = int(np.clip(y / self.pitch_width * (self.grid_size - 1),
                        0, self.grid_size - 1))
        return self.epv_surface[j, i]

    def calculate_controlled_epv(self, pitch_control_surface):
        """Calculate EPV weighted by pitch control."""
        controlled_epv = pitch_control_surface * self.epv_surface
        return np.sum(controlled_epv) / controlled_epv.size

    def evaluate_pass_options(self, ball_pos, targets,
                              pitch_control_model, team_players, opp_players):
        """Evaluate potential pass targets using EPV and control."""
        results = []

        for target in targets:
            # Control probability at target
            control = pitch_control_model.control_probability(
                target["x"], target["y"], team_players, opp_players
            )

            # EPV at target
            epv = self.get_epv(target["x"], target["y"])

            # Pass success probability (distance-based)
            pass_dist = np.sqrt((target["x"] - ball_pos["x"])**2 +
                               (target["y"] - ball_pos["y"])**2)
            pass_success = 1 / (1 + np.exp((pass_dist - 25) / 10))

            # Expected value
            expected_value = control * epv * pass_success

            results.append({
                "target_x": target["x"],
                "target_y": target["y"],
                "control_prob": control,
                "epv": epv,
                "pass_success": pass_success,
                "expected_value": expected_value
            })

        return pd.DataFrame(results).sort_values("expected_value", ascending=False)

# Example: Evaluate pass options
epv_model = EPVModel()
pc_model = PitchControlModel()

ball_pos = {"x": 60, "y": 40}
pass_targets = [
    {"x": 70, "y": 30},
    {"x": 75, "y": 50},
    {"x": 85, "y": 40},
    {"x": 65, "y": 60},
]

pass_evaluation = epv_model.evaluate_pass_options(
    ball_pos, pass_targets, pc_model, team_players, opponent_players
)
print("Pass Options Ranked by Expected Value:")
print(pass_evaluation)
# R: EPV from Pitch Control
library(tidyverse)

# Simplified EPV grid (pre-calculated)
create_epv_grid <- function(grid_size = 50) {
    x_grid <- seq(0, 120, length.out = grid_size)
    y_grid <- seq(0, 80, length.out = grid_size)

    expand_grid(x = x_grid, y = y_grid) %>%
        mutate(
            # EPV increases towards goal, higher in central areas
            distance_to_goal = sqrt((120 - x)^2 + (40 - y)^2),
            centrality = 1 - abs(y - 40) / 40,
            base_epv = pmax(0, 0.1 - distance_to_goal / 1500) * (1 + centrality * 0.5),
            # Boost in box
            in_box = x > 102 & y > 18 & y < 62,
            epv = ifelse(in_box, base_epv * 2, base_epv)
        )
}

epv_grid <- create_epv_grid()

# Calculate controlled EPV
calculate_controlled_epv <- function(pitch_control, epv_grid) {
    # Join pitch control with EPV
    combined <- pitch_control %>%
        left_join(epv_grid %>% select(x, y, epv), by = c("x", "y"))

    # Controlled EPV = control probability * EPV at that location
    controlled_epv <- combined %>%
        mutate(controlled_epv = control * epv)

    # Total controlled EPV
    total_controlled <- sum(controlled_epv$controlled_epv) / nrow(controlled_epv)

    return(list(
        surface = controlled_epv,
        total = total_controlled
    ))
}

# Evaluate pass options using pitch control
evaluate_pass_options <- function(ball_pos, target_positions,
                                   team_pos, opp_pos, epv_grid) {
    pass_values <- map_df(1:nrow(target_positions), function(i) {
        target <- target_positions[i, ]

        # Estimate control at target after pass
        control_at_target <- pitch_control_at_point(
            target$x, target$y, team_pos, opp_pos
        )

        # Get EPV at target
        epv_at_target <- epv_grid %>%
            filter(abs(x - target$x) < 2, abs(y - target$y) < 2) %>%
            summarize(epv = mean(epv)) %>%
            pull(epv)

        # Pass distance and risk
        pass_dist <- sqrt((target$x - ball_pos$x)^2 +
                          (target$y - ball_pos$y)^2)
        pass_success_prob <- 1 / (1 + exp((pass_dist - 25) / 10))

        tibble(
            target_id = i,
            target_x = target$x,
            target_y = target$y,
            control_prob = control_at_target,
            epv_at_target = epv_at_target,
            pass_success = pass_success_prob,
            expected_value = control_at_target * epv_at_target * pass_success_prob
        )
    })

    return(pass_values %>% arrange(desc(expected_value)))
}

Spatial Passing Networks

Combining spatial analysis with network analysis reveals how teams connect different areas of the pitch. Spatial passing networks show not just who passes to whom, but where on the pitch these connections occur.

spatial_passing_network.py
# Python: Spatial Passing Networks
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch

def create_spatial_pass_network(passes, zones):
    """Create a network of passes between pitch zones."""
    n_x = zones["x_zone"].max()
    n_y = zones["y_zone"].max()

    # Assign zones
    passes = passes.copy()
    passes["origin_zone"] = (
        np.ceil(passes["location_x"] / (120 / n_x)).clip(1, n_x).astype(int) * n_y +
        np.ceil(passes["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    )
    passes["dest_zone"] = (
        np.ceil(passes["pass_end_x"] / (120 / n_x)).clip(1, n_x).astype(int) * n_y +
        np.ceil(passes["pass_end_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    )

    # Edge list
    edges = passes.groupby(["origin_zone", "dest_zone"]).agg(
        n_passes=("id", "count"),
        success_rate=("pass_outcome", lambda x: (x.isna()).mean()),
        progressive=("pass_end_x", lambda x: (x > passes.loc[x.index, "location_x"]).mean())
    ).reset_index()

    edges = edges[edges["n_passes"] >= 3]

    # Create NetworkX graph
    G = nx.DiGraph()

    for _, row in edges.iterrows():
        G.add_edge(
            row["origin_zone"], row["dest_zone"],
            weight=row["n_passes"],
            success_rate=row["success_rate"],
            progressive=row["progressive"]
        )

    # Add node positions
    for zone_id in G.nodes():
        zone = zones[zones["zone_id"] == zone_id].iloc[0]
        G.nodes[zone_id]["x"] = zone["x_center"]
        G.nodes[zone_id]["y"] = zone["y_center"]

    return G, edges

def plot_spatial_pass_network(G, zones):
    """Visualize spatial passing network on pitch."""
    fig, ax = plt.subplots(figsize=(14, 9))
    ax.set_facecolor("#1a472a")

    # Draw pitch markings
    ax.plot([60, 60], [0, 80], "w-", linewidth=1, alpha=0.5)
    ax.add_patch(plt.Rectangle((0, 18), 18, 44, fill=False,
                                edgecolor="white", alpha=0.5))
    ax.add_patch(plt.Rectangle((102, 18), 18, 44, fill=False,
                                edgecolor="white", alpha=0.5))

    # Get edge weights for scaling
    weights = [G[u][v]["weight"] for u, v in G.edges()]
    max_weight = max(weights) if weights else 1

    # Draw edges
    for u, v in G.edges():
        weight = G[u][v]["weight"]
        x1, y1 = G.nodes[u]["x"], G.nodes[u]["y"]
        x2, y2 = G.nodes[v]["x"], G.nodes[v]["y"]

        # Line thickness based on pass count
        linewidth = 1 + (weight / max_weight) * 5

        # Color based on progressive or not
        color = "#FFD700" if G[u][v]["progressive"] > 0.5 else "#87CEEB"

        ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
                   arrowprops=dict(arrowstyle="->", color=color,
                                  lw=linewidth, alpha=0.7))

    # Draw nodes
    node_sizes = [300 * (G.degree(n) + 1) for n in G.nodes()]
    xs = [G.nodes[n]["x"] for n in G.nodes()]
    ys = [G.nodes[n]["y"] for n in G.nodes()]

    ax.scatter(xs, ys, s=node_sizes, c="#FFFFFF",
              edgecolors="#333333", linewidth=2, zorder=5)

    ax.set_xlim(0, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.set_title("Spatial Passing Network", fontsize=14, color="white")
    ax.axis("off")

    return fig

# Create and plot network
zones = create_zone_grid(6, 4)
passes = events[(events["type"] == "Pass") & (events["team"] == "Barcelona")]
G, edges = create_spatial_pass_network(passes, zones)

print("Top Zone-to-Zone Connections:")
print(edges.nlargest(10, "n_passes"))

fig = plot_spatial_pass_network(G, zones)
plt.show()
# R: Spatial Passing Networks
library(tidyverse)
library(igraph)

create_spatial_pass_network <- function(passes, zones) {
    # Assign zones to pass origins and destinations
    passes_zoned <- passes %>%
        mutate(
            origin_zone = ceiling(location_x / (120 / max(zones$x_zone))) *
                          max(zones$y_zone) +
                          ceiling(location_y / (80 / max(zones$y_zone))),
            dest_zone = ceiling(pass_end_x / (120 / max(zones$x_zone))) *
                        max(zones$y_zone) +
                        ceiling(pass_end_y / (80 / max(zones$y_zone)))
        )

    # Create edge list (zone to zone passes)
    zone_edges <- passes_zoned %>%
        filter(!is.na(origin_zone), !is.na(dest_zone)) %>%
        count(origin_zone, dest_zone, name = "n_passes") %>%
        filter(n_passes >= 3)  # Minimum passes threshold

    # Calculate additional metrics
    zone_edges <- zone_edges %>%
        left_join(
            passes_zoned %>%
                group_by(origin_zone, dest_zone) %>%
                summarize(
                    success_rate = mean(pass_outcome == "Complete", na.rm = TRUE),
                    avg_angle = mean(atan2(pass_end_y - location_y,
                                          pass_end_x - location_x) * 180 / pi),
                    progressive = mean(pass_end_x > location_x),
                    .groups = "drop"
                ),
            by = c("origin_zone", "dest_zone")
        )

    # Create igraph object
    g <- graph_from_data_frame(zone_edges, directed = TRUE)

    # Add zone attributes
    V(g)$zone_id <- as.numeric(V(g)$name)
    V(g)$x_center <- zones$x_center[match(V(g)$zone_id, zones$zone_id)]
    V(g)$y_center <- zones$y_center[match(V(g)$zone_id, zones$zone_id)]

    return(list(
        graph = g,
        edges = zone_edges,
        node_metrics = tibble(
            zone_id = as.numeric(V(g)$name),
            out_degree = degree(g, mode = "out"),
            in_degree = degree(g, mode = "in"),
            betweenness = betweenness(g),
            pagerank = page_rank(g)$vector
        )
    ))
}

# Create network
zones <- create_zone_grid(6, 4)
passes <- events %>% filter(type == "Pass", team == "Barcelona")
pass_network <- create_spatial_pass_network(passes, zones)

# Top passing connections
top_connections <- pass_network$edges %>%
    arrange(desc(n_passes)) %>%
    head(10)

print("Top Zone-to-Zone Passing Connections:")
print(top_connections)

Zone Transition Analysis

Understanding how teams progress the ball through different zones reveals their build-up patterns and attacking approaches.

zone_transitions.py
# Python: Zone Transition Analysis
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def analyze_zone_transitions(events, n_zones=6):
    """Analyze how possessions progress through horizontal zones."""
    events = events.copy()

    # Assign horizontal zones
    events["x_zone"] = np.ceil(events["location_x"] / (120 / n_zones)).clip(1, n_zones).astype(int)

    # Track zone changes within possessions
    events["prev_zone"] = events.groupby("possession")["x_zone"].shift(1)
    events["zone_changed"] = events["x_zone"] != events["prev_zone"]

    transitions = events[events["zone_changed"] & events["prev_zone"].notna()].copy()
    transitions["zone_from"] = transitions["prev_zone"].astype(int)
    transitions["zone_to"] = transitions["x_zone"].astype(int)

    # Transition matrix
    transition_matrix = pd.crosstab(
        transitions["zone_from"],
        transitions["zone_to"],
        margins=True
    )

    # Direction analysis
    transitions["direction"] = np.where(
        transitions["zone_to"] > transitions["zone_from"], "Progressive",
        np.where(transitions["zone_to"] < transitions["zone_from"], "Regressive", "Lateral")
    )

    direction_summary = transitions["direction"].value_counts(normalize=True) * 100

    return {
        "matrix": transition_matrix,
        "direction_summary": direction_summary,
        "transitions": transitions
    }

def plot_transition_sankey(transitions):
    """Plot Sankey diagram of zone transitions."""
    from matplotlib.sankey import Sankey

    # Simplified flow visualization
    fig, ax = plt.subplots(figsize=(14, 8))

    # Aggregate flows
    flows = transitions["transitions"].groupby(
        ["zone_from", "zone_to"]
    ).size().reset_index(name="count")

    # Create heatmap of transitions
    matrix = transitions["matrix"].iloc[:-1, :-1]  # Remove margins

    sns.heatmap(matrix, annot=True, fmt="d", cmap="YlOrRd", ax=ax)
    ax.set_xlabel("Zone To")
    ax.set_ylabel("Zone From")
    ax.set_title("Zone Transition Matrix")

    return fig

# Analyze transitions
events_team = events[events["team"] == "Barcelona"]
transitions = analyze_zone_transitions(events_team, n_zones=6)

print("Zone Transition Matrix:")
print(transitions["matrix"])
print("\nDirection Summary:")
print(transitions["direction_summary"])

fig = plot_transition_sankey(transitions)
plt.show()
# R: Zone Transition Analysis
library(tidyverse)
library(sankey)

analyze_zone_transitions <- function(possessions, zones) {
    # Track ball progression through zones during possessions
    transitions <- possessions %>%
        group_by(possession_id) %>%
        arrange(minute, second) %>%
        mutate(
            x_zone = ceiling(location_x / (120 / max(zones$x_zone))),
            zone_change = x_zone != lag(x_zone),
            zone_from = lag(x_zone),
            zone_to = x_zone
        ) %>%
        filter(!is.na(zone_from), zone_change) %>%
        ungroup()

    # Summarize transitions
    transition_matrix <- transitions %>%
        count(zone_from, zone_to, name = "count") %>%
        pivot_wider(names_from = zone_to, values_from = count, values_fill = 0)

    # Progressive vs regressive transitions
    progression_summary <- transitions %>%
        mutate(
            direction = case_when(
                zone_to > zone_from ~ "Progressive",
                zone_to < zone_from ~ "Regressive",
                TRUE ~ "Lateral"
            )
        ) %>%
        count(direction) %>%
        mutate(pct = n / sum(n) * 100)

    # Success rate by transition type
    transition_success <- transitions %>%
        mutate(
            successful = lead(team) == team,  # Team kept possession
            direction = case_when(
                zone_to > zone_from ~ "Progressive",
                zone_to < zone_from ~ "Regressive",
                TRUE ~ "Lateral"
            )
        ) %>%
        group_by(direction) %>%
        summarize(
            n = n(),
            success_rate = mean(successful, na.rm = TRUE) * 100,
            .groups = "drop"
        )

    return(list(
        matrix = transition_matrix,
        summary = progression_summary,
        success = transition_success
    ))
}

# Analyze transitions
zones <- create_zone_grid(6, 1)  # 6 horizontal zones only
possessions <- events %>% filter(team == "Barcelona")
transitions <- analyze_zone_transitions(possessions, zones)

print("Zone Transition Matrix:")
print(transitions$matrix)
print("\nProgression Summary:")
print(transitions$summary)

Advanced Spatial Metrics

Beyond basic spatial analysis, advanced metrics can capture complex spatial patterns like compactness, width, and defensive shape.

Team Compactness

Measures how tightly grouped a team's players are. Compact teams are harder to play through but may leave space in wide areas.

  • Length: Distance between deepest and highest outfield players
  • Width: Distance between widest players
  • Surface Area: Convex hull of player positions
Defensive Shape

Analyzes the organization and positioning of defensive units.

  • Line Height: Average position of defensive line
  • Line Flatness: Variance in defender positions
  • Block Solidity: Gaps between defensive lines
team_shape_analysis.py
# Python: Advanced Spatial Metrics
import numpy as np
import pandas as pd
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist

class TeamShapeAnalyzer:
    """Analyze team shape and structure from player positions."""

    def __init__(self, positions):
        """
        positions: DataFrame with columns: player, x, y, position
        """
        self.positions = positions
        self.outfield = positions[positions["position"] != "GK"]

    def calculate_basic_shape(self):
        """Calculate basic shape metrics."""
        x = self.outfield["x"].values
        y = self.outfield["y"].values

        metrics = {
            "length": x.max() - x.min(),
            "width": y.max() - y.min(),
            "centroid_x": x.mean(),
            "centroid_y": y.mean()
        }

        # Convex hull area
        if len(self.outfield) >= 3:
            points = np.column_stack([x, y])
            hull = ConvexHull(points)
            metrics["surface_area"] = hull.volume  # In 2D, volume is area
        else:
            metrics["surface_area"] = 0

        # Compactness (average distance to centroid)
        distances = np.sqrt(
            (x - metrics["centroid_x"])**2 +
            (y - metrics["centroid_y"])**2
        )
        metrics["compactness"] = distances.mean()

        # Stretch index
        metrics["stretch_index"] = (
            metrics["length"] / metrics["width"]
            if metrics["width"] > 0 else 0
        )

        return metrics

    def calculate_defensive_shape(self):
        """Analyze defensive line metrics."""
        defenders = self.positions[
            self.positions["position"].isin(["CB", "LB", "RB", "LWB", "RWB"])
        ].sort_values("y")

        if len(defenders) < 2:
            return {"error": "Not enough defenders"}

        metrics = {
            "line_height": defenders["x"].mean(),
            "line_flatness": defenders["x"].std(),
        }

        # Gaps between defenders
        gaps = np.diff(defenders["y"].values)
        metrics["max_gap"] = gaps.max()
        metrics["avg_gap"] = gaps.mean()

        # Midfield line
        midfielders = self.positions[
            self.positions["position"].isin(["CM", "CDM", "CAM", "LM", "RM"])
        ]

        if len(midfielders) > 0:
            metrics["midfield_height"] = midfielders["x"].mean()
            metrics["lines_distance"] = metrics["midfield_height"] - metrics["line_height"]

        return metrics

    def calculate_pressing_shape(self, opponent_ball_pos):
        """Analyze team shape relative to ball position."""
        x = self.outfield["x"].values
        y = self.outfield["y"].values

        # Distance to ball for each player
        ball_distances = np.sqrt(
            (x - opponent_ball_pos["x"])**2 +
            (y - opponent_ball_pos["y"])**2
        )

        metrics = {
            "nearest_player_dist": ball_distances.min(),
            "avg_ball_distance": ball_distances.mean(),
            "players_within_10m": (ball_distances < 10).sum(),
            "players_within_20m": (ball_distances < 20).sum(),
        }

        # Press intensity (more players close = higher press)
        metrics["press_intensity"] = np.sum(1 / (ball_distances + 1))

        return metrics

# Example usage
positions = pd.DataFrame({
    "player": [f"P{i}" for i in range(11)],
    "x": [5, 25, 28, 32, 30, 45, 48, 55, 70, 75, 85],
    "y": [40, 15, 30, 50, 65, 25, 55, 40, 30, 60, 40],
    "position": ["GK", "LB", "CB", "CB", "RB", "CM", "CM", "CAM", "LW", "RW", "ST"]
})

analyzer = TeamShapeAnalyzer(positions)

print("Basic Shape Metrics:")
print(analyzer.calculate_basic_shape())

print("\nDefensive Shape Metrics:")
print(analyzer.calculate_defensive_shape())

print("\nPressing Shape (ball at x=90, y=40):")
print(analyzer.calculate_pressing_shape({"x": 90, "y": 40}))
# R: Advanced Spatial Metrics
library(tidyverse)
library(sf)

calculate_team_shape <- function(player_positions) {
    # Filter to outfield players
    outfield <- player_positions %>%
        filter(position != "GK")

    # Basic shape metrics
    length <- max(outfield$x) - min(outfield$x)
    width <- max(outfield$y) - min(outfield$y)

    # Centroid
    centroid_x <- mean(outfield$x)
    centroid_y <- mean(outfield$y)

    # Convex hull area (team surface)
    hull_points <- outfield %>%
        select(x, y) %>%
        as.matrix()
    hull_indices <- chull(hull_points)
    hull_area <- abs(sum(
        hull_points[hull_indices, 1] *
        c(hull_points[hull_indices[-1], 2], hull_points[hull_indices[1], 2]) -
        hull_points[hull_indices, 2] *
        c(hull_points[hull_indices[-1], 1], hull_points[hull_indices[1], 1])
    )) / 2

    # Compactness (smaller = more compact)
    avg_distance_to_centroid <- mean(sqrt(
        (outfield$x - centroid_x)^2 + (outfield$y - centroid_y)^2
    ))

    # Stretch index (ratio of length to width)
    stretch_index <- length / width

    return(tibble(
        length = length,
        width = width,
        surface_area = hull_area,
        centroid_x = centroid_x,
        centroid_y = centroid_y,
        compactness = avg_distance_to_centroid,
        stretch_index = stretch_index
    ))
}

calculate_defensive_metrics <- function(player_positions) {
    # Identify defensive line (usually back 4 or 5)
    defenders <- player_positions %>%
        filter(position %in% c("CB", "LB", "RB", "LWB", "RWB")) %>%
        arrange(x)

    # Line height (average x position)
    line_height <- mean(defenders$x)

    # Line flatness (lower = flatter line)
    line_flatness <- sd(defenders$x)

    # Gaps between defenders
    defenders_sorted <- defenders %>% arrange(y)
    gaps <- diff(defenders_sorted$y)
    max_gap <- max(gaps)
    avg_gap <- mean(gaps)

    # Midfield line metrics
    midfielders <- player_positions %>%
        filter(position %in% c("CM", "CDM", "CAM", "LM", "RM"))

    midfield_height <- mean(midfielders$x)

    # Distance between lines
    line_distance <- midfield_height - line_height

    return(tibble(
        defensive_line_height = line_height,
        line_flatness = line_flatness,
        max_defender_gap = max_gap,
        avg_defender_gap = avg_gap,
        midfield_height = midfield_height,
        lines_distance = line_distance
    ))
}

# Example calculation
team_shape <- calculate_team_shape(team_positions)
defensive_metrics <- calculate_defensive_metrics(team_positions)

print("Team Shape Metrics:")
print(team_shape)
print("\nDefensive Metrics:")
print(defensive_metrics)

Case Study: Analyzing Team Playing Style

This case study demonstrates how to combine multiple spatial analysis techniques to build a comprehensive understanding of a team's playing style.

Objective

Analyze Manchester City's playing style using spatial analytics to understand their territorial dominance, passing patterns, and space creation.

team_spatial_case_study.py
# Python: Complete Spatial Analysis Case Study
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

class TeamSpatialAnalyzer:
    """Complete spatial analysis for a team."""

    def __init__(self, team_events, team_name):
        self.events = team_events
        self.team_name = team_name
        self.results = {}

    def run_full_analysis(self):
        """Run complete spatial analysis suite."""
        print(f"\n=== Spatial Analysis: {self.team_name} ===\n")

        self._territorial_analysis()
        self._passing_analysis()
        self._shot_analysis()
        self._width_analysis()

        return self.results

    def _territorial_analysis(self):
        """Analyze territorial metrics."""
        print("1. TERRITORIAL METRICS")
        print("-" * 25)

        avg_x = self.events["location_x"].mean()
        avg_y = self.events["location_y"].mean()
        print(f"Average Position: x={avg_x:.1f}, y={avg_y:.1f}")

        # Actions by third
        self.events["third"] = pd.cut(
            self.events["location_x"],
            bins=[0, 40, 80, 120],
            labels=["Defensive", "Middle", "Attacking"]
        )

        thirds = self.events["third"].value_counts(normalize=True) * 100
        print(f"\nActions by Third:")
        print(thirds)

        self.results["territorial"] = {
            "avg_x": avg_x,
            "avg_y": avg_y,
            "thirds": thirds.to_dict()
        }

    def _passing_analysis(self):
        """Analyze passing patterns."""
        print("\n2. PASSING PATTERNS")
        print("-" * 25)

        passes = self.events[self.events["type"] == "Pass"].copy()

        passes["pass_length"] = np.sqrt(
            (passes["pass_end_x"] - passes["location_x"])**2 +
            (passes["pass_end_y"] - passes["location_y"])**2
        )
        passes["is_forward"] = passes["pass_end_x"] > passes["location_x"]
        passes["is_progressive"] = (passes["pass_end_x"] - passes["location_x"]) > 10

        print(f"Total Passes: {len(passes)}")
        print(f"Avg Pass Length: {passes['pass_length'].mean():.1f}m")
        print(f"Forward Pass %: {passes['is_forward'].mean()*100:.1f}%")
        print(f"Progressive Pass %: {passes['is_progressive'].mean()*100:.1f}%")

        self.results["passing"] = {
            "total": len(passes),
            "avg_length": passes["pass_length"].mean(),
            "forward_pct": passes["is_forward"].mean() * 100,
            "progressive_pct": passes["is_progressive"].mean() * 100
        }

    def _shot_analysis(self):
        """Analyze shot locations."""
        print("\n3. SHOT ANALYSIS")
        print("-" * 25)

        shots = self.events[self.events["type"] == "Shot"]

        n_shots = len(shots)
        avg_x = shots["location_x"].mean()
        in_box = (
            (shots["location_x"] > 102) &
            (shots["location_y"] > 18) &
            (shots["location_y"] < 62)
        ).sum()

        print(f"Total Shots: {n_shots}")
        print(f"Average Shot Position: x={avg_x:.1f}")
        print(f"Shots in Box: {in_box} ({in_box/n_shots*100:.1f}%)")

        self.results["shots"] = {
            "total": n_shots,
            "avg_x": avg_x,
            "in_box": in_box,
            "in_box_pct": in_box / n_shots * 100 if n_shots > 0 else 0
        }

    def _width_analysis(self):
        """Analyze pitch width utilization."""
        print("\n4. WIDTH UTILIZATION")
        print("-" * 25)

        self.events["channel"] = pd.cut(
            self.events["location_y"],
            bins=[0, 20, 30, 50, 60, 80],
            labels=["Left Wide", "Left Half-Space", "Central",
                   "Right Half-Space", "Right Wide"]
        )

        width = self.events["channel"].value_counts(normalize=True) * 100
        print(width)

        self.results["width"] = width.to_dict()

    def create_summary_dashboard(self):
        """Create visual summary dashboard."""
        fig = plt.figure(figsize=(16, 12))
        gs = GridSpec(2, 3, figure=fig)

        # 1. Heatmap
        ax1 = fig.add_subplot(gs[0, :2])
        self._plot_heatmap(ax1)

        # 2. Thirds bar chart
        ax2 = fig.add_subplot(gs[0, 2])
        self._plot_thirds(ax2)

        # 3. Pass length distribution
        ax3 = fig.add_subplot(gs[1, 0])
        self._plot_pass_lengths(ax3)

        # 4. Width utilization
        ax4 = fig.add_subplot(gs[1, 1])
        self._plot_width(ax4)

        # 5. Shot map
        ax5 = fig.add_subplot(gs[1, 2])
        self._plot_shots(ax5)

        fig.suptitle(f"{self.team_name} - Spatial Analysis Dashboard",
                     fontsize=16, fontweight="bold")
        plt.tight_layout()

        return fig

    def _plot_heatmap(self, ax):
        """Plot touch heatmap."""
        ax.set_facecolor("#1a472a")
        ax.hexbin(
            self.events["location_x"],
            self.events["location_y"],
            gridsize=20,
            cmap="YlOrRd",
            mincnt=1
        )
        ax.set_xlim(0, 120)
        ax.set_ylim(0, 80)
        ax.set_title("Touch Heatmap")
        ax.set_aspect("equal")

    def _plot_thirds(self, ax):
        """Plot actions by thirds."""
        thirds = self.events["third"].value_counts()
        ax.bar(thirds.index, thirds.values, color=["#2E7D32", "#FFC107", "#D32F2F"])
        ax.set_title("Actions by Third")
        ax.set_ylabel("Count")

    def _plot_pass_lengths(self, ax):
        """Plot pass length distribution."""
        passes = self.events[self.events["type"] == "Pass"]
        lengths = np.sqrt(
            (passes["pass_end_x"] - passes["location_x"])**2 +
            (passes["pass_end_y"] - passes["location_y"])**2
        )
        ax.hist(lengths, bins=30, color="#1976D2", edgecolor="white")
        ax.set_title("Pass Length Distribution")
        ax.set_xlabel("Length (m)")

    def _plot_width(self, ax):
        """Plot width utilization."""
        width = self.events["channel"].value_counts()
        ax.barh(width.index, width.values, color="#7B1FA2")
        ax.set_title("Width Utilization")

    def _plot_shots(self, ax):
        """Plot shot locations."""
        shots = self.events[self.events["type"] == "Shot"]
        ax.set_facecolor("#1a472a")
        ax.scatter(
            shots["location_x"],
            shots["location_y"],
            c="yellow",
            s=100,
            edgecolors="white"
        )
        ax.set_xlim(60, 120)
        ax.set_ylim(0, 80)
        ax.set_title("Shot Locations")
        ax.set_aspect("equal")

# Run analysis
man_city_events = events[events["team"] == "Manchester City"]
analyzer = TeamSpatialAnalyzer(man_city_events, "Manchester City")
results = analyzer.run_full_analysis()
dashboard = analyzer.create_summary_dashboard()
plt.show()
# R: Complete Spatial Analysis Case Study
library(tidyverse)

analyze_team_spatial_style <- function(team_events, team_name) {
    cat(sprintf("\n=== Spatial Analysis: %s ===\n\n", team_name))

    # 1. TERRITORIAL ANALYSIS
    cat("1. TERRITORIAL METRICS\n")
    cat("-----------------------\n")

    avg_x <- mean(team_events$location_x, na.rm = TRUE)
    avg_y <- mean(team_events$location_y, na.rm = TRUE)

    # Actions by third
    thirds <- team_events %>%
        mutate(
            third = case_when(
                location_x < 40 ~ "Defensive",
                location_x < 80 ~ "Middle",
                TRUE ~ "Attacking"
            )
        ) %>%
        count(third) %>%
        mutate(pct = n / sum(n) * 100)

    cat(sprintf("Average Position: x=%.1f, y=%.1f\n", avg_x, avg_y))
    print(thirds)

    # 2. PASSING PATTERNS
    cat("\n2. PASSING PATTERNS\n")
    cat("-------------------\n")

    passes <- team_events %>% filter(type == "Pass")

    # Pass direction analysis
    passes <- passes %>%
        mutate(
            pass_length = sqrt((pass_end_x - location_x)^2 +
                              (pass_end_y - location_y)^2),
            pass_angle = atan2(pass_end_y - location_y,
                              pass_end_x - location_x) * 180 / pi,
            is_forward = pass_end_x > location_x,
            is_progressive = (pass_end_x - location_x) > 10
        )

    cat(sprintf("Total Passes: %d\n", nrow(passes)))
    cat(sprintf("Avg Pass Length: %.1f m\n", mean(passes$pass_length, na.rm = TRUE)))
    cat(sprintf("Forward Pass %%: %.1f%%\n", mean(passes$is_forward) * 100))
    cat(sprintf("Progressive Pass %%: %.1f%%\n", mean(passes$is_progressive) * 100))

    # Passes by zone
    zones <- create_zone_grid(6, 3)
    passes_by_origin <- passes %>%
        mutate(
            zone = ceiling(location_x / 20) * 3 + ceiling(location_y / 26.67)
        ) %>%
        count(zone, name = "passes") %>%
        arrange(desc(passes))

    cat("\nTop Passing Zones:\n")
    print(head(passes_by_origin, 5))

    # 3. SHOT LOCATIONS
    cat("\n3. SHOT ANALYSIS\n")
    cat("-----------------\n")

    shots <- team_events %>% filter(type == "Shot")

    shot_summary <- shots %>%
        summarize(
            n_shots = n(),
            avg_x = mean(location_x, na.rm = TRUE),
            avg_y = mean(location_y, na.rm = TRUE),
            shots_in_box = sum(location_x > 102 & location_y > 18 & location_y < 62),
            in_box_pct = shots_in_box / n_shots * 100
        )

    print(shot_summary)

    # 4. WIDTH ANALYSIS
    cat("\n4. WIDTH UTILIZATION\n")
    cat("---------------------\n")

    width_analysis <- team_events %>%
        mutate(
            channel = case_when(
                location_y < 20 ~ "Left Wide",
                location_y < 30 ~ "Left Half-Space",
                location_y < 50 ~ "Central",
                location_y < 60 ~ "Right Half-Space",
                TRUE ~ "Right Wide"
            )
        ) %>%
        count(channel) %>%
        mutate(pct = n / sum(n) * 100)

    print(width_analysis)

    # 5. CREATE HEATMAP DATA
    heatmap <- create_heatmap_data(team_events, bandwidth = 8)

    return(list(
        territorial = thirds,
        passing = passes_by_origin,
        shots = shot_summary,
        width = width_analysis,
        heatmap = heatmap
    ))
}

# Run analysis
man_city_events <- events %>% filter(team == "Manchester City")
analysis <- analyze_team_spatial_style(man_city_events, "Manchester City")

Practice Exercises

Task: Create a side-by-side comparison of touch heat maps for two players in the same position.

Requirements:

  • Load event data for two full-backs from the same league
  • Create kernel density heat maps for each player
  • Display them side by side with the same color scale
  • Calculate and display metrics: average touch position, spread

Task: Analyze passing patterns between pitch zones for a team.

Requirements:

  • Create a 6x3 zone grid
  • Calculate passes between each zone pair
  • Visualize as a flow diagram or chord diagram
  • Identify the most common passing lanes

Task: Using tracking data, analyze how space control changes during a possession sequence.

Requirements:

  • Load tracking data for a possession sequence
  • Calculate Voronoi diagrams at each frame
  • Track how each team's controlled area changes
  • Identify the moment of maximum space creation

Task: Implement a pitch control model that accounts for player velocities.

Requirements:

  • Use tracking data with player positions and velocities
  • Implement time-to-intercept calculations
  • Generate pitch control surfaces for multiple frames
  • Compare your model to simple Voronoi diagrams
  • Analyze how pitch control changes during a counter-attack

Task: Analyze how a team's shape changes throughout a match.

Requirements:

  • Calculate team compactness metrics every 5 minutes
  • Track defensive line height over time
  • Identify periods of high vs low pressing
  • Correlate shape metrics with match events (goals, cards)
  • Create a timeline visualization of shape evolution

Chapter Summary

Key Takeaways
  • Zone systems: Different granularities serve different purposes—use thirds for overview, fine grids for modeling
  • Heat maps: Kernel density estimation reveals continuous action distributions beyond discrete zone counts
  • Territorial control: Field tilt and zone dominance show where teams actually play, beyond possession percentage
  • Voronoi diagrams: Fundamental for understanding space control and are used in advanced pitch control models
  • Pitch control models: Extend Voronoi by incorporating velocity and physical parameters for more realistic space analysis
  • Spatial clustering: DBSCAN and similar methods reveal natural action groupings and tactical patterns
  • Team shape metrics: Compactness, length, width, and defensive line analysis reveal tactical organization
Spatial Analysis Applications
Tactical Analysis
  • Formation detection
  • Pressing triggers
  • Build-up patterns
  • Defensive shape
Player Evaluation
  • Role identification
  • Movement profiles
  • Space creation ability
  • Defensive coverage
Model Building
  • xG model features
  • EPV surfaces
  • Pass value models
  • Pitch control
Key Libraries and Tools
R Libraries
  • sf - Spatial features and operations
  • ggvoronoi - Voronoi visualization
  • MASS - Kernel density estimation
  • dbscan - Spatial clustering
Python Libraries
  • geopandas - Geospatial operations
  • scipy.spatial - Voronoi, convex hull
  • mplsoccer - Football pitch visualization
  • sklearn.cluster - DBSCAN, KMeans