Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples

0 of 60 chapters completed (0%)

Learning Objectives

Understand spatial analysis concepts in football context
Create and analyze pitch zone grids of varying granularity
Build heat maps and kernel density estimates for player actions
Calculate territorial control and dominance metrics
Analyze passing patterns using spatial network analysis
Implement Voronoi diagrams for space control visualization
Use spatial clustering to identify action hotspots
Compare spatial profiles across players and teams

The Spatial Dimension of Football

Football is fundamentally a spatial game. Teams compete for territorial control, create and exploit space, and organize defensively to deny opponents space. Spatial analytics provides the tools to quantify these dynamics and uncover patterns invisible to the naked eye.

Why Spatial Analysis?

Traditional statistics count events but ignore where they happen. A pass completion rate of 85% means different things if those passes are all in the defensive third versus the attacking third. Spatial analysis adds the crucial "where" dimension to football analytics.

Zone Analysis

Divide pitch into actionable zones

Heat Maps

Visualize action density

Voronoi

Space control regions

Clustering

Identify action hotspots

spatial_intro.py

# Python: Introduction to Spatial Data in Football
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt

# Load event data with locations
events = pd.read_csv("match_events.csv")

# Examine spatial distribution
spatial_summary = events.dropna(subset=["location_x", "location_y"]).agg({
    "location_x": ["count", "min", "max", "mean"],
    "location_y": ["min", "max", "mean"]
})
print("Spatial Summary:")
print(spatial_summary)

# Create GeoDataFrame for spatial operations
events_clean = events.dropna(subset=["location_x", "location_y"])
geometry = [Point(xy) for xy in zip(events_clean["location_x"],
                                     events_clean["location_y"])]
events_gdf = gpd.GeoDataFrame(events_clean, geometry=geometry)

# Basic spatial plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.set_facecolor("darkgreen")
ax.set_xlim(0, 120)
ax.set_ylim(0, 80)

# Plot events by type
for event_type in events_gdf["type"].unique()[:5]:
    subset = events_gdf[events_gdf["type"] == event_type]
    ax.scatter(subset["location_x"], subset["location_y"],
               label=event_type, alpha=0.5, s=20)

ax.legend()
ax.set_title("Event Locations")
plt.show()
# R: Introduction to Spatial Data in Football
library(tidyverse)
library(sf)

# Load event data with locations
events <- read_csv("match_events.csv")

# Examine spatial distribution
spatial_summary <- events %>%
    filter(!is.na(location_x), !is.na(location_y)) %>%
    summarize(
        n_events = n(),
        x_min = min(location_x),
        x_max = max(location_x),
        y_min = min(location_y),
        y_max = max(location_y),
        x_mean = mean(location_x),
        y_mean = mean(location_y)
    )

print(spatial_summary)

# Create spatial points object
events_sf <- events %>%
    filter(!is.na(location_x), !is.na(location_y)) %>%
    st_as_sf(coords = c("location_x", "location_y"))

# Basic spatial plot
ggplot() +
    # Pitch outline
    annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
             fill = "darkgreen", alpha = 0.3) +
    geom_sf(data = events_sf, aes(color = type), alpha = 0.5, size = 1) +
    coord_sf() +
    theme_minimal() +
    labs(title = "Event Locations")

Output

Spatial Summary:
           location_x                  location_y
count      1623.000000
min           0.100000                     0.100000
max         119.900000                    79.900000
mean         60.234000                    39.876000

Pitch Zone Analysis

Dividing the pitch into zones allows aggregation and comparison of actions across different areas. The choice of zone system depends on your analysis goals.

Zone System	Grid Size	Best For
Thirds	3 zones (Defensive, Middle, Attacking)	High-level territorial analysis
18-Zone	6x3 grid	Tactical zone analysis
Juego de Posición	5 vertical lanes, 4 horizontal bands	Positional play analysis
Fine Grid	12x8 or finer	Heat maps, xG models

pitch_zones.py

# Python: Creating Pitch Zone Systems
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def create_zone_grid(n_x=6, n_y=3, pitch_length=120, pitch_width=80):
    """Create a grid of pitch zones."""
    zone_width = pitch_length / n_x
    zone_height = pitch_width / n_y

    zones = []
    zone_id = 1
    for x_zone in range(1, n_x + 1):
        for y_zone in range(1, n_y + 1):
            zones.append({
                "zone_id": zone_id,
                "x_zone": x_zone,
                "y_zone": y_zone,
                "x_min": (x_zone - 1) * zone_width,
                "x_max": x_zone * zone_width,
                "y_min": (y_zone - 1) * zone_height,
                "y_max": y_zone * zone_height,
                "x_center": (x_zone - 0.5) * zone_width,
                "y_center": (y_zone - 0.5) * zone_height,
            })
            zone_id += 1

    return pd.DataFrame(zones)

# Create 18-zone grid (6x3)
zones_18 = create_zone_grid(6, 3)

def assign_zones(events, zones):
    """Assign zone IDs to events based on location."""
    n_x = zones["x_zone"].max()
    n_y = zones["y_zone"].max()

    events = events.copy()
    events["x_zone"] = np.ceil(events["location_x"] / (120 / n_x)).clip(1, n_x).astype(int)
    events["y_zone"] = np.ceil(events["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    events["zone_id"] = (events["x_zone"] - 1) * n_y + events["y_zone"]

    return events

# Assign zones to events
events_zoned = assign_zones(events, zones_18)

# Calculate zone statistics
zone_stats = events_zoned.groupby("zone_id").agg(
    n_events=("index", "count"),
    n_passes=("type", lambda x: (x == "Pass").sum()),
    n_shots=("type", lambda x: (x == "Shot").sum()),
).reset_index()

zone_stats = zone_stats.merge(zones_18, on="zone_id")
print(zone_stats)
# R: Creating Pitch Zone Systems
library(tidyverse)

# Define zone systems
create_zone_grid <- function(n_x = 6, n_y = 3, pitch_length = 120, pitch_width = 80) {
    zone_width <- pitch_length / n_x
    zone_height <- pitch_width / n_y

    zones <- expand_grid(
        x_zone = 1:n_x,
        y_zone = 1:n_y
    ) %>%
        mutate(
            zone_id = row_number(),
            x_min = (x_zone - 1) * zone_width,
            x_max = x_zone * zone_width,
            y_min = (y_zone - 1) * zone_height,
            y_max = y_zone * zone_height,
            x_center = (x_min + x_max) / 2,
            y_center = (y_min + y_max) / 2
        )

    return(zones)
}

# Create 18-zone grid (6x3)
zones_18 <- create_zone_grid(6, 3)

# Assign events to zones
assign_zones <- function(events, zones) {
    events %>%
        mutate(
            x_zone = ceiling(location_x / (120 / max(zones$x_zone))),
            y_zone = ceiling(location_y / (80 / max(zones$y_zone))),
            zone_id = (x_zone - 1) * max(zones$y_zone) + y_zone
        ) %>%
        # Clamp to valid zone range
        mutate(
            x_zone = pmin(pmax(x_zone, 1), max(zones$x_zone)),
            y_zone = pmin(pmax(y_zone, 1), max(zones$y_zone))
        )
}

# Assign zones to events
events_zoned <- assign_zones(events, zones_18)

# Calculate zone statistics
zone_stats <- events_zoned %>%
    group_by(zone_id) %>%
    summarize(
        n_events = n(),
        n_passes = sum(type == "Pass"),
        n_shots = sum(type == "Shot"),
        pass_success_rate = mean(type == "Pass" & pass_outcome == "Complete", na.rm = TRUE),
        .groups = "drop"
    ) %>%
    left_join(zones_18, by = "zone_id")

print(zone_stats)

Output

   zone_id  n_events  n_passes  n_shots  x_zone  y_zone  x_min  x_max  ...
0        1        45        32        0       1       1    0.0   20.0  ...
1        2        67        48        0       1       2    0.0   20.0  ...
2        3        52        38        0       1       3    0.0   20.0  ...
3        4        89        62        0       2       1   20.0   40.0  ...
...

Visualizing Zone Statistics

zone_visualization.py

# Python: Visualize Zone Statistics on Pitch
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.colors import LinearSegmentedColormap
import numpy as np

def plot_zone_heatmap(zone_stats, metric="n_events", title="Zone Activity"):
    """Plot zone statistics as a heatmap on the pitch."""
    fig, ax = plt.subplots(figsize=(14, 9))

    # Color normalization
    values = zone_stats[metric].values
    vmin, vmax = values.min(), values.max()

    # Custom colormap (green to orange)
    cmap = LinearSegmentedColormap.from_list("pitch", ["#2E7D32", "#FF5722"])

    # Draw zones
    for _, zone in zone_stats.iterrows():
        color_val = (zone[metric] - vmin) / (vmax - vmin) if vmax > vmin else 0
        rect = Rectangle(
            (zone["x_min"], zone["y_min"]),
            zone["x_max"] - zone["x_min"],
            zone["y_max"] - zone["y_min"],
            facecolor=cmap(color_val),
            edgecolor="white",
            linewidth=2
        )
        ax.add_patch(rect)

        # Zone label
        ax.text(zone["x_center"], zone["y_center"],
                f"{zone[metric]:.0f}",
                ha="center", va="center",
                color="white", fontweight="bold", fontsize=12)

    # Pitch markings
    ax.plot([60, 60], [0, 80], "w-", linewidth=1)
    ax.add_patch(Rectangle((0, 18), 18, 44, fill=False, edgecolor="white"))
    ax.add_patch(Rectangle((102, 18), 18, 44, fill=False, edgecolor="white"))

    ax.set_xlim(0, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.set_facecolor("#1a472a")
    ax.set_title(title, fontsize=14, fontweight="bold", color="white")
    ax.axis("off")

    # Colorbar
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin, vmax))
    plt.colorbar(sm, ax=ax, label=metric, shrink=0.6)

    plt.tight_layout()
    return fig

# Plot event density by zone
fig = plot_zone_heatmap(zone_stats, "n_events", "Event Density by Zone")
plt.show()
# R: Visualize Zone Statistics on Pitch
library(ggplot2)

plot_zone_heatmap <- function(zone_stats, metric = "n_events", title = "Zone Activity") {
    ggplot(zone_stats) +
        # Zone rectangles
        geom_rect(aes(xmin = x_min, xmax = x_max,
                      ymin = y_min, ymax = y_max,
                      fill = !!sym(metric)),
                  color = "white", size = 0.5) +
        # Zone labels
        geom_text(aes(x = x_center, y = y_center,
                      label = round(!!sym(metric), 1)),
                  color = "white", fontface = "bold", size = 4) +
        # Pitch markings
        annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
                 fill = NA, color = "white", size = 1) +
        annotate("segment", x = 60, xend = 60, y = 0, yend = 80,
                 color = "white", size = 0.5) +
        # Penalty areas
        annotate("rect", xmin = 0, xmax = 18, ymin = 18, ymax = 62,
                 fill = NA, color = "white", size = 0.5) +
        annotate("rect", xmin = 102, xmax = 120, ymin = 18, ymax = 62,
                 fill = NA, color = "white", size = 0.5) +
        scale_fill_gradient(low = "#2E7D32", high = "#FF5722") +
        coord_fixed() +
        theme_void() +
        labs(title = title, fill = metric)
}

# Plot event density by zone
plot_zone_heatmap(zone_stats, "n_events", "Event Density by Zone")

Heat Maps and Kernel Density

Heat maps provide continuous representations of action density, revealing patterns that discrete zone analysis might miss.

heatmaps.py

# Python: Creating Heat Maps with Kernel Density
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from mplsoccer import Pitch

def create_heatmap_data(events, bandwidth=5):
    """Create kernel density estimate for event locations."""
    # Filter to events with valid locations
    loc_data = events.dropna(subset=["location_x", "location_y"])

    x = loc_data["location_x"].values
    y = loc_data["location_y"].values

    # Create grid
    xx, yy = np.mgrid[0:120:100j, 0:80:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])

    # Kernel density estimation
    kernel = stats.gaussian_kde([x, y], bw_method=bandwidth / 100)
    density = kernel(positions).reshape(xx.shape)

    return xx, yy, density

# Create player touch heatmap
player_touches = events[
    (events["player"] == "Lionel Messi") &
    (events["type"].isin(["Pass", "Carry", "Shot", "Dribble"]))
]

xx, yy, density = create_heatmap_data(player_touches, bandwidth=8)

# Plot using mplsoccer
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# Add heatmap
heatmap = ax.contourf(xx, yy, density, levels=20, cmap="YlOrRd", alpha=0.8)
plt.colorbar(heatmap, ax=ax, label="Density")

ax.set_title("Lionel Messi - Touch Heatmap", fontsize=14)
plt.show()

# Alternative: Using mplsoccer built-in heatmap
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# mplsoccer heatmap
bin_statistic = pitch.bin_statistic(
    player_touches["location_x"],
    player_touches["location_y"],
    statistic="count",
    bins=(12, 8)
)
pitch.heatmap(bin_statistic, ax=ax, cmap="YlOrRd", edgecolors="#1a472a")
plt.show()
# R: Creating Heat Maps with Kernel Density
library(tidyverse)
library(MASS)

create_heatmap_data <- function(events, bandwidth = 5) {
    # Filter to events with valid locations
    loc_data <- events %>%
        filter(!is.na(location_x), !is.na(location_y))

    # Create kernel density estimate
    kde <- kde2d(
        x = loc_data$location_x,
        y = loc_data$location_y,
        n = 100,  # Grid resolution
        h = bandwidth,  # Bandwidth (smoothing)
        lims = c(0, 120, 0, 80)  # Pitch limits
    )

    # Convert to dataframe for ggplot
    heatmap_df <- expand_grid(
        x = kde$x,
        y = kde$y
    ) %>%
        mutate(
            density = as.vector(kde$z)
        )

    return(heatmap_df)
}

# Create player touch heatmap
player_touches <- events %>%
    filter(player == "Lionel Messi",
           type %in% c("Pass", "Carry", "Shot", "Dribble"))

heatmap_data <- create_heatmap_data(player_touches, bandwidth = 8)

# Plot heatmap
ggplot() +
    # Pitch background
    annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
             fill = "#1a472a") +
    # Heatmap
    geom_raster(data = heatmap_data,
                aes(x = x, y = y, fill = density),
                interpolate = TRUE) +
    scale_fill_gradient2(low = "transparent", mid = "yellow",
                         high = "red", midpoint = max(heatmap_data$density) / 2) +
    # Pitch markings
    annotate("segment", x = 60, xend = 60, y = 0, yend = 80, color = "white") +
    coord_fixed() +
    theme_void() +
    labs(title = "Lionel Messi - Touch Heatmap")

Comparative Heat Maps

comparison_heatmaps.py

# Python: Comparative Heat Maps (Player vs Team Average)
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch

def create_comparison_heatmap(player_events, team_events, player_name):
    """Create heatmap comparing player to team average."""
    # Calculate densities
    xx_p, yy_p, player_density = create_heatmap_data(player_events)
    xx_t, yy_t, team_density = create_heatmap_data(team_events)

    # Normalize team density to player sample size
    scale_factor = len(player_events) / len(team_events)
    team_density_scaled = team_density * scale_factor

    # Calculate difference
    diff_density = player_density - team_density_scaled

    # Plot
    pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a", line_color="white")
    fig, ax = pitch.draw(figsize=(12, 8))

    # Difference heatmap (red = more than team, blue = less than team)
    heatmap = ax.contourf(
        xx_p, yy_p, diff_density,
        levels=np.linspace(diff_density.min(), diff_density.max(), 20),
        cmap="RdBu_r",
        alpha=0.8
    )
    plt.colorbar(heatmap, ax=ax, label="Diff from Team Avg")

    ax.set_title(f"{player_name} - Deviation from Team Average", fontsize=14)
    return fig

# Compare Messi to team average
messi_events = events[events["player"] == "Lionel Messi"]
barcelona_events = events[events["team"] == "Barcelona"]

fig = create_comparison_heatmap(messi_events, barcelona_events, "Lionel Messi")
plt.show()
# R: Comparative Heat Maps (Player vs Team Average)
library(tidyverse)
library(patchwork)

create_comparison_heatmap <- function(player_events, team_events, player_name) {
    # Get player and team densities
    player_kde <- create_heatmap_data(player_events, bandwidth = 8)
    team_kde <- create_heatmap_data(team_events, bandwidth = 8)

    # Calculate difference (player - team average)
    comparison <- player_kde %>%
        mutate(
            team_density = team_kde$density / nrow(team_events) * nrow(player_events),
            diff_density = density - team_density,
            pct_diff = (density - team_density) / (team_density + 0.001) * 100
        )

    # Plot showing where player is more/less active than team average
    ggplot() +
        annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
                 fill = "#1a472a") +
        geom_raster(data = comparison,
                    aes(x = x, y = y, fill = diff_density),
                    interpolate = TRUE) +
        scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                            midpoint = 0, name = "Diff from\nTeam Avg") +
        coord_fixed() +
        theme_void() +
        labs(title = sprintf("%s - Deviation from Team Average", player_name))
}

# Compare Messi to team average
messi_comparison <- create_comparison_heatmap(
    player_events = events %>% filter(player == "Lionel Messi"),
    team_events = events %>% filter(team == "Barcelona"),
    player_name = "Lionel Messi"
)

print(messi_comparison)

Territorial Control Metrics

Territorial control measures which team dominates different areas of the pitch. This goes beyond possession percentage to show where a team is actually playing.

territorial_control.py

# Python: Calculating Territorial Control
import pandas as pd
import numpy as np

def calculate_territorial_control(events, team1, team2, zone_grid):
    """Calculate territorial control metrics by zone."""
    # Assign zones
    n_x = zone_grid["x_zone"].max()
    n_y = zone_grid["y_zone"].max()

    events = events.copy()
    events["x_zone"] = np.ceil(events["location_x"] / (120 / n_x)).clip(1, n_x).astype(int)
    events["y_zone"] = np.ceil(events["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    events["zone_id"] = (events["x_zone"] - 1) * n_y + events["y_zone"]

    # Count events by team and zone
    zone_counts = events.groupby(["zone_id", "team"]).size().unstack(fill_value=0)

    # Calculate control metrics
    zone_control = pd.DataFrame({
        "zone_id": zone_counts.index,
        team1: zone_counts.get(team1, 0),
        team2: zone_counts.get(team2, 0)
    })

    zone_control["total"] = zone_control[team1] + zone_control[team2]
    zone_control["team1_share"] = zone_control[team1] / zone_control["total"]
    zone_control["team2_share"] = zone_control[team2] / zone_control["total"]
    zone_control["dominance"] = zone_control["team1_share"] - zone_control["team2_share"]
    zone_control["contested"] = (
        (zone_control["team1_share"] > 0.3) & (zone_control["team2_share"] > 0.3)
    )

    zone_control = zone_control.merge(zone_grid, on="zone_id")

    # Overall metrics
    overall = {
        "team1_territory": zone_control["team1_share"].mean(),
        "team2_territory": zone_control["team2_share"].mean(),
        "field_tilt_team1": (
            zone_control[zone_control["x_zone"] >= 4][team1].sum() /
            zone_control[zone_control["x_zone"] >= 4]["total"].sum()
        )
    }

    return {"zones": zone_control, "overall": overall}

# Calculate territorial control
zones = create_zone_grid(6, 3)
territorial = calculate_territorial_control(events, "Barcelona", "Real Madrid", zones)

print("Territorial Control:")
print(f"Barcelona: {territorial['overall']['team1_territory']*100:.1f}%")
print(f"Real Madrid: {territorial['overall']['team2_territory']*100:.1f}%")
print(f"Barcelona Field Tilt: {territorial['overall']['field_tilt_team1']*100:.1f}%")
# R: Calculating Territorial Control
library(tidyverse)

calculate_territorial_control <- function(events, team1, team2, zone_grid) {
    # Assign zones to events
    events_zoned <- events %>%
        mutate(
            x_zone = ceiling(location_x / (120 / max(zone_grid$x_zone))),
            y_zone = ceiling(location_y / (80 / max(zone_grid$y_zone)))
        ) %>%
        mutate(
            x_zone = pmin(pmax(x_zone, 1), max(zone_grid$x_zone)),
            y_zone = pmin(pmax(y_zone, 1), max(zone_grid$y_zone)),
            zone_id = (x_zone - 1) * max(zone_grid$y_zone) + y_zone
        )

    # Count events by team and zone
    zone_control <- events_zoned %>%
        count(zone_id, team, name = "n_events") %>%
        pivot_wider(
            names_from = team,
            values_from = n_events,
            values_fill = 0
        )

    # Calculate control metrics
    zone_control <- zone_control %>%
        mutate(
            total = !!sym(team1) + !!sym(team2),
            team1_share = !!sym(team1) / total,
            team2_share = !!sym(team2) / total,
            dominance = team1_share - team2_share,  # Positive = team1 dominant
            contested = pmin(team1_share, team2_share) > 0.3  # Both teams active
        ) %>%
        left_join(zone_grid, by = "zone_id")

    # Overall territorial metrics
    overall <- list(
        team1_territory = mean(zone_control$team1_share, na.rm = TRUE),
        team2_territory = mean(zone_control$team2_share, na.rm = TRUE),
        field_tilt_team1 = zone_control %>%
            filter(x_zone >= 4) %>%  # Attacking half
            summarize(tilt = sum(!!sym(team1)) / sum(total)) %>%
            pull(tilt)
    )

    return(list(zones = zone_control, overall = overall))
}

# Calculate territorial control
zones <- create_zone_grid(6, 3)
territorial <- calculate_territorial_control(events, "Barcelona", "Real Madrid", zones)

cat("Territorial Control:\n")
cat(sprintf("Barcelona: %.1f%%\n", territorial$overall$team1_territory * 100))
cat(sprintf("Real Madrid: %.1f%%\n", territorial$overall$team2_territory * 100))
cat(sprintf("Barcelona Field Tilt: %.1f%%\n", territorial$overall$field_tilt_team1 * 100))

Output

Territorial Control:
Barcelona: 58.3%
Real Madrid: 41.7%
Barcelona Field Tilt: 62.1%

Voronoi Diagrams for Space Control

Voronoi diagrams partition the pitch into regions where each player controls the space closest to them. This is fundamental for understanding space occupation and is used in advanced pitch control models.

voronoi.py

# Python: Creating Voronoi Diagrams
import numpy as np
import pandas as pd
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

def create_voronoi_frame(player_positions, pitch_bounds=(0, 120, 0, 80)):
    """Create Voronoi diagram from player positions."""
    # Add boundary points to ensure proper tessellation
    x_min, x_max, y_min, y_max = pitch_bounds
    boundary_points = np.array([
        [x_min - 10, y_min - 10], [x_max + 10, y_min - 10],
        [x_min - 10, y_max + 10], [x_max + 10, y_max + 10]
    ])

    # Combine player positions with boundary
    points = np.array(list(zip(player_positions["x"], player_positions["y"])))
    all_points = np.vstack([points, boundary_points])

    # Calculate Voronoi
    vor = Voronoi(all_points)

    return vor, points

def plot_voronoi_pitch(player_positions, pitch_bounds=(0, 120, 0, 80)):
    """Plot Voronoi diagram on pitch."""
    vor, points = create_voronoi_frame(player_positions, pitch_bounds)
    x_min, x_max, y_min, y_max = pitch_bounds

    fig, ax = plt.subplots(figsize=(14, 9))
    ax.set_facecolor("#1a472a")

    # Plot Voronoi regions
    for i, region_idx in enumerate(vor.point_region[:len(player_positions)]):
        if region_idx == -1:
            continue

        region = vor.regions[region_idx]
        if -1 in region or len(region) == 0:
            continue

        # Get vertices and clip to pitch
        vertices = np.array([vor.vertices[v] for v in region])
        vertices[:, 0] = np.clip(vertices[:, 0], x_min, x_max)
        vertices[:, 1] = np.clip(vertices[:, 1], y_min, y_max)

        # Color by team
        team = player_positions.iloc[i]["team"]
        color = "#3498db" if team == "Home" else "#e74c3c"

        polygon = Polygon(vertices, alpha=0.3, facecolor=color, edgecolor="white")
        ax.add_patch(polygon)

    # Plot player positions
    for team, color, marker in [("Home", "#3498db", "o"), ("Away", "#e74c3c", "s")]:
        team_players = player_positions[player_positions["team"] == team]
        ax.scatter(team_players["x"], team_players["y"],
                   c=color, s=200, marker=marker, edgecolor="white",
                   linewidth=2, label=team, zorder=5)

    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_aspect("equal")
    ax.legend()
    ax.set_title("Voronoi Space Control", fontsize=14)
    plt.show()

# Example freeze frame
freeze_frame = pd.DataFrame({
    "player": ["GK1", "DEF1", "DEF2", "MID1", "MID2", "FWD1",
               "GK2", "DEF3", "DEF4", "MID3", "MID4", "FWD2"],
    "x": [5, 25, 30, 45, 50, 75, 115, 90, 85, 70, 65, 45],
    "y": [40, 25, 55, 35, 50, 40, 40, 55, 25, 60, 20, 40],
    "team": ["Home"]*6 + ["Away"]*6
})

plot_voronoi_pitch(freeze_frame)
# R: Creating Voronoi Diagrams
library(tidyverse)
library(ggvoronoi)
library(deldir)

create_voronoi_frame <- function(player_positions, pitch_bounds = c(0, 120, 0, 80)) {
    # Player positions at a moment in time
    # Input: dataframe with player, x, y, team columns

    # Calculate Voronoi tessellation
    voronoi <- deldir(
        x = player_positions$x,
        y = player_positions$y,
        rw = pitch_bounds  # Bounding rectangle
    )

    # Extract tiles
    tiles <- tile.list(voronoi)

    # Convert to dataframe for plotting
    tile_df <- map_df(seq_along(tiles), function(i) {
        tile <- tiles[[i]]
        tibble(
            player = player_positions$player[i],
            team = player_positions$team[i],
            x = tile$x,
            y = tile$y,
            tile_id = i
        )
    })

    return(tile_df)
}

# Example: Create Voronoi from freeze frame data
freeze_frame <- data.frame(
    player = c("GK1", "DEF1", "DEF2", "MID1", "MID2", "FWD1",
               "GK2", "DEF3", "DEF4", "MID3", "MID4", "FWD2"),
    x = c(5, 25, 30, 45, 50, 75, 115, 90, 85, 70, 65, 45),
    y = c(40, 25, 55, 35, 50, 40, 40, 55, 25, 60, 20, 40),
    team = c(rep("Home", 6), rep("Away", 6))
)

voronoi_tiles <- create_voronoi_frame(freeze_frame)

# Calculate space controlled by each team
space_control <- voronoi_tiles %>%
    group_by(tile_id, team) %>%
    summarize(
        area = abs(sum(x * lead(y, default = first(y)) -
                      lead(x, default = first(x)) * y) / 2),
        .groups = "drop"
    ) %>%
    group_by(team) %>%
    summarize(total_area = sum(area))

print(space_control)

Spatial Clustering

Spatial clustering identifies natural groupings of actions on the pitch, revealing tactical patterns like preferred shooting zones, passing lanes, or defensive blocks.

spatial_clustering.py

# Python: Spatial Clustering of Actions
import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
import matplotlib.pyplot as plt

def cluster_shots(shots, eps=8, min_samples=3):
    """Cluster shot locations using DBSCAN."""
    # Extract coordinates
    coords = shots[["location_x", "location_y"]].values

    # Run DBSCAN
    clustering = DBSCAN(eps=eps, min_samples=min_samples)
    shots = shots.copy()
    shots["cluster"] = clustering.fit_predict(coords)

    # Summarize clusters
    cluster_summary = shots[shots["cluster"] >= 0].groupby("cluster").agg(
        n_shots=("index", "count"),
        n_goals=("shot_outcome", lambda x: (x == "Goal").sum()),
        avg_xg=("shot_statsbomb_xg", "mean"),
        center_x=("location_x", "mean"),
        center_y=("location_y", "mean")
    ).reset_index()

    cluster_summary["conversion_rate"] = (
        cluster_summary["n_goals"] / cluster_summary["n_shots"]
    )

    # Label clusters
    def label_cluster(row):
        if row["center_x"] > 102:
            if 30 < row["center_y"] < 50:
                return "Central Box"
            elif row["center_y"] <= 30:
                return "Right Side Box"
            else:
                return "Left Side Box"
        elif row["center_x"] > 80:
            return "Edge of Box"
        else:
            return "Long Range"

    cluster_summary["cluster_label"] = cluster_summary.apply(label_cluster, axis=1)

    return {"shots": shots, "clusters": cluster_summary}

# Cluster team shots
shots_data = events[(events["type"] == "Shot") & (events["team"] == "Barcelona")]
shot_clusters = cluster_shots(shots_data, eps=6, min_samples=2)

print("Shot Clusters:")
print(shot_clusters["clusters"])

# Visualize clusters
def plot_shot_clusters(shot_data, cluster_summary):
    """Plot shot clusters on pitch."""
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.set_facecolor("#1a472a")

    # Plot shots colored by cluster
    scatter = ax.scatter(
        shot_data["location_x"],
        shot_data["location_y"],
        c=shot_data["cluster"],
        cmap="tab10",
        s=100,
        alpha=0.7,
        edgecolors="white"
    )

    # Plot cluster centers
    for _, cluster in cluster_summary.iterrows():
        ax.scatter(cluster["center_x"], cluster["center_y"],
                   marker="X", s=300, c="yellow", edgecolors="black",
                   linewidths=2, zorder=10)
        ax.annotate(cluster["cluster_label"],
                    (cluster["center_x"], cluster["center_y"]),
                    textcoords="offset points", xytext=(10, 10),
                    fontsize=9, color="white")

    ax.set_xlim(60, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.set_title("Shot Clusters", fontsize=14)
    plt.show()

plot_shot_clusters(shot_clusters["shots"], shot_clusters["clusters"])
# R: Spatial Clustering of Actions
library(tidyverse)
library(dbscan)
library(cluster)

# DBSCAN clustering for shot locations
cluster_shots <- function(shots, eps = 8, min_pts = 3) {
    # Extract coordinates
    coords <- shots %>%
        select(location_x, location_y) %>%
        as.matrix()

    # Run DBSCAN
    clustering <- dbscan(coords, eps = eps, minPts = min_pts)

    shots$cluster <- clustering$cluster

    # Summarize clusters
    cluster_summary <- shots %>%
        filter(cluster > 0) %>%  # Exclude noise (cluster 0)
        group_by(cluster) %>%
        summarize(
            n_shots = n(),
            n_goals = sum(outcome == "Goal"),
            avg_xg = mean(xg, na.rm = TRUE),
            center_x = mean(location_x),
            center_y = mean(location_y),
            .groups = "drop"
        ) %>%
        mutate(
            conversion_rate = n_goals / n_shots,
            cluster_label = case_when(
                center_x > 102 & center_y > 30 & center_y < 50 ~ "Central Box",
                center_x > 102 & center_y <= 30 ~ "Right Side Box",
                center_x > 102 & center_y >= 50 ~ "Left Side Box",
                center_x <= 102 & center_x > 80 ~ "Edge of Box",
                TRUE ~ "Long Range"
            )
        )

    return(list(shots = shots, clusters = cluster_summary))
}

# Cluster team shots
shot_clusters <- cluster_shots(
    events %>% filter(type == "Shot", team == "Barcelona"),
    eps = 6,
    min_pts = 2
)

print(shot_clusters$clusters)

Output

Shot Clusters:
   cluster  n_shots  n_goals    avg_xg  center_x  center_y  conversion_rate   cluster_label
0        0       12        3  0.152     108.2      38.5             0.250     Central Box
1        1        6        1  0.087     105.1      22.3             0.167   Right Side Box
2        2        5        0  0.065      92.4      40.1             0.000     Edge of Box

Practice Exercises

Task: Create a side-by-side comparison of touch heat maps for two players in the same position.

Requirements:

Load event data for two full-backs from the same league
Create kernel density heat maps for each player
Display them side by side with the same color scale
Calculate and display metrics: average touch position, spread

Task: Analyze passing patterns between pitch zones for a team.

Requirements:

Create a 6x3 zone grid
Calculate passes between each zone pair
Visualize as a flow diagram or chord diagram
Identify the most common passing lanes

Task: Using tracking data, analyze how space control changes during a possession sequence.

Requirements:

Load tracking data for a possession sequence
Calculate Voronoi diagrams at each frame
Track how each team's controlled area changes
Identify the moment of maximum space creation

Pitch Control Models

Pitch control models extend Voronoi analysis by incorporating player velocities, reaction times, and physical capabilities to estimate which team is most likely to reach each point on the pitch first. This provides a more realistic representation of space control.

Key Concept: From Static to Dynamic

While Voronoi diagrams assume all players have equal ability to reach any point, pitch control models account for current velocity, maximum speed, and time to intercept. This makes them far more accurate for real-time tactical analysis.

Components of Pitch Control

Player Velocity

Current speed and direction affect how quickly a player can reach different areas of the pitch.

Reaction Time

Time delay before a player can change direction or accelerate towards the ball.

Physical Parameters

Maximum speed, acceleration, and deceleration capabilities vary by player.

pitch_control.py

# Python: Pitch Control Model Implementation
import numpy as np
import pandas as pd
from scipy.ndimage import gaussian_filter
import matplotlib.pyplot as plt

class PitchControlModel:
    """
    Pitch control model based on Spearman et al. (2017).
    Estimates probability of each team controlling space.
    """

    def __init__(self, max_speed=5.0, reaction_time=0.7,
                 sigma=0.45, time_to_control=0.5):
        self.max_speed = max_speed
        self.reaction_time = reaction_time
        self.sigma = sigma
        self.time_to_control = time_to_control

    def time_to_intercept(self, player_x, player_y, player_vx, player_vy,
                          target_x, target_y):
        """Calculate time for player to reach target location."""
        # Distance to target
        dx = target_x - player_x
        dy = target_y - player_y
        distance = np.sqrt(dx**2 + dy**2)

        if distance > 0:
            # Direction to target
            dir_x = dx / distance
            dir_y = dy / distance
            # Velocity component towards target
            velocity_towards = player_vx * dir_x + player_vy * dir_y
        else:
            velocity_towards = 0

        # Effective speed (accounting for current momentum)
        effective_speed = max(velocity_towards, 0) + self.max_speed * 0.7

        # Total time = reaction + movement
        time = self.reaction_time + distance / effective_speed

        return time

    def control_probability(self, target_x, target_y,
                           team_players, opponent_players):
        """Calculate control probability at a single point."""
        # Get minimum intercept time for each team
        team_times = [
            self.time_to_intercept(p["x"], p["y"], p["vx"], p["vy"],
                                   target_x, target_y)
            for p in team_players
        ]

        opp_times = [
            self.time_to_intercept(p["x"], p["y"], p["vx"], p["vy"],
                                   target_x, target_y)
            for p in opponent_players
        ]

        min_team_time = min(team_times)
        min_opp_time = min(opp_times)

        # Logistic function for probability
        team_prob = 1 / (1 + np.exp(min_team_time / self.sigma))
        opp_prob = 1 / (1 + np.exp(min_opp_time / self.sigma))

        # Normalize
        total = team_prob + opp_prob
        control = team_prob / total if total > 0 else 0.5

        return control

    def generate_surface(self, team_players, opponent_players,
                        grid_size=50, pitch_length=120, pitch_width=80):
        """Generate full pitch control surface."""
        x_grid = np.linspace(0, pitch_length, grid_size)
        y_grid = np.linspace(0, pitch_width, grid_size)

        control_surface = np.zeros((grid_size, grid_size))

        for i, x in enumerate(x_grid):
            for j, y in enumerate(y_grid):
                control_surface[j, i] = self.control_probability(
                    x, y, team_players, opponent_players
                )

        # Apply smoothing
        control_surface = gaussian_filter(control_surface, sigma=1)

        return x_grid, y_grid, control_surface

    def plot_pitch_control(self, team_players, opponent_players):
        """Visualize pitch control surface."""
        x_grid, y_grid, surface = self.generate_surface(
            team_players, opponent_players
        )

        fig, ax = plt.subplots(figsize=(14, 9))

        # Plot control surface
        im = ax.contourf(x_grid, y_grid, surface, levels=20,
                         cmap="RdBu", vmin=0, vmax=1, alpha=0.8)
        plt.colorbar(im, ax=ax, label="Control Probability (Team)")

        # Plot players
        for p in team_players:
            ax.scatter(p["x"], p["y"], c="blue", s=200,
                      edgecolors="white", linewidth=2, zorder=5)
            ax.arrow(p["x"], p["y"], p["vx"]*2, p["vy"]*2,
                    head_width=1, head_length=0.5, fc="blue", ec="blue")

        for p in opponent_players:
            ax.scatter(p["x"], p["y"], c="red", s=200,
                      edgecolors="white", linewidth=2, zorder=5)
            ax.arrow(p["x"], p["y"], p["vx"]*2, p["vy"]*2,
                    head_width=1, head_length=0.5, fc="red", ec="red")

        ax.set_xlim(0, 120)
        ax.set_ylim(0, 80)
        ax.set_aspect("equal")
        ax.set_title("Pitch Control Model", fontsize=14)

        # Calculate summary stats
        team_control = np.mean(surface)
        attacking_control = np.mean(surface[:, x_grid > 80])

        ax.text(60, -5, f"Overall: {team_control:.1%} | Attacking Third: {attacking_control:.1%}",
                ha="center", fontsize=10)

        return fig

# Example usage
model = PitchControlModel()

team_players = [
    {"x": 5, "y": 40, "vx": 0, "vy": 0},
    {"x": 25, "y": 20, "vx": 2, "vy": 1},
    {"x": 30, "y": 60, "vx": 1, "vy": -1},
    {"x": 45, "y": 40, "vx": 3, "vy": 0},
    {"x": 55, "y": 25, "vx": 4, "vy": 2},
    {"x": 70, "y": 50, "vx": 3, "vy": -1},
]

opponent_players = [
    {"x": 115, "y": 40, "vx": 0, "vy": 0},
    {"x": 95, "y": 55, "vx": -3, "vy": 1},
    {"x": 90, "y": 25, "vx": -2, "vy": -1},
    {"x": 75, "y": 45, "vx": -4, "vy": 0},
    {"x": 65, "y": 60, "vx": -2, "vy": 1},
    {"x": 50, "y": 35, "vx": -1, "vy": -2},
]

fig = model.plot_pitch_control(team_players, opponent_players)
plt.show()
# R: Pitch Control Model Implementation
library(tidyverse)

# Physical parameters
PARAMS <- list(
    max_speed = 5.0,           # m/s maximum player speed
    reaction_time = 0.7,       # seconds reaction time
    avg_ball_speed = 15.0,     # m/s average ball speed
    time_to_control = 0.5,     # seconds to control ball once reached
    sigma = 0.45               # uncertainty parameter
)

# Calculate time for player to reach target location
time_to_intercept <- function(player_x, player_y, player_vx, player_vy,
                               target_x, target_y, params = PARAMS) {
    # Distance to target
    dx <- target_x - player_x
    dy <- target_y - player_y
    distance <- sqrt(dx^2 + dy^2)

    # Account for current velocity
    # Project velocity onto direction to target
    if (distance > 0) {
        dir_x <- dx / distance
        dir_y <- dy / distance
        velocity_towards <- player_vx * dir_x + player_vy * dir_y
    } else {
        velocity_towards <- 0
    }

    # Time = reaction_time + time to cover distance
    # Using kinematic equation with current velocity
    effective_speed <- max(velocity_towards, 0) + params$max_speed * 0.7
    time <- params$reaction_time + distance / effective_speed

    return(time)
}

# Calculate pitch control probability at a point
pitch_control_at_point <- function(target_x, target_y,
                                    team_positions, opponent_positions,
                                    params = PARAMS) {

    # Calculate intercept times for all players
    team_times <- map_dbl(1:nrow(team_positions), function(i) {
        time_to_intercept(
            team_positions$x[i], team_positions$y[i],
            team_positions$vx[i], team_positions$vy[i],
            target_x, target_y, params
        )
    })

    opponent_times <- map_dbl(1:nrow(opponent_positions), function(i) {
        time_to_intercept(
            opponent_positions$x[i], opponent_positions$y[i],
            opponent_positions$vx[i], opponent_positions$vy[i],
            target_x, target_y, params
        )
    })

    # Convert times to probabilities using logistic function
    team_prob <- 1 / (1 + exp(min(team_times) / params$sigma))
    opponent_prob <- 1 / (1 + exp(min(opponent_times) / params$sigma))

    # Normalize
    total <- team_prob + opponent_prob
    control_prob <- team_prob / total

    return(control_prob)
}

# Generate pitch control surface
generate_pitch_control <- function(team_pos, opp_pos,
                                    grid_size = 50, params = PARAMS) {
    # Create grid
    x_grid <- seq(0, 120, length.out = grid_size)
    y_grid <- seq(0, 80, length.out = grid_size)

    # Calculate control at each point
    control_surface <- expand_grid(x = x_grid, y = y_grid) %>%
        rowwise() %>%
        mutate(
            control = pitch_control_at_point(x, y, team_pos, opp_pos, params)
        ) %>%
        ungroup()

    return(control_surface)
}

# Example: Calculate pitch control
team_positions <- tibble(
    player = paste0("T", 1:11),
    x = c(5, 20, 25, 30, 35, 45, 50, 55, 70, 75, 90),
    y = c(40, 20, 60, 35, 50, 40, 25, 55, 30, 50, 40),
    vx = c(0, 2, 1, 3, 2, 4, 3, 2, 5, 4, 3),
    vy = c(0, 1, -1, 0, 2, -1, 2, -2, 1, 0, 1)
)

opponent_positions <- tibble(
    player = paste0("O", 1:11),
    x = c(115, 100, 95, 90, 85, 75, 70, 65, 50, 45, 30),
    y = c(40, 55, 25, 45, 35, 60, 20, 45, 50, 30, 40),
    vx = c(0, -3, -2, -4, -3, -2, -1, -3, -2, -1, 0),
    vy = c(0, 1, -1, 0, -2, 1, -1, 2, 0, 1, 0)
)

pitch_control <- generate_pitch_control(team_positions, opponent_positions)

# Summary stats
cat("Team Total Control:", mean(pitch_control$control) * 100, "%\n")
cat("Attacking Third Control:",
    mean(pitch_control$control[pitch_control$x > 80]) * 100, "%\n")

Output

Team Total Control: 54.2%
Attacking Third Control: 38.7%

Expected Possession Value from Pitch Control

Pitch control can be combined with expected possession value (EPV) models to estimate the value of different passing options.

epv_pitch_control.py

# Python: EPV from Pitch Control
import numpy as np
import pandas as pd

class EPVModel:
    """Expected Possession Value model integrated with pitch control."""

    def __init__(self, grid_size=50, pitch_length=120, pitch_width=80):
        self.grid_size = grid_size
        self.pitch_length = pitch_length
        self.pitch_width = pitch_width
        self.epv_surface = self._create_epv_surface()

    def _create_epv_surface(self):
        """Create base EPV surface."""
        x = np.linspace(0, self.pitch_length, self.grid_size)
        y = np.linspace(0, self.pitch_width, self.grid_size)
        xx, yy = np.meshgrid(x, y)

        # Distance to goal
        goal_x, goal_y = 120, 40
        distance = np.sqrt((goal_x - xx)**2 + (goal_y - yy)**2)

        # Centrality (higher in middle)
        centrality = 1 - np.abs(yy - 40) / 40

        # Base EPV
        epv = np.maximum(0, 0.1 - distance / 1500) * (1 + centrality * 0.5)

        # Boost in penalty box
        in_box = (xx > 102) & (yy > 18) & (yy < 62)
        epv[in_box] *= 2

        return epv

    def get_epv(self, x, y):
        """Get EPV at specific location."""
        i = int(np.clip(x / self.pitch_length * (self.grid_size - 1),
                        0, self.grid_size - 1))
        j = int(np.clip(y / self.pitch_width * (self.grid_size - 1),
                        0, self.grid_size - 1))
        return self.epv_surface[j, i]

    def calculate_controlled_epv(self, pitch_control_surface):
        """Calculate EPV weighted by pitch control."""
        controlled_epv = pitch_control_surface * self.epv_surface
        return np.sum(controlled_epv) / controlled_epv.size

    def evaluate_pass_options(self, ball_pos, targets,
                              pitch_control_model, team_players, opp_players):
        """Evaluate potential pass targets using EPV and control."""
        results = []

        for target in targets:
            # Control probability at target
            control = pitch_control_model.control_probability(
                target["x"], target["y"], team_players, opp_players
            )

            # EPV at target
            epv = self.get_epv(target["x"], target["y"])

            # Pass success probability (distance-based)
            pass_dist = np.sqrt((target["x"] - ball_pos["x"])**2 +
                               (target["y"] - ball_pos["y"])**2)
            pass_success = 1 / (1 + np.exp((pass_dist - 25) / 10))

            # Expected value
            expected_value = control * epv * pass_success

            results.append({
                "target_x": target["x"],
                "target_y": target["y"],
                "control_prob": control,
                "epv": epv,
                "pass_success": pass_success,
                "expected_value": expected_value
            })

        return pd.DataFrame(results).sort_values("expected_value", ascending=False)

# Example: Evaluate pass options
epv_model = EPVModel()
pc_model = PitchControlModel()

ball_pos = {"x": 60, "y": 40}
pass_targets = [
    {"x": 70, "y": 30},
    {"x": 75, "y": 50},
    {"x": 85, "y": 40},
    {"x": 65, "y": 60},
]

pass_evaluation = epv_model.evaluate_pass_options(
    ball_pos, pass_targets, pc_model, team_players, opponent_players
)
print("Pass Options Ranked by Expected Value:")
print(pass_evaluation)
# R: EPV from Pitch Control
library(tidyverse)

# Simplified EPV grid (pre-calculated)
create_epv_grid <- function(grid_size = 50) {
    x_grid <- seq(0, 120, length.out = grid_size)
    y_grid <- seq(0, 80, length.out = grid_size)

    expand_grid(x = x_grid, y = y_grid) %>%
        mutate(
            # EPV increases towards goal, higher in central areas
            distance_to_goal = sqrt((120 - x)^2 + (40 - y)^2),
            centrality = 1 - abs(y - 40) / 40,
            base_epv = pmax(0, 0.1 - distance_to_goal / 1500) * (1 + centrality * 0.5),
            # Boost in box
            in_box = x > 102 & y > 18 & y < 62,
            epv = ifelse(in_box, base_epv * 2, base_epv)
        )
}

epv_grid <- create_epv_grid()

# Calculate controlled EPV
calculate_controlled_epv <- function(pitch_control, epv_grid) {
    # Join pitch control with EPV
    combined <- pitch_control %>%
        left_join(epv_grid %>% select(x, y, epv), by = c("x", "y"))

    # Controlled EPV = control probability * EPV at that location
    controlled_epv <- combined %>%
        mutate(controlled_epv = control * epv)

    # Total controlled EPV
    total_controlled <- sum(controlled_epv$controlled_epv) / nrow(controlled_epv)

    return(list(
        surface = controlled_epv,
        total = total_controlled
    ))
}

# Evaluate pass options using pitch control
evaluate_pass_options <- function(ball_pos, target_positions,
                                   team_pos, opp_pos, epv_grid) {
    pass_values <- map_df(1:nrow(target_positions), function(i) {
        target <- target_positions[i, ]

        # Estimate control at target after pass
        control_at_target <- pitch_control_at_point(
            target$x, target$y, team_pos, opp_pos
        )

        # Get EPV at target
        epv_at_target <- epv_grid %>%
            filter(abs(x - target$x) < 2, abs(y - target$y) < 2) %>%
            summarize(epv = mean(epv)) %>%
            pull(epv)

        # Pass distance and risk
        pass_dist <- sqrt((target$x - ball_pos$x)^2 +
                          (target$y - ball_pos$y)^2)
        pass_success_prob <- 1 / (1 + exp((pass_dist - 25) / 10))

        tibble(
            target_id = i,
            target_x = target$x,
            target_y = target$y,
            control_prob = control_at_target,
            epv_at_target = epv_at_target,
            pass_success = pass_success_prob,
            expected_value = control_at_target * epv_at_target * pass_success_prob
        )
    })

    return(pass_values %>% arrange(desc(expected_value)))
}

Spatial Passing Networks

Combining spatial analysis with network analysis reveals how teams connect different areas of the pitch. Spatial passing networks show not just who passes to whom, but where on the pitch these connections occur.

spatial_passing_network.py

# Python: Spatial Passing Networks
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch

def create_spatial_pass_network(passes, zones):
    """Create a network of passes between pitch zones."""
    n_x = zones["x_zone"].max()
    n_y = zones["y_zone"].max()

    # Assign zones
    passes = passes.copy()
    passes["origin_zone"] = (
        np.ceil(passes["location_x"] / (120 / n_x)).clip(1, n_x).astype(int) * n_y +
        np.ceil(passes["location_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    )
    passes["dest_zone"] = (
        np.ceil(passes["pass_end_x"] / (120 / n_x)).clip(1, n_x).astype(int) * n_y +
        np.ceil(passes["pass_end_y"] / (80 / n_y)).clip(1, n_y).astype(int)
    )

    # Edge list
    edges = passes.groupby(["origin_zone", "dest_zone"]).agg(
        n_passes=("id", "count"),
        success_rate=("pass_outcome", lambda x: (x.isna()).mean()),
        progressive=("pass_end_x", lambda x: (x > passes.loc[x.index, "location_x"]).mean())
    ).reset_index()

    edges = edges[edges["n_passes"] >= 3]

    # Create NetworkX graph
    G = nx.DiGraph()

    for _, row in edges.iterrows():
        G.add_edge(
            row["origin_zone"], row["dest_zone"],
            weight=row["n_passes"],
            success_rate=row["success_rate"],
            progressive=row["progressive"]
        )

    # Add node positions
    for zone_id in G.nodes():
        zone = zones[zones["zone_id"] == zone_id].iloc[0]
        G.nodes[zone_id]["x"] = zone["x_center"]
        G.nodes[zone_id]["y"] = zone["y_center"]

    return G, edges

def plot_spatial_pass_network(G, zones):
    """Visualize spatial passing network on pitch."""
    fig, ax = plt.subplots(figsize=(14, 9))
    ax.set_facecolor("#1a472a")

    # Draw pitch markings
    ax.plot([60, 60], [0, 80], "w-", linewidth=1, alpha=0.5)
    ax.add_patch(plt.Rectangle((0, 18), 18, 44, fill=False,
                                edgecolor="white", alpha=0.5))
    ax.add_patch(plt.Rectangle((102, 18), 18, 44, fill=False,
                                edgecolor="white", alpha=0.5))

    # Get edge weights for scaling
    weights = [G[u][v]["weight"] for u, v in G.edges()]
    max_weight = max(weights) if weights else 1

    # Draw edges
    for u, v in G.edges():
        weight = G[u][v]["weight"]
        x1, y1 = G.nodes[u]["x"], G.nodes[u]["y"]
        x2, y2 = G.nodes[v]["x"], G.nodes[v]["y"]

        # Line thickness based on pass count
        linewidth = 1 + (weight / max_weight) * 5

        # Color based on progressive or not
        color = "#FFD700" if G[u][v]["progressive"] > 0.5 else "#87CEEB"

        ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
                   arrowprops=dict(arrowstyle="->", color=color,
                                  lw=linewidth, alpha=0.7))

    # Draw nodes
    node_sizes = [300 * (G.degree(n) + 1) for n in G.nodes()]
    xs = [G.nodes[n]["x"] for n in G.nodes()]
    ys = [G.nodes[n]["y"] for n in G.nodes()]

    ax.scatter(xs, ys, s=node_sizes, c="#FFFFFF",
              edgecolors="#333333", linewidth=2, zorder=5)

    ax.set_xlim(0, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.set_title("Spatial Passing Network", fontsize=14, color="white")
    ax.axis("off")

    return fig

# Create and plot network
zones = create_zone_grid(6, 4)
passes = events[(events["type"] == "Pass") & (events["team"] == "Barcelona")]
G, edges = create_spatial_pass_network(passes, zones)

print("Top Zone-to-Zone Connections:")
print(edges.nlargest(10, "n_passes"))

fig = plot_spatial_pass_network(G, zones)
plt.show()
# R: Spatial Passing Networks
library(tidyverse)
library(igraph)

create_spatial_pass_network <- function(passes, zones) {
    # Assign zones to pass origins and destinations
    passes_zoned <- passes %>%
        mutate(
            origin_zone = ceiling(location_x / (120 / max(zones$x_zone))) *
                          max(zones$y_zone) +
                          ceiling(location_y / (80 / max(zones$y_zone))),
            dest_zone = ceiling(pass_end_x / (120 / max(zones$x_zone))) *
                        max(zones$y_zone) +
                        ceiling(pass_end_y / (80 / max(zones$y_zone)))
        )

    # Create edge list (zone to zone passes)
    zone_edges <- passes_zoned %>%
        filter(!is.na(origin_zone), !is.na(dest_zone)) %>%
        count(origin_zone, dest_zone, name = "n_passes") %>%
        filter(n_passes >= 3)  # Minimum passes threshold

    # Calculate additional metrics
    zone_edges <- zone_edges %>%
        left_join(
            passes_zoned %>%
                group_by(origin_zone, dest_zone) %>%
                summarize(
                    success_rate = mean(pass_outcome == "Complete", na.rm = TRUE),
                    avg_angle = mean(atan2(pass_end_y - location_y,
                                          pass_end_x - location_x) * 180 / pi),
                    progressive = mean(pass_end_x > location_x),
                    .groups = "drop"
                ),
            by = c("origin_zone", "dest_zone")
        )

    # Create igraph object
    g <- graph_from_data_frame(zone_edges, directed = TRUE)

    # Add zone attributes
    V(g)$zone_id <- as.numeric(V(g)$name)
    V(g)$x_center <- zones$x_center[match(V(g)$zone_id, zones$zone_id)]
    V(g)$y_center <- zones$y_center[match(V(g)$zone_id, zones$zone_id)]

    return(list(
        graph = g,
        edges = zone_edges,
        node_metrics = tibble(
            zone_id = as.numeric(V(g)$name),
            out_degree = degree(g, mode = "out"),
            in_degree = degree(g, mode = "in"),
            betweenness = betweenness(g),
            pagerank = page_rank(g)$vector
        )
    ))
}

# Create network
zones <- create_zone_grid(6, 4)
passes <- events %>% filter(type == "Pass", team == "Barcelona")
pass_network <- create_spatial_pass_network(passes, zones)

# Top passing connections
top_connections <- pass_network$edges %>%
    arrange(desc(n_passes)) %>%
    head(10)

print("Top Zone-to-Zone Passing Connections:")
print(top_connections)

Zone Transition Analysis

Understanding how teams progress the ball through different zones reveals their build-up patterns and attacking approaches.

zone_transitions.py

# Python: Zone Transition Analysis
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def analyze_zone_transitions(events, n_zones=6):
    """Analyze how possessions progress through horizontal zones."""
    events = events.copy()

    # Assign horizontal zones
    events["x_zone"] = np.ceil(events["location_x"] / (120 / n_zones)).clip(1, n_zones).astype(int)

    # Track zone changes within possessions
    events["prev_zone"] = events.groupby("possession")["x_zone"].shift(1)
    events["zone_changed"] = events["x_zone"] != events["prev_zone"]

    transitions = events[events["zone_changed"] & events["prev_zone"].notna()].copy()
    transitions["zone_from"] = transitions["prev_zone"].astype(int)
    transitions["zone_to"] = transitions["x_zone"].astype(int)

    # Transition matrix
    transition_matrix = pd.crosstab(
        transitions["zone_from"],
        transitions["zone_to"],
        margins=True
    )

    # Direction analysis
    transitions["direction"] = np.where(
        transitions["zone_to"] > transitions["zone_from"], "Progressive",
        np.where(transitions["zone_to"] < transitions["zone_from"], "Regressive", "Lateral")
    )

    direction_summary = transitions["direction"].value_counts(normalize=True) * 100

    return {
        "matrix": transition_matrix,
        "direction_summary": direction_summary,
        "transitions": transitions
    }

def plot_transition_sankey(transitions):
    """Plot Sankey diagram of zone transitions."""
    from matplotlib.sankey import Sankey

    # Simplified flow visualization
    fig, ax = plt.subplots(figsize=(14, 8))

    # Aggregate flows
    flows = transitions["transitions"].groupby(
        ["zone_from", "zone_to"]
    ).size().reset_index(name="count")

    # Create heatmap of transitions
    matrix = transitions["matrix"].iloc[:-1, :-1]  # Remove margins

    sns.heatmap(matrix, annot=True, fmt="d", cmap="YlOrRd", ax=ax)
    ax.set_xlabel("Zone To")
    ax.set_ylabel("Zone From")
    ax.set_title("Zone Transition Matrix")

    return fig

# Analyze transitions
events_team = events[events["team"] == "Barcelona"]
transitions = analyze_zone_transitions(events_team, n_zones=6)

print("Zone Transition Matrix:")
print(transitions["matrix"])
print("\nDirection Summary:")
print(transitions["direction_summary"])

fig = plot_transition_sankey(transitions)
plt.show()
# R: Zone Transition Analysis
library(tidyverse)
library(sankey)

analyze_zone_transitions <- function(possessions, zones) {
    # Track ball progression through zones during possessions
    transitions <- possessions %>%
        group_by(possession_id) %>%
        arrange(minute, second) %>%
        mutate(
            x_zone = ceiling(location_x / (120 / max(zones$x_zone))),
            zone_change = x_zone != lag(x_zone),
            zone_from = lag(x_zone),
            zone_to = x_zone
        ) %>%
        filter(!is.na(zone_from), zone_change) %>%
        ungroup()

    # Summarize transitions
    transition_matrix <- transitions %>%
        count(zone_from, zone_to, name = "count") %>%
        pivot_wider(names_from = zone_to, values_from = count, values_fill = 0)

    # Progressive vs regressive transitions
    progression_summary <- transitions %>%
        mutate(
            direction = case_when(
                zone_to > zone_from ~ "Progressive",
                zone_to < zone_from ~ "Regressive",
                TRUE ~ "Lateral"
            )
        ) %>%
        count(direction) %>%
        mutate(pct = n / sum(n) * 100)

    # Success rate by transition type
    transition_success <- transitions %>%
        mutate(
            successful = lead(team) == team,  # Team kept possession
            direction = case_when(
                zone_to > zone_from ~ "Progressive",
                zone_to < zone_from ~ "Regressive",
                TRUE ~ "Lateral"
            )
        ) %>%
        group_by(direction) %>%
        summarize(
            n = n(),
            success_rate = mean(successful, na.rm = TRUE) * 100,
            .groups = "drop"
        )

    return(list(
        matrix = transition_matrix,
        summary = progression_summary,
        success = transition_success
    ))
}

# Analyze transitions
zones <- create_zone_grid(6, 1)  # 6 horizontal zones only
possessions <- events %>% filter(team == "Barcelona")
transitions <- analyze_zone_transitions(possessions, zones)

print("Zone Transition Matrix:")
print(transitions$matrix)
print("\nProgression Summary:")
print(transitions$summary)

Advanced Spatial Metrics

Beyond basic spatial analysis, advanced metrics can capture complex spatial patterns like compactness, width, and defensive shape.

Team Compactness

Measures how tightly grouped a team's players are. Compact teams are harder to play through but may leave space in wide areas.

Length: Distance between deepest and highest outfield players
Width: Distance between widest players
Surface Area: Convex hull of player positions

Defensive Shape

Analyzes the organization and positioning of defensive units.

Line Height: Average position of defensive line
Line Flatness: Variance in defender positions
Block Solidity: Gaps between defensive lines

team_shape_analysis.py

# Python: Advanced Spatial Metrics
import numpy as np
import pandas as pd
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cdist

class TeamShapeAnalyzer:
    """Analyze team shape and structure from player positions."""

    def __init__(self, positions):
        """
        positions: DataFrame with columns: player, x, y, position
        """
        self.positions = positions
        self.outfield = positions[positions["position"] != "GK"]

    def calculate_basic_shape(self):
        """Calculate basic shape metrics."""
        x = self.outfield["x"].values
        y = self.outfield["y"].values

        metrics = {
            "length": x.max() - x.min(),
            "width": y.max() - y.min(),
            "centroid_x": x.mean(),
            "centroid_y": y.mean()
        }

        # Convex hull area
        if len(self.outfield) >= 3:
            points = np.column_stack([x, y])
            hull = ConvexHull(points)
            metrics["surface_area"] = hull.volume  # In 2D, volume is area
        else:
            metrics["surface_area"] = 0

        # Compactness (average distance to centroid)
        distances = np.sqrt(
            (x - metrics["centroid_x"])**2 +
            (y - metrics["centroid_y"])**2
        )
        metrics["compactness"] = distances.mean()

        # Stretch index
        metrics["stretch_index"] = (
            metrics["length"] / metrics["width"]
            if metrics["width"] > 0 else 0
        )

        return metrics

    def calculate_defensive_shape(self):
        """Analyze defensive line metrics."""
        defenders = self.positions[
            self.positions["position"].isin(["CB", "LB", "RB", "LWB", "RWB"])
        ].sort_values("y")

        if len(defenders) < 2:
            return {"error": "Not enough defenders"}

        metrics = {
            "line_height": defenders["x"].mean(),
            "line_flatness": defenders["x"].std(),
        }

        # Gaps between defenders
        gaps = np.diff(defenders["y"].values)
        metrics["max_gap"] = gaps.max()
        metrics["avg_gap"] = gaps.mean()

        # Midfield line
        midfielders = self.positions[
            self.positions["position"].isin(["CM", "CDM", "CAM", "LM", "RM"])
        ]

        if len(midfielders) > 0:
            metrics["midfield_height"] = midfielders["x"].mean()
            metrics["lines_distance"] = metrics["midfield_height"] - metrics["line_height"]

        return metrics

    def calculate_pressing_shape(self, opponent_ball_pos):
        """Analyze team shape relative to ball position."""
        x = self.outfield["x"].values
        y = self.outfield["y"].values

        # Distance to ball for each player
        ball_distances = np.sqrt(
            (x - opponent_ball_pos["x"])**2 +
            (y - opponent_ball_pos["y"])**2
        )

        metrics = {
            "nearest_player_dist": ball_distances.min(),
            "avg_ball_distance": ball_distances.mean(),
            "players_within_10m": (ball_distances < 10).sum(),
            "players_within_20m": (ball_distances < 20).sum(),
        }

        # Press intensity (more players close = higher press)
        metrics["press_intensity"] = np.sum(1 / (ball_distances + 1))

        return metrics

# Example usage
positions = pd.DataFrame({
    "player": [f"P{i}" for i in range(11)],
    "x": [5, 25, 28, 32, 30, 45, 48, 55, 70, 75, 85],
    "y": [40, 15, 30, 50, 65, 25, 55, 40, 30, 60, 40],
    "position": ["GK", "LB", "CB", "CB", "RB", "CM", "CM", "CAM", "LW", "RW", "ST"]
})

analyzer = TeamShapeAnalyzer(positions)

print("Basic Shape Metrics:")
print(analyzer.calculate_basic_shape())

print("\nDefensive Shape Metrics:")
print(analyzer.calculate_defensive_shape())

print("\nPressing Shape (ball at x=90, y=40):")
print(analyzer.calculate_pressing_shape({"x": 90, "y": 40}))
# R: Advanced Spatial Metrics
library(tidyverse)
library(sf)

calculate_team_shape <- function(player_positions) {
    # Filter to outfield players
    outfield <- player_positions %>%
        filter(position != "GK")

    # Basic shape metrics
    length <- max(outfield$x) - min(outfield$x)
    width <- max(outfield$y) - min(outfield$y)

    # Centroid
    centroid_x <- mean(outfield$x)
    centroid_y <- mean(outfield$y)

    # Convex hull area (team surface)
    hull_points <- outfield %>%
        select(x, y) %>%
        as.matrix()
    hull_indices <- chull(hull_points)
    hull_area <- abs(sum(
        hull_points[hull_indices, 1] *
        c(hull_points[hull_indices[-1], 2], hull_points[hull_indices[1], 2]) -
        hull_points[hull_indices, 2] *
        c(hull_points[hull_indices[-1], 1], hull_points[hull_indices[1], 1])
    )) / 2

    # Compactness (smaller = more compact)
    avg_distance_to_centroid <- mean(sqrt(
        (outfield$x - centroid_x)^2 + (outfield$y - centroid_y)^2
    ))

    # Stretch index (ratio of length to width)
    stretch_index <- length / width

    return(tibble(
        length = length,
        width = width,
        surface_area = hull_area,
        centroid_x = centroid_x,
        centroid_y = centroid_y,
        compactness = avg_distance_to_centroid,
        stretch_index = stretch_index
    ))
}

calculate_defensive_metrics <- function(player_positions) {
    # Identify defensive line (usually back 4 or 5)
    defenders <- player_positions %>%
        filter(position %in% c("CB", "LB", "RB", "LWB", "RWB")) %>%
        arrange(x)

    # Line height (average x position)
    line_height <- mean(defenders$x)

    # Line flatness (lower = flatter line)
    line_flatness <- sd(defenders$x)

    # Gaps between defenders
    defenders_sorted <- defenders %>% arrange(y)
    gaps <- diff(defenders_sorted$y)
    max_gap <- max(gaps)
    avg_gap <- mean(gaps)

    # Midfield line metrics
    midfielders <- player_positions %>%
        filter(position %in% c("CM", "CDM", "CAM", "LM", "RM"))

    midfield_height <- mean(midfielders$x)

    # Distance between lines
    line_distance <- midfield_height - line_height

    return(tibble(
        defensive_line_height = line_height,
        line_flatness = line_flatness,
        max_defender_gap = max_gap,
        avg_defender_gap = avg_gap,
        midfield_height = midfield_height,
        lines_distance = line_distance
    ))
}

# Example calculation
team_shape <- calculate_team_shape(team_positions)
defensive_metrics <- calculate_defensive_metrics(team_positions)

print("Team Shape Metrics:")
print(team_shape)
print("\nDefensive Metrics:")
print(defensive_metrics)

Case Study: Analyzing Team Playing Style

This case study demonstrates how to combine multiple spatial analysis techniques to build a comprehensive understanding of a team's playing style.

Objective

Analyze Manchester City's playing style using spatial analytics to understand their territorial dominance, passing patterns, and space creation.

team_spatial_case_study.py

# Python: Complete Spatial Analysis Case Study
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

class TeamSpatialAnalyzer:
    """Complete spatial analysis for a team."""

    def __init__(self, team_events, team_name):
        self.events = team_events
        self.team_name = team_name
        self.results = {}

    def run_full_analysis(self):
        """Run complete spatial analysis suite."""
        print(f"\n=== Spatial Analysis: {self.team_name} ===\n")

        self._territorial_analysis()
        self._passing_analysis()
        self._shot_analysis()
        self._width_analysis()

        return self.results

    def _territorial_analysis(self):
        """Analyze territorial metrics."""
        print("1. TERRITORIAL METRICS")
        print("-" * 25)

        avg_x = self.events["location_x"].mean()
        avg_y = self.events["location_y"].mean()
        print(f"Average Position: x={avg_x:.1f}, y={avg_y:.1f}")

        # Actions by third
        self.events["third"] = pd.cut(
            self.events["location_x"],
            bins=[0, 40, 80, 120],
            labels=["Defensive", "Middle", "Attacking"]
        )

        thirds = self.events["third"].value_counts(normalize=True) * 100
        print(f"\nActions by Third:")
        print(thirds)

        self.results["territorial"] = {
            "avg_x": avg_x,
            "avg_y": avg_y,
            "thirds": thirds.to_dict()
        }

    def _passing_analysis(self):
        """Analyze passing patterns."""
        print("\n2. PASSING PATTERNS")
        print("-" * 25)

        passes = self.events[self.events["type"] == "Pass"].copy()

        passes["pass_length"] = np.sqrt(
            (passes["pass_end_x"] - passes["location_x"])**2 +
            (passes["pass_end_y"] - passes["location_y"])**2
        )
        passes["is_forward"] = passes["pass_end_x"] > passes["location_x"]
        passes["is_progressive"] = (passes["pass_end_x"] - passes["location_x"]) > 10

        print(f"Total Passes: {len(passes)}")
        print(f"Avg Pass Length: {passes['pass_length'].mean():.1f}m")
        print(f"Forward Pass %: {passes['is_forward'].mean()*100:.1f}%")
        print(f"Progressive Pass %: {passes['is_progressive'].mean()*100:.1f}%")

        self.results["passing"] = {
            "total": len(passes),
            "avg_length": passes["pass_length"].mean(),
            "forward_pct": passes["is_forward"].mean() * 100,
            "progressive_pct": passes["is_progressive"].mean() * 100
        }

    def _shot_analysis(self):
        """Analyze shot locations."""
        print("\n3. SHOT ANALYSIS")
        print("-" * 25)

        shots = self.events[self.events["type"] == "Shot"]

        n_shots = len(shots)
        avg_x = shots["location_x"].mean()
        in_box = (
            (shots["location_x"] > 102) &
            (shots["location_y"] > 18) &
            (shots["location_y"] < 62)
        ).sum()

        print(f"Total Shots: {n_shots}")
        print(f"Average Shot Position: x={avg_x:.1f}")
        print(f"Shots in Box: {in_box} ({in_box/n_shots*100:.1f}%)")

        self.results["shots"] = {
            "total": n_shots,
            "avg_x": avg_x,
            "in_box": in_box,
            "in_box_pct": in_box / n_shots * 100 if n_shots > 0 else 0
        }

    def _width_analysis(self):
        """Analyze pitch width utilization."""
        print("\n4. WIDTH UTILIZATION")
        print("-" * 25)

        self.events["channel"] = pd.cut(
            self.events["location_y"],
            bins=[0, 20, 30, 50, 60, 80],
            labels=["Left Wide", "Left Half-Space", "Central",
                   "Right Half-Space", "Right Wide"]
        )

        width = self.events["channel"].value_counts(normalize=True) * 100
        print(width)

        self.results["width"] = width.to_dict()

    def create_summary_dashboard(self):
        """Create visual summary dashboard."""
        fig = plt.figure(figsize=(16, 12))
        gs = GridSpec(2, 3, figure=fig)

        # 1. Heatmap
        ax1 = fig.add_subplot(gs[0, :2])
        self._plot_heatmap(ax1)

        # 2. Thirds bar chart
        ax2 = fig.add_subplot(gs[0, 2])
        self._plot_thirds(ax2)

        # 3. Pass length distribution
        ax3 = fig.add_subplot(gs[1, 0])
        self._plot_pass_lengths(ax3)

        # 4. Width utilization
        ax4 = fig.add_subplot(gs[1, 1])
        self._plot_width(ax4)

        # 5. Shot map
        ax5 = fig.add_subplot(gs[1, 2])
        self._plot_shots(ax5)

        fig.suptitle(f"{self.team_name} - Spatial Analysis Dashboard",
                     fontsize=16, fontweight="bold")
        plt.tight_layout()

        return fig

    def _plot_heatmap(self, ax):
        """Plot touch heatmap."""
        ax.set_facecolor("#1a472a")
        ax.hexbin(
            self.events["location_x"],
            self.events["location_y"],
            gridsize=20,
            cmap="YlOrRd",
            mincnt=1
        )
        ax.set_xlim(0, 120)
        ax.set_ylim(0, 80)
        ax.set_title("Touch Heatmap")
        ax.set_aspect("equal")

    def _plot_thirds(self, ax):
        """Plot actions by thirds."""
        thirds = self.events["third"].value_counts()
        ax.bar(thirds.index, thirds.values, color=["#2E7D32", "#FFC107", "#D32F2F"])
        ax.set_title("Actions by Third")
        ax.set_ylabel("Count")

    def _plot_pass_lengths(self, ax):
        """Plot pass length distribution."""
        passes = self.events[self.events["type"] == "Pass"]
        lengths = np.sqrt(
            (passes["pass_end_x"] - passes["location_x"])**2 +
            (passes["pass_end_y"] - passes["location_y"])**2
        )
        ax.hist(lengths, bins=30, color="#1976D2", edgecolor="white")
        ax.set_title("Pass Length Distribution")
        ax.set_xlabel("Length (m)")

    def _plot_width(self, ax):
        """Plot width utilization."""
        width = self.events["channel"].value_counts()
        ax.barh(width.index, width.values, color="#7B1FA2")
        ax.set_title("Width Utilization")

    def _plot_shots(self, ax):
        """Plot shot locations."""
        shots = self.events[self.events["type"] == "Shot"]
        ax.set_facecolor("#1a472a")
        ax.scatter(
            shots["location_x"],
            shots["location_y"],
            c="yellow",
            s=100,
            edgecolors="white"
        )
        ax.set_xlim(60, 120)
        ax.set_ylim(0, 80)
        ax.set_title("Shot Locations")
        ax.set_aspect("equal")

# Run analysis
man_city_events = events[events["team"] == "Manchester City"]
analyzer = TeamSpatialAnalyzer(man_city_events, "Manchester City")
results = analyzer.run_full_analysis()
dashboard = analyzer.create_summary_dashboard()
plt.show()
# R: Complete Spatial Analysis Case Study
library(tidyverse)

analyze_team_spatial_style <- function(team_events, team_name) {
    cat(sprintf("\n=== Spatial Analysis: %s ===\n\n", team_name))

    # 1. TERRITORIAL ANALYSIS
    cat("1. TERRITORIAL METRICS\n")
    cat("-----------------------\n")

    avg_x <- mean(team_events$location_x, na.rm = TRUE)
    avg_y <- mean(team_events$location_y, na.rm = TRUE)

    # Actions by third
    thirds <- team_events %>%
        mutate(
            third = case_when(
                location_x < 40 ~ "Defensive",
                location_x < 80 ~ "Middle",
                TRUE ~ "Attacking"
            )
        ) %>%
        count(third) %>%
        mutate(pct = n / sum(n) * 100)

    cat(sprintf("Average Position: x=%.1f, y=%.1f\n", avg_x, avg_y))
    print(thirds)

    # 2. PASSING PATTERNS
    cat("\n2. PASSING PATTERNS\n")
    cat("-------------------\n")

    passes <- team_events %>% filter(type == "Pass")

    # Pass direction analysis
    passes <- passes %>%
        mutate(
            pass_length = sqrt((pass_end_x - location_x)^2 +
                              (pass_end_y - location_y)^2),
            pass_angle = atan2(pass_end_y - location_y,
                              pass_end_x - location_x) * 180 / pi,
            is_forward = pass_end_x > location_x,
            is_progressive = (pass_end_x - location_x) > 10
        )

    cat(sprintf("Total Passes: %d\n", nrow(passes)))
    cat(sprintf("Avg Pass Length: %.1f m\n", mean(passes$pass_length, na.rm = TRUE)))
    cat(sprintf("Forward Pass %%: %.1f%%\n", mean(passes$is_forward) * 100))
    cat(sprintf("Progressive Pass %%: %.1f%%\n", mean(passes$is_progressive) * 100))

    # Passes by zone
    zones <- create_zone_grid(6, 3)
    passes_by_origin <- passes %>%
        mutate(
            zone = ceiling(location_x / 20) * 3 + ceiling(location_y / 26.67)
        ) %>%
        count(zone, name = "passes") %>%
        arrange(desc(passes))

    cat("\nTop Passing Zones:\n")
    print(head(passes_by_origin, 5))

    # 3. SHOT LOCATIONS
    cat("\n3. SHOT ANALYSIS\n")
    cat("-----------------\n")

    shots <- team_events %>% filter(type == "Shot")

    shot_summary <- shots %>%
        summarize(
            n_shots = n(),
            avg_x = mean(location_x, na.rm = TRUE),
            avg_y = mean(location_y, na.rm = TRUE),
            shots_in_box = sum(location_x > 102 & location_y > 18 & location_y < 62),
            in_box_pct = shots_in_box / n_shots * 100
        )

    print(shot_summary)

    # 4. WIDTH ANALYSIS
    cat("\n4. WIDTH UTILIZATION\n")
    cat("---------------------\n")

    width_analysis <- team_events %>%
        mutate(
            channel = case_when(
                location_y < 20 ~ "Left Wide",
                location_y < 30 ~ "Left Half-Space",
                location_y < 50 ~ "Central",
                location_y < 60 ~ "Right Half-Space",
                TRUE ~ "Right Wide"
            )
        ) %>%
        count(channel) %>%
        mutate(pct = n / sum(n) * 100)

    print(width_analysis)

    # 5. CREATE HEATMAP DATA
    heatmap <- create_heatmap_data(team_events, bandwidth = 8)

    return(list(
        territorial = thirds,
        passing = passes_by_origin,
        shots = shot_summary,
        width = width_analysis,
        heatmap = heatmap
    ))
}

# Run analysis
man_city_events <- events %>% filter(team == "Manchester City")
analysis <- analyze_team_spatial_style(man_city_events, "Manchester City")

Practice Exercises

Task: Create a side-by-side comparison of touch heat maps for two players in the same position.

Requirements:

Load event data for two full-backs from the same league
Create kernel density heat maps for each player
Display them side by side with the same color scale
Calculate and display metrics: average touch position, spread

Task: Analyze passing patterns between pitch zones for a team.

Requirements:

Create a 6x3 zone grid
Calculate passes between each zone pair
Visualize as a flow diagram or chord diagram
Identify the most common passing lanes

Task: Using tracking data, analyze how space control changes during a possession sequence.

Requirements:

Load tracking data for a possession sequence
Calculate Voronoi diagrams at each frame
Track how each team's controlled area changes
Identify the moment of maximum space creation

Task: Implement a pitch control model that accounts for player velocities.

Requirements:

Use tracking data with player positions and velocities
Implement time-to-intercept calculations
Generate pitch control surfaces for multiple frames
Compare your model to simple Voronoi diagrams
Analyze how pitch control changes during a counter-attack

Task: Analyze how a team's shape changes throughout a match.

Requirements:

Calculate team compactness metrics every 5 minutes
Track defensive line height over time
Identify periods of high vs low pressing
Correlate shape metrics with match events (goals, cards)
Create a timeline visualization of shape evolution

Chapter Summary

Key Takeaways

Zone systems: Different granularities serve different purposes—use thirds for overview, fine grids for modeling
Heat maps: Kernel density estimation reveals continuous action distributions beyond discrete zone counts
Territorial control: Field tilt and zone dominance show where teams actually play, beyond possession percentage
Voronoi diagrams: Fundamental for understanding space control and are used in advanced pitch control models
Pitch control models: Extend Voronoi by incorporating velocity and physical parameters for more realistic space analysis
Spatial clustering: DBSCAN and similar methods reveal natural action groupings and tactical patterns
Team shape metrics: Compactness, length, width, and defensive line analysis reveal tactical organization

Spatial Analysis Applications

Tactical Analysis

Formation detection
Pressing triggers
Build-up patterns
Defensive shape

Player Evaluation

Role identification
Movement profiles
Space creation ability
Defensive coverage

Model Building

xG model features
EPV surfaces
Pass value models
Pitch control

Key Libraries and Tools

R Libraries

sf - Spatial features and operations
ggvoronoi - Voronoi visualization
MASS - Kernel density estimation
dbscan - Spatial clustering

Python Libraries

geopandas - Geospatial operations
scipy.spatial - Voronoi, convex hull
mplsoccer - Football pitch visualization
sklearn.cluster - DBSCAN, KMeans

Capstone - Complete Analytics System

The Spatial Dimension of Football

Why Spatial Analysis?

Zone Analysis

Heat Maps

Voronoi

Clustering

Pitch Zone Analysis

Visualizing Zone Statistics

Heat Maps and Kernel Density

Comparative Heat Maps

Territorial Control Metrics

Voronoi Diagrams for Space Control

Spatial Clustering

Practice Exercises

Pitch Control Models

Key Concept: From Static to Dynamic

Components of Pitch Control

Player Velocity

Reaction Time

Physical Parameters

Expected Possession Value from Pitch Control

Spatial Passing Networks

Zone Transition Analysis

Advanced Spatial Metrics

Case Study: Analyzing Team Playing Style

Objective

Practice Exercises

Chapter Summary

Key Takeaways

Spatial Analysis Applications

Tactical Analysis

Player Evaluation

Model Building

Key Libraries and Tools

R Libraries

Python Libraries

On This Page

Exercises

Chapter Info

Capstone - Complete Analytics System

The Spatial Dimension of Football

Why Spatial Analysis?

Zone Analysis

Heat Maps

Voronoi

Clustering

Pitch Zone Analysis

Visualizing Zone Statistics

Heat Maps and Kernel Density

Comparative Heat Maps

Territorial Control Metrics

Voronoi Diagrams for Space Control

Spatial Clustering

Practice Exercises

Exercise 1: Create Player Heat Map Comparisons

Exercise 2: Zone-Based Pass Analysis

Exercise 3: Dynamic Voronoi Analysis

Pitch Control Models

Key Concept: From Static to Dynamic

Components of Pitch Control

Player Velocity

Reaction Time

Physical Parameters

Expected Possession Value from Pitch Control

Spatial Passing Networks

Zone Transition Analysis

Advanced Spatial Metrics

Case Study: Analyzing Team Playing Style

Objective

Practice Exercises

Exercise 1: Create Player Heat Map Comparisons

Exercise 2: Zone-Based Pass Analysis

Exercise 3: Dynamic Voronoi Analysis

Exercise 4: Build a Pitch Control Model

Exercise 5: Team Shape Evolution

Chapter Summary

Key Takeaways

Spatial Analysis Applications

Tactical Analysis

Player Evaluation

Model Building

Key Libraries and Tools

R Libraries

Python Libraries

On This Page

Exercises

Chapter Info