Chapter 60

Capstone - Complete Analytics System


Network Analysis in Football

Football is fundamentally a network game. Every pass creates a connection between players, and every team forms a dynamic network of interactions. Network analysis provides powerful tools to understand team structure, identify key players, and analyze tactical patterns.

Graph Theory Fundamentals

A graph (or network) consists of nodes (vertices) and edges (connections). In football pass networks, players are nodes and passes are edges. Understanding basic graph concepts is essential for network analysis.

Graph Components
  • Nodes: Players on the pitch
  • Edges: Passes between players
  • Directed: Pass has sender/receiver
  • Weighted: Number of passes
  • Degree: Connections per node
Network Metrics
  • Density: Connectedness of network
  • Centrality: Node importance
  • Clustering: Triangular connections
  • Path Length: Steps between nodes
  • Communities: Subgroups in network
network_basics
# Python: Setting up network analysis
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Create a simple pass network example
passes = pd.DataFrame({
    "passer": ["GK", "CB1", "CB1", "CB2", "CM1", "CM1", "CM2", "CM2", "LW", "RW", "ST"],
    "receiver": ["CB1", "CM1", "CB2", "CM2", "CM2", "LW", "CM1", "RW", "ST", "ST", "CM1"],
    "count": [15, 22, 18, 20, 28, 14, 25, 16, 8, 10, 5]
})

# Create directed graph
G = nx.DiGraph()

# Add weighted edges
for _, row in passes.iterrows():
    G.add_edge(row["passer"], row["receiver"], weight=row["count"])

# Basic network properties
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.4f}")
print(f"Is connected: {nx.is_weakly_connected(G)}")
# R: Setting up network analysis
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)

# Create a simple pass network example
passes <- tibble(
  passer = c("GK", "CB1", "CB1", "CB2", "CM1", "CM1", "CM2", "CM2", "LW", "RW", "ST"),
  receiver = c("CB1", "CM1", "CB2", "CM2", "CM2", "LW", "CM1", "RW", "ST", "ST", "CM1"),
  count = c(15, 22, 18, 20, 28, 14, 25, 16, 8, 10, 5)
)

# Create igraph object
g <- graph_from_data_frame(passes, directed = TRUE)

# Add edge weights
E(g)$weight <- passes$count

# Basic network properties
cat("Nodes:", vcount(g), "\n")
cat("Edges:", ecount(g), "\n")
cat("Density:", edge_density(g), "\n")
cat("Is connected:", is_connected(g, mode = "weak"), "\n")
Output
Nodes: 11
Edges: 11
Density: 0.1
Is connected: True
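The density figure above can be verified by hand: for a directed graph, density is the number of edges divided by the number of possible ordered node pairs, N(N-1). A quick arithmetic check:

```python
# Directed graph density = edges / possible ordered pairs
nodes, edges = 11, 11
density = edges / (nodes * (nodes - 1))
print(round(density, 4))  # 0.1
```

With 11 players and 11 distinct passing connections, only 10% of the possible directed links exist, matching the networkx output.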

Building Pass Networks from Event Data

To analyze real football networks, we need to extract passing data from event datasets and construct meaningful network representations. This involves aggregating passes between player pairs and handling substitutions.
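One common convention for handling substitutions is to cut the event stream at the first substitution, so average positions describe a stable starting XI rather than a mix of line-ups. This sketch assumes a statsbombpy-style events frame with flattened `type`, `team`, and `minute` columns; the data here is a hypothetical toy frame for illustration:

```python
import pandas as pd

# Toy events frame mimicking statsbombpy's flattened columns (illustrative data)
events = pd.DataFrame({
    "type": ["Pass", "Pass", "Substitution", "Pass"],
    "team": ["Barcelona"] * 4,
    "minute": [10, 30, 60, 75],
})

# Keep only events before the first substitution by this team
subs = events[(events["type"] == "Substitution") & (events["team"] == "Barcelona")]
cutoff = subs["minute"].min() if len(subs) else events["minute"].max() + 1
pre_sub = events[events["minute"] < cutoff]

print(len(pre_sub))  # 2 passes survive the cut
```

An alternative is to build one network per line-up segment, but the pre-substitution network is the most widely used simplification.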

build_pass_network
# Python: Build pass network from StatsBomb data
from statsbombpy import sb
import numpy as np
import networkx as nx

# Load match events
events = sb.events(match_id=3788741)

# Extract passes for one team
team_passes = events[
    (events["type"] == "Pass") &
    (events["team"] == "Barcelona") &
    (events["pass_recipient"].notna())
][["player", "pass_recipient", "location", "pass_end_location"]].copy()

# Parse locations
team_passes["start_x"] = team_passes["location"].apply(lambda x: x[0])
team_passes["start_y"] = team_passes["location"].apply(lambda x: x[1])
team_passes["end_x"] = team_passes["pass_end_location"].apply(lambda x: x[0])
team_passes["end_y"] = team_passes["pass_end_location"].apply(lambda x: x[1])

# Compute pass length once, then aggregate between player pairs
team_passes["length"] = np.sqrt(
    (team_passes["end_x"] - team_passes["start_x"])**2 +
    (team_passes["end_y"] - team_passes["start_y"])**2
)
pass_matrix = team_passes.groupby(["player", "pass_recipient"]).agg(
    passes=("player", "count"),
    avg_length=("length", "mean")
).reset_index()

# Filter minimum passes
pass_matrix = pass_matrix[pass_matrix["passes"] >= 3]

# Calculate average positions
player_positions = team_passes.groupby("player").agg(
    x=("start_x", "mean"),
    y=("start_y", "mean"),
    total_passes=("player", "count")
).reset_index()

# Create network
G = nx.DiGraph()

# Add nodes with positions
for _, row in player_positions.iterrows():
    G.add_node(row["player"], pos=(row["x"], row["y"]),
               passes=row["total_passes"])

# Add edges with weights
for _, row in pass_matrix.iterrows():
    G.add_edge(row["player"], row["pass_recipient"],
               weight=row["passes"])

print(f"Network: {G.number_of_nodes()} players, {G.number_of_edges()} connections")
# R: Build pass network from StatsBomb data
library(StatsBombR)

# Load match data (allclean() flattens the nested location columns)
Matches <- FreeMatches(FreeCompetitions())
events <- StatsBombFreeEvents(MatchesDF = Matches %>% filter(match_id == 3788741)) %>%
  allclean()

# Extract passes for one team
team_passes <- events %>%
  filter(type.name == "Pass",
         team.name == "Barcelona",
         !is.na(pass.recipient.name)) %>%
  select(player.name, pass.recipient.name,
         location.x, location.y,
         pass.end_location.x, pass.end_location.y)

# Aggregate passes between player pairs
pass_matrix <- team_passes %>%
  group_by(passer = player.name, receiver = pass.recipient.name) %>%
  summarise(
    passes = n(),
    avg_length = mean(sqrt((pass.end_location.x - location.x)^2 +
                           (pass.end_location.y - location.y)^2)),
    .groups = "drop"
  ) %>%
  filter(passes >= 3)  # Minimum threshold

# Calculate average positions
player_positions <- team_passes %>%
  group_by(player = player.name) %>%
  summarise(
    x = mean(location.x),
    y = mean(location.y),
    total_passes = n()
  )

# Create network with positions
g <- graph_from_data_frame(pass_matrix, directed = TRUE,
                           vertices = player_positions)
E(g)$weight <- pass_matrix$passes

print(g)
Output
Network: 14 players, 42 connections

Visualizing Pass Networks

Effective visualization is crucial for communicating network insights. We use node size to represent involvement, edge thickness for pass frequency, and spatial positions to show team shape.

visualize_network
# Python: Visualize pass network on pitch
from mplsoccer import Pitch
import matplotlib.pyplot as plt
import numpy as np

# Create pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a",
              line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# Get positions
pos = nx.get_node_attributes(G, "pos")

# Calculate degree for node sizing
degrees = dict(G.degree())
max_degree = max(degrees.values())

# Draw edges
for (u, v, d) in G.edges(data=True):
    if u in pos and v in pos:
        x1, y1 = pos[u]
        x2, y2 = pos[v]

        # Line width based on passes
        width = d["weight"] / 10

        ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
                   arrowprops=dict(arrowstyle="->",
                                  color="white",
                                  alpha=0.6,
                                  linewidth=width,
                                  connectionstyle="arc3,rad=0.1"))

# Draw nodes
for node in G.nodes():
    if node in pos:
        x, y = pos[node]
        size = 200 + (degrees[node] / max_degree) * 800
        ax.scatter(x, y, s=size, c="#a50044", edgecolors="white",
                  linewidths=2, zorder=5)
        ax.annotate(node.split()[-1], (x, y + 3),
                   color="white", ha="center", fontsize=8,
                   fontweight="bold")

ax.set_title("Barcelona Pass Network", color="white", fontsize=14)
plt.tight_layout()
plt.savefig("pass_network.png", dpi=150, facecolor="#1a472a")
plt.show()
# R: Visualize pass network on pitch
library(ggplot2)
library(ggsoccer)

# Convert to tidygraph for ggraph
tg <- as_tbl_graph(g) %>%
  activate(nodes) %>%
  mutate(
    degree = centrality_degree(mode = "all"),
    betweenness = centrality_betweenness()
  )

# Get node data for plotting
node_data <- tg %>%
  activate(nodes) %>%
  as_tibble()

# Get edge data (tidygraph stores from/to as integer node indices,
# so look positions up by index rather than joining on name)
edge_data <- tg %>%
  activate(edges) %>%
  as_tibble() %>%
  mutate(
    from_x = node_data$x[from], from_y = node_data$y[from],
    to_x   = node_data$x[to],   to_y   = node_data$y[to]
  )

# Create pitch visualization
ggplot() +
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#1a472a") +

  # Draw edges (passes)
  geom_segment(data = edge_data,
               aes(x = from_x, y = from_y,
                   xend = to_x, yend = to_y,
                   linewidth = weight),
               alpha = 0.6, color = "white",
               arrow = arrow(length = unit(0.15, "cm"))) +

  # Draw nodes (players)
  geom_point(data = node_data,
             aes(x = x, y = y, size = degree),
             color = "#a50044", fill = "#a50044",
             shape = 21, stroke = 2) +

  # Add labels
  geom_text(data = node_data,
            aes(x = x, y = y + 3, label = name),
            color = "white", size = 3, fontface = "bold") +

  scale_linewidth_continuous(range = c(0.5, 3)) +
  scale_size_continuous(range = c(4, 12)) +
  coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
  theme_pitch() +
  theme(legend.position = "none") +
  labs(title = "Barcelona Pass Network",
       subtitle = "Node size = degree centrality, Edge width = passes")

Centrality Metrics

Centrality metrics quantify the importance of nodes within a network. Different centrality measures capture different aspects of importance - who touches the ball most, who connects different parts of the team, and who controls the flow.

Metric        Measures                        Football Interpretation
Degree        Number of connections           Passing options / involvement
Betweenness   Bridge between groups           Playmaker connecting lines
Closeness     Average distance to all nodes   Ball circulation efficiency
Eigenvector   Connected to important nodes    Quality of passing partners
PageRank      Influence via connections       Expected ball reception
centrality_metrics
# Python: Calculate all centrality metrics
import pandas as pd

# Compute each measure once, then index by node so row order is explicit
nodes = list(G.nodes())
betweenness = nx.betweenness_centrality(G, normalized=True)   # who bridges groups
closeness = nx.closeness_centrality(G)                        # how quickly others are reached
eigenvector = nx.eigenvector_centrality(G.to_undirected(),    # connected to important players
                                        max_iter=1000)
pagerank = nx.pagerank(G, alpha=0.85)                         # influence measure

centrality = pd.DataFrame({
    "player": nodes,
    "degree_in": [G.in_degree(n) for n in nodes],
    "degree_out": [G.out_degree(n) for n in nodes],
    "degree_total": [G.degree(n) for n in nodes],
    "betweenness": [betweenness[n] for n in nodes],
    "closeness": [closeness[n] for n in nodes],
    "eigenvector": [eigenvector[n] for n in nodes],
    "pagerank": [pagerank[n] for n in nodes]
})

# Top players by different metrics
print("Top 5 by Betweenness (Playmakers):")
print(centrality.nlargest(5, "betweenness")[["player", "betweenness", "degree_total"]])

print("\nTop 5 by PageRank (Ball Magnets):")
print(centrality.nlargest(5, "pagerank")[["player", "pagerank", "degree_in"]])
# R: Calculate all centrality metrics
library(tidygraph)

# Calculate centrality measures
centrality_df <- tg %>%
  activate(nodes) %>%
  mutate(
    # Degree centrality
    degree_in = centrality_degree(mode = "in"),
    degree_out = centrality_degree(mode = "out"),
    degree_total = centrality_degree(mode = "all"),

    # Betweenness - who bridges groups
    betweenness = centrality_betweenness(directed = TRUE, normalized = TRUE),

    # Closeness - how quickly can reach others
    closeness = centrality_closeness(mode = "all"),

    # Eigenvector - connected to important players
    eigenvector = centrality_eigen(directed = FALSE),

    # PageRank - influence measure
    pagerank = centrality_pagerank(directed = TRUE)
  ) %>%
  as_tibble() %>%
  arrange(desc(betweenness))

# Display top players by different metrics
cat("Top 5 by Betweenness (Playmakers):\n")
centrality_df %>%
  select(name, betweenness, degree_total) %>%
  head(5) %>%
  print()

cat("\nTop 5 by PageRank (Ball Magnets):\n")
centrality_df %>%
  arrange(desc(pagerank)) %>%
  select(name, pagerank, degree_in) %>%
  head(5) %>%
  print()
Output
Top 5 by Betweenness (Playmakers):
            player  betweenness  degree_total
0  Sergio Busquets        0.284            18
1     Lionel Messi        0.198            15
2       Jordi Alba        0.156            12

Top 5 by PageRank (Ball Magnets):
            player  pagerank  degree_in
0     Lionel Messi     0.142         12
1  Sergio Busquets     0.128         14
2     Gerard Pique     0.098          8

Centrality Radar Charts

centrality_radar
# Python: Create centrality radar chart
import matplotlib.pyplot as plt
import numpy as np
from math import pi

# Prepare data
metrics = ["degree_total", "betweenness", "closeness", "eigenvector", "pagerank"]
top_players = centrality.nlargest(5, "betweenness").copy()  # copy so normalization edits a new frame

# Normalize to 0-100
for col in metrics:
    top_players[col] = (top_players[col] - top_players[col].min()) / \
                       (top_players[col].max() - top_players[col].min()) * 100

# Create radar chart
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))

# Number of variables
N = len(metrics)
angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]

# Colors
colors = ["#a50044", "#004d98", "#edbb00", "#00a19c", "#8b0000"]

for idx, (_, row) in enumerate(top_players.iterrows()):
    values = row[metrics].tolist()
    values += values[:1]

    ax.plot(angles, values, "o-", linewidth=2,
            label=row["player"], color=colors[idx])
    ax.fill(angles, values, alpha=0.25, color=colors[idx])

# Add labels
ax.set_xticks(angles[:-1])
ax.set_xticklabels(["Degree", "Betweenness", "Closeness",
                   "Eigenvector", "PageRank"])

ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
plt.title("Player Centrality Profiles", size=14, y=1.1)
plt.tight_layout()
plt.show()
# R: Create centrality radar chart
library(fmsb)

# Prepare data for radar chart
radar_data <- centrality_df %>%
  select(name, degree_total, betweenness, closeness, eigenvector, pagerank) %>%
  mutate(across(-name, ~ scales::rescale(., to = c(0, 100)))) %>%
  head(5)

# Format for fmsb
radar_matrix <- radar_data %>%
  column_to_rownames("name") %>%
  as.data.frame()

# Add max and min rows
radar_matrix <- rbind(
  rep(100, 5),  # Max
  rep(0, 5),    # Min
  radar_matrix
)

# Create radar chart
colors <- c("#a50044", "#004d98", "#edbb00", "#00a19c", "#8b0000")

radarchart(radar_matrix,
           axistype = 1,
           pcol = colors,
           pfcol = scales::alpha(colors, 0.3),
           plwd = 2,
           cglcol = "grey",
           cglty = 1,
           axislabcol = "grey",
           vlcex = 0.8,
           title = "Player Centrality Profiles")

Team-Level Network Metrics

Beyond individual player metrics, we can characterize entire team networks. These metrics reveal playing style - is the team hierarchical or egalitarian? How connected are the players?

team_network_metrics
# Python: Calculate team network metrics
def calculate_team_network_metrics(G):
    """Calculate comprehensive network metrics for a team."""

    # Convert to undirected for some metrics
    G_undirected = G.to_undirected()

    metrics = {
        # Basic properties
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),

        # Density
        "density": nx.density(G),

        # Clustering coefficient
        "clustering": nx.average_clustering(G_undirected),

        # Average path length (if connected)
        "avg_path_length": nx.average_shortest_path_length(G_undirected)
                          if nx.is_connected(G_undirected) else None,

        # Diameter
        "diameter": nx.diameter(G_undirected)
                   if nx.is_connected(G_undirected) else None,

        # Centralization (degree)
        "degree_centralization": calculate_centralization(
            list(dict(G.degree()).values())
        ),

        # Reciprocity
        "reciprocity": nx.reciprocity(G),

        # Assortativity
        "assortativity": nx.degree_assortativity_coefficient(G)
    }

    return metrics

def calculate_centralization(values):
    """Calculate Freeman centralization index."""
    max_val = max(values)
    n = len(values)
    numerator = sum(max_val - v for v in values)
    denominator = (n - 1) * (n - 2)
    return numerator / denominator if denominator > 0 else 0

# Calculate metrics
team_metrics = calculate_team_network_metrics(G)

print("Team Network Metrics:")
for metric, value in team_metrics.items():
    if value is not None:
        print(f"  {metric}: {value:.3f}" if isinstance(value, float)
              else f"  {metric}: {value}")
# R: Calculate team network metrics
calculate_team_network_metrics <- function(g) {
  tibble(
    # Basic properties
    nodes = vcount(g),
    edges = ecount(g),

    # Density - proportion of possible edges that exist
    density = edge_density(g),

    # Clustering coefficient - transitivity
    clustering = transitivity(g, type = "global"),

    # Average path length
    avg_path_length = mean_distance(g, directed = FALSE),

    # Diameter - longest shortest path
    diameter = diameter(g, directed = FALSE),

    # Centralization - how concentrated is importance
    degree_centralization = centr_degree(g, mode = "all")$centralization,
    betweenness_centralization = centr_betw(g, directed = TRUE)$centralization,

    # Reciprocity - proportion of mutual connections
    reciprocity = reciprocity(g),

    # Assortativity - do similar connect to similar
    assortativity = assortativity_degree(g, directed = FALSE)
  )
}

# Calculate for our team
team_metrics <- calculate_team_network_metrics(g)

# Display
team_metrics %>%
  pivot_longer(everything(), names_to = "metric", values_to = "value") %>%
  mutate(value = round(value, 3)) %>%
  print(n = 15)
Output
Team Network Metrics:
  nodes: 11
  edges: 42
  density: 0.382
  clustering: 0.456
  avg_path_length: 1.764
  diameter: 3
  degree_centralization: 0.287
  reciprocity: 0.714
  assortativity: -0.156

Interpreting Network Metrics

High Density / Low Centralization

Style: Possession-based, tiki-taka

  • Ball circulates freely
  • No single dependency
  • Many passing triangles
  • Example: Guardiola's Barcelona
Low Density / High Centralization

Style: Direct, star-dependent

  • Play through key player
  • Fewer passing combinations
  • More predictable
  • Example: Counter-attacking teams
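The two profiles above can be turned into a rough automatic label. The thresholds below are illustrative guesses for demonstration, not established benchmarks; in practice you would calibrate them against a league-wide distribution:

```python
def classify_style(density, degree_centralization):
    """Map team network metrics to a rough style label.

    Thresholds (0.35 density, 0.3 centralization) are illustrative only.
    """
    if density >= 0.35 and degree_centralization <= 0.3:
        return "Possession-based: ball circulates freely"
    if density < 0.35 and degree_centralization > 0.3:
        return "Direct / star-dependent: play runs through key players"
    return "Mixed profile"

# Metrics from the example team output above
print(classify_style(0.382, 0.287))  # Possession-based: ball circulates freely
```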

Community Detection

Community detection algorithms identify subgroups of players who pass more frequently among themselves. This reveals tactical structures like defensive units, midfield partnerships, and attacking combinations.

community_detection
# Python: Community detection in pass networks
import community as community_louvain
from networkx.algorithms import community as nx_community

# Convert to undirected for community detection
G_undirected = G.to_undirected()

# Louvain community detection
partition = community_louvain.best_partition(G_undirected)

# Add community membership to nodes
for node, comm in partition.items():
    G.nodes[node]["community"] = comm

# Analyze communities
community_df = pd.DataFrame([
    {"player": node, "community": data["community"],
     "degree": G.degree(node)}
    for node, data in G.nodes(data=True)
]).sort_values(["community", "degree"], ascending=[True, False])

# Summary by community
community_summary = community_df.groupby("community").agg(
    players=("player", "count"),
    members=("player", lambda x: ", ".join(x)),
    avg_degree=("degree", "mean")
).reset_index()

print("Community Structure:")
print(community_summary)

# Modularity score
modularity = community_louvain.modularity(partition, G_undirected)
print(f"\nModularity: {modularity:.3f}")

# Try Girvan-Newman algorithm
gn_communities = list(nx_community.girvan_newman(G_undirected))
print(f"\nGirvan-Newman found {len(gn_communities[0])} communities at first level")
# R: Community detection in pass networks
library(igraph)

# Louvain community detection (most common)
communities_louvain <- cluster_louvain(as.undirected(g))

# Add community membership to network
V(g)$community <- membership(communities_louvain)

# Analyze communities
community_df <- tibble(
  player = V(g)$name,
  community = V(g)$community,
  degree = degree(g, mode = "all")
) %>%
  arrange(community, desc(degree))

# Summary by community
community_summary <- community_df %>%
  group_by(community) %>%
  summarise(
    players = n(),
    members = paste(player, collapse = ", "),
    avg_degree = mean(degree)
  )

print(community_summary)

# Modularity score (quality of community structure)
cat("\nModularity:", modularity(communities_louvain), "\n")

# Try different algorithms
communities_walktrap <- cluster_walktrap(as.undirected(g))
communities_infomap <- cluster_infomap(g)

cat("\nAlgorithm comparison:\n")
cat("Louvain communities:", length(communities_louvain), "\n")
cat("Walktrap communities:", length(communities_walktrap), "\n")
cat("Infomap communities:", length(communities_infomap), "\n")
Output
Community Structure:
   community  players           members  avg_degree
0          0        4  GK, CB1, CB2, LB         8.5
1          1        3     CM1, CM2, CDM        12.3
2          2        3        LW, RW, ST         9.7

Modularity: 0.384
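Modularity compares the fraction of within-community edges to what a random graph with the same degree sequence would produce; values above roughly 0.3 are often taken to indicate meaningful structure. A small sanity check on a graph with two obvious triangles joined by a single bridge:

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Two triangles connected by one bridge edge (C-D)
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
              ("D", "E"), ("E", "F"), ("D", "F"),
              ("C", "D")])
communities = [{"A", "B", "C"}, {"D", "E", "F"}]

# Each community keeps 3 of the 7 edges internal; degrees sum to 7 per group,
# giving Q = 2 * (3/7 - (7/14)^2) ≈ 0.357
Q = modularity(G, communities)
print(round(Q, 3))  # 0.357
```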

Visualizing Communities

visualize_communities
# Python: Visualize communities on pitch
from mplsoccer import Pitch

pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a",
              line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))

# Community colors
colors = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]
pos = nx.get_node_attributes(G, "pos")

# Draw edges
for (u, v, d) in G.edges(data=True):
    if u in pos and v in pos:
        x1, y1 = pos[u]
        x2, y2 = pos[v]
        ax.plot([x1, x2], [y1, y2], "w-", alpha=0.3, linewidth=0.5)

# Draw nodes colored by community
for node in G.nodes():
    if node in pos:
        x, y = pos[node]
        comm = G.nodes[node].get("community", 0)
        ax.scatter(x, y, s=400, c=colors[comm % len(colors)],
                  edgecolors="white", linewidths=2, zorder=5)
        ax.annotate(node.split()[-1], (x, y + 3),
                   color="white", ha="center", fontsize=8)

# Add legend
for i in range(max(partition.values()) + 1):
    ax.scatter([], [], c=colors[i], s=100, label=f"Community {i+1}")
ax.legend(loc="upper left")

ax.set_title("Pass Network Communities", color="white", fontsize=14)
plt.tight_layout()
plt.show()
# R: Visualize communities on pitch
community_colors <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")

# Create visualization
ggplot() +
  annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#1a472a") +

  # Draw edges
  geom_segment(data = edge_data,
               aes(x = from_x, y = from_y,
                   xend = to_x, yend = to_y),
               alpha = 0.3, color = "white") +

  # Draw nodes colored by community
  geom_point(data = node_data %>%
               left_join(community_df, by = c("name" = "player")),
             aes(x = x, y = y, fill = factor(community)),
             size = 8, shape = 21, stroke = 2, color = "white") +

  scale_fill_manual(values = community_colors,
                    name = "Community") +

  coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
  theme_pitch() +
  labs(title = "Pass Network Communities",
       subtitle = "Colors indicate detected player groupings")

Weighted Network Analysis

Pass count alone doesn't tell the full story. Weighting edges by pass quality, progressiveness, or danger created provides deeper insights into network effectiveness.

weighted_network
# Python: Weighted Network with Pass Quality
import numpy as np
import pandas as pd
import networkx as nx

def create_weighted_network(events, team_name):
    """Create networks with different weighting schemes."""
    passes = events[
        (events["type"] == "Pass") &
        (events["team"] == team_name) &
        (events["pass_recipient"].notna())
    ].copy()

    # Parse locations
    passes["start_x"] = passes["location"].apply(lambda x: x[0] if x else 0)
    passes["end_x"] = passes["pass_end_location"].apply(lambda x: x[0] if x else 0)
    passes["end_y"] = passes["pass_end_location"].apply(lambda x: x[1] if x else 0)

    # Calculate quality metrics
    passes["progressive"] = (passes["end_x"] - passes["start_x"] > 10) & (passes["end_x"] > 40)
    passes["final_third"] = passes["end_x"] >= 80
    passes["box_entry"] = (passes["end_x"] >= 102) & (passes["end_y"] >= 18) & (passes["end_y"] <= 62)
    passes["successful"] = passes["pass_outcome"].isna() | (passes["pass_outcome"] == "Complete")

    # Quality score
    def calc_quality(row):
        if row["box_entry"] and row["successful"]:
            return 3
        if row["final_third"] and row["successful"]:
            return 2
        if row["progressive"] and row["successful"]:
            return 1.5
        if row["successful"]:
            return 1
        return 0

    passes["quality"] = passes.apply(calc_quality, axis=1)

    # Aggregate
    pass_matrix = passes.groupby(["player", "pass_recipient"]).agg(
        count=("player", "count"),
        total_quality=("quality", "sum"),
        avg_quality=("quality", "mean"),
        progressive_pct=("progressive", "mean"),
        final_third_pct=("final_third", "mean"),
        success_rate=("successful", "mean")
    ).reset_index()

    pass_matrix = pass_matrix[pass_matrix["count"] >= 3]

    # Create networks
    G_count = nx.DiGraph()
    G_quality = nx.DiGraph()

    for _, row in pass_matrix.iterrows():
        G_count.add_edge(row["player"], row["pass_recipient"], weight=row["count"])
        G_quality.add_edge(row["player"], row["pass_recipient"], weight=row["total_quality"])

    return {"count": G_count, "quality": G_quality, "data": pass_matrix}

# Create weighted networks
weighted = create_weighted_network(events, "Barcelona")

# Compare centrality: strength = weighted in-degree + weighted out-degree
# (matches igraph's strength(mode = "all") in the R version)
def weighted_degree(G):
    return {n: G.in_degree(n, weight="weight") + G.out_degree(n, weight="weight")
            for n in G.nodes()}

strength_count = weighted_degree(weighted["count"])
strength_quality = weighted_degree(weighted["quality"])
pagerank_count = nx.pagerank(weighted["count"], weight="weight")
pagerank_quality = nx.pagerank(weighted["quality"], weight="weight")

comparison = pd.DataFrame({
    "player": list(strength_count.keys()),
    "strength_count": list(strength_count.values()),
    "strength_quality": list(strength_quality.values()),
    "pagerank_count": [pagerank_count.get(p, 0) for p in strength_count.keys()],
    "pagerank_quality": [pagerank_quality.get(p, 0) for p in strength_count.keys()]
})

comparison["quality_boost"] = (comparison["strength_quality"] / comparison["strength_count"]) - 1
comparison = comparison.sort_values("quality_boost", ascending=False)

print("Players with highest quality boost:")
print(comparison.head())
# R: Weighted Network with Pass Quality
library(tidyverse)
library(igraph)

create_weighted_network <- function(events, team_name) {
    # Extract passes with quality metrics
    passes <- events %>%
        filter(type.name == "Pass",
               team.name == team_name,
               !is.na(pass.recipient.name)) %>%
        mutate(
            # Calculate progressiveness
            start_x = location.x,
            end_x = pass.end_location.x,
            progressive = end_x - start_x > 10 & end_x > 40,

            # Calculate danger zone entry
            final_third = end_x >= 80,
            box_entry = end_x >= 102 & pass.end_location.y >= 18 &
                       pass.end_location.y <= 62,

            # Pass success
            successful = is.na(pass.outcome.name) | pass.outcome.name == "Complete",

            # Pass quality score
            quality = case_when(
                box_entry & successful ~ 3,
                final_third & successful ~ 2,
                progressive & successful ~ 1.5,
                successful ~ 1,
                TRUE ~ 0
            )
        )

    # Aggregate with different weighting schemes
    pass_matrix <- passes %>%
        group_by(passer = player.name, receiver = pass.recipient.name) %>%
        summarise(
            count = n(),
            total_quality = sum(quality),
            avg_quality = mean(quality),
            progressive_pct = mean(progressive),
            final_third_pct = mean(final_third),
            success_rate = mean(successful),
            .groups = "drop"
        ) %>%
        filter(count >= 3)

    # Create weighted networks
    networks <- list()

    # Count-weighted
    g_count <- graph_from_data_frame(pass_matrix, directed = TRUE)
    E(g_count)$weight <- pass_matrix$count
    networks$count <- g_count

    # Quality-weighted
    g_quality <- graph_from_data_frame(pass_matrix, directed = TRUE)
    E(g_quality)$weight <- pass_matrix$total_quality
    networks$quality <- g_quality

    return(list(networks = networks, data = pass_matrix))
}

# Create weighted networks
weighted <- create_weighted_network(events, "Barcelona")

# Compare centrality between count and quality weighting
centrality_comparison <- tibble(
    player = V(weighted$networks$count)$name,
    strength_count = strength(weighted$networks$count, mode = "all"),
    strength_quality = strength(weighted$networks$quality, mode = "all"),
    pagerank_count = page_rank(weighted$networks$count)$vector,
    pagerank_quality = page_rank(weighted$networks$quality)$vector
) %>%
    mutate(
        quality_boost = (strength_quality / strength_count) - 1,
        rank_change = rank(-pagerank_quality) - rank(-pagerank_count)
    ) %>%
    arrange(desc(quality_boost))

cat("Players with highest quality boost:\n")
print(head(centrality_comparison, 5))
Output
Players with highest quality boost:
            player  strength_count  strength_quality  quality_boost
0     Lionel Messi             156             312.5          1.003
1      Luis Suárez              98             189.2          0.931
2       Jordi Alba              87             154.8          0.779
3         Coutinho              72             123.6          0.717
4  Ousmane Dembélé              45              76.5          0.700

Progressive Pass Networks

Focusing only on progressive passes reveals which players drive the team forward and who receives the ball in dangerous positions.

progressive_network
# Python: Progressive Pass Network
def build_progressive_network(events, team_name):
    """Build network using only progressive passes."""
    passes = events[
        (events["type"] == "Pass") &
        (events["team"] == team_name) &
        (events["pass_recipient"].notna()) &
        (events["pass_outcome"].isna())  # Successful only
    ].copy()

    passes["start_x"] = passes["location"].apply(lambda x: x[0] if x else 0)
    passes["end_x"] = passes["pass_end_location"].apply(lambda x: x[0] if x else 0)

    # Define progressive
    passes["progressive"] = (
        ((passes["end_x"] - passes["start_x"]) >= 10) & (passes["end_x"] >= 40)
    ) | (passes["end_x"] >= 102)

    # Filter progressive only
    prog_passes = passes[passes["progressive"]].groupby(
        ["player", "pass_recipient"]
    ).size().reset_index(name="progressive_passes")

    # Build network
    G = nx.DiGraph()
    for _, row in prog_passes.iterrows():
        G.add_edge(row["player"], row["pass_recipient"],
                  weight=row["progressive_passes"])

    # Calculate role metrics
    in_strength = dict(G.in_degree(weight="weight"))
    out_strength = dict(G.out_degree(weight="weight"))

    player_roles = pd.DataFrame({
        "player": list(G.nodes()),
        "receives_progressive": [in_strength.get(n, 0) for n in G.nodes()],
        "makes_progressive": [out_strength.get(n, 0) for n in G.nodes()]
    })

    player_roles["progressive_balance"] = (
        player_roles["receives_progressive"] - player_roles["makes_progressive"]
    )

    def assign_role(balance):
        if balance > 3:
            return "Progressive Receiver"
        elif balance < -3:
            return "Progressive Passer"
        return "Balanced"

    player_roles["role"] = player_roles["progressive_balance"].apply(assign_role)

    return G, player_roles.sort_values("receives_progressive", ascending=False)

G_prog, roles = build_progressive_network(events, "Barcelona")
print(roles)
# R: Progressive Pass Network
build_progressive_network <- function(events, team_name) {
    progressive_passes <- events %>%
        filter(type.name == "Pass",
               team.name == team_name,
               !is.na(pass.recipient.name)) %>%
        mutate(
            start_x = location.x,
            end_x = pass.end_location.x,
            # Progressive: moves ball at least 10m towards goal in final 60%
            progressive = (end_x - start_x >= 10 & end_x >= 40) |
                         (end_x >= 102)  # Any pass into box is progressive
        ) %>%
        filter(progressive, is.na(pass.outcome.name)) %>%
        group_by(passer = player.name, receiver = pass.recipient.name) %>%
        summarise(progressive_passes = n(), .groups = "drop")

    g <- graph_from_data_frame(progressive_passes, directed = TRUE)
    E(g)$weight <- progressive_passes$progressive_passes

    # Key metrics for progressive network
    in_progressive <- strength(g, mode = "in")
    out_progressive <- strength(g, mode = "out")

    player_roles <- tibble(
        player = V(g)$name,
        receives_progressive = as.numeric(in_progressive),
        makes_progressive = as.numeric(out_progressive),
        progressive_balance = receives_progressive - makes_progressive
    ) %>%
        mutate(
            role = case_when(
                progressive_balance > 3 ~ "Progressive Receiver",
                progressive_balance < -3 ~ "Progressive Passer",
                TRUE ~ "Balanced"
            )
        ) %>%
        arrange(desc(receives_progressive))

    return(list(network = g, roles = player_roles))
}

prog_network <- build_progressive_network(events, "Barcelona")
print(prog_network$roles)
Output
            player  receives_progressive  makes_progressive  progressive_balance                  role
0     Lionel Messi                    32                 18                   14  Progressive Receiver
1      Luis Suárez                    24                  8                   16  Progressive Receiver
2  Ousmane Dembélé                    18                  6                   12  Progressive Receiver
3       Jordi Alba                    14                 22                   -8    Progressive Passer
4  Sergio Busquets                     8                 28                  -20    Progressive Passer
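The progressive-pass rule shared by both implementations can be checked in isolation. A minimal sketch, assuming StatsBomb's 120 x 80 coordinate system with the team attacking left to right:

```python
# Minimal sketch of the progressive-pass rule used above, assuming
# StatsBomb coordinates (120 x 80 pitch, attacking left to right).
def is_progressive(start_x, end_x):
    # At least 10 units forward ending past x = 40, or any pass
    # ending inside the box (x >= 102)
    return ((end_x - start_x) >= 10 and end_x >= 40) or end_x >= 102

print(is_progressive(30, 45))   # 15 units forward into midfield -> True
print(is_progressive(50, 55))   # only 5 units forward -> False
print(is_progressive(95, 105))  # ends inside the box -> True
```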

Formation Detection from Networks

By analyzing player positions within pass networks, we can detect formations and understand how teams actually shape up during play.

formation_detection
# Python: Formation Detection
from sklearn.cluster import KMeans
import numpy as np

def detect_formation(events, team_name):
    """Detect formation from average positions."""
    # Get average positions
    player_data = events[
        (events["team"] == team_name) &
        (events["location"].notna())
    ].copy()

    player_data["x"] = player_data["location"].apply(lambda p: p[0] if p else 0)
    player_data["y"] = player_data["location"].apply(lambda p: p[1] if p else 0)

    positions = player_data.groupby("player").agg(
        avg_x=("x", "mean"),
        avg_y=("y", "mean"),
        touches=("player", "count")
    ).reset_index()

    positions = positions[positions["touches"] >= 20]

    # Exclude goalkeeper
    outfield = positions[positions["avg_x"] > 25].copy()

    # Find optimal number of lines using elbow method
    def find_best_lines(x_positions):
        wss = []
        for k in range(2, 6):
            km = KMeans(n_clusters=k, random_state=42, n_init=25)
            km.fit(x_positions.reshape(-1, 1))
            wss.append(km.inertia_)

        # Simple elbow detection
        diffs = np.diff(wss)
        best_k = np.argmin(diffs) + 2
        return max(3, min(4, best_k))  # Clamp to reasonable values

    n_lines = find_best_lines(outfield["avg_x"].values)

    # Cluster into lines
    km = KMeans(n_clusters=n_lines, random_state=42, n_init=25)
    outfield["line"] = km.fit_predict(outfield[["avg_x"]])

    # Reorder from back to front
    line_order = outfield.groupby("line")["avg_x"].mean().sort_values()
    line_mapping = {old: new for new, old in enumerate(line_order.index, 1)}
    outfield["line"] = outfield["line"].map(line_mapping)

    # Count per line
    formation = outfield.groupby("line").size().sort_index().tolist()
    formation_string = "-".join(map(str, formation))

    return {
        "positions": outfield.sort_values(["line", "avg_y"]),
        "formation": formation_string,
        "n_lines": n_lines
    }

result = detect_formation(events, "Barcelona")
print(f"Detected formation: {result['formation']}")
print(result["positions"])
# R: Formation Detection
library(cluster)

detect_formation <- function(events, team_name) {
    # Get average positions for all players
    player_positions <- events %>%
        filter(team.name == team_name,
               !is.na(location.x)) %>%
        group_by(player.name) %>%
        summarise(
            avg_x = mean(location.x),
            avg_y = mean(location.y),
            touches = n(),
            .groups = "drop"
        ) %>%
        filter(touches >= 20)  # Filter out substitutes with few touches

    # Identify lines using k-means on x-position
    # Try different numbers of lines
    find_best_lines <- function(positions) {
        # Exclude goalkeeper
        outfield <- positions %>% filter(avg_x > 25)

        wss <- sapply(2:5, function(k) {
            kmeans(outfield$avg_x, centers = k, nstart = 25)$tot.withinss
        })

        # Use elbow method or default to 3 lines
        best_k <- which.min(diff(wss)) + 2
        return(best_k)
    }

    n_lines <- find_best_lines(player_positions)
    outfield <- player_positions %>% filter(avg_x > 25)

    # Cluster into lines
    km <- kmeans(outfield$avg_x, centers = n_lines, nstart = 25)
    outfield$line <- km$cluster

    # Reorder lines from back to front
    line_order <- outfield %>%
        group_by(line) %>%
        summarise(avg_pos = mean(avg_x)) %>%
        arrange(avg_pos) %>%
        mutate(new_line = row_number())

    outfield <- outfield %>%
        left_join(line_order %>% select(line, new_line), by = "line") %>%
        mutate(line = new_line) %>%
        select(-new_line)

    # Count players per line
    formation <- outfield %>%
        count(line) %>%
        arrange(line) %>%
        pull(n)

    formation_string <- paste(formation, collapse = "-")

    return(list(
        positions = outfield,
        formation = formation_string,
        n_lines = n_lines
    ))
}

formation_result <- detect_formation(events, "Barcelona")
cat("Detected formation:", formation_result$formation, "\n")
print(formation_result$positions %>% arrange(line, avg_y))
Output
Detected formation: 4-3-3
            player  avg_x  avg_y  touches  line
0     Gerard Piqué   35.2   32.5      124     1
1    Samuel Umtiti   34.8   48.2      118     1
2       Jordi Alba   42.3   12.4      156     1
3    Sergi Roberto   41.8   68.2      142     1
4  Sergio Busquets   52.4   38.6      234     2
5     Ivan Rakitic   58.2   52.3      189     2
6           Arthur   55.8   28.4      167     2
7  Ousmane Dembélé   78.4   72.1       89     3
8     Lionel Messi   82.6   54.2      198     3
9      Luis Suárez   85.1   38.4      145     3
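The core of the detection step, clustering average x-positions into lines, can be illustrated on its own. A minimal sketch using synthetic values loosely based on the output above, with the number of lines fixed at 3 for clarity:

```python
# Sketch: clustering average x-positions into formation lines with
# k-means (synthetic positions; number of lines fixed at 3)
import numpy as np
from sklearn.cluster import KMeans

avg_x = np.array([35.2, 34.8, 42.3, 41.8, 52.4, 58.2, 55.8,
                  78.4, 82.6, 85.1]).reshape(-1, 1)
km = KMeans(n_clusters=3, random_state=42, n_init=10).fit(avg_x)

# Order clusters back to front by their mean x, then count players per line
order = np.argsort([avg_x[km.labels_ == k].mean() for k in range(3)])
formation = "-".join(str(int((km.labels_ == k).sum())) for k in order)
print(formation)  # expected: 4-3-3 for these positions
```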

Case Study: El Clásico Network Analysis

Let's apply all our network analysis techniques to compare Barcelona and Real Madrid in a Clásico match.
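The case study relies on a `calculate_centralization` helper that is not shown in this section. One common choice is Freeman's degree centralization, sketched here under that assumption:

```python
def calculate_centralization(degrees):
    """Freeman-style degree centralization: 0 when every node has the
    same degree, 1 for a star graph where one hub dominates."""
    n = len(degrees)
    if n < 3:
        return 0.0
    max_deg = max(degrees)
    # Sum of gaps to the best-connected node, normalized by the
    # maximum possible sum (attained by a star graph)
    return sum(max_deg - d for d in degrees) / ((n - 1) * (n - 2))

print(calculate_centralization([3, 1, 1, 1]))  # star on 4 nodes -> 1.0
print(calculate_centralization([2, 2, 2, 2]))  # ring, all equal -> 0.0
```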

clasico_case_study
# Python: Complete Network Comparison Case Study
import pandas as pd
import networkx as nx

def analyze_clasico_networks(events):
    """Complete network analysis comparison for El Clásico."""
    teams = ["Barcelona", "Real Madrid"]
    results = {}

    for team in teams:
        # Build pass network
        passes = events[
            (events["type"] == "Pass") &
            (events["team"] == team) &
            (events["pass_recipient"].notna())
        ].groupby(["player", "pass_recipient"]).agg(
            count=("player", "count")
        ).reset_index()

        passes = passes[passes["count"] >= 2]

        G = nx.DiGraph()
        for _, row in passes.iterrows():
            G.add_edge(row["player"], row["pass_recipient"], weight=row["count"])

        G_undirected = G.to_undirected()

        # Calculate metrics
        betweenness = nx.betweenness_centrality(G, normalized=True)
        pagerank = nx.pagerank(G, weight="weight")

        top_betweenness = max(betweenness, key=betweenness.get)
        top_pagerank = max(pagerank, key=pagerank.get)

        metrics = {
            "team": team,
            "density": nx.density(G),
            "clustering": nx.average_clustering(G_undirected),
            "centralization": calculate_centralization(list(dict(G.degree()).values())),
            "reciprocity": nx.reciprocity(G),
            "top_betweenness": top_betweenness,
            "top_pagerank": top_pagerank,
            "total_passes": sum(d["weight"] for _, _, d in G.edges(data=True))
        }

        centrality = pd.DataFrame({
            "player": list(G.nodes()),
            "team": team,
            "degree": [G.degree(n) for n in G.nodes()],
            "betweenness": [betweenness[n] for n in G.nodes()],
            "pagerank": [pagerank[n] for n in G.nodes()]
        })

        results[team] = {
            "network": G,
            "metrics": metrics,
            "centrality": centrality
        }

    # Combine and print
    print("\n=== EL CLÁSICO NETWORK COMPARISON ===\n")

    metrics_df = pd.DataFrame([r["metrics"] for r in results.values()])
    print("Team Metrics:")
    print(metrics_df[["team", "density", "clustering", "centralization", "total_passes"]])

    print("\nTop Playmakers (by Betweenness):")
    all_centrality = pd.concat([r["centrality"] for r in results.values()])
    top_players = (all_centrality
                   .sort_values(["team", "betweenness"], ascending=[True, False])
                   .groupby("team").head(3)
                   .reset_index(drop=True))
    print(top_players[["team", "player", "betweenness", "pagerank"]])

    return results

clasico_analysis = analyze_clasico_networks(events)
# R: Complete Network Comparison Case Study
library(tidyverse)
library(igraph)

analyze_clasico_networks <- function(events) {
    teams <- c("Barcelona", "Real Madrid")

    results <- map(teams, function(team) {
        # Build pass network
        passes <- events %>%
            filter(type.name == "Pass",
                   team.name == team,
                   !is.na(pass.recipient.name)) %>%
            group_by(passer = player.name, receiver = pass.recipient.name) %>%
            summarise(
                count = n(),
                successful = sum(is.na(pass.outcome.name)),
                progressive = sum(pass.end_location.x - location.x > 10),
                .groups = "drop"
            ) %>%
            filter(count >= 2)

        g <- graph_from_data_frame(passes, directed = TRUE)
        E(g)$weight <- passes$count

        # Calculate metrics
        list(
            team = team,
            network = g,
            metrics = tibble(
                team = team,
                density = edge_density(g),
                clustering = transitivity(g, type = "global"),
                centralization = centr_degree(g)$centralization,
                reciprocity = reciprocity(g),
                top_betweenness = V(g)$name[which.max(betweenness(g))],
                top_pagerank = V(g)$name[which.max(page_rank(g)$vector)],
                total_passes = sum(E(g)$weight)
            ),
            centrality = tibble(
                player = V(g)$name,
                team = team,
                degree = degree(g, mode = "all"),
                betweenness = betweenness(g, normalized = TRUE),
                pagerank = page_rank(g)$vector
            )
        )
    })

    # Combine results
    metrics_comparison <- bind_rows(map(results, "metrics"))
    centrality_all <- bind_rows(map(results, "centrality"))

    # Print comparison
    cat("\n=== EL CLÁSICO NETWORK COMPARISON ===\n\n")

    cat("Team Metrics:\n")
    print(metrics_comparison %>%
          select(team, density, clustering, centralization, total_passes))

    cat("\nTop Playmakers (by Betweenness):\n")
    centrality_all %>%
        group_by(team) %>%
        slice_max(betweenness, n = 3) %>%
        select(team, player, betweenness, pagerank) %>%
        print()

    return(list(
        metrics = metrics_comparison,
        centrality = centrality_all,
        networks = map(results, "network")
    ))
}

clasico_analysis <- analyze_clasico_networks(events)
Output
=== EL CLÁSICO NETWORK COMPARISON ===

Team Metrics:
          team  density  clustering  centralization  total_passes
0    Barcelona    0.412       0.523           0.234           524
1  Real Madrid    0.356       0.467           0.298           478

Top Playmakers (by Betweenness):
          team           player  betweenness  pagerank
0    Barcelona  Sergio Busquets        0.284     0.128
1    Barcelona     Lionel Messi        0.198     0.142
2    Barcelona       Jordi Alba        0.156     0.098
3  Real Madrid       Toni Kroos        0.312     0.134
4  Real Madrid      Luka Modrić        0.256     0.121
5  Real Madrid         Casemiro        0.178     0.092
Barcelona Style
  • Higher density (0.412): More passing combinations
  • Higher clustering (0.523): More triangular play
  • Lower centralization: Less dependent on individuals
  • Key hub: Busquets orchestrates from deep
  • Style: Possession-based, patient buildup
Real Madrid Style
  • Lower density (0.356): Fewer combinations used
  • Higher centralization (0.298): Play through key players
  • Key hubs: Kroos and Modrić dominate passing
  • Less reciprocity: More direct, vertical play
  • Style: More direct transitions

Temporal Network Analysis

Football networks change throughout a match. Analyzing how networks evolve over time reveals tactical shifts, the impact of substitutions, and momentum changes.

temporal_network
# Python: Temporal network analysis
def analyze_network_by_period(events, team, periods=[15, 30, 45, 60, 75, 90]):
    """Analyze how network metrics change over time."""

    results = []

    for i, end_min in enumerate(periods):
        start_min = 0 if i == 0 else periods[i-1]

        # Filter passes for this period
        period_passes = events[
            (events["type"] == "Pass") &
            (events["team"] == team) &
            (events["minute"] >= start_min) &
            (events["minute"] < end_min) &
            (events["pass_recipient"].notna())
        ].groupby(["player", "pass_recipient"]).size().reset_index(name="passes")

        period_passes = period_passes[period_passes["passes"] >= 2]

        if len(period_passes) < 5:
            continue

        # Build network
        G = nx.DiGraph()
        for _, row in period_passes.iterrows():
            G.add_edge(row["player"], row["pass_recipient"],
                      weight=row["passes"])

        results.append({
            "period": f"{start_min}-{end_min}",
            "density": nx.density(G),
            "clustering": nx.average_clustering(G.to_undirected()),
            "centralization": calculate_centralization(list(dict(G.degree()).values())),
            "edges": G.number_of_edges(),
            "total_passes": sum(d["weight"] for _, _, d in G.edges(data=True))
        })

    return pd.DataFrame(results)

# Analyze evolution
network_evolution = analyze_network_by_period(events, "Barcelona")

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(network_evolution["period"], network_evolution["density"],
        "o-", label="Density", color="#1B5E20", linewidth=2)
ax.plot(network_evolution["period"], network_evolution["centralization"],
        "s-", label="Centralization", color="#FF6B00", linewidth=2)
ax.set_xlabel("Period (minutes)")
ax.set_ylabel("Metric Value")
ax.set_title("Network Evolution Throughout Match")
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# R: Temporal network analysis
analyze_network_by_period <- function(events, team, periods = c(15, 30, 45, 60, 75, 90)) {

  results <- map_dfr(seq_along(periods), function(i) {
    start_min <- if (i == 1) 0 else periods[i-1]
    end_min <- periods[i]

    # Filter passes for this period
    period_passes <- events %>%
      filter(type.name == "Pass",
             team.name == team,
             minute >= start_min,
             minute < end_min,
             !is.na(pass.recipient.name)) %>%
      group_by(player.name, pass.recipient.name) %>%
      summarise(passes = n(), .groups = "drop") %>%
      filter(passes >= 2)

    if (nrow(period_passes) < 5) return(NULL)

    # Build network
    g <- graph_from_data_frame(period_passes, directed = TRUE)
    E(g)$weight <- period_passes$passes

    tibble(
      period = paste0(start_min, "-", end_min),
      density = edge_density(g),
      clustering = transitivity(g, type = "global"),
      centralization = centr_degree(g)$centralization,
      edges = ecount(g),
      total_passes = sum(E(g)$weight)
    )
  })

  results
}

# Analyze network evolution
network_evolution <- analyze_network_by_period(events, "Barcelona")

# Plot evolution
ggplot(network_evolution, aes(x = period)) +
  geom_line(aes(y = density, group = 1, color = "Density"), linewidth = 1.2) +
  geom_line(aes(y = centralization, group = 1, color = "Centralization"), linewidth = 1.2) +
  geom_point(aes(y = density, color = "Density"), size = 3) +
  geom_point(aes(y = centralization, color = "Centralization"), size = 3) +
  scale_color_manual(values = c("Density" = "#1B5E20", "Centralization" = "#FF6B00")) +
  labs(title = "Network Evolution Throughout Match",
       x = "Period (minutes)", y = "Metric Value",
       color = "Metric") +
  theme_minimal()

Comparing Team Networks

Network analysis enables objective comparison of team playing styles. We can create fingerprints based on network metrics to understand what makes each team unique.

compare_teams
# Python: Compare multiple teams
def compare_team_networks(events, teams):
    """Compare network metrics across multiple teams."""

    results = []

    for team_name in teams:
        # Build network
        team_passes = events[
            (events["type"] == "Pass") &
            (events["team"] == team_name) &
            (events["pass_recipient"].notna())
        ].groupby(["player", "pass_recipient"]).size().reset_index(name="passes")

        team_passes = team_passes[team_passes["passes"] >= 3]

        G = nx.DiGraph()
        for _, row in team_passes.iterrows():
            G.add_edge(row["player"], row["pass_recipient"],
                      weight=row["passes"])

        G_undirected = G.to_undirected()

        results.append({
            "team": team_name,
            "density": nx.density(G),
            "clustering": nx.average_clustering(G_undirected),
            "degree_centralization": calculate_centralization(
                list(dict(G.degree()).values())),
            "reciprocity": nx.reciprocity(G),
            "avg_path_length": nx.average_shortest_path_length(G_undirected)
                              if nx.is_connected(G_undirected) else np.nan,
            "total_passes": sum(d["weight"] for _, _, d in G.edges(data=True)),
            "unique_combinations": G.number_of_edges()
        })

    return pd.DataFrame(results)

# Compare teams
comparison = compare_team_networks(all_events,
    ["Barcelona", "Real Madrid", "Bayern Munich", "Liverpool"])

# Create heatmap comparison
from sklearn.preprocessing import MinMaxScaler

metrics = ["density", "clustering", "degree_centralization",
           "reciprocity", "unique_combinations"]
scaled_data = comparison[metrics].copy()
scaler = MinMaxScaler()
scaled_data[metrics] = scaler.fit_transform(scaled_data[metrics])
scaled_data["team"] = comparison["team"]

# Plot heatmap
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(scaled_data[metrics].values, cmap="RdYlGn", aspect="auto")

ax.set_xticks(range(len(metrics)))
ax.set_xticklabels(metrics, rotation=45, ha="right")
ax.set_yticks(range(len(comparison)))
ax.set_yticklabels(comparison["team"])

plt.colorbar(im, label="Scaled Value")
ax.set_title("Team Network Comparison")
plt.tight_layout()
plt.show()
# R: Compare multiple teams
compare_team_networks <- function(events, teams) {

  map_dfr(teams, function(team_name) {
    # Build network
    team_passes <- events %>%
      filter(type.name == "Pass",
             team.name == team_name,
             !is.na(pass.recipient.name)) %>%
      group_by(player.name, pass.recipient.name) %>%
      summarise(passes = n(), .groups = "drop") %>%
      filter(passes >= 3)

    g <- graph_from_data_frame(team_passes, directed = TRUE)
    E(g)$weight <- team_passes$passes

    tibble(
      team = team_name,
      density = edge_density(g),
      clustering = transitivity(g, type = "global"),
      degree_centralization = centr_degree(g)$centralization,
      betweenness_centralization = centr_betw(g)$centralization,
      reciprocity = reciprocity(g),
      avg_path_length = mean_distance(g, directed = FALSE),
      total_passes = sum(E(g)$weight),
      unique_combinations = ecount(g)
    )
  })
}

# Compare teams in competition
team_comparison <- compare_team_networks(all_events,
                                         c("Barcelona", "Real Madrid",
                                           "Bayern Munich", "Liverpool"))

# Create comparison visualization
team_comparison %>%
  pivot_longer(-team, names_to = "metric", values_to = "value") %>%
  group_by(metric) %>%
  mutate(scaled = scales::rescale(value, to = c(0, 100))) %>%
  ggplot(aes(x = metric, y = scaled, fill = team)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Team Network Comparison",
       x = "Metric", y = "Scaled Value (0-100)") +
  theme_minimal()

Practice Exercises

Exercise 36.1: Build a Match Network

Using StatsBomb free data, build pass networks for both teams in a match. Compare their density and centralization metrics. Which team had a more hierarchical passing structure?

Exercise 36.2: Identify the Playmaker

Calculate betweenness centrality for all players in a team. Who has the highest betweenness? Does this match your intuition about who the team's playmaker is?

Exercise 36.3: First Half vs Second Half

Build separate networks for the first and second half of a match. How do the metrics change? Can you identify any tactical shifts from the network evolution?

Exercise 36.4: Community Analysis

Apply community detection to a team's pass network. Do the detected communities align with defensive/midfield/attacking units? Visualize the result on a pitch.

Exercise 36.5: Quality-Weighted Network

Build pass networks weighted by (1) pass count, (2) progressive passes, and (3) passes into the final third. Compare the centrality rankings under each weighting scheme. Which players rise or fall in importance?

Exercise 36.6: Formation Detection Validation

Use the formation detection algorithm on 5 different matches for the same team. Does the detected formation match the reported lineup? How consistent is the detected formation across matches?

Exercise 36.7: Network Disruption Analysis

Simulate removing key players from the network (as if they were sent off). How does the network structure change? Calculate the "importance" of each player by measuring how much network metrics deteriorate when they're removed.

Exercise 36.8: Passing Motifs

Identify common passing triangles (3-player subgraphs) in the network. Which triangular combinations are used most frequently? Are high-frequency triangles associated with better attacking outcomes?

Summary

Key Network Metrics Reference

Metric          Level    Interpretation                  Football Application
Degree          Node     Number of connections           Passing options, involvement
Betweenness     Node     Bridge between groups           Playmaker identification
PageRank        Node     Influence via connections       Expected ball recipient
Closeness       Node     Avg distance to all nodes       Ball circulation efficiency
Eigenvector     Node     Connected to important nodes    Quality of passing partners
Density         Network  Proportion of possible edges    Passing variety (high = tiki-taka)
Clustering      Network  Transitivity of connections     Triangular play frequency
Centralization  Network  Concentration of importance     Star player dependency
Reciprocity     Network  Mutual connections              Two-way passing combinations
Modularity      Network  Quality of community structure  Tactical unit cohesion
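The node-level metrics in the table map directly onto networkx calls. A minimal sketch on a toy network (the positions and pass counts here are made up for illustration):

```python
import networkx as nx

# Toy directed, weighted pass network (hypothetical counts)
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("GK", "CB", 10), ("CB", "CM", 15), ("CM", "CB", 12),
    ("CM", "ST", 6), ("CM", "LW", 8), ("LW", "ST", 4),
])

print(dict(G.degree()))                 # Degree: connections per node
print(nx.betweenness_centrality(G))     # Betweenness: bridging role
print(nx.pagerank(G, weight="weight"))  # PageRank: weighted influence
print(nx.closeness_centrality(G))       # Closeness: circulation efficiency
print(f"Density: {nx.density(G):.3f}")
print(f"Reciprocity: {nx.reciprocity(G):.3f}")
```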

Key Libraries and Tools

R Libraries
  • igraph - Core network analysis
  • tidygraph - Tidy interface to igraph
  • ggraph - Grammar of graphics for networks
  • ggsoccer - Football pitch visualization
  • visNetwork - Interactive network visualization
  • sna - Social network analysis
Python Libraries
  • networkx - Core network analysis
  • python-louvain - Community detection
  • mplsoccer - Football pitch visualization
  • pyvis - Interactive networks
  • graph-tool - High-performance graphs
  • scikit-network - Network ML

Common Pitfalls to Avoid

  • Ignoring direction: Pass networks are directed; A passing to B ≠ B passing to A
  • Minimum threshold selection: Too low includes noise, too high misses connections
  • Sample size issues: Networks from short periods (e.g., 15 min) may be unreliable
  • Ignoring game state: Networks differ when winning vs losing
  • Substitution effects: Players with few minutes shouldn't be compared directly
  • Over-interpreting communities: Algorithms may find spurious groupings
  • Formation detection errors: Fluid formations may not cluster cleanly
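The threshold pitfall is easy to demonstrate: as the minimum-pass cutoff rises, weakly connected players drop out entirely, so density can even increase while the network shrinks. A sketch with hypothetical pass counts:

```python
# Sketch: sensitivity of network size to the minimum-pass threshold
# (pass counts are hypothetical)
import pandas as pd
import networkx as nx

edges = pd.DataFrame({
    "passer":   ["A", "A", "B", "B", "C", "C", "D"],
    "receiver": ["B", "C", "C", "A", "A", "D", "A"],
    "count":    [12, 2, 8, 1, 5, 3, 1],
})

for threshold in [1, 2, 3, 5]:
    kept = edges[edges["count"] >= threshold]
    G = nx.from_pandas_edgelist(kept, "passer", "receiver",
                                edge_attr="count",
                                create_using=nx.DiGraph)
    # Node count, edge count, and density all shift with the cutoff
    print(threshold, G.number_of_nodes(), G.number_of_edges(),
          round(nx.density(G), 3))
```

At threshold 5, player D disappears from the network entirely, which is exactly why centrality comparisons across different cutoffs are unreliable.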

Style Archetypes from Network Metrics

Possession-Based Style
  • High density (> 0.4)
  • High clustering (> 0.5)
  • Low centralization (< 0.25)
  • High reciprocity (> 0.7)
  • Many unique combinations
  • Example: Guardiola's Barcelona
Direct/Counter Style
  • Lower density (< 0.35)
  • Lower clustering (< 0.45)
  • Higher centralization (> 0.3)
  • Lower reciprocity
  • More vertical passing
  • Example: Mourinho's counter-attacking teams
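The archetype thresholds above can be turned into a crude rule-based classifier. A minimal sketch, using the rough guide values from the boxes (simplified to three of the metrics):

```python
# Sketch: rule-based style classification from the archetype thresholds
# above (rough guide values, not a validated model)
def classify_style(density, clustering, centralization):
    if density > 0.4 and clustering > 0.5 and centralization < 0.25:
        return "Possession-based"
    if density < 0.35 and centralization > 0.3:
        return "Direct/Counter"
    return "Mixed"

# Barcelona-like metrics from the Clásico case study
print(classify_style(0.412, 0.523, 0.234))  # Possession-based
print(classify_style(0.330, 0.440, 0.320))  # Direct/Counter
```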

Network analysis provides a mathematically rigorous framework for understanding team structure and player importance. In the next chapter, we'll explore computer vision applications in football analytics.