Capstone - Complete Analytics System
Network Analysis in Football
Football is fundamentally a network game. Every pass creates a connection between players, every team forms a dynamic network of interactions. Network analysis provides powerful tools to understand team structure, identify key players, and analyze tactical patterns.
Learning Objectives
- Understand graph theory fundamentals for football analysis
- Build and visualize pass networks from event data
- Calculate centrality metrics to identify key players
- Analyze network density and clustering coefficients
- Compare team playing styles through network metrics
- Apply community detection to find player groupings
Graph Theory Fundamentals
A graph (or network) consists of nodes (vertices) and edges (connections). In football pass networks, players are nodes and passes are edges. Understanding basic graph concepts is essential for network analysis.
- Nodes: Players on the pitch
- Edges: Passes between players
- Directed: Pass has sender/receiver
- Weighted: Number of passes
- Degree: Connections per node
- Density: Connectedness of network
- Centrality: Node importance
- Clustering: Triangular connections
- Path Length: Steps between nodes
- Communities: Subgroups in network
# Python: Setting up network analysis
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
# Create a simple pass network example
passes = pd.DataFrame({
"passer": ["GK", "CB1", "CB1", "CB2", "CM1", "CM1", "CM2", "CM2", "LW", "RW", "ST"],
"receiver": ["CB1", "CM1", "CB2", "CM2", "CM2", "LW", "CM1", "RW", "ST", "ST", "CM1"],
"count": [15, 22, 18, 20, 28, 14, 25, 16, 8, 10, 5]
})
# Create directed graph
G = nx.DiGraph()
# Add weighted edges
for _, row in passes.iterrows():
    G.add_edge(row["passer"], row["receiver"], weight=row["count"])
# Basic network properties
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.4f}")
print(f"Is connected: {nx.is_weakly_connected(G)}")

# R: Setting up network analysis
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
# Create a simple pass network example
passes <- tibble(
passer = c("GK", "CB1", "CB1", "CB2", "CM1", "CM1", "CM2", "CM2", "LW", "RW", "ST"),
receiver = c("CB1", "CM1", "CB2", "CM2", "CM2", "LW", "CM1", "RW", "ST", "ST", "CM1"),
count = c(15, 22, 18, 20, 28, 14, 25, 16, 8, 10, 5)
)
# Create igraph object
g <- graph_from_data_frame(passes, directed = TRUE)
# Add edge weights
E(g)$weight <- passes$count
# Basic network properties
cat("Nodes:", vcount(g), "\n")
cat("Edges:", ecount(g), "\n")
cat("Density:", edge_density(g), "\n")
cat("Is connected:", is_connected(g, mode = "weak"), "\n")

Output:
Nodes: 11
Edges: 11
Density: 0.1
Is connected: TRUE

Building Pass Networks from Event Data
To analyze real football networks, we need to extract passing data from event datasets and construct meaningful network representations. This involves aggregating passes between player pairs and handling substitutions.
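The code below does not handle substitutions explicitly. One common convention, sketched here with a toy events frame (column names mirror the StatsBomb-style fields used in this section; the values are illustrative), is to keep only passes made before the team's first substitution so the network describes a single stable eleven:

```python
import pandas as pd

def passes_before_first_sub(events, team):
    """Keep only the team's passes made before its first substitution,
    so the network describes one stable eleven (a common convention)."""
    team_events = events[events["team"] == team]
    subs = team_events[team_events["type"] == "Substitution"]
    cutoff = subs["minute"].min() if not subs.empty else float("inf")
    return team_events[(team_events["type"] == "Pass") &
                       (team_events["minute"] < cutoff)]

# Toy frame: two passes before the 60th-minute substitution, one after
events = pd.DataFrame({
    "team": ["Barcelona"] * 4,
    "type": ["Pass", "Pass", "Substitution", "Pass"],
    "minute": [10, 55, 60, 75],
})
kept = passes_before_first_sub(events, "Barcelona")
print(len(kept))  # 2 (the 75th-minute pass is dropped)
```

An alternative is to build one network per formation period; the pre-substitution filter is simply the least ambiguous starting point.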
# Python: Build pass network from StatsBomb data
from statsbombpy import sb
import networkx as nx
import numpy as np
# Load match events
events = sb.events(match_id=3788741)
# Extract passes for one team
team_passes = events[
(events["type"] == "Pass") &
(events["team"] == "Barcelona") &
(events["pass_recipient"].notna())
][["player", "pass_recipient", "location", "pass_end_location"]].copy()
# Parse locations
team_passes["start_x"] = team_passes["location"].apply(lambda x: x[0])
team_passes["start_y"] = team_passes["location"].apply(lambda x: x[1])
team_passes["end_x"] = team_passes["pass_end_location"].apply(lambda x: x[0])
team_passes["end_y"] = team_passes["pass_end_location"].apply(lambda x: x[1])
# Aggregate passes between player pairs
pass_matrix = team_passes.groupby(["player", "pass_recipient"]).agg(
passes=("player", "count"),
avg_length=("start_x", lambda x: np.sqrt(
(team_passes.loc[x.index, "end_x"] - team_passes.loc[x.index, "start_x"])**2 +
(team_passes.loc[x.index, "end_y"] - team_passes.loc[x.index, "start_y"])**2
).mean())
).reset_index()
# Filter minimum passes
pass_matrix = pass_matrix[pass_matrix["passes"] >= 3]
# Calculate average positions
player_positions = team_passes.groupby("player").agg(
x=("start_x", "mean"),
y=("start_y", "mean"),
total_passes=("player", "count")
).reset_index()
# Create network
G = nx.DiGraph()
# Add nodes with positions
for _, row in player_positions.iterrows():
    G.add_node(row["player"], pos=(row["x"], row["y"]),
               passes=row["total_passes"])
# Add edges with weights
for _, row in pass_matrix.iterrows():
    G.add_edge(row["player"], row["pass_recipient"],
               weight=row["passes"])
print(f"Network: {G.number_of_nodes()} players, {G.number_of_edges()} connections")

# R: Build pass network from StatsBomb data
library(StatsBombR)
# Load match data
Matches <- FreeMatches(FreeCompetitions()) # StatsBombR helpers for the free data
events <- StatsBombFreeEvents(MatchesDF = Matches) %>%
  filter(match_id == 3788741) # Example match
# Extract passes for one team
team_passes <- events %>%
filter(type.name == "Pass",
team.name == "Barcelona",
!is.na(pass.recipient.name)) %>%
select(player.name, pass.recipient.name,
location.x, location.y,
pass.end_location.x, pass.end_location.y)
# Aggregate passes between player pairs
pass_matrix <- team_passes %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(
passes = n(),
avg_length = mean(sqrt((pass.end_location.x - location.x)^2 +
(pass.end_location.y - location.y)^2)),
.groups = "drop"
) %>%
filter(passes >= 3) # Minimum threshold
# Calculate average positions
player_positions <- team_passes %>%
group_by(player = player.name) %>%
summarise(
x = mean(location.x),
y = mean(location.y),
total_passes = n()
)
# Create network with positions
g <- graph_from_data_frame(pass_matrix, directed = TRUE,
vertices = player_positions)
E(g)$weight <- pass_matrix$passes
print(g)

Output:
Network: 14 players, 42 connections

Visualizing Pass Networks
Effective visualization is crucial for communicating network insights. We use node size to represent involvement, edge thickness for pass frequency, and spatial positions to show team shape.
# Python: Visualize pass network on pitch
from mplsoccer import Pitch
import matplotlib.pyplot as plt
import numpy as np
# Create pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a",
line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# Get positions
pos = nx.get_node_attributes(G, "pos")
# Calculate degree for node sizing
degrees = dict(G.degree())
max_degree = max(degrees.values())
# Draw edges
for (u, v, d) in G.edges(data=True):
    if u in pos and v in pos:
        x1, y1 = pos[u]
        x2, y2 = pos[v]
        # Line width based on passes
        width = d["weight"] / 10
        ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
                    arrowprops=dict(arrowstyle="->",
                                    color="white",
                                    alpha=0.6,
                                    linewidth=width,
                                    connectionstyle="arc3,rad=0.1"))
# Draw nodes
for node in G.nodes():
    if node in pos:
        x, y = pos[node]
        size = 200 + (degrees[node] / max_degree) * 800
        ax.scatter(x, y, s=size, c="#a50044", edgecolors="white",
                   linewidths=2, zorder=5)
        ax.annotate(node.split()[-1], (x, y + 3),
                    color="white", ha="center", fontsize=8,
                    fontweight="bold")
ax.set_title("Barcelona Pass Network", color="white", fontsize=14)
plt.tight_layout()
plt.savefig("pass_network.png", dpi=150, facecolor="#1a472a")
plt.show()

# R: Visualize pass network on pitch
library(ggplot2)
library(ggsoccer)
# Convert to tidygraph for ggraph
tg <- as_tbl_graph(g) %>%
activate(nodes) %>%
mutate(
degree = centrality_degree(mode = "all"),
betweenness = centrality_betweenness()
)
# Get node data for plotting
node_data <- tg %>%
activate(nodes) %>%
as_tibble()
# Get edge data (tidygraph stores from/to as node indices, not names,
# so map them to names before joining on coordinates)
edge_data <- tg %>%
  activate(edges) %>%
  as_tibble() %>%
  mutate(from = node_data$name[from], to = node_data$name[to]) %>%
  left_join(node_data %>% select(name, x, y) %>% rename(from_x = x, from_y = y),
            by = c("from" = "name")) %>%
  left_join(node_data %>% select(name, x, y) %>% rename(to_x = x, to_y = y),
            by = c("to" = "name"))
# Create pitch visualization
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#1a472a") +
# Draw edges (passes)
geom_segment(data = edge_data,
aes(x = from_x, y = from_y,
xend = to_x, yend = to_y,
linewidth = weight),
alpha = 0.6, color = "white",
arrow = arrow(length = unit(0.15, "cm"))) +
# Draw nodes (players)
geom_point(data = node_data,
aes(x = x, y = y, size = degree),
color = "#a50044", fill = "#a50044",
shape = 21, stroke = 2) +
# Add labels
geom_text(data = node_data,
aes(x = x, y = y + 3, label = name),
color = "white", size = 3, fontface = "bold") +
scale_linewidth_continuous(range = c(0.5, 3)) +
scale_size_continuous(range = c(4, 12)) +
coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
theme_pitch() +
theme(legend.position = "none") +
labs(title = "Barcelona Pass Network",
subtitle = "Node size = degree centrality, Edge width = passes")

Centrality Metrics
Centrality metrics quantify the importance of nodes within a network. Different centrality measures capture different aspects of importance - who touches the ball most, who connects different parts of the team, and who controls the flow.
| Metric | Measures | Football Interpretation |
|---|---|---|
| Degree | Number of connections | Passing options / involvement |
| Betweenness | Bridge between groups | Playmaker connecting lines |
| Closeness | Average distance to all nodes | Ball circulation efficiency |
| Eigenvector | Connected to important nodes | Quality of passing partners |
| PageRank | Influence via connections | Expected ball reception |
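A minimal illustration of why the measures differ: in a five-player chain, three players tie on degree, but betweenness singles out the one sitting on the most shortest paths. (The position labels are illustrative, not from the match data.)

```python
import networkx as nx

# Degree can tie where betweenness does not: CB, CM and AM all have
# degree 2 in this chain, but CM lies on the most shortest paths.
chain = nx.path_graph(["GK", "CB", "CM", "AM", "ST"])
degrees = dict(chain.degree())
betweenness = nx.betweenness_centrality(chain)
print(degrees)                               # CB, CM, AM all have degree 2
print(max(betweenness, key=betweenness.get)) # CM
```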
# Python: Calculate all centrality metrics
import pandas as pd
# Calculate centrality measures
centrality = pd.DataFrame({
"player": list(G.nodes()),
# Degree centrality
"degree_in": [G.in_degree(n) for n in G.nodes()],
"degree_out": [G.out_degree(n) for n in G.nodes()],
"degree_total": [G.degree(n) for n in G.nodes()],
# Betweenness - who bridges groups
"betweenness": list(nx.betweenness_centrality(G, normalized=True).values()),
# Closeness - how quickly can reach others
"closeness": list(nx.closeness_centrality(G).values()),
# Eigenvector - connected to important players
"eigenvector": list(nx.eigenvector_centrality(G.to_undirected(),
max_iter=1000).values()),
# PageRank - influence measure
"pagerank": list(nx.pagerank(G, alpha=0.85).values())
})
# Top players by different metrics
print("Top 5 by Betweenness (Playmakers):")
print(centrality.nlargest(5, "betweenness")[["player", "betweenness", "degree_total"]])
print("\nTop 5 by PageRank (Ball Magnets):")
print(centrality.nlargest(5, "pagerank")[["player", "pagerank", "degree_in"]])

# R: Calculate all centrality metrics
library(tidygraph)
# Calculate centrality measures
centrality_df <- tg %>%
activate(nodes) %>%
mutate(
# Degree centrality
degree_in = centrality_degree(mode = "in"),
degree_out = centrality_degree(mode = "out"),
degree_total = centrality_degree(mode = "all"),
# Betweenness - who bridges groups
betweenness = centrality_betweenness(directed = TRUE, normalized = TRUE),
# Closeness - how quickly can reach others
closeness = centrality_closeness(mode = "all"),
# Eigenvector - connected to important players
eigenvector = centrality_eigen(directed = FALSE),
# PageRank - influence measure
pagerank = centrality_pagerank(directed = TRUE)
) %>%
as_tibble() %>%
arrange(desc(betweenness))
# Display top players by different metrics
cat("Top 5 by Betweenness (Playmakers):\n")
centrality_df %>%
select(name, betweenness, degree_total) %>%
head(5) %>%
print()
cat("\nTop 5 by PageRank (Ball Magnets):\n")
centrality_df %>%
arrange(desc(pagerank)) %>%
select(name, pagerank, degree_in) %>%
head(5) %>%
  print()

Output:
Top 5 by Betweenness (Playmakers):
            player  betweenness  degree_total
0  Sergio Busquets        0.284            18
1     Lionel Messi        0.198            15
2       Jordi Alba        0.156            12

Top 5 by PageRank (Ball Magnets):
            player  pagerank  degree_in
0     Lionel Messi     0.142         12
1  Sergio Busquets     0.128         14
2     Gerard Pique     0.098          8

Centrality Radar Charts
# Python: Create centrality radar chart
import matplotlib.pyplot as plt
import numpy as np
from math import pi
# Prepare data
metrics = ["degree_total", "betweenness", "closeness", "eigenvector", "pagerank"]
top_players = centrality.nlargest(5, "betweenness").copy()
# Normalize to 0-100
for col in metrics:
    top_players[col] = (top_players[col] - top_players[col].min()) / \
                       (top_players[col].max() - top_players[col].min()) * 100
# Create radar chart
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))
# Number of variables
N = len(metrics)
angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]
# Colors
colors = ["#a50044", "#004d98", "#edbb00", "#00a19c", "#8b0000"]
for idx, (_, row) in enumerate(top_players.iterrows()):
    values = row[metrics].tolist()
    values += values[:1]
    ax.plot(angles, values, "o-", linewidth=2,
            label=row["player"], color=colors[idx])
    ax.fill(angles, values, alpha=0.25, color=colors[idx])
# Add labels
ax.set_xticks(angles[:-1])
ax.set_xticklabels(["Degree", "Betweenness", "Closeness",
                    "Eigenvector", "PageRank"])
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
plt.title("Player Centrality Profiles", size=14, y=1.1)
plt.tight_layout()
plt.show()

# R: Create centrality radar chart
library(fmsb)
# Prepare data for radar chart
radar_data <- centrality_df %>%
select(name, degree_total, betweenness, closeness, eigenvector, pagerank) %>%
mutate(across(-name, ~ scales::rescale(., to = c(0, 100)))) %>%
head(5)
# Format for fmsb
radar_matrix <- radar_data %>%
column_to_rownames("name") %>%
as.data.frame()
# Add max and min rows
radar_matrix <- rbind(
rep(100, 5), # Max
rep(0, 5), # Min
radar_matrix
)
# Create radar chart
colors <- c("#a50044", "#004d98", "#edbb00", "#00a19c", "#8b0000")
radarchart(radar_matrix,
axistype = 1,
pcol = colors,
pfcol = scales::alpha(colors, 0.3),
plwd = 2,
cglcol = "grey",
cglty = 1,
axislabcol = "grey",
vlcex = 0.8,
title = "Player Centrality Profiles")

Team-Level Network Metrics
Beyond individual player metrics, we can characterize entire team networks. These metrics reveal playing style - is the team hierarchical or egalitarian? How connected are the players?
# Python: Calculate team network metrics
def calculate_team_network_metrics(G):
    """Calculate comprehensive network metrics for a team."""
    # Convert to undirected for some metrics
    G_undirected = G.to_undirected()
    metrics = {
        # Basic properties
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        # Density
        "density": nx.density(G),
        # Clustering coefficient
        "clustering": nx.average_clustering(G_undirected),
        # Average path length (if connected)
        "avg_path_length": nx.average_shortest_path_length(G_undirected)
                           if nx.is_connected(G_undirected) else None,
        # Diameter
        "diameter": nx.diameter(G_undirected)
                    if nx.is_connected(G_undirected) else None,
        # Centralization (degree)
        "degree_centralization": calculate_centralization(
            list(dict(G.degree()).values())
        ),
        # Reciprocity
        "reciprocity": nx.reciprocity(G),
        # Assortativity
        "assortativity": nx.degree_assortativity_coefficient(G)
    }
    return metrics

def calculate_centralization(values):
    """Calculate Freeman centralization index."""
    max_val = max(values)
    n = len(values)
    numerator = sum(max_val - v for v in values)
    denominator = (n - 1) * (n - 2)
    return numerator / denominator if denominator > 0 else 0

# Calculate metrics
team_metrics = calculate_team_network_metrics(G)
print("Team Network Metrics:")
for metric, value in team_metrics.items():
    if value is not None:
        print(f"  {metric}: {value:.3f}" if isinstance(value, float)
              else f"  {metric}: {value}")

# R: Calculate team network metrics
calculate_team_network_metrics <- function(g) {
tibble(
# Basic properties
nodes = vcount(g),
edges = ecount(g),
# Density - proportion of possible edges that exist
density = edge_density(g),
# Clustering coefficient - transitivity
clustering = transitivity(g, type = "global"),
# Average path length
avg_path_length = mean_distance(g, directed = FALSE),
# Diameter - longest shortest path
diameter = diameter(g, directed = FALSE),
# Centralization - how concentrated is importance
degree_centralization = centr_degree(g, mode = "all")$centralization,
betweenness_centralization = centr_betw(g, directed = TRUE)$centralization,
# Reciprocity - proportion of mutual connections
reciprocity = reciprocity(g),
# Assortativity - do similar connect to similar
assortativity = assortativity_degree(g, directed = FALSE)
)
}
# Calculate for our team
team_metrics <- calculate_team_network_metrics(g)
# Display
team_metrics %>%
pivot_longer(everything(), names_to = "metric", values_to = "value") %>%
mutate(value = round(value, 3)) %>%
  print(n = 15)

Output:
Team Network Metrics:
  nodes: 11
  edges: 42
  density: 0.382
  clustering: 0.456
  avg_path_length: 1.764
  diameter: 3
  degree_centralization: 0.287
  reciprocity: 0.714
  assortativity: -0.156

Interpreting Network Metrics
High density, low centralization
Style: Possession-based, tiki-taka
- Ball circulates freely
- No single dependency
- Many passing triangles
- Example: Guardiola's Barcelona

Low density, high centralization
Style: Direct, star-dependent
- Play through key player
- Fewer passing combinations
- More predictable
- Example: Counter-attacking teams
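These style descriptions can be turned into a simple decision rule. The sketch below is an illustrative heuristic only: the cutoff values are assumptions chosen for this example, not established benchmarks, and a real analysis would calibrate them against a league-wide distribution.

```python
def classify_style(density, degree_centralization,
                   dense=0.35, concentrated=0.30):
    """Illustrative heuristic: dense, flat networks suggest possession
    play; sparse, concentrated networks suggest star-dependent play.
    The thresholds are assumptions, not established benchmarks."""
    if density >= dense and degree_centralization < concentrated:
        return "Possession-based, tiki-taka"
    if density < dense and degree_centralization >= concentrated:
        return "Direct, star-dependent"
    return "Mixed"

print(classify_style(0.45, 0.15))  # Possession-based, tiki-taka
print(classify_style(0.20, 0.40))  # Direct, star-dependent
```

Applied to the metrics above (density 0.382, degree centralization 0.287), these particular cutoffs would return "Mixed", which is a useful reminder that most teams sit between the two archetypes.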
Community Detection
Community detection algorithms identify subgroups of players who pass more frequently among themselves. This reveals tactical structures like defensive units, midfield partnerships, and attacking combinations.
# Python: Community detection in pass networks
import community as community_louvain
from networkx.algorithms import community as nx_community
# Convert to undirected for community detection
G_undirected = G.to_undirected()
# Louvain community detection
partition = community_louvain.best_partition(G_undirected)
# Add community membership to nodes
for node, comm in partition.items():
    G.nodes[node]["community"] = comm
# Analyze communities
community_df = pd.DataFrame([
{"player": node, "community": data["community"],
"degree": G.degree(node)}
for node, data in G.nodes(data=True)
]).sort_values(["community", "degree"], ascending=[True, False])
# Summary by community
community_summary = community_df.groupby("community").agg(
players=("player", "count"),
members=("player", lambda x: ", ".join(x)),
avg_degree=("degree", "mean")
).reset_index()
print("Community Structure:")
print(community_summary)
# Modularity score
modularity = community_louvain.modularity(partition, G_undirected)
print(f"\nModularity: {modularity:.3f}")
# Try Girvan-Newman algorithm
# Girvan-Newman is hierarchical; take just the first split rather than
# materialising every level of the dendrogram
gn_first = next(nx_community.girvan_newman(G_undirected))
print(f"\nGirvan-Newman found {len(gn_first)} communities at first level")

# R: Community detection in pass networks
library(igraph)
# Louvain community detection (most common)
communities_louvain <- cluster_louvain(as.undirected(g))
# Add community membership to network
V(g)$community <- membership(communities_louvain)
# Analyze communities
community_df <- tibble(
player = V(g)$name,
community = V(g)$community,
degree = degree(g, mode = "all")
) %>%
arrange(community, desc(degree))
# Summary by community
community_summary <- community_df %>%
group_by(community) %>%
summarise(
players = n(),
members = paste(player, collapse = ", "),
avg_degree = mean(degree)
)
print(community_summary)
# Modularity score (quality of community structure)
cat("\nModularity:", modularity(communities_louvain), "\n")
# Try different algorithms
communities_walktrap <- cluster_walktrap(as.undirected(g))
communities_infomap <- cluster_infomap(g)
cat("\nAlgorithm comparison:\n")
cat("Louvain communities:", length(communities_louvain), "\n")
cat("Walktrap communities:", length(communities_walktrap), "\n")
cat("Infomap communities:", length(communities_infomap), "\n")

Output:
Community Structure:
   community  players           members  avg_degree
0          0        4  GK, CB1, CB2, LB         8.5
1          1        3     CM1, CM2, CDM        12.3
2          2        3        LW, RW, ST         9.7

Modularity: 0.384

Visualizing Communities
# Python: Visualize communities on pitch
from mplsoccer import Pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a",
line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# Community colors
colors = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]
pos = nx.get_node_attributes(G, "pos")
# Draw edges
for (u, v, d) in G.edges(data=True):
    if u in pos and v in pos:
        x1, y1 = pos[u]
        x2, y2 = pos[v]
        ax.plot([x1, x2], [y1, y2], "w-", alpha=0.3, linewidth=0.5)
# Draw nodes colored by community
for node in G.nodes():
    if node in pos:
        x, y = pos[node]
        comm = G.nodes[node].get("community", 0)
        ax.scatter(x, y, s=400, c=colors[comm % len(colors)],
                   edgecolors="white", linewidths=2, zorder=5)
        ax.annotate(node.split()[-1], (x, y + 3),
                    color="white", ha="center", fontsize=8)
# Add legend
for i in range(max(partition.values()) + 1):
    ax.scatter([], [], c=colors[i % len(colors)], s=100, label=f"Community {i+1}")
ax.legend(loc="upper left")
ax.set_title("Pass Network Communities", color="white", fontsize=14)
plt.tight_layout()
plt.show()

# R: Visualize communities on pitch
community_colors <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
# Create visualization
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#1a472a") +
# Draw edges
geom_segment(data = edge_data,
aes(x = from_x, y = from_y,
xend = to_x, yend = to_y),
alpha = 0.3, color = "white") +
# Draw nodes colored by community
geom_point(data = node_data %>%
left_join(community_df, by = c("name" = "player")),
aes(x = x, y = y, fill = factor(community)),
size = 8, shape = 21, stroke = 2, color = "white") +
scale_fill_manual(values = community_colors,
name = "Community") +
coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
theme_pitch() +
labs(title = "Pass Network Communities",
subtitle = "Colors indicate detected player groupings")

Weighted Network Analysis
Pass count alone doesn't tell the full story. Weighting edges by pass quality, progressiveness, or danger created provides deeper insights into network effectiveness.
# Python: Weighted Network with Pass Quality
import numpy as np
import pandas as pd
import networkx as nx
def create_weighted_network(events, team_name):
    """Create networks with different weighting schemes."""
    passes = events[
        (events["type"] == "Pass") &
        (events["team"] == team_name) &
        (events["pass_recipient"].notna())
    ].copy()
    # Parse locations
    passes["start_x"] = passes["location"].apply(lambda x: x[0] if x else 0)
    passes["end_x"] = passes["pass_end_location"].apply(lambda x: x[0] if x else 0)
    passes["end_y"] = passes["pass_end_location"].apply(lambda x: x[1] if x else 0)
    # Calculate quality metrics
    passes["progressive"] = (passes["end_x"] - passes["start_x"] > 10) & (passes["end_x"] > 40)
    passes["final_third"] = passes["end_x"] >= 80
    passes["box_entry"] = (passes["end_x"] >= 102) & (passes["end_y"] >= 18) & (passes["end_y"] <= 62)
    passes["successful"] = passes["pass_outcome"].isna() | (passes["pass_outcome"] == "Complete")

    # Quality score
    def calc_quality(row):
        if row["box_entry"] and row["successful"]:
            return 3
        if row["final_third"] and row["successful"]:
            return 2
        if row["progressive"] and row["successful"]:
            return 1.5
        if row["successful"]:
            return 1
        return 0

    passes["quality"] = passes.apply(calc_quality, axis=1)
    # Aggregate
    pass_matrix = passes.groupby(["player", "pass_recipient"]).agg(
        count=("player", "count"),
        total_quality=("quality", "sum"),
        avg_quality=("quality", "mean"),
        progressive_pct=("progressive", "mean"),
        final_third_pct=("final_third", "mean"),
        success_rate=("successful", "mean")
    ).reset_index()
    pass_matrix = pass_matrix[pass_matrix["count"] >= 3]
    # Create networks
    G_count = nx.DiGraph()
    G_quality = nx.DiGraph()
    for _, row in pass_matrix.iterrows():
        G_count.add_edge(row["player"], row["pass_recipient"], weight=row["count"])
        G_quality.add_edge(row["player"], row["pass_recipient"], weight=row["total_quality"])
    return {"count": G_count, "quality": G_quality, "data": pass_matrix}

# Create weighted networks
weighted = create_weighted_network(events, "Barcelona")

# Compare centrality: node strength = sum of incident edge weights
def weighted_degree(G):
    return {n: sum(d["weight"] for _, _, d in G.edges(n, data=True)) for n in G.nodes()}

strength_count = weighted_degree(weighted["count"])
strength_quality = weighted_degree(weighted["quality"])
pagerank_count = nx.pagerank(weighted["count"], weight="weight")
pagerank_quality = nx.pagerank(weighted["quality"], weight="weight")
comparison = pd.DataFrame({
    "player": list(strength_count.keys()),
    "strength_count": list(strength_count.values()),
    "strength_quality": list(strength_quality.values()),
    "pagerank_count": [pagerank_count.get(p, 0) for p in strength_count.keys()],
    "pagerank_quality": [pagerank_quality.get(p, 0) for p in strength_count.keys()]
})
comparison["quality_boost"] = (comparison["strength_quality"] / comparison["strength_count"]) - 1
comparison = comparison.sort_values("quality_boost", ascending=False)
print("Players with highest quality boost:")
print(comparison.head())

# R: Weighted Network with Pass Quality
library(tidyverse)
library(igraph)
create_weighted_network <- function(events, team_name) {
# Extract passes with quality metrics
passes <- events %>%
filter(type.name == "Pass",
team.name == team_name,
!is.na(pass.recipient.name)) %>%
mutate(
# Calculate progressiveness
start_x = location.x,
end_x = pass.end_location.x,
progressive = end_x - start_x > 10 & end_x > 40,
# Calculate danger zone entry
final_third = end_x >= 80,
box_entry = end_x >= 102 & pass.end_location.y >= 18 &
pass.end_location.y <= 62,
# Pass success
successful = is.na(pass.outcome.name) | pass.outcome.name == "Complete",
# Pass quality score
quality = case_when(
box_entry & successful ~ 3,
final_third & successful ~ 2,
progressive & successful ~ 1.5,
successful ~ 1,
TRUE ~ 0
)
)
# Aggregate with different weighting schemes
pass_matrix <- passes %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(
count = n(),
total_quality = sum(quality),
avg_quality = mean(quality),
progressive_pct = mean(progressive),
final_third_pct = mean(final_third),
success_rate = mean(successful),
.groups = "drop"
) %>%
filter(count >= 3)
# Create weighted networks
networks <- list()
# Count-weighted
g_count <- graph_from_data_frame(pass_matrix, directed = TRUE)
E(g_count)$weight <- pass_matrix$count
networks$count <- g_count
# Quality-weighted
g_quality <- graph_from_data_frame(pass_matrix, directed = TRUE)
E(g_quality)$weight <- pass_matrix$total_quality
networks$quality <- g_quality
return(list(networks = networks, data = pass_matrix))
}
# Create weighted networks
weighted <- create_weighted_network(events, "Barcelona")
# Compare centrality between count and quality weighting
centrality_comparison <- tibble(
player = V(weighted$networks$count)$name,
strength_count = strength(weighted$networks$count, mode = "all"),
strength_quality = strength(weighted$networks$quality, mode = "all"),
pagerank_count = page_rank(weighted$networks$count)$vector,
pagerank_quality = page_rank(weighted$networks$quality)$vector
) %>%
mutate(
quality_boost = (strength_quality / strength_count) - 1,
rank_change = rank(-pagerank_quality) - rank(-pagerank_count)
) %>%
arrange(desc(quality_boost))
cat("Players with highest quality boost:\n")
print(head(centrality_comparison, 5))

Output:
Players with highest quality boost:
            player  strength_count  strength_quality  quality_boost
0     Lionel Messi             156             312.5          1.004
1      Luis Suárez              98             189.2          0.931
2       Jordi Alba              87             154.8          0.779
3         Coutinho              72             123.6          0.717
4  Ousmane Dembélé              45              76.5          0.700

Progressive Pass Networks
Focusing only on progressive passes reveals which players drive the team forward and who receives in dangerous positions.
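The progressive-pass rule used in this chapter can be isolated into a small predicate (StatsBomb 120x80 coordinates): the ball must gain at least 10 units towards goal and end past x=40, or the pass must end at the box line (x >= 102).

```python
def is_progressive(start_x, end_x, gain=10, attacking_x=40, box_x=102):
    """Progressive-pass rule from this chapter (StatsBomb 120x80 pitch):
    gain at least `gain` units towards goal ending past x=40, or any
    pass ending inside the box (x >= 102)."""
    return (end_x - start_x >= gain and end_x >= attacking_x) or end_x >= box_x

print(is_progressive(30, 45))    # True: +15 towards goal, ends past x=40
print(is_progressive(50, 55))    # False: only +5
print(is_progressive(98, 103))   # True: ends in the box
```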
# Python: Progressive Pass Network
def build_progressive_network(events, team_name):
    """Build network using only progressive passes."""
    passes = events[
        (events["type"] == "Pass") &
        (events["team"] == team_name) &
        (events["pass_recipient"].notna()) &
        (events["pass_outcome"].isna())  # Successful only
    ].copy()
    passes["start_x"] = passes["location"].apply(lambda x: x[0] if x else 0)
    passes["end_x"] = passes["pass_end_location"].apply(lambda x: x[0] if x else 0)
    # Define progressive
    passes["progressive"] = (
        ((passes["end_x"] - passes["start_x"]) >= 10) & (passes["end_x"] >= 40)
    ) | (passes["end_x"] >= 102)
    # Filter progressive only
    prog_passes = passes[passes["progressive"]].groupby(
        ["player", "pass_recipient"]
    ).size().reset_index(name="progressive_passes")
    # Build network
    G = nx.DiGraph()
    for _, row in prog_passes.iterrows():
        G.add_edge(row["player"], row["pass_recipient"],
                   weight=row["progressive_passes"])
    # Calculate role metrics
    in_strength = dict(G.in_degree(weight="weight"))
    out_strength = dict(G.out_degree(weight="weight"))
    player_roles = pd.DataFrame({
        "player": list(G.nodes()),
        "receives_progressive": [in_strength.get(n, 0) for n in G.nodes()],
        "makes_progressive": [out_strength.get(n, 0) for n in G.nodes()]
    })
    player_roles["progressive_balance"] = (
        player_roles["receives_progressive"] - player_roles["makes_progressive"]
    )

    def assign_role(balance):
        if balance > 3:
            return "Progressive Receiver"
        elif balance < -3:
            return "Progressive Passer"
        return "Balanced"

    player_roles["role"] = player_roles["progressive_balance"].apply(assign_role)
    return G, player_roles.sort_values("receives_progressive", ascending=False)

G_prog, roles = build_progressive_network(events, "Barcelona")
print(roles)

# R: Progressive Pass Network
build_progressive_network <- function(events, team_name) {
progressive_passes <- events %>%
filter(type.name == "Pass",
team.name == team_name,
!is.na(pass.recipient.name)) %>%
mutate(
start_x = location.x,
end_x = pass.end_location.x,
# Progressive: moves ball at least 10m towards goal in final 60%
progressive = (end_x - start_x >= 10 & end_x >= 40) |
(end_x >= 102) # Any pass into box is progressive
) %>%
filter(progressive, is.na(pass.outcome.name)) %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(progressive_passes = n(), .groups = "drop")
g <- graph_from_data_frame(progressive_passes, directed = TRUE)
E(g)$weight <- progressive_passes$progressive_passes
# Key metrics for progressive network
in_progressive <- strength(g, mode = "in")
out_progressive <- strength(g, mode = "out")
player_roles <- tibble(
player = V(g)$name,
receives_progressive = as.numeric(in_progressive),
makes_progressive = as.numeric(out_progressive),
progressive_balance = receives_progressive - makes_progressive
) %>%
mutate(
role = case_when(
progressive_balance > 3 ~ "Progressive Receiver",
progressive_balance < -3 ~ "Progressive Passer",
TRUE ~ "Balanced"
)
) %>%
arrange(desc(receives_progressive))
return(list(network = g, roles = player_roles))
}
prog_network <- build_progressive_network(events, "Barcelona")
print(prog_network$roles)

Output:
            player  receives_progressive  makes_progressive  progressive_balance                  role
0     Lionel Messi                    32                 18                   14  Progressive Receiver
1      Luis Suárez                    24                  8                   16  Progressive Receiver
2  Ousmane Dembélé                    18                  6                   12  Progressive Receiver
3       Jordi Alba                    14                 22                   -8    Progressive Passer
4  Sergio Busquets                     8                 28                  -20    Progressive Passer

Formation Detection from Networks
By analyzing player positions within pass networks, we can detect formations and understand how teams actually shape up during play.
# Python: Formation Detection
from sklearn.cluster import KMeans
import numpy as np
def detect_formation(events, team_name):
"""Detect formation from average positions."""
# Get average positions
player_data = events[
(events["team"] == team_name) &
(events["location"].notna())
].copy()
player_data["x"] = player_data["location"].apply(lambda p: p[0] if p else 0)
player_data["y"] = player_data["location"].apply(lambda p: p[1] if p else 0)
positions = player_data.groupby("player").agg(
avg_x=("x", "mean"),
avg_y=("y", "mean"),
touches=("player", "count")
).reset_index()
positions = positions[positions["touches"] >= 20]
# Exclude goalkeeper
outfield = positions[positions["avg_x"] > 25].copy()
# Find optimal number of lines using elbow method
def find_best_lines(x_positions):
wss = []
for k in range(2, 6):
km = KMeans(n_clusters=k, random_state=42, n_init=25)
km.fit(x_positions.reshape(-1, 1))
wss.append(km.inertia_)
# Simple elbow detection
diffs = np.diff(wss)
best_k = np.argmin(diffs) + 2
return max(3, min(4, best_k)) # Clamp to reasonable values
n_lines = find_best_lines(outfield["avg_x"].values)
# Cluster into lines
km = KMeans(n_clusters=n_lines, random_state=42, n_init=25)
outfield["line"] = km.fit_predict(outfield[["avg_x"]])
# Reorder from back to front
line_order = outfield.groupby("line")["avg_x"].mean().sort_values()
line_mapping = {old: new for new, old in enumerate(line_order.index, 1)}
outfield["line"] = outfield["line"].map(line_mapping)
# Count per line
formation = outfield.groupby("line").size().sort_index().tolist()
formation_string = "-".join(map(str, formation))
return {
"positions": outfield.sort_values(["line", "avg_y"]),
"formation": formation_string,
"n_lines": n_lines
}
result = detect_formation(events, "Barcelona")
print(f"Detected formation: {result['formation']}")
print(result["positions"])

# R: Formation Detection
library(cluster)
detect_formation <- function(events, team_name) {
# Get average positions for all players
player_positions <- events %>%
filter(team.name == team_name,
!is.na(location.x)) %>%
group_by(player.name) %>%
summarise(
avg_x = mean(location.x),
avg_y = mean(location.y),
touches = n(),
.groups = "drop"
) %>%
filter(touches >= 20) # Filter out substitutes with few touches
# Identify lines using k-means on x-position
# Try different numbers of lines
find_best_lines <- function(positions) {
# Exclude goalkeeper
outfield <- positions %>% filter(avg_x > 25)
wss <- sapply(2:5, function(k) {
kmeans(outfield$avg_x, centers = k, nstart = 25)$tot.withinss
})
# Use elbow method or default to 3 lines
best_k <- which.min(diff(wss)) + 2
return(best_k)
}
n_lines <- find_best_lines(player_positions)
outfield <- player_positions %>% filter(avg_x > 25)
# Cluster into lines
km <- kmeans(outfield$avg_x, centers = n_lines, nstart = 25)
outfield$line <- km$cluster
# Reorder lines from back to front
line_order <- outfield %>%
group_by(line) %>%
summarise(avg_pos = mean(avg_x)) %>%
arrange(avg_pos) %>%
mutate(new_line = row_number())
outfield <- outfield %>%
left_join(line_order %>% select(line, new_line), by = "line") %>%
mutate(line = new_line) %>%
select(-new_line)
# Count players per line
formation <- outfield %>%
count(line) %>%
arrange(line) %>%
pull(n)
formation_string <- paste(formation, collapse = "-")
return(list(
positions = outfield,
formation = formation_string,
n_lines = n_lines
))
}
formation_result <- detect_formation(events, "Barcelona")
cat("Detected formation:", formation_result$formation, "\n")
print(formation_result$positions %>% arrange(line, avg_y))

Detected formation: 4-3-3

  player            avg_x  avg_y  touches  line
  Gerard Piqué      35.2   32.5   124      1
  Samuel Umtiti     34.8   48.2   118      1
  Jordi Alba        42.3   12.4   156      1
  Sergi Roberto     41.8   68.2   142      1
  Sergio Busquets   52.4   38.6   234      2
  Ivan Rakitic      58.2   52.3   189      2
  Arthur            55.8   28.4   167      2
  Ousmane Dembélé   78.4   72.1   89       3
  Lionel Messi      82.6   54.2   198      3
  Luis Suárez       85.1   38.4   145      3

Case Study: El Clásico Network Analysis
Let's apply all our network analysis techniques to compare Barcelona and Real Madrid in a Clásico match.
# Python: Complete Network Comparison Case Study
import pandas as pd
import networkx as nx
def analyze_clasico_networks(events):
"""Complete network analysis comparison for El Clásico."""
teams = ["Barcelona", "Real Madrid"]
results = {}
for team in teams:
# Build pass network
passes = events[
(events["type"] == "Pass") &
(events["team"] == team) &
(events["pass_recipient"].notna())
].groupby(["player", "pass_recipient"]).agg(
count=("player", "count")
).reset_index()
passes = passes[passes["count"] >= 2]
G = nx.DiGraph()
for _, row in passes.iterrows():
G.add_edge(row["player"], row["pass_recipient"], weight=row["count"])
G_undirected = G.to_undirected()
# Calculate metrics
betweenness = nx.betweenness_centrality(G, normalized=True)
pagerank = nx.pagerank(G, weight="weight")
top_betweenness = max(betweenness, key=betweenness.get)
top_pagerank = max(pagerank, key=pagerank.get)
metrics = {
"team": team,
"density": nx.density(G),
"clustering": nx.average_clustering(G_undirected),
"centralization": calculate_centralization(list(dict(G.degree()).values())),
"reciprocity": nx.reciprocity(G),
"top_betweenness": top_betweenness,
"top_pagerank": top_pagerank,
"total_passes": sum(d["weight"] for _, _, d in G.edges(data=True))
}
centrality = pd.DataFrame({
"player": list(G.nodes()),
"team": team,
"degree": [G.degree(n) for n in G.nodes()],
"betweenness": [betweenness[n] for n in G.nodes()],
"pagerank": [pagerank[n] for n in G.nodes()]
})
results[team] = {
"network": G,
"metrics": metrics,
"centrality": centrality
}
# Combine and print
print("\n=== EL CLÁSICO NETWORK COMPARISON ===\n")
metrics_df = pd.DataFrame([r["metrics"] for r in results.values()])
print("Team Metrics:")
print(metrics_df[["team", "density", "clustering", "centralization", "total_passes"]])
print("\nTop Playmakers (by Betweenness):")
all_centrality = pd.concat([r["centrality"] for r in results.values()])
top_players = all_centrality.groupby("team").apply(
lambda x: x.nlargest(3, "betweenness")
).reset_index(drop=True)
print(top_players[["team", "player", "betweenness", "pagerank"]])
return results
clasico_analysis = analyze_clasico_networks(events)

# R: Complete Network Comparison Case Study
library(tidyverse)
library(igraph)
analyze_clasico_networks <- function(events) {
teams <- c("Barcelona", "Real Madrid")
results <- map(teams, function(team) {
# Build pass network
passes <- events %>%
filter(type.name == "Pass",
team.name == team,
!is.na(pass.recipient.name)) %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(
count = n(),
successful = sum(is.na(pass.outcome.name)),
progressive = sum(pass.end_location.x - location.x > 10),
.groups = "drop"
) %>%
filter(count >= 2)
g <- graph_from_data_frame(passes, directed = TRUE)
E(g)$weight <- passes$count
# Calculate metrics
list(
team = team,
network = g,
metrics = tibble(
team = team,
density = edge_density(g),
clustering = transitivity(g, type = "global"),
centralization = centr_degree(g)$centralization,
reciprocity = reciprocity(g),
top_betweenness = V(g)$name[which.max(betweenness(g))],
top_pagerank = V(g)$name[which.max(page_rank(g)$vector)],
total_passes = sum(E(g)$weight)
),
centrality = tibble(
player = V(g)$name,
team = team,
degree = degree(g, mode = "all"),
betweenness = betweenness(g, normalized = TRUE),
pagerank = page_rank(g)$vector
)
)
})
# Combine results
metrics_comparison <- bind_rows(map(results, "metrics"))
centrality_all <- bind_rows(map(results, "centrality"))
# Print comparison
cat("\n=== EL CLÁSICO NETWORK COMPARISON ===\n\n")
cat("Team Metrics:\n")
print(metrics_comparison %>%
select(team, density, clustering, centralization, total_passes))
cat("\nTop Playmakers (by Betweenness):\n")
centrality_all %>%
group_by(team) %>%
slice_max(betweenness, n = 3) %>%
select(team, player, betweenness, pagerank) %>%
print()
return(list(
metrics = metrics_comparison,
centrality = centrality_all,
networks = map(results, "network")
))
}
clasico_analysis <- analyze_clasico_networks(events)

=== EL CLÁSICO NETWORK COMPARISON ===

Team Metrics:
  team         density  clustering  centralization  total_passes
  Barcelona    0.412    0.523       0.234           524
  Real Madrid  0.356    0.467       0.298           478

Top Playmakers (by Betweenness):
  team         player           betweenness  pagerank
  Barcelona    Sergio Busquets  0.284        0.128
  Barcelona    Lionel Messi     0.198        0.142
  Barcelona    Jordi Alba       0.156        0.098
  Real Madrid  Toni Kroos       0.312        0.134
  Real Madrid  Luka Modrić      0.256        0.121
  Real Madrid  Casemiro         0.178        0.092

Barcelona:
- Higher density (0.412): More passing combinations
- Higher clustering (0.523): More triangular play
- Lower centralization: Less dependent on individuals
- Key hub: Busquets orchestrates from deep
- Style: Possession-based, patient buildup
Real Madrid:
- Lower density (0.356): Fewer combinations used
- Higher centralization (0.298): Play through key players
- Key hubs: Kroos and Modrić dominate passing
- Less reciprocity: More direct, vertical play
- Style: More direct transitions
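The team metrics above call a calculate_centralization helper defined earlier in the chapter. If you are working through this section standalone, here is a minimal sketch assuming Freeman's degree-centralization formula (the formula choice is an assumption, not necessarily the chapter's exact definition):

```python
def calculate_centralization(degrees):
    """Freeman degree centralization: 1.0 for a perfect star network
    (one hub touches everyone), 0.0 when every node has equal degree.
    NOTE: assumed implementation -- the chapter defines its own earlier."""
    n = len(degrees)
    if n < 3:
        return 0.0
    max_d = max(degrees)
    # Sum of gaps to the most-connected node, scaled by the star-graph maximum
    return sum(max_d - d for d in degrees) / ((n - 1) * (n - 2))

print(calculate_centralization([4, 1, 1, 1, 1]))  # star of 5 -> 1.0
print(calculate_centralization([2, 2, 2, 2]))     # even ring -> 0.0
```

A star-shaped passing structure (everything through one player) scores 1.0; a perfectly even structure scores 0.0, which is why lower values above indicate less dependence on individuals.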
Temporal Network Analysis
Football networks change throughout a match. Analyzing how networks evolve over time reveals tactical shifts, the impact of substitutions, and momentum changes.
# Python: Temporal network analysis
def analyze_network_by_period(events, team, periods=[15, 30, 45, 60, 75, 90]):
"""Analyze how network metrics change over time."""
results = []
for i, end_min in enumerate(periods):
start_min = 0 if i == 0 else periods[i-1]
# Filter passes for this period
period_passes = events[
(events["type"] == "Pass") &
(events["team"] == team) &
(events["minute"] >= start_min) &
(events["minute"] < end_min) &
(events["pass_recipient"].notna())
].groupby(["player", "pass_recipient"]).size().reset_index(name="passes")
period_passes = period_passes[period_passes["passes"] >= 2]
if len(period_passes) < 5:
continue
# Build network
G = nx.DiGraph()
for _, row in period_passes.iterrows():
G.add_edge(row["player"], row["pass_recipient"],
weight=row["passes"])
results.append({
"period": f"{start_min}-{end_min}",
"density": nx.density(G),
"clustering": nx.average_clustering(G.to_undirected()),
"centralization": calculate_centralization(list(dict(G.degree()).values())),
"edges": G.number_of_edges(),
"total_passes": sum(d["weight"] for _, _, d in G.edges(data=True))
})
return pd.DataFrame(results)
# Analyze evolution
network_evolution = analyze_network_by_period(events, "Barcelona")
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(network_evolution["period"], network_evolution["density"],
"o-", label="Density", color="#1B5E20", linewidth=2)
ax.plot(network_evolution["period"], network_evolution["centralization"],
"s-", label="Centralization", color="#FF6B00", linewidth=2)
ax.set_xlabel("Period (minutes)")
ax.set_ylabel("Metric Value")
ax.set_title("Network Evolution Throughout Match")
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# R: Temporal network analysis
analyze_network_by_period <- function(events, team, periods = c(15, 30, 45, 60, 75, 90)) {
results <- map_dfr(seq_along(periods), function(i) {
start_min <- if (i == 1) 0 else periods[i-1]
end_min <- periods[i]
# Filter passes for this period
period_passes <- events %>%
filter(type.name == "Pass",
team.name == team,
minute >= start_min,
minute < end_min,
!is.na(pass.recipient.name)) %>%
group_by(player.name, pass.recipient.name) %>%
summarise(passes = n(), .groups = "drop") %>%
filter(passes >= 2)
if (nrow(period_passes) < 5) return(NULL)
# Build network
g <- graph_from_data_frame(period_passes, directed = TRUE)
E(g)$weight <- period_passes$passes
tibble(
period = paste0(start_min, "-", end_min),
density = edge_density(g),
clustering = transitivity(g, type = "global"),
centralization = centr_degree(g)$centralization,
edges = ecount(g),
total_passes = sum(E(g)$weight)
)
})
results
}
# Analyze network evolution
network_evolution <- analyze_network_by_period(events, "Barcelona")
# Plot evolution
ggplot(network_evolution, aes(x = period)) +
geom_line(aes(y = density, group = 1, color = "Density"), linewidth = 1.2) +
geom_line(aes(y = centralization, group = 1, color = "Centralization"), linewidth = 1.2) +
geom_point(aes(y = density, color = "Density"), size = 3) +
geom_point(aes(y = centralization, color = "Centralization"), size = 3) +
scale_color_manual(values = c("Density" = "#1B5E20", "Centralization" = "#FF6B00")) +
labs(title = "Network Evolution Throughout Match",
x = "Period (minutes)", y = "Metric Value",
color = "Metric") +
theme_minimal()

Comparing Team Networks
Network analysis enables objective comparison of team playing styles. We can create fingerprints based on network metrics to understand what makes each team unique.
# Python: Compare multiple teams
def compare_team_networks(events, teams):
"""Compare network metrics across multiple teams."""
results = []
for team_name in teams:
# Build network
team_passes = events[
(events["type"] == "Pass") &
(events["team"] == team_name) &
(events["pass_recipient"].notna())
].groupby(["player", "pass_recipient"]).size().reset_index(name="passes")
team_passes = team_passes[team_passes["passes"] >= 3]
G = nx.DiGraph()
for _, row in team_passes.iterrows():
G.add_edge(row["player"], row["pass_recipient"],
weight=row["passes"])
G_undirected = G.to_undirected()
results.append({
"team": team_name,
"density": nx.density(G),
"clustering": nx.average_clustering(G_undirected),
"degree_centralization": calculate_centralization(
list(dict(G.degree()).values())),
"reciprocity": nx.reciprocity(G),
"avg_path_length": nx.average_shortest_path_length(G_undirected)
if nx.is_connected(G_undirected) else np.nan,
"total_passes": sum(d["weight"] for _, _, d in G.edges(data=True)),
"unique_combinations": G.number_of_edges()
})
return pd.DataFrame(results)
# Compare teams
comparison = compare_team_networks(all_events,
["Barcelona", "Real Madrid", "Bayern Munich", "Liverpool"])
# Create heatmap comparison
from sklearn.preprocessing import MinMaxScaler
metrics = ["density", "clustering", "degree_centralization",
"reciprocity", "unique_combinations"]
scaled_data = comparison[metrics].copy()
scaler = MinMaxScaler()
scaled_data[metrics] = scaler.fit_transform(scaled_data[metrics])
scaled_data["team"] = comparison["team"]
# Plot heatmap
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(scaled_data[metrics].values, cmap="RdYlGn", aspect="auto")
ax.set_xticks(range(len(metrics)))
ax.set_xticklabels(metrics, rotation=45, ha="right")
ax.set_yticks(range(len(comparison)))
ax.set_yticklabels(comparison["team"])
plt.colorbar(im, label="Scaled Value")
ax.set_title("Team Network Comparison")
plt.tight_layout()
plt.show()

# R: Compare multiple teams
compare_team_networks <- function(events, teams) {
map_dfr(teams, function(team_name) {
# Build network
team_passes <- events %>%
filter(type.name == "Pass",
team.name == team_name,
!is.na(pass.recipient.name)) %>%
group_by(player.name, pass.recipient.name) %>%
summarise(passes = n(), .groups = "drop") %>%
filter(passes >= 3)
g <- graph_from_data_frame(team_passes, directed = TRUE)
E(g)$weight <- team_passes$passes
tibble(
team = team_name,
density = edge_density(g),
clustering = transitivity(g, type = "global"),
degree_centralization = centr_degree(g)$centralization,
betweenness_centralization = centr_betw(g)$centralization,
reciprocity = reciprocity(g),
avg_path_length = mean_distance(g, directed = FALSE),
total_passes = sum(E(g)$weight),
unique_combinations = ecount(g)
)
})
}
# Compare teams in competition
team_comparison <- compare_team_networks(all_events,
c("Barcelona", "Real Madrid",
"Bayern Munich", "Liverpool"))
# Create comparison visualization
team_comparison %>%
pivot_longer(-team, names_to = "metric", values_to = "value") %>%
group_by(metric) %>%
mutate(scaled = scales::rescale(value, to = c(0, 100))) %>%
ggplot(aes(x = metric, y = scaled, fill = team)) +
geom_col(position = "dodge") +
coord_flip() +
scale_fill_brewer(palette = "Set1") +
labs(title = "Team Network Comparison",
x = "Metric", y = "Scaled Value (0-100)") +
theme_minimal()

Practice Exercises
Hands-On Practice
Complete these exercises to master network analysis in football:
1. Using StatsBomb free data, build pass networks for both teams in a match. Compare their density and centralization metrics. Which team had a more hierarchical passing structure?
2. Calculate betweenness centrality for all players in a team. Who has the highest betweenness? Does this match your intuition about who the team's playmaker is?
3. Build separate networks for the first and second half of a match. How do the metrics change? Can you identify any tactical shifts from the network evolution?
4. Apply community detection to a team's pass network. Do the detected communities align with defensive/midfield/attacking units? Visualize the result on a pitch.
5. Build pass networks weighted by (1) pass count, (2) progressive passes, and (3) passes into the final third. Compare the centrality rankings under each weighting scheme. Which players rise or fall in importance?
6. Use the formation detection algorithm on 5 different matches for the same team. Does the detected formation match the reported lineup? How consistent is the detected formation across matches?
7. Simulate removing key players from the network (as if they were sent off). How does the network structure change? Calculate the "importance" of each player by measuring how much network metrics deteriorate when they're removed.
8. Identify common passing triangles (3-player subgraphs) in the network. Which triangular combinations are used most frequently? Are high-frequency triangles associated with better attacking outcomes?
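As a starting point for the player-removal exercise, a dependency-free sketch of the idea on a toy directed network (the player labels and pass pairs are invented):

```python
# Toy pass network: (passer, receiver) pairs; positions are illustrative
edges = {("GK", "CB"), ("CB", "CM"), ("CM", "GK"), ("CM", "LW"),
         ("CM", "ST"), ("CM", "RW"), ("LW", "ST"), ("RW", "ST"),
         ("CB", "ST")}
players = {p for pair in edges for p in pair}

def density(edge_set, nodes):
    """Directed density: observed edges over n*(n-1) possible."""
    n = len(nodes)
    return len(edge_set) / (n * (n - 1))

def removal_impact(player):
    """Drop in overall density when a player (and their passes) is removed."""
    kept = {e for e in edges if player not in e}
    return density(edges, players) - density(kept, players - {player})

impact = {p: removal_impact(p) for p in sorted(players)}
most_important = max(impact, key=impact.get)
print(most_important)  # CM -- removing the hub midfielder cuts the most links
```

The same loop works on a real networkx graph by calling G.copy() and remove_node() before recomputing whichever metric you care about; removing a peripheral player can even raise density, which is itself informative.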
Summary
Key Takeaways
- Pass networks represent football as a graph where players are nodes and passes are edges
- Centrality metrics identify important players: degree (involvement), betweenness (playmaker), PageRank (ball magnet)
- Team metrics like density and centralization reveal playing styles (possession vs direct)
- Community detection finds natural player groupings that may reveal tactical structures
- Temporal analysis tracks how networks evolve during matches
- Weighted networks incorporate pass quality, progressiveness, and danger for richer analysis
- Formation detection reveals actual team shapes from positional data
- Network comparison enables objective measurement of style differences between teams
Key Network Metrics Reference
| Metric | Level | Interpretation | Football Application |
|---|---|---|---|
| Degree | Node | Number of connections | Passing options, involvement |
| Betweenness | Node | Bridge between groups | Playmaker identification |
| PageRank | Node | Influence via connections | Expected ball recipient |
| Closeness | Node | Avg distance to all nodes | Ball circulation efficiency |
| Eigenvector | Node | Connected to important nodes | Quality of passing partners |
| Density | Network | Proportion of possible edges | Passing variety (high = tiki-taka) |
| Clustering | Network | Transitivity of connections | Triangular play frequency |
| Centralization | Network | Concentration of importance | Star player dependency |
| Reciprocity | Network | Mutual connections | Two-way passing combinations |
| Modularity | Network | Quality of community structure | Tactical unit cohesion |
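Two node metrics in the table, closeness and eigenvector centrality, were not computed in the examples above. A small networkx sketch on a toy graph (player labels illustrative) shows how to get them:

```python
import networkx as nx

# Small undirected toy pass graph; player labels are illustrative
G = nx.Graph()
G.add_edges_from([("CB", "CM"), ("CM", "LW"), ("CM", "ST"), ("LW", "ST")])

# Closeness: how few passes it takes to reach everyone (circulation efficiency)
closeness = nx.closeness_centrality(G)
# Eigenvector: being connected to well-connected teammates
eigen = nx.eigenvector_centrality(G)

print(max(closeness, key=closeness.get))  # CM reaches every teammate in one pass
```

On directed pass networks, eigenvector centrality may fail to converge; PageRank (used in the case study above) is the more robust choice there.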
Key Libraries and Tools
R Libraries
- igraph - Core network analysis
- tidygraph - Tidy interface to igraph
- ggraph - Grammar of graphics for networks
- ggsoccer - Football pitch visualization
- visNetwork - Interactive network visualization
- sna - Social network analysis
Python Libraries
- networkx - Core network analysis
- python-louvain - Community detection
- mplsoccer - Football pitch visualization
- pyvis - Interactive networks
- graph-tool - High-performance graphs
- scikit-network - Network ML
Common Pitfalls to Avoid
- Ignoring direction: Pass networks are directed - A passing to B ≠ B passing to A
- Minimum threshold selection: Too low includes noise, too high misses connections
- Sample size issues: Networks from short periods (e.g., 15 min) may be unreliable
- Ignoring game state: Networks differ when winning vs losing
- Substitution effects: Players with few minutes shouldn't be compared directly
- Over-interpreting communities: Algorithms may find spurious groupings
- Formation detection errors: Fluid formations may not cluster cleanly
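The threshold-selection pitfall is easy to demonstrate: the same match produces very different networks depending on the minimum pass count. A dependency-free sketch (the pass counts are invented):

```python
# (passer, receiver, count) tuples with invented counts
passes = [("A", "B", 12), ("B", "A", 9), ("B", "C", 4),
          ("C", "D", 2), ("D", "A", 1)]

def density_at_threshold(passes, min_count, n_players):
    """Directed density after dropping pairs below a minimum pass count."""
    kept = [p for p in passes if p[2] >= min_count]
    return len(kept) / (n_players * (n_players - 1))

for t in (1, 3, 5):
    print(f"threshold {t}: density {density_at_threshold(passes, t, 4):.3f}")
    # density falls from 0.417 to 0.167 as the threshold rises
```

Because every downstream metric is computed on the filtered network, it is worth reporting the threshold used and checking that conclusions survive a threshold of plus or minus one.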
Style Archetypes from Network Metrics
Possession style:
- High density (> 0.4)
- High clustering (> 0.5)
- Low centralization (< 0.25)
- High reciprocity (> 0.7)
- Many unique combinations
- Example: Guardiola's Barcelona
Direct style:
- Lower density (< 0.35)
- Lower clustering (< 0.45)
- Higher centralization (> 0.3)
- Lower reciprocity
- More vertical passing
- Example: Mourinho's counter-attacking teams
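These archetypes can be turned into a rough rule-based classifier. A heuristic sketch using the cutoffs listed above (they are rules of thumb, not validated boundaries, and real teams sit on a continuum):

```python
def classify_style(density, clustering, centralization):
    """Rough archetype label from the rule-of-thumb cutoffs above.
    Heuristic only -- many teams will fall into the 'mixed' bucket."""
    if density > 0.4 and clustering > 0.5 and centralization < 0.25:
        return "possession"
    if density < 0.35 and centralization > 0.3:
        return "direct"
    return "mixed"

# Barcelona's numbers from the Clásico case study
print(classify_style(0.412, 0.523, 0.234))  # possession
```

Note that Real Madrid's case-study numbers (density 0.356, centralization 0.298) land just outside both rules, a reminder that hard thresholds discard information a clustering or distance-based comparison would keep.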
Network analysis provides a mathematically rigorous framework for understanding team structure and player importance. In the next chapter, we'll explore computer vision applications in football analytics.