Capstone - Complete Analytics System
Network Analysis in Football
Football is fundamentally a network game. Every pass creates a connection between players, every team forms a dynamic network of interactions. Network analysis provides powerful tools to understand team structure, identify key players, and analyze tactical patterns.
Learning Objectives
- Understand graph theory fundamentals for football analysis
- Build and visualize pass networks from event data
- Calculate centrality metrics to identify key players
- Analyze network density and clustering coefficients
- Compare team playing styles through network metrics
- Apply community detection to find player groupings
Graph Theory Fundamentals
A graph (or network) consists of nodes (vertices) and edges (connections). In football pass networks, players are nodes and passes are edges. Understanding basic graph concepts is essential for network analysis.
- Nodes: Players on the pitch
- Edges: Passes between players
- Directed: Pass has sender/receiver
- Weighted: Number of passes
- Degree: Connections per node
- Density: Connectedness of network
- Centrality: Node importance
- Clustering: Triangular connections
- Path Length: Steps between nodes
- Communities: Subgroups in network
# Python: Setting up network analysis
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
# Create a simple pass network example
passes = pd.DataFrame({
"passer": ["GK", "CB1", "CB1", "CB2", "CM1", "CM1", "CM2", "CM2", "LW", "RW", "ST"],
"receiver": ["CB1", "CM1", "CB2", "CM2", "CM2", "LW", "CM1", "RW", "ST", "ST", "CM1"],
"count": [15, 22, 18, 20, 28, 14, 25, 16, 8, 10, 5]
})
# Create directed graph
G = nx.DiGraph()
# Add weighted edges
for _, row in passes.iterrows():
    G.add_edge(row["passer"], row["receiver"], weight=row["count"])
# Basic network properties
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.4f}")
print(f"Is connected: {nx.is_weakly_connected(G)}")

# R: Setting up network analysis
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
# Create a simple pass network example
passes <- tibble(
passer = c("GK", "CB1", "CB1", "CB2", "CM1", "CM1", "CM2", "CM2", "LW", "RW", "ST"),
receiver = c("CB1", "CM1", "CB2", "CM2", "CM2", "LW", "CM1", "RW", "ST", "ST", "CM1"),
count = c(15, 22, 18, 20, 28, 14, 25, 16, 8, 10, 5)
)
# Create igraph object
g <- graph_from_data_frame(passes, directed = TRUE)
# Add edge weights
E(g)$weight <- passes$count
# Basic network properties
cat("Nodes:", vcount(g), "\n")
cat("Edges:", ecount(g), "\n")
cat("Density:", edge_density(g), "\n")
cat("Is connected:", is_connected(g, mode = "weak"), "\n")

Output:
Nodes: 11
Edges: 11
Density: 0.1
Is connected: TRUE

Building Pass Networks from Event Data
To analyze real football networks, we need to extract passing data from event datasets and construct meaningful network representations. This involves aggregating passes between player pairs and handling substitutions.
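The code below does not handle substitutions explicitly. One common convention, sketched here with a toy events frame (column names mirror the StatsBomb-style fields used in this section; the values are illustrative), is to keep only passes made before the team's first substitution so the network describes a single stable eleven:

```python
import pandas as pd

def passes_before_first_sub(events, team):
    """Keep only the team's passes made before its first substitution,
    so the network describes one stable eleven (a common convention)."""
    team_events = events[events["team"] == team]
    subs = team_events[team_events["type"] == "Substitution"]
    cutoff = subs["minute"].min() if not subs.empty else float("inf")
    return team_events[(team_events["type"] == "Pass") &
                       (team_events["minute"] < cutoff)]

# Toy frame: two passes before the 60th-minute substitution, one after
events = pd.DataFrame({
    "team": ["Barcelona"] * 4,
    "type": ["Pass", "Pass", "Substitution", "Pass"],
    "minute": [10, 55, 60, 75],
})
kept = passes_before_first_sub(events, "Barcelona")
print(len(kept))  # 2 (the 75th-minute pass is dropped)
```

An alternative is to build one network per formation period; the pre-substitution filter is simply the least ambiguous starting point.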
# Python: Build pass network from StatsBomb data
from statsbombpy import sb
import networkx as nx
import numpy as np
# Load match events
events = sb.events(match_id=3788741)
# Extract passes for one team
team_passes = events[
(events["type"] == "Pass") &
(events["team"] == "Barcelona") &
(events["pass_recipient"].notna())
][["player", "pass_recipient", "location", "pass_end_location"]].copy()
# Parse locations
team_passes["start_x"] = team_passes["location"].apply(lambda x: x[0])
team_passes["start_y"] = team_passes["location"].apply(lambda x: x[1])
team_passes["end_x"] = team_passes["pass_end_location"].apply(lambda x: x[0])
team_passes["end_y"] = team_passes["pass_end_location"].apply(lambda x: x[1])
# Aggregate passes between player pairs
pass_matrix = team_passes.groupby(["player", "pass_recipient"]).agg(
passes=("player", "count"),
avg_length=("start_x", lambda x: np.sqrt(
(team_passes.loc[x.index, "end_x"] - team_passes.loc[x.index, "start_x"])**2 +
(team_passes.loc[x.index, "end_y"] - team_passes.loc[x.index, "start_y"])**2
).mean())
).reset_index()
# Filter minimum passes
pass_matrix = pass_matrix[pass_matrix["passes"] >= 3]
# Calculate average positions
player_positions = team_passes.groupby("player").agg(
x=("start_x", "mean"),
y=("start_y", "mean"),
total_passes=("player", "count")
).reset_index()
# Create network
G = nx.DiGraph()
# Add nodes with positions
for _, row in player_positions.iterrows():
    G.add_node(row["player"], pos=(row["x"], row["y"]),
               passes=row["total_passes"])
# Add edges with weights
for _, row in pass_matrix.iterrows():
    G.add_edge(row["player"], row["pass_recipient"],
               weight=row["passes"])
print(f"Network: {G.number_of_nodes()} players, {G.number_of_edges()} connections")

# R: Build pass network from StatsBomb data
library(StatsBombR)
# Load match data
Matches <- FreeMatches(FreeCompetitions()) # StatsBombR helpers for the free data
events <- StatsBombFreeEvents(MatchesDF = Matches) %>%
  filter(match_id == 3788741) # Example match
# Extract passes for one team
team_passes <- events %>%
filter(type.name == "Pass",
team.name == "Barcelona",
!is.na(pass.recipient.name)) %>%
select(player.name, pass.recipient.name,
location.x, location.y,
pass.end_location.x, pass.end_location.y)
# Aggregate passes between player pairs
pass_matrix <- team_passes %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(
passes = n(),
avg_length = mean(sqrt((pass.end_location.x - location.x)^2 +
(pass.end_location.y - location.y)^2)),
.groups = "drop"
) %>%
filter(passes >= 3) # Minimum threshold
# Calculate average positions
player_positions <- team_passes %>%
group_by(player = player.name) %>%
summarise(
x = mean(location.x),
y = mean(location.y),
total_passes = n()
)
# Create network with positions
g <- graph_from_data_frame(pass_matrix, directed = TRUE,
vertices = player_positions)
E(g)$weight <- pass_matrix$passes
print(g)

Output:
Network: 14 players, 42 connections

Visualizing Pass Networks
Effective visualization is crucial for communicating network insights. We use node size to represent involvement, edge thickness for pass frequency, and spatial positions to show team shape.
# Python: Visualize pass network on pitch
from mplsoccer import Pitch
import matplotlib.pyplot as plt
import numpy as np
# Create pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a",
line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# Get positions
pos = nx.get_node_attributes(G, "pos")
# Calculate degree for node sizing
degrees = dict(G.degree())
max_degree = max(degrees.values())
# Draw edges
for (u, v, d) in G.edges(data=True):
    if u in pos and v in pos:
        x1, y1 = pos[u]
        x2, y2 = pos[v]
        # Line width based on passes
        width = d["weight"] / 10
        ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
                    arrowprops=dict(arrowstyle="->",
                                    color="white",
                                    alpha=0.6,
                                    linewidth=width,
                                    connectionstyle="arc3,rad=0.1"))
# Draw nodes
for node in G.nodes():
    if node in pos:
        x, y = pos[node]
        size = 200 + (degrees[node] / max_degree) * 800
        ax.scatter(x, y, s=size, c="#a50044", edgecolors="white",
                   linewidths=2, zorder=5)
        ax.annotate(node.split()[-1], (x, y + 3),
                    color="white", ha="center", fontsize=8,
                    fontweight="bold")
ax.set_title("Barcelona Pass Network", color="white", fontsize=14)
plt.tight_layout()
plt.savefig("pass_network.png", dpi=150, facecolor="#1a472a")
plt.show()

# R: Visualize pass network on pitch
library(ggplot2)
library(ggsoccer)
# Convert to tidygraph for ggraph
tg <- as_tbl_graph(g) %>%
activate(nodes) %>%
mutate(
degree = centrality_degree(mode = "all"),
betweenness = centrality_betweenness()
)
# Get node data for plotting
node_data <- tg %>%
activate(nodes) %>%
as_tibble()
# Get edge data (tidygraph stores from/to as node indices, not names,
# so map them to names before joining on coordinates)
edge_data <- tg %>%
  activate(edges) %>%
  as_tibble() %>%
  mutate(from = node_data$name[from], to = node_data$name[to]) %>%
  left_join(node_data %>% select(name, x, y) %>% rename(from_x = x, from_y = y),
            by = c("from" = "name")) %>%
  left_join(node_data %>% select(name, x, y) %>% rename(to_x = x, to_y = y),
            by = c("to" = "name"))
# Create pitch visualization
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#1a472a") +
# Draw edges (passes)
geom_segment(data = edge_data,
aes(x = from_x, y = from_y,
xend = to_x, yend = to_y,
linewidth = weight),
alpha = 0.6, color = "white",
arrow = arrow(length = unit(0.15, "cm"))) +
# Draw nodes (players)
geom_point(data = node_data,
aes(x = x, y = y, size = degree),
color = "#a50044", fill = "#a50044",
shape = 21, stroke = 2) +
# Add labels
geom_text(data = node_data,
aes(x = x, y = y + 3, label = name),
color = "white", size = 3, fontface = "bold") +
scale_linewidth_continuous(range = c(0.5, 3)) +
scale_size_continuous(range = c(4, 12)) +
coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
theme_pitch() +
theme(legend.position = "none") +
labs(title = "Barcelona Pass Network",
subtitle = "Node size = degree centrality, Edge width = passes")

Centrality Metrics
Centrality metrics quantify the importance of nodes within a network. Different centrality measures capture different aspects of importance - who touches the ball most, who connects different parts of the team, and who controls the flow.
| Metric | Measures | Football Interpretation |
|---|---|---|
| Degree | Number of connections | Passing options / involvement |
| Betweenness | Bridge between groups | Playmaker connecting lines |
| Closeness | Average distance to all nodes | Ball circulation efficiency |
| Eigenvector | Connected to important nodes | Quality of passing partners |
| PageRank | Influence via connections | Expected ball reception |
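A minimal illustration of why the measures differ: in a five-player chain, three players tie on degree, but betweenness singles out the one sitting on the most shortest paths. (The position labels are illustrative, not from the match data.)

```python
import networkx as nx

# Degree can tie where betweenness does not: CB, CM and AM all have
# degree 2 in this chain, but CM lies on the most shortest paths.
chain = nx.path_graph(["GK", "CB", "CM", "AM", "ST"])
degrees = dict(chain.degree())
betweenness = nx.betweenness_centrality(chain)
print(degrees)                               # CB, CM, AM all have degree 2
print(max(betweenness, key=betweenness.get)) # CM
```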
# Python: Calculate all centrality metrics
import pandas as pd
# Calculate centrality measures
centrality = pd.DataFrame({
"player": list(G.nodes()),
# Degree centrality
"degree_in": [G.in_degree(n) for n in G.nodes()],
"degree_out": [G.out_degree(n) for n in G.nodes()],
"degree_total": [G.degree(n) for n in G.nodes()],
# Betweenness - who bridges groups
"betweenness": list(nx.betweenness_centrality(G, normalized=True).values()),
# Closeness - how quickly can reach others
"closeness": list(nx.closeness_centrality(G).values()),
# Eigenvector - connected to important players
"eigenvector": list(nx.eigenvector_centrality(G.to_undirected(),
max_iter=1000).values()),
# PageRank - influence measure
"pagerank": list(nx.pagerank(G, alpha=0.85).values())
})
# Top players by different metrics
print("Top 5 by Betweenness (Playmakers):")
print(centrality.nlargest(5, "betweenness")[["player", "betweenness", "degree_total"]])
print("\nTop 5 by PageRank (Ball Magnets):")
print(centrality.nlargest(5, "pagerank")[["player", "pagerank", "degree_in"]])

# R: Calculate all centrality metrics
library(tidygraph)
# Calculate centrality measures
centrality_df <- tg %>%
activate(nodes) %>%
mutate(
# Degree centrality
degree_in = centrality_degree(mode = "in"),
degree_out = centrality_degree(mode = "out"),
degree_total = centrality_degree(mode = "all"),
# Betweenness - who bridges groups
betweenness = centrality_betweenness(directed = TRUE, normalized = TRUE),
# Closeness - how quickly can reach others
closeness = centrality_closeness(mode = "all"),
# Eigenvector - connected to important players
eigenvector = centrality_eigen(directed = FALSE),
# PageRank - influence measure
pagerank = centrality_pagerank(directed = TRUE)
) %>%
as_tibble() %>%
arrange(desc(betweenness))
# Display top players by different metrics
cat("Top 5 by Betweenness (Playmakers):\n")
centrality_df %>%
select(name, betweenness, degree_total) %>%
head(5) %>%
print()
cat("\nTop 5 by PageRank (Ball Magnets):\n")
centrality_df %>%
arrange(desc(pagerank)) %>%
select(name, pagerank, degree_in) %>%
head(5) %>%
  print()

Output:
Top 5 by Betweenness (Playmakers):
            player  betweenness  degree_total
0  Sergio Busquets        0.284            18
1     Lionel Messi        0.198            15
2       Jordi Alba        0.156            12

Top 5 by PageRank (Ball Magnets):
            player  pagerank  degree_in
0     Lionel Messi     0.142         12
1  Sergio Busquets     0.128         14
2     Gerard Pique     0.098          8

Centrality Radar Charts
# Python: Create centrality radar chart
import matplotlib.pyplot as plt
import numpy as np
from math import pi
# Prepare data
metrics = ["degree_total", "betweenness", "closeness", "eigenvector", "pagerank"]
top_players = centrality.nlargest(5, "betweenness").copy()
# Normalize to 0-100
for col in metrics:
    top_players[col] = (top_players[col] - top_players[col].min()) / \
                       (top_players[col].max() - top_players[col].min()) * 100
# Create radar chart
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))
# Number of variables
N = len(metrics)
angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]
# Colors
colors = ["#a50044", "#004d98", "#edbb00", "#00a19c", "#8b0000"]
for idx, (_, row) in enumerate(top_players.iterrows()):
    values = row[metrics].tolist()
    values += values[:1]
    ax.plot(angles, values, "o-", linewidth=2,
            label=row["player"], color=colors[idx])
    ax.fill(angles, values, alpha=0.25, color=colors[idx])
# Add labels
ax.set_xticks(angles[:-1])
ax.set_xticklabels(["Degree", "Betweenness", "Closeness",
                    "Eigenvector", "PageRank"])
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
plt.title("Player Centrality Profiles", size=14, y=1.1)
plt.tight_layout()
plt.show()

# R: Create centrality radar chart
library(fmsb)
# Prepare data for radar chart
radar_data <- centrality_df %>%
select(name, degree_total, betweenness, closeness, eigenvector, pagerank) %>%
mutate(across(-name, ~ scales::rescale(., to = c(0, 100)))) %>%
head(5)
# Format for fmsb
radar_matrix <- radar_data %>%
column_to_rownames("name") %>%
as.data.frame()
# Add max and min rows
radar_matrix <- rbind(
rep(100, 5), # Max
rep(0, 5), # Min
radar_matrix
)
# Create radar chart
colors <- c("#a50044", "#004d98", "#edbb00", "#00a19c", "#8b0000")
radarchart(radar_matrix,
axistype = 1,
pcol = colors,
pfcol = scales::alpha(colors, 0.3),
plwd = 2,
cglcol = "grey",
cglty = 1,
axislabcol = "grey",
vlcex = 0.8,
title = "Player Centrality Profiles")

Team-Level Network Metrics
Beyond individual player metrics, we can characterize entire team networks. These metrics reveal playing style - is the team hierarchical or egalitarian? How connected are the players?
# Python: Calculate team network metrics
def calculate_team_network_metrics(G):
    """Calculate comprehensive network metrics for a team."""
    # Convert to undirected for some metrics
    G_undirected = G.to_undirected()
    metrics = {
        # Basic properties
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        # Density
        "density": nx.density(G),
        # Clustering coefficient
        "clustering": nx.average_clustering(G_undirected),
        # Average path length (if connected)
        "avg_path_length": nx.average_shortest_path_length(G_undirected)
                           if nx.is_connected(G_undirected) else None,
        # Diameter
        "diameter": nx.diameter(G_undirected)
                    if nx.is_connected(G_undirected) else None,
        # Centralization (degree)
        "degree_centralization": calculate_centralization(
            list(dict(G.degree()).values())
        ),
        # Reciprocity
        "reciprocity": nx.reciprocity(G),
        # Assortativity
        "assortativity": nx.degree_assortativity_coefficient(G)
    }
    return metrics

def calculate_centralization(values):
    """Calculate Freeman centralization index."""
    max_val = max(values)
    n = len(values)
    numerator = sum(max_val - v for v in values)
    denominator = (n - 1) * (n - 2)
    return numerator / denominator if denominator > 0 else 0

# Calculate metrics
team_metrics = calculate_team_network_metrics(G)
print("Team Network Metrics:")
for metric, value in team_metrics.items():
    if value is not None:
        print(f"  {metric}: {value:.3f}" if isinstance(value, float)
              else f"  {metric}: {value}")

# R: Calculate team network metrics
calculate_team_network_metrics <- function(g) {
tibble(
# Basic properties
nodes = vcount(g),
edges = ecount(g),
# Density - proportion of possible edges that exist
density = edge_density(g),
# Clustering coefficient - transitivity
clustering = transitivity(g, type = "global"),
# Average path length
avg_path_length = mean_distance(g, directed = FALSE),
# Diameter - longest shortest path
diameter = diameter(g, directed = FALSE),
# Centralization - how concentrated is importance
degree_centralization = centr_degree(g, mode = "all")$centralization,
betweenness_centralization = centr_betw(g, directed = TRUE)$centralization,
# Reciprocity - proportion of mutual connections
reciprocity = reciprocity(g),
# Assortativity - do similar connect to similar
assortativity = assortativity_degree(g, directed = FALSE)
)
}
# Calculate for our team
team_metrics <- calculate_team_network_metrics(g)
# Display
team_metrics %>%
pivot_longer(everything(), names_to = "metric", values_to = "value") %>%
mutate(value = round(value, 3)) %>%
  print(n = 15)

Output:
Team Network Metrics:
  nodes: 11
  edges: 42
  density: 0.382
  clustering: 0.456
  avg_path_length: 1.764
  diameter: 3
  degree_centralization: 0.287
  reciprocity: 0.714
  assortativity: -0.156

Interpreting Network Metrics
High density, low centralization
Style: Possession-based, tiki-taka
- Ball circulates freely
- No single dependency
- Many passing triangles
- Example: Guardiola's Barcelona

Low density, high centralization
Style: Direct, star-dependent
- Play through key player
- Fewer passing combinations
- More predictable
- Example: Counter-attacking teams
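These style descriptions can be turned into a simple decision rule. The sketch below is an illustrative heuristic only: the cutoff values are assumptions chosen for this example, not established benchmarks, and a real analysis would calibrate them against a league-wide distribution.

```python
def classify_style(density, degree_centralization,
                   dense=0.35, concentrated=0.30):
    """Illustrative heuristic: dense, flat networks suggest possession
    play; sparse, concentrated networks suggest star-dependent play.
    The thresholds are assumptions, not established benchmarks."""
    if density >= dense and degree_centralization < concentrated:
        return "Possession-based, tiki-taka"
    if density < dense and degree_centralization >= concentrated:
        return "Direct, star-dependent"
    return "Mixed"

print(classify_style(0.45, 0.15))  # Possession-based, tiki-taka
print(classify_style(0.20, 0.40))  # Direct, star-dependent
```

Applied to the metrics above (density 0.382, degree centralization 0.287), these particular cutoffs would return "Mixed", which is a useful reminder that most teams sit between the two archetypes.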
Community Detection
Community detection algorithms identify subgroups of players who pass more frequently among themselves. This reveals tactical structures like defensive units, midfield partnerships, and attacking combinations.
# Python: Community detection in pass networks
import community as community_louvain
from networkx.algorithms import community as nx_community
# Convert to undirected for community detection
G_undirected = G.to_undirected()
# Louvain community detection
partition = community_louvain.best_partition(G_undirected)
# Add community membership to nodes
for node, comm in partition.items():
    G.nodes[node]["community"] = comm
# Analyze communities
community_df = pd.DataFrame([
{"player": node, "community": data["community"],
"degree": G.degree(node)}
for node, data in G.nodes(data=True)
]).sort_values(["community", "degree"], ascending=[True, False])
# Summary by community
community_summary = community_df.groupby("community").agg(
players=("player", "count"),
members=("player", lambda x: ", ".join(x)),
avg_degree=("degree", "mean")
).reset_index()
print("Community Structure:")
print(community_summary)
# Modularity score
modularity = community_louvain.modularity(partition, G_undirected)
print(f"\nModularity: {modularity:.3f}")
# Try Girvan-Newman algorithm
# Girvan-Newman is hierarchical; take just the first split rather than
# materialising every level of the dendrogram
gn_first = next(nx_community.girvan_newman(G_undirected))
print(f"\nGirvan-Newman found {len(gn_first)} communities at first level")

# R: Community detection in pass networks
library(igraph)
# Louvain community detection (most common)
communities_louvain <- cluster_louvain(as.undirected(g))
# Add community membership to network
V(g)$community <- membership(communities_louvain)
# Analyze communities
community_df <- tibble(
player = V(g)$name,
community = V(g)$community,
degree = degree(g, mode = "all")
) %>%
arrange(community, desc(degree))
# Summary by community
community_summary <- community_df %>%
group_by(community) %>%
summarise(
players = n(),
members = paste(player, collapse = ", "),
avg_degree = mean(degree)
)
print(community_summary)
# Modularity score (quality of community structure)
cat("\nModularity:", modularity(communities_louvain), "\n")
# Try different algorithms
communities_walktrap <- cluster_walktrap(as.undirected(g))
communities_infomap <- cluster_infomap(g)
cat("\nAlgorithm comparison:\n")
cat("Louvain communities:", length(communities_louvain), "\n")
cat("Walktrap communities:", length(communities_walktrap), "\n")
cat("Infomap communities:", length(communities_infomap), "\n")

Output:
Community Structure:
   community  players           members  avg_degree
0          0        4  GK, CB1, CB2, LB         8.5
1          1        3     CM1, CM2, CDM        12.3
2          2        3        LW, RW, ST         9.7

Modularity: 0.384

Visualizing Communities
# Python: Visualize communities on pitch
from mplsoccer import Pitch
pitch = Pitch(pitch_type="statsbomb", pitch_color="#1a472a",
line_color="white")
fig, ax = pitch.draw(figsize=(12, 8))
# Community colors
colors = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]
pos = nx.get_node_attributes(G, "pos")
# Draw edges
for (u, v, d) in G.edges(data=True):
    if u in pos and v in pos:
        x1, y1 = pos[u]
        x2, y2 = pos[v]
        ax.plot([x1, x2], [y1, y2], "w-", alpha=0.3, linewidth=0.5)
# Draw nodes colored by community
for node in G.nodes():
    if node in pos:
        x, y = pos[node]
        comm = G.nodes[node].get("community", 0)
        ax.scatter(x, y, s=400, c=colors[comm % len(colors)],
                   edgecolors="white", linewidths=2, zorder=5)
        ax.annotate(node.split()[-1], (x, y + 3),
                    color="white", ha="center", fontsize=8)
# Add legend
for i in range(max(partition.values()) + 1):
    ax.scatter([], [], c=colors[i % len(colors)], s=100, label=f"Community {i+1}")
ax.legend(loc="upper left")
ax.set_title("Pass Network Communities", color="white", fontsize=14)
plt.tight_layout()
plt.show()

# R: Visualize communities on pitch
community_colors <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
# Create visualization
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb, colour = "white", fill = "#1a472a") +
# Draw edges
geom_segment(data = edge_data,
aes(x = from_x, y = from_y,
xend = to_x, yend = to_y),
alpha = 0.3, color = "white") +
# Draw nodes colored by community
geom_point(data = node_data %>%
left_join(community_df, by = c("name" = "player")),
aes(x = x, y = y, fill = factor(community)),
size = 8, shape = 21, stroke = 2, color = "white") +
scale_fill_manual(values = community_colors,
name = "Community") +
coord_flip(xlim = c(0, 120), ylim = c(0, 80)) +
theme_pitch() +
labs(title = "Pass Network Communities",
subtitle = "Colors indicate detected player groupings")

Weighted Network Analysis
Pass count alone doesn't tell the full story. Weighting edges by pass quality, progressiveness, or danger created provides deeper insights into network effectiveness.
# Python: Weighted Network with Pass Quality
import numpy as np
import pandas as pd
import networkx as nx
def create_weighted_network(events, team_name):
    """Create networks with different weighting schemes."""
    passes = events[
        (events["type"] == "Pass") &
        (events["team"] == team_name) &
        (events["pass_recipient"].notna())
    ].copy()
    # Parse locations
    passes["start_x"] = passes["location"].apply(lambda x: x[0] if x else 0)
    passes["end_x"] = passes["pass_end_location"].apply(lambda x: x[0] if x else 0)
    passes["end_y"] = passes["pass_end_location"].apply(lambda x: x[1] if x else 0)
    # Calculate quality metrics
    passes["progressive"] = (passes["end_x"] - passes["start_x"] > 10) & (passes["end_x"] > 40)
    passes["final_third"] = passes["end_x"] >= 80
    passes["box_entry"] = (passes["end_x"] >= 102) & (passes["end_y"] >= 18) & (passes["end_y"] <= 62)
    passes["successful"] = passes["pass_outcome"].isna() | (passes["pass_outcome"] == "Complete")

    # Quality score
    def calc_quality(row):
        if row["box_entry"] and row["successful"]:
            return 3
        if row["final_third"] and row["successful"]:
            return 2
        if row["progressive"] and row["successful"]:
            return 1.5
        if row["successful"]:
            return 1
        return 0

    passes["quality"] = passes.apply(calc_quality, axis=1)
    # Aggregate
    pass_matrix = passes.groupby(["player", "pass_recipient"]).agg(
        count=("player", "count"),
        total_quality=("quality", "sum"),
        avg_quality=("quality", "mean"),
        progressive_pct=("progressive", "mean"),
        final_third_pct=("final_third", "mean"),
        success_rate=("successful", "mean")
    ).reset_index()
    pass_matrix = pass_matrix[pass_matrix["count"] >= 3]
    # Create networks
    G_count = nx.DiGraph()
    G_quality = nx.DiGraph()
    for _, row in pass_matrix.iterrows():
        G_count.add_edge(row["player"], row["pass_recipient"], weight=row["count"])
        G_quality.add_edge(row["player"], row["pass_recipient"], weight=row["total_quality"])
    return {"count": G_count, "quality": G_quality, "data": pass_matrix}

# Create weighted networks
weighted = create_weighted_network(events, "Barcelona")

# Compare centrality: node strength = sum of incident edge weights
def weighted_degree(G):
    return {n: sum(d["weight"] for _, _, d in G.edges(n, data=True)) for n in G.nodes()}

strength_count = weighted_degree(weighted["count"])
strength_quality = weighted_degree(weighted["quality"])
pagerank_count = nx.pagerank(weighted["count"], weight="weight")
pagerank_quality = nx.pagerank(weighted["quality"], weight="weight")
comparison = pd.DataFrame({
    "player": list(strength_count.keys()),
    "strength_count": list(strength_count.values()),
    "strength_quality": list(strength_quality.values()),
    "pagerank_count": [pagerank_count.get(p, 0) for p in strength_count.keys()],
    "pagerank_quality": [pagerank_quality.get(p, 0) for p in strength_count.keys()]
})
comparison["quality_boost"] = (comparison["strength_quality"] / comparison["strength_count"]) - 1
comparison = comparison.sort_values("quality_boost", ascending=False)
print("Players with highest quality boost:")
print(comparison.head())

# R: Weighted Network with Pass Quality
library(tidyverse)
library(igraph)
create_weighted_network <- function(events, team_name) {
# Extract passes with quality metrics
passes <- events %>%
filter(type.name == "Pass",
team.name == team_name,
!is.na(pass.recipient.name)) %>%
mutate(
# Calculate progressiveness
start_x = location.x,
end_x = pass.end_location.x,
progressive = end_x - start_x > 10 & end_x > 40,
# Calculate danger zone entry
final_third = end_x >= 80,
box_entry = end_x >= 102 & pass.end_location.y >= 18 &
pass.end_location.y <= 62,
# Pass success
successful = is.na(pass.outcome.name) | pass.outcome.name == "Complete",
# Pass quality score
quality = case_when(
box_entry & successful ~ 3,
final_third & successful ~ 2,
progressive & successful ~ 1.5,
successful ~ 1,
TRUE ~ 0
)
)
# Aggregate with different weighting schemes
pass_matrix <- passes %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(
count = n(),
total_quality = sum(quality),
avg_quality = mean(quality),
progressive_pct = mean(progressive),
final_third_pct = mean(final_third),
success_rate = mean(successful),
.groups = "drop"
) %>%
filter(count >= 3)
# Create weighted networks
networks <- list()
# Count-weighted
g_count <- graph_from_data_frame(pass_matrix, directed = TRUE)
E(g_count)$weight <- pass_matrix$count
networks$count <- g_count
# Quality-weighted
g_quality <- graph_from_data_frame(pass_matrix, directed = TRUE)
E(g_quality)$weight <- pass_matrix$total_quality
networks$quality <- g_quality
return(list(networks = networks, data = pass_matrix))
}
# Create weighted networks
weighted <- create_weighted_network(events, "Barcelona")
# Compare centrality between count and quality weighting
centrality_comparison <- tibble(
player = V(weighted$networks$count)$name,
strength_count = strength(weighted$networks$count, mode = "all"),
strength_quality = strength(weighted$networks$quality, mode = "all"),
pagerank_count = page_rank(weighted$networks$count)$vector,
pagerank_quality = page_rank(weighted$networks$quality)$vector
) %>%
mutate(
quality_boost = (strength_quality / strength_count) - 1,
rank_change = rank(-pagerank_quality) - rank(-pagerank_count)
) %>%
arrange(desc(quality_boost))
cat("Players with highest quality boost:\n")
print(head(centrality_comparison, 5))

Output:
Players with highest quality boost:
            player  strength_count  strength_quality  quality_boost
0     Lionel Messi             156             312.5          1.004
1      Luis Suárez              98             189.2          0.931
2       Jordi Alba              87             154.8          0.779
3         Coutinho              72             123.6          0.717
4  Ousmane Dembélé              45              76.5          0.700

Progressive Pass Networks
Focusing only on progressive passes reveals which players drive the team forward and who receives in dangerous positions.
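The progressive-pass rule used in this chapter can be isolated into a small predicate (StatsBomb 120x80 coordinates): the ball must gain at least 10 units towards goal and end past x=40, or the pass must end at the box line (x >= 102).

```python
def is_progressive(start_x, end_x, gain=10, attacking_x=40, box_x=102):
    """Progressive-pass rule from this chapter (StatsBomb 120x80 pitch):
    gain at least `gain` units towards goal ending past x=40, or any
    pass ending inside the box (x >= 102)."""
    return (end_x - start_x >= gain and end_x >= attacking_x) or end_x >= box_x

print(is_progressive(30, 45))    # True: +15 towards goal, ends past x=40
print(is_progressive(50, 55))    # False: only +5
print(is_progressive(98, 103))   # True: ends in the box
```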
# Python: Progressive Pass Network
def build_progressive_network(events, team_name):
    """Build network using only progressive passes."""
    passes = events[
        (events["type"] == "Pass") &
        (events["team"] == team_name) &
        (events["pass_recipient"].notna()) &
        (events["pass_outcome"].isna())  # Successful only
    ].copy()
    passes["start_x"] = passes["location"].apply(lambda x: x[0] if x else 0)
    passes["end_x"] = passes["pass_end_location"].apply(lambda x: x[0] if x else 0)
    # Define progressive
    passes["progressive"] = (
        ((passes["end_x"] - passes["start_x"]) >= 10) & (passes["end_x"] >= 40)
    ) | (passes["end_x"] >= 102)
    # Filter progressive only
    prog_passes = passes[passes["progressive"]].groupby(
        ["player", "pass_recipient"]
    ).size().reset_index(name="progressive_passes")
    # Build network
    G = nx.DiGraph()
    for _, row in prog_passes.iterrows():
        G.add_edge(row["player"], row["pass_recipient"],
                   weight=row["progressive_passes"])
    # Calculate role metrics
    in_strength = dict(G.in_degree(weight="weight"))
    out_strength = dict(G.out_degree(weight="weight"))
    player_roles = pd.DataFrame({
        "player": list(G.nodes()),
        "receives_progressive": [in_strength.get(n, 0) for n in G.nodes()],
        "makes_progressive": [out_strength.get(n, 0) for n in G.nodes()]
    })
    player_roles["progressive_balance"] = (
        player_roles["receives_progressive"] - player_roles["makes_progressive"]
    )

    def assign_role(balance):
        if balance > 3:
            return "Progressive Receiver"
        elif balance < -3:
            return "Progressive Passer"
        return "Balanced"

    player_roles["role"] = player_roles["progressive_balance"].apply(assign_role)
    return G, player_roles.sort_values("receives_progressive", ascending=False)

G_prog, roles = build_progressive_network(events, "Barcelona")
print(roles)

# R: Progressive Pass Network
build_progressive_network <- function(events, team_name) {
progressive_passes <- events %>%
filter(type.name == "Pass",
team.name == team_name,
!is.na(pass.recipient.name)) %>%
mutate(
start_x = location.x,
end_x = pass.end_location.x,
# Progressive: moves ball at least 10m towards goal in final 60%
progressive = (end_x - start_x >= 10 & end_x >= 40) |
(end_x >= 102) # Any pass into box is progressive
) %>%
filter(progressive, is.na(pass.outcome.name)) %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(progressive_passes = n(), .groups = "drop")
g <- graph_from_data_frame(progressive_passes, directed = TRUE)
E(g)$weight <- progressive_passes$progressive_passes
# Key metrics for progressive network
in_progressive <- strength(g, mode = "in")
out_progressive <- strength(g, mode = "out")
player_roles <- tibble(
player = V(g)$name,
receives_progressive = as.numeric(in_progressive),
makes_progressive = as.numeric(out_progressive),
progressive_balance = receives_progressive - makes_progressive
) %>%
mutate(
role = case_when(
progressive_balance > 3 ~ "Progressive Receiver",
progressive_balance < -3 ~ "Progressive Passer",
TRUE ~ "Balanced"
)
) %>%
arrange(desc(receives_progressive))
return(list(network = g, roles = player_roles))
}
prog_network <- build_progressive_network(events, "Barcelona")
print(prog_network$roles)

Output:
            player  receives_progressive  makes_progressive  progressive_balance                  role
0     Lionel Messi                    32                 18                   14  Progressive Receiver
1      Luis Suárez                    24                  8                   16  Progressive Receiver
2  Ousmane Dembélé                    18                  6                   12  Progressive Receiver
3       Jordi Alba                    14                 22                   -8    Progressive Passer
4  Sergio Busquets                     8                 28                  -20    Progressive Passer

Formation Detection from Networks
By analyzing player positions within pass networks, we can detect formations and understand how teams actually shape up during play.
# Python: Formation Detection
from sklearn.cluster import KMeans
import numpy as np
def detect_formation(events, team_name):
"""Detect formation from average positions."""
# Get average positions
player_data = events[
(events["team"] == team_name) &
(events["location"].notna())
].copy()
player_data["x"] = player_data["location"].apply(lambda p: p[0] if p else 0)
player_data["y"] = player_data["location"].apply(lambda p: p[1] if p else 0)
positions = player_data.groupby("player").agg(
avg_x=("x", "mean"),
avg_y=("y", "mean"),
touches=("player", "count")
).reset_index()
positions = positions[positions["touches"] >= 20]
# Exclude goalkeeper
outfield = positions[positions["avg_x"] > 25].copy()
# Find optimal number of lines using elbow method
def find_best_lines(x_positions):
wss = []
for k in range(2, 6):
km = KMeans(n_clusters=k, random_state=42, n_init=25)
km.fit(x_positions.reshape(-1, 1))
wss.append(km.inertia_)
# Simple elbow detection
diffs = np.diff(wss)
best_k = np.argmin(diffs) + 2
return max(3, min(4, best_k)) # Clamp to reasonable values
n_lines = find_best_lines(outfield["avg_x"].values)
# Cluster into lines
km = KMeans(n_clusters=n_lines, random_state=42, n_init=25)
outfield["line"] = km.fit_predict(outfield[["avg_x"]])
# Reorder from back to front
line_order = outfield.groupby("line")["avg_x"].mean().sort_values()
line_mapping = {old: new for new, old in enumerate(line_order.index, 1)}
outfield["line"] = outfield["line"].map(line_mapping)
# Count per line
formation = outfield.groupby("line").size().sort_index().tolist()
formation_string = "-".join(map(str, formation))
return {
"positions": outfield.sort_values(["line", "avg_y"]),
"formation": formation_string,
"n_lines": n_lines
}
result = detect_formation(events, "Barcelona")
print(f"Detected formation: {result['formation']}")
print(result["positions"])

# R: Formation Detection
library(cluster)
detect_formation <- function(events, team_name) {
# Get average positions for all players
player_positions <- events %>%
filter(team.name == team_name,
!is.na(location.x)) %>%
group_by(player.name) %>%
summarise(
avg_x = mean(location.x),
avg_y = mean(location.y),
touches = n(),
.groups = "drop"
) %>%
filter(touches >= 20) # Filter out substitutes with few touches
# Identify lines using k-means on x-position
# Try different numbers of lines
find_best_lines <- function(positions) {
# Exclude goalkeeper
outfield <- positions %>% filter(avg_x > 25)
wss <- sapply(2:5, function(k) {
kmeans(outfield$avg_x, centers = k, nstart = 25)$tot.withinss
})
# Use elbow method or default to 3 lines
best_k <- which.min(diff(wss)) + 2
return(best_k)
}
n_lines <- find_best_lines(player_positions)
outfield <- player_positions %>% filter(avg_x > 25)
# Cluster into lines
km <- kmeans(outfield$avg_x, centers = n_lines, nstart = 25)
outfield$line <- km$cluster
# Reorder lines from back to front
line_order <- outfield %>%
group_by(line) %>%
summarise(avg_pos = mean(avg_x)) %>%
arrange(avg_pos) %>%
mutate(new_line = row_number())
outfield <- outfield %>%
left_join(line_order %>% select(line, new_line), by = "line") %>%
mutate(line = new_line) %>%
select(-new_line)
# Count players per line
formation <- outfield %>%
count(line) %>%
arrange(line) %>%
pull(n)
formation_string <- paste(formation, collapse = "-")
return(list(
positions = outfield,
formation = formation_string,
n_lines = n_lines
))
}
formation_result <- detect_formation(events, "Barcelona")
cat("Detected formation:", formation_result$formation, "\n")
print(formation_result$positions %>% arrange(line, avg_y))

Detected formation: 4-3-3

  player            avg_x  avg_y  touches  line
  Gerard Piqué      35.2   32.5   124      1
  Samuel Umtiti     34.8   48.2   118      1
  Jordi Alba        42.3   12.4   156      1
  Sergi Roberto     41.8   68.2   142      1
  Sergio Busquets   52.4   38.6   234      2
  Ivan Rakitic      58.2   52.3   189      2
  Arthur            55.8   28.4   167      2
  Ousmane Dembélé   78.4   72.1   89       3
  Lionel Messi      82.6   54.2   198      3
  Luis Suárez       85.1   38.4   145      3

Case Study: El Clásico Network Analysis
Let's apply all our network analysis techniques to compare Barcelona and Real Madrid in a Clásico match.
# Python: Complete Network Comparison Case Study
import pandas as pd
import networkx as nx
def analyze_clasico_networks(events):
"""Complete network analysis comparison for El Clásico."""
teams = ["Barcelona", "Real Madrid"]
results = {}
for team in teams:
# Build pass network
passes = events[
(events["type"] == "Pass") &
(events["team"] == team) &
(events["pass_recipient"].notna())
].groupby(["player", "pass_recipient"]).agg(
count=("player", "count")
).reset_index()
passes = passes[passes["count"] >= 2]
G = nx.DiGraph()
for _, row in passes.iterrows():
G.add_edge(row["player"], row["pass_recipient"], weight=row["count"])
G_undirected = G.to_undirected()
# Calculate metrics
betweenness = nx.betweenness_centrality(G, normalized=True)
pagerank = nx.pagerank(G, weight="weight")
top_betweenness = max(betweenness, key=betweenness.get)
top_pagerank = max(pagerank, key=pagerank.get)
metrics = {
"team": team,
"density": nx.density(G),
"clustering": nx.average_clustering(G_undirected),
"centralization": calculate_centralization(list(dict(G.degree()).values())),
"reciprocity": nx.reciprocity(G),
"top_betweenness": top_betweenness,
"top_pagerank": top_pagerank,
"total_passes": sum(d["weight"] for _, _, d in G.edges(data=True))
}
centrality = pd.DataFrame({
"player": list(G.nodes()),
"team": team,
"degree": [G.degree(n) for n in G.nodes()],
"betweenness": [betweenness[n] for n in G.nodes()],
"pagerank": [pagerank[n] for n in G.nodes()]
})
results[team] = {
"network": G,
"metrics": metrics,
"centrality": centrality
}
# Combine and print
print("\n=== EL CLÁSICO NETWORK COMPARISON ===\n")
metrics_df = pd.DataFrame([r["metrics"] for r in results.values()])
print("Team Metrics:")
print(metrics_df[["team", "density", "clustering", "centralization", "total_passes"]])
print("\nTop Playmakers (by Betweenness):")
all_centrality = pd.concat([r["centrality"] for r in results.values()])
top_players = all_centrality.groupby("team").apply(
lambda x: x.nlargest(3, "betweenness")
).reset_index(drop=True)
print(top_players[["team", "player", "betweenness", "pagerank"]])
return results
clasico_analysis = analyze_clasico_networks(events)

# R: Complete Network Comparison Case Study
library(tidyverse)
library(igraph)
analyze_clasico_networks <- function(events) {
teams <- c("Barcelona", "Real Madrid")
results <- map(teams, function(team) {
# Build pass network
passes <- events %>%
filter(type.name == "Pass",
team.name == team,
!is.na(pass.recipient.name)) %>%
group_by(passer = player.name, receiver = pass.recipient.name) %>%
summarise(
count = n(),
successful = sum(is.na(pass.outcome.name)),
progressive = sum(pass.end_location.x - location.x > 10),
.groups = "drop"
) %>%
filter(count >= 2)
g <- graph_from_data_frame(passes, directed = TRUE)
E(g)$weight <- passes$count
# Calculate metrics
list(
team = team,
network = g,
metrics = tibble(
team = team,
density = edge_density(g),
clustering = transitivity(g, type = "global"),
centralization = centr_degree(g)$centralization,
reciprocity = reciprocity(g),
top_betweenness = V(g)$name[which.max(betweenness(g))],
top_pagerank = V(g)$name[which.max(page_rank(g)$vector)],
total_passes = sum(E(g)$weight)
),
centrality = tibble(
player = V(g)$name,
team = team,
degree = degree(g, mode = "all"),
betweenness = betweenness(g, normalized = TRUE),
pagerank = page_rank(g)$vector
)
)
})
# Combine results
metrics_comparison <- bind_rows(map(results, "metrics"))
centrality_all <- bind_rows(map(results, "centrality"))
# Print comparison
cat("\n=== EL CLÁSICO NETWORK COMPARISON ===\n\n")
cat("Team Metrics:\n")
print(metrics_comparison %>%
select(team, density, clustering, centralization, total_passes))
cat("\nTop Playmakers (by Betweenness):\n")
centrality_all %>%
group_by(team) %>%
slice_max(betweenness, n = 3) %>%
select(team, player, betweenness, pagerank) %>%
print()
return(list(
metrics = metrics_comparison,
centrality = centrality_all,
networks = map(results, "network")
))
}
clasico_analysis <- analyze_clasico_networks(events)

=== EL CLÁSICO NETWORK COMPARISON ===

Team Metrics:
  team         density  clustering  centralization  total_passes
  Barcelona    0.412    0.523       0.234           524
  Real Madrid  0.356    0.467       0.298           478

Top Playmakers (by Betweenness):
  team         player           betweenness  pagerank
  Barcelona    Sergio Busquets  0.284        0.128
  Barcelona    Lionel Messi     0.198        0.142
  Barcelona    Jordi Alba       0.156        0.098
  Real Madrid  Toni Kroos       0.312        0.134
  Real Madrid  Luka Modrić      0.256        0.121
  Real Madrid  Casemiro         0.178        0.092

Barcelona:
- Higher density (0.412): More passing combinations
- Higher clustering (0.523): More triangular play
- Lower centralization: Less dependent on individuals
- Key hub: Busquets orchestrates from deep
- Style: Possession-based, patient buildup
Real Madrid:
- Lower density (0.356): Fewer combinations used
- Higher centralization (0.298): Play through key players
- Key hubs: Kroos and Modrić dominate passing
- Less reciprocity: More direct, vertical play
- Style: More direct transitions
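The team metrics above call a calculate_centralization helper defined earlier in the chapter. If you are working through this section standalone, here is a minimal sketch assuming Freeman's degree-centralization formula (the formula choice is an assumption, not necessarily the chapter's exact definition):

```python
def calculate_centralization(degrees):
    """Freeman degree centralization: 1.0 for a perfect star network
    (one hub touches everyone), 0.0 when every node has equal degree.
    NOTE: assumed implementation -- the chapter defines its own earlier."""
    n = len(degrees)
    if n < 3:
        return 0.0
    max_d = max(degrees)
    # Sum of gaps to the most-connected node, scaled by the star-graph maximum
    return sum(max_d - d for d in degrees) / ((n - 1) * (n - 2))

print(calculate_centralization([4, 1, 1, 1, 1]))  # star of 5 -> 1.0
print(calculate_centralization([2, 2, 2, 2]))     # even ring -> 0.0
```

A star-shaped passing structure (everything through one player) scores 1.0; a perfectly even structure scores 0.0, which is why lower values above indicate less dependence on individuals.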
Temporal Network Analysis
Football networks change throughout a match. Analyzing how networks evolve over time reveals tactical shifts, the impact of substitutions, and momentum changes.
# Python: Temporal network analysis
def analyze_network_by_period(events, team, periods=[15, 30, 45, 60, 75, 90]):
"""Analyze how network metrics change over time."""
results = []
for i, end_min in enumerate(periods):
start_min = 0 if i == 0 else periods[i-1]
# Filter passes for this period
period_passes = events[
(events["type"] == "Pass") &
(events["team"] == team) &
(events["minute"] >= start_min) &
(events["minute"] < end_min) &
(events["pass_recipient"].notna())
].groupby(["player", "pass_recipient"]).size().reset_index(name="passes")
period_passes = period_passes[period_passes["passes"] >= 2]
if len(period_passes) < 5:
continue
# Build network
G = nx.DiGraph()
for _, row in period_passes.iterrows():
G.add_edge(row["player"], row["pass_recipient"],
weight=row["passes"])
results.append({
"period": f"{start_min}-{end_min}",
"density": nx.density(G),
"clustering": nx.average_clustering(G.to_undirected()),
"centralization": calculate_centralization(list(dict(G.degree()).values())),
"edges": G.number_of_edges(),
"total_passes": sum(d["weight"] for _, _, d in G.edges(data=True))
})
return pd.DataFrame(results)
# Analyze evolution
network_evolution = analyze_network_by_period(events, "Barcelona")
# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(network_evolution["period"], network_evolution["density"],
"o-", label="Density", color="#1B5E20", linewidth=2)
ax.plot(network_evolution["period"], network_evolution["centralization"],
"s-", label="Centralization", color="#FF6B00", linewidth=2)
ax.set_xlabel("Period (minutes)")
ax.set_ylabel("Metric Value")
ax.set_title("Network Evolution Throughout Match")
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# R: Temporal network analysis
analyze_network_by_period <- function(events, team, periods = c(15, 30, 45, 60, 75, 90)) {
results <- map_dfr(seq_along(periods), function(i) {
start_min <- if (i == 1) 0 else periods[i-1]
end_min <- periods[i]
# Filter passes for this period
period_passes <- events %>%
filter(type.name == "Pass",
team.name == team,
minute >= start_min,
minute < end_min,
!is.na(pass.recipient.name)) %>%
group_by(player.name, pass.recipient.name) %>%
summarise(passes = n(), .groups = "drop") %>%
filter(passes >= 2)
if (nrow(period_passes) < 5) return(NULL)
# Build network
g <- graph_from_data_frame(period_passes, directed = TRUE)
E(g)$weight <- period_passes$passes
tibble(
period = paste0(start_min, "-", end_min),
density = edge_density(g),
clustering = transitivity(g, type = "global"),
centralization = centr_degree(g)$centralization,
edges = ecount(g),
total_passes = sum(E(g)$weight)
)
})
results
}
# Analyze network evolution
network_evolution <- analyze_network_by_period(events, "Barcelona")
# Plot evolution
ggplot(network_evolution, aes(x = period)) +
geom_line(aes(y = density, group = 1, color = "Density"), linewidth = 1.2) +
geom_line(aes(y = centralization, group = 1, color = "Centralization"), linewidth = 1.2) +
geom_point(aes(y = density, color = "Density"), size = 3) +
geom_point(aes(y = centralization, color = "Centralization"), size = 3) +
scale_color_manual(values = c("Density" = "#1B5E20", "Centralization" = "#FF6B00")) +
labs(title = "Network Evolution Throughout Match",
x = "Period (minutes)", y = "Metric Value",
color = "Metric") +
theme_minimal()

Comparing Team Networks
Network analysis enables objective comparison of team playing styles. We can create fingerprints based on network metrics to understand what makes each team unique.
# Python: Compare multiple teams
def compare_team_networks(events, teams):
"""Compare network metrics across multiple teams."""
results = []
for team_name in teams:
# Build network
team_passes = events[
(events["type"] == "Pass") &
(events["team"] == team_name) &
(events["pass_recipient"].notna())
].groupby(["player", "pass_recipient"]).size().reset_index(name="passes")
team_passes = team_passes[team_passes["passes"] >= 3]
G = nx.DiGraph()
for _, row in team_passes.iterrows():
G.add_edge(row["player"], row["pass_recipient"],
weight=row["passes"])
G_undirected = G.to_undirected()
results.append({
"team": team_name,
"density": nx.density(G),
"clustering": nx.average_clustering(G_undirected),
"degree_centralization": calculate_centralization(
list(dict(G.degree()).values())),
"reciprocity": nx.reciprocity(G),
"avg_path_length": nx.average_shortest_path_length(G_undirected)
if nx.is_connected(G_undirected) else np.nan,
"total_passes": sum(d["weight"] for _, _, d in G.edges(data=True)),
"unique_combinations": G.number_of_edges()
})
return pd.DataFrame(results)
# Compare teams
comparison = compare_team_networks(all_events,
["Barcelona", "Real Madrid", "Bayern Munich", "Liverpool"])
# Create heatmap comparison
from sklearn.preprocessing import MinMaxScaler
metrics = ["density", "clustering", "degree_centralization",
"reciprocity", "unique_combinations"]
scaled_data = comparison[metrics].copy()
scaler = MinMaxScaler()
scaled_data[metrics] = scaler.fit_transform(scaled_data[metrics])
scaled_data["team"] = comparison["team"]
# Plot heatmap
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(scaled_data[metrics].values, cmap="RdYlGn", aspect="auto")
ax.set_xticks(range(len(metrics)))
ax.set_xticklabels(metrics, rotation=45, ha="right")
ax.set_yticks(range(len(comparison)))
ax.set_yticklabels(comparison["team"])
plt.colorbar(im, label="Scaled Value")
ax.set_title("Team Network Comparison")
plt.tight_layout()
plt.show()

# R: Compare multiple teams
compare_team_networks <- function(events, teams) {
map_dfr(teams, function(team_name) {
# Build network
team_passes <- events %>%
filter(type.name == "Pass",
team.name == team_name,
!is.na(pass.recipient.name)) %>%
group_by(player.name, pass.recipient.name) %>%
summarise(passes = n(), .groups = "drop") %>%
filter(passes >= 3)
g <- graph_from_data_frame(team_passes, directed = TRUE)
E(g)$weight <- team_passes$passes
tibble(
team = team_name,
density = edge_density(g),
clustering = transitivity(g, type = "global"),
degree_centralization = centr_degree(g)$centralization,
betweenness_centralization = centr_betw(g)$centralization,
reciprocity = reciprocity(g),
avg_path_length = mean_distance(g, directed = FALSE),
total_passes = sum(E(g)$weight),
unique_combinations = ecount(g)
)
})
}
# Compare teams in competition
team_comparison <- compare_team_networks(all_events,
c("Barcelona", "Real Madrid",
"Bayern Munich", "Liverpool"))
# Create comparison visualization
team_comparison %>%
pivot_longer(-team, names_to = "metric", values_to = "value") %>%
group_by(metric) %>%
mutate(scaled = scales::rescale(value, to = c(0, 100))) %>%
ggplot(aes(x = metric, y = scaled, fill = team)) +
geom_col(position = "dodge") +
coord_flip() +
scale_fill_brewer(palette = "Set1") +
labs(title = "Team Network Comparison",
x = "Metric", y = "Scaled Value (0-100)") +
theme_minimal()

Practice Exercises
Hands-On Practice
Complete these exercises to master network analysis in football:
1. Using StatsBomb free data, build pass networks for both teams in a match. Compare their density and centralization metrics. Which team had a more hierarchical passing structure?
2. Calculate betweenness centrality for all players in a team. Who has the highest betweenness? Does this match your intuition about who the team's playmaker is?
3. Build separate networks for the first and second half of a match. How do the metrics change? Can you identify any tactical shifts from the network evolution?
4. Apply community detection to a team's pass network. Do the detected communities align with defensive/midfield/attacking units? Visualize the result on a pitch.
5. Build pass networks weighted by (1) pass count, (2) progressive passes, and (3) passes into the final third. Compare the centrality rankings under each weighting scheme. Which players rise or fall in importance?
6. Use the formation detection algorithm on 5 different matches for the same team. Does the detected formation match the reported lineup? How consistent is the detected formation across matches?
7. Simulate removing key players from the network (as if they were sent off). How does the network structure change? Calculate the "importance" of each player by measuring how much network metrics deteriorate when they're removed.
8. Identify common passing triangles (3-player subgraphs) in the network. Which triangular combinations are used most frequently? Are high-frequency triangles associated with better attacking outcomes?
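As a starting point for the player-removal exercise, a dependency-free sketch of the idea on a toy directed network (the player labels and pass pairs are invented):

```python
# Toy pass network: (passer, receiver) pairs; positions are illustrative
edges = {("GK", "CB"), ("CB", "CM"), ("CM", "GK"), ("CM", "LW"),
         ("CM", "ST"), ("CM", "RW"), ("LW", "ST"), ("RW", "ST"),
         ("CB", "ST")}
players = {p for pair in edges for p in pair}

def density(edge_set, nodes):
    """Directed density: observed edges over n*(n-1) possible."""
    n = len(nodes)
    return len(edge_set) / (n * (n - 1))

def removal_impact(player):
    """Drop in overall density when a player (and their passes) is removed."""
    kept = {e for e in edges if player not in e}
    return density(edges, players) - density(kept, players - {player})

impact = {p: removal_impact(p) for p in sorted(players)}
most_important = max(impact, key=impact.get)
print(most_important)  # CM -- removing the hub midfielder cuts the most links
```

The same loop works on a real networkx graph by calling G.copy() and remove_node() before recomputing whichever metric you care about; removing a peripheral player can even raise density, which is itself informative.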
Summary
Key Takeaways
- Pass networks represent football as a graph where players are nodes and passes are edges
- Centrality metrics identify important players: degree (involvement), betweenness (playmaker), PageRank (ball magnet)
- Team metrics like density and centralization reveal playing styles (possession vs direct)
- Community detection finds natural player groupings that may reveal tactical structures
- Temporal analysis tracks how networks evolve during matches
- Weighted networks incorporate pass quality, progressiveness, and danger for richer analysis
- Formation detection reveals actual team shapes from positional data
- Network comparison enables objective measurement of style differences between teams
Key Network Metrics Reference
| Metric | Level | Interpretation | Football Application |
|---|---|---|---|
| Degree | Node | Number of connections | Passing options, involvement |
| Betweenness | Node | Bridge between groups | Playmaker identification |
| PageRank | Node | Influence via connections | Expected ball recipient |
| Closeness | Node | Avg distance to all nodes | Ball circulation efficiency |
| Eigenvector | Node | Connected to important nodes | Quality of passing partners |
| Density | Network | Proportion of possible edges | Passing variety (high = tiki-taka) |
| Clustering | Network | Transitivity of connections | Triangular play frequency |
| Centralization | Network | Concentration of importance | Star player dependency |
| Reciprocity | Network | Mutual connections | Two-way passing combinations |
| Modularity | Network | Quality of community structure | Tactical unit cohesion |
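Two node metrics in the table, closeness and eigenvector centrality, were not computed in the examples above. A small networkx sketch on a toy graph (player labels illustrative) shows how to get them:

```python
import networkx as nx

# Small undirected toy pass graph; player labels are illustrative
G = nx.Graph()
G.add_edges_from([("CB", "CM"), ("CM", "LW"), ("CM", "ST"), ("LW", "ST")])

# Closeness: how few passes it takes to reach everyone (circulation efficiency)
closeness = nx.closeness_centrality(G)
# Eigenvector: being connected to well-connected teammates
eigen = nx.eigenvector_centrality(G)

print(max(closeness, key=closeness.get))  # CM reaches every teammate in one pass
```

On directed pass networks, eigenvector centrality may fail to converge; PageRank (used in the case study above) is the more robust choice there.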
Key Libraries and Tools
R Libraries
- igraph - Core network analysis
- tidygraph - Tidy interface to igraph
- ggraph - Grammar of graphics for networks
- ggsoccer - Football pitch visualization
- visNetwork - Interactive network visualization
- sna - Social network analysis
Python Libraries
- networkx - Core network analysis
- python-louvain - Community detection
- mplsoccer - Football pitch visualization
- pyvis - Interactive networks
- graph-tool - High-performance graphs
- scikit-network - Network ML
Common Pitfalls to Avoid
- Ignoring direction: Pass networks are directed - A passing to B ≠ B passing to A
- Minimum threshold selection: Too low includes noise, too high misses connections
- Sample size issues: Networks from short periods (e.g., 15 min) may be unreliable
- Ignoring game state: Networks differ when winning vs losing
- Substitution effects: Players with few minutes shouldn't be compared directly
- Over-interpreting communities: Algorithms may find spurious groupings
- Formation detection errors: Fluid formations may not cluster cleanly
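The threshold-selection pitfall is easy to demonstrate: the same match produces very different networks depending on the minimum pass count. A dependency-free sketch (the pass counts are invented):

```python
# (passer, receiver, count) tuples with invented counts
passes = [("A", "B", 12), ("B", "A", 9), ("B", "C", 4),
          ("C", "D", 2), ("D", "A", 1)]

def density_at_threshold(passes, min_count, n_players):
    """Directed density after dropping pairs below a minimum pass count."""
    kept = [p for p in passes if p[2] >= min_count]
    return len(kept) / (n_players * (n_players - 1))

for t in (1, 3, 5):
    print(f"threshold {t}: density {density_at_threshold(passes, t, 4):.3f}")
    # density falls from 0.417 to 0.167 as the threshold rises
```

Because every downstream metric is computed on the filtered network, it is worth reporting the threshold used and checking that conclusions survive a threshold of plus or minus one.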
Style Archetypes from Network Metrics
Possession style:
- High density (> 0.4)
- High clustering (> 0.5)
- Low centralization (< 0.25)
- High reciprocity (> 0.7)
- Many unique combinations
- Example: Guardiola's Barcelona
Direct style:
- Lower density (< 0.35)
- Lower clustering (< 0.45)
- Higher centralization (> 0.3)
- Lower reciprocity
- More vertical passing
- Example: Mourinho's counter-attacking teams
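These archetypes can be turned into a rough rule-based classifier. A heuristic sketch using the cutoffs listed above (they are rules of thumb, not validated boundaries, and real teams sit on a continuum):

```python
def classify_style(density, clustering, centralization):
    """Rough archetype label from the rule-of-thumb cutoffs above.
    Heuristic only -- many teams will fall into the 'mixed' bucket."""
    if density > 0.4 and clustering > 0.5 and centralization < 0.25:
        return "possession"
    if density < 0.35 and centralization > 0.3:
        return "direct"
    return "mixed"

# Barcelona's numbers from the Clásico case study
print(classify_style(0.412, 0.523, 0.234))  # possession
```

Note that Real Madrid's case-study numbers (density 0.356, centralization 0.298) land just outside both rules, a reminder that hard thresholds discard information a clustering or distance-based comparison would keep.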
Network analysis provides a mathematically rigorous framework for understanding team structure and player importance. In the next chapter, we'll explore computer vision applications in football analytics.