Deep Learning for Football
Deep learning has revolutionized many domains, and football analytics is no exception. Neural networks can learn complex patterns from tracking data, predict match outcomes with high accuracy, and generate player embeddings that capture playing style. This chapter introduces deep learning fundamentals and their applications to football.
Learning Objectives
- Understand neural network fundamentals for sports analytics
- Build feedforward networks for match outcome prediction
- Apply recurrent networks (LSTM/GRU) to sequence data
- Create player embeddings using neural networks
- Use graph neural networks for team analysis
- Implement attention mechanisms for event data
Prerequisites
This chapter assumes familiarity with Python, basic machine learning concepts, and linear algebra. We'll use PyTorch and TensorFlow/Keras for implementations.
Deep Learning Fundamentals
Deep learning uses neural networks with multiple layers to learn hierarchical representations of data. For football, this enables learning complex patterns from raw event data that traditional methods might miss.
- Feedforward networks: basic neural networks for tabular data and predictions. Use: match prediction, player ratings.
- Recurrent networks: process sequential data with memory of past events. Use: event sequences, match progression.
- Graph neural networks: learn from graph-structured data (players as nodes). Use: pass networks, team interactions.
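Before the full PyTorch models below, the core idea of a "layered" network can be sketched in a few lines of NumPy: each layer is an affine map followed by a nonlinearity, and stacking layers builds up the hierarchical representation. The feature counts and random weights here are arbitrary toy values, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy match feature vector: 4 features (e.g. xG averages, form), one sample
x = rng.normal(size=(1, 4))

# Two layers: each an affine map followed by a ReLU nonlinearity
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)  # 3 outputs: W/D/L logits

h = np.maximum(0, x @ W1 + b1)   # hidden representation
logits = h @ W2 + b2             # class scores

# Softmax turns logits into outcome probabilities
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(probs.shape)  # (1, 3); probabilities sum to 1
```

Everything the PyTorch `nn.Sequential` models below do is a learned, batched version of this forward pass.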
# Python: Deep learning with PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
class MatchPredictor(nn.Module):
"""Simple feedforward network for match outcome prediction."""
def __init__(self, input_dim=20, hidden_dims=[64, 32], num_classes=3):
super().__init__()
layers = []
prev_dim = input_dim
for hidden_dim in hidden_dims:
layers.extend([
nn.Linear(prev_dim, hidden_dim),
nn.ReLU(),
nn.Dropout(0.3),
nn.BatchNorm1d(hidden_dim)
])
prev_dim = hidden_dim
layers.append(nn.Linear(prev_dim, num_classes))
self.network = nn.Sequential(*layers)
def forward(self, x):
return self.network(x)
# Create model
model = MatchPredictor(input_dim=20, hidden_dims=[64, 32], num_classes=3)
print(model)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop example
def train_epoch(model, dataloader, criterion, optimizer):
model.train()
total_loss = 0
for batch_x, batch_y in dataloader:
optimizer.zero_grad()
outputs = model(batch_x)
loss = criterion(outputs, batch_y)
loss.backward()
optimizer.step()
total_loss += loss.item()
return total_loss / len(dataloader)
# Example training
# for epoch in range(50):
# loss = train_epoch(model, train_loader, criterion, optimizer)
# print(f"Epoch {epoch+1}, Loss: {loss:.4f}")
# R: Deep learning with keras/tensorflow
library(keras)
library(tensorflow)
# Simple feedforward network example
model <- keras_model_sequential() %>%
layer_dense(units = 64, activation = "relu", input_shape = c(20)) %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 32, activation = "relu") %>%
layer_dropout(rate = 0.2) %>%
layer_dense(units = 3, activation = "softmax") # 3 classes: W/D/L
# Compile model
model %>% compile(
optimizer = optimizer_adam(learning_rate = 0.001),
loss = "categorical_crossentropy",
metrics = c("accuracy")
)
# Model summary
summary(model)
# Training would be:
# history <- model %>% fit(
# x_train, y_train,
# epochs = 50,
# batch_size = 32,
# validation_split = 0.2
# )
MatchPredictor(
(network): Sequential(
(0): Linear(in_features=20, out_features=64)
(1): ReLU()
(2): Dropout(p=0.3)
(3): BatchNorm1d(64)
(4): Linear(in_features=64, out_features=32)
(5): ReLU()
(6): Dropout(p=0.3)
(7): BatchNorm1d(32)
(8): Linear(in_features=32, out_features=3)
)
)
Match Outcome Prediction
Neural networks can combine multiple feature types (team stats, recent form, head-to-head records) to predict match outcomes more accurately than traditional models.
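The model below learns team embeddings jointly with the classifier; the lookup-and-concatenate step at its core can be sketched on its own. The team indices and feature count here are toy values matching the example configuration, not real data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# An embedding table maps each team id to a learned dense vector
num_teams, embed_dim = 40, 16
team_embedding = nn.Embedding(num_teams, embed_dim)

home_id = torch.tensor([3])   # hypothetical team indices
away_id = torch.tensor([17])

home_vec = team_embedding(home_id)  # shape (1, 16)
away_vec = team_embedding(away_id)

# Concatenated with numerical features, these vectors feed the classifier
numerical = torch.randn(1, 10)
features = torch.cat([numerical, home_vec, away_vec], dim=1)
print(features.shape)  # torch.Size([1, 42])
```

During training, gradients flow back into the embedding table, so teams that behave similarly end up with similar vectors.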
# Python: Match prediction with PyTorch
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
class MatchPredictorWithEmbeddings(nn.Module):
"""
Neural network for match prediction with team embeddings.
"""
def __init__(self, num_teams, num_numerical_features,
team_embed_dim=16, hidden_dims=[128, 64]):
super().__init__()
# Team embeddings
self.team_embedding = nn.Embedding(num_teams, team_embed_dim)
# Calculate input dimension for dense layers
# numerical + home_embed + away_embed
input_dim = num_numerical_features + 2 * team_embed_dim
# Build dense layers
layers = []
prev_dim = input_dim
for hidden_dim in hidden_dims:
layers.extend([
nn.Linear(prev_dim, hidden_dim),
nn.ReLU(),
nn.BatchNorm1d(hidden_dim),
nn.Dropout(0.3)
])
prev_dim = hidden_dim
# Output layer (3 classes: home win, draw, away win)
layers.append(nn.Linear(prev_dim, 3))
self.classifier = nn.Sequential(*layers)
def forward(self, numerical_features, home_team_id, away_team_id):
# Get team embeddings
home_embed = self.team_embedding(home_team_id)
away_embed = self.team_embedding(away_team_id)
# Concatenate all features
x = torch.cat([numerical_features, home_embed, away_embed], dim=1)
# Pass through classifier
return self.classifier(x)
# Prepare data
def prepare_match_data(matches_df):
"""Prepare features for match prediction."""
# Encode teams
le = LabelEncoder()
all_teams = pd.concat([matches_df["home_team"], matches_df["away_team"]])
le.fit(all_teams)
matches_df["home_team_id"] = le.transform(matches_df["home_team"])
matches_df["away_team_id"] = le.transform(matches_df["away_team"])
# Numerical features
numerical_cols = [
"home_xg_avg", "home_xga_avg", "away_xg_avg", "away_xga_avg",
"home_form", "away_form", "home_goals_avg", "away_goals_avg",
"home_shots_avg", "away_shots_avg"
]
X_numerical = matches_df[numerical_cols].values
X_home_team = matches_df["home_team_id"].values
X_away_team = matches_df["away_team_id"].values
# Target: 0=away win, 1=draw, 2=home win
y = matches_df["result"].map({"H": 2, "D": 1, "A": 0}).values
return X_numerical, X_home_team, X_away_team, y, le
# Training function
def train_match_predictor(model, train_loader, val_loader, epochs=50):
"""Train match prediction model."""
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
optimizer, patience=5, factor=0.5
)
best_val_acc = 0
history = {"train_loss": [], "val_loss": [], "val_acc": []}
for epoch in range(epochs):
# Training
model.train()
train_loss = 0
for numerical, home_id, away_id, target in train_loader:
optimizer.zero_grad()
output = model(numerical, home_id, away_id)
loss = criterion(output, target)
loss.backward()
optimizer.step()
train_loss += loss.item()
# Validation
model.eval()
val_loss = 0
correct = 0
total = 0
with torch.no_grad():
for numerical, home_id, away_id, target in val_loader:
output = model(numerical, home_id, away_id)
val_loss += criterion(output, target).item()
_, predicted = torch.max(output, 1)
total += target.size(0)
correct += (predicted == target).sum().item()
val_acc = correct / total
scheduler.step(val_loss)
history["train_loss"].append(train_loss / len(train_loader))
history["val_loss"].append(val_loss / len(val_loader))
history["val_acc"].append(val_acc)
if val_acc > best_val_acc:
best_val_acc = val_acc
torch.save(model.state_dict(), "best_match_model.pt")
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1}: Val Acc = {val_acc:.3f}")
return history
# Example usage
model = MatchPredictorWithEmbeddings(
num_teams=40,
num_numerical_features=10,
team_embed_dim=16
)
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
# R: Match prediction with keras
library(keras)
library(tidyverse)
# Prepare match features
prepare_match_features <- function(matches) {
matches %>%
mutate(
# Team strength metrics
home_xg_for_avg = home_xg_for / matches_played,
home_xg_against_avg = home_xg_against / matches_played,
away_xg_for_avg = away_xg_for / matches_played,
away_xg_against_avg = away_xg_against / matches_played,
# Form (last 5 matches)
home_form = home_points_l5 / 15, # Normalized 0-1
away_form = away_points_l5 / 15,
# Historical head-to-head
h2h_home_win_rate = h2h_home_wins / (h2h_home_wins + h2h_draws + h2h_away_wins + 0.1),
# Relative strength
xg_diff = home_xg_for_avg - away_xg_for_avg,
form_diff = home_form - away_form
) %>%
select(home_xg_for_avg, home_xg_against_avg, away_xg_for_avg,
away_xg_against_avg, home_form, away_form,
h2h_home_win_rate, xg_diff, form_diff)
}
# Build model with embedding for categorical features
build_match_model <- function(num_numerical = 9, num_teams = 40,
team_embed_dim = 8) {
# Numerical input
numerical_input <- layer_input(shape = num_numerical, name = "numerical")
# Team embedding inputs
home_team_input <- layer_input(shape = 1, name = "home_team")
away_team_input <- layer_input(shape = 1, name = "away_team")
# Embedding layer (shared)
team_embedding <- layer_embedding(
input_dim = num_teams,
output_dim = team_embed_dim,
name = "team_embedding"
)
home_embed <- home_team_input %>%
team_embedding() %>%
layer_flatten()
away_embed <- away_team_input %>%
team_embedding() %>%
layer_flatten()
# Concatenate all features
combined <- layer_concatenate(list(numerical_input, home_embed, away_embed))
# Dense layers
output <- combined %>%
layer_dense(64, activation = "relu") %>%
layer_dropout(0.3) %>%
layer_dense(32, activation = "relu") %>%
layer_dense(3, activation = "softmax") # W/D/L
keras_model(
inputs = list(numerical_input, home_team_input, away_team_input),
outputs = output
)
}
model <- build_match_model()
summary(model)
Model parameters: 14,851
Sequence Models for Event Data
Football matches are sequences of events. Recurrent Neural Networks (RNNs), particularly LSTMs and GRUs, can model these sequences to predict future events or classify possession outcomes.
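The recurrence itself is simple to see in isolation: a GRU consumes one event vector per step and carries a hidden "memory" forward. This toy sketch uses arbitrary dimensions, not real event features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A GRU reads one event vector per step, updating a hidden state
gru = nn.GRU(input_size=6, hidden_size=8, batch_first=True)

# Toy possession: batch of 1, sequence of 5 events, 6 features each
events = torch.randn(1, 5, 6)
outputs, h_n = gru(events)

print(outputs.shape)  # per-step hidden states: torch.Size([1, 5, 8])
print(h_n.shape)      # final hidden state:     torch.Size([1, 1, 8])
```

The final hidden state summarizes the whole sequence and can feed a classifier head, which is exactly the role the bidirectional LSTM's hidden state plays in the model below.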
# Python: LSTM for event sequence modeling
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
class EventSequenceModel(nn.Module):
"""
LSTM model for predicting possession outcomes from event sequences.
"""
def __init__(self, num_event_types=20, embed_dim=32, hidden_dim=64,
num_layers=2, dropout=0.3, num_spatial_features=4):
super().__init__()
# Event type embedding
self.event_embedding = nn.Embedding(num_event_types, embed_dim)
# Spatial features (x, y coordinates, distance, angle)
self.spatial_dim = num_spatial_features
# LSTM
lstm_input_dim = embed_dim + num_spatial_features
self.lstm = nn.LSTM(
input_size=lstm_input_dim,
hidden_size=hidden_dim,
num_layers=num_layers,
batch_first=True,
dropout=dropout if num_layers > 1 else 0,
bidirectional=True
)
# Output layers
self.classifier = nn.Sequential(
nn.Linear(hidden_dim * 2, 64), # *2 for bidirectional
nn.ReLU(),
nn.Dropout(dropout),
nn.Linear(64, 1),
nn.Sigmoid()
)
def forward(self, event_types, spatial_features, lengths):
"""
Args:
event_types: (batch, max_seq_len) event type indices
spatial_features: (batch, max_seq_len, 4) x,y,dist,angle
lengths: (batch,) actual sequence lengths
"""
# Get embeddings
event_embeds = self.event_embedding(event_types)
# Concatenate with spatial features
x = torch.cat([event_embeds, spatial_features], dim=-1)
# Pack sequences
packed = pack_padded_sequence(x, lengths.cpu(),
batch_first=True,
enforce_sorted=False)
# LSTM forward
packed_output, (hidden, cell) = self.lstm(packed)
# Use final hidden state (concatenate forward and backward)
hidden_concat = torch.cat([hidden[-2], hidden[-1]], dim=1)
# Classify
return self.classifier(hidden_concat)
# Event vocabulary
EVENT_VOCAB = {
"Pass": 0, "Carry": 1, "Dribble": 2, "Shot": 3, "Cross": 4,
"Clearance": 5, "Tackle": 6, "Interception": 7, "Foul": 8,
"Ball Receipt": 9, "Pressure": 10, "Block": 11
}
def prepare_possession_sequences(events_df, max_length=30):
"""Convert event data to sequences for training."""
sequences = []
for poss_id, poss_events in events_df.groupby("possession_id"):
# Get event types
event_types = [EVENT_VOCAB.get(e, 0) for e in poss_events["type"]]
# Get spatial features (normalized to 0-1)
spatial = poss_events[["x", "y"]].values / 100
# Add distance and angle to goal
goal_x, goal_y = 100, 50
distances = np.sqrt((goal_x - poss_events["x"])**2 +
(goal_y - poss_events["y"])**2) / 100
angles = np.arctan2(goal_y - poss_events["y"],
goal_x - poss_events["x"]) / np.pi
spatial = np.column_stack([spatial, distances, angles])
# Truncate or pad
seq_len = min(len(event_types), max_length)
event_types = event_types[:max_length]
spatial = spatial[:max_length]
# Padding
if len(event_types) < max_length:
pad_len = max_length - len(event_types)
event_types.extend([0] * pad_len)
spatial = np.vstack([spatial, np.zeros((pad_len, 4))])
# Target: did possession end in shot?
target = int(poss_events["type"].iloc[-1] == "Shot")
sequences.append({
"event_types": event_types,
"spatial": spatial,
"length": seq_len,
"target": target
})
return sequences
# Create model
model = EventSequenceModel(
num_event_types=len(EVENT_VOCAB),
embed_dim=32,
hidden_dim=64
)
print(f"Sequence model parameters: {sum(p.numel() for p in model.parameters()):,}")
# R: LSTM for sequence prediction
library(keras)
# Build LSTM model for possession outcome prediction
build_lstm_model <- function(vocab_size = 50, embed_dim = 32,
lstm_units = 64, max_seq_length = 30) {
model <- keras_model_sequential() %>%
# Embedding for event types
layer_embedding(input_dim = vocab_size, output_dim = embed_dim,
input_length = max_seq_length) %>%
# LSTM layers
layer_lstm(units = lstm_units, return_sequences = TRUE) %>%
layer_dropout(0.3) %>%
layer_lstm(units = lstm_units %/% 2) %>%
layer_dropout(0.2) %>%
# Output: probability of shot/goal at end of possession
layer_dense(32, activation = "relu") %>%
layer_dense(1, activation = "sigmoid")
model %>% compile(
optimizer = "adam",
loss = "binary_crossentropy",
metrics = c("accuracy", "AUC")
)
model
}
# Prepare sequence data
prepare_event_sequences <- function(events, max_length = 30) {
# Event type encoding
event_vocab <- c("Pass", "Carry", "Dribble", "Shot", "Cross",
"Clearance", "Tackle", "Interception", "Foul")
# Convert possessions to sequences
sequences <- events %>%
group_by(possession_id) %>%
arrange(event_id) %>%
summarise(
event_seq = list(match(type, event_vocab)),
ends_in_shot = any(type == "Shot"),
.groups = "drop"
)
# Pad sequences
# pad_sequences() would be used here
sequences
}
Sequence model parameters: 53,569
Player Embeddings
Player embeddings are dense vector representations that capture playing style. Similar to word embeddings in NLP, they enable similarity search, clustering, and transfer learning.
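The similarity-search use case reduces to comparing vectors. A minimal sketch with hand-made toy "style" vectors (the players and values are purely illustrative):

```python
import torch
import torch.nn.functional as F

# Toy 4-dim "style" embeddings for three hypothetical players
playmaker_a = torch.tensor([0.9, 0.1, 0.8, 0.2])
playmaker_b = torch.tensor([0.8, 0.2, 0.9, 0.1])
target_man  = torch.tensor([0.1, 0.9, 0.1, 0.8])

# Cosine similarity compares direction, not magnitude
sim_ab = F.cosine_similarity(playmaker_a, playmaker_b, dim=0)
sim_at = F.cosine_similarity(playmaker_a, target_man, dim=0)

print(f"playmaker vs playmaker: {sim_ab:.3f}")   # close to 1
print(f"playmaker vs target man: {sim_at:.3f}")  # much lower
```

With learned embeddings, the same comparison powers scouting queries like "find players similar to X", as in the `find_similar_players` function below.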
# Python: Player embeddings with neural networks
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.preprocessing import StandardScaler
import numpy as np
class PlayerEmbeddingModel(nn.Module):
"""
Learn player embeddings from their statistics.
Uses an autoencoder architecture to compress stats into embeddings.
"""
def __init__(self, input_dim, embed_dim=32, hidden_dims=[128, 64]):
super().__init__()
# Encoder
encoder_layers = []
prev_dim = input_dim
for hidden_dim in hidden_dims:
encoder_layers.extend([
nn.Linear(prev_dim, hidden_dim),
nn.ReLU(),
nn.BatchNorm1d(hidden_dim),
nn.Dropout(0.2)
])
prev_dim = hidden_dim
encoder_layers.append(nn.Linear(prev_dim, embed_dim))
self.encoder = nn.Sequential(*encoder_layers)
# Decoder (mirror of encoder)
decoder_layers = []
prev_dim = embed_dim
for hidden_dim in reversed(hidden_dims):
decoder_layers.extend([
nn.Linear(prev_dim, hidden_dim),
nn.ReLU(),
nn.BatchNorm1d(hidden_dim)
])
prev_dim = hidden_dim
decoder_layers.append(nn.Linear(prev_dim, input_dim))
self.decoder = nn.Sequential(*decoder_layers)
def forward(self, x):
embedding = self.encoder(x)
reconstruction = self.decoder(embedding)
return reconstruction, embedding
def get_embedding(self, x):
"""Get only the embedding without reconstruction."""
return self.encoder(x)
class ContrastivePlayerEmbedding(nn.Module):
"""
Learn player embeddings using contrastive learning.
Similar players (same position/role) should have similar embeddings.
"""
def __init__(self, input_dim, embed_dim=32):
super().__init__()
self.encoder = nn.Sequential(
nn.Linear(input_dim, 128),
nn.ReLU(),
nn.BatchNorm1d(128),
nn.Dropout(0.3),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, embed_dim)
)
# Projection head for contrastive learning
self.projection = nn.Sequential(
nn.Linear(embed_dim, embed_dim),
nn.ReLU(),
nn.Linear(embed_dim, embed_dim)
)
def forward(self, x):
embedding = self.encoder(x)
projection = self.projection(embedding)
return F.normalize(projection, dim=1), embedding
def train_contrastive(model, dataloader, epochs=100, temperature=0.1):
"""Train with NT-Xent contrastive loss."""
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for epoch in range(epochs):
total_loss = 0
for batch in dataloader:
# batch contains (anchor, positive) pairs
anchor, positive = batch
optimizer.zero_grad()
# Get projections
z_anchor, _ = model(anchor)
z_positive, _ = model(positive)
# NT-Xent loss
similarity = torch.mm(z_anchor, z_positive.t()) / temperature
labels = torch.arange(z_anchor.size(0))
loss = F.cross_entropy(similarity, labels)
loss.backward()
optimizer.step()
total_loss += loss.item()
if (epoch + 1) % 20 == 0:
print(f"Epoch {epoch+1}, Loss: {total_loss/len(dataloader):.4f}")
# Finding similar players
def find_similar_players(target_embedding, all_embeddings, player_names, k=5):
"""Find k most similar players based on embedding similarity."""
# Cosine similarity
similarities = F.cosine_similarity(
target_embedding.unsqueeze(0),
all_embeddings
)
# Get top k
top_k = torch.topk(similarities, k=k)
results = []
for idx, sim in zip(top_k.indices, top_k.values):
results.append({
"player": player_names[idx],
"similarity": sim.item()
})
return results
# Example usage
model = PlayerEmbeddingModel(input_dim=50, embed_dim=32)
print(f"Embedding model parameters: {sum(p.numel() for p in model.parameters()):,}")
# R: Player embeddings concept
library(tidyverse)
# Player embedding idea:
# Learn a low-dimensional representation from high-dimensional stats
# Method 1: Autoencoder for dimensionality reduction
build_player_autoencoder <- function(input_dim, embed_dim = 16) {
# Encoder
encoder_input <- layer_input(shape = input_dim)
encoded <- encoder_input %>%
layer_dense(64, activation = "relu") %>%
layer_dense(32, activation = "relu") %>%
layer_dense(embed_dim, activation = "linear", name = "embedding")
# Decoder
decoded <- encoded %>%
layer_dense(32, activation = "relu") %>%
layer_dense(64, activation = "relu") %>%
layer_dense(input_dim, activation = "linear")
# Full autoencoder
autoencoder <- keras_model(encoder_input, decoded)
# Encoder only (for extracting embeddings)
encoder <- keras_model(encoder_input, encoded)
list(autoencoder = autoencoder, encoder = encoder)
}
# Train autoencoder
# models <- build_player_autoencoder(input_dim = 50, embed_dim = 16)
# models$autoencoder %>% compile(optimizer = "adam", loss = "mse")
# models$autoencoder %>% fit(player_stats, player_stats, epochs = 100)
# Extract embeddings
# player_embeddings <- models$encoder %>% predict(player_stats)
Embedding model parameters: 18,434
Graph Neural Networks
Graph Neural Networks (GNNs) are naturally suited for football analysis where players form networks through passes and spatial relationships. GNNs can learn team-level representations that capture interaction patterns.
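The core GCN operation, neighborhood aggregation, can be sketched without torch_geometric: each player's new feature vector is a weighted average over the players who pass to them. The pass counts and positions below are made-up toy values.

```python
import torch

# Toy pass network: 4 players, A[i, j] = passes from player i to j
A = torch.tensor([[0., 3., 1., 0.],
                  [2., 0., 4., 1.],
                  [1., 2., 0., 3.],
                  [0., 1., 2., 0.]])

# Node features: one 2-dim vector per player (e.g. average position)
X = torch.tensor([[0.2, 0.5],
                  [0.4, 0.5],
                  [0.6, 0.4],
                  [0.8, 0.6]])

# One round of message passing: a pass-weighted average of each
# player's neighbours (plus the player itself, via self-loops)
A_hat = A + torch.eye(4)              # add self-loops
deg = A_hat.sum(dim=1, keepdim=True)  # weighted degree
H = (A_hat @ X) / deg                 # normalized aggregation

print(H.shape)  # torch.Size([4, 2])
```

A `GCNConv` layer adds a learned linear transform to this aggregation; stacking layers lets information flow across multi-pass chains.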
# Python: Graph Neural Networks with PyTorch Geometric
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv, global_mean_pool
from torch_geometric.data import Data, Batch
class FootballGNN(nn.Module):
"""
Graph Neural Network for team-level analysis.
Nodes are players, edges are passes.
"""
def __init__(self, node_features, hidden_dim=64, output_dim=32,
num_layers=3, dropout=0.3):
super().__init__()
self.convs = nn.ModuleList()
self.bns = nn.ModuleList()
# First layer
self.convs.append(GCNConv(node_features, hidden_dim))
self.bns.append(nn.BatchNorm1d(hidden_dim))
# Hidden layers
for _ in range(num_layers - 2):
self.convs.append(GCNConv(hidden_dim, hidden_dim))
self.bns.append(nn.BatchNorm1d(hidden_dim))
# Final conv layer
self.convs.append(GCNConv(hidden_dim, output_dim))
self.dropout = dropout
def forward(self, x, edge_index, batch=None):
"""
Args:
x: Node features (num_nodes, node_features)
edge_index: Edge connections (2, num_edges)
batch: Batch assignment for multiple graphs
"""
for conv, bn in zip(self.convs[:-1], self.bns):
x = conv(x, edge_index)
x = bn(x)
x = F.relu(x)
x = F.dropout(x, p=self.dropout, training=self.training)
x = self.convs[-1](x, edge_index)
# Global pooling if processing multiple graphs
if batch is not None:
x = global_mean_pool(x, batch)
return x
class TeamMatchPredictor(nn.Module):
"""
Predict match outcome from two team graphs.
"""
def __init__(self, node_features, hidden_dim=64):
super().__init__()
# GNN for encoding teams
self.team_encoder = FootballGNN(
node_features=node_features,
hidden_dim=hidden_dim,
output_dim=32
)
# Classifier
self.classifier = nn.Sequential(
nn.Linear(64, 32), # 32*2 for both teams
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(32, 3) # H/D/A
)
def forward(self, home_graph, away_graph):
# Encode both teams
home_embed = self.team_encoder(
home_graph.x, home_graph.edge_index, home_graph.batch
)
away_embed = self.team_encoder(
away_graph.x, away_graph.edge_index, away_graph.batch
)
# Concatenate and classify
combined = torch.cat([home_embed, away_embed], dim=1)
return self.classifier(combined)
def create_team_graph(events_df, players_df):
"""Create PyTorch Geometric graph from match data."""
# Node features (per player)
player_ids = players_df["player_id"].unique()
id_to_idx = {pid: i for i, pid in enumerate(player_ids)}
# Build node feature matrix
node_features = []
for pid in player_ids:
player_data = players_df[players_df["player_id"] == pid].iloc[0]
features = [
player_data["x_avg"] / 100,
player_data["y_avg"] / 100,
player_data["passes"] / 50,
player_data["touches"] / 100,
player_data["duels_won_pct"]
]
node_features.append(features)
x = torch.tensor(node_features, dtype=torch.float)
# Build edge index from passes
passes = events_df[events_df["type"] == "Pass"]
pass_counts = passes.groupby(["player_id", "recipient_id"]).size()
edges = []
edge_weights = []
for (passer, receiver), count in pass_counts.items():
if passer in id_to_idx and receiver in id_to_idx:
edges.append([id_to_idx[passer], id_to_idx[receiver]])
edge_weights.append(count)
edge_index = torch.tensor(edges, dtype=torch.long).t()
edge_attr = torch.tensor(edge_weights, dtype=torch.float).unsqueeze(1)
return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
# Create model
model = FootballGNN(node_features=5, hidden_dim=64)
print(f"GNN parameters: {sum(p.numel() for p in model.parameters()):,}")
# R: Graph neural networks concept
# GNNs in R typically use Python via reticulate
library(reticulate)
# Conceptual structure for football GNN:
# - Nodes: Players
# - Edges: Passes / spatial proximity
# - Node features: Player stats, position
# - Edge features: Pass count, distance
# Graph structure representation
create_match_graph <- function(events, players) {
# Build adjacency from passes
pass_edges <- events %>%
filter(type == "Pass") %>%
group_by(from = player_id, to = recipient_id) %>%
summarise(weight = n(), .groups = "drop")
# Node features (player stats for this match)
node_features <- players %>%
select(player_id, x_avg, y_avg, passes, touches, duels_won)
list(
edges = pass_edges,
nodes = node_features
)
}
GNN parameters: 12,768
Attention Mechanisms
Attention mechanisms allow models to focus on relevant parts of the input. For football, this helps identify key events in a sequence or important player interactions.
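Stripped of the multi-head machinery below, scaled dot-product attention is a few lines: scores between all pairs of event vectors, a softmax so each row is a distribution, and a weighted sum of values. The dimensions here are arbitrary toy values.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy sequence: 5 events, each an 8-dim embedding
d_k = 8
x = torch.randn(5, d_k)

# Self-attention: queries, keys, and values all come from the events
Q, K, V = x, x, x
scores = Q @ K.T / math.sqrt(d_k)      # pairwise similarity, scaled
weights = F.softmax(scores, dim=-1)    # each row sums to 1
context = weights @ V                  # attention-weighted mix of events

print(weights.shape)  # torch.Size([5, 5]): event-to-event attention
```

Row `i` of `weights` says how much event `i` attends to every other event; these are the weights the interpretability function below visualizes.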
# Python: Transformer attention for event sequences
import torch
import torch.nn as nn
import math
class MultiHeadAttention(nn.Module):
"""Multi-head self-attention layer."""
def __init__(self, d_model, num_heads, dropout=0.1):
super().__init__()
assert d_model % num_heads == 0
self.d_model = d_model
self.num_heads = num_heads
self.d_k = d_model // num_heads
self.W_q = nn.Linear(d_model, d_model)
self.W_k = nn.Linear(d_model, d_model)
self.W_v = nn.Linear(d_model, d_model)
self.W_o = nn.Linear(d_model, d_model)
self.dropout = nn.Dropout(dropout)
def forward(self, x, mask=None):
batch_size, seq_len, _ = x.size()
# Linear projections
Q = self.W_q(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
K = self.W_k(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
V = self.W_v(x).view(batch_size, seq_len, self.num_heads, self.d_k).transpose(1, 2)
# Attention scores
scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
if mask is not None:
# Padding mask arrives as (batch, seq_len) with True marking padded
# positions; expand it so it broadcasts over heads and query positions
scores = scores.masked_fill(mask.unsqueeze(1).unsqueeze(2), -1e9)
attention_weights = torch.softmax(scores, dim=-1)
attention_weights = self.dropout(attention_weights)
# Apply attention to values
context = torch.matmul(attention_weights, V)
# Concatenate heads
context = context.transpose(1, 2).contiguous().view(
batch_size, seq_len, self.d_model
)
return self.W_o(context), attention_weights
class EventTransformer(nn.Module):
"""
Transformer model for football event sequences.
Uses attention to identify important events.
"""
def __init__(self, num_event_types, d_model=64, num_heads=4,
num_layers=3, dropout=0.1):
super().__init__()
self.d_model = d_model
# Event embedding
self.event_embedding = nn.Embedding(num_event_types, d_model)
# Positional encoding
self.pos_embedding = nn.Embedding(500, d_model) # Max sequence length
# Transformer layers
self.layers = nn.ModuleList([
nn.TransformerEncoderLayer(
d_model=d_model,
nhead=num_heads,
dim_feedforward=d_model * 4,
dropout=dropout,
batch_first=True
)
for _ in range(num_layers)
])
# Custom attention for interpretability
self.attention = MultiHeadAttention(d_model, num_heads, dropout)
# Output
self.classifier = nn.Sequential(
nn.Linear(d_model, 32),
nn.ReLU(),
nn.Dropout(dropout),
nn.Linear(32, 1),
nn.Sigmoid()
)
def forward(self, event_types, mask=None):
batch_size, seq_len = event_types.size()
# Get embeddings
positions = torch.arange(seq_len, device=event_types.device)
x = self.event_embedding(event_types) + self.pos_embedding(positions)
# Apply transformer layers
for layer in self.layers:
x = layer(x, src_key_padding_mask=mask)
# Get attention weights for interpretability
_, attention_weights = self.attention(x, mask)
# Pool and classify (use CLS-like approach with first token or mean)
pooled = x.mean(dim=1)
output = self.classifier(pooled)
return output, attention_weights
# Example: Interpret which events matter
def interpret_attention(model, event_sequence, event_names):
"""Visualize attention weights to understand model focus."""
model.eval()
with torch.no_grad():
output, attention = model(event_sequence.unsqueeze(0))
# Average attention across heads
avg_attention = attention.mean(dim=1).squeeze()
# Get attention to each event
event_importance = avg_attention.mean(dim=0)
print("Event Importance (Attention Weights):")
for i, (name, weight) in enumerate(zip(event_names, event_importance)):
bar = "█" * int(weight * 50)
print(f" {name:15s} {weight:.3f} {bar}")
# Create model
transformer = EventTransformer(num_event_types=20)
print(f"Transformer parameters: {sum(p.numel() for p in transformer.parameters()):,}")
# R: Attention concept
# Attention weights show which events matter most
# Self-attention formula:
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) * V
# For football sequences:
# - Q, K, V derived from event embeddings
# - Attention weights reveal important events
# - High weights on shots, key passes, etc.
attention_weights_example <- function() {
# Example attention weights for a possession
events <- c("Pass", "Pass", "Dribble", "Pass", "Shot")
weights <- c(0.05, 0.08, 0.15, 0.22, 0.50)
tibble(event = events, weight = weights) %>%
ggplot(aes(x = seq_along(event), y = weight, fill = event)) +
geom_col() +
labs(title = "Attention Weights in Possession",
x = "Event Order", y = "Attention Weight")
}
Transformer parameters: 67,937
Event Importance (Attention Weights):
Pass 0.082 ████
Pass 0.095 ████
Carry 0.124 ██████
Dribble 0.189 █████████
Shot 0.510 █████████████████████████
Deep Learning for Expected Goals
Traditional xG models use logistic regression or gradient boosting. Deep learning can capture more complex spatial patterns and context from the sequence of events leading to a shot.
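For comparison, here is a minimal logistic-regression baseline of the traditional kind, fit on synthetic shots whose goal probability falls off with distance. The data-generating coefficients are invented for illustration; a real baseline would use recorded shot outcomes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic shots: distance and visible-angle features, with a goal
# probability that decays with distance (stand-in for real event data)
n = 2000
distance = rng.uniform(2, 40, n)   # metres from goal
angle = rng.uniform(0.1, 1.4, n)   # radians of goal mouth visible
p_goal = 1 / (1 + np.exp(0.18 * distance - 1.5 * angle))
goals = rng.binomial(1, p_goal)

X = np.column_stack([distance, angle])
xg_model = LogisticRegression().fit(X, goals)

# Predicted xG for a central penalty-spot shot vs a 30 m strike
close_shot = xg_model.predict_proba([[11.0, 0.6]])[0, 1]
long_shot = xg_model.predict_proba([[30.0, 0.3]])[0, 1]
print(f"xG at 11 m: {close_shot:.2f}, at 30 m: {long_shot:.2f}")
```

This baseline sees only the shot itself; the deep model below additionally encodes the event sequence leading to the shot, which is where it can gain over simple location-based features.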
# Python: Deep xG model with context
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, brier_score_loss
class DeepXGModel(nn.Module):
"""
Neural network xG model that considers:
1. Shot location and characteristics
2. Sequence of preceding events
3. Game state context
"""
def __init__(self, num_shot_features=12, num_event_types=20,
event_embed_dim=16, lstm_hidden=32):
super().__init__()
# Event embedding for sequence
self.event_embedding = nn.Embedding(num_event_types, event_embed_dim)
# LSTM for event sequence
self.lstm = nn.LSTM(
input_size=event_embed_dim + 4, # embed + x,y,dx,dy
hidden_size=lstm_hidden,
num_layers=2,
batch_first=True,
dropout=0.3
)
# Shot feature processing
self.shot_encoder = nn.Sequential(
nn.Linear(num_shot_features, 64),
nn.ReLU(),
nn.BatchNorm1d(64),
nn.Dropout(0.3),
nn.Linear(64, 32),
nn.ReLU()
)
# Combined classifier
combined_dim = 32 + lstm_hidden # shot + sequence
self.classifier = nn.Sequential(
nn.Linear(combined_dim, 64),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Sigmoid()
)
def forward(self, shot_features, event_types, event_locations, lengths):
"""
Args:
shot_features: (batch, num_shot_features) shot characteristics
event_types: (batch, max_seq_len) event type indices
event_locations: (batch, max_seq_len, 4) x,y,dx,dy
lengths: (batch,) actual sequence lengths
"""
# Encode shot
shot_encoded = self.shot_encoder(shot_features)
# Encode event sequence
event_embeds = self.event_embedding(event_types)
sequence_input = torch.cat([event_embeds, event_locations], dim=-1)
# Pack and process
packed = nn.utils.rnn.pack_padded_sequence(
sequence_input, lengths.cpu(),
batch_first=True, enforce_sorted=False
)
_, (hidden, _) = self.lstm(packed)
# Use final hidden state
sequence_encoded = hidden[-1]
# Combine and classify
combined = torch.cat([shot_encoded, sequence_encoded], dim=1)
return self.classifier(combined)
def prepare_shot_features(shots_df):
"""Extract features for each shot."""
# Goal location (center of goal)
goal_x, goal_y = 100, 50
features = pd.DataFrame({
# Location (normalized)
"x": shots_df["x"] / 100,
"y": shots_df["y"] / 100,
# Distance and angle
"distance": np.sqrt(
(goal_x - shots_df["x"])**2 + (goal_y - shots_df["y"])**2
) / 100,
"angle": np.abs(np.arctan2(
goal_y - shots_df["y"],
goal_x - shots_df["x"]
)),
# Shot type (one-hot)
"header": (shots_df["body_part"] == "Head").astype(int),
"right_foot": (shots_df["body_part"] == "Right Foot").astype(int),
"left_foot": (shots_df["body_part"] == "Left Foot").astype(int),
# Context
"under_pressure": shots_df["under_pressure"].fillna(0).astype(int),
"first_time": shots_df["first_time"].fillna(0).astype(int),
"counter": (shots_df["play_pattern"] == "From Counter").astype(int),
"set_piece": shots_df["play_pattern"].str.contains("Set").fillna(False).astype(int),
# Preceding event distance
"prev_event_dist": shots_df.get("prev_distance", 0) / 100
})
return features.values
def evaluate_xg_model(model, test_loader, device="cpu"):
"""Evaluate xG model performance."""
model.eval()
predictions = []
actuals = []
with torch.no_grad():
for batch in test_loader:
shot_feat, event_types, event_locs, lengths, target = batch
shot_feat = shot_feat.to(device)
event_types = event_types.to(device)
event_locs = event_locs.to(device)
pred = model(shot_feat, event_types, event_locs, lengths)
predictions.extend(pred.cpu().numpy().flatten())
actuals.extend(target.numpy().flatten())
predictions = np.array(predictions)
actuals = np.array(actuals)
return {
"auc": roc_auc_score(actuals, predictions),
"brier": brier_score_loss(actuals, predictions),
"log_loss": -np.mean(
actuals * np.log(predictions + 1e-7) +
(1 - actuals) * np.log(1 - predictions + 1e-7)
)
}
# Example
model = DeepXGModel()
print(f"Deep xG model parameters: {sum(p.numel() for p in model.parameters()):,}")

# R: Deep xG model with keras
library(keras)
library(tidyverse)
# Build deep xG model
build_deep_xg_model <- function() {
# Shot features input
shot_input <- layer_input(shape = 10, name = "shot_features")
# Sequence of preceding events (LSTM)
sequence_input <- layer_input(shape = c(10, 8), name = "event_sequence")
# Process sequence
sequence_processed <- sequence_input %>%
layer_lstm(32, return_sequences = FALSE) %>%
layer_dropout(0.3)
# Process shot features
shot_processed <- shot_input %>%
layer_dense(32, activation = "relu") %>%
layer_dropout(0.2)
# Combine
combined <- layer_concatenate(list(shot_processed, sequence_processed))
# Output probability
output <- combined %>%
layer_dense(32, activation = "relu") %>%
layer_dropout(0.2) %>%
layer_dense(1, activation = "sigmoid")
model <- keras_model(
inputs = list(shot_input, sequence_input),
outputs = output
)
model %>% compile(
optimizer = "adam",
loss = "binary_crossentropy",
metrics = c("AUC")
)
model
}
# Prepare xG features
prepare_xg_features <- function(shots_df) {
shots_df %>%
mutate(
# Distance and angle
distance_to_goal = sqrt((100 - x)^2 + (50 - y)^2),
angle_to_goal = atan2(50 - y, 100 - x) * 180 / pi,
# Shot characteristics (normalized)
x_norm = x / 100,
y_norm = y / 100,
distance_norm = distance_to_goal / 100,
# Categorical encodings
body_part_head = as.integer(body_part == "Head"),
shot_type_penalty = as.integer(shot_type == "Penalty"),
# Context
under_pressure = as.integer(under_pressure),
counter_attack = as.integer(play_pattern == "From Counter")
)
}

Deep xG model parameters: 26,177

Transfer Learning for Football
Transfer learning uses pre-trained models as starting points. For football, this includes pre-trained vision models for video analysis and language models for text data. We can also transfer embeddings between leagues or seasons.
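The league-to-league embedding transfer described above reduces to a linear least-squares mapping fitted on shared players. Before the full classes below, here is a minimal self-contained sketch of that idea, using randomly generated stand-in embeddings rather than real player vectors:

```python
import torch

torch.manual_seed(0)

# Hypothetical: 8 players appear in both leagues, embeddings are 4-dim.
# The target-league embeddings are a (noiseless) linear map of the source ones.
X_source = torch.randn(8, 4)
true_map = torch.randn(4, 4)
X_target = X_source @ true_map

# Fit W minimizing ||X_source @ W - X_target|| by least squares
W = torch.linalg.lstsq(X_source, X_target).solution

# Any new source-league embedding can now be projected into the target space
new_player = torch.randn(4)
transferred = new_player @ W

print(torch.allclose(X_source @ W, X_target, atol=1e-4))  # → True
```

With real data the map is noisy, so the fit is only approximate; anchoring on players with substantial minutes in both leagues helps.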
# Python: Transfer learning strategies
import torch
import torch.nn as nn
from torchvision import models
from torch.utils.data import DataLoader
class TransferLearningPlayer(nn.Module):
"""
Transfer learning for player classification from jersey images.
Uses pre-trained ResNet as feature extractor.
"""
def __init__(self, num_players, freeze_backbone=True):
super().__init__()
# Load pre-trained ResNet (the `pretrained=` flag is deprecated in newer torchvision)
self.backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Freeze backbone if specified
if freeze_backbone:
for param in self.backbone.parameters():
param.requires_grad = False
# Replace final layer
num_features = self.backbone.fc.in_features
self.backbone.fc = nn.Identity()
# Custom head for player classification
self.classifier = nn.Sequential(
nn.Linear(num_features, 512),
nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(512, num_players)
)
def forward(self, x):
features = self.backbone(x)
return self.classifier(features)
def unfreeze_backbone(self, num_layers=0):
"""Unfreeze last n layers for fine-tuning."""
layers = list(self.backbone.children())
if num_layers == 0:
# Unfreeze all
for param in self.backbone.parameters():
param.requires_grad = True
else:
# Unfreeze last n layers
for layer in layers[-num_layers:]:
for param in layer.parameters():
param.requires_grad = True
class EmbeddingTransfer:
"""
Transfer player embeddings between leagues or seasons.
Uses common players to learn a mapping.
"""
def __init__(self, source_embeddings, target_embeddings):
self.source = source_embeddings # dict: player_id -> embedding
self.target = target_embeddings
def find_common_players(self):
"""Find players present in both source and target."""
source_ids = set(self.source.keys())
target_ids = set(self.target.keys())
return source_ids.intersection(target_ids)
def learn_mapping(self, common_players):
"""Learn linear mapping from source to target space."""
# Gather paired embeddings
X_source = torch.stack([self.source[p] for p in common_players])
X_target = torch.stack([self.target[p] for p in common_players])
# Learn linear transformation W: source -> target
# Using least squares: W = (X_s^T X_s)^-1 X_s^T X_t
self.W = torch.linalg.lstsq(X_source, X_target).solution
return self.W
def transfer_embedding(self, source_embedding):
"""Map source embedding to target space."""
return source_embedding @ self.W
class DomainAdaptation(nn.Module):
"""
Domain adaptation for transferring models between leagues.
Uses gradient reversal for domain-invariant features.
"""
def __init__(self, input_dim, hidden_dim=64, output_dim=32):
super().__init__()
# Shared feature extractor
self.feature_extractor = nn.Sequential(
nn.Linear(input_dim, hidden_dim),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(hidden_dim, output_dim),
nn.ReLU()
)
# Task classifier (e.g., player position)
self.task_classifier = nn.Sequential(
nn.Linear(output_dim, 32),
nn.ReLU(),
nn.Linear(32, 4) # 4 positions
)
# Domain classifier (which league)
self.domain_classifier = nn.Sequential(
nn.Linear(output_dim, 32),
nn.ReLU(),
nn.Linear(32, 2) # 2 leagues
)
def forward(self, x, alpha=1.0):
"""
Args:
x: Input features
alpha: Gradient reversal scale (0 = no reversal, 1 = full)
"""
features = self.feature_extractor(x)
# Task prediction (normal gradient)
task_output = self.task_classifier(features)
# Domain prediction (reversed gradient for adversarial training)
reversed_features = GradientReversal.apply(features, alpha)
domain_output = self.domain_classifier(reversed_features)
return task_output, domain_output
class GradientReversal(torch.autograd.Function):
"""Gradient reversal layer for domain adaptation."""
@staticmethod
def forward(ctx, x, alpha):
ctx.alpha = alpha
return x.view_as(x)
@staticmethod
def backward(ctx, grad_output):
return grad_output.neg() * ctx.alpha, None
# Example fine-tuning schedule
def fine_tune_schedule(model, train_loader, epochs_frozen=10, epochs_unfrozen=20):
"""Two-stage fine-tuning: frozen backbone then unfrozen."""
optimizer = torch.optim.Adam(
filter(lambda p: p.requires_grad, model.parameters()),
lr=0.001
)
# Stage 1: Train with frozen backbone
print("Stage 1: Training classifier with frozen backbone...")
for epoch in range(epochs_frozen):
train_epoch(model, train_loader, optimizer)
# Stage 2: Unfreeze and fine-tune
print("\nStage 2: Fine-tuning full model...")
model.unfreeze_backbone(num_layers=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
for epoch in range(epochs_unfrozen):
train_epoch(model, train_loader, optimizer)
print("Transfer learning example initialized")

# R: Transfer learning concepts
library(keras)
# Transfer learning for player identification from images
build_player_id_model <- function(num_players, freeze_base = TRUE) {
# Load pre-trained ResNet50
base_model <- application_resnet50(
weights = "imagenet",
include_top = FALSE,
input_shape = c(224, 224, 3)
)
# Freeze base layers
if (freeze_base) {
freeze_weights(base_model)
}
# Add custom classification head
model <- keras_model_sequential() %>%
base_model %>%
layer_global_average_pooling_2d() %>%
layer_dense(512, activation = "relu") %>%
layer_dropout(0.5) %>%
layer_dense(num_players, activation = "softmax")
model %>% compile(
optimizer = optimizer_adam(learning_rate = 0.001),
loss = "categorical_crossentropy",
metrics = "accuracy"
)
model
}
# Transfer embeddings between leagues
transfer_embeddings <- function(source_embeddings, target_data,
common_players) {
# Find common players between leagues
common_indices <- which(rownames(source_embeddings) %in% common_players)
# Use common players to learn mapping
# Would train a linear transformation here
cat("Found", length(common_indices), "common players for transfer\n")
}

Training Best Practices
Deep learning requires careful attention to training dynamics. Here are key practices for football analytics applications.
- Temporal train/val/test splits (no leakage)
- Class balancing for rare events (goals)
- Feature normalization (StandardScaler)
- Data augmentation where applicable
- Cross-validation for small datasets
- Learning rate scheduling (warmup + decay)
- Early stopping on validation loss
- Gradient clipping for RNNs
- Dropout and regularization
- Mixed precision for speed
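The scheduling item above calls for warmup plus decay, while the trainer below uses ReduceLROnPlateau; both are reasonable choices. A minimal sketch of linear warmup followed by cosine decay via LambdaLR (the step counts and toy model are illustrative assumptions):

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(10, 3)  # toy model
base_lr = 1e-3
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)

warmup_steps, total_steps = 100, 1000  # illustrative values

def lr_lambda(step):
    """Multiplier on base_lr: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    optimizer.step()   # in real training: loss.backward() then step()
    scheduler.step()

# After the full schedule the learning rate has decayed to zero
print(f"{optimizer.param_groups[0]['lr']:.2e}")
```

Warmup stabilizes the first updates (useful with adaptive optimizers and small football datasets), and cosine decay avoids the abrupt drops of step schedules.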
# Python: Training best practices
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler
from sklearn.preprocessing import StandardScaler
import numpy as np
from typing import Dict, List
class TrainingConfig:
"""Configuration for training deep learning models."""
def __init__(self):
self.learning_rate = 0.001
self.batch_size = 32
self.epochs = 100
self.early_stopping_patience = 10
self.lr_scheduler_patience = 5
self.weight_decay = 1e-5
self.gradient_clip = 1.0
self.dropout = 0.3
class EarlyStopping:
"""Stop training when validation loss stops improving."""
def __init__(self, patience=10, min_delta=1e-4):
self.patience = patience
self.min_delta = min_delta
self.counter = 0
self.best_loss = float("inf")
self.should_stop = False
def __call__(self, val_loss):
if val_loss < self.best_loss - self.min_delta:
self.best_loss = val_loss
self.counter = 0
else:
self.counter += 1
if self.counter >= self.patience:
self.should_stop = True
return self.should_stop
class FootballTrainer:
"""Complete training pipeline for football DL models."""
def __init__(self, model, config: TrainingConfig, device="cuda"):
self.model = model.to(device)
self.config = config
self.device = device
self.optimizer = torch.optim.AdamW(
model.parameters(),
lr=config.learning_rate,
weight_decay=config.weight_decay
)
self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
self.optimizer,
patience=config.lr_scheduler_patience,
factor=0.5
)
self.early_stopping = EarlyStopping(config.early_stopping_patience)
self.history = {"train_loss": [], "val_loss": [], "val_metric": []}
def get_class_weights(self, labels) -> torch.Tensor:
"""Calculate weights for imbalanced classes."""
class_counts = np.bincount(labels)
total = len(labels)
weights = total / (len(class_counts) * class_counts)
return torch.tensor(weights, dtype=torch.float32, device=self.device)
def get_weighted_sampler(self, labels):
"""Create weighted sampler for imbalanced datasets."""
class_weights = self.get_class_weights(labels)
sample_weights = class_weights[labels]
return WeightedRandomSampler(
sample_weights, len(sample_weights), replacement=True
)
def train_epoch(self, train_loader, criterion):
"""Run one training epoch."""
self.model.train()
total_loss = 0
num_batches = 0
for batch in train_loader:
self.optimizer.zero_grad()
# Forward pass (adapt based on your model)
inputs, targets = batch[0].to(self.device), batch[1].to(self.device)
outputs = self.model(inputs)
loss = criterion(outputs, targets)
# Backward pass
loss.backward()
# Gradient clipping
torch.nn.utils.clip_grad_norm_(
self.model.parameters(), self.config.gradient_clip
)
self.optimizer.step()
total_loss += loss.item()
num_batches += 1
return total_loss / num_batches
def validate(self, val_loader, criterion, metric_fn=None):
"""Evaluate on validation set."""
self.model.eval()
total_loss = 0
all_preds = []
all_targets = []
with torch.no_grad():
for batch in val_loader:
inputs, targets = batch[0].to(self.device), batch[1].to(self.device)
outputs = self.model(inputs)
loss = criterion(outputs, targets)
total_loss += loss.item()
all_preds.append(outputs.cpu())
all_targets.append(targets.cpu())
avg_loss = total_loss / len(val_loader)
metric = None
if metric_fn:
all_preds = torch.cat(all_preds)
all_targets = torch.cat(all_targets)
metric = metric_fn(all_preds, all_targets)
return avg_loss, metric
def fit(self, train_loader, val_loader, criterion, metric_fn=None):
"""Full training loop with best practices."""
best_model_state = None
best_val_loss = float("inf")
for epoch in range(self.config.epochs):
# Training
train_loss = self.train_epoch(train_loader, criterion)
# Validation
val_loss, val_metric = self.validate(val_loader, criterion, metric_fn)
# Learning rate scheduling
self.scheduler.step(val_loss)
# Track history
self.history["train_loss"].append(train_loss)
self.history["val_loss"].append(val_loss)
if val_metric is not None:
self.history["val_metric"].append(val_metric)
# Save best model
if val_loss < best_val_loss:
best_val_loss = val_loss
best_model_state = self.model.state_dict().copy()
# Logging
if (epoch + 1) % 5 == 0:
metric_str = f", Metric: {val_metric:.4f}" if val_metric is not None else ""
print(f"Epoch {epoch+1}: Train={train_loss:.4f}, "
f"Val={val_loss:.4f}{metric_str}")
# Early stopping
if self.early_stopping(val_loss):
print(f"\nEarly stopping at epoch {epoch+1}")
break
# Load best model
if best_model_state:
self.model.load_state_dict(best_model_state)
return self.history
def temporal_train_test_split(df, date_col, train_ratio=0.7, val_ratio=0.15):
"""Split data temporally to avoid leakage."""
df = df.sort_values(date_col)
n = len(df)
train_end = int(n * train_ratio)
val_end = int(n * (train_ratio + val_ratio))
return {
"train": df.iloc[:train_end],
"val": df.iloc[train_end:val_end],
"test": df.iloc[val_end:]
}
# Example usage
config = TrainingConfig()
print(f"Training config: lr={config.learning_rate}, batch={config.batch_size}")

# R: Training best practices
library(keras)
library(tidyverse)
# Temporal split for football data
temporal_split <- function(data, train_end, val_end) {
list(
train = data %>% filter(date < train_end),
val = data %>% filter(date >= train_end, date < val_end),
test = data %>% filter(date >= val_end)
)
}
# Class weighting for imbalanced data
calculate_class_weights <- function(y) {
counts <- table(y)
total <- sum(counts)
n_classes <- length(counts)
weights <- total / (n_classes * counts)
as.list(weights)
}
# Callbacks for training
training_callbacks <- function(model_path = "best_model.h5") {
list(
callback_early_stopping(
monitor = "val_loss",
patience = 10,
restore_best_weights = TRUE
),
callback_reduce_lr_on_plateau(
monitor = "val_loss",
factor = 0.5,
patience = 5
),
callback_model_checkpoint(
filepath = model_path,
save_best_only = TRUE,
monitor = "val_loss"
)
)
}

Model Deployment
Deploying deep learning models for football analytics requires consideration of latency, scalability, and model updates. Here are patterns for production deployment.
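Before choosing a serving pattern, it is worth measuring the gap between per-shot and batched inference directly. A rough benchmark sketch on a toy stand-in model (absolute timings vary by hardware, so treat the numbers as relative):

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
model.eval()

shots = torch.randn(256, 5)  # 256 hypothetical shot feature vectors

with torch.no_grad():
    # One forward pass per shot (the naive API pattern)
    start = time.perf_counter()
    single = torch.cat([model(s.unsqueeze(0)) for s in shots])
    t_single = time.perf_counter() - start

    # One batched forward pass
    start = time.perf_counter()
    batched = model(shots)
    t_batch = time.perf_counter() - start

# Predictions match; the batched path is usually far faster per shot
print(torch.allclose(single, batched, atol=1e-5), f"{t_single / t_batch:.1f}x")
```

This motivates the BatchPredictor below and the batch-prediction endpoint pattern: collect requests, run them in one forward pass, and return per-shot results.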
# Python: Model deployment with FastAPI
import torch
import torch.nn as nn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import numpy as np
import pickle
from typing import List, Optional
app = FastAPI(title="Football xG API")
# Global model cache
model_cache = {}
class ShotInput(BaseModel):
x: float
y: float
body_part: str = "Right Foot"
under_pressure: bool = False
class PredictionOutput(BaseModel):
xg: float
confidence_interval: List[float]
class ModelServer:
"""Serve deep learning models for predictions."""
def __init__(self, model_path: str, scaler_path: str):
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model
self.model = self._load_model(model_path)
self.model.eval()
# Load preprocessing
with open(scaler_path, "rb") as f:
self.scaler = pickle.load(f)
def _load_model(self, path):
"""Load model with proper device mapping."""
model = torch.load(path, map_location=self.device)
return model.to(self.device)
def preprocess(self, shot: ShotInput) -> torch.Tensor:
"""Convert input to model-ready tensor."""
features = np.array([
shot.x / 100,
shot.y / 100,
np.sqrt((100 - shot.x)**2 + (50 - shot.y)**2) / 100,
1 if shot.body_part == "Head" else 0,
1 if shot.under_pressure else 0
]).reshape(1, -1)
features_scaled = self.scaler.transform(features)
return torch.tensor(features_scaled, dtype=torch.float32, device=self.device)
@torch.no_grad()
def predict(self, shot: ShotInput) -> PredictionOutput:
"""Generate xG prediction."""
x = self.preprocess(shot)
# Get prediction
xg = self.model(x).item()
# Estimate confidence (using dropout at inference for uncertainty)
predictions = []
self.model.train() # Enable dropout (note: this also puts BatchNorm layers in train mode)
for _ in range(100):
pred = self.model(x).item()
predictions.append(pred)
self.model.eval()
ci_low = np.percentile(predictions, 2.5)
ci_high = np.percentile(predictions, 97.5)
return PredictionOutput(
xg=xg,
confidence_interval=[ci_low, ci_high]
)
# Batch prediction for efficiency
class BatchPredictor:
"""Efficient batch prediction for multiple shots."""
def __init__(self, model, scaler, batch_size=32):
self.model = model
self.scaler = scaler
self.batch_size = batch_size
self.device = next(model.parameters()).device
@torch.no_grad()
def predict_batch(self, shots: List[ShotInput]) -> List[float]:
"""Predict xG for multiple shots efficiently."""
self.model.eval()
# Preprocess all shots
features = []
for shot in shots:
feat = [
shot.x / 100,
shot.y / 100,
np.sqrt((100 - shot.x)**2 + (50 - shot.y)**2) / 100,
1 if shot.body_part == "Head" else 0,
1 if shot.under_pressure else 0
]
features.append(feat)
features = np.array(features)
features_scaled = self.scaler.transform(features)
x = torch.tensor(features_scaled, dtype=torch.float32, device=self.device)
# Batch prediction
predictions = []
for i in range(0, len(x), self.batch_size):
batch = x[i:i + self.batch_size]
pred = self.model(batch)
predictions.extend(pred.cpu().numpy().flatten())
return predictions
# ONNX export for production
def export_to_onnx(model, sample_input, output_path):
"""Export PyTorch model to ONNX for deployment."""
torch.onnx.export(
model,
sample_input,
output_path,
input_names=["features"],
output_names=["xg"],
dynamic_axes={
"features": {0: "batch_size"},
"xg": {0: "batch_size"}
}
)
print(f"Model exported to {output_path}")
# TorchScript for production
def export_to_torchscript(model, sample_input, output_path):
"""Export to TorchScript for C++ deployment."""
traced = torch.jit.trace(model, sample_input)
traced.save(output_path)
print(f"TorchScript model saved to {output_path}")
print("Deployment utilities ready")

# R: Model deployment considerations
library(plumber)
library(keras)
# Save model for deployment
save_for_deployment <- function(model, scaler, label_encoder, path) {
# Save keras model
save_model_hdf5(model, paste0(path, "/model.h5"))
# Save preprocessing objects
saveRDS(scaler, paste0(path, "/scaler.rds"))
saveRDS(label_encoder, paste0(path, "/label_encoder.rds"))
cat("Model saved to:", path, "\n")
}
# Plumber API endpoint example
#* @post /predict
#* @param shot_x Shot x coordinate
#* @param shot_y Shot y coordinate
function(shot_x, shot_y) {
# Load model (would be cached in production)
# model <- load_model_hdf5("model.h5")
# Prepare input
features <- c(as.numeric(shot_x), as.numeric(shot_y))
# Predict
# xg <- predict(model, matrix(features, nrow = 1))
list(xg = 0.15) # Placeholder
}

Advanced Architectures
Beyond basic architectures, specialized designs can better capture football-specific patterns.
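A practical detail the multi-task model below leaves implicit: training optimizes a weighted sum of per-task losses. A minimal sketch with toy tensors standing in for model outputs (the task weights are illustrative tuning knobs, not canonical values):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy predictions/targets for a batch of 8 (stand-ins for the model's heads)
xg_pred = torch.rand(8, 1).clamp(1e-4, 1 - 1e-4)
pass_pred = torch.rand(8, 1).clamp(1e-4, 1 - 1e-4)
event_logits = torch.randn(8, 10)

xg_target = torch.randint(0, 2, (8, 1)).float()
pass_target = torch.randint(0, 2, (8, 1)).float()
event_target = torch.randint(0, 10, (8,))

bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()
losses = {
    "xg": bce(xg_pred, xg_target),
    "pass": bce(pass_pred, pass_target),
    "event": ce(event_logits, event_target),
}

# Weighted sum; weights balance tasks whose losses live on different scales
weights = {"xg": 1.0, "pass": 0.5, "event": 0.5}
total_loss = sum(weights[k] * loss for k, loss in losses.items())
print(total_loss.item() > 0)  # → True
```

Calling `total_loss.backward()` then propagates gradients through the shared backbone from all three heads at once, which is the point of sharing it.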
# Python: Advanced football-specific architectures
import torch
import torch.nn as nn
import torch.nn.functional as F
class HierarchicalMatchModel(nn.Module):
"""
Hierarchical model: Match -> Possessions -> Events
Learns representations at each level.
"""
def __init__(self, event_vocab_size, embed_dim=32, hidden_dim=64):
super().__init__()
# Event-level encoding (ModuleList, since the embedding and LSTM
# are called separately in encode_events)
self.event_encoder = nn.ModuleList([
nn.Embedding(event_vocab_size, embed_dim),
nn.LSTM(embed_dim + 4, hidden_dim, batch_first=True)
])
# Possession-level encoding (over event summaries)
self.possession_encoder = nn.LSTM(
hidden_dim, hidden_dim, batch_first=True
)
# Match-level encoding (over possession summaries)
self.match_encoder = nn.LSTM(
hidden_dim, hidden_dim, batch_first=True
)
# Output heads
self.xg_head = nn.Linear(hidden_dim, 1)
self.possession_outcome_head = nn.Linear(hidden_dim, 4)
self.match_outcome_head = nn.Linear(hidden_dim, 3)
def encode_events(self, events, locations):
"""Encode a sequence of events."""
embeds = self.event_encoder[0](events)
x = torch.cat([embeds, locations], dim=-1)
_, (h, _) = self.event_encoder[1](x)
return h.squeeze(0)
class MultiTaskFootballModel(nn.Module):
"""
Multi-task model for football predictions.
Shared backbone, task-specific heads.
"""
def __init__(self, input_dim, hidden_dim=128):
super().__init__()
# Shared backbone
self.backbone = nn.Sequential(
nn.Linear(input_dim, hidden_dim),
nn.ReLU(),
nn.BatchNorm1d(hidden_dim),
nn.Dropout(0.3),
nn.Linear(hidden_dim, hidden_dim),
nn.ReLU(),
nn.BatchNorm1d(hidden_dim),
nn.Dropout(0.3),
nn.Linear(hidden_dim, 64),
nn.ReLU()
)
# Task-specific heads
self.xg_head = nn.Sequential(
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Sigmoid()
)
self.pass_success_head = nn.Sequential(
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Sigmoid()
)
self.event_type_head = nn.Sequential(
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10) # 10 event types
)
def forward(self, x, task="all"):
"""
Forward pass for specified task(s).
Args:
x: Input features
task: "xg", "pass", "event", or "all"
"""
features = self.backbone(x)
if task == "xg":
return self.xg_head(features)
elif task == "pass":
return self.pass_success_head(features)
elif task == "event":
return self.event_type_head(features)
else:
return {
"xg": self.xg_head(features),
"pass_success": self.pass_success_head(features),
"event_type": self.event_type_head(features)
}
class SpatialAttentionModel(nn.Module):
"""
Model with spatial attention for pitch-aware predictions.
"""
def __init__(self, hidden_dim=64):
super().__init__()
# Spatial encoder (pitch as 2D grid)
self.spatial_conv = nn.Sequential(
nn.Conv2d(1, 16, kernel_size=3, padding=1),
nn.ReLU(),
nn.MaxPool2d(2),
nn.Conv2d(16, 32, kernel_size=3, padding=1),
nn.ReLU(),
nn.MaxPool2d(2),
nn.Conv2d(32, 64, kernel_size=3, padding=1),
nn.ReLU()
)
# Spatial attention
self.spatial_attention = nn.Sequential(
nn.Conv2d(64, 1, kernel_size=1),
nn.Sigmoid()
)
# Output
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(64 * 20 * 14, hidden_dim), # two 2x2 pools: 80x56 -> 20x14
nn.ReLU(),
nn.Linear(hidden_dim, 1),
nn.Sigmoid()
)
def forward(self, pitch_heatmap):
"""
Args:
pitch_heatmap: (batch, 1, 80, 56) spatial distribution
"""
features = self.spatial_conv(pitch_heatmap)
# Apply spatial attention
attention = self.spatial_attention(features)
attended = features * attention
return self.classifier(attended), attention
# Variational model for uncertainty
class VariationalXGModel(nn.Module):
"""
Variational xG model that provides uncertainty estimates.
"""
def __init__(self, input_dim, latent_dim=16, hidden_dim=64):
super().__init__()
# Encoder to latent space
self.encoder = nn.Sequential(
nn.Linear(input_dim, hidden_dim),
nn.ReLU(),
nn.Linear(hidden_dim, hidden_dim),
nn.ReLU()
)
self.mu_layer = nn.Linear(hidden_dim, latent_dim)
self.logvar_layer = nn.Linear(hidden_dim, latent_dim)
# Decoder/predictor
self.decoder = nn.Sequential(
nn.Linear(latent_dim, hidden_dim),
nn.ReLU(),
nn.Linear(hidden_dim, 1),
nn.Sigmoid()
)
def reparameterize(self, mu, logvar):
"""Sample from latent distribution."""
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
return mu + eps * std
def forward(self, x, num_samples=1):
"""
Forward pass with optional multiple samples for uncertainty.
"""
h = self.encoder(x)
mu = self.mu_layer(h)
logvar = self.logvar_layer(h)
if num_samples == 1:
z = self.reparameterize(mu, logvar)
return self.decoder(z), mu, logvar
else:
predictions = []
for _ in range(num_samples):
z = self.reparameterize(mu, logvar)
pred = self.decoder(z)
predictions.append(pred)
return torch.stack(predictions), mu, logvar
print("Advanced architectures defined")

# R: Advanced architecture concepts
# Hierarchical model for match -> possession -> event
# Would be implemented in Python typically
# Concept: Multi-task learning
# Simultaneously predict:
# 1. xG (shot outcome)
# 2. Pass success probability
# 3. Player value
# Shared lower layers, task-specific heads

Practice Exercises
Hands-On Practice
Complete these exercises to master deep learning for football:
Build a feedforward neural network to predict match outcomes (H/D/A) from team statistics. Compare performance against logistic regression. Use proper train/validation/test splits.
Implement an LSTM that predicts whether a possession will end in a shot. Use StatsBomb event data to create sequences. Evaluate with ROC-AUC.
Train an autoencoder on player statistics to create 16-dimensional embeddings. Find the 5 most similar players to a target player. Visualize embeddings with t-SNE.
Build a transformer model for event sequences. Analyze the attention weights - do they align with intuition about important events (shots, key passes)?
Implement a deep learning xG model that incorporates the sequence of events leading to a shot. Compare performance (AUC, Brier score) against a traditional logistic regression model.
Hint
Use an LSTM to encode the preceding 5-10 events, concatenate with shot location features, and pass through dense layers. The sequence context should improve predictions for open-play shots.
Build a Graph Neural Network where nodes are players and edges are passes. Use the GNN to generate a team-level embedding and predict match outcomes from two team graphs.
Hint
Start with GCNConv layers for message passing. Use global_mean_pool to aggregate node embeddings into a single team vector. Combine home and away team vectors for match prediction.
Train a player embedding model on Premier League data. Transfer the embeddings to La Liga using players who played in both leagues as anchors. Evaluate whether the transferred embeddings capture similar player roles.
Deploy your trained xG model as a REST API using FastAPI. Implement batch prediction, uncertainty estimates via MC Dropout, and model versioning. Benchmark latency for single and batch predictions.
Hint
Export your model to TorchScript or ONNX for faster inference. Use caching for the model and preprocessing objects. Implement async endpoints for high throughput.
Summary
Key Takeaways
- Feedforward networks combine diverse features for match prediction
- Team embeddings via shared weight matrices capture team identity
- RNNs (LSTM/GRU) model sequential event data with memory
- Player embeddings capture playing style in dense vectors (autoencoders, contrastive)
- Graph Neural Networks learn from player interaction networks
- Attention mechanisms identify important events and enable interpretability
- Deep xG models incorporate sequence context for better predictions
- Transfer learning reuses pre-trained models and cross-league embeddings
- Deep learning requires careful hyperparameter tuning, temporal splits, and sufficient data
Common Pitfalls
- Data leakage: Random splits on football data cause leakage—always use temporal splits
- Overfitting: Football datasets are often small—use regularization, dropout, early stopping
- Class imbalance: Goals are rare events—use weighted loss, oversampling, or focal loss
- Sequence length variance: Possessions vary in length—use packed sequences properly
- Vanishing gradients: Long sequences can cause issues—use LSTM/GRU over vanilla RNN
- GPU memory: Graph batching in GNNs requires care—use PyG's batch utilities
- Deployment latency: Complex models are slow—profile and optimize for production
- Model interpretability: Black-box predictions are hard to trust—use attention visualization
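The class-imbalance entry above mentions focal loss; a minimal binary focal loss sketch (alpha and gamma follow the commonly cited defaults from Lin et al., but treat them as tunable hyperparameters):

```python
import torch

def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Down-weight easy examples so rare positives (goals) dominate the gradient."""
    probs = probs.clamp(eps, 1 - eps)
    # Probability assigned to the true class, and the class weighting term
    p_t = probs * targets + (1 - probs) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()

probs = torch.tensor([0.9, 0.1, 0.5])    # model outputs
targets = torch.tensor([1.0, 0.0, 1.0])  # 1 = goal

# A confident correct prediction contributes far less than an uncertain one
easy = binary_focal_loss(probs[:1], targets[:1])
hard = binary_focal_loss(probs[2:], targets[2:])
print(bool(easy < hard))  # → True
```

The `(1 - p_t) ** gamma` factor is what distinguishes this from weighted cross-entropy: it shrinks the loss on examples the model already gets right.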
Essential Libraries
Python Libraries:
- torch - PyTorch deep learning
- tensorflow/keras - Alternative framework
- torch-geometric - Graph neural networks
- transformers - Pre-trained models
- scikit-learn - Preprocessing, metrics
- onnx/onnxruntime - Model export
- fastapi - Model serving
- wandb/tensorboard - Experiment tracking
R Packages:
- keras/keras3 - Deep learning in R
- tensorflow - TensorFlow backend
- torch - PyTorch in R
- reticulate - Python interop
- caret - Model evaluation
- plumber - API deployment
Model Complexity vs. Performance
| Model Type | Params | Training Time | Inference | Best For |
|---|---|---|---|---|
| Logistic Regression | ~100 | Seconds | <1ms | Baselines, interpretability |
| Feedforward NN | ~10K | Minutes | ~1ms | Tabular features |
| LSTM/GRU | ~50K | Hours | ~5ms | Event sequences |
| Transformer | ~100K+ | Hours | ~10ms | Long sequences, attention |
| GNN | ~20K | Hours | ~5ms | Pass networks, team analysis |
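The parameter counts in the table are order-of-magnitude figures, and they are easy to check directly. A quick sketch counting parameters for a small feedforward network and a single-layer LSTM (the layer sizes are illustrative):

```python
import torch.nn as nn

def count_params(module):
    # Total number of learnable scalars in the module
    return sum(p.numel() for p in module.parameters())

# 20 input features -> H/D/A classifier, vs an LSTM over 20-dim event vectors
ff = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 3),
)
lstm = nn.LSTM(input_size=20, hidden_size=64, batch_first=True)

print(count_params(ff), count_params(lstm))  # → 3523 22016
```

Even a modest LSTM carries several times the parameters of a comparable feedforward net, which is part of why sequence models need more data and longer training.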
When to Use Deep Learning
Deep learning isn't always the answer. Consider using it when:
- Large datasets: You have thousands of samples (matches, possessions)
- Sequential data: Event sequences benefit from memory (LSTM/GRU)
- Graph structure: Pass networks are naturally suited to GNNs
- Feature learning: When manual features are insufficient
- Transfer needed: Pre-trained models accelerate development
For smaller datasets or when interpretability is critical, start with simpler models (logistic regression, gradient boosting) and only move to deep learning if needed.
Deep learning opens up new possibilities for football analytics, from better predictions to richer player representations. In the next chapter, we'll explore simulation and agent-based modeling for tactical analysis.