Capstone - Complete Analytics System
Introduction to Manager Analytics
Football managers are often credited with success or blamed for failure, but analytically evaluating their impact is complex. How do we separate manager skill from player quality? Can we identify distinctive tactical signatures? This chapter explores frameworks for manager analysis.
The Manager Evaluation Challenge
Managers influence team performance through tactics, player development, squad management, and motivation. Isolating their contribution from the players' inherent abilities and organizational factors requires careful analytical approaches.
# Python: Manager Performance Analysis Framework
import pandas as pd
import numpy as np
# Manager career data
manager_data = pd.DataFrame({
"manager": ["Manager A"]*3 + ["Manager B"]*3 + ["Manager C"]*3,
"club": ["Club X"]*3 + ["Club Y", "Club Y", "Club Z"] + ["Club W", "Club W", "Club V"],
"season": ["2021-22", "2022-23", "2023-24"]*3,
"matches": [38]*9,
"wins": [22, 25, 28, 15, 18, 20, 12, 16, 14],
"draws": [8, 7, 5, 10, 9, 10, 12, 10, 8],
"losses": [8, 6, 5, 13, 11, 8, 14, 12, 16],
"goals_for": [68, 75, 82, 52, 58, 62, 45, 50, 48],
"goals_against": [35, 32, 28, 48, 45, 40, 55, 48, 58],
"xG_for": [62.5, 70.1, 78.3, 48.2, 55.6, 60.2, 42.1, 52.4, 45.8],
"xG_against": [38.2, 35.8, 30.5, 50.1, 47.3, 42.8, 52.8, 48.9, 55.2],
"budget_rank": [3, 2, 1, 8, 7, 5, 12, 10, 15]
})
# Calculate metrics
manager_data["points"] = manager_data["wins"] * 3 + manager_data["draws"]
manager_data["ppg"] = manager_data["points"] / manager_data["matches"]
manager_data["gd"] = manager_data["goals_for"] - manager_data["goals_against"]
manager_data["xgd"] = manager_data["xG_for"] - manager_data["xG_against"]
manager_data["goal_overperformance"] = manager_data["goals_for"] - manager_data["xG_for"]
manager_data["defensive_overperformance"] = manager_data["xG_against"] - manager_data["goals_against"]
manager_data["total_overperformance"] = manager_data["gd"] - manager_data["xgd"]
manager_data["win_rate"] = manager_data["wins"] / manager_data["matches"] * 100
# Career summaries
def weighted_avg(group, value_col, weight_col):
    """Average of value_col weighted by weight_col within one group."""
    return np.average(group[value_col], weights=group[weight_col])

career_summary = manager_data.groupby("manager").apply(
    lambda g: pd.Series({
        "seasons": len(g),
        "total_matches": g["matches"].sum(),
        "avg_ppg": weighted_avg(g, "ppg", "matches"),
        "avg_win_rate": weighted_avg(g, "win_rate", "matches"),
        "avg_budget_rank": g["budget_rank"].mean(),
        "avg_xgd": weighted_avg(g, "xgd", "matches"),
        "avg_overperformance": weighted_avg(g, "total_overperformance", "matches")
    })
).reset_index()
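# Resource-adjusted performance (crude heuristic: assumes budget_rank 1 is the
# largest budget, so a smaller budget scales PPG up)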
career_summary["resource_adjusted_ppg"] = (
career_summary["avg_ppg"] * (career_summary["avg_budget_rank"] / 10)
)
career_summary = career_summary.sort_values("avg_ppg", ascending=False)
print("Manager Career Summaries:")
print(career_summary.to_string(index=False))
# R: Manager Performance Analysis Framework
library(tidyverse)
# Sample manager career data
manager_data <- tribble(
~manager, ~club, ~season, ~matches, ~wins, ~draws, ~losses,
~goals_for, ~goals_against, ~xG_for, ~xG_against, ~budget_rank,
"Manager A", "Club X", "2021-22", 38, 22, 8, 8, 68, 35, 62.5, 38.2, 3,
"Manager A", "Club X", "2022-23", 38, 25, 7, 6, 75, 32, 70.1, 35.8, 2,
"Manager A", "Club X", "2023-24", 38, 28, 5, 5, 82, 28, 78.3, 30.5, 1,
"Manager B", "Club Y", "2021-22", 38, 15, 10, 13, 52, 48, 48.2, 50.1, 8,
"Manager B", "Club Y", "2022-23", 38, 18, 9, 11, 58, 45, 55.6, 47.3, 7,
"Manager B", "Club Z", "2023-24", 38, 20, 10, 8, 62, 40, 60.2, 42.8, 5,
"Manager C", "Club W", "2021-22", 38, 12, 12, 14, 45, 55, 42.1, 52.8, 12,
"Manager C", "Club W", "2022-23", 38, 16, 10, 12, 50, 48, 52.4, 48.9, 10,
"Manager C", "Club V", "2023-24", 38, 14, 8, 16, 48, 58, 45.8, 55.2, 15
)
# Calculate performance metrics
manager_metrics <- manager_data %>%
mutate(
# Points and PPG
points = wins * 3 + draws,
ppg = points / matches,
# Goal difference and xG difference
gd = goals_for - goals_against,
xgd = xG_for - xG_against,
# Overperformance (actual vs expected)
goal_overperformance = goals_for - xG_for,
defensive_overperformance = xG_against - goals_against,
total_overperformance = gd - xgd,
# Win rate
win_rate = wins / matches * 100
)
# Career summaries
career_summary <- manager_metrics %>%
group_by(manager) %>%
summarise(
seasons = n(),
total_matches = sum(matches),
avg_ppg = weighted.mean(ppg, matches),
avg_win_rate = weighted.mean(win_rate, matches),
avg_budget_rank = mean(budget_rank),
avg_xgd = weighted.mean(xgd, matches),
avg_overperformance = weighted.mean(total_overperformance, matches),
.groups = "drop"
) %>%
mutate(
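# Performance relative to resources (crude heuristic: assumes budget_rank 1 is
# the largest budget, so a smaller budget scales PPG up)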
resource_adjusted_ppg = avg_ppg * (avg_budget_rank / 10)
) %>%
arrange(desc(avg_ppg))
print("Manager Career Summaries:")
print(career_summary)
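The rank multiplier in the listings above grows linearly with budget rank, so it can reward a small budget more than the results justify. One alternative, offered here only as a sketch, is to fit PPG against budget rank and credit each manager with the residual. The values below are recomputed from the sample seasons above; the names `seasons` and `ppg_above_budget` are illustrative, and nine rows are far too few for the fit to mean anything beyond demonstrating the mechanics.

```python
from statistics import mean

# (manager, budget_rank, points per game) for the nine sample seasons above;
# ppg = (3 * wins + draws) / 38
seasons = [
    ("Manager A", 3, 74 / 38), ("Manager A", 2, 82 / 38), ("Manager A", 1, 89 / 38),
    ("Manager B", 8, 55 / 38), ("Manager B", 7, 63 / 38), ("Manager B", 5, 70 / 38),
    ("Manager C", 12, 48 / 38), ("Manager C", 10, 58 / 38), ("Manager C", 15, 50 / 38),
]

ranks = [rank for _, rank, _ in seasons]
ppgs = [ppg for _, _, ppg in seasons]

# Ordinary least squares fit of ppg on budget rank (single predictor)
rank_mean, ppg_mean = mean(ranks), mean(ppgs)
slope = (
    sum((r - rank_mean) * (p - ppg_mean) for r, p in zip(ranks, ppgs))
    / sum((r - rank_mean) ** 2 for r in ranks)
)
intercept = ppg_mean - slope * rank_mean

# Residual = actual ppg minus the ppg predicted from budget rank alone
residuals_by_manager = {}
for manager, rank, ppg in seasons:
    predicted = intercept + slope * rank
    residuals_by_manager.setdefault(manager, []).append(ppg - predicted)

ppg_above_budget = {m: mean(r) for m, r in residuals_by_manager.items()}
for manager, value in sorted(ppg_above_budget.items(), key=lambda kv: -kv[1]):
    print(f"{manager}: {value:+.3f} ppg vs. budget expectation")
```

A positive value means the manager's seasons sat above the fitted line for their budget rank; because the fit includes an intercept, the residuals sum to zero across the sample.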
Tactical Fingerprints
Every manager has distinctive tactical tendencies that create a "fingerprint" visible in team performance data. Identifying these patterns helps understand managerial style and predict how teams will perform under new leadership.
- Attacking Style: build-up patterns, shot locations, crossing tendency
- Defensive Style: pressing height, defensive line, approach to duels
- Transition Style: counter-attacking pace, possession recycling
# Python: Tactical Fingerprint Analysis
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
# Team tactical metrics by manager
tactical_data = pd.DataFrame({
"manager": ["Guardiola", "Klopp", "Mourinho", "Arteta", "Conte", "Ancelotti"],
"possession": [67.2, 58.5, 45.2, 62.8, 48.5, 55.2],
"ppda": [7.8, 8.2, 12.5, 9.1, 11.2, 10.5],
"deep_completions": [15.2, 12.8, 8.5, 13.5, 9.2, 11.8],
"crosses_per_90": [18.5, 22.3, 15.2, 16.8, 20.5, 19.2],
"shots_from_counter": [8, 18, 22, 12, 25, 15],
"high_press_pct": [42, 55, 28, 48, 32, 38],
"avg_pass_length": [14.2, 16.8, 18.5, 15.5, 17.2, 16.2],
"progressive_passes_p90": [78.5, 65.2, 42.8, 72.1, 48.5, 58.3]
})
class TacticalFingerprint:
    """Analyze manager tactical styles"""

    def __init__(self, data):
        self.data = data
        self.metrics = [c for c in data.columns if c != "manager"]
        self._normalize()

    def _normalize(self):
        """Normalize metrics to 0-100 scale"""
        scaler = MinMaxScaler(feature_range=(0, 100))
        normalized = scaler.fit_transform(self.data[self.metrics])
        self.normalized = self.data[["manager"]].copy()
        for i, metric in enumerate(self.metrics):
            self.normalized[f"{metric}_norm"] = normalized[:, i]

    def get_profile(self, manager_name):
        """Get tactical profile for a manager"""
        profile = self.normalized[self.normalized["manager"] == manager_name]
        return profile.iloc[0]

    def compare_managers(self, manager1, manager2):
        """Compare tactical profiles of two managers"""
        p1 = self.get_profile(manager1)
        p2 = self.get_profile(manager2)
        comparison = []
        for metric in self.metrics:
            norm_metric = f"{metric}_norm"
            comparison.append({
                "metric": metric,
                manager1: p1[norm_metric],
                manager2: p2[norm_metric],
                "difference": p1[norm_metric] - p2[norm_metric]
            })
        return pd.DataFrame(comparison)

    def cluster_styles(self, n_clusters=3):
        """Cluster managers by tactical style"""
        norm_cols = [c for c in self.normalized.columns if c.endswith("_norm")]
        X = self.normalized[norm_cols].values
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        clusters = kmeans.fit_predict(X)
        self.normalized["style_cluster"] = clusters
        # Label clusters based on characteristics
        cluster_labels = {}
        for i in range(n_clusters):
            cluster_data = self.normalized[self.normalized["style_cluster"] == i]
            avg_possession = cluster_data["possession_norm"].mean()
            avg_counter = cluster_data["shots_from_counter_norm"].mean()
            if avg_possession > 60:
                cluster_labels[i] = "Possession-based"
            elif avg_counter > 60:
                cluster_labels[i] = "Counter-attacking"
            else:
                cluster_labels[i] = "Balanced"
        self.normalized["style_label"] = self.normalized["style_cluster"].map(cluster_labels)
        return self.normalized[["manager", "style_cluster", "style_label"]]
# Analysis
fingerprint = TacticalFingerprint(tactical_data)
# Compare managers
comparison = fingerprint.compare_managers("Guardiola", "Mourinho")
print("Guardiola vs Mourinho Tactical Comparison:")
print(comparison.to_string(index=False))
# Cluster by style
styles = fingerprint.cluster_styles(n_clusters=3)
print("\nManager Style Clusters:")
print(styles.to_string(index=False))
# R: Tactical Fingerprint Analysis
library(tidyverse)
# Team tactical metrics by manager
tactical_data <- tribble(
~manager, ~season, ~possession, ~ppda, ~deep_completions,
~crosses_per_90, ~shots_from_counter, ~high_press_pct,
~avg_pass_length, ~progressive_passes_p90,
"Guardiola", "2023-24", 67.2, 7.8, 15.2, 18.5, 8, 42, 14.2, 78.5,
"Klopp", "2023-24", 58.5, 8.2, 12.8, 22.3, 18, 55, 16.8, 65.2,
"Mourinho", "2023-24", 45.2, 12.5, 8.5, 15.2, 22, 28, 18.5, 42.8,
"Arteta", "2023-24", 62.8, 9.1, 13.5, 16.8, 12, 48, 15.5, 72.1,
"Conte", "2023-24", 48.5, 11.2, 9.2, 20.5, 25, 32, 17.2, 48.5,
"Ancelotti", "2023-24", 55.2, 10.5, 11.8, 19.2, 15, 38, 16.2, 58.3
)
# Normalize metrics for comparison (0-100 scale)
normalize <- function(x) {
(x - min(x)) / (max(x) - min(x)) * 100
}
tactical_normalized <- tactical_data %>%
mutate(across(possession:progressive_passes_p90, normalize, .names = "{.col}_norm"))
# Create tactical profile
create_tactical_profile <- function(manager_name, data) {
profile <- data %>%
filter(manager == manager_name) %>%
select(ends_with("_norm")) %>%
pivot_longer(everything(), names_to = "metric", values_to = "value") %>%
mutate(metric = str_remove(metric, "_norm"))
return(profile)
}
# Compare two managers
compare_managers <- function(manager1, manager2, data) {
p1 <- create_tactical_profile(manager1, data) %>%
rename(!!manager1 := value)
p2 <- create_tactical_profile(manager2, data) %>%
rename(!!manager2 := value)
comparison <- p1 %>%
left_join(p2, by = "metric") %>%
mutate(difference = .data[[manager1]] - .data[[manager2]])
return(comparison)
}
# Example comparison
comparison <- compare_managers("Guardiola", "Mourinho", tactical_normalized)
print("Guardiola vs Mourinho Tactical Comparison:")
print(comparison)
# Cluster managers by style
library(cluster)
style_matrix <- tactical_normalized %>%
select(manager, ends_with("_norm")) %>%
column_to_rownames("manager")
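# K-means clustering
set.seed(42)
clusters <- kmeans(style_matrix, centers = 3, nstart = 10)
tactical_normalized$style_cluster <- clusters$cluster
# k-means cluster IDs are arbitrary, so derive each label from the
# cluster's mean profile rather than from its ID
cluster_labels <- tactical_normalized %>%
  group_by(style_cluster) %>%
  summarise(
    avg_possession = mean(possession_norm),
    avg_counter = mean(shots_from_counter_norm),
    .groups = "drop"
  ) %>%
  mutate(
    style_label = case_when(
      avg_possession > 60 ~ "Possession-based",
      avg_counter > 60 ~ "Counter-attacking",
      TRUE ~ "Balanced"
    )
  ) %>%
  select(style_cluster, style_label)
tactical_normalized <- tactical_normalized %>%
  left_join(cluster_labels, by = "style_cluster")
cat("\nManager Style Clusters:\n")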
print(tactical_normalized %>% select(manager, style_label))
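Clustering assigns each manager to a group, but with six managers it can be more informative to ask directly whose profile is closest to whose. The sketch below reuses the sample metrics from the listings above, min-max scales them, and finds each manager's nearest neighbour by Euclidean distance. The helper `most_similar` is a hypothetical name, and Euclidean distance with equal weight on every metric is an assumption, not a recommendation.

```python
from math import dist

# Sample tactical metrics from the listings above, in this column order:
# possession, ppda, deep_completions, crosses_per_90, shots_from_counter,
# high_press_pct, avg_pass_length, progressive_passes_p90
tactical = {
    "Guardiola": [67.2, 7.8, 15.2, 18.5, 8, 42, 14.2, 78.5],
    "Klopp": [58.5, 8.2, 12.8, 22.3, 18, 55, 16.8, 65.2],
    "Mourinho": [45.2, 12.5, 8.5, 15.2, 22, 28, 18.5, 42.8],
    "Arteta": [62.8, 9.1, 13.5, 16.8, 12, 48, 15.5, 72.1],
    "Conte": [48.5, 11.2, 9.2, 20.5, 25, 32, 17.2, 48.5],
    "Ancelotti": [55.2, 10.5, 11.8, 19.2, 15, 38, 16.2, 58.3],
}

# Min-max scale each metric to 0-1 so no single metric dominates the distance
columns = list(zip(*tactical.values()))
lows = [min(col) for col in columns]
highs = [max(col) for col in columns]
scaled = {
    name: [(v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs)]
    for name, values in tactical.items()
}


def most_similar(manager):
    """Return (other_manager, distance) with the smallest Euclidean distance."""
    others = (
        (other, dist(scaled[manager], vec))
        for other, vec in scaled.items()
        if other != manager
    )
    return min(others, key=lambda pair: pair[1])


for name in tactical:
    other, d = most_similar(name)
    print(f"{name}: closest profile is {other} (distance {d:.2f})")
```

Nearest-neighbour lookups like this are one way to shortlist stylistically similar replacements, though the result depends entirely on which metrics are included and how they are scaled.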
Measuring Manager Impact
Isolating a manager's impact from squad quality is one of the most challenging problems in football analytics. Several approaches can help estimate the "manager effect".
Attribution Challenges
- Managers inherit squads they didn't build
- Transfer market success is partly luck and partly club infrastructure
- Short tenures limit sample sizes
- Player improvement could be natural development
- Fixture difficulty varies between clubs and seasons
# Python: Manager Impact Estimation
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# Historical data
historical_data = pd.DataFrame({
"manager": ["Manager A"]*2 + ["Manager B"] + ["Manager C"]*2 + ["Manager D"] +
["Manager E"]*2 + ["Manager F"],
"club": ["Club X"]*3 + ["Club Y"]*3 + ["Club Z"]*3,
"season": ["2020-21", "2021-22", "2022-23"]*3,
"squad_value_m": [450, 520, 580, 280, 310, 350, 180, 195, 220],
"wage_bill_m": [180, 200, 220, 95, 105, 115, 65, 70, 80],
"points": [74, 86, 75, 58, 65, 52, 45, 55, 62],
"xG_for": [58.5, 72.3, 65.2, 48.2, 55.8, 45.1, 38.5, 48.2, 52.8],
"xG_against": [42.1, 32.5, 38.8, 52.3, 48.1, 55.2, 58.2, 52.1, 45.5]
})
class ManagerImpactAnalyzer:
    """Estimate manager impact on team performance"""

    def __init__(self, data):
        self.data = data.copy()
        self._calculate_metrics()

    def _calculate_metrics(self):
        """Calculate resource index and expected points"""
        self.data["resource_index"] = (
            self.data["squad_value_m"] + self.data["wage_bill_m"] * 2
        ) / 3
        # Fit resource -> points model
        X = self.data[["resource_index"]].values
        y = self.data["points"].values
        self.model = LinearRegression()
        self.model.fit(X, y)
        self.data["expected_points"] = self.model.predict(X)
        self.data["points_above_expected"] = (
            self.data["points"] - self.data["expected_points"]
        )

    def get_manager_impact(self):
        """Summarize manager impact"""
        impact = self.data.groupby("manager").agg({
            "season": "count",
            "resource_index": "mean",
            "points": "mean",
            "expected_points": "mean",
            "points_above_expected": ["mean", "sum"]
        }).reset_index()
        impact.columns = ["manager", "seasons", "avg_resources", "avg_points",
                          "avg_expected_points", "avg_above_expected", "total_above_expected"]
        return impact.sort_values("avg_above_expected", ascending=False)

    def before_after_analysis(self, club_name):
        """Analyze before/after for manager changes at a club"""
        # Copy the slice so new columns are not written to a view of self.data
        club_data = self.data[self.data["club"] == club_name].sort_values("season").copy()
        # Identify manager changes; the first season has no predecessor,
        # so it does not count as a change
        previous = club_data["manager"].shift(1)
        club_data["manager_change"] = previous.notna() & (club_data["manager"] != previous)
        club_data["period"] = club_data["manager_change"].cumsum()
        # Summarize by period
        period_summary = club_data.groupby(["manager", "period"]).agg({
            "points": "mean",
            "expected_points": "mean",
            "points_above_expected": "mean"
        }).reset_index()
        return period_summary

    def rank_managers(self, min_seasons=2):
        """Rank managers by impact, filtered by minimum tenure"""
        impact = self.get_manager_impact()
        qualified = impact[impact["seasons"] >= min_seasons]
        return qualified.sort_values("avg_above_expected", ascending=False)
# Analysis
analyzer = ManagerImpactAnalyzer(historical_data)
print("Manager Impact (Points Above Resource Expectation):")
print(analyzer.get_manager_impact().to_string(index=False))
print("\nClub X Before/After Manager Changes:")
print(analyzer.before_after_analysis("Club X").to_string(index=False))
# R: Manager Impact Estimation
library(tidyverse)
library(lme4)
# Historical data with squad values
historical_data <- tribble(
~manager, ~club, ~season, ~squad_value_m, ~wage_bill_m,
~points, ~xG_for, ~xG_against, ~final_position,
"Manager A", "Club X", "2020-21", 450, 180, 74, 58.5, 42.1, 4,
"Manager A", "Club X", "2021-22", 520, 200, 86, 72.3, 32.5, 2,
"Manager B", "Club X", "2022-23", 580, 220, 75, 65.2, 38.8, 4,
"Manager C", "Club Y", "2020-21", 280, 95, 58, 48.2, 52.3, 10,
"Manager C", "Club Y", "2021-22", 310, 105, 65, 55.8, 48.1, 7,
"Manager D", "Club Y", "2022-23", 350, 115, 52, 45.1, 55.2, 12,
"Manager E", "Club Z", "2020-21", 180, 65, 45, 38.5, 58.2, 15,
"Manager E", "Club Z", "2021-22", 195, 70, 55, 48.2, 52.1, 11,
"Manager F", "Club Z", "2022-23", 220, 80, 62, 52.8, 45.5, 8
)
# Calculate expected points based on resources
historical_data <- historical_data %>%
mutate(
# Log-transform financial metrics
log_value = log(squad_value_m),
log_wage = log(wage_bill_m),
# Combined resource metric
resource_index = (squad_value_m + wage_bill_m * 2) / 3
)
# Simple regression: Points ~ Resources
resource_model <- lm(points ~ resource_index, data = historical_data)
# Add expected points and residual (manager effect proxy)
historical_data <- historical_data %>%
mutate(
expected_points = predict(resource_model, newdata = .),
points_above_expected = points - expected_points
)
# Manager performance summary
manager_impact <- historical_data %>%
group_by(manager) %>%
summarise(
seasons = n(),
avg_resources = mean(resource_index),
avg_points = mean(points),
avg_expected_points = mean(expected_points),
avg_points_above_expected = mean(points_above_expected),
total_overperformance = sum(points_above_expected),
.groups = "drop"
) %>%
arrange(desc(avg_points_above_expected))
print("Manager Impact (Points Above Resource Expectation):")
print(manager_impact)
# Mixed effects model (accounting for club random effects)
# Requires more data in practice
# manager_effect_model <- lmer(
# points ~ resource_index + (1|club) + (1|manager),
# data = historical_data
# )
# Before/After comparison for manager changes
before_after_analysis <- function(data, club_name) {
club_data <- data %>%
filter(club == club_name) %>%
arrange(season)
# Identify manager changes
club_data <- club_data %>%
mutate(
manager_change = manager != lag(manager),
period = cumsum(replace_na(manager_change, FALSE))
)
# Compare periods
period_summary <- club_data %>%
group_by(manager, period) %>%
summarise(
avg_points = mean(points),
avg_expected = mean(expected_points),
overperformance = mean(points_above_expected),
.groups = "drop"
)
return(period_summary)
}
cat("\nClub X Before/After Manager Changes:\n")
print(before_after_analysis(historical_data, "Club X"))
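The point that short tenures limit sample sizes can be handled with a hard cutoff, as `rank_managers(min_seasons=2)` does in the Python listing, or more gradually by shrinking each manager's average toward zero in proportion to how little data there is. The following is a minimal sketch of the second option. The function name `shrunk_effect`, the `prior_strength` constant, and the two example managers are all hypothetical; a real analysis would estimate the shrinkage from data, for example with the mixed-effects model that the R listing leaves commented out.

```python
def shrunk_effect(avg_above_expected, seasons, prior_strength=3.0):
    """Shrink a manager's average points-above-expected toward zero.

    prior_strength is a hypothetical tuning constant: the number of
    'pseudo-seasons' of exactly-as-expected performance mixed in.
    """
    weight = seasons / (seasons + prior_strength)
    return weight * avg_above_expected


# Hypothetical inputs: (average points above expected per season, seasons observed)
managers = {
    "One great season": (10.0, 1),
    "Five good seasons": (6.0, 5),
}

for name, (avg, n) in managers.items():
    print(f"{name}: raw {avg:+.1f}, shrunk {shrunk_effect(avg, n):+.2f}")
```

With these assumed numbers the single standout season is discounted heavily, so the manager with the longer record ranks higher after shrinkage even though the raw average is lower.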
In-Game Management Analytics
Managers make dozens of decisions during matches: substitutions, tactical adjustments, and motivational interventions. Analyzing these in-game decisions provides insights into managerial skill.
# Python: Substitution Analytics
import pandas as pd
import numpy as np
# Substitution data
substitutions = pd.DataFrame({
"match_id": [1, 1, 2, 3, 3, 4],
"manager": ["Manager A"]*3 + ["Manager B"]*3,
"minute": [62, 75, 55, 70, 82, 88],
"player_off": ["Player X", "Player Z", "Player V", "Player A", "Player C", "Player E"],
"player_on": ["Player Y", "Player W", "Player U", "Player B", "Player D", "Player F"],
"score_diff": [-1, 0, 1, 0, 0, -1],
"xG_diff_before": [-0.8, 0.2, 1.2, -0.3, 0.1, -0.5],
"xG_diff_after": [0.5, 0.3, 0.8, 0.2, 0.4, -0.2],
"result_change": ["improved", "maintained", "declined", "improved", "improved", "improved"]
})
class InGameAnalytics:
    """Analyze in-game management decisions"""

    def __init__(self, sub_data):
        self.subs = sub_data

    def substitution_effectiveness(self):
        """Analyze substitution effectiveness by manager"""
        effectiveness = self.subs.groupby("manager").apply(
            lambda g: pd.Series({
                "total_subs": len(g),
                "avg_minute": g["minute"].mean(),
                "improved_pct": (g["result_change"] == "improved").mean() * 100,
                "when_losing": (g["score_diff"] < 0).sum(),
                "when_losing_improved": (
                    (g["score_diff"] < 0) & (g["result_change"] == "improved")
                ).sum()
            })
        ).reset_index()
        # Replace 0 with NaN to avoid dividing by zero for managers
        # who never made a substitution while losing
        effectiveness["losing_success_rate"] = (
            effectiveness["when_losing_improved"] /
            effectiveness["when_losing"].replace(0, np.nan) * 100
        )
        return effectiveness

    def timing_analysis(self):
        """Analyze substitution timing patterns"""
        subs = self.subs.copy()

        def timing_bucket(minute):
            if minute < 60: return "Early (< 60)"
            elif minute < 75: return "Mid (60-75)"
            elif minute < 85: return "Late (75-85)"
            return "Very Late (85+)"

        subs["timing_bucket"] = subs["minute"].apply(timing_bucket)
        timing = subs.groupby(["manager", "timing_bucket"]).agg({
            "match_id": "count",
            "result_change": lambda x: (x == "improved").mean() * 100
        }).reset_index()
        timing.columns = ["manager", "timing_bucket", "count", "success_rate"]
        return timing

    def analyze_situational_changes(self):
        """Analyze substitutions by game situation"""
        situations = {
            "winning": self.subs[self.subs["score_diff"] > 0],
            "drawing": self.subs[self.subs["score_diff"] == 0],
            "losing": self.subs[self.subs["score_diff"] < 0]
        }
        analysis = []
        for situation, data in situations.items():
            if len(data) > 0:
                analysis.append({
                    "situation": situation,
                    "count": len(data),
                    "avg_minute": data["minute"].mean(),
                    "success_rate": (data["result_change"] == "improved").mean() * 100
                })
        return pd.DataFrame(analysis)
# Analysis
analyzer = InGameAnalytics(substitutions)
print("Substitution Effectiveness by Manager:")
print(analyzer.substitution_effectiveness().to_string(index=False))
print("\nSubstitution Timing Analysis:")
print(analyzer.timing_analysis().to_string(index=False))
print("\nSituational Analysis:")
print(analyzer.analyze_situational_changes().to_string(index=False))
# R: Substitution Analytics
library(tidyverse)
# Substitution data
substitutions <- tribble(
~match_id, ~manager, ~minute, ~player_off, ~player_on, ~score_diff,
~xG_diff_before, ~xG_diff_after, ~result_change,
1, "Manager A", 62, "Player X", "Player Y", -1, -0.8, 0.5, "improved",
1, "Manager A", 75, "Player Z", "Player W", 0, 0.2, 0.3, "maintained",
2, "Manager A", 55, "Player V", "Player U", 1, 1.2, 0.8, "declined",
3, "Manager B", 70, "Player A", "Player B", 0, -0.3, 0.2, "improved",
3, "Manager B", 82, "Player C", "Player D", 0, 0.1, 0.4, "improved",
4, "Manager B", 88, "Player E", "Player F", -1, -0.5, -0.2, "improved"
)
# Substitution effectiveness analysis
sub_effectiveness <- substitutions %>%
group_by(manager) %>%
summarise(
total_subs = n(),
avg_minute = mean(minute),
improved_pct = mean(result_change == "improved") * 100,
when_losing = sum(score_diff < 0),
when_losing_improved = sum(score_diff < 0 & result_change == "improved"),
losing_success_rate = when_losing_improved / when_losing * 100,
.groups = "drop"
)
print("Substitution Effectiveness by Manager:")
print(sub_effectiveness)
# Timing analysis
timing_analysis <- substitutions %>%
mutate(
timing_bucket = case_when(
minute < 60 ~ "Early (< 60)",
minute < 75 ~ "Mid (60-75)",
minute < 85 ~ "Late (75-85)",
TRUE ~ "Very Late (85+)"
)
) %>%
group_by(manager, timing_bucket) %>%
summarise(
count = n(),
success_rate = mean(result_change == "improved") * 100,
.groups = "drop"
)
cat("\nSubstitution Timing Analysis:\n")
print(timing_analysis)
# Tactical change detection (simplified)
tactical_changes <- tribble(
~match_id, ~manager, ~minute, ~change_type, ~formation_before, ~formation_after,
~xG_rate_before, ~xG_rate_after,
1, "Manager A", 55, "offensive", "4-3-3", "3-4-3", 0.015, 0.025,
2, "Manager A", 70, "defensive", "4-3-3", "5-3-2", 0.022, 0.010,
3, "Manager B", 60, "balanced", "4-4-2", "4-3-3", 0.018, 0.020,
4, "Manager B", 78, "offensive", "4-2-3-1", "4-1-3-2", 0.012, 0.028
)
# Tactical change effectiveness
tactical_effectiveness <- tactical_changes %>%
mutate(
xG_improvement = xG_rate_after - xG_rate_before,
effective = xG_improvement > 0 |
(change_type == "defensive" & xG_improvement < 0)
) %>%
group_by(manager) %>%
summarise(
total_changes = n(),
effective_changes = sum(effective),
effectiveness_rate = effective_changes / total_changes * 100,
avg_xG_improvement = mean(xG_improvement),
.groups = "drop"
)
cat("\nTactical Change Effectiveness:\n")
print(tactical_effectiveness)
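The sample substitution data carries `xG_diff_before` and `xG_diff_after`, but the listings above only use the categorical `result_change`. A continuous alternative is the xG swing, the change in xG difference across the substitution. The sketch below assumes both columns measure the team's xG difference over comparable windows before and after the change, which the sample data does not specify, and the name `avg_swing` is illustrative.

```python
from collections import defaultdict
from statistics import mean

# (manager, xG_diff_before, xG_diff_after) from the sample substitutions above
subs = [
    ("Manager A", -0.8, 0.5), ("Manager A", 0.2, 0.3), ("Manager A", 1.2, 0.8),
    ("Manager B", -0.3, 0.2), ("Manager B", 0.1, 0.4), ("Manager B", -0.5, -0.2),
]

# xG swing: positive means the underlying balance of chances improved
swings = defaultdict(list)
for manager, before, after in subs:
    swings[manager].append(after - before)

avg_swing = {manager: mean(values) for manager, values in swings.items()}
for manager, value in avg_swing.items():
    print(f"{manager}: average xG swing {value:+.2f} over {len(swings[manager])} subs")
```

Unlike a win/draw/loss label, the swing registers partial improvements, but it still cannot separate the substitution's effect from game state, opponent changes, or fatigue.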
Player Development Under Managers
One of the most important but hardest-to-measure managerial skills is player development. Some managers consistently improve players, while others seem to extract less than expected.
# Python: Player Development Analysis
import pandas as pd
import numpy as np
# Player trajectories
player_trajectories = pd.DataFrame({
"player": ["Player A"]*3 + ["Player B"]*3 + ["Player C"]*3,
"manager": ["Manager X", "Manager X", "Manager Y",
"Manager X", "Manager X", "Manager X",
"Manager Y", "Manager Y", "Manager Z"],
"season": ["2021-22", "2022-23", "2023-24"]*3,
"age": [23, 24, 25, 21, 22, 23, 26, 27, 28],
"xG_p90": [0.35, 0.42, 0.38, 0.15, 0.28, 0.35, 0.45, 0.42, 0.38],
"xA_p90": [0.18, 0.22, 0.20, 0.12, 0.18, 0.25, 0.28, 0.25, 0.22],
"press_p90": [22.5, 25.8, 23.2, 18.5, 24.2, 28.5, 20.1, 19.8, 18.2],
"prog_passes_p90": [5.2, 6.1, 5.8, 4.2, 5.8, 7.2, 6.5, 6.2, 5.8]
})
class PlayerDevelopmentAnalyzer:
    """Analyze player development under different managers"""

    def __init__(self, trajectories):
        self.data = trajectories.sort_values(["player", "season"])
        self._calculate_growth()

    def _calculate_growth(self):
        """Calculate season-over-season growth rates"""
        df = self.data.copy()
        for metric in ["xG_p90", "xA_p90", "press_p90"]:
            df[f"{metric}_growth"] = df.groupby("player")[metric].pct_change() * 100
        self.data = df

    def manager_development_scores(self):
        """Calculate development scores by manager"""
        growth_data = self.data.dropna(subset=["xG_p90_growth"])
        scores = growth_data.groupby("manager").agg({
            "player": "nunique",
            "xG_p90_growth": "mean",
            "xA_p90_growth": "mean",
            "press_p90_growth": "mean"
        }).reset_index()
        scores.columns = ["manager", "players_developed", "avg_xG_growth",
                          "avg_xA_growth", "avg_press_growth"]
        scores["development_score"] = (
            scores["avg_xG_growth"] + scores["avg_xA_growth"] + scores["avg_press_growth"]
        ) / 3
        return scores.sort_values("development_score", ascending=False)

    def age_adjusted_development(self):
        """Calculate age-adjusted development scores"""
        growth_data = self.data.dropna(subset=["xG_p90_growth"]).copy()

        # Expected improvement by age
        def expected_improvement(age):
            if age < 23: return 10
            elif age < 26: return 5
            elif age < 29: return 0
            return -5

        growth_data["expected"] = growth_data["age"].apply(expected_improvement)
        growth_data["adjusted_growth"] = growth_data["xG_p90_growth"] - growth_data["expected"]
        adjusted = growth_data.groupby("manager").agg({
            "adjusted_growth": "mean"
        }).reset_index()
        adjusted.columns = ["manager", "avg_adjusted_growth"]
        return adjusted.sort_values("avg_adjusted_growth", ascending=False)
# Analysis
analyzer = PlayerDevelopmentAnalyzer(player_trajectories)
print("Manager Player Development Scores:")
print(analyzer.manager_development_scores().to_string(index=False))
print("\nAge-Adjusted Development:")
print(analyzer.age_adjusted_development().to_string(index=False))
# R: Player Development Analysis
library(tidyverse)
# Player performance before/during/after manager tenure
player_trajectories <- tribble(
~player, ~manager, ~season, ~age, ~xG_p90, ~xA_p90, ~press_p90, ~prog_passes_p90,
"Player A", "Manager X", "2021-22", 23, 0.35, 0.18, 22.5, 5.2,
"Player A", "Manager X", "2022-23", 24, 0.42, 0.22, 25.8, 6.1,
"Player A", "Manager Y", "2023-24", 25, 0.38, 0.20, 23.2, 5.8,
"Player B", "Manager X", "2021-22", 21, 0.15, 0.12, 18.5, 4.2,
"Player B", "Manager X", "2022-23", 22, 0.28, 0.18, 24.2, 5.8,
"Player B", "Manager X", "2023-24", 23, 0.35, 0.25, 28.5, 7.2,
"Player C", "Manager Y", "2021-22", 26, 0.45, 0.28, 20.1, 6.5,
"Player C", "Manager Y", "2022-23", 27, 0.42, 0.25, 19.8, 6.2,
"Player C", "Manager Z", "2023-24", 28, 0.38, 0.22, 18.2, 5.8
)
# Calculate improvement rates
player_improvement <- player_trajectories %>%
arrange(player, season) %>%
group_by(player, manager) %>%
mutate(
seasons_with_manager = n(),
xG_growth = (xG_p90 - lag(xG_p90)) / lag(xG_p90) * 100,
xA_growth = (xA_p90 - lag(xA_p90)) / lag(xA_p90) * 100,
press_growth = (press_p90 - lag(press_p90)) / lag(press_p90) * 100
) %>%
ungroup()
# Manager development scores
manager_development <- player_improvement %>%
filter(!is.na(xG_growth)) %>%
group_by(manager) %>%
summarise(
players_developed = n_distinct(player),
avg_xG_growth = mean(xG_growth, na.rm = TRUE),
avg_xA_growth = mean(xA_growth, na.rm = TRUE),
avg_press_growth = mean(press_growth, na.rm = TRUE),
players_improved = sum(xG_growth > 0) / n() * 100,
.groups = "drop"
) %>%
mutate(
development_score = (avg_xG_growth + avg_xA_growth + avg_press_growth) / 3
) %>%
arrange(desc(development_score))
print("Manager Player Development Scores:")
print(manager_development)
# Age-adjusted development (young players more likely to improve)
age_adjusted <- player_improvement %>%
filter(!is.na(xG_growth)) %>%
mutate(
# Expected improvement decreases with age
expected_improvement = case_when(
age < 23 ~ 10,
age < 26 ~ 5,
age < 29 ~ 0,
TRUE ~ -5
),
adjusted_xG_growth = xG_growth - expected_improvement
) %>%
group_by(manager) %>%
summarise(
avg_adjusted_growth = mean(adjusted_xG_growth, na.rm = TRUE),
.groups = "drop"
)
cat("\nAge-Adjusted Development:\n")
print(age_adjusted)
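The Python and R listings above attribute growth differently. The Python version computes growth per player, so the first season under a new manager is credited (or charged) to that new manager; the R version groups by player and manager, so growth is only counted within a continuous spell and the season of a managerial change is dropped. The sketch below reproduces both rules on the sample `xG_p90` values to show how much the choice matters. The function `growth_by_manager` and its `within_spell_only` flag are hypothetical names introduced for this comparison.

```python
# (player, manager, season, xG_p90) from the sample trajectories above
rows = [
    ("Player A", "Manager X", "2021-22", 0.35),
    ("Player A", "Manager X", "2022-23", 0.42),
    ("Player A", "Manager Y", "2023-24", 0.38),
    ("Player B", "Manager X", "2021-22", 0.15),
    ("Player B", "Manager X", "2022-23", 0.28),
    ("Player B", "Manager X", "2023-24", 0.35),
    ("Player C", "Manager Y", "2021-22", 0.45),
    ("Player C", "Manager Y", "2022-23", 0.42),
    ("Player C", "Manager Z", "2023-24", 0.38),
]


def growth_by_manager(rows, within_spell_only):
    """Average season-over-season xG_p90 growth (%) credited to each manager.

    within_spell_only=False credits growth to the manager in charge of the
    later season; True counts only seasons where the manager did not change.
    """
    credited = {}
    previous = {}  # player -> (manager, xG_p90) from the prior season
    for player, manager, _season, xg in sorted(rows, key=lambda r: (r[0], r[2])):
        if player in previous:
            prev_manager, prev_xg = previous[player]
            if not within_spell_only or prev_manager == manager:
                growth = (xg - prev_xg) / prev_xg * 100
                credited.setdefault(manager, []).append(growth)
        previous[player] = (manager, xg)
    return {m: sum(v) / len(v) for m, v in credited.items()}


per_player = growth_by_manager(rows, within_spell_only=False)
per_spell = growth_by_manager(rows, within_spell_only=True)

for manager in sorted(per_player):
    spell = per_spell.get(manager)
    spell_text = f"{spell:+.1f}%" if spell is not None else "no data"
    print(f"{manager}: per-player {per_player[manager]:+.1f}%, within-spell {spell_text}")
```

Under the within-spell rule a manager who has only inherited players has no score at all, while under the per-player rule they are judged on a single transition season; neither is obviously right, so it is worth stating which rule a development score uses.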
Manager Sacking Analytics
When should a club sack its manager? This is one of the most consequential decisions a board makes, yet it is often driven by emotion rather than analysis. Timing matters financially as well as on the pitch: a dismissal brings severance and hiring costs, while waiting too long can cost league position.
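Any sacking rule needs a baseline for what points per game (PPG) to expect. The listings below use `38 / league_size`, which gives 1.9 for a 20-team league. For comparison, the league-wide average PPG follows directly from the share of drawn matches, since a decisive match hands out three points in total and a draw two. The sketch below works through that arithmetic; the function name `league_average_ppg` and the 25% draw rate are illustrative assumptions, not measured values.

```python
def league_average_ppg(draw_rate):
    """Mean points per game across all teams for a given share of drawn matches.

    A decisive match awards 3 points in total, a draw awards 2, and in both
    cases the points are spread over the two teams involved.
    """
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must be between 0 and 1")
    points_per_match = 3 * (1 - draw_rate) + 2 * draw_rate
    return points_per_match / 2


# draw_rate=0.25 is an illustrative assumption, not a measured figure
baseline = league_average_ppg(0.25)
print(f"League-average PPG at a 25% draw rate: {baseline:.3f}")
```

Because the average can never exceed 1.5 (the value with no draws at all), a 1.9 baseline flags most teams as underperforming; a club-specific expectation, for example one derived from resources as in the impact section, is likely to be a fairer reference point.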
# Python: Manager Sacking Analysis
import pandas as pd
import numpy as np
from typing import Dict
from dataclasses import dataclass
@dataclass
class SackingDecision:
    """Result of sacking decision analysis."""
    ppg_gap: float
    xg_concerning: bool
    recommendation: str
    confidence: str
    reasoning: str


class SackingAnalyzer:
    """Analyze manager sacking timing and outcomes."""

    def __init__(self, historical_data: pd.DataFrame):
        self.data = historical_data
        self._analyze_outcomes()

    def _analyze_outcomes(self):
        """Analyze historical sacking outcomes."""
        df = self.data.copy()
        # PPG improvement
        df["ppg_improvement"] = df["ppg_after"] - df["ppg_at_sacking"]
        df["improved_ppg"] = df["ppg_improvement"] > 0
        # Position improvement (a lower position number is better)
        df["position_improvement"] = df["position_at_sacking"] - df["final_position"]
        df["improved_position"] = df["position_improvement"] > 0
        # Overall success
        df["sacking_successful"] = df["improved_ppg"] & df["improved_position"]

        # Timing assessment
        def timing_category(matches):
            if matches < 10: return "Too early"
            elif matches < 15: return "Standard timing"
            elif matches < 20: return "Patient approach"
            return "Very late"

        df["timing_category"] = df["matches"].apply(timing_category)
        self.analyzed = df

    def timing_success_rates(self) -> pd.DataFrame:
        """Calculate success rates by sacking timing."""
        return self.analyzed.groupby("timing_category").agg({
            "club": "count",
            "sacking_successful": "mean",
            "ppg_improvement": "mean"
        }).reset_index().rename(columns={
            "club": "sackings",
            "sacking_successful": "success_rate",
            "ppg_improvement": "avg_ppg_improvement"
        })

    def should_sack(self, current_ppg: float, xg_diff: float,
                    matches_played: int, position: int,
                    league_size: int = 20) -> SackingDecision:
        """Recommend whether to sack manager."""
        # Baseline PPG: a single league-wide figure (1.9 for a 20-team league),
        # not a position-specific one. It exceeds the 1.5 maximum possible
        # league-average PPG, so treat the thresholds below as illustrative
        expected_ppg = 38 / league_size
        # Metrics
        ppg_gap = expected_ppg - current_ppg
        xg_concerning = xg_diff < -0.3
        relegation_risk = position > (league_size - 3)
        # Decision logic
        if ppg_gap > 0.5 and xg_concerning and matches_played > 10:
            recommendation = "Sack now"
            confidence = "High"
            reasoning = "Significant underperformance with poor underlying numbers"
        elif ppg_gap > 0.3 and xg_concerning:
            recommendation = "Consider sacking"
            confidence = "Medium"
            reasoning = "Moderate underperformance with concerning xG"
        elif relegation_risk and ppg_gap > 0.2:
            recommendation = "Sack now (survival mode)"
            confidence = "High"
            reasoning = "Relegation zone with negative trajectory"
        elif ppg_gap > 0.5 and not xg_concerning:
            recommendation = "Monitor closely"
            confidence = "Low"
            reasoning = "Results poor but underlying metrics acceptable"
        else:
            recommendation = "Retain manager"
            confidence = "Medium"
            reasoning = "Performance within acceptable range"
        return SackingDecision(
            ppg_gap=ppg_gap,
            xg_concerning=xg_concerning,
            recommendation=recommendation,
            confidence=confidence,
            reasoning=reasoning
        )

    def calculate_cost_of_sacking(self, remaining_contract_months: int,
                                  monthly_salary: float,
                                  new_manager_fee: float) -> Dict:
        """Calculate financial cost of sacking decision."""
        severance = remaining_contract_months * monthly_salary * 0.8  # Typical settlement
        total_cost = severance + new_manager_fee
        return {
            "severance_estimate": severance,
            "hiring_cost": new_manager_fee,
            "total_cost": total_cost,
            "monthly_burden": total_cost / 12
        }
# Example usage
sacking_data = pd.DataFrame({
"club": ["Club A", "Club B", "Club C"],
"manager": ["Manager 1", "Manager 2", "Manager 3"],
"matches": [12, 15, 18],
"ppg_at_sacking": [1.08, 0.93, 1.22],
"xg_diff": [-0.3, -0.5, -0.2],
"ppg_after": [1.52, 1.35, 1.18],
"position_at_sacking": [15, 18, 12],
"final_position": [10, 14, 13]
})
analyzer = SackingAnalyzer(sacking_data)
# Example decision
decision = analyzer.should_sack(
current_ppg=1.0,
xg_diff=-0.4,
matches_played=12,
position=15
)
print("Sacking Decision Analysis:")
print(f" PPG Gap: {decision.ppg_gap:.2f}")
print(f" xG Concerning: {decision.xg_concerning}")
print(f" Recommendation: {decision.recommendation}")
print(f" Reasoning: {decision.reasoning}")
# R: Manager Sacking Analysis
library(tidyverse)
# Historical sacking data
sacking_data <- tribble(
~club, ~manager, ~sacking_date, ~matches, ~ppg_at_sacking, ~xg_diff,
~replacement, ~ppg_after, ~position_at_sacking, ~final_position,
"Club A", "Manager 1", "2023-10-15", 12, 1.08, -0.3, "Manager 2", 1.52, 15, 10,
"Club B", "Manager 3", "2023-11-20", 15, 0.93, -0.5, "Manager 4", 1.35, 18, 14,
"Club C", "Manager 5", "2023-12-05", 18, 1.22, -0.2, "Manager 6", 1.18, 12, 13,
"Club D", "Manager 7", "2024-01-10", 20, 0.85, -0.8, "Manager 8", 1.65, 19, 11,
"Club E", "Manager 9", "2023-09-28", 8, 0.88, -0.4, "Manager 10", 1.05, 16, 17
)
# Was the sacking successful?
sacking_analysis <- sacking_data %>%
mutate(
# PPG improvement
ppg_improvement = ppg_after - ppg_at_sacking,
improved_ppg = ppg_improvement > 0,
# Position improvement
position_improvement = position_at_sacking - final_position,
improved_position = position_improvement > 0,
# Overall success
sacking_successful = improved_ppg & improved_position,
# Timing assessment
timing_assessment = case_when(
matches < 10 ~ "Too early",
matches < 15 ~ "Standard timing",
matches < 20 ~ "Patient approach",
TRUE ~ "Very late"
)
)
# Success rates by timing
timing_success <- sacking_analysis %>%
group_by(timing_assessment) %>%
summarise(
sackings = n(),
success_rate = mean(sacking_successful) * 100,
avg_ppg_improvement = mean(ppg_improvement),
.groups = "drop"
)
print("Sacking Success by Timing:")
print(timing_success)
# When should you sack?
sacking_decision_model <- function(current_ppg, xg_diff, matches_played,
position, league_size = 20) {
# Calculate baseline expectation
expected_ppg <- 38 / league_size # Equal distribution assumption
# Underperformance score
ppg_gap <- expected_ppg - current_ppg
xg_signal <- xg_diff < -0.3 # Poor underlying numbers
# Relegation zone proximity
relegation_risk <- position > (league_size - 3)
# Decision logic
sack_recommendation <- case_when(
ppg_gap > 0.5 & xg_signal & matches_played > 10 ~ "Sack now",
ppg_gap > 0.3 & xg_signal ~ "Consider sacking",
ppg_gap > 0.5 & !xg_signal ~ "Monitor closely",
relegation_risk & ppg_gap > 0.2 ~ "Sack now (survival mode)",
TRUE ~ "Retain manager"
)
list(
ppg_gap = ppg_gap,
xg_concerning = xg_signal,
recommendation = sack_recommendation
)
}
# Test cases
example1 <- sacking_decision_model(1.0, -0.4, 12, 15)
example2 <- sacking_decision_model(1.5, 0.2, 15, 8)
cat("\nExample Decision 1 (struggling team):\n")
print(example1)
cat("\nExample Decision 2 (solid team):\n")
print(example2)
Sacking Decision Framework
- Sample size matters: Wait at least 10-12 matches before judging
- xG over results: Underlying metrics are more predictive than short-term points
- Regression to mean: Early-season slumps often self-correct
- Replacement quality: Only sack if a better option is available
- Financial cost: Factor in severance and hiring costs
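The sample-size and regression-to-mean points above can be made concrete by shrinking an observed points-per-game figure toward a prior expectation, with the weight on the observation growing as matches accumulate. The sketch below is a minimal illustration, not a calibrated model: the function name `shrunk_ppg`, the prior of 1.35 PPG, and the pseudo-count of 20 matches are all assumptions chosen for demonstration.

```python
def shrunk_ppg(observed_ppg: float, matches_played: int,
               prior_ppg: float = 1.35, prior_matches: int = 20) -> float:
    """Blend observed PPG with a prior expectation, weighted by sample size.

    prior_ppg and prior_matches are illustrative assumptions, not fitted values.
    """
    total_weight = matches_played + prior_matches
    return (observed_ppg * matches_played + prior_ppg * prior_matches) / total_weight


# The same poor run (0.9 PPG) is trusted more as the sample grows
for n in (6, 12, 24, 38):
    print(f"{n} matches at 0.90 PPG -> shrunk estimate {shrunk_ppg(0.9, n):.2f}")
```

After only a handful of matches the estimate stays close to the prior, which is one way to encode the advice to wait before judging.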
Manager Hiring Analytics
Finding the right manager is as important as knowing when to change. Analytics can help identify candidates whose style matches the club's needs and squad composition.
# Python: Manager Hiring Analysis
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict, List
@dataclass
class SquadProfile:
    """Profile of squad characteristics."""
    avg_age: float
    young_players: int
    peak_players: int
    technical_avg: float
    physical_avg: float
    pace_avg: float
    recommended_style: str


@dataclass
class ManagerCandidate:
    """Manager candidate profile."""
    name: str
    preferred_style: str
    pressing_intensity: str
    possession_avg: float
    youth_dev_score: int
    tactical_flexibility: int
    experience_level: str
    available: bool
class ManagerHiringAnalyzer:
    """Analyze and rank manager candidates for a club."""

    def __init__(self, squad_data: pd.DataFrame):
        self.squad = squad_data
        self.squad_profile = self._analyze_squad()

    def _analyze_squad(self) -> SquadProfile:
        """Analyze squad to determine ideal manager profile."""
        df = self.squad
        avg_age = df["age"].mean()
        young = (df["age"] < 23).sum()
        peak = ((df["age"] >= 23) & (df["age"] <= 29)).sum()
        technical = df["technical_skill"].mean()
        physical = df["physical_rating"].mean()
        pace = df["pace"].mean()
        # Recommend style based on squad (first matching rule wins)
        if pace > 75 and (df["position"] == "W").sum() >= 2:
            style = "Counter-attacking"
        elif technical > 80:
            style = "Possession-based"
        elif physical > 78:
            style = "Direct/Physical"
        elif young > 5:
            style = "Development-focused"
        else:
            style = "Flexible"
        return SquadProfile(
            avg_age=avg_age,
            young_players=young,
            peak_players=peak,
            technical_avg=technical,
            physical_avg=physical,
            pace_avg=pace,
            recommended_style=style
        )

    def calculate_match_score(self, candidate: Dict,
                              priorities: Dict) -> float:
        """Calculate how well a candidate matches club needs."""
        score = 0
        profile = self.squad_profile
        # Style match
        if candidate["preferred_style"] == profile.recommended_style:
            score += 25
        elif candidate["tactical_flexibility"] > 70:
            score += 15  # Flexible managers can adapt
        # Youth development
        if profile.young_players > 5 and candidate["youth_dev_score"] > 70:
            score += 20
        # Experience
        if priorities.get("needs_experienced", False):
            if candidate["experience_level"] in ["Experienced", "Very Experienced"]:
                score += 20
        # Availability
        if candidate["available"]:
            score += 15
        # Pressing match
        if profile.physical_avg > 75 and candidate["pressing_intensity"] == "Very High":
            score += 10
        # Tactical flexibility bonus
        score += candidate["tactical_flexibility"] * 0.1
        return score

    def rank_candidates(self, candidates: List[Dict],
                        priorities: Dict) -> pd.DataFrame:
        """Rank all candidates by match score."""
        results = []
        for candidate in candidates:
            score = self.calculate_match_score(candidate, priorities)
            results.append({
                "manager": candidate["name"],
                "preferred_style": candidate["preferred_style"],
                "experience": candidate["experience_level"],
                "available": candidate["available"],
                "match_score": score
            })
        return pd.DataFrame(results).sort_values("match_score", ascending=False)

    def generate_shortlist(self, candidates: List[Dict],
                           priorities: Dict, top_n: int = 3) -> pd.DataFrame:
        """Generate shortlist of top candidates."""
        ranked = self.rank_candidates(candidates, priorities)
        # Copy so that adding a column does not trigger SettingWithCopyWarning
        shortlist = ranked.head(top_n).copy()
        # Add reasoning for each
        reasons = []
        for _, row in shortlist.iterrows():
            reason = f"Score: {row['match_score']:.1f} - "
            if row["preferred_style"] == self.squad_profile.recommended_style:
                reason += "Style match; "
            if row["available"]:
                reason += "Available; "
            if row["experience"] in ["Experienced", "Very Experienced"]:
                reason += "Experienced; "
            reasons.append(reason.rstrip("; "))
        shortlist["reasoning"] = reasons
        return shortlist
# Example usage (seeded so the randomly generated squad is reproducible)
np.random.seed(42)
squad_data = pd.DataFrame({
"player": [f"Player {i}" for i in range(20)],
"age": np.random.randint(19, 34, 20),
"position": np.random.choice(["GK", "CB", "FB", "CM", "W", "ST"], 20),
"technical_skill": np.random.randint(65, 90, 20),
"physical_rating": np.random.randint(65, 85, 20),
"pace": np.random.randint(60, 90, 20)
})
analyzer = ManagerHiringAnalyzer(squad_data)
candidates = [
{"name": "Candidate A", "preferred_style": "Possession-based",
"pressing_intensity": "Medium", "youth_dev_score": 75,
"tactical_flexibility": 60, "experience_level": "Experienced", "available": True},
{"name": "Candidate B", "preferred_style": "Counter-attacking",
"pressing_intensity": "Very High", "youth_dev_score": 55,
"tactical_flexibility": 80, "experience_level": "Very Experienced", "available": True},
{"name": "Candidate C", "preferred_style": "Development-focused",
"pressing_intensity": "Medium", "youth_dev_score": 90,
"tactical_flexibility": 70, "experience_level": "Developing", "available": True}
]
priorities = {"needs_experienced": True}
shortlist = analyzer.generate_shortlist(candidates, priorities)
print(f"Squad Profile: {analyzer.squad_profile.recommended_style} style recommended")
print(f"\nCandidate Shortlist:")
print(shortlist.to_string(index=False))
# R: Manager Hiring Analysis
library(tidyverse)
# Squad profile
analyze_squad_profile <- function(squad_data) {
squad_data %>%
summarise(
# Age profile
avg_age = mean(age),
young_players = sum(age < 23),
peak_players = sum(age >= 23 & age <= 29),
veteran_players = sum(age > 29),
# Style indicators
technical_avg = mean(technical_skill),
physical_avg = mean(physical_rating),
pace_avg = mean(pace),
# Positional strengths
has_quality_striker = any(position == "ST" & quality > 80), # assumes squad_data has a quality column
has_quality_wingers = sum(position == "W" & quality > 75) >= 2,
has_technical_midfield = mean(technical_skill[position == "CM"]) > 75
) %>%
mutate(
# Ideal style recommendation
recommended_style = case_when(
pace_avg > 75 & has_quality_wingers ~ "Counter-attacking",
technical_avg > 80 & has_technical_midfield ~ "Possession-based",
physical_avg > 78 ~ "Direct/Physical",
young_players > 5 ~ "Development-focused",
TRUE ~ "Flexible"
)
)
}
# Manager candidate profiles
manager_candidates <- tribble(
~manager, ~preferred_style, ~pressing_intensity, ~possession_avg,
~youth_dev_score, ~tactical_flexibility, ~experience_level, ~availability,
"Candidate A", "Possession-based", "Medium", 62, 75, 60, "Experienced", TRUE,
"Candidate B", "Counter-attacking", "High", 45, 55, 80, "Experienced", FALSE,
"Candidate C", "Pressing", "Very High", 55, 85, 55, "Developing", TRUE,
"Candidate D", "Direct", "Low", 42, 60, 90, "Very Experienced", TRUE,
"Candidate E", "Flexible", "Medium", 52, 70, 85, "Experienced", TRUE
)
# Match score calculation
calculate_match_score <- function(candidate, squad_profile, club_priorities) {
score <- 0
# Style match
if (candidate$preferred_style == squad_profile$recommended_style) {
score <- score + 25
} else if (candidate$tactical_flexibility > 70) {
score <- score + 15 # Flexible managers can adapt
}
# Youth focus match
if (squad_profile$young_players > 5 & candidate$youth_dev_score > 70) {
score <- score + 20
}
# Experience requirements
if (club_priorities$needs_experienced & candidate$experience_level %in% c("Experienced", "Very Experienced")) {
score <- score + 20
}
# Availability
if (candidate$availability) {
score <- score + 15
}
# Pressing style match
if (squad_profile$physical_avg > 75 & candidate$pressing_intensity == "Very High") {
score <- score + 10
}
score
}
# Rank candidates
rank_candidates <- function(candidates, squad_profile, priorities) {
candidates %>%
rowwise() %>%
mutate(
match_score = calculate_match_score(
pick(everything()), # replaces cur_data(), deprecated since dplyr 1.1.0
squad_profile,
priorities
)
) %>%
ungroup() %>%
arrange(desc(match_score))
}
# Example usage
print("Manager Candidate Evaluation Framework Ready!")
print("Use rank_candidates() to evaluate potential hires")
Coaching Staff Analytics
Modern football management is a team effort. Assistant coaches, set-piece specialists, and analytics staff all contribute to team performance. Understanding these contributions adds nuance to manager evaluation.
# Python: Coaching Staff Analysis
import pandas as pd
import numpy as np
from typing import Dict, List
class CoachingStaffAnalyzer:
    """Analyze contribution of coaching staff members."""

    def __init__(self, staff_data: pd.DataFrame):
        self.data = staff_data

    def set_piece_coach_analysis(self) -> pd.DataFrame:
        """Analyze set piece coach effectiveness."""
        sp_analysis = self.data.groupby("set_piece_coach").agg({
            "season": "count",
            "club": "nunique",
            "set_piece_goals_for": "mean",
            "set_piece_goals_against": "mean",
            "total_goals_for": "mean"
        }).reset_index()
        sp_analysis.columns = ["set_piece_coach", "seasons", "clubs",
                               "avg_sp_for", "avg_sp_against", "avg_total"]
        sp_analysis["sp_net"] = sp_analysis["avg_sp_for"] - sp_analysis["avg_sp_against"]
        sp_analysis["sp_pct"] = sp_analysis["avg_sp_for"] / sp_analysis["avg_total"] * 100
        return sp_analysis.sort_values("sp_net", ascending=False)

    def continuity_analysis(self) -> pd.DataFrame:
        """Analyze impact of coaching staff continuity."""
        df = self.data.sort_values(["club", "season"]).copy()
        # Track changes relative to the previous season at the same club
        df["prev_manager"] = df.groupby("club")["manager"].shift(1)
        df["prev_assistant"] = df.groupby("club")["assistant"].shift(1)
        df["manager_changed"] = df["manager"] != df["prev_manager"]
        df["assistant_continuity"] = df["assistant"] == df["prev_assistant"]

        # Categorize scenarios
        def categorize(row):
            if pd.isna(row["prev_manager"]):
                return None  # first season on record for this club
            if not row["manager_changed"]:
                return "Stability"
            elif row["assistant_continuity"]:
                return "New manager, same assistant"
            else:
                return "Complete change"

        df["scenario"] = df.apply(categorize, axis=1)
        # Analyze by scenario
        scenarios = df.dropna(subset=["scenario"]).groupby("scenario").agg({
            "club": "count",
            "total_goals_for": "mean"
        }).reset_index()
        scenarios.columns = ["scenario", "cases", "avg_goals"]
        return scenarios

    def coach_value(self, coach_name: str, role: str = "assistant") -> Dict:
        """Calculate value metrics for a specific coach."""
        if role == "assistant":
            coached = self.data[self.data["assistant"] == coach_name]
        elif role == "set_piece":
            coached = self.data[self.data["set_piece_coach"] == coach_name]
        else:
            return {"error": "Invalid role"}
        if len(coached) == 0:
            return {"error": "Coach not found"}
        return {
            "coach": coach_name,
            "role": role,
            "clubs": coached["club"].nunique(),
            "seasons": len(coached),
            "managers_worked_with": coached["manager"].nunique(),
            "avg_goals": coached["total_goals_for"].mean()
        }

    def assistant_manager_pairs(self) -> pd.DataFrame:
        """Analyze successful manager-assistant pairings."""
        pairs = self.data.groupby(["manager", "assistant"]).agg({
            "season": "count",
            "total_goals_for": "mean",
            "set_piece_goals_for": "mean"
        }).reset_index()
        pairs.columns = ["manager", "assistant", "seasons_together",
                         "avg_goals", "avg_sp_goals"]
        return pairs.sort_values("avg_goals", ascending=False)
# Example usage
staff_data = pd.DataFrame({
"season": ["2022-23", "2023-24", "2022-23", "2023-24"],
"club": ["Club A", "Club A", "Club B", "Club B"],
"manager": ["Manager 1", "Manager 1", "Manager 2", "Manager 3"],
"assistant": ["Assistant A", "Assistant A", "Assistant B", "Assistant B"],
"set_piece_coach": ["SP Coach X", "SP Coach X", "SP Coach Y", "SP Coach Y"],
"set_piece_goals_for": [15, 18, 8, 10],
"set_piece_goals_against": [8, 6, 12, 10],
"total_goals_for": [68, 72, 52, 55]
})
analyzer = CoachingStaffAnalyzer(staff_data)
print("Set Piece Coach Analysis:")
print(analyzer.set_piece_coach_analysis().to_string(index=False))
print("\nCoaching Continuity Impact:")
print(analyzer.continuity_analysis().to_string(index=False))
# R: Coaching Staff Analysis
library(tidyverse)
# Track coach movements and team performance
coaching_staff_data <- tribble(
~season, ~club, ~manager, ~assistant, ~set_piece_coach,
~set_piece_goals_for, ~set_piece_goals_against, ~total_goals_for,
"2022-23", "Club A", "Manager 1", "Assistant A", "SP Coach X", 15, 8, 68,
"2023-24", "Club A", "Manager 1", "Assistant A", "SP Coach X", 18, 6, 72,
"2022-23", "Club B", "Manager 2", "Assistant B", "SP Coach Y", 8, 12, 52,
"2023-24", "Club B", "Manager 3", "Assistant B", "SP Coach Y", 10, 10, 55,
"2022-23", "Club C", "Manager 4", "Assistant C", "SP Coach X", 12, 9, 58,
"2023-24", "Club C", "Manager 4", "Assistant C", "SP Coach Z", 8, 14, 54
)
# Set piece coach impact
sp_coach_analysis <- coaching_staff_data %>%
group_by(set_piece_coach) %>%
summarise(
seasons = n(),
clubs = n_distinct(club),
avg_sp_goals_for = mean(set_piece_goals_for),
avg_sp_goals_against = mean(set_piece_goals_against),
sp_net = mean(set_piece_goals_for - set_piece_goals_against),
sp_pct_of_total = mean(set_piece_goals_for / total_goals_for) * 100,
.groups = "drop"
) %>%
arrange(desc(sp_net))
print("Set Piece Coach Analysis:")
print(sp_coach_analysis)
# Assistant coach continuity impact
continuity_analysis <- coaching_staff_data %>%
group_by(club) %>%
mutate(
manager_changed = manager != lag(manager),
assistant_continuity = assistant == lag(assistant)
) %>%
filter(!is.na(manager_changed)) %>%
mutate(
scenario = case_when(
!manager_changed ~ "Stability",
manager_changed & assistant_continuity ~ "New manager, same assistant",
manager_changed & !assistant_continuity ~ "Complete change"
)
)
scenario_impact <- continuity_analysis %>%
group_by(scenario) %>%
summarise(
cases = n(),
avg_goals = mean(total_goals_for),
.groups = "drop"
)
cat("\nCoaching Continuity Impact:\n")
print(scenario_impact)
# Track individual coach performance
coach_value <- function(coach_data, coach_name, role = "assistant") {
if (role == "assistant") {
coached <- coach_data %>% filter(assistant == coach_name)
} else if (role == "set_piece") {
coached <- coach_data %>% filter(set_piece_coach == coach_name)
} else {
stop("role must be 'assistant' or 'set_piece'")
}
coached %>%
summarise(
clubs = n_distinct(club),
seasons = n(),
managers_worked_with = n_distinct(manager),
avg_performance = mean(total_goals_for),
.groups = "drop"
)
}
cat("\nAssistant A Value:\n")
print(coach_value(coaching_staff_data, "Assistant A"))
Assistant Manager
- Session planning quality
- Player relationships
- Transition continuity
Set Piece Coach
- SP goals for/against
- % of total goals from SP
- Defensive SP organization
Goalkeeping Coach
- GK development trajectory
- Save % vs xG faced
- Distribution improvements
Practice Exercises
Exercise 47.1: Manager Fingerprint Dashboard
Create a comprehensive manager comparison dashboard that visualizes tactical fingerprints using radar charts, and ranks managers on multiple dimensions.
- Normalize all metrics to 0-100 scale for comparison
- Include both in-possession and out-of-possession metrics
- Allow user selection of managers to compare
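As a starting point for the first hint, the sketch below shows one way to put metrics on a common 0-100 scale using min-max normalization. It is an illustration under stated assumptions: the helper name `normalize_0_100`, the `lower_is_better` argument, and the three-manager table are hypothetical, and min-max scaling is only one of several reasonable choices (percentile ranks are another).

```python
import pandas as pd


def normalize_0_100(df, metric_cols, lower_is_better=()):
    """Min-max scale each metric to 0-100 across the managers in df.

    Metrics listed in lower_is_better (e.g. PPDA) are flipped so that
    100 always means "more of the desirable trait" on a radar chart.
    """
    out = df.copy()
    for col in metric_cols:
        lo, hi = df[col].min(), df[col].max()
        if hi == lo:
            out[col] = 50.0  # no spread: place everyone at the midpoint
        else:
            out[col] = (df[col] - lo) / (hi - lo) * 100
        if col in lower_is_better:
            out[col] = 100 - out[col]
    return out


# Illustrative values only
profiles = pd.DataFrame({
    "manager": ["Manager A", "Manager B", "Manager C"],
    "possession_pct": [62.0, 48.0, 55.0],
    "ppda": [8.5, 13.0, 10.0],
})
scaled = normalize_0_100(profiles, ["possession_pct", "ppda"],
                         lower_is_better=("ppda",))
print(scaled)
```

Because the scale depends on who is in the comparison set, the normalized values change whenever managers are added or removed.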
Exercise 47.2: Manager Value Model
Build a regression model that predicts team points based on squad value and wage bill, then use residuals to estimate manager impact. Validate using historical manager changes.
- Log-transform financial variables
- Consider using mixed-effects models
- Test model stability across different seasons
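A minimal sketch of the residual idea follows, assuming a single log-transformed financial predictor and ordinary least squares via `numpy.polyfit`. The season data is invented for illustration, and a serious attempt would add squad value, more seasons, and ideally the mixed-effects structure mentioned in the hints.

```python
import numpy as np
import pandas as pd

# Illustrative club-seasons (all numbers are made up)
seasons = pd.DataFrame({
    "manager": ["Manager A", "Manager A", "Manager B",
                "Manager B", "Manager C", "Manager C"],
    "wage_bill_m": [180.0, 200.0, 90.0, 95.0, 45.0, 50.0],
    "points": [78, 84, 58, 61, 38, 44],
})

# Fit points ~ log(wage bill); the manager is deliberately left out of the model
x = np.log(seasons["wage_bill_m"].to_numpy())
y = seasons["points"].to_numpy(dtype=float)
slope, intercept = np.polyfit(x, y, deg=1)

# Residual = points not explained by resources
seasons["expected_points"] = intercept + slope * x
seasons["residual"] = seasons["points"] - seasons["expected_points"]

# Average residual per manager as a rough impact estimate
manager_effect = (
    seasons.groupby("manager")["residual"].mean().sort_values(ascending=False)
)
print(manager_effect.round(2))
```

The residual also absorbs luck, injuries, and anything else the model omits, so it is an upper bound on what can be attributed to the manager rather than a clean estimate.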
Exercise 47.3: Substitution Optimizer
Using historical match data, build a model that recommends optimal substitution timing based on score line, xG flow, and player fatigue.
- Model xG rate changes after substitutions
- Include game state (winning/drawing/losing)
- Factor in player characteristics
Exercise 47.4: Sacking Impact Simulator
Build a Monte Carlo simulation that models the expected outcomes of sacking vs. keeping a manager based on historical form, remaining fixtures, and replacement manager quality bands.
- Use historical post-sacking performance distributions
- Model regression to the mean for very poor/good runs
- Simulate at least 10,000 scenarios for confidence intervals
- Include caretaker vs. permanent appointment distinction
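One possible skeleton for the simulation is sketched below. It assumes each remaining match is an independent draw from fixed win/draw/loss probabilities, which ignores fixture difficulty and form. The probability bands for "keep" and "replace", the current points total, and the 38-point target are hypothetical placeholders that would come from the historical distributions described in the hints.

```python
import numpy as np


def simulate_points(p_win, p_draw, matches_left, n_sims=10_000, seed=0):
    """Simulate points from the remaining matches, one row per scenario."""
    rng = np.random.default_rng(seed)
    p_loss = 1.0 - p_win - p_draw
    outcomes = rng.choice([3, 1, 0], size=(n_sims, matches_left),
                          p=[p_win, p_draw, p_loss])
    return outcomes.sum(axis=1)


current_points = 12   # hypothetical situation after 12 matches
matches_left = 26
target = 38           # hypothetical safety threshold

# Assumed per-match probabilities for each option (placeholders, not estimates)
keep = current_points + simulate_points(0.28, 0.27, matches_left, seed=1)
replace = current_points + simulate_points(0.33, 0.27, matches_left, seed=2)

for label, totals in (("keep", keep), ("replace", replace)):
    lo, hi = np.percentile(totals, [5, 95])
    print(f"{label}: mean={totals.mean():.1f}, "
          f"90% interval=({lo:.0f}, {hi:.0f}), "
          f"P(>= {target} pts)={np.mean(totals >= target):.2f}")
```

Replacing the fixed probabilities with draws from a distribution of replacement quality would propagate the uncertainty about who the new manager turns out to be.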
Exercise 47.5: Coaching Staff Network Analysis
Create a network graph visualization showing the relationships between managers and their coaching staff across multiple clubs and seasons. Identify "coaching trees" and track how assistant managers transition to head manager roles.
- Use NetworkX (Python) or igraph (R) for graph analysis
- Calculate betweenness centrality for key connectors
- Track success rates of different coaching tree branches
- Include set-piece coaches and analysts in the network
Exercise 47.6: Match Preparation Effectiveness
Analyze how teams perform against different opponent types (possession-heavy, pressing, defensive, etc.) and whether managers show consistent tactical preparation effectiveness. Include first-half vs. second-half analysis to assess half-time adjustments.
- Cluster opponents by playing style first
- Compare xG difference between first and second halves
- Look for patterns in formation changes
- Account for home/away effects
Exercise 47.7: Manager Style Evolution Tracker
Build a system that tracks how a manager's tactical style evolves over their career, identifying pivotal moments (new clubs, new assistants, tournament experiences) that correlate with style changes.
- Use rolling averages for tactical metrics over 15-20 matches
- Apply change point detection algorithms (PELT, Bayesian)
- Correlate changes with career events database
- Visualize evolution using animated timeline charts
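The sketch below illustrates the first two hints on synthetic data. It uses a deliberately simple single-change-point search (minimizing within-segment squared error) as a stand-in for PELT or Bayesian methods, so that it needs only NumPy and pandas. The function name `single_change_point` and the simulated PPDA series are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def single_change_point(values, min_segment=5):
    """Return the index that best splits values into two constant-mean segments."""
    values = np.asarray(values, dtype=float)
    best_idx, best_cost = None, np.inf
    for i in range(min_segment, len(values) - min_segment + 1):
        left, right = values[:i], values[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


# Synthetic PPDA series: pressing becomes more intense after match 40
rng = np.random.default_rng(42)
ppda = np.concatenate([rng.normal(12.0, 0.8, 40), rng.normal(9.0, 0.8, 40)])

# 15-match rolling mean for plotting the trend
rolling = pd.Series(ppda).rolling(window=15, min_periods=15).mean()

# Detect the shift on the raw series (smoothing would blur its location)
cp = single_change_point(ppda)
print(f"Estimated change point at match index {cp}")
```

A real tracker would need to handle multiple change points and then check whether each detected shift lines up with a career event.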
Exercise 47.8: Hiring Decision Backtester
Create a backtesting framework for manager hiring decisions. Given historical hiring scenarios, test whether different candidate selection criteria would have yielded better outcomes using actual performance data.
- Define multiple selection strategies (style match, experience, age, etc.)
- Use holdout seasons for true out-of-sample testing
- Account for survivorship bias in available candidates
- Measure outcomes at 6-month, 1-year, and full-tenure windows
Summary
Key Takeaways
- Complex Attribution: Separating manager impact from squad quality requires sophisticated analytical approaches
- Tactical Fingerprints: Managers have distinctive tactical signatures that can be identified through playing style metrics
- Resource Adjustment: Performance should be evaluated relative to available resources, not just absolute results
- In-Game Decisions: Substitution timing and tactical changes provide measurable signals of managerial quality
- Development Skill: Some managers consistently improve players, though this is confounded by natural age-related development
- Sacking Timing: Data can help identify optimal sacking windows, though regression to mean often explains poor runs
- Hiring Analytics: Squad profile analysis should drive candidate selection to find style matches over reputation
- Staff Matters: Coaching staff continuity, particularly among assistants and set-piece coaches, can affect team performance and should be tracked alongside the manager
Common Pitfalls
- Results-Only Evaluation: Judging managers purely on points ignores process quality, opponent strength, and variance - xG-based metrics provide better signals
- Ignoring Context: A manager inheriting a relegated squad should not be compared to one inheriting a championship team without adjustment
- Small Sample Size: Most managers have only 100-300 matches to evaluate; tactical fingerprints need at least 30 matches to stabilize
- Survivorship Bias: Only analyzing employed managers misses lessons from those who failed early or never got top opportunities
- Confusing Style with Quality: Possession-based football isn't inherently better than counter-attacking; both can be executed well or poorly
- Linear Career Progression: Assuming managers continuously improve ignores that many plateau or decline after initial success
- Ignoring Adaptation: Great managers adapt to their squad; judging them against a single "ideal" style misses this flexibility
- Caretaker Performance: Post-sacking "bounce" is often due to regression to mean, not caretaker quality - don't overvalue short-term results
| Task | R | Python |
|---|---|---|
| Tactical Fingerprinting | factoextra, cluster | sklearn.cluster, scipy |
| Resource-Adjusted Models | lme4, brms | statsmodels, pymc3 |
| Network Analysis (Coaching Trees) | igraph, ggraph | networkx, pyvis |
| Change Point Detection | changepoint, mcp | ruptures, changepy |
| Monte Carlo Simulation | mc2d, base R | numpy, scipy.stats |
| Radar Charts | fmsb, ggradar | matplotlib, plotly |
| Survival Analysis (Tenure) | survival, survminer | lifelines, scikit-survival |
Process Metrics
- xG Difference: xG For minus xG Against per 90
- PPDA: Passes allowed per defensive action (pressing intensity)
- Build-Up %: Possession in own third during build-up
- Progression Rate: Forward passes / total passes
- Defensive Line Height: Average Y position of defensive block
- Transition Speed: Time from turnover to shot attempt
Outcome Metrics
- Points Above Expected: Actual points - xPts
- Resource-Adjusted Points: Points relative to wage bill rank
- Player Development Index: Average skill improvement under manager
- Adaptation Score: Performance variance across opponent types
- Second-Half Performance: xG differential change H1 to H2
- Substitution Impact: xG rate change post-substitution
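To make "Points Above Expected" concrete, the sketch below derives expected points (xPts) from match-level xG by treating each side's goals as independent Poisson variables. That independence assumption is a common simplification rather than something this chapter prescribes, and the function names and the three sample matches are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected-goals rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_points(xg_for, xg_against, max_goals=10):
    """xPts for one match, assuming independent Poisson goal counts."""
    p_win = p_draw = 0.0
    for gf in range(max_goals + 1):
        for ga in range(max_goals + 1):
            p = poisson_pmf(gf, xg_for) * poisson_pmf(ga, xg_against)
            if gf > ga:
                p_win += p
            elif gf == ga:
                p_draw += p
    return 3 * p_win + p_draw


# (xG for, xG against, actual points); illustrative values only
matches = [(1.8, 0.9, 3), (1.1, 1.3, 1), (0.7, 1.6, 3)]
xpts = sum(expected_points(f, a) for f, a, _ in matches)
actual = sum(pts for _, _, pts in matches)
print(f"xPts={xpts:.2f}, actual={actual}, points above expected={actual - xpts:.2f}")
```

Summing match xG into a Poisson rate discards shot-level detail; simulating each shot individually is a more faithful alternative when shot data is available.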
| Window | Matches | Reliable Metrics | Caution Areas |
|---|---|---|---|
| Immediate | 1-10 | Formation choice, pressing setup | Results highly variable; don't overreact |
| Short-term | 10-20 | Basic tactical fingerprint emerging | Still small sample; look for trends not absolutes |
| Medium-term | 20-40 | xG metrics stabilizing, style clear | Opponent adaptation may not be visible yet |
| Full Season | 40-60 | All metrics reliable, development visible | Context matters (injuries, transfers) |
| Multi-Season | 100+ | Career patterns, consistency assessment | Football evolution may invalidate old data |
Club Ownership
- Hiring decision support
- Sacking timing optimization
- Contract extension analysis
- Manager market valuation
Sporting Directors
- Style-squad fit assessment
- Candidate shortlisting
- Coaching staff composition
- Development pathway planning
Media & Analysts
- Narrative-busting analysis
- Historical comparisons
- Prediction models
- Tactical breakdowns
# Python: Complete Manager Analytics Pipeline
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from lifelines import KaplanMeierFitter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
@dataclass
class ManagerAnalyticsConfig:
    """Configuration for manager analytics pipeline."""
    min_matches_fingerprint: int = 30
    min_matches_resource: int = 20
    n_style_clusters: int = 5
    fingerprint_metrics: Optional[List[str]] = None

    def __post_init__(self):
        # Use the default metric set when none is supplied
        if self.fingerprint_metrics is None:
            self.fingerprint_metrics = [
                "possession_pct", "ppda", "defensive_line_height",
                "build_up_pct", "direct_play_pct", "pressing_triggers",
                "avg_formation_width", "transition_speed"
            ]
class ManagerAnalyticsPipeline:
    """Complete manager analytics system."""

    def __init__(self, config: Optional[ManagerAnalyticsConfig] = None):
        self.config = config or ManagerAnalyticsConfig()
        self.scaler = StandardScaler()
        self.kmeans = None
        self.results = {}

    def fit(self, matches_df: pd.DataFrame, managers_df: pd.DataFrame,
            staff_df: pd.DataFrame) -> "ManagerAnalyticsPipeline":
        """Run full analytics pipeline."""
        # 1. Tactical fingerprinting
        self.results["fingerprints"] = self._compute_fingerprints(matches_df)
        # 2. Resource-adjusted performance
        self.results["resource_adjusted"] = self._compute_resource_adjusted(matches_df)
        # 3. Tenure survival analysis
        self.results["tenure_survival"] = self._compute_tenure_survival(managers_df)
        # 4. Staff continuity impact
        self.results["staff_impact"] = self._compute_staff_impact(staff_df)
        # 5. Generate summary
        self.results["summary"] = self._generate_summary(matches_df)
        return self

    def _compute_fingerprints(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute tactical fingerprints for each manager."""
        metrics = self.config.fingerprint_metrics
        available = [m for m in metrics if m in df.columns]
        if not available:
            return pd.DataFrame()
        # Keep only managers with enough matches for a stable fingerprint
        match_counts = df.groupby("manager").size()
        eligible = match_counts[
            match_counts >= self.config.min_matches_fingerprint
        ].index
        # Aggregate by manager
        profiles = (
            df[df["manager"].isin(eligible)]
            .groupby("manager")[available].mean().reset_index()
        )
        if len(profiles) < 2:
            return pd.DataFrame()
        # Scale and cluster
        scaled = self.scaler.fit_transform(profiles[available])
        n_clusters = min(self.config.n_style_clusters, len(profiles))
        self.kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=25)
        profiles["style_cluster"] = self.kmeans.fit_predict(scaled)
        # K-means cluster indices are arbitrary, so these labels are placeholders;
        # inspect the cluster centres before naming a style
        cluster_labels = ["Possession", "High Press", "Counter", "Balanced", "Defensive"]
        profiles["style_label"] = profiles["style_cluster"].map(
            lambda x: cluster_labels[x % len(cluster_labels)]
        )
        return profiles

    def _compute_resource_adjusted(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute resource-adjusted performance metrics."""
        df = df.copy()  # avoid mutating the caller's DataFrame
        # Simple resource adjustment
        if "wage_rank" not in df.columns:
            df["wage_rank"] = 10  # default middle rank
        df["expected_ppg"] = 3 - (df["wage_rank"] - 1) * 0.1  # rough linear expectation
        df["residual"] = df.get("points", df.get("ppg", 1.5)) - df["expected_ppg"]
        adjusted = (
            df.groupby("manager")["residual"]
            .agg(avg_residual="mean", matches="count")
            .reset_index()
        )
        return adjusted.sort_values("avg_residual", ascending=False)

    def _compute_tenure_survival(self, df: pd.DataFrame) -> Dict:
        """Survival analysis for manager tenure."""
        if "tenure_days" not in df.columns:
            return {"status": "insufficient data"}
        kmf = KaplanMeierFitter()
        results = {}
        for level in df["resource_level"].unique():
            subset = df[df["resource_level"] == level]
            if len(subset) >= 10:
                kmf.fit(
                    subset["tenure_days"],
                    event_observed=subset["was_sacked"]
                )
                results[level] = {
                    "median_tenure": kmf.median_survival_time_,
                    "survival_6m": kmf.survival_function_at_times([180]).values[0],
                    "n": len(subset)
                }
        return results

    def _compute_staff_impact(self, df: pd.DataFrame) -> pd.DataFrame:
        """Analyze coaching staff continuity impact."""
        if "assistant_retained" not in df.columns:
            return pd.DataFrame()
        impact = df.groupby("assistant_retained").agg({
            "first_season_points": ["mean", "std", "count"],
            "survived_relegation": "mean"
        }).reset_index()
        impact.columns = ["retained", "avg_points", "std_points",
                          "n_cases", "survival_rate"]
        return impact

    def _generate_summary(self, df: pd.DataFrame) -> Dict:
        """Generate pipeline summary."""
        # Fingerprints may be an empty DataFrame when too few managers qualify
        fingerprints = self.results.get("fingerprints", pd.DataFrame())
        if "style_label" in fingerprints.columns:
            style_distribution = fingerprints["style_label"].value_counts().to_dict()
        else:
            style_distribution = {}
        return {
            "total_managers": df["manager"].nunique(),
            "total_matches": len(df),
            "style_distribution": style_distribution,
            "pipeline_status": "complete"
        }

    def get_manager_report(self, manager_name: str) -> Dict:
        """Generate individual manager report."""
        report = {"manager": manager_name}
        # Get fingerprint
        fp = self.results.get("fingerprints", pd.DataFrame())
        if "manager" in fp.columns and manager_name in fp["manager"].values:
            row = fp[fp["manager"] == manager_name].iloc[0]
            report["style"] = row["style_label"]
        # Get resource-adjusted
        ra = self.results.get("resource_adjusted", pd.DataFrame())
        if "manager" in ra.columns and manager_name in ra["manager"].values:
            row = ra[ra["manager"] == manager_name].iloc[0]
            report["resource_adjusted_ppg"] = round(row["avg_residual"], 3)
            report["matches_analyzed"] = int(row["matches"])
        return report
# Example usage
config = ManagerAnalyticsConfig(
min_matches_fingerprint=25,
n_style_clusters=4
)
pipeline = ManagerAnalyticsPipeline(config)
print("Manager Analytics Pipeline Initialized")
print(f"Fingerprint metrics: {config.fingerprint_metrics}")
print(f"Style clusters: {config.n_style_clusters}")
print("\nReady to process match, manager, and staff data")
# R: Complete Manager Analytics Pipeline
library(tidyverse)
library(cluster)
library(survival)
# Define the comprehensive analyzer
create_manager_analytics <- function(matches_data, managers_data, staff_data) {
  # 1. Tactical Fingerprint Analysis
  fingerprint_metrics <- c(
    "possession_pct", "ppda", "defensive_line_height",
    "build_up_pct", "direct_play_pct", "pressing_triggers",
    "avg_formation_width", "transition_speed"
  )
  tactical_profiles <- matches_data %>%
    group_by(manager) %>%
    summarise(across(all_of(fingerprint_metrics), ~ mean(.x, na.rm = TRUE))) %>%
    # scale() returns a matrix, so convert back to a plain numeric column
    mutate(across(all_of(fingerprint_metrics), ~ as.numeric(scale(.x))))
  # Cluster managers by style
  km_model <- kmeans(tactical_profiles[, fingerprint_metrics], centers = 5, nstart = 25)
  tactical_profiles$style_cluster <- km_model$cluster
  # Cluster numbers are arbitrary, so these labels are placeholders;
  # inspect km_model$centers before naming a style
  cluster_labels <- c("Possession-Based", "High Press", "Counter-Attack",
                      "Balanced", "Defensive")
  tactical_profiles$style_label <- cluster_labels[tactical_profiles$style_cluster]
  # 2. Resource-Adjusted Performance
  # The manager is left out of the predictors on purpose: including it would
  # absorb the manager effect and leave per-manager residuals near zero
  resource_model <- lm(
    points_per_game ~ log(wage_bill_rank) + log(squad_value_rank) +
      home_pct + injury_burden,
    data = matches_data,
    na.action = na.exclude  # keeps residuals aligned with the input rows
  )
  manager_residuals <- matches_data %>%
    mutate(residual = residuals(resource_model)) %>%
    group_by(manager) %>%
    summarise(
      avg_residual = mean(residual, na.rm = TRUE),
      matches = n(),
      .groups = "drop"
    ) %>%
    filter(matches >= 30) %>%
    arrange(desc(avg_residual))
  # 3. Survival Analysis (Tenure Prediction)
  tenure_data <- managers_data %>%
    mutate(
      # Ongoing tenures have no end_date; measure them up to today (censored)
      tenure_days = as.numeric(coalesce(end_date, Sys.Date()) - start_date),
      was_sacked = !is.na(sacking_date),
      censored = ifelse(is.na(end_date), 1, 0)
    )
  survival_model <- survfit(
    Surv(tenure_days, was_sacked) ~ resource_level,
    data = tenure_data
  )
  # 4. Staff Continuity Impact
  staff_analysis <- staff_data %>%
    group_by(assistant_retained_post_sacking) %>%
    summarise(
      cases = n(),
      avg_first_season_points = mean(first_season_points),
      avg_relegation_survival = mean(survived_relegation),
      .groups = "drop"
    )
  # Return comprehensive results
  list(
    tactical_profiles = tactical_profiles,
    resource_adjusted = manager_residuals,
    tenure_survival = survival_model,
    staff_impact = staff_analysis,
    summary = list(
      total_managers = n_distinct(matches_data$manager),
      style_distribution = table(tactical_profiles$style_label),
      avg_tenure = mean(tenure_data$tenure_days, na.rm = TRUE)
    )
  )
}
# Example usage
cat("Manager Analytics Pipeline Ready")
cat("\nComponents: Fingerprinting, Resource-Adjustment, Survival, Staff Analysis")
cat("\nRecommendation: Run with minimum 3 seasons of data for reliable results")
Before Hiring Analysis
- Squad profile completed (age, style, strengths)
- Style compatibility matrix built
- Candidate fingerprints extracted (30+ matches each)
- Budget constraints defined
- Development vs. win-now priority set
- Coaching staff compatibility assessed
Before Sacking Analysis
- Regression to mean calculated
- xG-based metrics reviewed (not just points)
- Remaining fixture difficulty assessed
- Available replacement candidates mapped
- Staff continuity plan considered
- Historical sacking outcome data reviewed
Manager analytics remains one of the most challenging areas in football analytics due to small sample sizes and confounding factors. However, combining multiple approaches - tactical fingerprinting, resource-adjusted performance, in-game decision evaluation, and staff impact analysis - can provide meaningful insights into managerial effectiveness. The key is to resist the temptation to draw conclusions from insufficient data and to always contextualize findings within the broader football environment. As tracking data becomes more detailed and managerial career databases grow, the precision of these analyses will continue to improve.