Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples

Introduction to Manager Analytics

Football managers are often credited with success or blamed for failure, but analytically evaluating their impact is complex. How do we separate manager skill from player quality? Can we identify distinctive tactical signatures? This chapter explores frameworks for manager analysis.

The Manager Evaluation Challenge

Managers influence team performance through tactics, player development, squad management, and motivation. Isolating their contribution from the players' inherent abilities and organizational factors requires careful analytical approaches.

Building a manager performance analysis framework

# Manager Performance Analysis Framework
import pandas as pd
import numpy as np

# Manager career data
manager_data = pd.DataFrame({
    "manager": ["Manager A"]*3 + ["Manager B"]*3 + ["Manager C"]*3,
    "club": ["Club X"]*3 + ["Club Y", "Club Y", "Club Z"] + ["Club W", "Club W", "Club V"],
    "season": ["2021-22", "2022-23", "2023-24"]*3,
    "matches": [38]*9,
    "wins": [22, 25, 28, 15, 18, 20, 12, 16, 14],
    "draws": [8, 7, 5, 10, 9, 10, 12, 10, 8],
    "losses": [8, 6, 5, 13, 11, 8, 14, 12, 16],
    "goals_for": [68, 75, 82, 52, 58, 62, 45, 50, 48],
    "goals_against": [35, 32, 28, 48, 45, 40, 55, 48, 58],
    "xG_for": [62.5, 70.1, 78.3, 48.2, 55.6, 60.2, 42.1, 52.4, 45.8],
    "xG_against": [38.2, 35.8, 30.5, 50.1, 47.3, 42.8, 52.8, 48.9, 55.2],
    "budget_rank": [3, 2, 1, 8, 7, 5, 12, 10, 15]
})

# Calculate metrics
manager_data["points"] = manager_data["wins"] * 3 + manager_data["draws"]
manager_data["ppg"] = manager_data["points"] / manager_data["matches"]
manager_data["gd"] = manager_data["goals_for"] - manager_data["goals_against"]
manager_data["xgd"] = manager_data["xG_for"] - manager_data["xG_against"]
manager_data["goal_overperformance"] = manager_data["goals_for"] - manager_data["xG_for"]
manager_data["defensive_overperformance"] = manager_data["xG_against"] - manager_data["goals_against"]
manager_data["total_overperformance"] = manager_data["gd"] - manager_data["xgd"]
manager_data["win_rate"] = manager_data["wins"] / manager_data["matches"] * 100

# Career summaries
def weighted_avg(group, value_col, weight_col):
    return np.average(group[value_col], weights=group[weight_col])

career_summary = manager_data.groupby("manager").apply(
    lambda g: pd.Series({
        "seasons": len(g),
        "total_matches": g["matches"].sum(),
        "avg_ppg": weighted_avg(g, "ppg", "matches"),
        "avg_win_rate": weighted_avg(g, "win_rate", "matches"),
        "avg_budget_rank": g["budget_rank"].mean(),
        "avg_xgd": weighted_avg(g, "xgd", "matches"),
        "avg_overperformance": weighted_avg(g, "total_overperformance", "matches")
    })
).reset_index()

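# Resource-adjusted performance: budget_rank 1 = largest budget, so multiplying
# by budget_rank / 10 scales PPG up for managers working with smaller budgets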
career_summary["resource_adjusted_ppg"] = (
    career_summary["avg_ppg"] * (career_summary["avg_budget_rank"] / 10)
)

career_summary = career_summary.sort_values("avg_ppg", ascending=False)

print("Manager Career Summaries:")
print(career_summary.to_string(index=False))

# Manager Performance Analysis Framework
library(tidyverse)

# Sample manager career data
manager_data <- tribble(
  ~manager, ~club, ~season, ~matches, ~wins, ~draws, ~losses,
  ~goals_for, ~goals_against, ~xG_for, ~xG_against, ~budget_rank,
  "Manager A", "Club X", "2021-22", 38, 22, 8, 8, 68, 35, 62.5, 38.2, 3,
  "Manager A", "Club X", "2022-23", 38, 25, 7, 6, 75, 32, 70.1, 35.8, 2,
  "Manager A", "Club X", "2023-24", 38, 28, 5, 5, 82, 28, 78.3, 30.5, 1,
  "Manager B", "Club Y", "2021-22", 38, 15, 10, 13, 52, 48, 48.2, 50.1, 8,
  "Manager B", "Club Y", "2022-23", 38, 18, 9, 11, 58, 45, 55.6, 47.3, 7,
  "Manager B", "Club Z", "2023-24", 38, 20, 10, 8, 62, 40, 60.2, 42.8, 5,
  "Manager C", "Club W", "2021-22", 38, 12, 12, 14, 45, 55, 42.1, 52.8, 12,
  "Manager C", "Club W", "2022-23", 38, 16, 10, 12, 50, 48, 52.4, 48.9, 10,
  "Manager C", "Club V", "2023-24", 38, 14, 8, 16, 48, 58, 45.8, 55.2, 15
)

# Calculate performance metrics
manager_metrics <- manager_data %>%
  mutate(
    # Points and PPG
    points = wins * 3 + draws,
    ppg = points / matches,

    # Goal difference and xG difference
    gd = goals_for - goals_against,
    xgd = xG_for - xG_against,

    # Overperformance (actual vs expected)
    goal_overperformance = goals_for - xG_for,
    defensive_overperformance = xG_against - goals_against,
    total_overperformance = gd - xgd,

    # Win rate
    win_rate = wins / matches * 100
  )

# Career summaries
career_summary <- manager_metrics %>%
  group_by(manager) %>%
  summarise(
    seasons = n(),
    total_matches = sum(matches),
    avg_ppg = weighted.mean(ppg, matches),
    avg_win_rate = weighted.mean(win_rate, matches),
    avg_budget_rank = mean(budget_rank),
    avg_xgd = weighted.mean(xgd, matches),
    avg_overperformance = weighted.mean(total_overperformance, matches),
    .groups = "drop"
  ) %>%
  mutate(
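    # Performance relative to resources (budget_rank 1 = largest budget, so a
    # higher rank number scales PPG up for managers with smaller budgets)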
    resource_adjusted_ppg = avg_ppg * (avg_budget_rank / 10)
  ) %>%
  arrange(desc(avg_ppg))

print("Manager Career Summaries:")
print(career_summary)
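The three overperformance columns are linked. Since gd = goals_for - goals_against and xgd = xG_for - xG_against, total_overperformance is exactly goal_overperformance plus defensive_overperformance, so a manager's edge over expected goals can be split into an attacking part and a defensive part. A minimal check using Manager A's 2021-22 figures (the helper name is illustrative, not part of the framework above):

```python
def decompose_overperformance(goals_for, goals_against, xg_for, xg_against):
    """Split (GD - xGD) into attacking and defensive components."""
    goal = goals_for - xg_for              # scored more (+) or fewer (-) than chance quality suggested
    defensive = xg_against - goals_against # conceded fewer (+) or more (-) than expected
    total = (goals_for - goals_against) - (xg_for - xg_against)
    return {"goal": goal, "defensive": defensive, "total": total}


# Manager A, 2021-22 (values from the table above)
parts = decompose_overperformance(68, 35, 62.5, 38.2)
print({k: round(v, 1) for k, v in parts.items()})
# {'goal': 5.5, 'defensive': 3.2, 'total': 8.7}
```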

Tactical Fingerprints

Every manager has distinctive tactical tendencies that create a "fingerprint" visible in team performance data. Identifying these patterns helps analysts understand a manager's style and estimate how a team might perform under new leadership.

Attacking Style

Build-up patterns, shot locations, crossing tendency

Defensive Style

Pressing height, defensive line, duels approach

Transition Style

Counter-attacking pace, possession recycling
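The three style dimensions above can be turned into composite scores by averaging normalized metrics. This is a sketch under stated assumptions: the assignment of metrics to dimensions in DIMENSIONS is hypothetical, each metric gets equal weight, and PPDA is inverted because a lower PPDA (fewer opponent passes allowed per defensive action) indicates more intense pressing.

```python
import pandas as pd

tactical = pd.DataFrame({
    "manager": ["Guardiola", "Klopp", "Mourinho", "Arteta", "Conte", "Ancelotti"],
    "possession": [67.2, 58.5, 45.2, 62.8, 48.5, 55.2],
    "ppda": [7.8, 8.2, 12.5, 9.1, 11.2, 10.5],
    "deep_completions": [15.2, 12.8, 8.5, 13.5, 9.2, 11.8],
    "crosses_per_90": [18.5, 22.3, 15.2, 16.8, 20.5, 19.2],
    "shots_from_counter": [8, 18, 22, 12, 25, 15],
    "high_press_pct": [42, 55, 28, 48, 32, 38],
    "avg_pass_length": [14.2, 16.8, 18.5, 15.5, 17.2, 16.2],
    "progressive_passes_p90": [78.5, 65.2, 42.8, 72.1, 48.5, 58.3],
})

# Hypothetical assignment of metrics to the three style dimensions
DIMENSIONS = {
    "attacking": ["possession", "progressive_passes_p90", "deep_completions", "crosses_per_90"],
    "defensive": ["ppda_inv", "high_press_pct"],
    "transition": ["shots_from_counter", "avg_pass_length"],
}


def minmax(col: pd.Series) -> pd.Series:
    """Rescale a column to 0-100 (assumes the column is not constant)."""
    return (col - col.min()) / (col.max() - col.min()) * 100


def dimension_scores(df: pd.DataFrame) -> pd.DataFrame:
    scaled = df.drop(columns="manager").apply(minmax)
    # Lower PPDA means more aggressive pressing, so invert it before averaging
    scaled["ppda_inv"] = 100 - scaled.pop("ppda")
    scores = df[["manager"]].copy()
    for dim, cols in DIMENSIONS.items():
        scores[dim] = scaled[cols].mean(axis=1)
    return scores.round(1)


print(dimension_scores(tactical).to_string(index=False))
```

Changing the DIMENSIONS mapping or adding weights will change the rankings, so any real use would need the mapping justified against event data.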

Analyzing manager tactical fingerprints

# Tactical Fingerprint Analysis
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Team tactical metrics by manager
tactical_data = pd.DataFrame({
    "manager": ["Guardiola", "Klopp", "Mourinho", "Arteta", "Conte", "Ancelotti"],
    "possession": [67.2, 58.5, 45.2, 62.8, 48.5, 55.2],
    "ppda": [7.8, 8.2, 12.5, 9.1, 11.2, 10.5],
    "deep_completions": [15.2, 12.8, 8.5, 13.5, 9.2, 11.8],
    "crosses_per_90": [18.5, 22.3, 15.2, 16.8, 20.5, 19.2],
    "shots_from_counter": [8, 18, 22, 12, 25, 15],
    "high_press_pct": [42, 55, 28, 48, 32, 38],
    "avg_pass_length": [14.2, 16.8, 18.5, 15.5, 17.2, 16.2],
    "progressive_passes_p90": [78.5, 65.2, 42.8, 72.1, 48.5, 58.3]
})

class TacticalFingerprint:
    """Analyze manager tactical styles"""

    def __init__(self, data):
        self.data = data
        self.metrics = [c for c in data.columns if c != "manager"]
        self._normalize()

    def _normalize(self):
        """Normalize metrics to 0-100 scale"""
        scaler = MinMaxScaler(feature_range=(0, 100))
        normalized = scaler.fit_transform(self.data[self.metrics])

        self.normalized = self.data[["manager"]].copy()
        for i, metric in enumerate(self.metrics):
            self.normalized[f"{metric}_norm"] = normalized[:, i]

    def get_profile(self, manager_name):
        """Get tactical profile for a manager"""
        profile = self.normalized[self.normalized["manager"] == manager_name]
        return profile.iloc[0]

    def compare_managers(self, manager1, manager2):
        """Compare tactical profiles of two managers"""
        p1 = self.get_profile(manager1)
        p2 = self.get_profile(manager2)

        comparison = []
        for metric in self.metrics:
            norm_metric = f"{metric}_norm"
            comparison.append({
                "metric": metric,
                manager1: p1[norm_metric],
                manager2: p2[norm_metric],
                "difference": p1[norm_metric] - p2[norm_metric]
            })

        return pd.DataFrame(comparison)

    def cluster_styles(self, n_clusters=3):
        """Cluster managers by tactical style"""
        norm_cols = [c for c in self.normalized.columns if c.endswith("_norm")]
        X = self.normalized[norm_cols].values

        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        clusters = kmeans.fit_predict(X)

        self.normalized["style_cluster"] = clusters

        # Label clusters from their mean normalized profile (labels can repeat across clusters)
        cluster_labels = {}
        for i in range(n_clusters):
            cluster_data = self.normalized[self.normalized["style_cluster"] == i]
            avg_possession = cluster_data["possession_norm"].mean()
            avg_counter = cluster_data["shots_from_counter_norm"].mean()

            if avg_possession > 60:
                cluster_labels[i] = "Possession-based"
            elif avg_counter > 60:
                cluster_labels[i] = "Counter-attacking"
            else:
                cluster_labels[i] = "Balanced"

        self.normalized["style_label"] = self.normalized["style_cluster"].map(cluster_labels)

        return self.normalized[["manager", "style_cluster", "style_label"]]


# Analysis
fingerprint = TacticalFingerprint(tactical_data)

# Compare managers
comparison = fingerprint.compare_managers("Guardiola", "Mourinho")
print("Guardiola vs Mourinho Tactical Comparison:")
print(comparison.to_string(index=False))

# Cluster by style
styles = fingerprint.cluster_styles(n_clusters=3)
print("\nManager Style Clusters:")
print(styles.to_string(index=False))

# Tactical Fingerprint Analysis
library(tidyverse)

# Team tactical metrics by manager
tactical_data <- tribble(
  ~manager, ~season, ~possession, ~ppda, ~deep_completions,
  ~crosses_per_90, ~shots_from_counter, ~high_press_pct,
  ~avg_pass_length, ~progressive_passes_p90,
  "Guardiola", "2023-24", 67.2, 7.8, 15.2, 18.5, 8, 42, 14.2, 78.5,
  "Klopp", "2023-24", 58.5, 8.2, 12.8, 22.3, 18, 55, 16.8, 65.2,
  "Mourinho", "2023-24", 45.2, 12.5, 8.5, 15.2, 22, 28, 18.5, 42.8,
  "Arteta", "2023-24", 62.8, 9.1, 13.5, 16.8, 12, 48, 15.5, 72.1,
  "Conte", "2023-24", 48.5, 11.2, 9.2, 20.5, 25, 32, 17.2, 48.5,
  "Ancelotti", "2023-24", 55.2, 10.5, 11.8, 19.2, 15, 38, 16.2, 58.3
)

# Normalize metrics for comparison (0-100 scale)
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x)) * 100
}

tactical_normalized <- tactical_data %>%
  mutate(across(possession:progressive_passes_p90, normalize, .names = "{.col}_norm"))

# Create tactical profile
create_tactical_profile <- function(manager_name, data) {
  profile <- data %>%
    filter(manager == manager_name) %>%
    select(ends_with("_norm")) %>%
    pivot_longer(everything(), names_to = "metric", values_to = "value") %>%
    mutate(metric = str_remove(metric, "_norm"))

  return(profile)
}

# Compare two managers
compare_managers <- function(manager1, manager2, data) {
  p1 <- create_tactical_profile(manager1, data) %>%
    rename(!!manager1 := value)
  p2 <- create_tactical_profile(manager2, data) %>%
    rename(!!manager2 := value)

  comparison <- p1 %>%
    left_join(p2, by = "metric") %>%
    mutate(difference = .data[[manager1]] - .data[[manager2]])

  return(comparison)
}

# Example comparison
comparison <- compare_managers("Guardiola", "Mourinho", tactical_normalized)
print("Guardiola vs Mourinho Tactical Comparison:")
print(comparison)

# Cluster managers by style
library(cluster)

style_matrix <- tactical_normalized %>%
  select(manager, ends_with("_norm")) %>%
  column_to_rownames("manager")

# K-means clustering
set.seed(42)
clusters <- kmeans(style_matrix, centers = 3)

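tactical_normalized$style_cluster <- clusters$cluster

# k-means cluster numbers are arbitrary, so derive labels from each
# cluster's average profile instead of hard-coding clusters 1, 2, 3
cluster_labels <- tactical_normalized %>%
  group_by(style_cluster) %>%
  summarise(
    avg_possession = mean(possession_norm),
    avg_counter = mean(shots_from_counter_norm),
    .groups = "drop"
  ) %>%
  mutate(
    style_label = case_when(
      avg_possession > 60 ~ "Possession-based",
      avg_counter > 60 ~ "Counter-attacking",
      TRUE ~ "Balanced"
    )
  )

tactical_normalized <- tactical_normalized %>%
  left_join(select(cluster_labels, style_cluster, style_label), by = "style_cluster")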

cat("\nManager Style Clusters:\n")
print(tactical_normalized %>% select(manager, style_label))
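Fingerprints also support the new-leadership question raised at the start of this section: if a club wants a successor who plays like the outgoing manager, it can rank candidates by the distance between their profiles. A minimal sketch, assuming equal weight per metric and Euclidean distance on the same 0-100 min-max scaling used above (most_similar is a hypothetical helper):

```python
import numpy as np
import pandas as pd

profiles = pd.DataFrame({
    "manager": ["Guardiola", "Klopp", "Mourinho", "Arteta", "Conte", "Ancelotti"],
    "possession": [67.2, 58.5, 45.2, 62.8, 48.5, 55.2],
    "ppda": [7.8, 8.2, 12.5, 9.1, 11.2, 10.5],
    "deep_completions": [15.2, 12.8, 8.5, 13.5, 9.2, 11.8],
    "crosses_per_90": [18.5, 22.3, 15.2, 16.8, 20.5, 19.2],
    "shots_from_counter": [8, 18, 22, 12, 25, 15],
    "high_press_pct": [42, 55, 28, 48, 32, 38],
    "avg_pass_length": [14.2, 16.8, 18.5, 15.5, 17.2, 16.2],
    "progressive_passes_p90": [78.5, 65.2, 42.8, 72.1, 48.5, 58.3],
})


def most_similar(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank other managers by Euclidean distance to `target` on 0-100 scaled metrics."""
    X = df.set_index("manager")
    scaled = (X - X.min()) / (X.max() - X.min()) * 100
    # Subtracting the target's row broadcasts across every manager's row
    dists = np.sqrt(((scaled - scaled.loc[target]) ** 2).sum(axis=1))
    return dists.drop(target).sort_values().round(1)


print(most_similar(profiles, "Guardiola"))
```

With only six managers, min-max scaling is sensitive to the extremes (Guardiola and Mourinho define most of the ranges), so the distances are best read as rough orderings.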

Measuring Manager Impact

Isolating a manager's impact from squad quality is one of the most challenging problems in football analytics. Several approaches can help estimate the "manager effect".

Attribution Challenges
  • Managers inherit squads they didn't build
  • Transfer market success is partly luck and partly club infrastructure
  • Short tenures limit sample sizes
  • Player improvement could be natural development
  • Fixture difficulty varies between clubs and seasons
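Short tenures are the easiest of these problems to quantify. One common remedy is to shrink a manager's average overperformance toward zero in proportion to how little evidence there is. The sketch below uses the weight n / (n + k), which is the posterior-mean weight in a simple normal model with a prior centred on zero; k (the ratio of season-to-season noise variance to the spread of true manager effects) is an assumed value here, not an estimate.

```python
def shrunk_effect(season_residuals, k=3.0):
    """Shrink the mean per-season residual (e.g. points above expected) toward zero.

    k is an assumed constant: with n seasons, the observed mean gets
    weight n / (n + k), so it reaches half weight only when n == k.
    """
    n = len(season_residuals)
    if n == 0:
        return 0.0
    raw_mean = sum(season_residuals) / n
    return raw_mean * n / (n + k)


one_great_season = shrunk_effect([10.0])    # 10 * 1/4 = 2.5
four_good_seasons = shrunk_effect([6.0] * 4)  # 6 * 4/7, about 3.43
print(one_great_season, round(four_good_seasons, 2))
```

Under this weighting, four seasons at +6 points rank above a single season at +10, which matches the intuition that sustained overperformance is stronger evidence than one standout year.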
Estimating manager impact on team performance

# Manager Impact Estimation
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical data
historical_data = pd.DataFrame({
    "manager": ["Manager A"]*2 + ["Manager B"] + ["Manager C"]*2 + ["Manager D"] +
               ["Manager E"]*2 + ["Manager F"],
    "club": ["Club X"]*3 + ["Club Y"]*3 + ["Club Z"]*3,
    "season": ["2020-21", "2021-22", "2022-23"]*3,
    "squad_value_m": [450, 520, 580, 280, 310, 350, 180, 195, 220],
    "wage_bill_m": [180, 200, 220, 95, 105, 115, 65, 70, 80],
    "points": [74, 86, 75, 58, 65, 52, 45, 55, 62],
    "xG_for": [58.5, 72.3, 65.2, 48.2, 55.8, 45.1, 38.5, 48.2, 52.8],
    "xG_against": [42.1, 32.5, 38.8, 52.3, 48.1, 55.2, 58.2, 52.1, 45.5]
})

class ManagerImpactAnalyzer:
    """Estimate manager impact on team performance"""

    def __init__(self, data):
        self.data = data.copy()
        self._calculate_metrics()

    def _calculate_metrics(self):
        """Calculate resource index and expected points"""
        self.data["resource_index"] = (
            self.data["squad_value_m"] + self.data["wage_bill_m"] * 2
        ) / 3

        # Fit resource -> points model
        X = self.data[["resource_index"]].values
        y = self.data["points"].values

        self.model = LinearRegression()
        self.model.fit(X, y)

        self.data["expected_points"] = self.model.predict(X)
        self.data["points_above_expected"] = (
            self.data["points"] - self.data["expected_points"]
        )

    def get_manager_impact(self):
        """Summarize manager impact"""
        impact = self.data.groupby("manager").agg({
            "season": "count",
            "resource_index": "mean",
            "points": "mean",
            "expected_points": "mean",
            "points_above_expected": ["mean", "sum"]
        }).reset_index()

        impact.columns = ["manager", "seasons", "avg_resources", "avg_points",
                         "avg_expected_points", "avg_above_expected", "total_above_expected"]

        return impact.sort_values("avg_above_expected", ascending=False)

    def before_after_analysis(self, club_name):
        """Analyze before/after for manager changes at a club"""
        club_data = self.data[self.data["club"] == club_name].sort_values("season")

        # Identify manager changes
        club_data["manager_change"] = club_data["manager"] != club_data["manager"].shift(1)
        club_data["period"] = club_data["manager_change"].cumsum()

        # Summarize by period
        period_summary = club_data.groupby(["manager", "period"]).agg({
            "points": "mean",
            "expected_points": "mean",
            "points_above_expected": "mean"
        }).reset_index()

        return period_summary

    def rank_managers(self, min_seasons=2):
        """Rank managers by impact, filtered by minimum tenure"""
        impact = self.get_manager_impact()
        qualified = impact[impact["seasons"] >= min_seasons]
        return qualified.sort_values("avg_above_expected", ascending=False)


# Analysis
analyzer = ManagerImpactAnalyzer(historical_data)

print("Manager Impact (Points Above Resource Expectation):")
print(analyzer.get_manager_impact().to_string(index=False))

print("\nClub X Before/After Manager Changes:")
print(analyzer.before_after_analysis("Club X").to_string(index=False))

# Manager Impact Estimation
library(tidyverse)
library(lme4)

# Historical data with squad values
historical_data <- tribble(
  ~manager, ~club, ~season, ~squad_value_m, ~wage_bill_m,
  ~points, ~xG_for, ~xG_against, ~final_position,
  "Manager A", "Club X", "2020-21", 450, 180, 74, 58.5, 42.1, 4,
  "Manager A", "Club X", "2021-22", 520, 200, 86, 72.3, 32.5, 2,
  "Manager B", "Club X", "2022-23", 580, 220, 75, 65.2, 38.8, 4,
  "Manager C", "Club Y", "2020-21", 280, 95, 58, 48.2, 52.3, 10,
  "Manager C", "Club Y", "2021-22", 310, 105, 65, 55.8, 48.1, 7,
  "Manager D", "Club Y", "2022-23", 350, 115, 52, 45.1, 55.2, 12,
  "Manager E", "Club Z", "2020-21", 180, 65, 45, 38.5, 58.2, 15,
  "Manager E", "Club Z", "2021-22", 195, 70, 55, 48.2, 52.1, 11,
  "Manager F", "Club Z", "2022-23", 220, 80, 62, 52.8, 45.5, 8
)

# Calculate expected points based on resources
historical_data <- historical_data %>%
  mutate(
    # Log-transform financial metrics (kept for reference; the simple model below uses resource_index)
    log_value = log(squad_value_m),
    log_wage = log(wage_bill_m),

    # Combined resource metric
    resource_index = (squad_value_m + wage_bill_m * 2) / 3
  )

# Simple regression: Points ~ Resources
resource_model <- lm(points ~ resource_index, data = historical_data)

# Add expected points and residual (manager effect proxy)
historical_data <- historical_data %>%
  mutate(
    expected_points = predict(resource_model, newdata = .),
    points_above_expected = points - expected_points
  )

# Manager performance summary
manager_impact <- historical_data %>%
  group_by(manager) %>%
  summarise(
    seasons = n(),
    avg_resources = mean(resource_index),
    avg_points = mean(points),
    avg_expected_points = mean(expected_points),
    avg_points_above_expected = mean(points_above_expected),
    total_overperformance = sum(points_above_expected),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_points_above_expected))

print("Manager Impact (Points Above Resource Expectation):")
print(manager_impact)

# Mixed effects model (accounting for club random effects)
# Requires more data in practice
# manager_effect_model <- lmer(
#   points ~ resource_index + (1|club) + (1|manager),
#   data = historical_data
# )

# Before/After comparison for manager changes
before_after_analysis <- function(data, club_name) {
  club_data <- data %>%
    filter(club == club_name) %>%
    arrange(season)

  # Identify manager changes
  club_data <- club_data %>%
    mutate(
      manager_change = manager != lag(manager),
      period = cumsum(replace_na(manager_change, FALSE))
    )

  # Compare periods
  period_summary <- club_data %>%
    group_by(manager, period) %>%
    summarise(
      avg_points = mean(points),
      avg_expected = mean(expected_points),
      overperformance = mean(points_above_expected),
      .groups = "drop"
    )

  return(period_summary)
}

cat("\nClub X Before/After Manager Changes:\n")
print(before_after_analysis(historical_data, "Club X"))
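Points contain a lot of noise from finishing and goalkeeping, so a useful cross-check is to apply the same residual idea to xG difference, which is usually more stable from season to season. A minimal sketch reusing the club-seasons above, with np.polyfit standing in for the LinearRegression/lm fit:

```python
import numpy as np
import pandas as pd

# Same club-seasons as the historical_data table above
seasons = pd.DataFrame({
    "manager": ["Manager A"]*2 + ["Manager B"] + ["Manager C"]*2 + ["Manager D"]
               + ["Manager E"]*2 + ["Manager F"],
    "squad_value_m": [450, 520, 580, 280, 310, 350, 180, 195, 220],
    "wage_bill_m": [180, 200, 220, 95, 105, 115, 65, 70, 80],
    "xG_for": [58.5, 72.3, 65.2, 48.2, 55.8, 45.1, 38.5, 48.2, 52.8],
    "xG_against": [42.1, 32.5, 38.8, 52.3, 48.1, 55.2, 58.2, 52.1, 45.5],
})
seasons["resource_index"] = (seasons["squad_value_m"] + seasons["wage_bill_m"] * 2) / 3
seasons["xgd"] = seasons["xG_for"] - seasons["xG_against"]

# Ordinary least squares line: xgd ~ resource_index
slope, intercept = np.polyfit(seasons["resource_index"], seasons["xgd"], 1)
seasons["xgd_above_expected"] = seasons["xgd"] - (slope * seasons["resource_index"] + intercept)

xg_impact = (seasons.groupby("manager")["xgd_above_expected"]
             .agg(["count", "mean"])
             .sort_values("mean", ascending=False))
print(f"slope: {slope:.3f} xGD per resource unit")
print(xg_impact.round(2))
```

A manager who is above expectation on both points and xGD is a stronger candidate for genuine impact than one who beats the points model while sitting near zero on xGD.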

In-Game Management Analytics

Managers make dozens of decisions during matches: substitutions, tactical adjustments, and motivational interventions. Analyzing these in-game decisions provides insights into managerial skill.

Analyzing in-game management decisions

# Substitution Analytics
import pandas as pd
import numpy as np

# Substitution data
substitutions = pd.DataFrame({
    "match_id": [1, 1, 2, 3, 3, 4],
    "manager": ["Manager A"]*3 + ["Manager B"]*3,
    "minute": [62, 75, 55, 70, 82, 88],
    "player_off": ["Player X", "Player Z", "Player V", "Player A", "Player C", "Player E"],
    "player_on": ["Player Y", "Player W", "Player U", "Player B", "Player D", "Player F"],
    "score_diff": [-1, 0, 1, 0, 0, -1],
    "xG_diff_before": [-0.8, 0.2, 1.2, -0.3, 0.1, -0.5],
    "xG_diff_after": [0.5, 0.3, 0.8, 0.2, 0.4, -0.2],
    "result_change": ["improved", "maintained", "declined", "improved", "improved", "improved"]
})

class InGameAnalytics:
    """Analyze in-game management decisions"""

    def __init__(self, sub_data):
        self.subs = sub_data

    def substitution_effectiveness(self):
        """Analyze substitution effectiveness by manager"""
        effectiveness = self.subs.groupby("manager").apply(
            lambda g: pd.Series({
                "total_subs": len(g),
                "avg_minute": g["minute"].mean(),
                "improved_pct": (g["result_change"] == "improved").mean() * 100,
                "when_losing": (g["score_diff"] < 0).sum(),
                "when_losing_improved": (
                    (g["score_diff"] < 0) & (g["result_change"] == "improved")
                ).sum()
            })
        ).reset_index()

        effectiveness["losing_success_rate"] = (
            effectiveness["when_losing_improved"] /
            effectiveness["when_losing"].replace(0, np.nan) * 100
        )

        return effectiveness

    def timing_analysis(self):
        """Analyze substitution timing patterns"""
        subs = self.subs.copy()

        def timing_bucket(minute):
            if minute < 60: return "Early (< 60)"
            elif minute < 75: return "Mid (60-75)"
            elif minute < 85: return "Late (75-85)"
            return "Very Late (85+)"

        subs["timing_bucket"] = subs["minute"].apply(timing_bucket)

        timing = subs.groupby(["manager", "timing_bucket"]).agg({
            "match_id": "count",
            "result_change": lambda x: (x == "improved").mean() * 100
        }).reset_index()

        timing.columns = ["manager", "timing_bucket", "count", "success_rate"]
        return timing

    def analyze_situational_changes(self):
        """Analyze substitutions by game situation"""
        situations = {
            "winning": self.subs[self.subs["score_diff"] > 0],
            "drawing": self.subs[self.subs["score_diff"] == 0],
            "losing": self.subs[self.subs["score_diff"] < 0]
        }

        analysis = []
        for situation, data in situations.items():
            if len(data) > 0:
                analysis.append({
                    "situation": situation,
                    "count": len(data),
                    "avg_minute": data["minute"].mean(),
                    "success_rate": (data["result_change"] == "improved").mean() * 100
                })

        return pd.DataFrame(analysis)


# Analysis
analyzer = InGameAnalytics(substitutions)

print("Substitution Effectiveness by Manager:")
print(analyzer.substitution_effectiveness().to_string(index=False))

print("\nSubstitution Timing Analysis:")
print(analyzer.timing_analysis().to_string(index=False))

print("\nSituational Analysis:")
print(analyzer.analyze_situational_changes().to_string(index=False))

# Substitution Analytics
library(tidyverse)

# Substitution data
substitutions <- tribble(
  ~match_id, ~manager, ~minute, ~player_off, ~player_on, ~score_diff,
  ~xG_diff_before, ~xG_diff_after, ~result_change,
  1, "Manager A", 62, "Player X", "Player Y", -1, -0.8, 0.5, "improved",
  1, "Manager A", 75, "Player Z", "Player W", 0, 0.2, 0.3, "maintained",
  2, "Manager A", 55, "Player V", "Player U", 1, 1.2, 0.8, "declined",
  3, "Manager B", 70, "Player A", "Player B", 0, -0.3, 0.2, "improved",
  3, "Manager B", 82, "Player C", "Player D", 0, 0.1, 0.4, "improved",
  4, "Manager B", 88, "Player E", "Player F", -1, -0.5, -0.2, "improved"
)

# Substitution effectiveness analysis
sub_effectiveness <- substitutions %>%
  group_by(manager) %>%
  summarise(
    total_subs = n(),
    avg_minute = mean(minute),
    improved_pct = mean(result_change == "improved") * 100,
    when_losing = sum(score_diff < 0),
    when_losing_improved = sum(score_diff < 0 & result_change == "improved"),
    losing_success_rate = when_losing_improved / when_losing * 100,
    .groups = "drop"
  )

print("Substitution Effectiveness by Manager:")
print(sub_effectiveness)

# Timing analysis
timing_analysis <- substitutions %>%
  mutate(
    timing_bucket = case_when(
      minute < 60 ~ "Early (< 60)",
      minute < 75 ~ "Mid (60-75)",
      minute < 85 ~ "Late (75-85)",
      TRUE ~ "Very Late (85+)"
    )
  ) %>%
  group_by(manager, timing_bucket) %>%
  summarise(
    count = n(),
    success_rate = mean(result_change == "improved") * 100,
    .groups = "drop"
  )

cat("\nSubstitution Timing Analysis:\n")
print(timing_analysis)

# Tactical change detection (simplified)
tactical_changes <- tribble(
  ~match_id, ~manager, ~minute, ~change_type, ~formation_before, ~formation_after,
  ~xG_rate_before, ~xG_rate_after,
  1, "Manager A", 55, "offensive", "4-3-3", "3-4-3", 0.015, 0.025,
  2, "Manager A", 70, "defensive", "4-3-3", "5-3-2", 0.022, 0.010,
  3, "Manager B", 60, "balanced", "4-4-2", "4-3-3", 0.018, 0.020,
  4, "Manager B", 78, "offensive", "4-2-3-1", "4-1-3-2", 0.012, 0.028
)

# Tactical change effectiveness
tactical_effectiveness <- tactical_changes %>%
  mutate(
    xG_improvement = xG_rate_after - xG_rate_before,
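    # As written, offensive/balanced changes count as effective when the xG rate
    # rises, while defensive changes count whenever the rate moves at all; judging
    # defensive changes properly would need xG conceded before and after
    effective = xG_improvement > 0 |
      (change_type == "defensive" & xG_improvement < 0)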
  ) %>%
  group_by(manager) %>%
  summarise(
    total_changes = n(),
    effective_changes = sum(effective),
    effectiveness_rate = effective_changes / total_changes * 100,
    avg_xG_improvement = mean(xG_improvement),
    .groups = "drop"
  )

cat("\nTactical Change Effectiveness:\n")
print(tactical_effectiveness)
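The substitution tables carry xG_diff_before and xG_diff_after, but the analysis above relies only on the categorical result_change. A sketch that uses the xG columns instead, assuming both columns are on a comparable scale (if they are cumulative totals over windows of different lengths, convert them to per-minute rates first):

```python
import pandas as pd

subs = pd.DataFrame({
    "manager": ["Manager A"]*3 + ["Manager B"]*3,
    "score_diff": [-1, 0, 1, 0, 0, -1],
    "xG_diff_before": [-0.8, 0.2, 1.2, -0.3, 0.1, -0.5],
    "xG_diff_after": [0.5, 0.3, 0.8, 0.2, 0.4, -0.2],
})

# Positive swing: the xG balance moved in this team's favour after the change
subs["xG_swing"] = subs["xG_diff_after"] - subs["xG_diff_before"]
swing_summary = subs.groupby("manager")["xG_swing"].agg(["count", "mean"])

# Bins are right-inclusive: (-10, -1] losing, (-1, 0] drawing, (0, 10] winning
subs["state"] = pd.cut(subs["score_diff"], bins=[-10, -1, 0, 10],
                       labels=["losing", "drawing", "winning"])
by_state = subs.groupby("state", observed=True)["xG_swing"].mean()

print(swing_summary.round(2))
print(by_state.round(2))
```

Splitting by game state matters because teams that are behind typically push forward and generate more xG whatever the substitution, so a positive swing when losing is weaker evidence of good decision-making than the same swing when level.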

Player Development Under Managers

One of the most important but hardest-to-measure managerial skills is player development. Some managers consistently improve players, while others seem to extract less than expected.

Analyzing player development under managers

# Player Development Analysis
import pandas as pd
import numpy as np

# Player trajectories
player_trajectories = pd.DataFrame({
    "player": ["Player A"]*3 + ["Player B"]*3 + ["Player C"]*3,
    "manager": ["Manager X", "Manager X", "Manager Y",
                "Manager X", "Manager X", "Manager X",
                "Manager Y", "Manager Y", "Manager Z"],
    "season": ["2021-22", "2022-23", "2023-24"]*3,
    "age": [23, 24, 25, 21, 22, 23, 26, 27, 28],
    "xG_p90": [0.35, 0.42, 0.38, 0.15, 0.28, 0.35, 0.45, 0.42, 0.38],
    "xA_p90": [0.18, 0.22, 0.20, 0.12, 0.18, 0.25, 0.28, 0.25, 0.22],
    "press_p90": [22.5, 25.8, 23.2, 18.5, 24.2, 28.5, 20.1, 19.8, 18.2],
    "prog_passes_p90": [5.2, 6.1, 5.8, 4.2, 5.8, 7.2, 6.5, 6.2, 5.8]
})

class PlayerDevelopmentAnalyzer:
    """Analyze player development under different managers"""

    def __init__(self, trajectories):
        self.data = trajectories.sort_values(["player", "season"])
        self._calculate_growth()

    def _calculate_growth(self):
        """Calculate season-over-season growth rates"""
        df = self.data.copy()

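        for metric in ["xG_p90", "xA_p90", "press_p90"]:
            # Compute growth within each player-manager stint (as the R version does),
            # so a change across a managerial switch is not credited to the new manager
            df[f"{metric}_growth"] = df.groupby(["player", "manager"])[metric].pct_change() * 100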

        self.data = df

    def manager_development_scores(self):
        """Calculate development scores by manager"""
        growth_data = self.data.dropna(subset=["xG_p90_growth"])

        scores = growth_data.groupby("manager").agg({
            "player": "nunique",
            "xG_p90_growth": "mean",
            "xA_p90_growth": "mean",
            "press_p90_growth": "mean"
        }).reset_index()

        scores.columns = ["manager", "players_developed", "avg_xG_growth",
                         "avg_xA_growth", "avg_press_growth"]

        scores["development_score"] = (
            scores["avg_xG_growth"] + scores["avg_xA_growth"] + scores["avg_press_growth"]
        ) / 3

        return scores.sort_values("development_score", ascending=False)

    def age_adjusted_development(self):
        """Calculate age-adjusted development scores"""
        growth_data = self.data.dropna(subset=["xG_p90_growth"]).copy()

        # Expected improvement by age
        def expected_improvement(age):
            if age < 23: return 10
            elif age < 26: return 5
            elif age < 29: return 0
            return -5

        growth_data["expected"] = growth_data["age"].apply(expected_improvement)
        growth_data["adjusted_growth"] = growth_data["xG_p90_growth"] - growth_data["expected"]

        adjusted = growth_data.groupby("manager").agg({
            "adjusted_growth": "mean"
        }).reset_index()

        adjusted.columns = ["manager", "avg_adjusted_growth"]

        return adjusted.sort_values("avg_adjusted_growth", ascending=False)


# Analysis
analyzer = PlayerDevelopmentAnalyzer(player_trajectories)

print("Manager Player Development Scores:")
print(analyzer.manager_development_scores().to_string(index=False))

print("\nAge-Adjusted Development:")
print(analyzer.age_adjusted_development().to_string(index=False))

# Player Development Analysis
library(tidyverse)

# Player performance before/during/after manager tenure
player_trajectories <- tribble(
  ~player, ~manager, ~season, ~age, ~xG_p90, ~xA_p90, ~press_p90, ~prog_passes_p90,
  "Player A", "Manager X", "2021-22", 23, 0.35, 0.18, 22.5, 5.2,
  "Player A", "Manager X", "2022-23", 24, 0.42, 0.22, 25.8, 6.1,
  "Player A", "Manager Y", "2023-24", 25, 0.38, 0.20, 23.2, 5.8,
  "Player B", "Manager X", "2021-22", 21, 0.15, 0.12, 18.5, 4.2,
  "Player B", "Manager X", "2022-23", 22, 0.28, 0.18, 24.2, 5.8,
  "Player B", "Manager X", "2023-24", 23, 0.35, 0.25, 28.5, 7.2,
  "Player C", "Manager Y", "2021-22", 26, 0.45, 0.28, 20.1, 6.5,
  "Player C", "Manager Y", "2022-23", 27, 0.42, 0.25, 19.8, 6.2,
  "Player C", "Manager Z", "2023-24", 28, 0.38, 0.22, 18.2, 5.8
)

# Calculate improvement rates
player_improvement <- player_trajectories %>%
  arrange(player, season) %>%
  group_by(player, manager) %>%
  mutate(
    seasons_with_manager = n(),
    xG_growth = (xG_p90 - lag(xG_p90)) / lag(xG_p90) * 100,
    xA_growth = (xA_p90 - lag(xA_p90)) / lag(xA_p90) * 100,
    press_growth = (press_p90 - lag(press_p90)) / lag(press_p90) * 100
  ) %>%
  ungroup()

# Manager development scores
manager_development <- player_improvement %>%
  filter(!is.na(xG_growth)) %>%
  group_by(manager) %>%
  summarise(
    players_developed = n_distinct(player),
    avg_xG_growth = mean(xG_growth, na.rm = TRUE),
    avg_xA_growth = mean(xA_growth, na.rm = TRUE),
    avg_press_growth = mean(press_growth, na.rm = TRUE),
    players_improved = sum(xG_growth > 0) / n() * 100,
    .groups = "drop"
  ) %>%
  mutate(
    development_score = (avg_xG_growth + avg_xA_growth + avg_press_growth) / 3
  ) %>%
  arrange(desc(development_score))

print("Manager Player Development Scores:")
print(manager_development)

# Age-adjusted development (young players more likely to improve)
age_adjusted <- player_improvement %>%
  filter(!is.na(xG_growth)) %>%
  mutate(
    # Expected improvement decreases with age
    expected_improvement = case_when(
      age < 23 ~ 10,
      age < 26 ~ 5,
      age < 29 ~ 0,
      TRUE ~ -5
    ),
    adjusted_xG_growth = xG_growth - expected_improvement
  ) %>%
  group_by(manager) %>%
  summarise(
    avg_adjusted_growth = mean(adjusted_xG_growth, na.rm = TRUE),
    .groups = "drop"
  )

cat("\nAge-Adjusted Development:\n")
print(age_adjusted)
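Percentage growth flatters players starting from a low base: Player B's move from 0.15 to 0.28 xG per 90 is +86.7%, while Player A's move from 0.35 to 0.42 is +20%, even though the absolute gains (0.13 and 0.07) differ by less than a factor of two. A sketch comparing both measures, computing changes within each player-manager stint as the R code does:

```python
import pandas as pd

traj = pd.DataFrame({
    "player": ["Player A"]*3 + ["Player B"]*3 + ["Player C"]*3,
    "manager": ["Manager X", "Manager X", "Manager Y",
                "Manager X", "Manager X", "Manager X",
                "Manager Y", "Manager Y", "Manager Z"],
    "season": ["2021-22", "2022-23", "2023-24"]*3,
    "xG_p90": [0.35, 0.42, 0.38, 0.15, 0.28, 0.35, 0.45, 0.42, 0.38],
}).sort_values(["player", "season"])

# Season-over-season change within each player-manager stint
stint = traj.groupby(["player", "manager"])["xG_p90"]
traj["xG_abs_change"] = stint.diff()
traj["xG_pct_change"] = stint.pct_change() * 100

dev_compare = (traj.dropna(subset=["xG_abs_change"])
               .groupby("manager")[["xG_abs_change", "xG_pct_change"]]
               .mean())
print(dev_compare.round(3))
```

Reporting absolute per-90 changes alongside percentages, or weighting by minutes played, reduces the influence of a single low-base player on a manager's score.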

Manager Sacking Analytics

When should a club sack their manager? This is one of the most consequential decisions boards make, yet it's often driven by emotion rather than analysis. Understanding optimal sacking timing can save clubs millions.
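Before/after comparisons of sackings need a baseline, because clubs tend to sack managers after unusually bad runs, and unusually bad runs are often followed by better ones regardless (regression to the mean). The simulation below involves no manager changes at all; the strength distribution (win probability uniform between 0.2 and 0.5) and the fixed 27% draw rate are assumptions chosen only for illustration.

```python
import numpy as np


def regression_to_mean_demo(n_teams=5000, n_games=10, worst_share=0.1, seed=42):
    """Simulate teams of fixed strength; no manager ever changes."""
    rng = np.random.default_rng(seed)
    p_win = rng.uniform(0.2, 0.5, n_teams)  # assumed spread of true strength
    p_draw = 0.27                           # assumed constant draw rate

    def play(games):
        # 3 points for a win, 1 for a draw, 0 for a loss
        u = rng.random((n_teams, games))
        pts = np.where(u < p_win[:, None], 3,
                       np.where(u < (p_win + p_draw)[:, None], 1, 0))
        return pts.mean(axis=1)

    first, second = play(n_games), play(n_games)
    # Roughly the worst `worst_share` of teams on first-period PPG (ties included)
    worst = first <= np.quantile(first, worst_share)
    return first[worst].mean(), second[worst].mean()


before, after = regression_to_mean_demo()
print(f"Worst teams: {before:.2f} PPG, then {after:.2f} PPG with the same manager")
```

Any improvement in the selected teams is pure regression to the mean, so a sacking analysis should compare post-sacking PPG with similar struggling teams that kept their manager, not just with PPG at the time of sacking.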

# Python: Manager Sacking Analysis
import pandas as pd
import numpy as np
from typing import Dict
from dataclasses import dataclass

@dataclass
class SackingDecision:
    """Result of sacking decision analysis."""
    ppg_gap: float
    xg_concerning: bool
    recommendation: str
    confidence: str
    reasoning: str

class SackingAnalyzer:
    """Analyze manager sacking timing and outcomes."""

    def __init__(self, historical_data: pd.DataFrame):
        self.data = historical_data
        self._analyze_outcomes()

    def _analyze_outcomes(self):
        """Analyze historical sacking outcomes."""

        df = self.data.copy()

        # PPG improvement
        df["ppg_improvement"] = df["ppg_after"] - df["ppg_at_sacking"]
        df["improved_ppg"] = df["ppg_improvement"] > 0

        # Position improvement
        df["position_improvement"] = df["position_at_sacking"] - df["final_position"]
        df["improved_position"] = df["position_improvement"] > 0

        # Overall success
        df["sacking_successful"] = df["improved_ppg"] & df["improved_position"]

        # Timing assessment
        def timing_category(matches):
            if matches < 10: return "Too early"
            elif matches < 15: return "Standard timing"
            elif matches < 20: return "Patient approach"
            return "Very late"

        df["timing_category"] = df["matches"].apply(timing_category)

        self.analyzed = df

    def timing_success_rates(self) -> pd.DataFrame:
        """Calculate success rates by sacking timing."""

        return self.analyzed.groupby("timing_category").agg({
            "club": "count",
            "sacking_successful": "mean",
            "ppg_improvement": "mean"
        }).reset_index().rename(columns={
            "club": "sackings",
            "sacking_successful": "success_rate",
            "ppg_improvement": "avg_ppg_improvement"
        })

    def should_sack(self, current_ppg: float, xg_diff: float,
                   matches_played: int, position: int,
                   league_size: int = 20) -> SackingDecision:
        """Recommend whether to sack manager."""

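        # Baseline PPG expectation: a simple heuristic of 38 / league_size
        # (1.9 for a 20-team league); it does not depend on league position
        expected_ppg = 38 / league_size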

        # Metrics
        ppg_gap = expected_ppg - current_ppg
        xg_concerning = xg_diff < -0.3
        relegation_risk = position > (league_size - 3)

        # Decision logic
        if ppg_gap > 0.5 and xg_concerning and matches_played > 10:
            recommendation = "Sack now"
            confidence = "High"
            reasoning = "Significant underperformance with poor underlying numbers"
        elif ppg_gap > 0.3 and xg_concerning:
            recommendation = "Consider sacking"
            confidence = "Medium"
            reasoning = "Moderate underperformance with concerning xG"
        elif relegation_risk and ppg_gap > 0.2:
            recommendation = "Sack now (survival mode)"
            confidence = "High"
            reasoning = "Relegation zone with negative trajectory"
        elif ppg_gap > 0.5 and not xg_concerning:
            recommendation = "Monitor closely"
            confidence = "Low"
            reasoning = "Results poor but underlying metrics acceptable"
        else:
            recommendation = "Retain manager"
            confidence = "Medium"
            reasoning = "Performance within acceptable range"

        return SackingDecision(
            ppg_gap=ppg_gap,
            xg_concerning=xg_concerning,
            recommendation=recommendation,
            confidence=confidence,
            reasoning=reasoning
        )

    def calculate_cost_of_sacking(self, remaining_contract_months: int,
                                  monthly_salary: float,
                                  new_manager_fee: float) -> Dict:
        """Calculate financial cost of sacking decision."""

        severance = remaining_contract_months * monthly_salary * 0.8  # Typical settlement
        total_cost = severance + new_manager_fee

        return {
            "severance_estimate": severance,
            "hiring_cost": new_manager_fee,
            "total_cost": total_cost,
            "monthly_burden": total_cost / 12
        }


# Example usage
sacking_data = pd.DataFrame({
    "club": ["Club A", "Club B", "Club C"],
    "manager": ["Manager 1", "Manager 2", "Manager 3"],
    "matches": [12, 15, 18],
    "ppg_at_sacking": [1.08, 0.93, 1.22],
    "xg_diff": [-0.3, -0.5, -0.2],
    "ppg_after": [1.52, 1.35, 1.18],
    "position_at_sacking": [15, 18, 12],
    "final_position": [10, 14, 13]
})

analyzer = SackingAnalyzer(sacking_data)

# Example decision
decision = analyzer.should_sack(
    current_ppg=1.0,
    xg_diff=-0.4,
    matches_played=12,
    position=15
)

print("Sacking Decision Analysis:")
print(f"  PPG Gap: {decision.ppg_gap:.2f}")
print(f"  xG Concerning: {decision.xg_concerning}")
print(f"  Recommendation: {decision.recommendation}")
print(f"  Reasoning: {decision.reasoning}")
# R: Manager Sacking Analysis
library(tidyverse)

# Historical sacking data
sacking_data <- tribble(
    ~club, ~manager, ~sacking_date, ~matches, ~ppg_at_sacking, ~xg_diff,
    ~replacement, ~ppg_after, ~position_at_sacking, ~final_position,
    "Club A", "Manager 1", "2023-10-15", 12, 1.08, -0.3, "Manager 2", 1.52, 15, 10,
    "Club B", "Manager 3", "2023-11-20", 15, 0.93, -0.5, "Manager 4", 1.35, 18, 14,
    "Club C", "Manager 5", "2023-12-05", 18, 1.22, -0.2, "Manager 6", 1.18, 12, 13,
    "Club D", "Manager 7", "2024-01-10", 20, 0.85, -0.8, "Manager 8", 1.65, 19, 11,
    "Club E", "Manager 9", "2023-09-28", 8, 0.88, -0.4, "Manager 10", 1.05, 16, 17
)

# Was the sacking successful?
sacking_analysis <- sacking_data %>%
    mutate(
        # PPG improvement
        ppg_improvement = ppg_after - ppg_at_sacking,
        improved_ppg = ppg_improvement > 0,

        # Position improvement
        position_improvement = position_at_sacking - final_position,
        improved_position = position_improvement > 0,

        # Overall success
        sacking_successful = improved_ppg & improved_position,

        # Timing assessment
        timing_assessment = case_when(
            matches < 10 ~ "Too early",
            matches < 15 ~ "Standard timing",
            matches < 20 ~ "Patient approach",
            TRUE ~ "Very late"
        )
    )

# Success rates by timing
timing_success <- sacking_analysis %>%
    group_by(timing_assessment) %>%
    summarise(
        sackings = n(),
        success_rate = mean(sacking_successful) * 100,
        avg_ppg_improvement = mean(ppg_improvement),
        .groups = "drop"
    )

print("Sacking Success by Timing:")
print(timing_success)

# When should you sack?
sacking_decision_model <- function(current_ppg, xg_diff, matches_played,
                                    position, league_size = 20) {

    # Calculate baseline expectation
    expected_ppg <- 38 / league_size  # Simple heuristic baseline (1.9 for a 20-team league)

    # Underperformance score
    ppg_gap <- expected_ppg - current_ppg
    xg_signal <- xg_diff < -0.3  # Poor underlying numbers

    # Relegation zone proximity
    relegation_risk <- position > (league_size - 3)

    # Decision logic (same rule order as the Python should_sack(), so the
    # relegation-survival rule is checked before "Monitor closely")
    sack_recommendation <- case_when(
        ppg_gap > 0.5 & xg_signal & matches_played > 10 ~ "Sack now",
        ppg_gap > 0.3 & xg_signal ~ "Consider sacking",
        relegation_risk & ppg_gap > 0.2 ~ "Sack now (survival mode)",
        ppg_gap > 0.5 & !xg_signal ~ "Monitor closely",
        TRUE ~ "Retain manager"
    )

    list(
        ppg_gap = ppg_gap,
        xg_concerning = xg_signal,
        recommendation = sack_recommendation
    )
}

# Test cases
example1 <- sacking_decision_model(1.0, -0.4, 12, 15)
example2 <- sacking_decision_model(1.5, 0.2, 15, 8)

cat("\nExample Decision 1 (struggling team):\n")
print(example1)
cat("\nExample Decision 2 (solid team):\n")
print(example2)
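Both versions of the decision model take current PPG and an xG difference as inputs. One way to derive them from a match log is sketched below. It assumes a hypothetical list of `(goals_for, goals_against, xg_for, xg_against)` tuples ordered oldest to newest, and it assumes the model's `xg_diff` is a per-match average over a recent window; the default window of 10 matches is an illustrative choice, not a value taken from the models above.

```python
from typing import List, Tuple

# Hypothetical match record: (goals_for, goals_against, xg_for, xg_against)
Match = Tuple[int, int, float, float]


def rolling_form(matches: List[Match], window: int = 10) -> Tuple[float, float]:
    """Return (points per game, xG difference per match) over the last `window` matches."""
    recent = matches[-window:]
    if not recent:
        raise ValueError("need at least one match")
    points = 0
    xg_diff = 0.0
    for goals_for, goals_against, xg_for, xg_against in recent:
        # 3 points for a win, 1 for a draw, 0 for a loss
        if goals_for > goals_against:
            points += 3
        elif goals_for == goals_against:
            points += 1
        xg_diff += xg_for - xg_against
    n = len(recent)
    return points / n, xg_diff / n


# Illustrative four-match run (made-up numbers)
log = [(1, 2, 0.9, 1.6), (0, 0, 0.7, 1.1), (2, 1, 1.4, 1.0), (0, 3, 0.5, 2.1)]
ppg, xgd = rolling_form(log, window=4)
print(f"PPG: {ppg:.2f}, xG diff per match: {xgd:.3f}")
```

The two returned values can then be passed as `current_ppg` and `xg_diff` to `should_sack()` or `sacking_decision_model()`.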

Manager Hiring Analytics

Finding the right manager is as important as knowing when to change. Analytics can help identify candidates whose style matches the club's needs and squad composition.

# Python: Manager Hiring Analysis
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SquadProfile:
    """Profile of squad characteristics."""
    avg_age: float
    young_players: int
    peak_players: int
    technical_avg: float
    physical_avg: float
    pace_avg: float
    recommended_style: str

@dataclass
class ManagerCandidate:
    """Manager candidate profile."""
    name: str
    preferred_style: str
    pressing_intensity: str
    possession_avg: float
    youth_dev_score: int
    tactical_flexibility: int
    experience_level: str
    available: bool

class ManagerHiringAnalyzer:
    """Analyze and rank manager candidates for a club."""

    def __init__(self, squad_data: pd.DataFrame):
        self.squad = squad_data
        self.squad_profile = self._analyze_squad()

    def _analyze_squad(self) -> SquadProfile:
        """Analyze squad to determine ideal manager profile."""

        df = self.squad

        avg_age = df["age"].mean()
        young = (df["age"] < 23).sum()
        peak = ((df["age"] >= 23) & (df["age"] <= 29)).sum()
        technical = df["technical_skill"].mean()
        physical = df["physical_rating"].mean()
        pace = df["pace"].mean()

        # Recommend style based on squad
        if pace > 75 and (df["position"] == "W").sum() >= 2:
            style = "Counter-attacking"
        elif technical > 80:
            style = "Possession-based"
        elif physical > 78:
            style = "Direct/Physical"
        elif young > 5:
            style = "Development-focused"
        else:
            style = "Flexible"

        return SquadProfile(
            avg_age=avg_age,
            young_players=young,
            peak_players=peak,
            technical_avg=technical,
            physical_avg=physical,
            pace_avg=pace,
            recommended_style=style
        )

    def calculate_match_score(self, candidate: Dict,
                             priorities: Dict) -> float:
        """Calculate how well a candidate matches club needs."""

        score = 0
        profile = self.squad_profile

        # Style match
        if candidate["preferred_style"] == profile.recommended_style:
            score += 25
        elif candidate["tactical_flexibility"] > 70:
            score += 15

        # Youth development
        if profile.young_players > 5 and candidate["youth_dev_score"] > 70:
            score += 20

        # Experience
        if priorities.get("needs_experienced", False):
            if candidate["experience_level"] in ["Experienced", "Very Experienced"]:
                score += 20

        # Availability
        if candidate["available"]:
            score += 15

        # Pressing match
        if profile.physical_avg > 75 and candidate["pressing_intensity"] == "Very High":
            score += 10

        # Tactical flexibility bonus
        score += candidate["tactical_flexibility"] * 0.1

        return score

    def rank_candidates(self, candidates: List[Dict],
                       priorities: Dict) -> pd.DataFrame:
        """Rank all candidates by match score."""

        results = []
        for candidate in candidates:
            score = self.calculate_match_score(candidate, priorities)
            results.append({
                "manager": candidate["name"],
                "preferred_style": candidate["preferred_style"],
                "experience": candidate["experience_level"],
                "available": candidate["available"],
                "match_score": score
            })

        return pd.DataFrame(results).sort_values("match_score", ascending=False)

    def generate_shortlist(self, candidates: List[Dict],
                          priorities: Dict, top_n: int = 3) -> pd.DataFrame:
        """Generate shortlist of top candidates."""

        ranked = self.rank_candidates(candidates, priorities)
        shortlist = ranked.head(top_n).copy()  # copy so adding a column doesn't trigger SettingWithCopyWarning

        # Add reasoning for each
        reasons = []
        for _, row in shortlist.iterrows():
            reason = f"Score: {row['match_score']:.1f} - "
            if row["preferred_style"] == self.squad_profile.recommended_style:
                reason += "Style match; "
            if row["available"]:
                reason += "Available; "
            if row["experience"] in ["Experienced", "Very Experienced"]:
                reason += "Experienced; "
            reasons.append(reason.rstrip("; "))

        shortlist["reasoning"] = reasons
        return shortlist


# Example usage (random squad; seeded so the output is reproducible)
np.random.seed(42)
squad_data = pd.DataFrame({
    "player": [f"Player {i}" for i in range(20)],
    "age": np.random.randint(19, 34, 20),
    "position": np.random.choice(["GK", "CB", "FB", "CM", "W", "ST"], 20),
    "technical_skill": np.random.randint(65, 90, 20),
    "physical_rating": np.random.randint(65, 85, 20),
    "pace": np.random.randint(60, 90, 20)
})

analyzer = ManagerHiringAnalyzer(squad_data)

candidates = [
    {"name": "Candidate A", "preferred_style": "Possession-based",
     "pressing_intensity": "Medium", "youth_dev_score": 75,
     "tactical_flexibility": 60, "experience_level": "Experienced", "available": True},
    {"name": "Candidate B", "preferred_style": "Counter-attacking",
     "pressing_intensity": "Very High", "youth_dev_score": 55,
     "tactical_flexibility": 80, "experience_level": "Very Experienced", "available": True},
    {"name": "Candidate C", "preferred_style": "Development-focused",
     "pressing_intensity": "Medium", "youth_dev_score": 90,
     "tactical_flexibility": 70, "experience_level": "Developing", "available": True}
]

priorities = {"needs_experienced": True}
shortlist = analyzer.generate_shortlist(candidates, priorities)

print(f"Squad Profile: {analyzer.squad_profile.recommended_style} style recommended")
print(f"\nCandidate Shortlist:")
print(shortlist.to_string(index=False))
# R: Manager Hiring Analysis
library(tidyverse)

# Squad profile
analyze_squad_profile <- function(squad_data) {

    squad_data %>%
        summarise(
            # Age profile
            avg_age = mean(age),
            young_players = sum(age < 23),
            peak_players = sum(age >= 23 & age <= 29),
            veteran_players = sum(age > 29),

            # Style indicators
            technical_avg = mean(technical_skill),
            physical_avg = mean(physical_rating),
            pace_avg = mean(pace),

            # Positional strengths
            has_quality_striker = any(position == "ST" & quality > 80),
            has_quality_wingers = sum(position == "W" & quality > 75) >= 2,
            has_technical_midfield = mean(technical_skill[position == "CM"]) > 75
        ) %>%
        mutate(
            # Ideal style recommendation
            recommended_style = case_when(
                pace_avg > 75 & has_quality_wingers ~ "Counter-attacking",
                technical_avg > 80 & has_technical_midfield ~ "Possession-based",
                physical_avg > 78 ~ "Direct/Physical",
                young_players > 5 ~ "Development-focused",
                TRUE ~ "Flexible"
            )
        )
}

# Manager candidate profiles
manager_candidates <- tribble(
    ~manager, ~preferred_style, ~pressing_intensity, ~possession_avg,
    ~youth_dev_score, ~tactical_flexibility, ~experience_level, ~availability,
    "Candidate A", "Possession-based", "Medium", 62, 75, 60, "Experienced", TRUE,
    "Candidate B", "Counter-attacking", "High", 45, 55, 80, "Experienced", FALSE,
    "Candidate C", "Pressing", "Very High", 55, 85, 55, "Developing", TRUE,
    "Candidate D", "Direct", "Low", 42, 60, 90, "Very Experienced", TRUE,
    "Candidate E", "Flexible", "Medium", 52, 70, 85, "Experienced", TRUE
)

# Match score calculation
calculate_match_score <- function(candidate, squad_profile, club_priorities) {

    score <- 0

    # Style match
    if (candidate$preferred_style == squad_profile$recommended_style) {
        score <- score + 25
    } else if (candidate$tactical_flexibility > 70) {
        score <- score + 15  # Flexible managers can adapt
    }

    # Youth focus match
    if (squad_profile$young_players > 5 && candidate$youth_dev_score > 70) {
        score <- score + 20
    }

    # Experience requirements
    if (club_priorities$needs_experienced && candidate$experience_level %in% c("Experienced", "Very Experienced")) {
        score <- score + 20
    }

    # Availability
    if (candidate$availability) {
        score <- score + 15
    }

    # Pressing style match
    if (squad_profile$physical_avg > 75 && candidate$pressing_intensity == "Very High") {
        score <- score + 10
    }

    score
}

# Rank candidates
rank_candidates <- function(candidates, squad_profile, priorities) {

    candidates %>%
        rowwise() %>%
        mutate(
            match_score = calculate_match_score(
                pick(everything()),  # current row as a 1-row tibble (cur_data() is deprecated in dplyr 1.1.0)
                squad_profile,
                priorities
            )
        ) %>%
        ungroup() %>%
        arrange(desc(match_score))
}

# Example usage
print("Manager Candidate Evaluation Framework Ready!")
print("Use rank_candidates() to evaluate potential hires")

Coaching Staff Analytics

Modern football management is a team effort. Assistant coaches, set-piece specialists, and analytics staff all contribute to team performance. Understanding these contributions adds nuance to manager evaluation.

# Python: Coaching Staff Analysis
import pandas as pd
import numpy as np
from typing import Dict, List

class CoachingStaffAnalyzer:
    """Analyze contribution of coaching staff members."""

    def __init__(self, staff_data: pd.DataFrame):
        self.data = staff_data

    def set_piece_coach_analysis(self) -> pd.DataFrame:
        """Analyze set piece coach effectiveness."""

        sp_analysis = self.data.groupby("set_piece_coach").agg({
            "season": "count",
            "club": "nunique",
            "set_piece_goals_for": "mean",
            "set_piece_goals_against": "mean",
            "total_goals_for": "mean"
        }).reset_index()

        sp_analysis.columns = ["set_piece_coach", "seasons", "clubs",
                              "avg_sp_for", "avg_sp_against", "avg_total"]

        sp_analysis["sp_net"] = sp_analysis["avg_sp_for"] - sp_analysis["avg_sp_against"]
        sp_analysis["sp_pct"] = sp_analysis["avg_sp_for"] / sp_analysis["avg_total"] * 100

        return sp_analysis.sort_values("sp_net", ascending=False)

    def continuity_analysis(self) -> pd.DataFrame:
        """Analyze impact of coaching staff continuity."""

        df = self.data.sort_values(["club", "season"]).copy()

        # Track changes
        df["prev_manager"] = df.groupby("club")["manager"].shift(1)
        df["prev_assistant"] = df.groupby("club")["assistant"].shift(1)

        df["manager_changed"] = df["manager"] != df["prev_manager"]
        df["assistant_continuity"] = df["assistant"] == df["prev_assistant"]

        # Categorize scenarios
        def categorize(row):
            if pd.isna(row["prev_manager"]):
                return None
            if not row["manager_changed"]:
                return "Stability"
            elif row["assistant_continuity"]:
                return "New manager, same assistant"
            else:
                return "Complete change"

        df["scenario"] = df.apply(categorize, axis=1)

        # Analyze by scenario
        scenarios = df.dropna(subset=["scenario"]).groupby("scenario").agg({
            "club": "count",
            "total_goals_for": "mean"
        }).reset_index()

        scenarios.columns = ["scenario", "cases", "avg_goals"]
        return scenarios

    def coach_value(self, coach_name: str, role: str = "assistant") -> Dict:
        """Calculate value metrics for a specific coach."""

        if role == "assistant":
            coached = self.data[self.data["assistant"] == coach_name]
        elif role == "set_piece":
            coached = self.data[self.data["set_piece_coach"] == coach_name]
        else:
            return {"error": "Invalid role"}

        if len(coached) == 0:
            return {"error": "Coach not found"}

        return {
            "coach": coach_name,
            "role": role,
            "clubs": coached["club"].nunique(),
            "seasons": len(coached),
            "managers_worked_with": coached["manager"].nunique(),
            "avg_goals": coached["total_goals_for"].mean()
        }

    def assistant_manager_pairs(self) -> pd.DataFrame:
        """Analyze successful manager-assistant pairings."""

        pairs = self.data.groupby(["manager", "assistant"]).agg({
            "season": "count",
            "total_goals_for": "mean",
            "set_piece_goals_for": "mean"
        }).reset_index()

        pairs.columns = ["manager", "assistant", "seasons_together",
                        "avg_goals", "avg_sp_goals"]

        return pairs.sort_values("avg_goals", ascending=False)


# Example usage
staff_data = pd.DataFrame({
    "season": ["2022-23", "2023-24", "2022-23", "2023-24"],
    "club": ["Club A", "Club A", "Club B", "Club B"],
    "manager": ["Manager 1", "Manager 1", "Manager 2", "Manager 3"],
    "assistant": ["Assistant A", "Assistant A", "Assistant B", "Assistant B"],
    "set_piece_coach": ["SP Coach X", "SP Coach X", "SP Coach Y", "SP Coach Y"],
    "set_piece_goals_for": [15, 18, 8, 10],
    "set_piece_goals_against": [8, 6, 12, 10],
    "total_goals_for": [68, 72, 52, 55]
})

analyzer = CoachingStaffAnalyzer(staff_data)

print("Set Piece Coach Analysis:")
print(analyzer.set_piece_coach_analysis().to_string(index=False))

print("\nCoaching Continuity Impact:")
print(analyzer.continuity_analysis().to_string(index=False))
# R: Coaching Staff Analysis
library(tidyverse)

# Track coach movements and team performance
coaching_staff_data <- tribble(
    ~season, ~club, ~manager, ~assistant, ~set_piece_coach,
    ~set_piece_goals_for, ~set_piece_goals_against, ~total_goals_for,
    "2022-23", "Club A", "Manager 1", "Assistant A", "SP Coach X", 15, 8, 68,
    "2023-24", "Club A", "Manager 1", "Assistant A", "SP Coach X", 18, 6, 72,
    "2022-23", "Club B", "Manager 2", "Assistant B", "SP Coach Y", 8, 12, 52,
    "2023-24", "Club B", "Manager 3", "Assistant B", "SP Coach Y", 10, 10, 55,
    "2022-23", "Club C", "Manager 4", "Assistant C", "SP Coach X", 12, 9, 58,
    "2023-24", "Club C", "Manager 4", "Assistant C", "SP Coach Z", 8, 14, 54
)

# Set piece coach impact
sp_coach_analysis <- coaching_staff_data %>%
    group_by(set_piece_coach) %>%
    summarise(
        seasons = n(),
        clubs = n_distinct(club),
        avg_sp_goals_for = mean(set_piece_goals_for),
        avg_sp_goals_against = mean(set_piece_goals_against),
        sp_net = mean(set_piece_goals_for - set_piece_goals_against),
        sp_pct_of_total = mean(set_piece_goals_for / total_goals_for) * 100,
        .groups = "drop"
    ) %>%
    arrange(desc(sp_net))

print("Set Piece Coach Analysis:")
print(sp_coach_analysis)

# Assistant coach continuity impact
continuity_analysis <- coaching_staff_data %>%
    group_by(club) %>%
    mutate(
        manager_changed = manager != lag(manager),
        assistant_continuity = assistant == lag(assistant)
    ) %>%
    filter(!is.na(manager_changed)) %>%
    mutate(
        scenario = case_when(
            !manager_changed ~ "Stability",
            manager_changed & assistant_continuity ~ "New manager, same assistant",
            manager_changed & !assistant_continuity ~ "Complete change"
        )
    )

scenario_impact <- continuity_analysis %>%
    group_by(scenario) %>%
    summarise(
        cases = n(),
        avg_goals = mean(total_goals_for),
        .groups = "drop"
    )

cat("\nCoaching Continuity Impact:\n")
print(scenario_impact)

# Track individual coach performance
coach_value <- function(coach_data, coach_name, role = "assistant") {

    if (role == "assistant") {
        coached <- coach_data %>% filter(assistant == coach_name)
    } else if (role == "set_piece") {
        coached <- coach_data %>% filter(set_piece_coach == coach_name)
    } else {
        stop("role must be 'assistant' or 'set_piece'")
    }

    coached %>%
        summarise(
            clubs = n_distinct(club),
            seasons = n(),
            managers_worked_with = n_distinct(manager),
            avg_performance = mean(total_goals_for),
            .groups = "drop"
        )
}

cat("\nAssistant A Value:\n")
print(coach_value(coaching_staff_data, "Assistant A"))
Coaching Staff Roles and Analytics Focus
Assistant Manager
  • Session planning quality
  • Player relationships
  • Transition continuity
Set Piece Coach
  • SP goals for/against
  • % of total goals from SP
  • Defensive SP organization
Goalkeeping Coach
  • GK development trajectory
  • Save % vs xG faced
  • Distribution improvements
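The goalkeeping metrics in the list above can be computed from shot-level data. The sketch below assumes a hypothetical `ShotFaced` record that carries each shot's xG and outcome. It defines goals prevented as xG faced on on-target shots minus goals conceded; many data providers use post-shot xG for this, so the pre-shot version here is only a rough proxy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShotFaced:
    """Hypothetical shot record from the goalkeeper's perspective."""
    on_target: bool
    xg: float   # chance quality; post-shot xG would be preferable if available
    goal: bool


def goalkeeper_summary(shots: List[ShotFaced]) -> Dict[str, float]:
    """Save %, xG faced, and goals prevented, using on-target shots only."""
    on_target = [s for s in shots if s.on_target]
    if not on_target:
        raise ValueError("no on-target shots faced")
    goals = sum(1 for s in on_target if s.goal)
    saves = len(on_target) - goals
    xg_faced = sum(s.xg for s in on_target)
    return {
        "save_pct": 100 * saves / len(on_target),
        "xg_faced": xg_faced,
        "goals_conceded": goals,
        # Positive = the keeper conceded fewer goals than the chances were worth
        "goals_prevented": xg_faced - goals,
    }


shots = [
    ShotFaced(True, 0.3, False),
    ShotFaced(True, 0.5, True),
    ShotFaced(False, 0.1, False),  # off target: excluded from every metric
    ShotFaced(True, 0.2, False),
]
print(goalkeeper_summary(shots))
```

Comparing these values season by season under different goalkeeping coaches is one way to approach the development-trajectory bullet.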

Practice Exercises

Exercise 47.1: Manager Fingerprint Dashboard

Create a comprehensive manager comparison dashboard that visualizes tactical fingerprints using radar charts, and ranks managers on multiple dimensions.

Hints:
  • Normalize all metrics to 0-100 scale for comparison
  • Include both in-possession and out-of-possession metrics
  • Allow user selection of managers to compare
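The first hint can be handled with min-max scaling. A minimal sketch follows; returning 50 when every manager has the same value, and the PPDA figures in the usage line, are illustrative assumptions. For metrics where a lower raw value should plot further out on the radar (PPDA falls as pressing intensity rises), pass `higher_is_better=False` to invert the scale.

```python
from typing import List


def normalize_0_100(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Min-max scale values to 0-100 across the managers being compared."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread: place everyone in the middle (an arbitrary convention)
        return [50.0] * len(values)
    scaled = [100 * (v - lo) / (hi - lo) for v in values]
    # Invert when a lower raw value is "more" of the quality being plotted
    return scaled if higher_is_better else [100 - s for s in scaled]


# Illustrative PPDA values for three managers: lower PPDA = more intense press
ppda_values = [8.0, 12.0, 16.0]
print(normalize_0_100(ppda_values, higher_is_better=False))  # [100.0, 50.0, 0.0]
```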
Exercise 47.2: Manager Value Model

Build a regression model that predicts team points based on squad value and wage bill, then use residuals to estimate manager impact. Validate using historical manager changes.

Hints:
  • Log-transform financial variables
  • Consider using mixed-effects models
  • Test model stability across different seasons
Exercise 47.3: Substitution Optimizer

Using historical match data, build a model that recommends optimal substitution timing based on the scoreline, xG flow, and player fatigue.

Hints:
  • Model xG rate changes after substitutions
  • Include game state (winning/drawing/losing)
  • Factor in player characteristics
Exercise 47.4: Sacking Impact Simulator

Build a Monte Carlo simulation that models the expected outcomes of sacking vs. keeping a manager based on historical form, remaining fixtures, and replacement manager quality bands.

Hints:
  • Use historical post-sacking performance distributions
  • Model regression to the mean for very poor/good runs
  • Simulate at least 10,000 scenarios for confidence intervals
  • Include caretaker vs. permanent appointment distinction
Exercise 47.5: Coaching Staff Network Analysis

Create a network graph visualization showing the relationships between managers and their coaching staff across multiple clubs and seasons. Identify "coaching trees" and track how assistant managers transition to head manager roles.

Hints:
  • Use NetworkX (Python) or igraph (R) for graph analysis
  • Calculate betweenness centrality for key connectors
  • Track success rates of different coaching tree branches
  • Include set-piece coaches and analysts in the network
Exercise 47.6: Match Preparation Effectiveness

Analyze how teams perform against different opponent types (possession-heavy, pressing, defensive, etc.) and whether managers show consistent tactical preparation effectiveness. Include first-half vs. second-half analysis to assess half-time adjustments.

Hints:
  • Cluster opponents by playing style first
  • Compare xG difference between first and second halves
  • Look for patterns in formation changes
  • Account for home/away effects
Exercise 47.7: Manager Style Evolution Tracker

Build a system that tracks how a manager's tactical style evolves over their career, identifying pivotal moments (new clubs, new assistants, tournament experiences) that correlate with style changes.

Hints:
  • Use rolling averages for tactical metrics over 15-20 matches
  • Apply change point detection algorithms (PELT, Bayesian)
  • Correlate changes with career events database
  • Visualize evolution using animated timeline charts
Exercise 47.8: Hiring Decision Backtester

Create a backtesting framework for manager hiring decisions. Given historical hiring scenarios, test whether different candidate selection criteria would have yielded better outcomes using actual performance data.

Hints:
  • Define multiple selection strategies (style match, experience, age, etc.)
  • Use holdout seasons for true out-of-sample testing
  • Account for survivorship bias in available candidates
  • Measure outcomes at 6-month, 1-year, and full-tenure windows

Summary

Key Takeaways
  • Complex Attribution: Separating manager impact from squad quality requires sophisticated analytical approaches
  • Tactical Fingerprints: Managers have distinctive tactical signatures that can be identified through playing style metrics
  • Resource Adjustment: Performance should be evaluated relative to available resources, not just absolute results
  • In-Game Decisions: Substitution timing and tactical changes provide measurable signals of managerial quality
  • Development Skill: Some managers consistently improve players, though this is confounded by natural age-related development
  • Sacking Timing: Data can help identify optimal sacking windows, though regression to the mean often explains poor runs
  • Hiring Analytics: Squad profile analysis should drive candidate selection to find style matches over reputation
  • Staff Matters: Coaching staff, particularly assistants and set-piece coaches, can influence team performance, so staff continuity belongs in any manager evaluation
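The resource-adjustment takeaway can be made concrete with a simple regression of points on log wage bill, treating the residual as the part of performance that financial resources do not explain. This is a minimal sketch in the spirit of Exercise 47.2: ordinary least squares via `numpy.polyfit`, made-up numbers, and no controls for squad value, injuries, or luck, so the residual is only a crude proxy for manager impact.

```python
import numpy as np


def resource_adjusted_points(points, wage_bill):
    """Fit points = a + b * log(wage_bill) by least squares.

    Returns (intercept, slope, residuals); a positive residual means the team
    earned more points than its wage bill would predict.
    """
    points = np.asarray(points, dtype=float)
    log_wages = np.log(np.asarray(wage_bill, dtype=float))
    # polyfit with deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(log_wages, points, deg=1)
    residuals = points - (intercept + slope * log_wages)
    return intercept, slope, residuals


# Illustrative season: points and wage bills (in millions) for five clubs
points = [85, 70, 62, 50, 41]
wages = [200, 150, 90, 60, 40]
a, b, resid = resource_adjusted_points(points, wages)
for p, w, r in zip(points, wages, resid):
    print(f"wages {w:>4}m  points {p:>3}  vs expected: {r:+.1f}")
```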
Common Pitfalls
  • Results-Only Evaluation: Judging managers purely on points ignores process quality, opponent strength, and variance; xG-based metrics provide better signals
  • Ignoring Context: A manager inheriting a relegated squad should not be compared to one inheriting a championship team without adjustment
  • Small Sample Size: Most managers have only 100-300 matches to evaluate, and tactical fingerprints need at least 30 matches to stabilize
  • Survivorship Bias: Only analyzing employed managers misses lessons from those who failed early or never got top opportunities
  • Confusing Style with Quality: Possession-based football isn't inherently better than counter-attacking; both can be executed well or poorly
  • Linear Career Progression: Assuming managers continuously improve ignores that many plateau or decline after initial success
  • Ignoring Adaptation: Great managers adapt to their squad; judging them against a single "ideal" style misses this flexibility
  • Caretaker Performance: A post-sacking "bounce" is often regression to the mean rather than evidence of caretaker quality, so don't overvalue short-term results
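The caretaker-bounce pitfall is easy to demonstrate by simulation. The sketch below gives every simulated team identical true quality (the win/draw/loss probabilities are illustrative assumptions), picks the teams with the worst 10-match runs, and looks at their next 10 matches. Nothing about the teams changes, yet their PPG climbs back toward the overall average.

```python
import numpy as np


def regression_to_mean_demo(n_teams=2000, window=10, p=(0.38, 0.27, 0.35), seed=0):
    """Return (worst-decile PPG before, same teams' PPG after, overall PPG).

    Every team draws results from the same (win, draw, loss) probabilities p,
    so any 'improvement' after a bad run is pure regression to the mean.
    """
    rng = np.random.default_rng(seed)
    # Points per match: 3 (win), 1 (draw), 0 (loss)
    results = rng.choice([3, 1, 0], size=(n_teams, 2 * window), p=p)
    before = results[:, :window].mean(axis=1)
    after = results[:, window:].mean(axis=1)
    worst = before <= np.quantile(before, 0.10)  # roughly the worst 10% of runs
    return before[worst].mean(), after[worst].mean(), results.mean()


before, after, overall = regression_to_mean_demo()
print(f"Worst runs: {before:.2f} PPG -> next 10 matches: {after:.2f} PPG (overall {overall:.2f})")
```

A genuine post-sacking effect has to be measured against this baseline, for example by comparing clubs that sacked after a bad run with clubs that kept their manager through a similar run.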
Essential Tools for Manager Analytics
Task | R | Python
Tactical Fingerprinting | factoextra, cluster | sklearn.cluster, scipy
Resource-Adjusted Models | lme4, brms | statsmodels, pymc (formerly pymc3)
Network Analysis (Coaching Trees) | igraph, ggraph | networkx, pyvis
Change Point Detection | changepoint, mcp | ruptures, changepy
Monte Carlo Simulation | mc2d, base R | numpy, scipy.stats
Radar Charts | fmsb, ggradar | matplotlib, plotly
Survival Analysis (Tenure) | survival, survminer | lifelines, scikit-survival
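For the tenure row, `lifelines`' `KaplanMeierFitter` (imported in the pipeline below) is the practical choice. To show what it estimates, here is a dependency-free Kaplan-Meier sketch: each spell has a length (here, matches in charge) and a flag for whether it ended in a sacking; spells that are still running, or ended some other way, are treated as censored. The example tenures are made up.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], sacked: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve for managerial tenures.

    durations: length of each spell (e.g. matches in charge)
    sacked:    True if the spell ended in a sacking, False if censored
               (still in post, or left by some other route)
    Returns [(t, S(t))] at each time where at least one sacking occurred.
    """
    if len(durations) != len(sacked):
        raise ValueError("durations and sacked must have the same length")
    spells = sorted(zip(durations, sacked))
    at_risk = len(spells)
    survival = 1.0
    curve = []
    i = 0
    while i < len(spells):
        t = spells[i][0]
        sackings = ended = 0
        # Group every spell that ends at time t
        while i < len(spells) and spells[i][0] == t:
            sackings += spells[i][1]
            ended += 1
            i += 1
        if sackings:
            # Multiply by the probability of surviving t, given in post just before t
            survival *= 1 - sackings / at_risk
            curve.append((t, survival))
        at_risk -= ended
    return curve


# Made-up tenures in matches; False = still in post (censored)
tenures = [10, 20, 20, 30, 40]
sacked = [True, True, False, True, False]
for t, s in kaplan_meier(tenures, sacked):
    print(f"P(still in post after {t} matches) = {s:.2f}")
```

Censored spells still count in the at-risk denominator up to their end, which is why a still-employed manager with a long tenure raises the estimated survival curve even though no sacking is observed.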
Key Metrics for Manager Evaluation
Process Metrics
  • xG Difference: xG For minus xG Against per 90
  • PPDA: Passes allowed per defensive action (pressing intensity)
  • Build-Up %: Possession in own third during build-up
  • Progression Rate: Forward passes / total passes
  • Defensive Line Height: Average Y position of defensive block
  • Transition Speed: Time from turnover to shot attempt
Outcome Metrics
  • Points Above Expected: Actual points - xPts
  • Resource-Adjusted Points: Points relative to wage bill rank
  • Player Development Index: Average skill improvement under manager
  • Adaptation Score: Performance variance across opponent types
  • Second-Half Performance: xG differential change H1 to H2
  • Substitution Impact: xG rate change post-substitution
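Two of the metrics above can be sketched directly. Expected points (xPts) are computed here with a common simplification: each side's goals in a match are modelled as independent Poisson variables whose means equal the teams' xG, so xPts = 3 × P(win) + P(draw). Real xPts models often simulate shot by shot instead. PPDA is shown as a plain ratio and assumes the pass and defensive-action counts have already been restricted to the relevant pressing zone, whose exact definition varies by data provider.

```python
import math
from typing import List, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_points(xg_for: float, xg_against: float, max_goals: int = 10) -> float:
    """xPts for one match, treating each side's goals as independent Poisson(xG)."""
    p_win = p_draw = 0.0
    for i in range(max_goals + 1):
        p_i = poisson_pmf(i, xg_for)
        for j in range(max_goals + 1):
            p_ij = p_i * poisson_pmf(j, xg_against)
            if i > j:
                p_win += p_ij
            elif i == j:
                p_draw += p_ij
    return 3 * p_win + p_draw


def points_above_expected(matches: List[Tuple[int, int, float, float]]) -> float:
    """Actual points minus xPts over (goals_for, goals_against, xg_for, xg_against) tuples."""
    actual = sum(3 if gf > ga else 1 if gf == ga else 0 for gf, ga, _, _ in matches)
    expected = sum(expected_points(xf, xa) for _, _, xf, xa in matches)
    return actual - expected


def ppda(opponent_passes: int, defensive_actions: int) -> float:
    """Passes allowed per defensive action; lower values mean more intense pressing."""
    return opponent_passes / defensive_actions


print(round(expected_points(1.0, 1.0), 3))  # about 1.346 for an evenly matched game
print(ppda(300, 30))  # 10.0
```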
Manager Evaluation Timeline
Window | Matches | Reliable Metrics | Caution Areas
Immediate | 1-10 | Formation choice, pressing setup | Results highly variable; don't overreact
Short-term | 10-20 | Basic tactical fingerprint emerging | Still a small sample; look for trends, not absolutes
Medium-term | 20-40 | xG metrics stabilizing, style clear | Opponent adaptation may not be visible yet
Full Season | 40-60 | All metrics reliable, development visible | Context matters (injuries, transfers)
Multi-Season | 100+ | Career patterns, consistency assessment | The game's tactical evolution may make older data less relevant
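The timeline's caution about early results follows from simple sampling error. Treating each match as an independent draw with fixed win and draw probabilities (a simplifying assumption, since real fixtures vary in difficulty), the standard error of PPG shrinks only with the square root of the number of matches:

```python
import math


def ppg_standard_error(p_win: float, p_draw: float, n_matches: int) -> float:
    """Standard error of points per game over n independent matches.

    Each match yields 3 points with prob p_win, 1 with prob p_draw, 0 otherwise.
    """
    p_loss = 1 - p_win - p_draw
    if min(p_win, p_draw, p_loss) < 0 or n_matches < 1:
        raise ValueError("invalid probabilities or match count")
    mean = 3 * p_win + p_draw
    variance = 9 * p_win + 1 * p_draw - mean ** 2  # E[X^2] - E[X]^2
    return math.sqrt(variance / n_matches)


# Illustrative mid-table team: 40% wins, 27% draws
for n in (10, 20, 38, 100):
    se = ppg_standard_error(0.40, 0.27, n)
    print(f"{n:>3} matches: PPG standard error {se:.2f}")
```

With these illustrative rates the standard error is about 0.41 after 10 matches and still about 0.21 after 38, so a rough 95% interval (plus or minus two standard errors) spans about ±0.4 PPG even after a full 38-match league season.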
Manager Analytics Use Cases by Stakeholder
Club Ownership
  • Hiring decision support
  • Sacking timing optimization
  • Contract extension analysis
  • Manager market valuation
Sporting Directors
  • Style-squad fit assessment
  • Candidate shortlisting
  • Coaching staff composition
  • Development pathway planning
Media & Analysts
  • Narrative-busting analysis
  • Historical comparisons
  • Prediction models
  • Tactical breakdowns
# Python: Complete Manager Analytics Pipeline
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from lifelines import KaplanMeierFitter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class ManagerAnalyticsConfig:
    """Configuration for manager analytics pipeline."""
    min_matches_fingerprint: int = 30
    min_matches_resource: int = 20
    n_style_clusters: int = 5
    fingerprint_metrics: Optional[List[str]] = None  # defaults filled in __post_init__

    def __post_init__(self):
        if self.fingerprint_metrics is None:
            self.fingerprint_metrics = [
                "possession_pct", "ppda", "defensive_line_height",
                "build_up_pct", "direct_play_pct", "pressing_triggers",
                "avg_formation_width", "transition_speed"
            ]

class ManagerAnalyticsPipeline:
    """Complete manager analytics system."""

    def __init__(self, config: ManagerAnalyticsConfig = None):
        self.config = config or ManagerAnalyticsConfig()
        self.scaler = StandardScaler()
        self.kmeans = None
        self.results = {}

    def fit(self, matches_df: pd.DataFrame, managers_df: pd.DataFrame,
            staff_df: pd.DataFrame) -> "ManagerAnalyticsPipeline":
        """Run full analytics pipeline."""

        # 1. Tactical fingerprinting
        self.results["fingerprints"] = self._compute_fingerprints(matches_df)

        # 2. Resource-adjusted performance
        self.results["resource_adjusted"] = self._compute_resource_adjusted(matches_df)

        # 3. Tenure survival analysis
        self.results["tenure_survival"] = self._compute_tenure_survival(managers_df)

        # 4. Staff continuity impact
        self.results["staff_impact"] = self._compute_staff_impact(staff_df)

        # 5. Generate summary
        self.results["summary"] = self._generate_summary(matches_df)

        return self

    def _compute_fingerprints(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute tactical fingerprints for each manager."""

        metrics = self.config.fingerprint_metrics
        available = [m for m in metrics if m in df.columns]
        if not available:
            return pd.DataFrame()

        # Keep only managers with enough matches, then average each metric
        match_counts = df.groupby("manager").size()
        eligible = match_counts[match_counts >= self.config.min_matches_fingerprint].index
        profiles = (
            df[df["manager"].isin(eligible)]
            .groupby("manager")[available]
            .mean()
            .reset_index()
        )

        if len(profiles) < 2:
            return pd.DataFrame()

        # Scale and cluster
        scaled = self.scaler.fit_transform(profiles[available])

        n_clusters = min(self.config.n_style_clusters, len(profiles))
        self.kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=25)
        profiles["style_cluster"] = self.kmeans.fit_predict(scaled)

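        # KMeans cluster ids are arbitrary: these names are placeholders and should be
        # reassigned after inspecting self.kmeans.cluster_centers_ for each cluster
        cluster_labels = ["Possession", "High Press", "Counter", "Balanced", "Defensive"]
        profiles["style_label"] = profiles["style_cluster"].map(
            lambda x: cluster_labels[x % len(cluster_labels)]
        )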

        return profiles

    def _compute_resource_adjusted(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute resource-adjusted performance metrics."""

        # Work on a copy so the caller's DataFrame is not mutated
        df = df.copy()
        if "wage_rank" not in df.columns:
            df["wage_rank"] = 10  # default to a mid-table wage rank

        # Crude linear heuristic (rank 1 -> 3.0 PPG, minus 0.1 per rank);
        # a fitted regression model is preferable in practice
        df["expected_ppg"] = 3 - (df["wage_rank"] - 1) * 0.1
        df["residual"] = df.get("points", df.get("ppg", 1.5)) - df["expected_ppg"]

        # Named aggregation keeps each manager aligned with their statistics
        adjusted = (
            df.groupby("manager")
            .agg(avg_residual=("residual", "mean"), matches=("residual", "size"))
            .reset_index()
        )
        adjusted = adjusted[adjusted["matches"] >= self.config.min_matches_resource]

        return adjusted.sort_values("avg_residual", ascending=False)

    def _compute_tenure_survival(self, df: pd.DataFrame) -> Dict:
        """Survival analysis for manager tenure."""

        required = {"tenure_days", "was_sacked", "resource_level"}
        if not required.issubset(df.columns):
            return {"status": "insufficient data"}

        kmf = KaplanMeierFitter()

        results = {}
        for level in df["resource_level"].unique():
            subset = df[df["resource_level"] == level]
            if len(subset) >= 10:
                kmf.fit(
                    subset["tenure_days"],
                    event_observed=subset["was_sacked"]
                )
                results[level] = {
                    "median_tenure": kmf.median_survival_time_,
                    "survival_6m": kmf.survival_function_at_times([180]).values[0],
                    "n": len(subset)
                }

        return results

    def _compute_staff_impact(self, df: pd.DataFrame) -> pd.DataFrame:
        """Analyze coaching staff continuity impact."""

        if "assistant_retained" not in df.columns:
            return pd.DataFrame()

        impact = df.groupby("assistant_retained").agg({
            "first_season_points": ["mean", "std", "count"],
            "survived_relegation": "mean"
        }).reset_index()

        impact.columns = ["retained", "avg_points", "std_points",
                         "n_cases", "survival_rate"]

        return impact

    def _generate_summary(self, df: pd.DataFrame) -> Dict:
        """Generate pipeline summary."""

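        # Fingerprints may be an empty DataFrame when too few managers qualify
        fingerprints = self.results.get("fingerprints", pd.DataFrame())
        if "style_label" in fingerprints.columns:
            style_distribution = fingerprints["style_label"].value_counts().to_dict()
        else:
            style_distribution = {}

        return {
            "total_managers": df["manager"].nunique(),
            "total_matches": len(df),
            "style_distribution": style_distribution,
            "pipeline_status": "complete"
        }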

    def get_manager_report(self, manager_name: str) -> Dict:
        """Generate individual manager report."""

        report = {"manager": manager_name}

        # Get fingerprint
        if "fingerprints" in self.results:
            fp = self.results["fingerprints"]
            if not fp.empty and manager_name in fp["manager"].values:
                row = fp[fp["manager"] == manager_name].iloc[0]
                report["style"] = row["style_label"]

        # Get resource-adjusted
        if "resource_adjusted" in self.results:
            ra = self.results["resource_adjusted"]
            if manager_name in ra["manager"].values:
                row = ra[ra["manager"] == manager_name].iloc[0]
                report["resource_adjusted_ppg"] = round(row["avg_residual"], 3)
                report["matches_analyzed"] = int(row["matches"])

        return report


# Example usage
config = ManagerAnalyticsConfig(
    min_matches_fingerprint=25,
    n_style_clusters=4
)

pipeline = ManagerAnalyticsPipeline(config)

print("Manager Analytics Pipeline Initialized")
print(f"Fingerprint metrics: {config.fingerprint_metrics}")
print(f"Style clusters: {config.n_style_clusters}")
print("\nReady to process match, manager, and staff data")
# R: Complete Manager Analytics Pipeline
library(tidyverse)
library(cluster)
library(survival)

# Define the comprehensive analyzer
create_manager_analytics <- function(matches_data, managers_data, staff_data) {

    # 1. Tactical Fingerprint Analysis
    fingerprint_metrics <- c(
        "possession_pct", "ppda", "defensive_line_height",
        "build_up_pct", "direct_play_pct", "pressing_triggers",
        "avg_formation_width", "transition_speed"
    )

    tactical_profiles <- matches_data %>%
        group_by(manager) %>%
        summarise(across(all_of(fingerprint_metrics), ~ mean(.x, na.rm = TRUE))) %>%
        # scale() returns a matrix; as.numeric() keeps plain numeric columns
        mutate(across(all_of(fingerprint_metrics), ~ as.numeric(scale(.x))))

    # Cluster managers by style
    km_model <- kmeans(tactical_profiles[, fingerprint_metrics], centers = 5, nstart = 25)
    tactical_profiles$style_cluster <- km_model$cluster

    cluster_labels <- c("Possession-Based", "High Press", "Counter-Attack",
                        "Balanced", "Defensive")
    tactical_profiles$style_label <- cluster_labels[tactical_profiles$style_cluster]

    # 2. Resource-Adjusted Performance
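    # Manager is deliberately left out of the model: with a manager term, the
    # residuals would average to zero within every manager by construction
    resource_model <- lm(
        points_per_game ~ log(wage_bill_rank) + log(squad_value_rank) +
            home_pct + injury_burden,
        data = matches_data,
        na.action = na.exclude  # keep residuals aligned with rows of matches_data
    )

    manager_residuals <- matches_data %>%
        mutate(resid = residuals(resource_model)) %>%
        group_by(manager) %>%
        summarise(
            avg_residual = mean(resid, na.rm = TRUE),
            matches = n(),
            .groups = "drop"
        ) %>%
        filter(matches >= 30) %>%
        arrange(desc(avg_residual))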

    # 3. Survival Analysis (Tenure Prediction)
    tenure_data <- managers_data %>%
        mutate(
            # Managers still in post are measured up to today and treated as censored
            tenure_days = as.numeric(coalesce(end_date, Sys.Date()) - start_date),
            was_sacked = !is.na(sacking_date)
        )

    survival_model <- survfit(
        Surv(tenure_days, was_sacked) ~ resource_level,
        data = tenure_data
    )

    # 4. Staff Continuity Impact
    staff_analysis <- staff_data %>%
        group_by(assistant_retained_post_sacking) %>%
        summarise(
            cases = n(),
            avg_first_season_points = mean(first_season_points),
            avg_relegation_survival = mean(survived_relegation),
            .groups = "drop"
        )

    # Return comprehensive results
    list(
        tactical_profiles = tactical_profiles,
        resource_adjusted = manager_residuals,
        tenure_survival = survival_model,
        staff_impact = staff_analysis,
        summary = list(
            total_managers = n_distinct(matches_data$manager),
            style_distribution = table(tactical_profiles$style_label),
            avg_tenure = mean(tenure_data$tenure_days, na.rm = TRUE)
        )
    )
}

# Example usage
cat("Manager Analytics Pipeline Ready")
cat("\nComponents: Fingerprinting, Resource-Adjustment, Survival, Staff Analysis")
cat("\nRecommendation: Run with minimum 3 seasons of data for reliable results")
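Both pipelines hand the tenure question to a Kaplan-Meier estimator (`KaplanMeierFitter` in Python, `survfit` in R). To show what that estimator does with sacked versus censored tenures, here is a minimal from-scratch sketch. The durations are invented, and treating every non-sacking exit as censored follows the pipelines' choice of sacking as the only event.

```python
from collections import Counter
from typing import Optional


def kaplan_meier(durations: list[float], sacked: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival curve as (time, S(t)) pairs at each sacking time.

    durations: tenure length for each manager (e.g. in days)
    sacked:    True if the tenure ended in a sacking (the event); False if censored,
               i.e. still in post or left for another reason
    """
    sackings = Counter(t for t, event in zip(durations, sacked) if event)
    exits = Counter(durations)  # every departure from the risk set, censored or not
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = sackings.get(t, 0)
        if d:
            survival *= 1 - d / at_risk  # S(t) = product over event times of (1 - d_i / n_i)
            curve.append((t, survival))
        at_risk -= exits[t]  # managers censored at t still count as at risk at t
    return curve


def median_tenure(curve: list[tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival falls to 0.5 or below (None if it never does)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented tenures (days) for six managers
days = [100, 200, 200, 300, 400, 500]
was_sacked = [True, True, False, True, False, False]
curve = kaplan_meier(days, was_sacked)
print(curve)
print("Median tenure:", median_tenure(curve))
```

Censored managers never trigger a drop in the curve, but they still shrink the risk set after they leave it. Simply averaging completed tenures would ignore that information and bias the estimate.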
Manager Analytics Checklist
Before Hiring Analysis
  • Squad profile completed (age, style, strengths)
  • Style compatibility matrix built
  • Candidate fingerprints extracted (30+ matches each)
  • Budget constraints defined
  • Development vs. win-now priority set
  • Coaching staff compatibility assessed
Before Sacking Analysis
  • Expected regression to the mean estimated
  • xG-based metrics reviewed (not just points)
  • Remaining fixture difficulty assessed
  • Available replacement candidates mapped
  • Staff continuity plan considered
  • Historical sacking outcome data reviewed
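The regression-to-the-mean item on the sacking checklist can be estimated with a simple normal-normal (empirical Bayes style) shrinkage rule: pull the observed PPG toward the league average in proportion to how noisy it is. This is a sketch under stated assumptions. The per-match points variance and the spread of true team strength are inputs you would need to estimate for your own league, and the values in the example are illustrative placeholders, not measured figures.

```python
def shrink_ppg(observed_ppg: float, n_matches: int, league_mean_ppg: float,
               match_variance: float, talent_variance: float) -> float:
    """Shrink an observed points-per-game figure toward the league mean.

    match_variance:  variance of points in a single match (noise per match)
    talent_variance: variance of true PPG across teams (real differences)
    The weight on the observed value grows as matches accumulate.
    """
    noise_variance = match_variance / n_matches
    weight = talent_variance / (talent_variance + noise_variance)
    return league_mean_ppg + weight * (observed_ppg - league_mean_ppg)


# Illustrative placeholder inputs for a struggling team, not estimates from real data
for n in (10, 38):
    est = shrink_ppg(observed_ppg=0.8, n_matches=n, league_mean_ppg=1.37,
                     match_variance=1.7, talent_variance=0.15)
    print(f"0.80 PPG over {n} matches -> estimated underlying PPG: {est:.2f}")
```

With these inputs, a 10-match slump is pulled roughly halfway back toward average, while the same PPG over a full season is taken much more at face value. That gap is the improvement a new manager might appear to "cause" without changing anything.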

Manager analytics remains one of the most challenging areas in football analytics due to small sample sizes and confounding factors. However, combining multiple approaches - tactical fingerprinting, resource-adjusted performance, in-game decision evaluation, and staff impact analysis - can provide meaningful insights into managerial effectiveness. The key is to resist the temptation to draw conclusions from insufficient data and to always contextualize findings within the broader football environment. As tracking data becomes more detailed and managerial career databases grow, the precision of these analyses will continue to improve.