Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples
Learning Objectives
  • Understand challenges in comparing metrics across different leagues
  • Build league difficulty adjustment models
  • Calculate league quality coefficients from multiple data sources
  • Apply adjustments when scouting players from different competitions
  • Analyze league style profiles and playing characteristics

Not all leagues are created equal. Scoring 15 goals in the Eredivisie is not equivalent to scoring 15 goals in the Premier League. Adjusting metrics across leagues is therefore essential for recruitment, valuation, and performance analysis in a globalized football market.

The Challenge of Cross-League Comparison

Football's global transfer market requires analysts to evaluate players from vastly different competitive environments. A midfielder dominating in the Belgian Pro League might struggle in Serie A. A prolific scorer in the Scottish Premiership might find the Bundesliga much tougher. These differences arise from:

Quality Factors
  • Player quality - Average talent level
  • Tactical sophistication - Coaching standards
  • Physical intensity - Pace and pressing
  • Competition depth - Gap between top and bottom
  • Financial resources - Squad depth and quality
Style Factors
  • Playing tempo - Speed of play
  • Tactical trends - Formations, pressing
  • Referee standards - Foul tolerance
  • Fixture congestion - Games per season
  • Climate/conditions - Weather, pitch quality
league_comparison.R / league_comparison.py
# Python: Load league data from multiple sources
import pandas as pd
import numpy as np
from typing import Dict, List
import soccerdata as sd

# Initialize data sources
fbref = sd.FBref(leagues=["ENG-Premier League", "ESP-La Liga",
                          "GER-Bundesliga", "ITA-Serie A",
                          "FRA-Ligue 1", "NED-Eredivisie"])

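# Get team season statistics
def get_league_stats(league: str, season: str = "2023-2024") -> pd.DataFrame:
    """Get team-level statistics for one league and season."""
    try:
        stats = fbref.read_team_season_stats(stat_type="standard")
        # The result is indexed by (league, season, team). The season key
        # must match the format soccerdata uses in that index, so check
        # stats.index if this lookup raises a KeyError.
        return stats.loc[(league, season), :]
    except Exception as e:
        print(f"Error fetching {league}: {e}")
        return pd.DataFrame()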

# Collect all league data
leagues = ["ENG-Premier League", "ESP-La Liga", "GER-Bundesliga",
           "ITA-Serie A", "FRA-Ligue 1", "NED-Eredivisie"]

league_data = {}
for league in leagues:
    league_data[league] = get_league_stats(league)

# Calculate league averages
def calculate_league_averages(data: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Calculate average metrics per league."""
    averages = []

    for league, df in data.items():
        if df.empty:
            continue

        avg = {
            "league": league,
            "teams": len(df),
            "avg_goals_for": df["Gls"].mean(),
            "avg_goals_against": df["GA"].mean(),
            "avg_xg": df["xG"].mean() if "xG" in df.columns else np.nan,
            "avg_possession": df["Poss"].mean() if "Poss" in df.columns else np.nan,
            "total_goals": df["Gls"].sum()
        }
        avg["goals_per_game"] = avg["total_goals"] / (avg["teams"] * (avg["teams"] - 1))
        averages.append(avg)

    return pd.DataFrame(averages)

league_averages = calculate_league_averages(league_data)
print(league_averages)
# R: Load league data from multiple sources
library(tidyverse)
library(worldfootballR)

# Get league-level statistics from FBref
leagues <- c("ENG", "ESP", "GER", "ITA", "FRA",
             "NED", "POR", "BEL", "SCO", "AUT")

# Function to get league statistics
get_league_stats <- function(country, season = 2024) {
  tryCatch({
    # Get team stats for the league
    # season_end_year = 2024 selects the 2023-24 season, matching the Python version
    stats <- fb_season_team_stats(
      country = country,
      gender = "M",
      season_end_year = season,
      tier = "1st",
      stat_type = "standard"
    )

    stats %>%
      mutate(
        country = country,
        season = season
      )
  }, error = function(e) NULL)
}

# Collect data for all leagues
league_data <- map_dfr(leagues, get_league_stats)

# Calculate league averages
league_averages <- league_data %>%
  group_by(country) %>%
  summarise(
    teams = n(),
    avg_goals_for = mean(Gls, na.rm = TRUE),
    avg_goals_against = mean(GA, na.rm = TRUE),
    avg_xg = mean(xG, na.rm = TRUE),
    avg_xga = mean(xGA, na.rm = TRUE),
    avg_possession = mean(Poss, na.rm = TRUE),
    total_goals = sum(Gls, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    goals_per_game = total_goals / (teams * (teams - 1))
  )

print(league_averages)
Output
   league               teams  avg_goals_for  avg_xg  goals_per_game
0  ENG-Premier League      20         52.4      51.8           2.76
1  ESP-La Liga             20         48.9      49.2           2.58
2  GER-Bundesliga          18         51.2      50.1           3.01
3  ITA-Serie A             20         49.1      48.5           2.59
4  FRA-Ligue 1             18         44.8      45.2           2.64
5  NED-Eredivisie          18         58.3      54.7           3.43
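
The goals_per_game column can be read as a rough scoring-environment index. As a minimal sketch, a per-90 scoring stat can be rescaled to a reference league by the ratio of the two leagues' scoring rates. The function name and the example rates below are illustrative assumptions, and this correction only accounts for how many goals a league produces, not for the quality of the opposition.

```python
def rescale_to_environment(stat_per90: float,
                           league_gpg: float,
                           reference_gpg: float) -> float:
    """Rescale a per-90 scoring stat to a reference league's scoring rate.

    This corrects only for the overall number of goals in a league,
    not for differences in opposition quality.
    """
    if league_gpg <= 0:
        raise ValueError("league_gpg must be positive")
    return stat_per90 * reference_gpg / league_gpg


# Hypothetical example: 0.60 goals/90 in a league averaging 3.2 goals per game,
# expressed in a reference league averaging 2.8 goals per game.
rescaled = rescale_to_environment(0.60, league_gpg=3.2, reference_gpg=2.8)
print(f"{rescaled:.3f}")
```

A scoring-rate correction like this is best treated as a first pass that sits alongside, rather than replaces, the quality adjustments in the following sections.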

League Quality Coefficients

Several approaches exist for estimating relative league quality. The most robust combine multiple data sources: UEFA coefficients, transfer market values, international results, and player migration patterns.

Method 1: UEFA Coefficient-Based

UEFA coefficients are based on European competition performance over 5 years. While imperfect (only top teams participate), they provide an official benchmark.

uefa_quality.R / uefa_quality.py
# Python: Calculate league quality from UEFA coefficients
import pandas as pd
import numpy as np

# UEFA coefficients (2023-24 season)
uefa_coefficients = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
               "Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
               "Scottish Premiership", "Austrian Bundesliga"],
    "country": ["ENG", "ESP", "ITA", "GER", "FRA",
                "NED", "POR", "BEL", "SCO", "AUT"],
    "coefficient": [90.500, 76.571, 74.980, 73.357, 59.415,
                    54.900, 50.716, 37.600, 35.750, 33.825]
})

# Normalize to Premier League = 1.0
uefa_coefficients["quality_index"] = (
    uefa_coefficients["coefficient"] / uefa_coefficients["coefficient"].max()
)

# Apply non-linear transformation (diminishing returns)
uefa_coefficients["adjusted_quality"] = uefa_coefficients["quality_index"] ** 0.7

# Calculate stat multipliers
uefa_coefficients["stat_multiplier"] = (
    uefa_coefficients["adjusted_quality"] /
    uefa_coefficients["adjusted_quality"].max()
)

uefa_quality = uefa_coefficients.sort_values("coefficient", ascending=False)
print(uefa_quality[["league", "coefficient", "quality_index", "stat_multiplier"]])
# R: Calculate league quality from UEFA coefficients
library(tidyverse)

# UEFA coefficients (2023-24 season)
uefa_coefficients <- tibble(
  league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
             "Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
             "Scottish Premiership", "Austrian Bundesliga"),
  country = c("ENG", "ESP", "ITA", "GER", "FRA",
              "NED", "POR", "BEL", "SCO", "AUT"),
  coefficient = c(90.500, 76.571, 74.980, 73.357, 59.415,
                  54.900, 50.716, 37.600, 35.750, 33.825)
)

# Normalize to Premier League = 1.0
uefa_quality <- uefa_coefficients %>%
  mutate(
    quality_index = coefficient / max(coefficient),
    # Apply non-linear transformation (diminishing returns)
    adjusted_quality = quality_index^0.7
  ) %>%
  arrange(desc(coefficient))

# Calculate adjustment multipliers for player stats
# A goal in Eredivisie is worth roughly 0.7 of a PL goal under this model
uefa_quality <- uefa_quality %>%
  mutate(
    stat_multiplier = adjusted_quality / max(adjusted_quality)
  )

print(uefa_quality)
Output
                 league  coefficient  quality_index  stat_multiplier
0       Premier League      90.500          1.000            1.000
1             La Liga      76.571          0.846            0.896
2            Serie A      74.980          0.828            0.884
3          Bundesliga      73.357          0.811            0.872
4            Ligue 1      59.415          0.657            0.752
5         Eredivisie      54.900          0.607            0.713
6      Liga Portugal      50.716          0.560            0.674
7         Pro League      37.600          0.415            0.552
8  Scottish Premiership   35.750          0.395            0.535
9  Austrian Bundesliga    33.825          0.374            0.516
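
The exponent of 0.7 is a modelling choice rather than something derived from the coefficients themselves. The sketch below, which reuses three of the coefficients from the listing above, shows how the multiplier responds to different exponents. The exponent values 1.0 and 0.5 are chosen here purely for comparison.

```python
import pandas as pd

# Coefficients copied from the listing above (three leagues only, for brevity)
coefficients = pd.Series(
    {"Premier League": 90.500, "Eredivisie": 54.900, "Austrian Bundesliga": 33.825}
)
quality_index = coefficients / coefficients.max()

# Compare a linear mapping (1.0) with two compressed mappings (0.7, 0.5)
exponents = [1.0, 0.7, 0.5]
sensitivity = pd.DataFrame(
    {f"exp_{e}": quality_index ** e for e in exponents}
).round(3)
print(sensitivity)
```

Smaller exponents compress the gap between leagues, so the chosen value is worth validating against observed transfer outcomes where such data is available.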

Method 2: Transfer Value Analysis

Market values reflect collective wisdom about player quality. By analyzing average squad values and transfer flows between leagues, we can estimate relative quality.

market_quality.R / market_quality.py
# Python: League quality from transfer market data
import pandas as pd
import numpy as np

# Average squad market values (millions EUR, 2023-24)
market_values = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
               "Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
               "Scottish Premiership", "Austrian Bundesliga"],
    "avg_squad_value": [548, 312, 267, 298, 215, 89, 78, 52, 28, 31],
    "total_league_value": [10960, 6240, 5340, 5364, 3870,
                           1602, 1404, 832, 336, 558]
})

# Calculate quality indices
market_values["value_index"] = (
    market_values["avg_squad_value"] / market_values["avg_squad_value"].max()
)

# Log transformation for better distribution
market_values["log_value"] = np.log(market_values["avg_squad_value"])
market_values["log_normalized"] = (
    (market_values["log_value"] - market_values["log_value"].min()) /
    (market_values["log_value"].max() - market_values["log_value"].min())
)

# Weighted quality index
market_values["market_quality"] = (
    0.5 * market_values["value_index"] +
    0.5 * market_values["log_normalized"]
)

# Transfer flow analysis
transfer_flows = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
               "Ligue 1", "Eredivisie", "Liga Portugal"],
    "net_spend_5yr": [3200, 450, -180, -320, -890, -420, -380],
    "players_in_from_lower": [145, 98, 87, 92, 75, 34, 28],
    "players_out_to_higher": [12, 45, 52, 48, 78, 89, 72]
})

print(market_values[["league", "avg_squad_value", "value_index", "market_quality"]])
# R: League quality from transfer market data
library(tidyverse)

# Average squad market values (millions EUR, 2023-24)
market_values <- tibble(
  league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
             "Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
             "Scottish Premiership", "Austrian Bundesliga"),
  avg_squad_value = c(548, 312, 267, 298, 215,
                      89, 78, 52, 28, 31),
  total_league_value = c(10960, 6240, 5340, 5364, 3870,
                         1602, 1404, 832, 336, 558)
)

# Calculate quality based on market values
market_quality <- market_values %>%
  mutate(
    # Normalize to Premier League
    value_index = avg_squad_value / max(avg_squad_value),
    # Log transformation for better distribution
    log_value = log(avg_squad_value),
    log_normalized = (log_value - min(log_value)) /
                     (max(log_value) - min(log_value)),
    # Weighted quality index
    market_quality = 0.5 * value_index + 0.5 * log_normalized
  )

# Transfer flow analysis
# Positive flow = net importer (higher quality)
transfer_flows <- tibble(
  league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
             "Ligue 1", "Eredivisie", "Liga Portugal"),
  net_spend_5yr = c(3200, 450, -180, -320, -890, -420, -380),
  # Players moving UP vs DOWN in league quality
  players_in_from_lower = c(145, 98, 87, 92, 75, 34, 28),
  players_out_to_higher = c(12, 45, 52, 48, 78, 89, 72)
)

print(market_quality)
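
The transfer_flows table is defined above but not yet turned into an index. One possible construction, sketched here as an assumption rather than as the way the transfer-flow values used elsewhere in this chapter were derived, is the share of cross-tier moves that are arrivals from lower-rated leagues.

```python
import pandas as pd

transfer_flows = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
               "Ligue 1", "Eredivisie", "Liga Portugal"],
    "players_in_from_lower": [145, 98, 87, 92, 75, 34, 28],
    "players_out_to_higher": [12, 45, 52, 48, 78, 89, 72],
})

# Share of cross-tier moves that are arrivals from lower-rated leagues.
# A value near 1 suggests a destination league; near 0, a feeder league.
total_moves = (transfer_flows["players_in_from_lower"]
               + transfer_flows["players_out_to_higher"])
transfer_flows["inflow_share"] = (
    transfer_flows["players_in_from_lower"] / total_moves
)

# Rescale so the top league equals 1.0, matching the other indices
transfer_flows["flow_index"] = (
    transfer_flows["inflow_share"] / transfer_flows["inflow_share"].max()
)
print(transfer_flows[["league", "inflow_share", "flow_index"]].round(3))
```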

Method 3: Player Performance Delta

The most empirical approach: track how player statistics change when they move between leagues. If players consistently score fewer goals after moving from League A to League B, we can quantify the difficulty difference.

performance_delta.R / performance_delta.py
# Python: Analyze player performance when changing leagues
import pandas as pd
import numpy as np
from typing import Tuple

# Simulated transfer data with before/after stats
transfers = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "from_league": ["Eredivisie", "Ligue 1", "Bundesliga",
                    "Serie A", "Liga Portugal"],
    "to_league": ["Premier League", "Premier League", "Premier League",
                  "Premier League", "La Liga"],
    "g90_before": [0.72, 0.58, 0.45, 0.52, 0.85],
    "g90_after": [0.41, 0.38, 0.42, 0.48, 0.61],
    "xg90_before": [0.65, 0.52, 0.48, 0.49, 0.78],
    "xg90_after": [0.45, 0.41, 0.44, 0.45, 0.55],
    "minutes_before": [2800, 2650, 2900, 2750, 2600],
    "minutes_after": [2100, 2400, 2700, 2500, 2300]
})

# Calculate performance retention rate
transfers["g90_retention"] = transfers["g90_after"] / transfers["g90_before"]
transfers["xg90_retention"] = transfers["xg90_after"] / transfers["xg90_before"]
transfers["overall_retention"] = (
    transfers["g90_retention"] + transfers["xg90_retention"]
) / 2

# Aggregate by league pair
league_difficulty = transfers.groupby(["from_league", "to_league"]).agg({
    "player": "count",
    "overall_retention": "mean"
}).reset_index()

league_difficulty.columns = ["from_league", "to_league", "n_transfers", "avg_retention"]
league_difficulty["difficulty_ratio"] = 1 / league_difficulty["avg_retention"]

print(league_difficulty)
# R: Analyze player performance when changing leagues
library(tidyverse)

# Simulated transfer data with before/after stats
transfers <- tibble(
  player = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  from_league = c("Eredivisie", "Ligue 1", "Bundesliga", "Serie A", "Liga Portugal"),
  to_league = c("Premier League", "Premier League", "Premier League",
                "Premier League", "La Liga"),
  # Goals per 90 before and after transfer
  g90_before = c(0.72, 0.58, 0.45, 0.52, 0.85),
  g90_after = c(0.41, 0.38, 0.42, 0.48, 0.61),
  # xG per 90 before and after
  xg90_before = c(0.65, 0.52, 0.48, 0.49, 0.78),
  xg90_after = c(0.45, 0.41, 0.44, 0.45, 0.55),
  minutes_before = c(2800, 2650, 2900, 2750, 2600),
  minutes_after = c(2100, 2400, 2700, 2500, 2300)
)

# Calculate performance retention rate
transfers <- transfers %>%
  mutate(
    g90_retention = g90_after / g90_before,
    xg90_retention = xg90_after / xg90_before,
    # Simple (unweighted) average of goal and xG retention
    overall_retention = (g90_retention + xg90_retention) / 2
  )

# Aggregate by league pair
league_difficulty <- transfers %>%
  group_by(from_league, to_league) %>%
  summarise(
    n_transfers = n(),
    avg_retention = mean(overall_retention),
    .groups = "drop"
  ) %>%
  # Infer difficulty multiplier
  mutate(
    difficulty_ratio = 1 / avg_retention
  )

print(league_difficulty)
Output
     from_league       to_league  n_transfers  avg_retention  difficulty_ratio
0     Bundesliga  Premier League            1          0.925             1.081
1     Eredivisie  Premier League            1          0.631             1.585
2  Liga Portugal         La Liga            1          0.711             1.406
3        Ligue 1  Premier League            1          0.722             1.385
4        Serie A  Premier League            1          0.921             1.086
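
With a single transfer per league pair, each ratio rests on one player, and the minutes columns in the example data go unused. A minimal sketch of one refinement is to weight each player's retention by minutes played after the move, so that short stints count for less. The two players below are hypothetical, and weighting by post-transfer minutes is an assumption, not the only sensible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical transfers: two players making the same league move
moves = pd.DataFrame({
    "from_league": ["Eredivisie", "Eredivisie"],
    "to_league": ["Premier League", "Premier League"],
    "xg90_before": [0.60, 0.50],
    "xg90_after": [0.45, 0.20],
    "minutes_after": [2700, 300],   # the second player barely played
})
moves["retention"] = moves["xg90_after"] / moves["xg90_before"]


def weighted_retention(group: pd.DataFrame) -> float:
    """Average retention, weighted by minutes played after the move."""
    return float(np.average(group["retention"], weights=group["minutes_after"]))


# One summary row per league pair, comparing both averages
rows = [
    {"from_league": src, "to_league": dst,
     "n_transfers": len(g),
     "simple_mean": g["retention"].mean(),
     "minutes_weighted": weighted_retention(g)}
    for (src, dst), g in moves.groupby(["from_league", "to_league"])
]
summary = pd.DataFrame(rows)
print(summary.round(3))
```

In this toy case the low-minutes player pulls the simple mean down sharply, while the weighted figure stays closer to the player with a full season of evidence.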

Building a Comprehensive Adjustment Model

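The most robust approach combines multiple methods into a single league quality model. The example below uses a fixed-weight average of the four measures and treats the spread between them as a rough confidence signal. A Bayesian model could replace the fixed weights with weights learned from data and produce proper uncertainty estimates.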

league_adjustment_model.R / league_adjustment_model.py
# Python: Comprehensive league adjustment model
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict

# Combine all quality measures
league_quality_data = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
               "Ligue 1", "Eredivisie", "Liga Portugal", "Pro League"],
    "uefa_quality": [1.000, 0.896, 0.884, 0.872, 0.752, 0.713, 0.674, 0.552],
    "market_quality": [1.000, 0.569, 0.487, 0.544, 0.392, 0.162, 0.142, 0.095],
    "transfer_flow_quality": [1.000, 0.850, 0.820, 0.830, 0.700, 0.600, 0.580, 0.450],
    "performance_delta": [1.000, 0.920, 0.900, 0.940, 0.800, 0.650, 0.700, 0.550]
})

# Weights based on reliability of each measure
weights = {
    "uefa": 0.25,
    "market": 0.20,
    "transfer": 0.25,
    "performance": 0.30
}

# Calculate composite quality score
league_quality_data["composite_quality"] = (
    weights["uefa"] * league_quality_data["uefa_quality"] +
    weights["market"] * league_quality_data["market_quality"] +
    weights["transfer"] * league_quality_data["transfer_flow_quality"] +
    weights["performance"] * league_quality_data["performance_delta"]
)

# Adjustment multiplier
league_quality_data["adjustment_multiplier"] = (
    league_quality_data["composite_quality"] /
    league_quality_data["composite_quality"].max()
)

# Confidence based on measure agreement
league_quality_data["measure_sd"] = league_quality_data[
    ["uefa_quality", "market_quality", "transfer_flow_quality", "performance_delta"]
].std(axis=1)
league_quality_data["confidence"] = 1 - (league_quality_data["measure_sd"] / 0.3)

@dataclass
class LeagueAdjuster:
    """Adjust player statistics between leagues."""
    quality_table: pd.DataFrame

    def adjust_stat(self, stat: float, from_league: str, to_league: str) -> float:
        """Adjust a statistic from one league to another."""
        from_mult = self.quality_table.loc[
            self.quality_table["league"] == from_league,
            "adjustment_multiplier"
        ].values[0]

        to_mult = self.quality_table.loc[
            self.quality_table["league"] == to_league,
            "adjustment_multiplier"
        ].values[0]

        return stat * (from_mult / to_mult)

# Create adjuster
adjuster = LeagueAdjuster(league_quality_data)

# Example: Adjust goals from Eredivisie to Premier League
eredivisie_goals = 20
adjusted_goals = adjuster.adjust_stat(20, "Eredivisie", "Premier League")
print(f"{eredivisie_goals} goals in Eredivisie ≈ {adjusted_goals:.1f} goals in Premier League")
# R: Comprehensive league adjustment model
library(tidyverse)

# Combine all quality measures
league_quality_data <- tibble(
  league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
             "Ligue 1", "Eredivisie", "Liga Portugal", "Pro League"),
  # Different quality measures (normalized 0-1)
  uefa_quality = c(1.000, 0.896, 0.884, 0.872, 0.752, 0.713, 0.674, 0.552),
  market_quality = c(1.000, 0.569, 0.487, 0.544, 0.392, 0.162, 0.142, 0.095),
  transfer_flow_quality = c(1.000, 0.850, 0.820, 0.830, 0.700, 0.600, 0.580, 0.450),
  performance_delta = c(1.000, 0.920, 0.900, 0.940, 0.800, 0.650, 0.700, 0.550)
)

# Weighted combination model
# Weights based on reliability/validity of each measure
weights <- c(uefa = 0.25, market = 0.20, transfer = 0.25, performance = 0.30)

league_quality_model <- league_quality_data %>%
  mutate(
    # Weighted average quality score
    composite_quality = (
      weights["uefa"] * uefa_quality +
      weights["market"] * market_quality +
      weights["transfer"] * transfer_flow_quality +
      weights["performance"] * performance_delta
    ),
    # Calculate adjustment multiplier
    # Stats in lower leagues get discounted
    adjustment_multiplier = composite_quality / max(composite_quality),
    # Confidence based on measure agreement
    measure_sd = pmap_dbl(
      list(uefa_quality, market_quality, transfer_flow_quality, performance_delta),
      ~sd(c(..1, ..2, ..3, ..4))
    ),
    confidence = 1 - (measure_sd / 0.3)  # Higher SD = lower confidence
  ) %>%
  arrange(desc(composite_quality))

# Function to adjust player stats
adjust_player_stat <- function(stat, from_league, to_league,
                               quality_table = league_quality_model) {
  from_mult <- quality_table$adjustment_multiplier[
    quality_table$league == from_league
  ]
  to_mult <- quality_table$adjustment_multiplier[
    quality_table$league == to_league
  ]

  # Adjust stat to target league difficulty
  stat * (from_mult / to_mult)
}

# Example: Adjust goals from Eredivisie to Premier League
eredivisie_goals <- 20
adjusted_goals <- adjust_player_stat(20, "Eredivisie", "Premier League")
cat(sprintf("%.1f goals in Eredivisie ≈ %.1f goals in Premier League\n",
            eredivisie_goals, adjusted_goals))
Output
20 goals in Eredivisie ≈ 11.1 goals in Premier League
Adjustment Caveats

League adjustments are population-level estimates. Individual players may outperform or underperform predictions based on playing style fit, team quality, adaptation ability, and many other factors. Use adjustments as a starting point, not a definitive prediction.
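
One way to make that uncertainty explicit is precision (inverse-variance) weighting, which corresponds to the posterior mean under independent Gaussian errors and a flat prior. The sketch below pools the four Eredivisie measures from the model above. The standard errors are hypothetical placeholders, not estimates from data, and the Premier League is treated as an exact reference of 1.0.

```python
import numpy as np


def precision_weighted(estimates, std_errors):
    """Combine independent estimates by inverse-variance weighting.

    Returns the pooled estimate and its standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    precisions = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(precisions * estimates) / np.sum(precisions)
    pooled_se = np.sqrt(1.0 / np.sum(precisions))
    return pooled, pooled_se


# Eredivisie quality measures from the model above:
# UEFA, market, transfer flow, performance delta
eredivisie = [0.713, 0.162, 0.600, 0.650]
# Hypothetical standard errors (assumed, not estimated from data)
assumed_se = [0.10, 0.20, 0.10, 0.08]

quality, se = precision_weighted(eredivisie, assumed_se)
low, high = quality - 1.96 * se, quality + 1.96 * se

goals = 20
print(f"Pooled quality: {quality:.3f} (95% interval {low:.3f} to {high:.3f})")
print(f"{goals} Eredivisie goals ≈ {goals * low:.1f} to {goals * high:.1f} PL goals")
```

Reporting a range instead of a single adjusted figure keeps the population-level nature of the estimate visible to whoever reads the scouting output.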

League Style Profiles

Beyond quality differences, leagues have distinct playing styles that affect how players perform. A technically brilliant player might excel in La Liga but struggle with the physical intensity of the Premier League. Understanding these style differences helps predict adaptation.

league_styles.R / league_styles.py
# Python: Create league style profiles
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform

# League style metrics (standardized 0-1 scores)
league_styles = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A",
               "Bundesliga", "Ligue 1", "Eredivisie"],
    "tempo": [0.85, 0.45, 0.35, 0.90, 0.55, 0.70],
    "pressing_intensity": [0.90, 0.50, 0.40, 0.95, 0.60, 0.55],
    "physical_duels": [0.95, 0.55, 0.70, 0.75, 0.80, 0.45],
    "technical_quality": [0.75, 0.95, 0.85, 0.70, 0.70, 0.80],
    "tactical_discipline": [0.70, 0.85, 0.95, 0.75, 0.75, 0.60],
    "attacking_directness": [0.80, 0.55, 0.45, 0.85, 0.65, 0.90],
    "defensive_organization": [0.75, 0.80, 0.95, 0.70, 0.75, 0.55]
})

def create_radar_chart(df: pd.DataFrame, league: str):
    """Create radar chart for a league style profile."""
    categories = [col for col in df.columns if col != "league"]
    league_data = df[df["league"] == league][categories].values.flatten()

    # Number of categories
    N = len(categories)
    angles = [n / float(N) * 2 * np.pi for n in range(N)]
    angles += angles[:1]  # Complete the loop

    values = list(league_data) + [league_data[0]]

    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(polar=True))
    ax.plot(angles, values, "o-", linewidth=2)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories, size=8)
    ax.set_ylim(0, 1)
    ax.set_title(league, size=14, fontweight="bold")

    return fig

# Calculate style similarity matrix
style_features = league_styles.set_index("league")
distances = squareform(pdist(style_features.values, metric="euclidean"))
style_similarity = pd.DataFrame(
    distances,
    index=style_features.index,
    columns=style_features.index
)

print("League Style Distances:")
print(style_similarity.round(2))
# R: Create league style profiles
library(tidyverse)
library(ggplot2)

# League style metrics (standardized 0-1 scores)
league_styles <- tibble(
  league = c("Premier League", "La Liga", "Serie A",
             "Bundesliga", "Ligue 1", "Eredivisie"),
  # Playing style dimensions
  tempo = c(0.85, 0.45, 0.35, 0.90, 0.55, 0.70),  # Speed of play
  pressing_intensity = c(0.90, 0.50, 0.40, 0.95, 0.60, 0.55),
  physical_duels = c(0.95, 0.55, 0.70, 0.75, 0.80, 0.45),
  technical_quality = c(0.75, 0.95, 0.85, 0.70, 0.70, 0.80),
  tactical_discipline = c(0.70, 0.85, 0.95, 0.75, 0.75, 0.60),
  attacking_directness = c(0.80, 0.55, 0.45, 0.85, 0.65, 0.90),
  defensive_organization = c(0.75, 0.80, 0.95, 0.70, 0.75, 0.55)
)

# Radar chart data
radar_data <- league_styles %>%
  pivot_longer(
    cols = -league,
    names_to = "dimension",
    values_to = "score"
  )

# Create radar plot for each league
create_league_radar <- function(league_name) {
  data <- radar_data %>%
    filter(league == league_name)

  ggplot(data, aes(x = dimension, y = score)) +
    geom_polygon(aes(group = 1), fill = "steelblue", alpha = 0.3) +
    geom_point(color = "steelblue", size = 3) +
    coord_polar() +
    ylim(0, 1) +
    labs(title = league_name) +
    theme_minimal() +
    theme(axis.text.x = element_text(size = 8))
}

# Calculate style similarity between leagues
style_matrix <- league_styles %>%
  select(-league) %>%
  as.matrix()

rownames(style_matrix) <- league_styles$league

# Euclidean distance between leagues
style_distances <- dist(style_matrix, method = "euclidean")
print(as.matrix(style_distances))
Output
League Style Distances:
                Premier League  La Liga  Serie A  Bundesliga  Ligue 1  Eredivisie
Premier League           0.00     0.78     0.89        0.23     0.48        0.68
La Liga                  0.78     0.00     0.31        0.79     0.41        0.59
Serie A                  0.89     0.31     0.00        0.94     0.48        0.83
Bundesliga               0.23     0.79     0.94        0.00     0.54        0.59
Ligue 1                  0.48     0.41     0.48        0.54     0.00        0.53
Eredivisie               0.68     0.59     0.83        0.59     0.53        0.00
Style-Based Transfer Insights
  • PL ↔ Bundesliga: Similar high-tempo, pressing styles - easier adaptation
  • La Liga ↔ Serie A: Both tactical, technical - good compatibility
  • Eredivisie → PL: Similar attacking directness but big physical gap
  • Serie A → PL: Largest style gap of any league relative to the PL - tempo and pressing adjustment needed
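
For a specific recruitment question it is often more convenient to rank source leagues by their style distance to a single target league than to read the full matrix. A minimal sketch using the same style scores follows. The helper name rank_by_style is introduced here for illustration.

```python
import numpy as np
import pandas as pd

league_styles = pd.DataFrame({
    "league": ["Premier League", "La Liga", "Serie A",
               "Bundesliga", "Ligue 1", "Eredivisie"],
    "tempo": [0.85, 0.45, 0.35, 0.90, 0.55, 0.70],
    "pressing_intensity": [0.90, 0.50, 0.40, 0.95, 0.60, 0.55],
    "physical_duels": [0.95, 0.55, 0.70, 0.75, 0.80, 0.45],
    "technical_quality": [0.75, 0.95, 0.85, 0.70, 0.70, 0.80],
    "tactical_discipline": [0.70, 0.85, 0.95, 0.75, 0.75, 0.60],
    "attacking_directness": [0.80, 0.55, 0.45, 0.85, 0.65, 0.90],
    "defensive_organization": [0.75, 0.80, 0.95, 0.70, 0.75, 0.55],
}).set_index("league")


def rank_by_style(target: str, styles: pd.DataFrame) -> pd.Series:
    """Euclidean style distance from every other league to `target`, ascending."""
    diffs = styles.drop(index=target) - styles.loc[target]
    return np.sqrt((diffs ** 2).sum(axis=1)).sort_values()


ranking = rank_by_style("Premier League", league_styles)
print(ranking.round(2))
```

Because every dimension is weighted equally, the ranking is only as good as the choice and scaling of the style metrics. A club that cares mainly about pressing could reweight the columns before computing distances.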

Position-Specific Adjustments

League quality affects positions differently. Strikers' numbers in weaker leagues may be more inflated than defenders' because attacking output depends most directly on the quality of the opposing defence. Midfielders may be less affected because the quality they face is spread more evenly across the pitch.

position_adjustments.R / position_adjustments.py
# Python: Position-specific league adjustments
import pandas as pd
import numpy as np

# Position-specific adjustment factors
position_adjustments = pd.DataFrame({
    "position": ["Striker", "Winger", "Attacking Mid",
                 "Central Mid", "Defensive Mid",
                 "Full-Back", "Center-Back", "Goalkeeper"],
    "league_sensitivity": [1.20, 1.15, 1.10, 1.00, 0.95, 0.90, 0.85, 0.80],
    "avg_stat_retention": [0.72, 0.75, 0.78, 0.82, 0.85, 0.88, 0.90, 0.92]
})

class PositionAwareAdjuster:
    """League adjuster with position-specific sensitivity."""

    def __init__(self, league_quality: pd.DataFrame,
                 position_adj: pd.DataFrame):
        self.league_quality = league_quality
        self.position_adj = position_adj

    def adjust_stat(self, stat: float, from_league: str,
                    to_league: str, position: str) -> float:
        """Adjust stat with position-specific sensitivity."""
        # Get league quality values
        from_quality = self.league_quality.loc[
            self.league_quality["league"] == from_league,
            "composite_quality"
        ].values[0]

        to_quality = self.league_quality.loc[
            self.league_quality["league"] == to_league,
            "composite_quality"
        ].values[0]

        # Get position sensitivity
        pos_sensitivity = self.position_adj.loc[
            self.position_adj["position"] == position,
            "league_sensitivity"
        ].values[0]

        # Calculate position-weighted adjustment
        quality_ratio = from_quality / to_quality
        adjusted_ratio = 1 + (quality_ratio - 1) * pos_sensitivity

        return stat * adjusted_ratio

# Create adjuster
pos_adjuster = PositionAwareAdjuster(league_quality_data, position_adjustments)

# Example comparisons
examples = [
    {"player": "Striker A", "position": "Striker", "stat": 0.75},
    {"player": "Midfielder B", "position": "Central Mid", "stat": 0.85},
    {"player": "Defender C", "position": "Center-Back", "stat": 0.90}
]

print("Position-specific adjustments (Eredivisie → Premier League):")
for ex in examples:
    adjusted = pos_adjuster.adjust_stat(
        ex["stat"], "Eredivisie", "Premier League", ex["position"]
    )
    print(f"{ex['player']} ({ex['position']}): {ex['stat']:.2f} → {adjusted:.2f}")
# R: Position-specific league adjustments
library(tidyverse)

# Position-specific adjustment factors
# Based on empirical transfer performance data
position_adjustments <- tibble(
  position = c("Striker", "Winger", "Attacking Mid",
               "Central Mid", "Defensive Mid",
               "Full-Back", "Center-Back", "Goalkeeper"),
  # How much league quality affects this position (1.0 = average)
  league_sensitivity = c(1.20, 1.15, 1.10,
                         1.00, 0.95,
                         0.90, 0.85, 0.80),
  # Typical retention rate when moving up leagues
  avg_stat_retention = c(0.72, 0.75, 0.78,
                         0.82, 0.85,
                         0.88, 0.90, 0.92)
)

# Function for position-aware adjustment
adjust_stat_by_position <- function(stat, from_league, to_league, position,
                                    league_quality, position_adj) {
  # Get base league adjustment
  from_quality <- league_quality$composite_quality[
    league_quality$league == from_league
  ]
  to_quality <- league_quality$composite_quality[
    league_quality$league == to_league
  ]

  # Get position sensitivity
  pos_sensitivity <- position_adj$league_sensitivity[
    position_adj$position == position
  ]

  # Calculate adjustment with position weighting
  quality_ratio <- from_quality / to_quality

  # Apply position-specific sensitivity
  # Higher sensitivity = more adjustment
  adjusted_ratio <- 1 + (quality_ratio - 1) * pos_sensitivity

  stat * adjusted_ratio
}

# Example comparisons
example_stats <- tibble(
  player = c("Striker A", "Midfielder B", "Defender C"),
  position = c("Striker", "Central Mid", "Center-Back"),
  from_league = rep("Eredivisie", 3),
  to_league = rep("Premier League", 3),
  original_stat = c(0.75, 0.85, 0.90)  # Normalized performance
)

# Apply adjustments
example_stats <- example_stats %>%
  rowwise() %>%
  mutate(
    adjusted_stat = adjust_stat_by_position(
      original_stat, from_league, to_league, position,
      league_quality_model, position_adjustments
    )
  )

print(example_stats)
Output
Position-specific adjustments (Eredivisie → Premier League):
Striker A (Striker): 0.75 → 0.35
Midfielder B (Central Mid): 0.85 → 0.47
Defender C (Center-Back): 0.90 → 0.56
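
The linear form 1 + (q - 1) * s used above can fall below zero when the quality ratio q is small and the sensitivity s is large. The sketch below compares it with a power form, q ** s, which stays positive for any positive q and matches the linear form when s = 1. The power form is a swapped-in alternative shown for comparison, not the method used in this chapter, and the quality ratios are hypothetical.

```python
def linear_ratio(quality_ratio: float, sensitivity: float) -> float:
    """The linear form used above: 1 + (q - 1) * s."""
    return 1 + (quality_ratio - 1) * sensitivity


def power_ratio(quality_ratio: float, sensitivity: float) -> float:
    """Alternative power form: q ** s (always positive for q > 0)."""
    return quality_ratio ** sensitivity


# Hypothetical quality ratios, from a mild to an extreme step up
for q in (0.9, 0.5, 0.1):
    for s in (0.85, 1.0, 1.2):
        print(f"q={q:.1f} s={s:.2f}  linear={linear_ratio(q, s):+.3f}  "
              f"power={power_ratio(q, s):.3f}")
```

For the league gaps in this chapter the two forms move in the same direction. The difference matters mainly when comparing leagues that are very far apart.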

Practical Scouting Applications

Let's build a complete scouting tool that applies league adjustments to player comparisons, helping identify undervalued players in lower leagues who could succeed at higher levels.

scouting_tool.R / scouting_tool.py
# Python: Complete scouting tool with league adjustments
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict, Optional

# Player scouting database
scouting_data = pd.DataFrame({
    "player": ["Memphis Depay", "Luis Suarez", "Kevin De Bruyne",
               "Virgil van Dijk", "Bruno Fernandes"],
    "position": ["Winger", "Striker", "Attacking Mid",
                 "Center-Back", "Attacking Mid"],
    "league": ["Eredivisie", "Eredivisie", "Bundesliga",
               "Scottish Premiership", "Liga Portugal"],
    "age": [21, 24, 23, 24, 25],
    "goals": [0.52, 0.85, 0.28, 0.08, 0.35],
    "assists": [0.31, 0.42, 0.48, 0.04, 0.28],
    "xg": [0.45, 0.78, 0.25, 0.05, 0.32],
    "xa": [0.28, 0.35, 0.52, 0.03, 0.30],
    "market_value_m": [25, 22, 18, 8, 15]
})

@dataclass
class ScoutingTool:
    """Scouting tool with league adjustments."""

    # Optional: defaults are filled in by __post_init__ when left as None
    league_quality: Optional[Dict[str, float]] = None
    position_sensitivity: Optional[Dict[str, float]] = None

    def __post_init__(self):
        if self.league_quality is None:
            self.league_quality = {
                "Eredivisie": 0.713,
                "Bundesliga": 0.872,
                "Scottish Premiership": 0.535,
                "Liga Portugal": 0.674,
                "Premier League": 1.0
            }
        if self.position_sensitivity is None:
            self.position_sensitivity = {
                "Striker": 1.20, "Winger": 1.15, "Attacking Mid": 1.10,
                "Central Mid": 1.0, "Defensive Mid": 0.95,
                "Full-Back": 0.90, "Center-Back": 0.85, "Goalkeeper": 0.80
            }

    def adjust_stat(self, stat: float, league: str, position: str) -> float:
        """Adjust stat to Premier League equivalent."""
        league_mult = self.league_quality.get(league, 1.0)
        pos_mult = self.position_sensitivity.get(position, 1.0)
        return stat * league_mult * pos_mult

    def generate_report(self, df: pd.DataFrame, player_name: str) -> str:
        """Generate scouting report for a player."""
        player = df[df["player"] == player_name].iloc[0]

        goals_adj = self.adjust_stat(
            player["goals"], player["league"], player["position"]
        )
        assists_adj = self.adjust_stat(
            player["assists"], player["league"], player["position"]
        )

        production = (goals_adj * 1.5 + assists_adj) / 2
        value_eff = production / (player["market_value_m"] / 100)

        report = f"""
=== SCOUTING REPORT: {player_name} ===
Position: {player["position"]} | Age: {player["age"]} | League: {player["league"]}
Market Value: €{player["market_value_m"]:.1f}M

Raw Stats (per 90):
  Goals: {player["goals"]:.2f} | Assists: {player["assists"]:.2f}

Premier League Adjusted:
  Goals: {goals_adj:.2f} | Assists: {assists_adj:.2f}

Value Efficiency Score: {value_eff:.2f}
"""
        return report

# Create tool and generate report
scout = ScoutingTool()
print(scout.generate_report(scouting_data, "Luis Suarez"))
# R: Complete scouting tool with league adjustments
library(tidyverse)

# Player scouting database
scouting_data <- tibble(
  player = c("Memphis Depay", "Luis Suarez", "Kevin De Bruyne",
             "Virgil van Dijk", "Bruno Fernandes"),
  position = c("Winger", "Striker", "Attacking Mid",
               "Center-Back", "Attacking Mid"),
  league = c("Eredivisie", "Eredivisie", "Bundesliga",
             "Scottish Premiership", "Liga Portugal"),
  age = c(21, 24, 23, 24, 25),
  # Raw stats (per 90)
  goals = c(0.52, 0.85, 0.28, 0.08, 0.35),
  assists = c(0.31, 0.42, 0.48, 0.04, 0.28),
  xg = c(0.45, 0.78, 0.25, 0.05, 0.32),
  xa = c(0.28, 0.35, 0.52, 0.03, 0.30),
  market_value_m = c(25, 22, 18, 8, 15)
)

# Apply Premier League adjustments (vectorized over whole columns)
adjust_to_pl <- function(league, position, stat) {
  # League quality relative to the Premier League (= 1.0)
  league_quality <- case_when(
    league == "Eredivisie" ~ 0.713,
    league == "Bundesliga" ~ 0.872,
    league == "Scottish Premiership" ~ 0.535,
    league == "Liga Portugal" ~ 0.674,
    TRUE ~ 1.0
  )

  # Position sensitivity (same table as the Python version)
  pos_sensitivity <- case_when(
    position == "Striker" ~ 1.20,
    position == "Winger" ~ 1.15,
    position == "Attacking Mid" ~ 1.10,
    position == "Defensive Mid" ~ 0.95,
    position == "Full-Back" ~ 0.90,
    position == "Center-Back" ~ 0.85,
    position == "Goalkeeper" ~ 0.80,
    TRUE ~ 1.0
  )

  # Adjust stat
  stat * league_quality * pos_sensitivity
}

# Create adjusted scouting report
scouting_adjusted <- scouting_data %>%
  mutate(
    goals_adj = adjust_to_pl(league, position, goals),
    assists_adj = adjust_to_pl(league, position, assists),
    # Calculate value score
    production_score = (goals_adj * 1.5 + assists_adj) / 2,
    value_efficiency = production_score / (market_value_m / 100)
  ) %>%
  arrange(desc(value_efficiency))

# Scouting recommendation
create_scouting_report <- function(player_name, data) {
  player <- data %>% filter(player == player_name)

  cat(sprintf("\n=== SCOUTING REPORT: %s ===\n", player_name))
  cat(sprintf("Position: %s | Age: %d | League: %s\n",
              player$position, player$age, player$league))
  cat(sprintf("Market Value: €%.1fM\n", player$market_value_m))
  cat("\nRaw Stats (per 90):\n")
  cat(sprintf("  Goals: %.2f | Assists: %.2f\n",
              player$goals, player$assists))
  cat("\nPremier League Adjusted:\n")
  cat(sprintf("  Goals: %.2f | Assists: %.2f\n",
              player$goals_adj, player$assists_adj))
  cat(sprintf("\nValue Efficiency Score: %.2f\n", player$value_efficiency))
}

create_scouting_report("Luis Suarez", scouting_adjusted)
Output

=== SCOUTING REPORT: Luis Suarez ===
Position: Striker | Age: 24 | League: Eredivisie
Market Value: €22.0M

Raw Stats (per 90):
  Goals: 0.85 | Assists: 0.42

Premier League Adjusted:
  Goals: 0.73 | Assists: 0.36

Value Efficiency Score: 3.30
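
The R version above sorts every player by value efficiency, while the Python class reports one player at a time. A minimal pandas sketch of the same ranking, assuming the league quality and position sensitivity tables from the scouting tool, could look like this. The function name `rank_by_value` is an illustrative choice, not part of the original tool.

```python
import pandas as pd

# Same per-90 figures as the scouting database above
scouting_data = pd.DataFrame({
    "player": ["Memphis Depay", "Luis Suarez", "Kevin De Bruyne",
               "Virgil van Dijk", "Bruno Fernandes"],
    "position": ["Winger", "Striker", "Attacking Mid",
                 "Center-Back", "Attacking Mid"],
    "league": ["Eredivisie", "Eredivisie", "Bundesliga",
               "Scottish Premiership", "Liga Portugal"],
    "goals": [0.52, 0.85, 0.28, 0.08, 0.35],
    "assists": [0.31, 0.42, 0.48, 0.04, 0.28],
    "market_value_m": [25, 22, 18, 8, 15],
})

# Multipliers copied from the scouting tool defaults
LEAGUE_QUALITY = {
    "Eredivisie": 0.713, "Bundesliga": 0.872,
    "Scottish Premiership": 0.535, "Liga Portugal": 0.674,
    "Premier League": 1.0,
}
POSITION_SENSITIVITY = {
    "Striker": 1.20, "Winger": 1.15, "Attacking Mid": 1.10,
    "Central Mid": 1.0, "Defensive Mid": 0.95,
    "Full-Back": 0.90, "Center-Back": 0.85, "Goalkeeper": 0.80,
}


def rank_by_value(df: pd.DataFrame) -> pd.DataFrame:
    """Adjust goals and assists for every row, then sort by value efficiency."""
    out = df.copy()
    # Unknown leagues or positions fall back to a neutral multiplier of 1.0
    mult = (out["league"].map(LEAGUE_QUALITY).fillna(1.0)
            * out["position"].map(POSITION_SENSITIVITY).fillna(1.0))
    out["goals_adj"] = out["goals"] * mult
    out["assists_adj"] = out["assists"] * mult
    out["production_score"] = (out["goals_adj"] * 1.5 + out["assists_adj"]) / 2
    out["value_efficiency"] = out["production_score"] / (out["market_value_m"] / 100)
    return out.sort_values("value_efficiency", ascending=False).reset_index(drop=True)


ranked = rank_by_value(scouting_data)
print(ranked[["player", "goals_adj", "assists_adj", "value_efficiency"]].round(2))
```

Because the production score counts only goals and assists, a center-back will rank at the bottom regardless of defensive quality, so a ranking like this is best read within position groups.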

Practice Exercises

Exercise 1: Build Your Own League Quality Model

Create a league quality model using FBref data. Calculate the average xG, possession, and pressing metrics for each top-5 league. Normalize these to create a composite quality index.

Hint: Use fb_season_team_stats() in R or FBref.read_team_season_stats() in Python to get league-level data. Calculate means for each metric, then use min-max normalization before creating a weighted composite.

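
As a starting point for the normalization step, here is a minimal sketch of min-max scaling followed by a weighted composite. The league names, column names, metric values, and weights are all hypothetical placeholders rather than FBref output; swap in the league means you compute yourself.

```python
import pandas as pd

# Hypothetical league-level averages (placeholders, not real data)
league_stats = pd.DataFrame({
    "league": ["League A", "League B", "League C", "League D"],
    "xg_per_match": [1.50, 1.40, 1.30, 1.20],
    "passes_per_sequence": [4.0, 3.8, 3.5, 3.0],
    "pressures_per_match": [150.0, 160.0, 140.0, 130.0],
}).set_index("league")

# Hypothetical weights; they are renormalized below so they need not sum to 1
WEIGHTS = pd.Series({
    "xg_per_match": 0.50,
    "passes_per_sequence": 0.25,
    "pressures_per_match": 0.25,
})


def min_max(col: pd.Series) -> pd.Series:
    """Scale a column to [0, 1]; a constant column maps to 0 to avoid 0/0."""
    span = col.max() - col.min()
    if span == 0:
        return pd.Series(0.0, index=col.index)
    return (col - col.min()) / span


# Normalize each metric column, then take the weighted average per league
normalized = league_stats.apply(min_max)
composite = (normalized[WEIGHTS.index] * WEIGHTS).sum(axis=1) / WEIGHTS.sum()

print(composite.sort_values(ascending=False).round(3))
```

Min-max scaling makes the index relative to the leagues in the sample: the weakest league on a metric always scores 0 on it, so adding or removing a league changes every value.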
Exercise 2: Transfer Performance Analysis

Analyze a cohort of players who moved from the Eredivisie to the Premier League in the last 5 years. Calculate their performance retention rates and identify which player types adapted best.
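
One way to frame the retention calculation is the ratio of a player's per-90 output after the move to the output before it, summarized by player type. The sketch below assumes a table with one row per transfer; the players, position groups, and numbers are hypothetical placeholders chosen only so the code runs, not observed results.

```python
import pandas as pd

# Hypothetical cohort: one row per transfer, same per-90 metric before and after
transfers = pd.DataFrame({
    "player": ["Player 1", "Player 2", "Player 3",
               "Player 4", "Player 5", "Player 6"],
    "position_group": ["Attacker", "Attacker", "Midfielder",
                       "Midfielder", "Defender", "Defender"],
    "per90_before": [0.80, 0.60, 0.50, 0.40, 0.30, 0.20],
    "per90_after": [0.40, 0.45, 0.40, 0.30, 0.27, 0.19],
})

# Retention rate: share of pre-transfer output kept in the new league
transfers["retention"] = transfers["per90_after"] / transfers["per90_before"]

# Summarize by player type, best-adapting group first
summary = (
    transfers.groupby("position_group")["retention"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(summary.round(2))
```

With a real cohort, filter out players with few minutes on either side of the move first, since small samples make the ratio unstable.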

Exercise 3: League Style Clustering

Using the style metrics provided, apply k-means clustering to group leagues by playing style. Identify which leagues are most similar and create visualizations to communicate the findings.
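
A minimal clustering sketch with scikit-learn follows. It assumes the style metrics are arranged as one row per league; the league names, metric columns, and values here are hypothetical placeholders, and the choice of two clusters is arbitrary for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical style profiles (placeholders for the chapter's style metrics)
style = pd.DataFrame({
    "league": ["League A", "League B", "League C",
               "League D", "League E", "League F"],
    "passes_per_sequence": [4.2, 4.0, 4.1, 2.9, 3.0, 2.8],
    "ppda": [9.5, 10.0, 9.8, 13.0, 12.5, 13.5],
    "long_ball_pct": [10.0, 11.0, 10.5, 16.0, 15.0, 17.0],
}).set_index("league")

# Standardize first so no metric dominates the distance just because of its units
X = StandardScaler().fit_transform(style)

# Fixed random_state keeps the cluster assignment reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
style["cluster"] = kmeans.fit_predict(X)

# List the leagues that ended up in each cluster
for cluster_id, members in style.groupby("cluster").groups.items():
    print(cluster_id, list(members))
```

The cluster labels themselves (0 or 1) carry no meaning; only which leagues share a label does. On real data, compare several values of `n_clusters` before settling on one.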

Summary

Key Takeaways
  • Multiple methods: Combine UEFA coefficients, market values, transfer flows, and performance deltas for robust league quality estimation
  • Position matters: Attackers are more affected by league quality differences than defenders
  • Style compatibility: Beyond quality, league style profiles help predict player adaptation
  • Adjustment uncertainty: League adjustments are population-level estimates with significant individual variation
  • Scouting applications: Properly adjusted metrics help identify undervalued players in lower leagues
Key Formulas
  • Basic adjustment: Adjusted Stat = Raw Stat × (From League Quality / To League Quality)
  • Position-weighted: Adjusted = Raw × Quality Ratio × Position Sensitivity
  • Composite quality: Q = w₁×UEFA + w₂×Market + w₃×Transfer + w₄×Performance
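
The three formulas can be written as small functions to check the arithmetic. This is a sketch: the function names are illustrative, and the component scores and weights passed to the composite are hypothetical placeholders. The Eredivisie quality (0.713) and striker sensitivity (1.20) come from the scouting tool in this chapter.

```python
from typing import Dict


def basic_adjustment(raw: float, from_quality: float, to_quality: float) -> float:
    """Adjusted Stat = Raw Stat x (From League Quality / To League Quality)."""
    return raw * (from_quality / to_quality)


def position_weighted(raw: float, from_quality: float, to_quality: float,
                      sensitivity: float) -> float:
    """Adjusted = Raw x Quality Ratio x Position Sensitivity."""
    return basic_adjustment(raw, from_quality, to_quality) * sensitivity


def composite_quality(components: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Q = sum of weight x component, with weights renormalized to sum to 1."""
    total = sum(weights.values())
    return sum(weights[name] * components[name] for name in weights) / total


# Striker scoring 0.85 goals per 90 in the Eredivisie, projected to the Premier League
basic = basic_adjustment(0.85, from_quality=0.713, to_quality=1.0)
weighted = position_weighted(0.85, 0.713, 1.0, sensitivity=1.20)
print(f"basic: {basic:.2f}, position-weighted: {weighted:.2f}")

# Hypothetical component scores and weights for one league
q = composite_quality(
    components={"uefa": 0.70, "market": 0.60, "transfer": 0.80, "performance": 0.70},
    weights={"uefa": 0.3, "market": 0.3, "transfer": 0.2, "performance": 0.2},
)
print(f"composite quality: {q:.2f}")
```

With this multiplicative form, a sensitivity above 1 scales the adjusted value up and one below 1 scales it down, so how the sensitivities are calibrated decides which positions end up discounted most.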