Learning Objectives
- Understand challenges in comparing metrics across different leagues
- Build league difficulty adjustment models
- Calculate league quality coefficients from multiple data sources
- Apply adjustments when scouting players from different competitions
- Analyze league style profiles and playing characteristics
Not all leagues are created equal. Scoring 15 goals in the Eredivisie is not equivalent to scoring 15 in the Premier League. Understanding how to adjust metrics across leagues is essential for recruitment, valuation, and performance analysis in a globalized football market.
The Challenge of Cross-League Comparison
Football's global transfer market requires analysts to evaluate players from vastly different competitive environments. A midfielder dominating in the Belgian Pro League might struggle in Serie A. A prolific scorer in the Scottish Premiership might find the Bundesliga much tougher. These differences arise from:
- Player quality - Average talent level
- Tactical sophistication - Coaching standards
- Physical intensity - Pace and pressing
- Competition depth - Gap between top and bottom
- Financial resources - Squad depth and quality
- Playing tempo - Speed of play
- Tactical trends - Formations, pressing
- Referee standards - Foul tolerance
- Fixture congestion - Games per season
- Climate/conditions - Weather, pitch quality
# Python: Load league data from multiple sources
import pandas as pd
import numpy as np
from typing import Dict, List
import soccerdata as sd
# Initialize data sources
fbref = sd.FBref(leagues=["ENG-Premier League", "ESP-La Liga",
"GER-Bundesliga", "ITA-Serie A",
"FRA-Ligue 1", "NED-Eredivisie"])
# Get team season statistics
def get_league_stats(league: str, season: str = "2023-2024") -> pd.DataFrame:
    """Get league-level statistics."""
    try:
        stats = fbref.read_team_season_stats(stat_type="standard")
        # The season label must match the index format used by the data source
        return stats.loc[(league, season), :]
    except Exception as e:
        print(f"Error fetching {league}: {e}")
        return pd.DataFrame()
# Collect all league data
leagues = ["ENG-Premier League", "ESP-La Liga", "GER-Bundesliga",
           "ITA-Serie A", "FRA-Ligue 1", "NED-Eredivisie"]
league_data = {}
for league in leagues:
    league_data[league] = get_league_stats(league)
# Calculate league averages
# NOTE: the column names below assume a flat column index; flatten or rename
# the columns first if the data source returns grouped (multi-level) headers.
def calculate_league_averages(data: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Calculate average metrics per league."""
    averages = []
    for league, df in data.items():
        if df.empty:
            continue
        avg = {
            "league": league,
            "teams": len(df),
            "avg_goals_for": df["Gls"].mean(),
            "avg_goals_against": df["GA"].mean(),
            "avg_xg": df["xG"].mean() if "xG" in df.columns else np.nan,
            "avg_possession": df["Poss"].mean() if "Poss" in df.columns else np.nan,
            "total_goals": df["Gls"].sum()
        }
        # Matches in a double round-robin season = teams * (teams - 1)
        avg["goals_per_game"] = avg["total_goals"] / (avg["teams"] * (avg["teams"] - 1))
        averages.append(avg)
    return pd.DataFrame(averages)
league_averages = calculate_league_averages(league_data)
print(league_averages)
# R: Load league data from multiple sources
library(tidyverse)
library(worldfootballR)
# Get league-level statistics from FBref
leagues <- c("ENG", "ESP", "GER", "ITA", "FRA",
"NED", "POR", "BEL", "SCO", "AUT")
# Function to get league statistics
get_league_stats <- function(country, season = 2023) {
tryCatch({
# Get team stats for the league
stats <- fb_season_team_stats(
country = country,
gender = "M",
season_end_year = season,
stat_type = "standard"
)
stats %>%
mutate(
country = country,
season = season
)
}, error = function(e) NULL)
}
# Collect data for all leagues
league_data <- map_dfr(leagues, get_league_stats)
# Calculate league averages
league_averages <- league_data %>%
group_by(country) %>%
summarise(
teams = n(),
avg_goals_for = mean(Gls, na.rm = TRUE),
avg_goals_against = mean(GA, na.rm = TRUE),
avg_xg = mean(xG, na.rm = TRUE),
avg_xga = mean(xGA, na.rm = TRUE),
avg_possession = mean(Poss, na.rm = TRUE),
total_goals = sum(Gls, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
goals_per_game = total_goals / (teams * (teams - 1))
)
print(league_averages)
   league              teams  avg_goals_for  avg_xg  goals_per_game
0  ENG-Premier League     20           52.4    51.8            2.76
1  ESP-La Liga            20           48.9    49.2            2.58
2  GER-Bundesliga         18           51.2    50.1            3.01
3  ITA-Serie A            20           49.1    48.5            2.59
4  FRA-Ligue 1            18           44.8    45.2            2.64
5  NED-Eredivisie         18           58.3    54.7            3.43
League Quality Coefficients
Several approaches exist for estimating relative league quality. The most robust combine multiple data sources: UEFA coefficients, transfer market values, international results, and player migration patterns.
Method 1: UEFA Coefficient-Based
UEFA coefficients are based on European competition performance over 5 years. While imperfect (only top teams participate), they provide an official benchmark.
# Python: Calculate league quality from UEFA coefficients
import pandas as pd
import numpy as np
# UEFA coefficients (2023-24 season)
uefa_coefficients = pd.DataFrame({
"league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
"Scottish Premiership", "Austrian Bundesliga"],
"country": ["ENG", "ESP", "ITA", "GER", "FRA",
"NED", "POR", "BEL", "SCO", "AUT"],
"coefficient": [90.500, 76.571, 74.980, 73.357, 59.415,
54.900, 50.716, 37.600, 35.750, 33.825]
})
# Normalize to Premier League = 1.0
uefa_coefficients["quality_index"] = (
uefa_coefficients["coefficient"] / uefa_coefficients["coefficient"].max()
)
# Apply non-linear transformation (diminishing returns)
uefa_coefficients["adjusted_quality"] = uefa_coefficients["quality_index"] ** 0.7
# Calculate stat multipliers
uefa_coefficients["stat_multiplier"] = (
uefa_coefficients["adjusted_quality"] /
uefa_coefficients["adjusted_quality"].max()
)
uefa_quality = uefa_coefficients.sort_values("coefficient", ascending=False)
print(uefa_quality[["league", "coefficient", "quality_index", "stat_multiplier"]])
# R: Calculate league quality from UEFA coefficients
library(tidyverse)
# UEFA coefficients (2023-24 season)
uefa_coefficients <- tibble(
league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
"Scottish Premiership", "Austrian Bundesliga"),
country = c("ENG", "ESP", "ITA", "GER", "FRA",
"NED", "POR", "BEL", "SCO", "AUT"),
coefficient = c(90.500, 76.571, 74.980, 73.357, 59.415,
54.900, 50.716, 37.600, 35.750, 33.825)
)
# Normalize to Premier League = 1.0
uefa_quality <- uefa_coefficients %>%
mutate(
quality_index = coefficient / max(coefficient),
# Apply non-linear transformation (diminishing returns)
adjusted_quality = quality_index^0.7
) %>%
arrange(desc(coefficient))
# Calculate adjustment multipliers for player stats
# A goal in Eredivisie worth ~0.71 of a PL goal
uefa_quality <- uefa_quality %>%
mutate(
stat_multiplier = adjusted_quality / max(adjusted_quality)
)
print(uefa_quality)
   league                coefficient  quality_index  stat_multiplier
0  Premier League             90.500          1.000            1.000
1  La Liga                    76.571          0.846            0.896
2  Serie A                    74.980          0.828            0.884
3  Bundesliga                 73.357          0.811            0.872
4  Ligue 1                    59.415          0.657            0.752
5  Eredivisie                 54.900          0.607            0.713
6  Liga Portugal              50.716          0.560            0.674
7  Pro League                 37.600          0.415            0.552
8  Scottish Premiership       35.750          0.395            0.535
9  Austrian Bundesliga        33.825          0.374            0.516
Method 2: Transfer Value Analysis
Market values reflect collective wisdom about player quality. By analyzing average squad values and transfer flows between leagues, we can estimate relative quality.
# Python: League quality from transfer market data
import pandas as pd
import numpy as np
# Average squad market values (millions EUR, 2023-24)
market_values = pd.DataFrame({
"league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
"Scottish Premiership", "Austrian Bundesliga"],
"avg_squad_value": [548, 312, 267, 298, 215, 89, 78, 52, 28, 31],
"total_league_value": [10960, 6240, 5340, 5364, 3870,
1602, 1404, 832, 336, 558]
})
# Calculate quality indices
market_values["value_index"] = (
market_values["avg_squad_value"] / market_values["avg_squad_value"].max()
)
# Log transformation for better distribution
market_values["log_value"] = np.log(market_values["avg_squad_value"])
market_values["log_normalized"] = (
(market_values["log_value"] - market_values["log_value"].min()) /
(market_values["log_value"].max() - market_values["log_value"].min())
)
# Weighted quality index
market_values["market_quality"] = (
0.5 * market_values["value_index"] +
0.5 * market_values["log_normalized"]
)
# Transfer flow analysis
transfer_flows = pd.DataFrame({
"league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal"],
"net_spend_5yr": [3200, 450, -180, -320, -890, -420, -380],
"players_in_from_lower": [145, 98, 87, 92, 75, 34, 28],
"players_out_to_higher": [12, 45, 52, 48, 78, 89, 72]
})
print(market_values[["league", "avg_squad_value", "value_index", "market_quality"]])
# R: League quality from transfer market data
library(tidyverse)
# Average squad market values (millions EUR, 2023-24)
market_values <- tibble(
league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal", "Pro League",
"Scottish Premiership", "Austrian Bundesliga"),
avg_squad_value = c(548, 312, 267, 298, 215,
89, 78, 52, 28, 31),
total_league_value = c(10960, 6240, 5340, 5364, 3870,
1602, 1404, 832, 336, 558)
)
# Calculate quality based on market values
market_quality <- market_values %>%
mutate(
# Normalize to Premier League
value_index = avg_squad_value / max(avg_squad_value),
# Log transformation for better distribution
log_value = log(avg_squad_value),
log_normalized = (log_value - min(log_value)) /
(max(log_value) - min(log_value)),
# Weighted quality index
market_quality = 0.5 * value_index + 0.5 * log_normalized
)
# Transfer flow analysis
# Positive flow = net importer (higher quality)
transfer_flows <- tibble(
league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal"),
net_spend_5yr = c(3200, 450, -180, -320, -890, -420, -380),
# Players moving UP vs DOWN in league quality
players_in_from_lower = c(145, 98, 87, 92, 75, 34, 28),
players_out_to_higher = c(12, 45, 52, 48, 78, 89, 72)
)
print(market_quality)
Method 3: Player Performance Delta
The most empirical approach: track how player statistics change when they move between leagues. If players consistently score fewer goals after moving from League A to League B, we can quantify the difficulty difference.
# Python: Analyze player performance when changing leagues
import pandas as pd
import numpy as np
from typing import Tuple
# Simulated transfer data with before/after stats
transfers = pd.DataFrame({
"player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
"from_league": ["Eredivisie", "Ligue 1", "Bundesliga",
"Serie A", "Liga Portugal"],
"to_league": ["Premier League", "Premier League", "Premier League",
"Premier League", "La Liga"],
"g90_before": [0.72, 0.58, 0.45, 0.52, 0.85],
"g90_after": [0.41, 0.38, 0.42, 0.48, 0.61],
"xg90_before": [0.65, 0.52, 0.48, 0.49, 0.78],
"xg90_after": [0.45, 0.41, 0.44, 0.45, 0.55],
"minutes_before": [2800, 2650, 2900, 2750, 2600],
"minutes_after": [2100, 2400, 2700, 2500, 2300]
})
# Calculate performance retention rate
transfers["g90_retention"] = transfers["g90_after"] / transfers["g90_before"]
transfers["xg90_retention"] = transfers["xg90_after"] / transfers["xg90_before"]
transfers["overall_retention"] = (
transfers["g90_retention"] + transfers["xg90_retention"]
) / 2
# Aggregate by league pair
league_difficulty = transfers.groupby(["from_league", "to_league"]).agg({
"player": "count",
"overall_retention": "mean"
}).reset_index()
league_difficulty.columns = ["from_league", "to_league", "n_transfers", "avg_retention"]
league_difficulty["difficulty_ratio"] = 1 / league_difficulty["avg_retention"]
print(league_difficulty)
# R: Analyze player performance when changing leagues
library(tidyverse)
# Simulated transfer data with before/after stats
transfers <- tibble(
player = c("Player A", "Player B", "Player C", "Player D", "Player E"),
from_league = c("Eredivisie", "Ligue 1", "Bundesliga", "Serie A", "Liga Portugal"),
to_league = c("Premier League", "Premier League", "Premier League",
"Premier League", "La Liga"),
# Goals per 90 before and after transfer
g90_before = c(0.72, 0.58, 0.45, 0.52, 0.85),
g90_after = c(0.41, 0.38, 0.42, 0.48, 0.61),
# xG per 90 before and after
xg90_before = c(0.65, 0.52, 0.48, 0.49, 0.78),
xg90_after = c(0.45, 0.41, 0.44, 0.45, 0.55),
minutes_before = c(2800, 2650, 2900, 2750, 2600),
minutes_after = c(2100, 2400, 2700, 2500, 2300)
)
# Calculate performance retention rate
transfers <- transfers %>%
mutate(
g90_retention = g90_after / g90_before,
xg90_retention = xg90_after / xg90_before,
# Weighted average retention
overall_retention = (g90_retention + xg90_retention) / 2
)
# Aggregate by league pair
league_difficulty <- transfers %>%
group_by(from_league, to_league) %>%
summarise(
n_transfers = n(),
avg_retention = mean(overall_retention),
.groups = "drop"
) %>%
# Infer difficulty multiplier
mutate(
difficulty_ratio = 1 / avg_retention
)
print(league_difficulty)
   from_league    to_league       n_transfers  avg_retention  difficulty_ratio
0  Bundesliga     Premier League            1          0.925             1.081
1  Eredivisie     Premier League            1          0.631             1.585
2  Liga Portugal  La Liga                   1          0.711             1.406
3  Ligue 1        Premier League            1          0.722             1.385
4  Serie A        Premier League            1          0.921             1.086
Building a Comprehensive Adjustment Model
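A more robust approach combines several methods into a single league quality model. The model below uses a fixed-weight average of four quality measures plus a simple agreement-based confidence score. A Bayesian model could go further by weighting the evidence sources according to their reliability and producing explicit uncertainty estimates.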
# Python: Comprehensive league adjustment model
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict
# Combine all quality measures
league_quality_data = pd.DataFrame({
"league": ["Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal", "Pro League"],
"uefa_quality": [1.000, 0.896, 0.884, 0.872, 0.752, 0.713, 0.674, 0.552],
"market_quality": [1.000, 0.569, 0.487, 0.544, 0.392, 0.162, 0.142, 0.095],
"transfer_flow_quality": [1.000, 0.850, 0.820, 0.830, 0.700, 0.600, 0.580, 0.450],
"performance_delta": [1.000, 0.920, 0.900, 0.940, 0.800, 0.650, 0.700, 0.550]
})
# Weights based on reliability of each measure
weights = {
"uefa": 0.25,
"market": 0.20,
"transfer": 0.25,
"performance": 0.30
}
# Calculate composite quality score
league_quality_data["composite_quality"] = (
weights["uefa"] * league_quality_data["uefa_quality"] +
weights["market"] * league_quality_data["market_quality"] +
weights["transfer"] * league_quality_data["transfer_flow_quality"] +
weights["performance"] * league_quality_data["performance_delta"]
)
# Adjustment multiplier
league_quality_data["adjustment_multiplier"] = (
league_quality_data["composite_quality"] /
league_quality_data["composite_quality"].max()
)
# Confidence based on measure agreement
league_quality_data["measure_sd"] = league_quality_data[
["uefa_quality", "market_quality", "transfer_flow_quality", "performance_delta"]
].std(axis=1)
league_quality_data["confidence"] = 1 - (league_quality_data["measure_sd"] / 0.3)
@dataclass
class LeagueAdjuster:
    """Adjust player statistics between leagues."""
    quality_table: pd.DataFrame
    def adjust_stat(self, stat: float, from_league: str, to_league: str) -> float:
        """Adjust a statistic from one league to another."""
        from_mult = self.quality_table.loc[
            self.quality_table["league"] == from_league,
            "adjustment_multiplier"
        ].values[0]
        to_mult = self.quality_table.loc[
            self.quality_table["league"] == to_league,
            "adjustment_multiplier"
        ].values[0]
        return stat * (from_mult / to_mult)
# Create adjuster
adjuster = LeagueAdjuster(league_quality_data)
# Example: Adjust goals from Eredivisie to Premier League
eredivisie_goals = 20
adjusted_goals = adjuster.adjust_stat(eredivisie_goals, "Eredivisie", "Premier League")
print(f"{eredivisie_goals} goals in Eredivisie ≈ {adjusted_goals:.1f} goals in Premier League")
# R: Comprehensive league adjustment model
library(tidyverse)
# Combine all quality measures
league_quality_data <- tibble(
league = c("Premier League", "La Liga", "Serie A", "Bundesliga",
"Ligue 1", "Eredivisie", "Liga Portugal", "Pro League"),
# Different quality measures (normalized 0-1)
uefa_quality = c(1.000, 0.896, 0.884, 0.872, 0.752, 0.713, 0.674, 0.552),
market_quality = c(1.000, 0.569, 0.487, 0.544, 0.392, 0.162, 0.142, 0.095),
transfer_flow_quality = c(1.000, 0.850, 0.820, 0.830, 0.700, 0.600, 0.580, 0.450),
performance_delta = c(1.000, 0.920, 0.900, 0.940, 0.800, 0.650, 0.700, 0.550)
)
# Weighted combination model
# Weights based on reliability/validity of each measure
weights <- c(uefa = 0.25, market = 0.20, transfer = 0.25, performance = 0.30)
league_quality_model <- league_quality_data %>%
mutate(
# Weighted average quality score
composite_quality = (
weights["uefa"] * uefa_quality +
weights["market"] * market_quality +
weights["transfer"] * transfer_flow_quality +
weights["performance"] * performance_delta
),
# Calculate adjustment multiplier
# Stats in lower leagues get discounted
adjustment_multiplier = composite_quality / max(composite_quality),
# Confidence based on measure agreement
measure_sd = pmap_dbl(
list(uefa_quality, market_quality, transfer_flow_quality, performance_delta),
~sd(c(..1, ..2, ..3, ..4))
),
confidence = 1 - (measure_sd / 0.3) # Higher SD = lower confidence
) %>%
arrange(desc(composite_quality))
# Function to adjust player stats
adjust_player_stat <- function(stat, from_league, to_league,
quality_table = league_quality_model) {
from_mult <- quality_table$adjustment_multiplier[
quality_table$league == from_league
]
to_mult <- quality_table$adjustment_multiplier[
quality_table$league == to_league
]
# Adjust stat to target league difficulty
stat * (from_mult / to_mult)
}
# Example: Adjust goals from Eredivisie to Premier League
eredivisie_goals <- 20
adjusted_goals <- adjust_player_stat(eredivisie_goals, "Eredivisie", "Premier League")
cat(sprintf("%.0f goals in Eredivisie ≈ %.1f goals in Premier League\n",
eredivisie_goals, adjusted_goals))
20 goals in Eredivisie ≈ 11.1 goals in Premier League
Adjustment Caveats
League adjustments are population-level estimates. Individual players may outperform or underperform predictions based on playing style fit, team quality, adaptation ability, and many other factors. Use adjustments as a starting point, not a definitive prediction.
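One way to make this caveat concrete is to report a range rather than a single adjusted number. The sketch below recomputes the Eredivisie to Premier League adjustment under each of the four quality measures from the composite table, using the same weights. Treating the spread across measures as a rough uncertainty band is an assumption made here for illustration, not a calibrated interval, and `adjusted_range` is a hypothetical helper name.

```python
# Eredivisie quality relative to the Premier League (= 1.0 on every measure),
# copied from the composite quality table in this section.
eredivisie_quality = {
    "uefa": 0.713,
    "market": 0.162,
    "transfer": 0.600,
    "performance": 0.650,
}
weights = {"uefa": 0.25, "market": 0.20, "transfer": 0.25, "performance": 0.30}


def adjusted_range(stat, quality, weights):
    """Return (low, composite, high) adjusted values for a raw stat.

    low/high use the single most pessimistic and most optimistic measure;
    composite uses the weighted average of all measures.
    """
    composite = sum(weights[key] * quality[key] for key in quality)
    per_measure = [stat * value for value in quality.values()]
    return min(per_measure), stat * composite, max(per_measure)


low, mid, high = adjusted_range(20, eredivisie_quality, weights)
print(f"Adjusted goals: {mid:.1f} (range {low:.1f} to {high:.1f})")
```

The wide band is driven mostly by the market-value measure, which sits far below the other three for the Eredivisie; a large spread like this is a signal to lean on scouting context rather than the point estimate.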
League Style Profiles
Beyond quality differences, leagues have distinct playing styles that affect how players perform. A technically brilliant player might excel in La Liga but struggle with the physical intensity of the Premier League. Understanding these style differences helps predict adaptation.
# Python: Create league style profiles
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
# League style metrics (standardized 0-1 scores)
league_styles = pd.DataFrame({
"league": ["Premier League", "La Liga", "Serie A",
"Bundesliga", "Ligue 1", "Eredivisie"],
"tempo": [0.85, 0.45, 0.35, 0.90, 0.55, 0.70],
"pressing_intensity": [0.90, 0.50, 0.40, 0.95, 0.60, 0.55],
"physical_duels": [0.95, 0.55, 0.70, 0.75, 0.80, 0.45],
"technical_quality": [0.75, 0.95, 0.85, 0.70, 0.70, 0.80],
"tactical_discipline": [0.70, 0.85, 0.95, 0.75, 0.75, 0.60],
"attacking_directness": [0.80, 0.55, 0.45, 0.85, 0.65, 0.90],
"defensive_organization": [0.75, 0.80, 0.95, 0.70, 0.75, 0.55]
})
def create_radar_chart(df: pd.DataFrame, league: str):
    """Create radar chart for a league style profile."""
    categories = [col for col in df.columns if col != "league"]
    league_data = df[df["league"] == league][categories].values.flatten()
    # Number of categories
    N = len(categories)
    angles = [n / float(N) * 2 * np.pi for n in range(N)]
    angles += angles[:1]  # Complete the loop
    values = list(league_data) + [league_data[0]]
    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(polar=True))
    ax.plot(angles, values, "o-", linewidth=2)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories, size=8)
    ax.set_ylim(0, 1)
    ax.set_title(league, size=14, fontweight="bold")
    return fig
# Calculate style similarity matrix
style_features = league_styles.set_index("league")
distances = squareform(pdist(style_features.values, metric="euclidean"))
style_similarity = pd.DataFrame(
distances,
index=style_features.index,
columns=style_features.index
)
print("League Style Distances:")
print(style_similarity.round(2))
# R: Create league style profiles
library(tidyverse)
library(ggplot2)
# League style metrics (standardized 0-1 scores)
league_styles <- tibble(
league = c("Premier League", "La Liga", "Serie A",
"Bundesliga", "Ligue 1", "Eredivisie"),
# Playing style dimensions
tempo = c(0.85, 0.45, 0.35, 0.90, 0.55, 0.70), # Speed of play
pressing_intensity = c(0.90, 0.50, 0.40, 0.95, 0.60, 0.55),
physical_duels = c(0.95, 0.55, 0.70, 0.75, 0.80, 0.45),
technical_quality = c(0.75, 0.95, 0.85, 0.70, 0.70, 0.80),
tactical_discipline = c(0.70, 0.85, 0.95, 0.75, 0.75, 0.60),
attacking_directness = c(0.80, 0.55, 0.45, 0.85, 0.65, 0.90),
defensive_organization = c(0.75, 0.80, 0.95, 0.70, 0.75, 0.55)
)
# Radar chart data
radar_data <- league_styles %>%
pivot_longer(
cols = -league,
names_to = "dimension",
values_to = "score"
)
# Create radar plot for each league
create_league_radar <- function(league_name) {
data <- radar_data %>%
filter(league == league_name)
ggplot(data, aes(x = dimension, y = score)) +
geom_polygon(aes(group = 1), fill = "steelblue", alpha = 0.3) +
geom_point(color = "steelblue", size = 3) +
coord_polar() +
ylim(0, 1) +
labs(title = league_name) +
theme_minimal() +
theme(axis.text.x = element_text(size = 8))
}
# Calculate style similarity between leagues
style_matrix <- league_styles %>%
select(-league) %>%
as.matrix()
rownames(style_matrix) <- league_styles$league
# Euclidean distance between leagues
style_distances <- dist(style_matrix, method = "euclidean")
print(as.matrix(style_distances))
League Style Distances:
                Premier League  La Liga  Serie A  Bundesliga  Ligue 1  Eredivisie
Premier League            0.00     0.78     0.89        0.23     0.48        0.68
La Liga                   0.78     0.00     0.31        0.79     0.41        0.59
Serie A                   0.89     0.31     0.00        0.94     0.48        0.83
Bundesliga                0.23     0.79     0.94        0.00     0.54        0.59
Ligue 1                   0.48     0.41     0.48        0.54     0.00        0.53
Eredivisie                0.68     0.59     0.83        0.59     0.53        0.00
- PL ↔ Bundesliga: Similar high-tempo, pressing styles - easier adaptation
- La Liga ↔ Serie A: Both tactical, technical - good compatibility
- Eredivisie → PL: Similar attacking directness but big physical gap
- Serie A → PL: Largest style gap of any league relative to the PL - tempo and pressing adjustment needed
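The pairings above can be read straight off the distances. As a minimal sketch, assuming the same seven style scores used in this section, the snippet below computes Euclidean distances with the standard library and lists each league's closest stylistic neighbour. The helper name `nearest_style` is hypothetical.

```python
import math

# Style scores per league, in the order: tempo, pressing intensity,
# physical duels, technical quality, tactical discipline,
# attacking directness, defensive organization.
styles = {
    "Premier League": [0.85, 0.90, 0.95, 0.75, 0.70, 0.80, 0.75],
    "La Liga":        [0.45, 0.50, 0.55, 0.95, 0.85, 0.55, 0.80],
    "Serie A":        [0.35, 0.40, 0.70, 0.85, 0.95, 0.45, 0.95],
    "Bundesliga":     [0.90, 0.95, 0.75, 0.70, 0.75, 0.85, 0.70],
    "Ligue 1":        [0.55, 0.60, 0.80, 0.70, 0.75, 0.65, 0.75],
    "Eredivisie":     [0.70, 0.55, 0.45, 0.80, 0.60, 0.90, 0.55],
}


def nearest_style(styles):
    """Map each league to (closest other league, Euclidean distance)."""
    result = {}
    for name, vector in styles.items():
        others = {
            other: math.dist(vector, other_vector)
            for other, other_vector in styles.items()
            if other != name
        }
        closest = min(others, key=others.get)
        result[name] = (closest, others[closest])
    return result


neighbours = nearest_style(styles)
for league, (closest, distance) in neighbours.items():
    print(f"{league}: closest style is {closest} (distance {distance:.2f})")
```

Euclidean distance weights every dimension equally; a scout who cares mainly about physical adaptation could rescale the `physical_duels` column before computing distances.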
Position-Specific Adjustments
League quality affects positions differently. Strikers in weaker leagues might be more inflated than defenders because they face weaker opposition more directly. Midfielders might be less affected because they play against similar quality throughout.
# Python: Position-specific league adjustments
import pandas as pd
import numpy as np
# Position-specific adjustment factors
position_adjustments = pd.DataFrame({
"position": ["Striker", "Winger", "Attacking Mid",
"Central Mid", "Defensive Mid",
"Full-Back", "Center-Back", "Goalkeeper"],
"league_sensitivity": [1.20, 1.15, 1.10, 1.00, 0.95, 0.90, 0.85, 0.80],
"avg_stat_retention": [0.72, 0.75, 0.78, 0.82, 0.85, 0.88, 0.90, 0.92]
})
class PositionAwareAdjuster:
    """League adjuster with position-specific sensitivity."""
    def __init__(self, league_quality: pd.DataFrame,
                 position_adj: pd.DataFrame):
        self.league_quality = league_quality
        self.position_adj = position_adj
    def adjust_stat(self, stat: float, from_league: str,
                    to_league: str, position: str) -> float:
        """Adjust stat with position-specific sensitivity."""
        # Get league quality values
        from_quality = self.league_quality.loc[
            self.league_quality["league"] == from_league,
            "composite_quality"
        ].values[0]
        to_quality = self.league_quality.loc[
            self.league_quality["league"] == to_league,
            "composite_quality"
        ].values[0]
        # Get position sensitivity
        pos_sensitivity = self.position_adj.loc[
            self.position_adj["position"] == position,
            "league_sensitivity"
        ].values[0]
        # Calculate position-weighted adjustment
        quality_ratio = from_quality / to_quality
        adjusted_ratio = 1 + (quality_ratio - 1) * pos_sensitivity
        return stat * adjusted_ratio
# Create adjuster
pos_adjuster = PositionAwareAdjuster(league_quality_data, position_adjustments)
# Example comparisons
examples = [
{"player": "Striker A", "position": "Striker", "stat": 0.75},
{"player": "Midfielder B", "position": "Central Mid", "stat": 0.85},
{"player": "Defender C", "position": "Center-Back", "stat": 0.90}
]
print("Position-specific adjustments (Eredivisie → Premier League):")
for ex in examples:
    adjusted = pos_adjuster.adjust_stat(
        ex["stat"], "Eredivisie", "Premier League", ex["position"]
    )
    print(f"{ex['player']} ({ex['position']}): {ex['stat']:.2f} → {adjusted:.2f}")
# R: Position-specific league adjustments
library(tidyverse)
# Position-specific adjustment factors
# Based on empirical transfer performance data
position_adjustments <- tibble(
position = c("Striker", "Winger", "Attacking Mid",
"Central Mid", "Defensive Mid",
"Full-Back", "Center-Back", "Goalkeeper"),
# How much league quality affects this position (1.0 = average)
league_sensitivity = c(1.20, 1.15, 1.10,
1.00, 0.95,
0.90, 0.85, 0.80),
# Typical retention rate when moving up leagues
avg_stat_retention = c(0.72, 0.75, 0.78,
0.82, 0.85,
0.88, 0.90, 0.92)
)
# Function for position-aware adjustment
adjust_stat_by_position <- function(stat, from_league, to_league, position,
league_quality, position_adj) {
# Get base league adjustment
from_quality <- league_quality$composite_quality[
league_quality$league == from_league
]
to_quality <- league_quality$composite_quality[
league_quality$league == to_league
]
# Get position sensitivity
pos_sensitivity <- position_adj$league_sensitivity[
position_adj$position == position
]
# Calculate adjustment with position weighting
quality_ratio <- from_quality / to_quality
# Apply position-specific sensitivity
# Higher sensitivity = more adjustment
adjusted_ratio <- 1 + (quality_ratio - 1) * pos_sensitivity
stat * adjusted_ratio
}
# Example comparisons
example_stats <- tibble(
player = c("Striker A", "Midfielder B", "Defender C"),
position = c("Striker", "Central Mid", "Center-Back"),
from_league = rep("Eredivisie", 3),
to_league = rep("Premier League", 3),
original_stat = c(0.75, 0.85, 0.90) # Normalized performance
)
# Apply adjustments
example_stats <- example_stats %>%
rowwise() %>%
mutate(
adjusted_stat = adjust_stat_by_position(
original_stat, from_league, to_league, position,
league_quality_model, position_adjustments
)
)
print(example_stats)
Position-specific adjustments (Eredivisie → Premier League):
Striker A (Striker): 0.75 → 0.35
Midfielder B (Central Mid): 0.85 → 0.47
Defender C (Center-Back): 0.90 → 0.56
Practical Scouting Applications
Let's build a complete scouting tool that applies league adjustments to player comparisons, helping identify undervalued players in lower leagues who could succeed at higher levels.
# Python: Complete scouting tool with league adjustments
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict, Optional
# Player scouting database
scouting_data = pd.DataFrame({
"player": ["Memphis Depay", "Luis Suarez", "Kevin De Bruyne",
"Virgil van Dijk", "Bruno Fernandes"],
"position": ["Winger", "Striker", "Attacking Mid",
"Center-Back", "Attacking Mid"],
"league": ["Eredivisie", "Eredivisie", "Bundesliga",
"Scottish Premiership", "Liga Portugal"],
"age": [21, 24, 23, 24, 25],
"goals": [0.52, 0.85, 0.28, 0.08, 0.35],
"assists": [0.31, 0.42, 0.48, 0.04, 0.28],
"xg": [0.45, 0.78, 0.25, 0.05, 0.32],
"xa": [0.28, 0.35, 0.52, 0.03, 0.30],
"market_value_m": [25, 22, 18, 8, 15]
})
@dataclass
class ScoutingTool:
    """Scouting tool with league adjustments."""
    league_quality: Optional[Dict[str, float]] = None
    position_sensitivity: Optional[Dict[str, float]] = None
    def __post_init__(self):
        if self.league_quality is None:
            self.league_quality = {
                "Eredivisie": 0.713,
                "Bundesliga": 0.872,
                "Scottish Premiership": 0.535,
                "Liga Portugal": 0.674,
                "Premier League": 1.0
            }
        if self.position_sensitivity is None:
            self.position_sensitivity = {
                "Striker": 1.20, "Winger": 1.15, "Attacking Mid": 1.10,
                "Central Mid": 1.0, "Defensive Mid": 0.95,
                "Full-Back": 0.90, "Center-Back": 0.85, "Goalkeeper": 0.80
            }
    def adjust_stat(self, stat: float, league: str, position: str) -> float:
        """Adjust stat to Premier League equivalent."""
        # Simplified product form. A sensitivity above 1.0 raises the adjusted
        # value here, so attackers are discounted less than under the
        # interpolation used in PositionAwareAdjuster above.
        league_mult = self.league_quality.get(league, 1.0)
        pos_mult = self.position_sensitivity.get(position, 1.0)
        return stat * league_mult * pos_mult
    def generate_report(self, df: pd.DataFrame, player_name: str) -> str:
        """Generate scouting report for a player."""
        player = df[df["player"] == player_name].iloc[0]
        goals_adj = self.adjust_stat(
            player["goals"], player["league"], player["position"]
        )
        assists_adj = self.adjust_stat(
            player["assists"], player["league"], player["position"]
        )
        production = (goals_adj * 1.5 + assists_adj) / 2
        value_eff = production / (player["market_value_m"] / 100)
        report = f"""
=== SCOUTING REPORT: {player_name} ===
Position: {player["position"]} | Age: {player["age"]} | League: {player["league"]}
Market Value: €{player["market_value_m"]:.1f}M
Raw Stats (per 90):
  Goals: {player["goals"]:.2f} | Assists: {player["assists"]:.2f}
Premier League Adjusted:
  Goals: {goals_adj:.2f} | Assists: {assists_adj:.2f}
Value Efficiency Score: {value_eff:.2f}
"""
        return report
# Create tool and generate report
scout = ScoutingTool()
print(scout.generate_report(scouting_data, "Luis Suarez"))
# R: Complete scouting tool with league adjustments
library(tidyverse)
# Player scouting database
scouting_data <- tibble(
player = c("Memphis Depay", "Luis Suarez", "Kevin De Bruyne",
"Virgil van Dijk", "Bruno Fernandes"),
position = c("Winger", "Striker", "Attacking Mid",
"Center-Back", "Attacking Mid"),
league = c("Eredivisie", "Eredivisie", "Bundesliga",
"Scottish Premiership", "Liga Portugal"),
age = c(21, 24, 23, 24, 25),
# Raw stats (per 90)
goals = c(0.52, 0.85, 0.28, 0.08, 0.35),
assists = c(0.31, 0.42, 0.48, 0.04, 0.28),
xg = c(0.45, 0.78, 0.25, 0.05, 0.32),
xa = c(0.28, 0.35, 0.52, 0.03, 0.30),
market_value_m = c(25, 22, 18, 8, 15)
)
# Apply Premier League adjustments
adjust_to_pl <- function(data, position, stat_col) {
# Get league quality
league_quality <- case_when(
data$league == "Eredivisie" ~ 0.713,
data$league == "Bundesliga" ~ 0.872,
data$league == "Scottish Premiership" ~ 0.535,
data$league == "Liga Portugal" ~ 0.674,
TRUE ~ 1.0
)
# Position sensitivity
pos_sensitivity <- case_when(
position == "Striker" ~ 1.20,
position == "Winger" ~ 1.15,
position == "Attacking Mid" ~ 1.10,
position == "Center-Back" ~ 0.85,
TRUE ~ 1.0
)
# Adjust stat
adjusted <- stat_col * league_quality * pos_sensitivity
return(adjusted)
}
# Create adjusted scouting report
scouting_adjusted <- scouting_data %>%
mutate(
goals_adj = map2_dbl(row_number(), goals, ~{
adjust_to_pl(scouting_data[.x,], position[.x], goals[.x])
}),
assists_adj = map2_dbl(row_number(), assists, ~{
adjust_to_pl(scouting_data[.x,], position[.x], assists[.x])
}),
# Calculate value score
production_score = (goals_adj * 1.5 + assists_adj) / 2,
value_efficiency = production_score / (market_value_m / 100)
) %>%
arrange(desc(value_efficiency))
# Scouting recommendation
create_scouting_report <- function(player_name, data) {
player <- data %>% filter(player == player_name)
cat(sprintf("\n=== SCOUTING REPORT: %s ===\n", player_name))
cat(sprintf("Position: %s | Age: %d | League: %s\n",
player$position, player$age, player$league))
cat(sprintf("Market Value: €%.1fM\n", player$market_value_m))
cat("\nRaw Stats (per 90):\n")
cat(sprintf(" Goals: %.2f | Assists: %.2f\n",
player$goals, player$assists))
cat("\nPremier League Adjusted:\n")
cat(sprintf(" Goals: %.2f | Assists: %.2f\n",
player$goals_adj, player$assists_adj))
cat(sprintf("\nValue Efficiency Score: %.2f\n", player$value_efficiency))
}
create_scouting_report("Luis Suarez", scouting_adjusted)
=== SCOUTING REPORT: Luis Suarez ===
Position: Striker | Age: 24 | League: Eredivisie
Market Value: €22.0M
Raw Stats (per 90):
  Goals: 0.85 | Assists: 0.42
Premier League Adjusted:
  Goals: 0.73 | Assists: 0.36
Value Efficiency Score: 3.30
Practice Exercises
Create a league quality model using FBref data. Calculate the average xG, possession, and pressing metrics for each top-5 league. Normalize these to create a composite quality index.
Hint: Use fb_season_team_stats() in R or FBref.read_team_season_stats() in Python to get league-level data. Calculate means for each metric, then use min-max normalization before creating a weighted composite.
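The normalization and weighting steps in the hint can be sketched as follows. The league names, metric values, and weights are placeholders invented for illustration (not real FBref data), and `min_max` and `composite_index` are hypothetical helper names. All three metrics are treated as higher-is-better, which is itself an assumption to revisit for metrics such as PPDA where lower values mean more pressing.

```python
# Placeholder league means (illustrative values only, not real FBref data).
league_means = {
    "League A": {"xg": 1.60, "possession": 52.0, "pressing": 0.80},
    "League B": {"xg": 1.40, "possession": 50.0, "pressing": 0.65},
    "League C": {"xg": 1.25, "possession": 48.0, "pressing": 0.50},
}
# Assumed weights; they should sum to 1 so the index stays in [0, 1].
weights = {"xg": 0.5, "possession": 0.2, "pressing": 0.3}


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


def composite_index(means, weights):
    """Weighted sum of min-max normalized metrics for each league."""
    names = list(means)
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        scaled = min_max([means[name][metric] for name in names])
        for name, value in zip(names, scaled):
            scores[name] += weight * value
    return scores


quality_index = composite_index(league_means, weights)
for name, score in sorted(quality_index.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Because min-max scaling pins the best league to 1 and the worst to 0 on every metric, the index is only meaningful relative to the set of leagues included.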
Analyze a cohort of players who moved from the Eredivisie to the Premier League in the last 5 years. Calculate their performance retention rates and identify which player types adapted best.
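A possible starting point for this exercise is sketched below. The cohort rows are hypothetical numbers made up for illustration, and weighting each player's retention by minutes played after the move is one reasonable choice among several, not a method prescribed by this chapter.

```python
from collections import defaultdict

# Hypothetical cohort rows:
# (player type, goals+assists per 90 before, per 90 after, minutes after move)
cohort = [
    ("Striker", 0.80, 0.50, 2000),
    ("Striker", 0.60, 0.45, 1000),
    ("Winger", 0.50, 0.40, 1500),
    ("Winger", 0.40, 0.36, 1500),
    ("Center-Back", 0.10, 0.09, 3000),
]


def retention_by_type(rows):
    """Minutes-weighted mean of after/before ratios for each player type."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [weighted ratio sum, minutes]
    for player_type, before, after, minutes in rows:
        totals[player_type][0] += (after / before) * minutes
        totals[player_type][1] += minutes
    return {
        player_type: weighted_sum / minutes
        for player_type, (weighted_sum, minutes) in totals.items()
    }


retention = retention_by_type(cohort)
for player_type, rate in sorted(retention.items(), key=lambda item: -item[1]):
    print(f"{player_type}: retains {rate:.0%} of pre-transfer output")
```

With a real cohort, small groups will produce noisy rates, so it is worth reporting the number of players behind each figure.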
Using the style metrics provided, apply k-means clustering to group leagues by playing style. Identify which leagues are most similar and create visualizations to communicate the findings.
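As a rough sketch of the clustering step, the code below runs a plain Lloyd's-algorithm k-means with k = 2 on the style scores from this chapter. Seeding the centroids with the two most distant leagues is an assumption made here to keep the result deterministic; in practice a library implementation such as scikit-learn's `KMeans` with several random restarts would be the usual choice. The function name `kmeans` and its signature are hypothetical.

```python
import math
from itertools import combinations

# Style scores per league: tempo, pressing, physical duels, technical quality,
# tactical discipline, attacking directness, defensive organization.
styles = {
    "Premier League": [0.85, 0.90, 0.95, 0.75, 0.70, 0.80, 0.75],
    "La Liga":        [0.45, 0.50, 0.55, 0.95, 0.85, 0.55, 0.80],
    "Serie A":        [0.35, 0.40, 0.70, 0.85, 0.95, 0.45, 0.95],
    "Bundesliga":     [0.90, 0.95, 0.75, 0.70, 0.75, 0.85, 0.70],
    "Ligue 1":        [0.55, 0.60, 0.80, 0.70, 0.75, 0.65, 0.75],
    "Eredivisie":     [0.70, 0.55, 0.45, 0.80, 0.60, 0.90, 0.55],
}


def kmeans(points, seeds, max_iter=100):
    """Lloyd's algorithm; `seeds` names the leagues used as initial centroids."""
    names = list(points)
    centroids = [list(points[seed]) for seed in seeds]
    labels = {}
    for _ in range(max_iter):
        # Assignment step: each league joins its nearest centroid.
        new_labels = {
            name: min(range(len(centroids)),
                      key=lambda c: math.dist(points[name], centroids[c]))
            for name in names
        }
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[name] for name in names if labels[name] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Seed with the two stylistically most distant leagues (an assumption).
seeds = max(combinations(styles, 2),
            key=lambda pair: math.dist(styles[pair[0]], styles[pair[1]]))
labels = kmeans(styles, seeds)
for cluster in sorted(set(labels.values())):
    print(cluster, [name for name, label in labels.items() if label == cluster])
```

With only six leagues the clusters are fragile, so it is worth checking how the grouping changes for k = 3 or with different seeds before drawing conclusions.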
Summary
Key Takeaways
- Multiple methods: Combine UEFA coefficients, market values, transfer flows, and performance deltas for robust league quality estimation
- Position matters: Attackers are more affected by league quality differences than defenders
- Style compatibility: Beyond quality, league style profiles help predict player adaptation
- Adjustment uncertainty: League adjustments are population-level estimates with significant individual variation
- Scouting applications: Properly adjusted metrics help identify undervalued players in lower leagues
Key Formulas
- Basic adjustment: Adjusted Stat = Raw Stat × (From League Quality / To League Quality)
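- Position-weighted: Adjusted = Raw × [1 + (Quality Ratio − 1) × Position Sensitivity], as in the position-specific adjuster; the scouting tool uses the simpler product Raw × League Quality × Position Sensitivity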
- Composite quality: Q = w₁×UEFA + w₂×Market + w₃×Transfer + w₄×Performance
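As a worked illustration of these formulas, the lines below plug in the Eredivisie values and weights from the composite quality table (Premier League quality is 1.0 on every measure) and the striker sensitivity of 1.20 from the position table. The symbols are shorthand introduced here, not notation used elsewhere in the chapter.

```latex
\begin{aligned}
Q_{\text{Ere}} &= 0.25(0.713) + 0.20(0.162) + 0.25(0.600) + 0.30(0.650) \approx 0.556 \\
\text{Adjusted goals} &= 20 \times \frac{Q_{\text{Ere}}}{Q_{\text{PL}}} = 20 \times \frac{0.556}{1.000} \approx 11.1 \\
\text{Striker ratio} &= 1 + \left(\frac{Q_{\text{Ere}}}{Q_{\text{PL}}} - 1\right) \times 1.20 \approx 0.467 \\
\text{Adjusted striker stat} &= 0.75 \times 0.467 \approx 0.35
\end{aligned}
```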