Capstone - Complete Analytics System
Fantasy Football & Betting Analytics
Fantasy football and sports betting represent two of the most popular applications of football analytics outside professional clubs. Both require predicting player and match outcomes, but with different optimization goals and constraints.
Learning Objectives
- Understand FPL scoring systems and optimization strategies
- Project player points using expected metrics
- Build fixture difficulty ratings and rotation planners
- Understand betting market efficiency and value betting
- Calculate implied probabilities from odds
- Apply responsible gambling principles
Important Note
This chapter discusses betting analytics from an educational perspective. Always gamble responsibly and be aware of the risks. The house has an edge, and no model guarantees profits.
Fantasy Premier League Analytics
Fantasy Premier League (FPL) is the world's most popular fantasy football game. Analytics can help optimize squad selection, captain choices, and transfer strategy.
FPL Scoring Basics
- Minutes: 1pt (1-59 min), 2pts (60+ min)
- Goals: 4pts (FWD), 5pts (MID), 6pts (DEF/GK)
- Assists: 3pts all positions
- Clean Sheet: 4pts (DEF/GK), 1pt (MID)
- Saves: 1pt per 3 saves (GK)
- Bonus: 1-3pts for top performers
Key Predictive Metrics
- xG: Predicts goals scored
- xA: Predicts assists
- xGC: Expected goals conceded (clean sheets)
- xPoints: Expected FPL points
- ICT Index: FPL's influence/creativity/threat
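The scoring rules above can be sketched as a small deterministic scorer for a finished match (distinct from the expected-points model that follows). The point values are the standard FPL core rules; the helper function itself is illustrative:

```python
# Illustrative only: turn the scoring rules above into actual (not expected) points
GOAL_PTS = {"GKP": 6, "DEF": 6, "MID": 5, "FWD": 4}
CS_PTS = {"GKP": 4, "DEF": 4, "MID": 1, "FWD": 0}

def score_stat_line(position, minutes, goals=0, assists=0,
                    clean_sheet=False, saves=0, bonus=0):
    """Actual FPL points for one finished match (core rules only)."""
    pts = 0 if minutes == 0 else (2 if minutes >= 60 else 1)
    pts += goals * GOAL_PTS[position]
    pts += assists * 3
    if clean_sheet and minutes >= 60:
        pts += CS_PTS[position]
    if position == "GKP":
        pts += saves // 3  # 1pt per 3 saves
    return pts + bonus

# A midfielder with 75 min, 1 goal, 1 assist and a clean sheet: 2+5+3+1 = 11
print(score_stat_line("MID", 75, goals=1, assists=1, clean_sheet=True))  # 11
```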
# Python: FPL expected points model
import pandas as pd
import numpy as np
class FPLProjector:
"""Project expected FPL points for players."""
GOAL_POINTS = {"GKP": 6, "DEF": 6, "MID": 5, "FWD": 4}
CS_POINTS = {"GKP": 4, "DEF": 4, "MID": 1, "FWD": 0}
def __init__(self, player_data):
self.data = player_data.copy()
def calculate_xpoints(self):
"""Calculate expected points for all players."""
df = self.data
# Minutes points
df["xpts_minutes"] = np.where(df["expected_minutes"] >= 60, 2,
np.where(df["expected_minutes"] >= 1, 1, 0))
# Goal points
df["xpts_goals"] = df.apply(
lambda x: x["xg"] * self.GOAL_POINTS.get(x["position"], 4),
axis=1
)
# Assist points
df["xpts_assists"] = df["xa"] * 3
# Clean sheet probability (Poisson: P(0) = e^(-lambda))
df["cs_prob"] = np.exp(-df["xgc"].fillna(2))
df["xpts_cs"] = df.apply(
lambda x: x["cs_prob"] * self.CS_POINTS.get(x["position"], 0)
if x["expected_minutes"] >= 60 else 0,
axis=1
)
# Save points (GK only)
df["xpts_saves"] = np.where(
df["position"] == "GKP",
df["expected_saves"].fillna(0) / 3,
0
)
# Bonus points estimate (simplified)
df["xpts_bonus"] = df["xg"] * 0.8 + df["xa"] * 0.5
# Total
df["xpoints"] = (df["xpts_minutes"] + df["xpts_goals"] +
df["xpts_assists"] + df["xpts_cs"] +
df["xpts_saves"] + df["xpts_bonus"])
# Value (points per million)
df["value"] = df["xpoints"] / df["price"]  # price is already in £m
return df
def rank_by_value(self, position=None, min_price=None, max_price=None):
"""Rank players by value with optional filters."""
df = self.calculate_xpoints()
if position:
df = df[df["position"] == position]
if min_price:
df = df[df["price"] >= min_price]
if max_price:
df = df[df["price"] <= max_price]
return df.sort_values("value", ascending=False)
def captain_picks(self, gameweek_fixtures):
"""Recommend captain picks for gameweek."""
df = self.calculate_xpoints()
# Factor in fixture difficulty
df = df.merge(gameweek_fixtures, on="team")
df["adjusted_xpts"] = df["xpoints"] * (1 + (3 - df["fdr"]) * 0.1)
return df.nlargest(5, "adjusted_xpts")[
["name", "position", "xpoints", "fdr", "adjusted_xpts"]
]
# Example usage
players = pd.DataFrame({
"name": ["Haaland", "Salah", "Trippier", "Raya", "Saka"],
"position": ["FWD", "MID", "DEF", "GKP", "MID"],
"team": ["MCI", "LIV", "NEW", "ARS", "ARS"],
"price": [14.0, 12.5, 6.5, 5.5, 9.0],
"xg": [0.85, 0.52, 0.08, 0.0, 0.35],
"xa": [0.12, 0.35, 0.22, 0.0, 0.28],
"xgc": [np.nan, np.nan, 1.1, 0.95, np.nan],
"expected_minutes": [85, 88, 90, 90, 85],
"expected_saves": [np.nan, np.nan, np.nan, 3.2, np.nan]
})
projector = FPLProjector(players)
results = projector.calculate_xpoints()
print(results[["name", "position", "price", "xpoints", "value"]].to_string())
# R: FPL expected points model
library(tidyverse)
# Calculate expected FPL points
calculate_xpoints <- function(player_data) {
player_data %>%
mutate(
# Base points for playing
xpoints_minutes = case_when(
expected_minutes >= 60 ~ 2,
expected_minutes >= 1 ~ 1,
TRUE ~ 0
),
# Goals (position-dependent)
goal_points = case_when(
position == "GKP" ~ 6,
position == "DEF" ~ 6,
position == "MID" ~ 5,
position == "FWD" ~ 4
),
xpoints_goals = xg * goal_points,
# Assists
xpoints_assists = xa * 3,
# Clean sheets (for defenders and goalkeepers)
cs_probability = exp(-coalesce(xgc, 2)), # Poisson P(0 goals); assume 2 xGC if unknown
cs_points = case_when(
position %in% c("GKP", "DEF") ~ 4,
position == "MID" ~ 1,
TRUE ~ 0
),
xpoints_cs = cs_probability * cs_points * (expected_minutes >= 60),
# Saves (goalkeepers only)
xpoints_saves = if_else(position == "GKP",
expected_saves / 3, 0),
# Total expected points
xpoints = xpoints_minutes + xpoints_goals + xpoints_assists +
xpoints_cs + xpoints_saves,
# Value calculation (price is already in £m)
value = xpoints / price
)
}
# Example usage
players <- tribble(
~name, ~position, ~price, ~xg, ~xa, ~xgc, ~expected_minutes, ~expected_saves,
"Haaland", "FWD", 14.0, 0.85, 0.12, NA, 85, NA,
"Salah", "MID", 12.5, 0.52, 0.35, NA, 88, NA,
"Trippier", "DEF", 6.5, 0.08, 0.22, 1.1, 90, NA,
"Raya", "GKP", 5.5, 0, 0, 0.95, 90, 3.2
)
fpl_projections <- calculate_xpoints(players)
fpl_projections %>%
select(name, position, price, xpoints, value) %>%
arrange(desc(xpoints))
Output (from the Python example above, whose player pool also includes Saka):
name position price xpoints value
0 Haaland FWD 14.0 6.42 0.458571
1 Salah MID 12.5 5.31 0.424800
2 Saka MID 9.0 4.12 0.457778
3 Trippier DEF 6.5 3.87 0.595385
4 Raya GKP 5.5 3.52 0.640000
Squad Optimization
Building an optimal FPL squad is a constrained optimization problem: maximize expected points subject to budget, position limits, and team limits.
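Before the full optimizer, the `scipy.optimize.milp` pattern it relies on is easier to see on a toy problem: pick exactly 2 of 4 players under a 13.0m budget, maximizing expected points. The numbers here are invented for illustration:

```python
# Toy version of the squad ILP, using the same milp pattern as the full optimizer
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

price = np.array([5.0, 6.0, 7.0, 8.0])
xpts = np.array([3.0, 4.0, 5.0, 7.0])

res = milp(
    c=-xpts,  # milp minimizes, so negate to maximize expected points
    constraints=[
        LinearConstraint(price.reshape(1, -1), -np.inf, 13.0),  # budget cap
        LinearConstraint(np.ones((1, 4)), 2, 2),                # exactly 2 picks
    ],
    integrality=np.ones(4),  # all decision variables binary
    bounds=Bounds(0, 1),
)
picked = np.where(res.x > 0.5)[0]
print(picked, xpts[picked].sum())  # players 0 and 3 for 10.0 xPts
```

The full 15-player problem below is the same structure with more constraint rows (positions and per-team caps).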
# Python: FPL squad optimization
from scipy.optimize import milp, LinearConstraint, Bounds
import numpy as np
import pandas as pd
class FPLOptimizer:
"""Optimize FPL squad selection."""
SQUAD_STRUCTURE = {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_TEAM = 3
def __init__(self, player_pool):
self.players = player_pool.copy()
self.n_players = len(player_pool)
def optimize_squad(self, budget=100.0):
"""Find optimal squad within constraints."""
# Objective: maximize xpoints (minimize negative)
c = -self.players["xpoints"].values
# Variable bounds (binary: 0 or 1)
integrality = np.ones(self.n_players) # All binary
# Constraints
constraints = []
# Budget constraint: sum(price * selected) <= budget
A_budget = self.players["price"].values.reshape(1, -1)
constraints.append(LinearConstraint(A_budget, -np.inf, budget))
# Position constraints: exactly N players per position
for pos, count in self.SQUAD_STRUCTURE.items():
A_pos = (self.players["position"] == pos).astype(int).values
constraints.append(LinearConstraint(A_pos.reshape(1, -1), count, count))
# Team constraints: max 3 per team
for team in self.players["team"].unique():
A_team = (self.players["team"] == team).astype(int).values
constraints.append(LinearConstraint(A_team.reshape(1, -1),
-np.inf, self.MAX_PER_TEAM))
# Total squad size = 15
A_total = np.ones((1, self.n_players))
constraints.append(LinearConstraint(A_total, 15, 15))
# Solve
bounds = Bounds(0, 1)
result = milp(c, constraints=constraints, integrality=integrality,
bounds=bounds)
if result.success:
selected_idx = np.where(result.x > 0.5)[0]
squad = self.players.iloc[selected_idx].copy()
return {
"squad": squad,
"total_xpoints": squad["xpoints"].sum(),
"total_cost": squad["price"].sum(),
"remaining_budget": budget - squad["price"].sum()
}
return None
def optimize_with_existing(self, budget, existing_players,
free_transfers=1, transfer_cost=4):
"""Optimize considering existing squad and transfer costs."""
# Add transfer penalty to players not in existing squad
self.players["transfer_penalty"] = np.where(
self.players["name"].isin(existing_players),
0,
transfer_cost
)
# Adjust objective to account for transfers beyond free
# This is a simplified version - full implementation would be more complex
return self.optimize_squad(budget)
def find_differentials(self, ownership_threshold=5.0):
"""Find high-value low-ownership players."""
df = self.players.copy()
differentials = df[
(df["ownership"] < ownership_threshold) &
(df["xpoints"] > df["xpoints"].median())
].sort_values("value", ascending=False)
return differentials.head(10)
# Example usage
# optimizer = FPLOptimizer(all_players)
# result = optimizer.optimize_squad(budget=100.0)
# print(f"Optimal squad: {result['total_xpoints']:.1f} xPts, £{result['total_cost']:.1f}m")
# R: FPL squad optimization with linear programming
library(lpSolve)
library(tidyverse)
optimize_fpl_squad <- function(players, budget = 100, bench_boost = FALSE) {
n <- nrow(players)
# Objective: maximize expected points
objective <- players$xpoints
# Constraints matrix
constraints <- rbind(
# Budget constraint
players$price,
# Position constraints (exactly 2 GK, 5 DEF, 5 MID, 3 FWD)
as.numeric(players$position == "GKP"),
as.numeric(players$position == "DEF"),
as.numeric(players$position == "MID"),
as.numeric(players$position == "FWD"),
# Team constraints (max 3 from each team)
sapply(unique(players$team), function(t)
as.numeric(players$team == t))
)
# Constraint directions and RHS
directions <- c(
"<=", # Budget
"==", "==", "==", "==", # Positions
rep("<=", length(unique(players$team))) # Teams
)
rhs <- c(
budget, # Budget
2, 5, 5, 3, # Positions
rep(3, length(unique(players$team))) # Teams (max 3)
)
# Solve
solution <- lp(
direction = "max",
objective.in = objective,
const.mat = constraints,
const.dir = directions,
const.rhs = rhs,
all.bin = TRUE
)
# Extract selected players
selected <- players[solution$solution == 1, ]
list(
squad = selected,
total_xpoints = sum(selected$xpoints),
total_cost = sum(selected$price),
remaining_budget = budget - sum(selected$price)
)
}
# Run optimization
# result <- optimize_fpl_squad(all_players, budget = 100)
# result$squad %>% arrange(position, desc(xpoints))
Fixture Difficulty Ratings
Fixture difficulty is crucial for FPL planning. Understanding which teams face easier or harder runs helps with captain choices, transfers, and chip timing.
# Python: Fixture Difficulty Rating system
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple
from dataclasses import dataclass
@dataclass
class FixtureRating:
"""Rating for a single fixture."""
opponent: str
is_home: bool
fdr_attack: int # 1-5 (difficulty to score)
fdr_defense: int # 1-5 (difficulty to keep clean sheet)
fdr_overall: int # 1-5 (overall difficulty)
class FDRCalculator:
"""Calculate Fixture Difficulty Ratings for FPL."""
def __init__(self, matches_df: pd.DataFrame, lookback_days: int = 180):
self.matches = matches_df.copy()
self.lookback_days = lookback_days
self.team_strength = self._calculate_team_strength()
def _calculate_team_strength(self) -> pd.DataFrame:
"""Calculate attack and defense strength for each team."""
recent = self.matches[
self.matches["date"] >= self.matches["date"].max() - pd.Timedelta(days=self.lookback_days)
]
# Home stats (goal columns are left out of the join to avoid column collisions)
home_stats = recent.groupby("home_team").agg({
"home_xg": "mean",
"away_xg": "mean"
}).rename(columns={
"home_xg": "home_xg_for",
"away_xg": "home_xg_against"
})
# Away stats
away_stats = recent.groupby("away_team").agg({
"away_xg": "mean",
"home_xg": "mean"
}).rename(columns={
"away_xg": "away_xg_for",
"home_xg": "away_xg_against"
})
# Combine
strength = home_stats.join(away_stats, how="outer").fillna(0)
# Calculate overall metrics
strength["attack_strength"] = (strength["home_xg_for"] + strength["away_xg_for"]) / 2
strength["defense_strength"] = (strength["home_xg_against"] + strength["away_xg_against"]) / 2
strength["overall_strength"] = strength["attack_strength"] - strength["defense_strength"]
# FDR scores (1-5)
strength["fdr_attack"] = pd.qcut(
strength["defense_strength"],
q=5, labels=[1, 2, 3, 4, 5]
).astype(int)
strength["fdr_defense"] = pd.qcut(
strength["attack_strength"],
q=5, labels=[1, 2, 3, 4, 5]
).astype(int)
strength["fdr_overall"] = pd.qcut(
-strength["overall_strength"],
q=5, labels=[1, 2, 3, 4, 5]
).astype(int)
return strength.rename_axis("team").reset_index()
def get_fixture_difficulty(self, team: str, opponent: str,
is_home: bool) -> FixtureRating:
"""Get difficulty rating for a specific fixture."""
opp_data = self.team_strength[
self.team_strength["team"] == opponent
].iloc[0]
# Adjust for home/away: knock one FDR band off for home games (simplified)
fdr_adj = 1 if is_home else 0
return FixtureRating(
opponent=opponent,
is_home=is_home,
fdr_attack=max(1, opp_data["fdr_attack"] - fdr_adj),
fdr_defense=max(1, opp_data["fdr_defense"] - fdr_adj),
fdr_overall=max(1, opp_data["fdr_overall"] - fdr_adj)
)
def calculate_fixture_run(self, fixtures_df: pd.DataFrame,
n_gameweeks: int = 6) -> pd.DataFrame:
"""Calculate fixture difficulty for upcoming gameweeks."""
results = []
for team in fixtures_df["team"].unique():
team_fixtures = fixtures_df[
fixtures_df["team"] == team
].head(n_gameweeks)
ratings = []
for _, fix in team_fixtures.iterrows():
rating = self.get_fixture_difficulty(
team, fix["opponent"], fix["is_home"]
)
ratings.append(rating.fdr_overall)
results.append({
"team": team,
"avg_fdr": np.mean(ratings),
"easy_count": sum(1 for r in ratings if r <= 2),
"hard_count": sum(1 for r in ratings if r >= 4),
"fixture_swing": sum(1 for r in ratings if r <= 2) - sum(1 for r in ratings if r >= 4)
})
return pd.DataFrame(results).sort_values("avg_fdr")
def find_double_gameweek_targets(self, fixtures_df: pd.DataFrame) -> pd.DataFrame:
"""Identify teams with double gameweeks."""
dgw = fixtures_df.groupby(["team", "gameweek"]).size().reset_index(name="fixtures")
dgw = dgw[dgw["fixtures"] > 1]
# Add FDR for DGW fixtures
dgw_details = []
for _, row in dgw.iterrows():
team_gw = fixtures_df[
(fixtures_df["team"] == row["team"]) &
(fixtures_df["gameweek"] == row["gameweek"])
]
fdrs = [
self.get_fixture_difficulty(row["team"], f["opponent"], f["is_home"]).fdr_overall
for _, f in team_gw.iterrows()
]
dgw_details.append({
"team": row["team"],
"gameweek": row["gameweek"],
"fixtures": row["fixtures"],
"avg_fdr": np.mean(fdrs)
})
return pd.DataFrame(dgw_details).sort_values(["gameweek", "avg_fdr"])
class FixtureSwingAnalyzer:
"""Analyze fixture swings for transfer planning."""
def __init__(self, fdr_calc: FDRCalculator, fixtures: pd.DataFrame):
self.fdr = fdr_calc
self.fixtures = fixtures
def find_rotation_pairs(self, budget: float = 10.0) -> List[Tuple[str, str]]:
"""Find pairs of players/teams that rotate well."""
teams = self.fixtures["team"].unique()
pairs = []
for i, team1 in enumerate(teams):
for team2 in teams[i+1:]:
# Get next 10 fixtures for each
fix1 = self.fixtures[self.fixtures["team"] == team1].head(10)
fix2 = self.fixtures[self.fixtures["team"] == team2].head(10)
if len(fix1) == 10 and len(fix2) == 10:
# Check if they complement each other
rotation_score = self._calculate_rotation_score(fix1, fix2)
if rotation_score > 7: # Good rotation
pairs.append((team1, team2, rotation_score))
return sorted(pairs, key=lambda x: x[2], reverse=True)
def _calculate_rotation_score(self, fix1: pd.DataFrame,
fix2: pd.DataFrame) -> float:
"""Score how well two fixture lists rotate."""
score = 0
for i in range(min(len(fix1), len(fix2))):
fdr1 = self.fdr.get_fixture_difficulty(
fix1.iloc[i]["team"],
fix1.iloc[i]["opponent"],
fix1.iloc[i]["is_home"]
).fdr_overall
fdr2 = self.fdr.get_fixture_difficulty(
fix2.iloc[i]["team"],
fix2.iloc[i]["opponent"],
fix2.iloc[i]["is_home"]
).fdr_overall
# Best rotation: one easy, one hard
if (fdr1 <= 2 and fdr2 >= 4) or (fdr1 >= 4 and fdr2 <= 2):
score += 1
# Good: one easy, one medium
elif (fdr1 <= 2 and fdr2 == 3) or (fdr1 == 3 and fdr2 <= 2):
score += 0.5
return score
print("FDR Calculator initialized!")
# R: Fixture Difficulty Rating system
library(tidyverse)
# Build comprehensive FDR model
calculate_fdr <- function(teams_data, matches_data, n_matches = 6) {
# Calculate home and away strength
team_strength <- matches_data %>%
filter(date >= max(date) - 180) %>% # Last 6 months
group_by(home_team) %>%
summarise(
home_xg_for = mean(home_xg),
home_xg_against = mean(away_xg),
home_points = sum(case_when(
home_goals > away_goals ~ 3,
home_goals == away_goals ~ 1,
TRUE ~ 0
)) / n(),
.groups = "drop"
) %>%
rename(team = home_team) %>%
left_join(
matches_data %>%
filter(date >= max(date) - 180) %>%
group_by(away_team) %>%
summarise(
away_xg_for = mean(away_xg),
away_xg_against = mean(home_xg),
away_points = sum(case_when(
away_goals > home_goals ~ 3,
away_goals == home_goals ~ 1,
TRUE ~ 0
)) / n(),
.groups = "drop"
) %>%
rename(team = away_team),
by = "team"
) %>%
mutate(
# Overall attacking/defensive strength
attack_strength = (home_xg_for + away_xg_for) / 2,
defense_strength = (home_xg_against + away_xg_against) / 2,
overall_strength = attack_strength - defense_strength,
# FDR score (1-5 scale, lower = easier)
fdr_attack = ntile(defense_strength, 5), # Easier to attack weak defenses
fdr_defense = ntile(attack_strength, 5), # Harder to keep CS vs strong attacks
fdr_overall = ntile(-overall_strength, 5)
)
team_strength
}
# Calculate fixture runs
calculate_fixture_runs <- function(fixtures, fdr_data, n_gameweeks = 6) {
fixtures %>%
filter(gameweek <= min(gameweek) + n_gameweeks - 1) %>%
left_join(
fdr_data %>% select(team, fdr_overall),
by = c("opponent" = "team")
) %>%
group_by(team) %>%
summarise(
next_n_fdr = mean(fdr_overall),
easy_fixtures = sum(fdr_overall <= 2),
hard_fixtures = sum(fdr_overall >= 4),
fixture_swing = easy_fixtures - hard_fixtures,
.groups = "drop"
) %>%
arrange(next_n_fdr)
}
# Fixture ticker for dashboard
create_fixture_ticker <- function(fixtures, fdr_data, team) {
fixtures %>%
filter(team == !!team) %>%
head(10) %>%
left_join(fdr_data %>% select(team, fdr_overall),
by = c("opponent" = "team")) %>%
mutate(
fdr_color = case_when(
fdr_overall == 1 ~ "#00FF00", # Bright green
fdr_overall == 2 ~ "#90EE90", # Light green
fdr_overall == 3 ~ "#FFFF00", # Yellow
fdr_overall == 4 ~ "#FFA500", # Orange
fdr_overall == 5 ~ "#FF0000" # Red
),
display = paste0(
opponent, " (",
ifelse(is_home, "H", "A"), ")"
)
)
}
print("FDR calculation system ready!")
FDR Strategy Tips
- Wildcards: Time wildcards to catch fixture swings - target teams with 4+ green fixtures
- Differentials: Look for low-ownership players from teams with easy runs
- Rotation: Pair 4.5m defenders who rotate well to maximize clean sheets
- Captaincy: Weight captain picks toward easier fixtures, especially for premiums
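The rotation tip above can be made concrete with a standalone scorer over two teams' FDR runs. The thresholds mirror the `_calculate_rotation_score` heuristic from the Python FDR code earlier; the fixture lists are made up:

```python
def rotation_score(fdr_a, fdr_b):
    """Score how well two fixture runs rotate: reward one easy / one hard."""
    score = 0.0
    for a, b in zip(fdr_a, fdr_b):
        if (a <= 2 and b >= 4) or (a >= 4 and b <= 2):
            score += 1.0  # perfect rotation: always one easy option to field
        elif (a <= 2 and b == 3) or (a == 3 and b <= 2):
            score += 0.5  # decent: one easy, one medium
    return score

# Hypothetical 6-fixture FDR runs for two budget defenders' teams
print(rotation_score([2, 5, 1, 4, 3, 2], [4, 2, 5, 1, 2, 4]))  # 5.5
```

A high score means that in most gameweeks at least one of the pair has a green fixture, so a 4.5m rotating pair can approach the clean-sheet upside of a single premium defender.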
FPL Chip Strategy & Optimization
FPL chips (Wildcard, Bench Boost, Triple Captain, Free Hit) can swing hundreds of points when used optimally. Understanding when to deploy them is crucial for top finishes.
# Python: Chip optimization strategy
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import List, Dict, Optional
from scipy.optimize import milp, LinearConstraint, Bounds
@dataclass
class ChipOpportunity:
"""Represents a chip deployment opportunity."""
gameweek: int
chip_type: str
score: float
reasoning: str
recommended_action: str
class ChipOptimizer:
"""Optimize FPL chip deployment."""
def __init__(self, fixtures: pd.DataFrame, player_pool: pd.DataFrame,
fdr_data: pd.DataFrame):
self.fixtures = fixtures
self.players = player_pool
self.fdr = fdr_data
def find_bench_boost_opportunities(self,
remaining_gws: List[int]) -> List[ChipOpportunity]:
"""Find optimal Bench Boost gameweeks."""
opportunities = []
for gw in remaining_gws:
gw_fixtures = self.fixtures[self.fixtures["gameweek"] == gw]
# Count doubles and blanks
team_counts = gw_fixtures.groupby("team").size()
doubles = (team_counts == 2).sum()
blanks = len(set(self.players["team"]) - set(gw_fixtures["team"]))
# Average FDR
avg_fdr = gw_fixtures.merge(self.fdr[["team", "fdr_overall"]],
left_on="opponent", right_on="team",
how="left")["fdr_overall"].mean()
# Score the opportunity
score = doubles * 3 + (5 - avg_fdr) * 2 - blanks * 2
if score > 5:
opportunities.append(ChipOpportunity(
gameweek=gw,
chip_type="Bench Boost",
score=score,
reasoning=f"{doubles} DGWs, avg FDR {avg_fdr:.1f}",
recommended_action=f"Consider BB in GW{gw}" if score > 8 else "Monitor"
))
return sorted(opportunities, key=lambda x: x.score, reverse=True)
def find_triple_captain_targets(self, gameweek: int,
squad: List[str]) -> pd.DataFrame:
"""Find best Triple Captain picks for a gameweek."""
# Get fixtures for the gameweek
gw_fixtures = self.fixtures[self.fixtures["gameweek"] == gameweek]
# Find players with doubles or great fixtures
squad_players = self.players[self.players["name"].isin(squad)]
results = []
for _, player in squad_players.iterrows():
team_fixtures = gw_fixtures[gw_fixtures["team"] == player["team"]]
if len(team_fixtures) == 0:
continue
# Calculate TC expected points
fixture_count = len(team_fixtures)
avg_fdr = team_fixtures.merge(
self.fdr[["team", "fdr_overall"]],
left_on="opponent", right_on="team"
)["fdr_overall"].mean()
# Adjust xpoints for fixture count and difficulty
base_xpts = player["xpoints"]
adj_xpts = base_xpts * fixture_count * (1 + (3 - avg_fdr) * 0.1)
tc_expected = adj_xpts * 3 # Triple points
results.append({
"name": player["name"],
"position": player["position"],
"team": player["team"],
"fixtures": fixture_count,
"avg_fdr": avg_fdr,
"base_xpts": base_xpts,
"tc_expected": tc_expected
})
return pd.DataFrame(results).sort_values("tc_expected", ascending=False)
def optimize_free_hit_squad(self, gameweek: int,
budget: float = 100.0) -> Dict:
"""Build optimal Free Hit squad for a specific gameweek."""
gw_fixtures = self.fixtures[self.fixtures["gameweek"] == gameweek]
playing_teams = set(gw_fixtures["team"])
# Filter to players with fixtures
available = self.players[self.players["team"].isin(playing_teams)].copy()
# Adjust for double gameweeks
team_counts = gw_fixtures.groupby("team").size().to_dict()
available["gw_fixtures"] = available["team"].map(team_counts)
# Add FDR adjustment
available = available.merge(
gw_fixtures.groupby("team")["opponent"].apply(list).reset_index(),
on="team", how="left"
)
# Calculate adjusted xpoints for this gameweek
def calc_gw_xpts(row):
base = row["xpoints"]
fixtures = row["gw_fixtures"]
# Scale by fixture count, with a small extra boost only for doubles
return base * fixtures * (1.1 if fixtures > 1 else 1.0)
available["gw_xpoints"] = available.apply(calc_gw_xpts, axis=1)
# Now optimize (using existing optimizer)
n = len(available)
# Objective: maximize gw_xpoints
c = -available["gw_xpoints"].values
constraints = []
# Budget
A_budget = available["price"].values.reshape(1, -1)
constraints.append(LinearConstraint(A_budget, -np.inf, budget))
# Position constraints
for pos, count in {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}.items():
A_pos = (available["position"] == pos).astype(int).values
constraints.append(LinearConstraint(A_pos.reshape(1, -1), count, count))
# Team constraints
for team in available["team"].unique():
A_team = (available["team"] == team).astype(int).values
constraints.append(LinearConstraint(A_team.reshape(1, -1),
-np.inf, 3))
# Total = 15
constraints.append(LinearConstraint(np.ones((1, n)), 15, 15))
# Solve
integrality = np.ones(n)
result = milp(c, constraints=constraints, integrality=integrality,
bounds=Bounds(0, 1))
if result.success:
selected_idx = np.where(result.x > 0.5)[0]
squad = available.iloc[selected_idx]
return {
"squad": squad[["name", "position", "team", "price",
"gw_fixtures", "gw_xpoints"]],
"total_xpoints": squad["gw_xpoints"].sum(),
"total_cost": squad["price"].sum()
}
return None
def wildcard_timing_analysis(self, current_gw: int,
remaining_gws: List[int]) -> pd.DataFrame:
"""Analyze optimal wildcard timing."""
swing_analysis = []
for gw in remaining_gws:
# Calculate fixture swing from current GW to target
upcoming = self.fixtures[
(self.fixtures["gameweek"] >= gw) &
(self.fixtures["gameweek"] < gw + 6)
]
team_fdrs = upcoming.merge(
self.fdr[["team", "fdr_overall"]],
left_on="opponent", right_on="team"
).groupby("team_x")["fdr_overall"].mean().reset_index()
# Find teams with improving fixtures
improving_teams = team_fdrs[team_fdrs["fdr_overall"] <= 2.5]
swing_analysis.append({
"wildcard_gw": gw,
"easy_fixture_teams": len(improving_teams),
"best_teams": improving_teams.nsmallest(5, "fdr_overall")["team_x"].tolist()
})
return pd.DataFrame(swing_analysis)
# Example usage
print("Chip Optimizer ready for deployment!")
# Usage pattern:
# optimizer = ChipOptimizer(fixtures_df, players_df, fdr_df)
# bb_opps = optimizer.find_bench_boost_opportunities([30, 31, 32, 33, 34])
# tc_picks = optimizer.find_triple_captain_targets(34, my_squad)
# fh_squad = optimizer.optimize_free_hit_squad(29, budget=100.0)
# R: Chip optimization strategy
library(tidyverse)
# Chip timing optimizer
analyze_chip_opportunities <- function(fixtures, fdr_data, player_pool) {
# Find best Bench Boost gameweeks
find_bb_opportunities <- function(fixtures, n_gw = 10) {
fixtures %>%
group_by(gameweek) %>%
summarise(
double_gw_count = sum(is_double_gw),
avg_fdr = mean(fdr),
easy_fixture_count = sum(fdr <= 2),
.groups = "drop"
) %>%
mutate(
bb_score = double_gw_count * 3 + easy_fixture_count +
(5 - avg_fdr) * 2
) %>%
arrange(desc(bb_score)) %>%
head(n_gw)
}
# Find best Triple Captain targets
find_tc_opportunities <- function(player_pool, fixtures) {
player_pool %>%
filter(position %in% c("MID", "FWD")) %>%
left_join(fixtures, by = "team") %>%
filter(is_double_gw | fdr <= 2) %>%
mutate(
tc_score = xpoints * (1 + is_double_gw + (3 - fdr) * 0.2)
) %>%
arrange(desc(tc_score)) %>%
head(10)
}
# Free Hit gameweek analysis
find_fh_opportunities <- function(fixtures) {
fixtures %>%
group_by(gameweek) %>%
summarise(
blank_count = sum(is_blank),
double_count = sum(is_double_gw),
avg_fdr = mean(fdr, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
fh_score = blank_count * 5 + # High value if many blanks
double_count * 2 +
(5 - avg_fdr)
) %>%
filter(fh_score >= 5) %>%
arrange(desc(fh_score))
}
list(
bench_boost = find_bb_opportunities(fixtures),
triple_captain = find_tc_opportunities(player_pool, fixtures),
free_hit = find_fh_opportunities(fixtures)
)
}
# Wildcard planning
plan_wildcard <- function(current_squad, player_pool, target_gw,
budget = 100, upcoming_fixtures) {
# Calculate value scores for all players
player_pool <- player_pool %>%
left_join(
upcoming_fixtures %>%
group_by(team) %>%
summarise(avg_fdr = mean(fdr), .groups = "drop"),
by = "team"
) %>%
mutate(
# Adjust xpoints by fixture difficulty
adj_xpoints = xpoints * (1 + (3 - avg_fdr) * 0.15),
value = adj_xpoints / (price / 10)
)
# Find optimal new squad
list(
transfers_needed = sum(!current_squad$name %in% player_pool$name),
top_picks_by_position = player_pool %>%
group_by(position) %>%
slice_max(value, n = 5) %>%
select(name, team, price, xpoints, adj_xpoints, value)
)
}
print("Chip strategy analyzer ready!")
| Chip | Optimal Timing | Key Factors | Common Mistakes |
|---|---|---|---|
| Bench Boost | Double Gameweek with 15 playing players | All bench players have fixtures, ideally doubles | Using without full playing squad |
| Triple Captain | Premium player with DGW or 2 easy fixtures | High xG player, good fixtures, form | Chasing last year's TC pick |
| Free Hit | Blank Gameweek with many teams missing | Squad normally has many blanks | Using for DGW instead of blank |
| Wildcard | Major fixture swing or injury crisis | 4+ transfers needed, fixture improvement | Panic wildcard after one bad week |
Betting Market Analysis
Understanding betting markets helps evaluate model performance and market efficiency. Odds contain valuable information about probability estimates.
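The basic conversions are worth seeing in isolation before the full analyzer. This is a minimal sketch; the function names are mine:

```python
def decimal_to_american(dec):
    """Decimal odds -> American odds (positive for underdogs, negative for favorites)."""
    return round((dec - 1) * 100) if dec >= 2 else -round(100 / (dec - 1))

def american_to_decimal(am):
    """American odds -> decimal odds."""
    return 1 + am / 100 if am > 0 else 1 + 100 / -am

def implied_prob(dec):
    """Raw implied probability (still contains the bookmaker margin)."""
    return 1 / dec

print(decimal_to_american(1.65))    # -154
print(american_to_decimal(-154))    # ~1.649 (round-trip loses a little to rounding)
print(f"{implied_prob(1.65):.1%}")  # 60.6%
```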
# Python: Betting odds analysis
import pandas as pd
import numpy as np
class OddsAnalyzer:
"""Analyze betting odds and implied probabilities."""
def __init__(self):
pass
def convert_odds(self, decimal_odds):
"""Convert decimal odds to all common formats."""
from fractions import Fraction
frac = Fraction(decimal_odds - 1).limit_denominator(100)
return {
"decimal": decimal_odds,
"fractional": f"{frac.numerator}/{frac.denominator}",
"american": f"+{round((decimal_odds - 1) * 100)}" if decimal_odds >= 2
else f"-{round(100 / (decimal_odds - 1))}",
"implied_prob": 1 / decimal_odds
}
def calculate_implied_probabilities(self, home_odds, draw_odds, away_odds):
"""Calculate true probabilities from betting odds."""
# Raw implied probabilities
home_prob = 1 / home_odds
draw_prob = 1 / draw_odds
away_prob = 1 / away_odds
# Overround (bookmaker margin)
overround = home_prob + draw_prob + away_prob
# True probabilities (margin removed)
return pd.DataFrame({
"outcome": ["Home", "Draw", "Away"],
"odds": [home_odds, draw_odds, away_odds],
"raw_implied": [home_prob, draw_prob, away_prob],
"true_implied": [home_prob/overround, draw_prob/overround,
away_prob/overround],
"overround_pct": [(overround - 1) * 100] * 3
})
def calculate_expected_value(self, model_prob, odds):
"""Calculate expected value of a bet."""
# EV = (probability * profit) - (1 - probability) * stake
# For unit stake: EV = (prob * (odds - 1)) - ((1 - prob) * 1)
ev = (model_prob * (odds - 1)) - (1 - model_prob)
return ev
def find_value_bets(self, matches_df, model_probs_col, odds_col,
ev_threshold=0.05):
"""Identify bets where model shows positive expected value."""
df = matches_df.copy()
df["implied_prob"] = 1 / df[odds_col]
df["ev"] = df.apply(
lambda x: self.calculate_expected_value(x[model_probs_col], x[odds_col]),
axis=1
)
df["edge"] = df[model_probs_col] - df["implied_prob"]
value_bets = df[df["ev"] > ev_threshold].sort_values("ev", ascending=False)
return value_bets
def kelly_criterion(self, model_prob, odds, fraction=0.25):
"""Calculate Kelly stake as percentage of bankroll."""
# Full Kelly: f* = (bp - q) / b
# where b = odds - 1, p = prob of win, q = prob of loss
b = odds - 1
p = model_prob
q = 1 - p
kelly = (b * p - q) / b
# Apply fractional Kelly for safety
return max(0, kelly * fraction)
# Example usage
analyzer = OddsAnalyzer()
# Analyze match odds
probs = analyzer.calculate_implied_probabilities(1.65, 3.80, 5.50)
print("Implied Probabilities:")
print(probs.to_string(index=False))
print(f"\nBookmaker margin: {probs['overround_pct'].iloc[0]:.1f}%")
# Calculate EV for a bet
model_prob = 0.65 # Our model says 65% chance of home win
home_odds = 1.65
ev = analyzer.calculate_expected_value(model_prob, home_odds)
kelly = analyzer.kelly_criterion(model_prob, home_odds)
print(f"\nModel probability: {model_prob:.1%}")
print(f"Implied probability: {1/home_odds:.1%}")
print(f"Expected Value: {ev:.3f}")
print(f"Kelly stake: {kelly:.1%} of bankroll")
# R: Betting odds analysis
library(tidyverse)
# Convert odds formats
convert_odds <- function(decimal_odds) {
list(
decimal = decimal_odds,
fractional = paste0(round((decimal_odds - 1) * 100), "/100"), # unreduced approximation
american = ifelse(decimal_odds >= 2,
paste0("+", round((decimal_odds - 1) * 100)),
paste0("-", round(100 / (decimal_odds - 1)))),
implied_prob = 1 / decimal_odds
)
}
# Calculate implied probabilities from odds
implied_probabilities <- function(home_odds, draw_odds, away_odds) {
# Raw implied probabilities
home_prob <- 1 / home_odds
draw_prob <- 1 / draw_odds
away_prob <- 1 / away_odds
# Calculate overround (bookmaker margin)
overround <- home_prob + draw_prob + away_prob
# True probabilities (removing margin)
tibble(
outcome = c("Home", "Draw", "Away"),
odds = c(home_odds, draw_odds, away_odds),
raw_implied = c(home_prob, draw_prob, away_prob),
true_implied = c(home_prob, draw_prob, away_prob) / overround,
overround_pct = (overround - 1) * 100
)
}
# Example
match_odds <- implied_probabilities(
home_odds = 1.65,
draw_odds = 3.80,
away_odds = 5.50
)
print(match_odds)
cat("\nBookmaker margin:", round(match_odds$overround_pct[1], 1), "%")
Output (from the Python example above):
Implied Probabilities:
outcome odds raw_implied true_implied overround_pct
Home 1.65 0.6061 0.5714 6.1
Draw 3.80 0.2632 0.2484 6.1
Away 5.50 0.1818 0.1716 6.1
Bookmaker margin: 6.1%
Model probability: 65.0%
Implied probability: 60.6%
Expected Value: 0.073
Kelly stake: 2.8% of bankroll
Evaluating Prediction Models
Assessing model quality is crucial for both fantasy and betting applications. Key metrics include calibration, log loss, and Brier score.
# Python: Model evaluation metrics
import numpy as np
import pandas as pd
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt
class ModelEvaluator:
"""Evaluate prediction model performance."""
def __init__(self, predictions, actuals):
self.predictions = np.array(predictions)
self.actuals = np.array(actuals)
def brier_score(self):
"""Calculate Brier score (lower is better)."""
return brier_score_loss(self.actuals, self.predictions)
def calculate_log_loss(self):
"""Calculate log loss (lower is better)."""
return log_loss(self.actuals, self.predictions)
def calculate_auc(self):
"""Calculate ROC AUC."""
return roc_auc_score(self.actuals, self.predictions)
def calibration_analysis(self, n_bins=10):
"""Analyze model calibration."""
prob_true, prob_pred = calibration_curve(
self.actuals, self.predictions, n_bins=n_bins
)
return {
"predicted": prob_pred,
"actual": prob_true,
"calibration_error": np.mean(np.abs(prob_pred - prob_true))
}
def plot_calibration(self):
"""Plot calibration curve."""
cal = self.calibration_analysis()
fig, ax = plt.subplots(figsize=(8, 8))
# Perfect calibration line
ax.plot([0, 1], [0, 1], "k--", label="Perfectly Calibrated")
# Model calibration
ax.plot(cal["predicted"], cal["actual"], "s-",
label=f"Model (ECE: {cal['calibration_error']:.3f})")
ax.set_xlabel("Mean Predicted Probability")
ax.set_ylabel("Actual Proportion")
ax.set_title("Calibration Curve")
ax.legend()
ax.grid(True, alpha=0.3)
return fig
def full_report(self):
"""Generate comprehensive evaluation report."""
return {
"brier_score": self.brier_score(),
"log_loss": self.calculate_log_loss(),
"auc": self.calculate_auc(),
"calibration": self.calibration_analysis()
}
# Compare model to betting market
def evaluate_against_market(model_probs, market_probs, actuals):
"""Compare model performance to market."""
model_eval = ModelEvaluator(model_probs, actuals)
market_eval = ModelEvaluator(market_probs, actuals)
comparison = pd.DataFrame({
"Metric": ["Brier Score", "Log Loss", "AUC"],
"Model": [model_eval.brier_score(),
model_eval.calculate_log_loss(),
model_eval.calculate_auc()],
"Market": [market_eval.brier_score(),
market_eval.calculate_log_loss(),
market_eval.calculate_auc()]
})
comparison["Model Better"] = comparison.apply(
lambda x: x["Model"] < x["Market"] if x["Metric"] != "AUC"
else x["Model"] > x["Market"],
axis=1
)
return comparison
# Example evaluation
np.random.seed(42)
n = 500
predictions = np.random.beta(2, 3, n)
actuals = (np.random.random(n) < predictions).astype(int)
evaluator = ModelEvaluator(predictions, actuals)
report = evaluator.full_report()
print("Model Evaluation Report:")
print(f" Brier Score: {report['brier_score']:.4f}")
print(f" Log Loss: {report['log_loss']:.4f}")
print(f" AUC: {report['auc']:.4f}")
print(f" Calibration Error: {report['calibration']['calibration_error']:.4f}")
# R: Model evaluation metrics
library(tidyverse)
# Brier score (lower is better)
brier_score <- function(predicted_prob, actual_outcome) {
mean((predicted_prob - actual_outcome)^2)
}
# Log loss (lower is better)
log_loss <- function(predicted_prob, actual_outcome) {
eps <- 1e-15
predicted_prob <- pmax(pmin(predicted_prob, 1 - eps), eps)
-mean(actual_outcome * log(predicted_prob) +
(1 - actual_outcome) * log(1 - predicted_prob))
}
# Calibration analysis
calibration_analysis <- function(predictions, outcomes, n_bins = 10) {
data <- tibble(pred = predictions, actual = outcomes)
data %>%
mutate(bin = cut(pred, breaks = seq(0, 1, length.out = n_bins + 1),
include.lowest = TRUE)) %>%
group_by(bin) %>%
summarise(
mean_predicted = mean(pred),
mean_actual = mean(actual),
count = n()
)
}
# ROC AUC calculation
calculate_auc <- function(predictions, outcomes) {
# Simple AUC: probability a random positive outranks a random negative
pos <- predictions[outcomes == 1]
neg <- predictions[outcomes == 0]
mean(sapply(pos, function(p) mean(p > neg) + 0.5 * mean(p == neg)))
}
# Comprehensive evaluation
evaluate_model <- function(predictions, outcomes) {
list(
brier = brier_score(predictions, outcomes),
log_loss = log_loss(predictions, outcomes),
auc = calculate_auc(predictions, outcomes),
calibration = calibration_analysis(predictions, outcomes)
)
}
Model Evaluation Report:
Brier Score: 0.1923
Log Loss: 0.5847
AUC: 0.7234
Calibration Error: 0.0412
Goal Scorer & Over/Under Markets
Player-level and match total betting markets can be analyzed using xG-based models. These markets often show different efficiency levels than match result markets.
# Python: Goal scorer market analysis
import pandas as pd
import numpy as np
from scipy.stats import poisson
from typing import Dict, List, Tuple
class GoalScorerAnalyzer:
"""Analyze anytime goalscorer markets."""
def __init__(self, player_stats: pd.DataFrame, odds_data: pd.DataFrame):
self.players = player_stats.merge(odds_data, on="player_name", how="left")
def calculate_ags_probability(self, xg_per_90: float,
expected_mins: float) -> float:
"""Calculate probability of scoring at least once."""
expected_goals = xg_per_90 * expected_mins / 90
return 1 - np.exp(-expected_goals)
def analyze_market(self) -> pd.DataFrame:
"""Analyze all AGS market prices."""
df = self.players.copy()
# Calculate model probability
df["expected_goals"] = df["xg_per_90"] * df["expected_mins"] / 90
df["model_prob"] = 1 - np.exp(-df["expected_goals"])
# Market implied probability
df["implied_prob"] = 1 / df["ags_odds"]
# Edge and EV
df["edge"] = df["model_prob"] - df["implied_prob"]
df["ev"] = (df["model_prob"] * (df["ags_odds"] - 1)) - (1 - df["model_prob"])
# Confidence score
df["confidence"] = np.minimum(df["matches_played"] / 10, 1)
return df.sort_values("ev", ascending=False)
def find_value_ags(self, min_ev: float = 0.05,
min_confidence: float = 0.5) -> pd.DataFrame:
"""Find value AGS bets."""
analysis = self.analyze_market()
return analysis[
(analysis["ev"] > min_ev) &
(analysis["confidence"] >= min_confidence)
][["player_name", "team", "ags_odds", "model_prob",
"implied_prob", "edge", "ev"]]
class OverUnderAnalyzer:
"""Analyze over/under and BTTS markets."""
def __init__(self, match_data: pd.DataFrame):
self.matches = match_data
def analyze_ou_market(self, line: float = 2.5) -> pd.DataFrame:
"""Analyze over/under market for given line."""
df = self.matches.copy()
# Total expected goals
df["total_xg"] = df["home_xg"] + df["away_xg"]
# Poisson probabilities
df["p_over"] = 1 - poisson.cdf(int(line), df["total_xg"])
df["p_under"] = poisson.cdf(int(line), df["total_xg"])
# Market comparison
df["implied_over"] = 1 / df["over_odds"]
df["implied_under"] = 1 / df["under_odds"]
# Edge calculation
df["over_edge"] = df["p_over"] - df["implied_over"]
df["under_edge"] = df["p_under"] - df["implied_under"]
# EV calculation
df["over_ev"] = (df["p_over"] * (df["over_odds"] - 1)) - (1 - df["p_over"])
df["under_ev"] = (df["p_under"] * (df["under_odds"] - 1)) - (1 - df["p_under"])
# Best bet
df["best_bet"] = np.where(df["over_ev"] > 0.05, "OVER",
np.where(df["under_ev"] > 0.05, "UNDER", "PASS"))
return df
def analyze_btts(self) -> pd.DataFrame:
"""Analyze Both Teams to Score market."""
df = self.matches.copy()
# Probability each team scores
df["p_home_scores"] = 1 - np.exp(-df["home_xg"])
df["p_away_scores"] = 1 - np.exp(-df["away_xg"])
# BTTS probability
df["model_btts"] = df["p_home_scores"] * df["p_away_scores"]
df["model_no_btts"] = 1 - df["model_btts"]
# Market comparison
df["implied_btts"] = 1 / df["btts_yes_odds"]
df["implied_no_btts"] = 1 / df["btts_no_odds"]
# Edge
df["btts_edge"] = df["model_btts"] - df["implied_btts"]
df["no_btts_edge"] = df["model_no_btts"] - df["implied_no_btts"]
return df
def correct_score_probabilities(self, home_xg: float, away_xg: float,
max_goals: int = 5) -> pd.DataFrame:
"""Calculate correct score probabilities using Poisson."""
results = []
for h in range(max_goals + 1):
for a in range(max_goals + 1):
prob = poisson.pmf(h, home_xg) * poisson.pmf(a, away_xg)
results.append({
"home_goals": h,
"away_goals": a,
"score": f"{h}-{a}",
"probability": prob,
"fair_odds": 1 / prob if prob > 0 else float("inf")
})
return pd.DataFrame(results).sort_values("probability", ascending=False)
def asian_handicap_probability(self, home_xg: float, away_xg: float,
line: float) -> Dict:
"""Calculate Asian Handicap probabilities."""
# Simulate many games using Poisson
n_sims = 100000
home_goals = np.random.poisson(home_xg, n_sims)
away_goals = np.random.poisson(away_xg, n_sims)
# Apply handicap
adjusted_margin = home_goals - away_goals + line
# Calculate outcomes
home_wins = np.sum(adjusted_margin > 0) / n_sims
away_wins = np.sum(adjusted_margin < 0) / n_sims
pushes = np.sum(adjusted_margin == 0) / n_sims
return {
"line": line,
"home_covers": home_wins,
"away_covers": away_wins,
"push": pushes,
"home_fair_odds": 1 / home_wins if home_wins > 0 else float("inf"),
"away_fair_odds": 1 / away_wins if away_wins > 0 else float("inf")
}
# Example usage
print("Goal scorer and O/U analyzer ready!")
# Correct score example
analyzer = OverUnderAnalyzer(pd.DataFrame())
cs = analyzer.correct_score_probabilities(1.8, 1.2)
print("\nMost likely scores (Man City 1.8 xG vs Arsenal 1.2 xG):")
print(cs.head(10).to_string(index=False))
# R: Goal scorer market analysis
library(tidyverse)
# Analyze anytime goalscorer markets
analyze_ags_market <- function(player_data, market_odds) {
player_data %>%
left_join(market_odds, by = "player_name") %>%
mutate(
# Implied probability from odds
implied_prob = 1 / ags_odds,
# Model probability: P(at least 1 goal) = 1 - P(0 goals)
# Using Poisson: P(0) = e^(-xG * mins/90)
expected_goals = xg_per_90 * expected_mins / 90,
model_prob = 1 - exp(-expected_goals),
# Calculate edge
edge = model_prob - implied_prob,
# Expected value
ev = (model_prob * (ags_odds - 1)) - (1 - model_prob),
# Confidence based on sample size
confidence = pmin(matches_played / 10, 1)
) %>%
filter(!is.na(ags_odds)) %>%
arrange(desc(ev))
}
# Over/Under goals model
analyze_ou_market <- function(match_data, line = 2.5) {
match_data %>%
mutate(
# Total expected goals
total_xg = home_xg + away_xg,
# Probability of over using Poisson
# P(total > line) = 1 - P(total <= floor(line))
p_over = 1 - ppois(floor(line), lambda = total_xg),
p_under = ppois(floor(line), lambda = total_xg),
# Compare to market
implied_over = 1 / over_odds,
implied_under = 1 / under_odds,
# Edge
over_edge = p_over - implied_over,
under_edge = p_under - implied_under,
# Best bet direction
best_bet = case_when(
over_edge > 0.05 ~ "OVER",
under_edge > 0.05 ~ "UNDER",
TRUE ~ "PASS"
)
)
}
# BTTS (Both Teams to Score) market
analyze_btts <- function(match_data) {
match_data %>%
mutate(
# P(home scores at least 1)
p_home_scores = 1 - exp(-home_xg),
# P(away scores at least 1)
p_away_scores = 1 - exp(-away_xg),
# P(BTTS) = P(home scores) * P(away scores)
model_btts = p_home_scores * p_away_scores,
model_no_btts = 1 - model_btts,
# Compare to market
implied_btts = 1 / btts_yes_odds,
implied_no_btts = 1 / btts_no_odds,
btts_edge = model_btts - implied_btts,
no_btts_edge = model_no_btts - implied_no_btts
)
}
# Correct score probabilities (Poisson)
correct_score_probs <- function(home_xg, away_xg, max_goals = 5) {
scores <- expand_grid(
home_goals = 0:max_goals,
away_goals = 0:max_goals
) %>%
mutate(
# Independent Poisson probabilities
prob = dpois(home_goals, home_xg) * dpois(away_goals, away_xg),
score = paste0(home_goals, "-", away_goals)
) %>%
arrange(desc(prob))
scores
}
# Example
cs_probs <- correct_score_probs(home_xg = 1.8, away_xg = 1.2)
print(head(cs_probs, 10))
Most likely scores (Man City 1.8 xG vs Arsenal 1.2 xG):
home_goals away_goals score probability fair_odds
1 1 1-1 0.1075 9.30
2 1 2-1 0.0968 10.33
1 0 1-0 0.0896 11.16
2 0 2-0 0.0807 12.40
1 2 1-2 0.0645 15.50
0 1 0-1 0.0597 16.74
3 1 3-1 0.0581 17.22
2 2 2-2 0.0581 17.22
0 0 0-0 0.0498 20.09
3 0 3-0 0.0484 20.66
Bankroll Management
Proper bankroll management is more important than picking winners. Even the best models fail without disciplined stake sizing.
# Python: Bankroll management system
import numpy as np
import pandas as pd
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from datetime import datetime
@dataclass
class Bet:
"""Represents a single bet."""
date: datetime
event: str
selection: str
odds: float
stake: float
stake_pct: float
model_prob: float
result: Optional[str] = None
profit: Optional[float] = None
class BankrollManager:
"""Comprehensive bankroll management system."""
def __init__(self, initial_bankroll: float, max_stake_pct: float = 0.05,
kelly_fraction: float = 0.25):
self.initial = initial_bankroll
self.current = initial_bankroll
self.max_stake = max_stake_pct
self.kelly_fraction = kelly_fraction
self.history: List[Bet] = []
def kelly_stake(self, model_prob: float, odds: float) -> float:
"""Calculate Kelly criterion stake."""
b = odds - 1
p = model_prob
q = 1 - p
# Full Kelly
full_kelly = (b * p - q) / b
if full_kelly <= 0:
return 0
# Apply fraction and cap
stake_pct = min(full_kelly * self.kelly_fraction, self.max_stake)
return max(0, stake_pct)
def flat_stake(self, units: float = 1.0, unit_size: float = 0.01) -> float:
"""Flat staking approach."""
return min(units * unit_size, self.max_stake)
def place_bet(self, event: str, selection: str, odds: float,
model_prob: float, stake_method: str = "kelly") -> Bet:
"""Place a bet and record it."""
if stake_method == "kelly":
stake_pct = self.kelly_stake(model_prob, odds)
else:
stake_pct = self.flat_stake()
stake = self.current * stake_pct
bet = Bet(
date=datetime.now(),
event=event,
selection=selection,
odds=odds,
stake=stake,
stake_pct=stake_pct,
model_prob=model_prob
)
self.history.append(bet)
return bet
def settle_bet(self, bet: Bet, won: bool) -> None:
"""Settle a bet and update bankroll."""
bet.result = "WON" if won else "LOST"
bet.profit = bet.stake * (bet.odds - 1) if won else -bet.stake
self.current += bet.profit
def get_stats(self) -> Dict:
"""Calculate comprehensive betting statistics."""
settled = [b for b in self.history if b.result is not None]
if not settled:
return {"message": "No settled bets"}
wins = [b for b in settled if b.result == "WON"]
total_staked = sum(b.stake for b in settled)
total_profit = sum(b.profit for b in settled)
return {
"total_bets": len(settled),
"wins": len(wins),
"losses": len(settled) - len(wins),
"win_rate": len(wins) / len(settled),
"total_staked": total_staked,
"total_profit": total_profit,
"roi": total_profit / total_staked if total_staked > 0 else 0,
"bankroll_growth": (self.current - self.initial) / self.initial,
"current_bankroll": self.current,
"avg_odds": np.mean([b.odds for b in settled]),
"avg_stake_pct": np.mean([b.stake_pct for b in settled])
}
def simulate_future(self, n_bets: int, avg_edge: float,
avg_odds: float, n_sims: int = 10000) -> Dict:
"""Monte Carlo simulation of future performance."""
win_rate = (1 / avg_odds) + avg_edge
results = []
for _ in range(n_sims):
bankroll = self.current
max_bankroll = bankroll
min_bankroll = bankroll
for _ in range(n_bets):
stake_pct = self.kelly_stake(win_rate, avg_odds)
stake = bankroll * stake_pct
if np.random.random() < win_rate:
bankroll += stake * (avg_odds - 1)
else:
bankroll -= stake
max_bankroll = max(max_bankroll, bankroll)
min_bankroll = min(min_bankroll, bankroll)
if bankroll <= 0:
break
results.append({
"final_bankroll": bankroll,
"max_bankroll": max_bankroll,
"min_bankroll": min_bankroll,
"ruined": bankroll <= 0
})
df = pd.DataFrame(results)
return {
"mean_final": df["final_bankroll"].mean(),
"median_final": df["final_bankroll"].median(),
"percentile_5": df["final_bankroll"].quantile(0.05),
"percentile_95": df["final_bankroll"].quantile(0.95),
"risk_of_ruin": df["ruined"].mean(),
"max_drawdown_mean": (df["max_bankroll"] - df["min_bankroll"]).mean() / df["max_bankroll"].mean()
}
class ClosingLineValue:
"""Track Closing Line Value (CLV) for bet quality assessment."""
def __init__(self):
self.bets = []
def add_bet(self, placed_odds: float, closing_odds: float,
stake: float, won: bool) -> None:
"""Add a bet with placed and closing odds."""
self.bets.append({
"placed_odds": placed_odds,
"closing_odds": closing_odds,
"stake": stake,
"won": won,
"clv": (1 / closing_odds) - (1 / placed_odds)  # positive when the placed odds beat the close
})
def analyze(self) -> Dict:
"""Analyze CLV performance."""
if not self.bets:
return {"message": "No bets recorded"}
df = pd.DataFrame(self.bets)
return {
"total_bets": len(df),
"positive_clv_rate": (df["clv"] > 0).mean(),
"mean_clv": df["clv"].mean(),
"mean_clv_pct": df["clv"].mean() * 100,
"win_rate": df["won"].mean(),
"total_stake": df["stake"].sum(),
"clv_by_outcome": {
"winners": df[df["won"]]["clv"].mean() if df["won"].any() else 0,
"losers": df[~df["won"]]["clv"].mean() if (~df["won"]).any() else 0
}
}
# Example usage
manager = BankrollManager(initial_bankroll=1000, kelly_fraction=0.25)
# Simulate some bets
print("Bankroll Management System")
print(f"Starting bankroll: £{manager.current:.2f}")
# Calculate recommended stake
edge_bet = manager.kelly_stake(model_prob=0.55, odds=2.10)
print(f"\nFor 55% model prob at 2.10 odds:")
print(f" Kelly recommends: {edge_bet:.1%} of bankroll")
print(f" Stake amount: £{manager.current * edge_bet:.2f}")
# R: Bankroll management system
library(tidyverse)
# Kelly Criterion with fractional approach
kelly_stake <- function(model_prob, odds, fraction = 0.25, max_stake = 0.05) {
b <- odds - 1
p <- model_prob
q <- 1 - p
# Full Kelly
full_kelly <- (b * p - q) / b
# Fractional Kelly (safer)
fractional <- full_kelly * fraction
# Cap at maximum stake
stake <- max(0, min(fractional, max_stake))
list(
full_kelly = full_kelly,
fractional_kelly = fractional,
recommended_stake = stake
)
}
# Bankroll tracking system
create_bankroll_tracker <- function(initial_bankroll) {
tracker <- list(
initial = initial_bankroll,
current = initial_bankroll,
history = tibble(
date = as.Date(character()),
bet_id = integer(),
stake = numeric(),
odds = numeric(),
result = character(),
profit = numeric(),
bankroll = numeric()
),
stats = list()
)
# Add bet function
add_bet <- function(tracker, stake_pct, odds, won) {
stake <- tracker$current * stake_pct
profit <- if (won) stake * (odds - 1) else -stake
new_bankroll <- tracker$current + profit
new_row <- tibble(
date = Sys.Date(),
bet_id = nrow(tracker$history) + 1,
stake = stake,
odds = odds,
result = if (won) "WON" else "LOST",
profit = profit,
bankroll = new_bankroll
)
tracker$history <- bind_rows(tracker$history, new_row)
tracker$current <- new_bankroll
tracker
}
tracker
}
# Calculate risk of ruin
risk_of_ruin <- function(win_rate, avg_odds, stake_pct, n_simulations = 10000) {
# Simulate betting sequences
ruin_count <- 0
for (i in 1:n_simulations) {
bankroll <- 1.0
n_bets <- 1000
for (b in 1:n_bets) {
stake <- bankroll * stake_pct
won <- runif(1) < win_rate
if (won) {
bankroll <- bankroll + stake * (avg_odds - 1)
} else {
bankroll <- bankroll - stake
}
if (bankroll <= 0) {
ruin_count <- ruin_count + 1
break
}
}
}
ruin_count / n_simulations
}
# Expected growth rate
expected_growth <- function(win_rate, odds, stake_pct) {
# G = p * log(1 + b*f) + q * log(1 - f)
# where f = stake fraction, b = odds - 1
b <- odds - 1
p <- win_rate
q <- 1 - p
f <- stake_pct
p * log(1 + b * f) + q * log(1 - f)
}
print("Bankroll management system ready!")
Bankroll Management System
Starting bankroll: £1000.00
For 55% model prob at 2.10 odds:
Kelly recommends: 3.5% of bankroll
Stake amount: £35.23
Staking Guidelines
- Never bet more than 5% of bankroll on a single bet
- Use fractional Kelly (25-50%) not full Kelly
- Track Closing Line Value (CLV) as a skill indicator
- Set loss limits per day/week/month
- Don't chase losses with larger stakes
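The advice to use fractional Kelly can be made concrete with the expected log-growth formula G = p*log(1 + b*f) + q*log(1 - f) implemented in the R code above. A minimal Python sketch, using an illustrative (assumed) 55% win probability at decimal odds of 2.10:

```python
import numpy as np

def log_growth(p: float, odds: float, f: float) -> float:
    """Expected log-growth per bet: G = p*ln(1 + b*f) + (1 - p)*ln(1 - f)."""
    b = odds - 1
    return p * np.log(1 + b * f) + (1 - p) * np.log(1 - f)

# Illustrative (assumed) edge: 55% win probability at decimal odds of 2.10
p, odds = 0.55, 2.10
b = odds - 1
full_kelly = (b * p - (1 - p)) / b  # ~14.1% of bankroll

for label, frac in [("Full Kelly", 1.0), ("Half Kelly", 0.5), ("Quarter Kelly", 0.25)]:
    f = full_kelly * frac
    print(f"{label}: stake {f:.1%}, growth per bet {log_growth(p, odds, f):.5f}")

# Over-betting destroys growth: staking twice full Kelly has negative expectancy here
print(f"2x Kelly growth: {log_growth(p, odds, full_kelly * 2):.5f}")
```

With these numbers, half Kelly keeps roughly three quarters of the full-Kelly growth rate at half the stake size, which is why fractional Kelly is the standard recommendation.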
Responsible Gambling
No analytics chapter on betting is complete without addressing responsible gambling. This is non-negotiable content for ethical practice.
Critical Warnings
- The house always has an edge. Long-term, bookmakers profit and most bettors lose.
- Models don't guarantee profits. Even profitable edges can lead to significant losses.
- Never bet money you can't afford to lose. This is not investment advice.
- Gambling can be addictive. If you feel you're losing control, seek help immediately.
- Betting more than planned
- Chasing losses with bigger bets
- Borrowing money to bet
- Lying about betting activity
- Neglecting work, relationships, or health
- Feeling anxious when not betting
- Betting to escape problems
- Set strict budget limits before betting
- Never bet under emotional influence
- Keep detailed records of all bets
- Take regular breaks from betting
- Never borrow to fund betting
- Treat it as entertainment, not income
- Use deposit limits and self-exclusion tools
If you or someone you know has a gambling problem, help is available:
- UK: GambleAware - 0808 8020 133 - begambleaware.org
- UK: GamCare - 0808 8020 133 - gamcare.org.uk
- US: National Council on Problem Gambling - 1-800-522-4700
- International: Gamblers Anonymous - gamblersanonymous.org
# Python: Self-assessment and limit tracking
import pandas as pd
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import Dict, List, Optional
@dataclass
class GamblingLimits:
"""Track and enforce gambling limits."""
daily_loss: float = 50.0
weekly_loss: float = 200.0
monthly_loss: float = 500.0
session_time_mins: int = 60
max_bets_per_day: int = 10
class ResponsibleGamblingTracker:
"""Track gambling behavior for responsible practices."""
def __init__(self, limits: GamblingLimits = None):
self.limits = limits or GamblingLimits()
self.sessions = []
self.daily_results = {}
def start_session(self, mood: str = "neutral") -> Dict:
"""Start a new gambling session."""
session = {
"id": len(self.sessions) + 1,
"start_time": datetime.now(),
"end_time": None,
"bets": [],
"profit_loss": 0,
"mood_before": mood,
"mood_after": None,
"within_limits": True
}
self.sessions.append(session)
# Check remaining limits
status = self.check_limits()
return {
"session_id": session["id"],
"status": status,
"message": self._get_limit_message(status)
}
def check_limits(self) -> Dict:
"""Check current limit status."""
today = datetime.now().date()
week_ago = today - timedelta(days=7)
month_ago = today - timedelta(days=30)
# Calculate losses
daily_loss = sum(
s["profit_loss"] for s in self.sessions
if s["start_time"].date() == today and s["profit_loss"] < 0
)
weekly_loss = sum(
s["profit_loss"] for s in self.sessions
if s["start_time"].date() >= week_ago and s["profit_loss"] < 0
)
monthly_loss = sum(
s["profit_loss"] for s in self.sessions
if s["start_time"].date() >= month_ago and s["profit_loss"] < 0
)
daily_bets = sum(
len(s["bets"]) for s in self.sessions
if s["start_time"].date() == today
)
return {
"daily_loss": abs(daily_loss),
"daily_remaining": max(0, self.limits.daily_loss - abs(daily_loss)),
"weekly_loss": abs(weekly_loss),
"weekly_remaining": max(0, self.limits.weekly_loss - abs(weekly_loss)),
"bets_today": daily_bets,
"bets_remaining": max(0, self.limits.max_bets_per_day - daily_bets),
"should_stop": abs(daily_loss) >= self.limits.daily_loss
}
def _get_limit_message(self, status: Dict) -> str:
"""Generate appropriate warning message."""
if status["should_stop"]:
return "STOP: You have reached your daily loss limit. Please stop betting."
if status["daily_remaining"] < self.limits.daily_loss * 0.2:
return f"WARNING: Only £{status['daily_remaining']:.2f} remaining in daily limit."
if status["bets_remaining"] <= 2:
return f"NOTE: Only {status['bets_remaining']} bets remaining today."
return "Within all limits. Remember to bet responsibly."
def end_session(self, mood_after: str = "neutral",
notes: str = "") -> Dict:
"""End current session and record stats."""
if not self.sessions:
return {"error": "No active session"}
session = self.sessions[-1]
session["end_time"] = datetime.now()
session["mood_after"] = mood_after
session["notes"] = notes
duration = (session["end_time"] - session["start_time"]).total_seconds() / 60
return {
"duration_mins": duration,
"profit_loss": session["profit_loss"],
"within_time_limit": duration <= self.limits.session_time_mins,
"within_loss_limit": session["profit_loss"] > -self.limits.daily_loss
}
def pgsi_screening(self, responses: List[int]) -> Dict:
"""
Problem Gambling Severity Index screening.
responses: List of 9 responses, each 0-3
0 = Never
1 = Sometimes
2 = Most of the time
3 = Almost always
"""
if len(responses) != 9:
return {"error": "PGSI requires exactly 9 responses"}
total = sum(responses)
if total == 0:
risk_level = "Non-problem gambling"
recommendation = "Your gambling appears to be recreational and controlled."
elif total <= 2:
risk_level = "Low risk gambling"
recommendation = "You show few signs of problem gambling, but stay aware."
elif total <= 7:
risk_level = "Moderate risk gambling"
recommendation = "Consider setting stricter limits. Monitor your behavior."
else:
risk_level = "Problem gambling"
recommendation = "Please seek professional support. Help is available."
return {
"score": total,
"max_score": 27,
"risk_level": risk_level,
"recommendation": recommendation,
"seek_help": total >= 8
}
# Example usage
tracker = ResponsibleGamblingTracker()
session = tracker.start_session(mood="excited")
print(f"Session started: {session['message']}")
# Check limits
status = tracker.check_limits()
print(f"\nCurrent Status:")
print(f" Daily remaining: £{status['daily_remaining']:.2f}")
print(f" Weekly remaining: £{status['weekly_remaining']:.2f}")
print(f" Bets remaining today: {status['bets_remaining']}")
# R: Self-assessment and limit tracking
library(tidyverse)
# Gambling behavior tracker
create_behavior_tracker <- function() {
list(
# Set limits
limits = list(
daily_loss = 50,
weekly_loss = 200,
monthly_loss = 500,
session_time_mins = 60
),
# Track sessions
sessions = tibble(
date = as.Date(character()),
start_time = as.POSIXct(character()),
end_time = as.POSIXct(character()),
profit_loss = numeric(),
mood_before = character(),
mood_after = character(),
stuck_to_limits = logical()
),
# Check if approaching limits
check_limits = function(self) {
today <- Sys.Date()
daily_total <- self$sessions %>%
filter(date == today) %>%
summarise(total = sum(profit_loss)) %>%
pull(total)
weekly_total <- self$sessions %>%
filter(date >= today - 7) %>%
summarise(total = sum(profit_loss)) %>%
pull(total)
list(
daily_remaining = self$limits$daily_loss + daily_total,
weekly_remaining = self$limits$weekly_loss + weekly_total,
should_stop = daily_total <= -self$limits$daily_loss
)
}
)
}
# Problem gambling screening (based on PGSI)
pgsi_screening <- function(responses) {
# Responses should be 0-3 for each of 9 questions
# 0 = Never, 1 = Sometimes, 2 = Most of the time, 3 = Almost always
total_score <- sum(responses)
risk_level <- case_when(
total_score == 0 ~ "Non-problem gambling",
total_score <= 2 ~ "Low risk gambling",
total_score <= 7 ~ "Moderate risk gambling",
TRUE ~ "Problem gambling"
)
list(
score = total_score,
risk_level = risk_level,
recommendation = if (total_score >= 3)
"Consider speaking to a professional about your gambling habits"
else
"Continue to monitor and maintain healthy gambling limits"
)
}
print("Behavior tracking system ready")
Practice Exercises
Hands-On Practice
Complete these exercises to apply fantasy and betting analytics:
Build an expected points model for FPL using public xG/xA data. Validate against historical FPL scores to assess accuracy.
Implement the linear programming squad optimizer. Find the optimal £100m squad for a specific gameweek using your xPoints projections.
Collect historical betting odds and match results. Calculate implied probabilities and compare market calibration against your own model.
Build a Fixture Difficulty Rating system using recent xG data. Create visualizations showing fixture swings for all Premier League teams over the next 10 gameweeks.
Analyze the remaining FPL gameweeks and recommend optimal chip timing. Consider Double Gameweeks, Blank Gameweeks, and fixture swings.
Build a Poisson-based over/under model for match totals. Backtest against historical odds to evaluate if your model can find value.
Create a Monte Carlo simulation to project bankroll growth under different staking strategies. Compare flat staking vs Kelly criterion with various edge assumptions.
Track your betting results including both placed and closing odds. Calculate your Closing Line Value (CLV) and analyze whether positive CLV correlates with long-term profitability.
Summary
Key Takeaways
- FPL optimization combines xG/xA projections with scoring system rules
- Squad selection is a constrained optimization problem solvable with linear programming
- Fixture Difficulty Ratings help plan transfers, captaincy, and chip deployment
- Chip timing can swing hundreds of points—plan around DGWs and fixture swings
- Betting odds contain implied probabilities with bookmaker margin (overround)
- Expected value determines whether a bet is theoretically profitable
- Model calibration is as important as accuracy for betting applications
- Kelly criterion helps determine optimal stake sizing—always use fractional Kelly
- Closing Line Value (CLV) is the best indicator of betting skill
- Responsible gambling practices are essential—set limits and stick to them
Common Pitfalls to Avoid
- Chasing FPL template players: High ownership reduces differential potential
- Overweighting recent form: Last 3 games aren't enough sample size
- Ignoring fixture difficulty: A 7-point player vs City isn't the same as vs Sheffield United
- Panic wildcards: One bad week doesn't justify burning a chip
- Ignoring the overround: Betting odds look attractive until you account for margin
- Overstating model confidence: A 5% edge doesn't mean guaranteed profit
- Chasing losses: The surest path to ruin is increasing stakes after losses
- Betting without edge: Entertainment betting is fine, but don't pretend it's investing
- Ignoring variance: Even positive EV bettors face long losing streaks
- Using full Kelly: Full Kelly maximizes growth but also maximizes volatility
Essential Tools and Libraries
| Category | R Libraries | Python Libraries | Purpose |
|---|---|---|---|
| Optimization | lpSolve, ROI | scipy.optimize, PuLP, cvxpy | Squad optimization, lineup selection |
| Data Analysis | tidyverse | pandas, numpy | Data manipulation and statistics |
| Statistical Modeling | stats, MASS | scipy.stats, statsmodels | Poisson models, probability calculations |
| Visualization | ggplot2 | matplotlib, plotly | Fixture tickers, performance charts |
| FPL API Access | fplr, httr2 | fpl, requests | Fetching FPL data |
| Web Scraping | rvest | beautifulsoup4, selenium | Odds data collection |
| Machine Learning | caret, mlr3 | scikit-learn | Prediction models, calibration |
| Simulation | base R | numpy (Monte Carlo) | Bankroll projections, risk of ruin |
FPL Data Sources
- Official FPL API: fantasy.premierleague.com/api/ - Player data, fixtures, gameweeks
- Understat: xG/xA data for top 5 leagues
- FBref: Comprehensive player statistics via StatsBomb
- Fantasy Football Scout: Historical FPL performance data
- FPL Review: Expected points projections
Betting Market Data Sources
- Odds Portal: Historical odds across multiple bookmakers
- Football-Data.co.uk: Free historical odds and results
- Betfair API: Exchange odds and market depth (requires account)
- Pinnacle: Sharpest odds, often used as benchmark
- The Odds API: Real-time odds aggregation
Key Metrics Reference
| Metric | Definition | Good Value |
|---|---|---|
| xPoints | Expected FPL points based on xG/xA | 6+ per gameweek for premiums |
| Value (pts/£) | xPoints divided by price in millions | 0.6+ for good value picks |
| FDR | Fixture Difficulty Rating (1-5) | 1-2 = Easy, 4-5 = Hard |
| Overround | Bookmaker margin on market | ~3-5% for efficient markets |
| Expected Value (EV) | (Prob × Profit) - ((1 - Prob) × Loss) | Positive = theoretically profitable |
| CLV (Closing Line Value) | Difference between bet and closing odds | Positive CLV = beating the market |
| Kelly % | Optimal stake as % of bankroll | Use 25-50% of full Kelly |
| ROI | Profit / Total Staked | 3-5% long-term is excellent |
| Brier Score | Mean squared error of probabilities | Lower is better, ~0.20 is good |
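Several entries in the table follow directly from their definitions. A short sketch with illustrative numbers (the odds and model probability below are assumptions for demonstration, matching the worked example earlier in the chapter):

```python
# Illustrative numbers only: a 1X2 market and one bet we have modelled
odds = [1.65, 3.80, 5.50]                 # home / draw / away decimal odds
overround = sum(1 / o for o in odds) - 1  # bookmaker margin

prob, price = 0.65, 1.65                  # assumed model probability and price taken
ev = prob * (price - 1) - (1 - prob)      # EV per unit staked
b = price - 1
kelly = (b * prob - (1 - prob)) / b       # full Kelly fraction

print(f"Overround: {overround:.1%}")      # ~5.1%
print(f"EV: {ev:+.3f} per unit")
print(f"Full Kelly: {kelly:.1%}; quarter Kelly: {kelly / 4:.1%}")
```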
Final Thoughts
Fantasy football and sports betting can be excellent laboratories for testing analytical skills—they provide immediate feedback on predictions. However, betting should always be approached with caution. The analytics may be fascinating, but the house always has an edge. Use these tools responsibly, set strict limits, and never risk more than you can afford to lose.
Fantasy and betting analytics provide excellent ways to test prediction skills while enjoying the game. In the next chapter, we'll explore the unique challenges and opportunities of women's football analytics.