Chapter 60

Capstone - Complete Analytics System


Fantasy Football & Betting Analytics

Fantasy football and sports betting represent two of the most popular applications of football analytics outside professional clubs. Both require predicting player and match outcomes, but with different optimization goals and constraints.

Fantasy Premier League Analytics

Fantasy Premier League (FPL) is the world's most popular fantasy football game. Analytics can help optimize squad selection, captain choices, and transfer strategy.

FPL Scoring System
  • Minutes: 1pt (1-59min), 2pts (60+min)
  • Goals: 4pts (FWD), 5pts (MID), 6pts (DEF/GK)
  • Assists: 3pts all positions
  • Clean Sheet: 4pts (DEF/GK), 1pt (MID)
  • Saves: 1pt per 3 saves (GK)
  • Bonus: 1-3pts for top performers
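These rules translate directly into code. A minimal scoring function for a single appearance (simplified: bonus points, cards, and own-goal deductions are omitted):

```python
def fpl_points(position, minutes, goals=0, assists=0, clean_sheet=False, saves=0):
    """Score one appearance under the (simplified) FPL rules above."""
    goal_pts = {"GKP": 6, "DEF": 6, "MID": 5, "FWD": 4}
    cs_pts = {"GKP": 4, "DEF": 4, "MID": 1, "FWD": 0}

    pts = 2 if minutes >= 60 else (1 if minutes >= 1 else 0)  # appearance points
    pts += goals * goal_pts[position]                          # goals by position
    pts += assists * 3                                         # assists
    if clean_sheet and minutes >= 60:                          # clean sheet needs 60+ min
        pts += cs_pts[position]
    if position == "GKP":                                      # 1pt per 3 saves
        pts += saves // 3
    return pts

# A midfielder playing 90 min with a goal, an assist and a clean sheet:
# 2 + 5 + 3 + 1 = 11 points
print(fpl_points("MID", 90, goals=1, assists=1, clean_sheet=True))  # 11
```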

Key Metrics
  • xG: Predicts goals scored
  • xA: Predicts assists
  • xGC: Expected goals conceded (clean sheets)
  • xPoints: Expected FPL points
  • ICT Index: FPL's influence/creativity/threat
# Python: FPL expected points model
import pandas as pd
import numpy as np

class FPLProjector:
    """Project expected FPL points for players."""

    GOAL_POINTS = {"GKP": 6, "DEF": 6, "MID": 5, "FWD": 4}
    CS_POINTS = {"GKP": 4, "DEF": 4, "MID": 1, "FWD": 0}

    def __init__(self, player_data):
        self.data = player_data.copy()

    def calculate_xpoints(self):
        """Calculate expected points for all players."""

        df = self.data

        # Minutes points
        df["xpts_minutes"] = np.where(df["expected_minutes"] >= 60, 2,
                            np.where(df["expected_minutes"] >= 1, 1, 0))

        # Goal points
        df["xpts_goals"] = df.apply(
            lambda x: x["xg"] * self.GOAL_POINTS.get(x["position"], 4),
            axis=1
        )

        # Assist points
        df["xpts_assists"] = df["xa"] * 3

        # Clean sheet probability (Poisson: P(0) = e^(-lambda))
        df["cs_prob"] = np.exp(-df["xgc"].fillna(2))
        df["xpts_cs"] = df.apply(
            lambda x: x["cs_prob"] * self.CS_POINTS.get(x["position"], 0)
                      if x["expected_minutes"] >= 60 else 0,
            axis=1
        )

        # Save points (GK only)
        df["xpts_saves"] = np.where(
            df["position"] == "GKP",
            df["expected_saves"].fillna(0) / 3,
            0
        )

        # Bonus points estimate (simplified)
        df["xpts_bonus"] = df["xg"] * 0.8 + df["xa"] * 0.5

        # Total
        df["xpoints"] = (df["xpts_minutes"] + df["xpts_goals"] +
                        df["xpts_assists"] + df["xpts_cs"] +
                        df["xpts_saves"] + df["xpts_bonus"])

        # Value (points per million)
        df["value"] = df["xpoints"] / (df["price"] / 10)

        return df

    def rank_by_value(self, position=None, min_price=None, max_price=None):
        """Rank players by value with optional filters."""

        df = self.calculate_xpoints()

        if position:
            df = df[df["position"] == position]
        if min_price:
            df = df[df["price"] >= min_price]
        if max_price:
            df = df[df["price"] <= max_price]

        return df.sort_values("value", ascending=False)

    def captain_picks(self, gameweek_fixtures):
        """Recommend captain picks for gameweek."""

        df = self.calculate_xpoints()

        # Factor in fixture difficulty
        df = df.merge(gameweek_fixtures, on="team")
        df["adjusted_xpts"] = df["xpoints"] * (1 + (3 - df["fdr"]) * 0.1)

        return df.nlargest(5, "adjusted_xpts")[
            ["name", "position", "xpoints", "fdr", "adjusted_xpts"]
        ]

# Example usage
players = pd.DataFrame({
    "name": ["Haaland", "Salah", "Trippier", "Raya", "Saka"],
    "position": ["FWD", "MID", "DEF", "GKP", "MID"],
    "team": ["MCI", "LIV", "NEW", "ARS", "ARS"],
    "price": [14.0, 12.5, 6.5, 5.5, 9.0],
    "xg": [0.85, 0.52, 0.08, 0.0, 0.35],
    "xa": [0.12, 0.35, 0.22, 0.0, 0.28],
    "xgc": [np.nan, np.nan, 1.1, 0.95, np.nan],
    "expected_minutes": [85, 88, 90, 90, 85],
    "expected_saves": [np.nan, np.nan, np.nan, 3.2, np.nan]
})

projector = FPLProjector(players)
results = projector.calculate_xpoints()
print(results[["name", "position", "price", "xpoints", "value"]].to_string())

# R: FPL expected points model
library(tidyverse)

# Calculate expected FPL points
calculate_xpoints <- function(player_data) {
  player_data %>%
    mutate(
      # Base points for playing
      xpoints_minutes = case_when(
        expected_minutes >= 60 ~ 2,
        expected_minutes >= 1 ~ 1,
        TRUE ~ 0
      ),

      # Goals (position-dependent)
      goal_points = case_when(
        position == "GKP" ~ 6,
        position == "DEF" ~ 6,
        position == "MID" ~ 5,
        position == "FWD" ~ 4
      ),
      xpoints_goals = xg * goal_points,

      # Assists
      xpoints_assists = xa * 3,

      # Clean sheets (for defenders and goalkeepers)
      cs_probability = exp(-replace_na(xgc, 2)),  # Poisson P(0 goals); assume xgc = 2 when missing
      cs_points = case_when(
        position %in% c("GKP", "DEF") ~ 4,
        position == "MID" ~ 1,
        TRUE ~ 0
      ),
      xpoints_cs = cs_probability * cs_points * (expected_minutes >= 60),

      # Saves (goalkeepers only)
      xpoints_saves = if_else(position == "GKP",
                              expected_saves / 3, 0),

      # Total expected points
      xpoints = xpoints_minutes + xpoints_goals + xpoints_assists +
                xpoints_cs + xpoints_saves,

      # Value calculation
      value = xpoints / (price / 10)
    )
}

# Example usage
players <- tribble(
  ~name, ~position, ~price, ~xg, ~xa, ~xgc, ~expected_minutes, ~expected_saves,
  "Haaland", "FWD", 14.0, 0.85, 0.12, NA, 85, NA,
  "Salah", "MID", 12.5, 0.52, 0.35, NA, 88, NA,
  "Trippier", "DEF", 6.5, 0.08, 0.22, 1.1, 90, NA,
  "Raya", "GKP", 5.5, 0, 0, 0.95, 90, 3.2
)

fpl_projections <- calculate_xpoints(players)
fpl_projections %>%
  select(name, position, price, xpoints, value) %>%
  arrange(desc(xpoints))
Output
       name position  price   xpoints     value
0   Haaland      FWD   14.0  6.500000  4.642857
1     Salah      MID   12.5  6.376335  5.101068
2  Trippier      DEF    6.5  4.645484  7.146899
3      Raya      GKP    5.5  4.613631  8.388420
4      Saka      MID    9.0  5.145335  5.717039

Squad Optimization

Building an optimal FPL squad is a constrained optimization problem: maximize expected points subject to budget, position limits, and team limits.
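The same idea on a toy scale, before the full 15-player optimizer: pick exactly 3 players from a hypothetical 6-player pool under a £25m cap, using scipy's mixed-integer solver (pool, prices and cap are made up for illustration):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical pool: expected points and prices (£m)
xpoints = np.array([6.5, 6.4, 5.1, 4.6, 4.6, 3.5])
prices = np.array([14.0, 12.5, 9.0, 6.5, 5.5, 4.5])
n = len(xpoints)

# Maximize xpoints @ x  <=>  minimize -xpoints @ x, with x binary
res = milp(
    c=-xpoints,
    constraints=[
        LinearConstraint(prices.reshape(1, -1), -np.inf, 25.0),  # budget cap
        LinearConstraint(np.ones((1, n)), 3, 3),                 # exactly 3 picks
    ],
    integrality=np.ones(n),   # integer variables; with bounds (0, 1) they are binary
    bounds=Bounds(0, 1),
)

picked = np.where(res.x > 0.5)[0]
print("picked:", picked, "xpoints:", round(xpoints[picked].sum(), 1))
```

Note the solver skips the most expensive player: two mid-priced picks beat one premium plus cheap filler here, which is exactly the trade-off the full optimizer makes across 15 slots.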

# Python: FPL squad optimization
from scipy.optimize import milp, LinearConstraint, Bounds
import numpy as np
import pandas as pd

class FPLOptimizer:
    """Optimize FPL squad selection."""

    SQUAD_STRUCTURE = {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}
    MAX_PER_TEAM = 3

    def __init__(self, player_pool):
        self.players = player_pool.copy()
        self.n_players = len(player_pool)

    def optimize_squad(self, budget=100.0):
        """Find optimal squad within constraints."""

        # Objective: maximize xpoints (minimize negative)
        c = -self.players["xpoints"].values

        # Variable bounds (binary: 0 or 1)
        integrality = np.ones(self.n_players)  # All binary

        # Constraints
        constraints = []

        # Budget constraint: sum(price * selected) <= budget
        A_budget = self.players["price"].values.reshape(1, -1)
        constraints.append(LinearConstraint(A_budget, -np.inf, budget))

        # Position constraints: exactly N players per position
        for pos, count in self.SQUAD_STRUCTURE.items():
            A_pos = (self.players["position"] == pos).astype(int).values
            constraints.append(LinearConstraint(A_pos.reshape(1, -1), count, count))

        # Team constraints: max 3 per team
        for team in self.players["team"].unique():
            A_team = (self.players["team"] == team).astype(int).values
            constraints.append(LinearConstraint(A_team.reshape(1, -1),
                                               -np.inf, self.MAX_PER_TEAM))

        # Total squad size = 15
        A_total = np.ones((1, self.n_players))
        constraints.append(LinearConstraint(A_total, 15, 15))

        # Solve
        bounds = Bounds(0, 1)
        result = milp(c, constraints=constraints, integrality=integrality,
                     bounds=bounds)

        if result.success:
            selected_idx = np.where(result.x > 0.5)[0]
            squad = self.players.iloc[selected_idx].copy()

            return {
                "squad": squad,
                "total_xpoints": squad["xpoints"].sum(),
                "total_cost": squad["price"].sum(),
                "remaining_budget": budget - squad["price"].sum()
            }

        return None

    def optimize_with_existing(self, budget, existing_players,
                               free_transfers=1, transfer_cost=4):
        """Optimize considering existing squad and transfer costs."""

        # Add transfer penalty to players not in existing squad
        self.players["transfer_penalty"] = np.where(
            self.players["name"].isin(existing_players),
            0,
            transfer_cost
        )

        # Adjust objective to account for transfers beyond free
        # This is a simplified version - full implementation would be more complex

        return self.optimize_squad(budget)

    def find_differentials(self, ownership_threshold=5.0):
        """Find high-value low-ownership players."""

        df = self.players.copy()
        differentials = df[
            (df["ownership"] < ownership_threshold) &
            (df["xpoints"] > df["xpoints"].median())
        ].sort_values("value", ascending=False)

        return differentials.head(10)

# Example usage
# optimizer = FPLOptimizer(all_players)
# result = optimizer.optimize_squad(budget=100.0)
# print(f"Optimal squad: {result['total_xpoints']:.1f} xPts, £{result['total_cost']:.1f}m")

# R: FPL squad optimization with linear programming
library(lpSolve)
library(tidyverse)

optimize_fpl_squad <- function(players, budget = 100) {
  n <- nrow(players)

  # Objective: maximize expected points
  objective <- players$xpoints

  # Constraints matrix
  constraints <- rbind(
    # Budget constraint
    players$price,

    # Position constraints (exactly 2 GK, 5 DEF, 5 MID, 3 FWD)
    as.numeric(players$position == "GKP"),
    as.numeric(players$position == "DEF"),
    as.numeric(players$position == "MID"),
    as.numeric(players$position == "FWD"),

    # Team constraints (max 3 from each team) - one indicator row per team
    t(sapply(unique(players$team), function(t)
      as.numeric(players$team == t)))
  )

  # Constraint directions and RHS
  directions <- c(
    "<=",           # Budget
    "==", "==", "==", "==",  # Positions
    rep("<=", length(unique(players$team)))  # Teams
  )

  rhs <- c(
    budget,         # Budget
    2, 5, 5, 3,    # Positions
    rep(3, length(unique(players$team)))  # Teams (max 3)
  )

  # Solve
  solution <- lp(
    direction = "max",
    objective.in = objective,
    const.mat = constraints,
    const.dir = directions,
    const.rhs = rhs,
    all.bin = TRUE
  )

  # Extract selected players
  selected <- players[solution$solution == 1, ]

  list(
    squad = selected,
    total_xpoints = sum(selected$xpoints),
    total_cost = sum(selected$price),
    remaining_budget = budget - sum(selected$price)
  )
}

# Run optimization
# result <- optimize_fpl_squad(all_players, budget = 100)
# result$squad %>% arrange(position, desc(xpoints))

Fixture Difficulty Ratings

Fixture difficulty is crucial for FPL planning. Understanding which teams face easier or harder runs helps with captain choices, transfers, and chip timing.
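The core mechanic behind any FDR system is binning a continuous strength estimate into discrete 1-5 bands. A minimal sketch using pd.qcut on hypothetical per-team xG-conceded figures (teams and numbers are illustrative; 1 = easiest fixture, 5 = hardest):

```python
import pandas as pd

# Hypothetical defensive records: mean xG conceded per match
strength = pd.DataFrame({
    "team": ["MCI", "ARS", "LIV", "AVL", "NEW", "CHE", "BHA", "WHU", "BUR", "SHU"],
    "xg_conceded": [0.8, 0.9, 1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.9, 2.1],
})

# Quintile banding: leaky defences are the easiest to score against (FDR 1),
# stingy defences the hardest (FDR 5); negating flips the sort direction
strength["fdr_attack"] = pd.qcut(
    -strength["xg_conceded"], q=5, labels=[1, 2, 3, 4, 5]
).astype(int)

print(strength.sort_values("fdr_attack"))
```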

# Python: Fixture Difficulty Rating system
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple
from dataclasses import dataclass

@dataclass
class FixtureRating:
    """Rating for a single fixture."""
    opponent: str
    is_home: bool
    fdr_attack: int  # 1-5 (difficulty to score)
    fdr_defense: int  # 1-5 (difficulty to keep clean sheet)
    fdr_overall: int  # 1-5 (overall difficulty)

class FDRCalculator:
    """Calculate Fixture Difficulty Ratings for FPL."""

    def __init__(self, matches_df: pd.DataFrame, lookback_days: int = 180):
        self.matches = matches_df.copy()
        self.lookback_days = lookback_days
        self.team_strength = self._calculate_team_strength()

    def _calculate_team_strength(self) -> pd.DataFrame:
        """Calculate attack and defense strength for each team."""

        recent = self.matches[
            self.matches["date"] >= self.matches["date"].max() - pd.Timedelta(days=self.lookback_days)
        ]

        # Home stats (mean xG for/against in home matches)
        home_stats = recent.groupby("home_team").agg({
            "home_xg": "mean",
            "away_xg": "mean"
        }).rename(columns={
            "home_xg": "home_xg_for",
            "away_xg": "home_xg_against"
        })

        # Away stats (mean xG for/against in away matches)
        away_stats = recent.groupby("away_team").agg({
            "away_xg": "mean",
            "home_xg": "mean"
        }).rename(columns={
            "away_xg": "away_xg_for",
            "home_xg": "away_xg_against"
        })

        # Combine (keeping only the renamed columns avoids overlapping names on join)
        strength = home_stats.join(away_stats, how="outer").fillna(0)

        # Calculate overall metrics
        strength["attack_strength"] = (strength["home_xg_for"] + strength["away_xg_for"]) / 2
        strength["defense_strength"] = (strength["home_xg_against"] + strength["away_xg_against"]) / 2
        strength["overall_strength"] = strength["attack_strength"] - strength["defense_strength"]

        # FDR scores (1-5, higher = harder fixture against this opponent)
        strength["fdr_attack"] = pd.qcut(
            -strength["defense_strength"],  # stingy defences are hard to score against
            q=5, labels=[1, 2, 3, 4, 5]
        ).astype(int)

        strength["fdr_defense"] = pd.qcut(
            strength["attack_strength"],  # strong attacks threaten clean sheets
            q=5, labels=[1, 2, 3, 4, 5]
        ).astype(int)

        strength["fdr_overall"] = pd.qcut(
            strength["overall_strength"],  # stronger opponents rate harder
            q=5, labels=[1, 2, 3, 4, 5]
        ).astype(int)

        return strength.rename_axis("team").reset_index()

    def get_fixture_difficulty(self, team: str, opponent: str,
                               is_home: bool) -> FixtureRating:
        """Get difficulty rating for a specific fixture."""

        opp_data = self.team_strength[
            self.team_strength["team"] == opponent
        ].iloc[0]

        # Home advantage: rate home fixtures one band easier (simplified)
        fdr_adj = 1 if is_home else 0

        return FixtureRating(
            opponent=opponent,
            is_home=is_home,
            fdr_attack=max(1, int(opp_data["fdr_attack"]) - fdr_adj),
            fdr_defense=max(1, int(opp_data["fdr_defense"]) - fdr_adj),
            fdr_overall=max(1, int(opp_data["fdr_overall"]) - fdr_adj)
        )

    def calculate_fixture_run(self, fixtures_df: pd.DataFrame,
                             n_gameweeks: int = 6) -> pd.DataFrame:
        """Calculate fixture difficulty for upcoming gameweeks."""

        results = []

        for team in fixtures_df["team"].unique():
            team_fixtures = fixtures_df[
                fixtures_df["team"] == team
            ].head(n_gameweeks)

            ratings = []
            for _, fix in team_fixtures.iterrows():
                rating = self.get_fixture_difficulty(
                    team, fix["opponent"], fix["is_home"]
                )
                ratings.append(rating.fdr_overall)

            results.append({
                "team": team,
                "avg_fdr": np.mean(ratings),
                "easy_count": sum(1 for r in ratings if r <= 2),
                "hard_count": sum(1 for r in ratings if r >= 4),
                "fixture_swing": sum(1 for r in ratings if r <= 2) - sum(1 for r in ratings if r >= 4)
            })

        return pd.DataFrame(results).sort_values("avg_fdr")

    def find_double_gameweek_targets(self, fixtures_df: pd.DataFrame) -> pd.DataFrame:
        """Identify teams with double gameweeks."""

        dgw = fixtures_df.groupby(["team", "gameweek"]).size().reset_index(name="fixtures")
        dgw = dgw[dgw["fixtures"] > 1]

        # Add FDR for DGW fixtures
        dgw_details = []
        for _, row in dgw.iterrows():
            team_gw = fixtures_df[
                (fixtures_df["team"] == row["team"]) &
                (fixtures_df["gameweek"] == row["gameweek"])
            ]
            fdrs = [
                self.get_fixture_difficulty(row["team"], f["opponent"], f["is_home"]).fdr_overall
                for _, f in team_gw.iterrows()
            ]
            dgw_details.append({
                "team": row["team"],
                "gameweek": row["gameweek"],
                "fixtures": row["fixtures"],
                "avg_fdr": np.mean(fdrs)
            })

        return pd.DataFrame(dgw_details).sort_values(["gameweek", "avg_fdr"])

class FixtureSwingAnalyzer:
    """Analyze fixture swings for transfer planning."""

    def __init__(self, fdr_calc: FDRCalculator, fixtures: pd.DataFrame):
        self.fdr = fdr_calc
        self.fixtures = fixtures

    def find_rotation_pairs(self) -> List[Tuple[str, str, float]]:
        """Find pairs of teams whose fixtures rotate well."""

        teams = self.fixtures["team"].unique()
        pairs = []

        for i, team1 in enumerate(teams):
            for team2 in teams[i+1:]:
                # Get next 10 fixtures for each
                fix1 = self.fixtures[self.fixtures["team"] == team1].head(10)
                fix2 = self.fixtures[self.fixtures["team"] == team2].head(10)

                if len(fix1) == 10 and len(fix2) == 10:
                    # Check if they complement each other
                    rotation_score = self._calculate_rotation_score(fix1, fix2)
                    if rotation_score > 7:  # Good rotation
                        pairs.append((team1, team2, rotation_score))

        return sorted(pairs, key=lambda x: x[2], reverse=True)

    def _calculate_rotation_score(self, fix1: pd.DataFrame,
                                  fix2: pd.DataFrame) -> float:
        """Score how well two fixture lists rotate."""

        score = 0
        for i in range(min(len(fix1), len(fix2))):
            fdr1 = self.fdr.get_fixture_difficulty(
                fix1.iloc[i]["team"],
                fix1.iloc[i]["opponent"],
                fix1.iloc[i]["is_home"]
            ).fdr_overall

            fdr2 = self.fdr.get_fixture_difficulty(
                fix2.iloc[i]["team"],
                fix2.iloc[i]["opponent"],
                fix2.iloc[i]["is_home"]
            ).fdr_overall

            # Best rotation: one easy, one hard
            if (fdr1 <= 2 and fdr2 >= 4) or (fdr1 >= 4 and fdr2 <= 2):
                score += 1
            # Good: one easy, one medium
            elif (fdr1 <= 2 and fdr2 == 3) or (fdr1 == 3 and fdr2 <= 2):
                score += 0.5

        return score

print("FDR Calculator initialized!")

# R: Fixture Difficulty Rating system
library(tidyverse)

# Build comprehensive FDR model
calculate_fdr <- function(matches_data) {

    # Calculate home and away strength
    team_strength <- matches_data %>%
        filter(date >= max(date) - 180) %>%  # Last 6 months
        group_by(home_team) %>%
        summarise(
            home_xg_for = mean(home_xg),
            home_xg_against = mean(away_xg),
            home_points = sum(case_when(
                home_goals > away_goals ~ 3,
                home_goals == away_goals ~ 1,
                TRUE ~ 0
            )) / n(),
            .groups = "drop"
        ) %>%
        rename(team = home_team) %>%
        left_join(
            matches_data %>%
                filter(date >= max(date) - 180) %>%
                group_by(away_team) %>%
                summarise(
                    away_xg_for = mean(away_xg),
                    away_xg_against = mean(home_xg),
                    away_points = sum(case_when(
                        away_goals > home_goals ~ 3,
                        away_goals == home_goals ~ 1,
                        TRUE ~ 0
                    )) / n(),
                    .groups = "drop"
                ) %>%
                rename(team = away_team),
            by = "team"
        ) %>%
        mutate(
            # Overall attacking/defensive strength
            attack_strength = (home_xg_for + away_xg_for) / 2,
            defense_strength = (home_xg_against + away_xg_against) / 2,
            overall_strength = attack_strength - defense_strength,

            # FDR score (1-5 scale, lower = easier fixture vs this opponent)
            fdr_attack = ntile(-defense_strength, 5),  # Easier to attack weak (leaky) defenses
            fdr_defense = ntile(attack_strength, 5),   # Harder to keep CS vs strong attacks
            fdr_overall = ntile(overall_strength, 5)   # Stronger opponents rate harder
        )

    team_strength
}

# Calculate fixture runs
calculate_fixture_runs <- function(fixtures, fdr_data, n_gameweeks = 6) {

    fixtures %>%
        filter(gameweek < min(gameweek) + n_gameweeks) %>%  # next n gameweeks only
        left_join(
            fdr_data %>% select(team, fdr_overall),
            by = c("opponent" = "team")
        ) %>%
        group_by(team) %>%
        summarise(
            next_n_fdr = mean(fdr_overall),
            easy_fixtures = sum(fdr_overall <= 2),
            hard_fixtures = sum(fdr_overall >= 4),
            fixture_swing = easy_fixtures - hard_fixtures,
            .groups = "drop"
        ) %>%
        arrange(next_n_fdr)
}

# Fixture ticker for dashboard
create_fixture_ticker <- function(fixtures, fdr_data, team) {
    fixtures %>%
        filter(team == !!team) %>%
        head(10) %>%
        left_join(fdr_data %>% select(team, fdr_overall),
                  by = c("opponent" = "team")) %>%
        mutate(
            fdr_color = case_when(
                fdr_overall == 1 ~ "#00FF00",  # Bright green
                fdr_overall == 2 ~ "#90EE90",  # Light green
                fdr_overall == 3 ~ "#FFFF00",  # Yellow
                fdr_overall == 4 ~ "#FFA500",  # Orange
                fdr_overall == 5 ~ "#FF0000"   # Red
            ),
            display = paste0(
                opponent, " (",
                ifelse(is_home, "H", "A"), ")"
            )
        )
}

print("FDR calculation system ready!")

FPL Chip Strategy & Optimization

FPL chips (Wildcard, Bench Boost, Triple Captain, Free Hit) can each be worth a substantial points swing when deployed well. Understanding when to play them is crucial for a top finish.

# Python: Chip optimization strategy
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import List, Dict, Optional
from scipy.optimize import milp, LinearConstraint, Bounds

@dataclass
class ChipOpportunity:
    """Represents a chip deployment opportunity."""
    gameweek: int
    chip_type: str
    score: float
    reasoning: str
    recommended_action: str

class ChipOptimizer:
    """Optimize FPL chip deployment."""

    def __init__(self, fixtures: pd.DataFrame, player_pool: pd.DataFrame,
                 fdr_data: pd.DataFrame):
        self.fixtures = fixtures
        self.players = player_pool
        self.fdr = fdr_data

    def find_bench_boost_opportunities(self,
                                       remaining_gws: List[int]) -> List[ChipOpportunity]:
        """Find optimal Bench Boost gameweeks."""

        opportunities = []

        for gw in remaining_gws:
            gw_fixtures = self.fixtures[self.fixtures["gameweek"] == gw]

            # Count doubles and blanks
            team_counts = gw_fixtures.groupby("team").size()
            doubles = (team_counts == 2).sum()
            blanks = len(set(self.players["team"]) - set(gw_fixtures["team"]))

            # Average FDR
            avg_fdr = gw_fixtures.merge(self.fdr[["team", "fdr_overall"]],
                                        left_on="opponent", right_on="team",
                                        how="left")["fdr_overall"].mean()

            # Score the opportunity
            score = doubles * 3 + (5 - avg_fdr) * 2 - blanks * 2

            if score > 5:
                opportunities.append(ChipOpportunity(
                    gameweek=gw,
                    chip_type="Bench Boost",
                    score=score,
                    reasoning=f"{doubles} DGWs, avg FDR {avg_fdr:.1f}",
                    recommended_action=f"Consider BB in GW{gw}" if score > 8 else "Monitor"
                ))

        return sorted(opportunities, key=lambda x: x.score, reverse=True)

    def find_triple_captain_targets(self, gameweek: int,
                                    squad: List[str]) -> pd.DataFrame:
        """Find best Triple Captain picks for a gameweek."""

        # Get fixtures for the gameweek
        gw_fixtures = self.fixtures[self.fixtures["gameweek"] == gameweek]

        # Find players with doubles or great fixtures
        squad_players = self.players[self.players["name"].isin(squad)]

        results = []
        for _, player in squad_players.iterrows():
            team_fixtures = gw_fixtures[gw_fixtures["team"] == player["team"]]

            if len(team_fixtures) == 0:
                continue

            # Calculate TC expected points
            fixture_count = len(team_fixtures)
            avg_fdr = team_fixtures.merge(
                self.fdr[["team", "fdr_overall"]],
                left_on="opponent", right_on="team"
            )["fdr_overall"].mean()

            # Adjust xpoints for fixture count and difficulty
            base_xpts = player["xpoints"]
            adj_xpts = base_xpts * fixture_count * (1 + (3 - avg_fdr) * 0.1)
            tc_expected = adj_xpts * 3  # Triple points

            results.append({
                "name": player["name"],
                "position": player["position"],
                "team": player["team"],
                "fixtures": fixture_count,
                "avg_fdr": avg_fdr,
                "base_xpts": base_xpts,
                "tc_expected": tc_expected
            })

        return pd.DataFrame(results).sort_values("tc_expected", ascending=False)

    def optimize_free_hit_squad(self, gameweek: int,
                                budget: float = 100.0) -> Optional[Dict]:
        """Build optimal Free Hit squad for a specific gameweek."""

        gw_fixtures = self.fixtures[self.fixtures["gameweek"] == gameweek]
        playing_teams = set(gw_fixtures["team"])

        # Filter to players with fixtures
        available = self.players[self.players["team"].isin(playing_teams)].copy()

        # Adjust for double gameweeks
        team_counts = gw_fixtures.groupby("team").size().to_dict()
        available["gw_fixtures"] = available["team"].map(team_counts)

        # Add FDR adjustment
        available = available.merge(
            gw_fixtures.groupby("team")["opponent"].apply(list).reset_index(),
            on="team", how="left"
        )

        # Calculate adjusted xpoints for this gameweek
        def calc_gw_xpts(row):
            base = row["xpoints"]
            fixtures = row["gw_fixtures"]
            return base * fixtures * 1.1  # Slight boost for DGW

        available["gw_xpoints"] = available.apply(calc_gw_xpts, axis=1)

        # Now optimize (using existing optimizer)
        n = len(available)

        # Objective: maximize gw_xpoints
        c = -available["gw_xpoints"].values

        constraints = []

        # Budget
        A_budget = available["price"].values.reshape(1, -1)
        constraints.append(LinearConstraint(A_budget, -np.inf, budget))

        # Position constraints
        for pos, count in {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}.items():
            A_pos = (available["position"] == pos).astype(int).values
            constraints.append(LinearConstraint(A_pos.reshape(1, -1), count, count))

        # Team constraints
        for team in available["team"].unique():
            A_team = (available["team"] == team).astype(int).values
            constraints.append(LinearConstraint(A_team.reshape(1, -1),
                                               -np.inf, 3))

        # Total = 15
        constraints.append(LinearConstraint(np.ones((1, n)), 15, 15))

        # Solve
        integrality = np.ones(n)
        result = milp(c, constraints=constraints, integrality=integrality,
                     bounds=Bounds(0, 1))

        if result.success:
            selected_idx = np.where(result.x > 0.5)[0]
            squad = available.iloc[selected_idx]

            return {
                "squad": squad[["name", "position", "team", "price",
                              "gw_fixtures", "gw_xpoints"]],
                "total_xpoints": squad["gw_xpoints"].sum(),
                "total_cost": squad["price"].sum()
            }

        return None

    def wildcard_timing_analysis(self, current_gw: int,
                                 remaining_gws: List[int]) -> pd.DataFrame:
        """Analyze optimal wildcard timing."""

        swing_analysis = []

        for gw in remaining_gws:
            # Calculate fixture swing from current GW to target
            upcoming = self.fixtures[
                (self.fixtures["gameweek"] >= gw) &
                (self.fixtures["gameweek"] < gw + 6)
            ]

            team_fdrs = upcoming.merge(
                self.fdr[["team", "fdr_overall"]],
                left_on="opponent", right_on="team"
            ).groupby("team_x")["fdr_overall"].mean().reset_index()

            # Find teams with improving fixtures
            improving_teams = team_fdrs[team_fdrs["fdr_overall"] <= 2.5]

            swing_analysis.append({
                "wildcard_gw": gw,
                "easy_fixture_teams": len(improving_teams),
                "best_teams": improving_teams.nsmallest(5, "fdr_overall")["team_x"].tolist()
            })

        return pd.DataFrame(swing_analysis)

# Example usage
print("Chip Optimizer ready for deployment!")

# Usage pattern:
# optimizer = ChipOptimizer(fixtures_df, players_df, fdr_df)
# bb_opps = optimizer.find_bench_boost_opportunities([30, 31, 32, 33, 34])
# tc_picks = optimizer.find_triple_captain_targets(34, my_squad)
# fh_squad = optimizer.optimize_free_hit_squad(29, budget=100.0)

# R: Chip optimization strategy
library(tidyverse)

# Chip timing optimizer
analyze_chip_opportunities <- function(fixtures, fdr_data, player_pool) {

    # Find best Bench Boost gameweeks
    find_bb_opportunities <- function(fixtures, n_gw = 10) {
        fixtures %>%
            group_by(gameweek) %>%
            summarise(
                double_gw_count = sum(is_double_gw),
                avg_fdr = mean(fdr),
                easy_fixture_count = sum(fdr <= 2),
                .groups = "drop"
            ) %>%
            mutate(
                bb_score = double_gw_count * 3 + easy_fixture_count +
                          (5 - avg_fdr) * 2
            ) %>%
            arrange(desc(bb_score)) %>%
            head(n_gw)
    }

    # Find best Triple Captain targets
    find_tc_opportunities <- function(player_pool, fixtures) {
        player_pool %>%
            filter(position %in% c("MID", "FWD")) %>%
            left_join(fixtures, by = "team") %>%
            filter(is_double_gw | fdr <= 2) %>%
            mutate(
                tc_score = xpoints * (1 + is_double_gw + (3 - fdr) * 0.2)
            ) %>%
            arrange(desc(tc_score)) %>%
            head(10)
    }

    # Free Hit gameweek analysis
    find_fh_opportunities <- function(fixtures) {
        fixtures %>%
            group_by(gameweek) %>%
            summarise(
                blank_count = sum(is_blank),
                double_count = sum(is_double_gw),
                avg_fdr = mean(fdr, na.rm = TRUE),
                .groups = "drop"
            ) %>%
            mutate(
                fh_score = blank_count * 5 +  # High value if many blanks
                          double_count * 2 +
                          (5 - avg_fdr)
            ) %>%
            filter(fh_score >= 5) %>%
            arrange(desc(fh_score))
    }

    list(
        bench_boost = find_bb_opportunities(fixtures),
        triple_captain = find_tc_opportunities(player_pool, fixtures),
        free_hit = find_fh_opportunities(fixtures)
    )
}

# Wildcard planning
plan_wildcard <- function(current_squad, player_pool, target_gw,
                          budget = 100, upcoming_fixtures) {

    # Calculate value scores for all players
    player_pool <- player_pool %>%
        left_join(
            upcoming_fixtures %>%
                group_by(team) %>%
                summarise(avg_fdr = mean(fdr), .groups = "drop"),
            by = "team"
        ) %>%
        mutate(
            # Adjust xpoints by fixture difficulty
            adj_xpoints = xpoints * (1 + (3 - avg_fdr) * 0.15),
            value = adj_xpoints / (price / 10)
        )

    # Find optimal new squad
    list(
        transfers_needed = sum(!current_squad$name %in% player_pool$name),
        top_picks_by_position = player_pool %>%
            group_by(position) %>%
            slice_max(value, n = 5) %>%
            select(name, team, price, xpoints, adj_xpoints, value)
    )
}

print("Chip strategy analyzer ready!")
Chip | Optimal Timing | Key Factors | Common Mistakes
Bench Boost | Double Gameweek with all 15 players playing | All bench players have fixtures, ideally doubles | Using it without a full playing squad
Triple Captain | Premium player with a DGW or two easy fixtures | High-xG player, good fixtures, in form | Chasing last year's TC pick
Free Hit | Blank Gameweek when many teams don't play | Your normal squad would have many blanks | Using it for a DGW instead of a blank
Wildcard | Major fixture swing or injury crisis | 4+ transfers needed, fixture improvement | Panic wildcard after one bad week

Betting Market Analysis

Understanding betting markets helps evaluate model performance and market efficiency. Odds contain valuable information about probability estimates.

betting_analysis
# Python: Betting odds analysis
import pandas as pd
import numpy as np
from fractions import Fraction

class OddsAnalyzer:
    """Analyze betting odds and implied probabilities."""

    def convert_odds(self, decimal_odds):
        """Convert decimal odds to all common formats."""
        # Fractional odds: profit per unit stake, as a reduced fraction
        # (e.g. 1.65 -> 13/20, not the integer-only approximation)
        frac = Fraction(decimal_odds - 1).limit_denominator(100)
        return {
            "decimal": decimal_odds,
            "fractional": f"{frac.numerator}/{frac.denominator}",
            "american": f"+{round((decimal_odds - 1) * 100)}" if decimal_odds >= 2
                        else f"-{round(100 / (decimal_odds - 1))}",
            "implied_prob": 1 / decimal_odds
        }

    def calculate_implied_probabilities(self, home_odds, draw_odds, away_odds):
        """Calculate true probabilities from betting odds."""

        # Raw implied probabilities
        home_prob = 1 / home_odds
        draw_prob = 1 / draw_odds
        away_prob = 1 / away_odds

        # Overround (bookmaker margin)
        overround = home_prob + draw_prob + away_prob

        # True probabilities (margin removed)
        return pd.DataFrame({
            "outcome": ["Home", "Draw", "Away"],
            "odds": [home_odds, draw_odds, away_odds],
            "raw_implied": [home_prob, draw_prob, away_prob],
            "true_implied": [home_prob/overround, draw_prob/overround,
                           away_prob/overround],
            "overround_pct": [(overround - 1) * 100] * 3
        })

    def calculate_expected_value(self, model_prob, odds):
        """Calculate expected value of a bet."""
        # EV = (probability * profit) - (1 - probability) * stake
        # For unit stake: EV = (prob * (odds - 1)) - ((1 - prob) * 1)
        ev = (model_prob * (odds - 1)) - (1 - model_prob)
        return ev

    def find_value_bets(self, matches_df, model_probs_col, odds_col,
                       ev_threshold=0.05):
        """Identify bets where model shows positive expected value."""

        df = matches_df.copy()

        df["implied_prob"] = 1 / df[odds_col]
        df["ev"] = df.apply(
            lambda x: self.calculate_expected_value(x[model_probs_col], x[odds_col]),
            axis=1
        )
        df["edge"] = df[model_probs_col] - df["implied_prob"]

        value_bets = df[df["ev"] > ev_threshold].sort_values("ev", ascending=False)
        return value_bets

    def kelly_criterion(self, model_prob, odds, fraction=0.25):
        """Calculate Kelly stake as percentage of bankroll."""
        # Full Kelly: f* = (bp - q) / b
        # where b = odds - 1, p = prob of win, q = prob of loss

        b = odds - 1
        p = model_prob
        q = 1 - p

        kelly = (b * p - q) / b

        # Apply fractional Kelly for safety
        return max(0, kelly * fraction)

# Example usage
analyzer = OddsAnalyzer()

# Analyze match odds
probs = analyzer.calculate_implied_probabilities(1.65, 3.80, 5.50)
print("Implied Probabilities:")
print(probs.to_string(index=False))
print(f"\nBookmaker margin: {probs['overround_pct'].iloc[0]:.1f}%")

# Calculate EV for a bet
model_prob = 0.65  # Our model says 65% chance of home win
home_odds = 1.65

ev = analyzer.calculate_expected_value(model_prob, home_odds)
kelly = analyzer.kelly_criterion(model_prob, home_odds)

print(f"\nModel probability: {model_prob:.1%}")
print(f"Implied probability: {1/home_odds:.1%}")
print(f"Expected Value: {ev:.3f}")
print(f"Kelly stake: {kelly:.1%} of bankroll")
# R: Betting odds analysis
library(tidyverse)

# Convert odds formats
convert_odds <- function(decimal_odds) {
  list(
    decimal = decimal_odds,
    fractional = as.character(MASS::fractions(decimal_odds - 1)),  # rational approximation
    american = ifelse(decimal_odds >= 2,
                      paste0("+", round((decimal_odds - 1) * 100)),
                      paste0("-", round(100 / (decimal_odds - 1)))),
    implied_prob = 1 / decimal_odds
  )
}

# Calculate implied probabilities from odds
implied_probabilities <- function(home_odds, draw_odds, away_odds) {
  # Raw implied probabilities
  home_prob <- 1 / home_odds
  draw_prob <- 1 / draw_odds
  away_prob <- 1 / away_odds

  # Calculate overround (bookmaker margin)
  overround <- home_prob + draw_prob + away_prob

  # True probabilities (removing margin)
  tibble(
    outcome = c("Home", "Draw", "Away"),
    odds = c(home_odds, draw_odds, away_odds),
    raw_implied = c(home_prob, draw_prob, away_prob),
    true_implied = c(home_prob, draw_prob, away_prob) / overround,
    overround_pct = (overround - 1) * 100
  )
}

# Example
match_odds <- implied_probabilities(
  home_odds = 1.65,
  draw_odds = 3.80,
  away_odds = 5.50
)

print(match_odds)
cat("\nBookmaker margin:", round(match_odds$overround_pct[1], 1), "%")
Output
Implied Probabilities:
outcome  odds  raw_implied  true_implied  overround_pct
   Home  1.65       0.6061        0.5766            5.1
   Draw  3.80       0.2632        0.2504            5.1
   Away  5.50       0.1818        0.1730            5.1

Bookmaker margin: 5.1%

Model probability: 65.0%
Implied probability: 60.6%
Expected Value: 0.073
Kelly stake: 2.8% of bankroll
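The proportional margin removal used above can be sanity-checked with a round trip: start from known probabilities, apply a margin, and confirm that normalization recovers them. A minimal sketch (the 5% margin and the probabilities are illustrative):

```python
import numpy as np

# Hypothetical "true" 1X2 probabilities the bookmaker starts from
true_probs = np.array([0.55, 0.25, 0.20])
margin = 1.05  # 5% overround

# Bookmaker shortens each price by the margin
odds = 1 / (true_probs * margin)

# Raw implied probabilities sum to 1.05; proportional
# normalization removes the margin exactly
raw_implied = 1 / odds
recovered = raw_implied / raw_implied.sum()

print(np.allclose(recovered, true_probs))  # True
```

The round trip is exact here because the margin was applied proportionally; real bookmakers may shade favourites and longshots differently, in which case proportional removal is only an approximation.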

Evaluating Prediction Models

Assessing model quality is crucial for both fantasy and betting applications. Key metrics include calibration, log loss, and Brier score.

model_evaluation
# Python: Model evaluation metrics
import numpy as np
import pandas as pd
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

class ModelEvaluator:
    """Evaluate prediction model performance."""

    def __init__(self, predictions, actuals):
        self.predictions = np.array(predictions)
        self.actuals = np.array(actuals)

    def brier_score(self):
        """Calculate Brier score (lower is better)."""
        return brier_score_loss(self.actuals, self.predictions)

    def calculate_log_loss(self):
        """Calculate log loss (lower is better)."""
        return log_loss(self.actuals, self.predictions)

    def calculate_auc(self):
        """Calculate ROC AUC."""
        return roc_auc_score(self.actuals, self.predictions)

    def calibration_analysis(self, n_bins=10):
        """Analyze model calibration."""
        prob_true, prob_pred = calibration_curve(
            self.actuals, self.predictions, n_bins=n_bins
        )

        return {
            "predicted": prob_pred,
            "actual": prob_true,
            "calibration_error": np.mean(np.abs(prob_pred - prob_true))
        }

    def plot_calibration(self):
        """Plot calibration curve."""
        cal = self.calibration_analysis()

        fig, ax = plt.subplots(figsize=(8, 8))

        # Perfect calibration line
        ax.plot([0, 1], [0, 1], "k--", label="Perfectly Calibrated")

        # Model calibration
        ax.plot(cal["predicted"], cal["actual"], "s-",
               label=f"Model (ECE: {cal['calibration_error']:.3f})")

        ax.set_xlabel("Mean Predicted Probability")
        ax.set_ylabel("Actual Proportion")
        ax.set_title("Calibration Curve")
        ax.legend()
        ax.grid(True, alpha=0.3)

        return fig

    def full_report(self):
        """Generate comprehensive evaluation report."""
        return {
            "brier_score": self.brier_score(),
            "log_loss": self.calculate_log_loss(),
            "auc": self.calculate_auc(),
            "calibration": self.calibration_analysis()
        }

# Compare model to betting market
def evaluate_against_market(model_probs, market_probs, actuals):
    """Compare model performance to market."""

    model_eval = ModelEvaluator(model_probs, actuals)
    market_eval = ModelEvaluator(market_probs, actuals)

    comparison = pd.DataFrame({
        "Metric": ["Brier Score", "Log Loss", "AUC"],
        "Model": [model_eval.brier_score(),
                  model_eval.calculate_log_loss(),
                  model_eval.calculate_auc()],
        "Market": [market_eval.brier_score(),
                   market_eval.calculate_log_loss(),
                   market_eval.calculate_auc()]
    })

    comparison["Model Better"] = comparison.apply(
        lambda x: x["Model"] < x["Market"] if x["Metric"] != "AUC"
                  else x["Model"] > x["Market"],
        axis=1
    )

    return comparison

# Example evaluation
np.random.seed(42)
n = 500
predictions = np.random.beta(2, 3, n)
actuals = (np.random.random(n) < predictions).astype(int)

evaluator = ModelEvaluator(predictions, actuals)
report = evaluator.full_report()

print("Model Evaluation Report:")
print(f"  Brier Score: {report['brier_score']:.4f}")
print(f"  Log Loss: {report['log_loss']:.4f}")
print(f"  AUC: {report['auc']:.4f}")
print(f"  Calibration Error: {report['calibration']['calibration_error']:.4f}")
# R: Model evaluation metrics
library(tidyverse)

# Brier score (lower is better)
brier_score <- function(predicted_prob, actual_outcome) {
  mean((predicted_prob - actual_outcome)^2)
}

# Log loss (lower is better)
log_loss <- function(predicted_prob, actual_outcome) {
  eps <- 1e-15
  predicted_prob <- pmax(pmin(predicted_prob, 1 - eps), eps)
  -mean(actual_outcome * log(predicted_prob) +
        (1 - actual_outcome) * log(1 - predicted_prob))
}

# Calibration analysis
calibration_analysis <- function(predictions, outcomes, n_bins = 10) {
  data <- tibble(pred = predictions, actual = outcomes)

  data %>%
    mutate(bin = cut(pred, breaks = seq(0, 1, length.out = n_bins + 1),
                     include.lowest = TRUE)) %>%
    group_by(bin) %>%
    summarise(
      mean_predicted = mean(pred),
      mean_actual = mean(actual),
      count = n()
    )
}

# ROC AUC calculation
calculate_auc <- function(predictions, outcomes) {
  # Simple AUC calculation
  pos <- predictions[outcomes == 1]
  neg <- predictions[outcomes == 0]

  mean(sapply(pos, function(p) mean(p > neg)))
}

# Comprehensive evaluation
evaluate_model <- function(predictions, outcomes) {
  list(
    brier = brier_score(predictions, outcomes),
    log_loss = log_loss(predictions, outcomes),
    auc = calculate_auc(predictions, outcomes),
    calibration = calibration_analysis(predictions, outcomes)
  )
}
Output
Model Evaluation Report:
  Brier Score: 0.1923
  Log Loss: 0.5847
  AUC: 0.7234
  Calibration Error: 0.0412
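As a quick hand-check of the Brier score the evaluator computes, it is just the mean squared error between predicted probabilities and the 0/1 outcomes:

```python
# Three toy predictions against their actual outcomes
preds = [0.9, 0.6, 0.2]
actual = [1, 0, 0]

# Mean squared error between probabilities and outcomes
brier = sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(preds)
print(round(brier, 4))  # 0.1367  ((0.01 + 0.36 + 0.04) / 3)
```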

Goal Scorer & Over/Under Markets

Player-level and match total betting markets can be analyzed using xG-based models. These markets often show different efficiency levels than match result markets.

goal_scorer_markets
# Python: Goal scorer market analysis
import pandas as pd
import numpy as np
from scipy.stats import poisson
from typing import Dict, List, Tuple

class GoalScorerAnalyzer:
    """Analyze anytime goalscorer markets."""

    def __init__(self, player_stats: pd.DataFrame, odds_data: pd.DataFrame):
        self.players = player_stats.merge(odds_data, on="player_name", how="left")

    def calculate_ags_probability(self, xg_per_90: float,
                                  expected_mins: float) -> float:
        """Calculate probability of scoring at least once."""
        expected_goals = xg_per_90 * expected_mins / 90
        return 1 - np.exp(-expected_goals)

    def analyze_market(self) -> pd.DataFrame:
        """Analyze all AGS market prices."""

        df = self.players.copy()

        # Calculate model probability
        df["expected_goals"] = df["xg_per_90"] * df["expected_mins"] / 90
        df["model_prob"] = 1 - np.exp(-df["expected_goals"])

        # Market implied probability
        df["implied_prob"] = 1 / df["ags_odds"]

        # Edge and EV
        df["edge"] = df["model_prob"] - df["implied_prob"]
        df["ev"] = (df["model_prob"] * (df["ags_odds"] - 1)) - (1 - df["model_prob"])

        # Confidence score
        df["confidence"] = np.minimum(df["matches_played"] / 10, 1)

        return df.sort_values("ev", ascending=False)

    def find_value_ags(self, min_ev: float = 0.05,
                       min_confidence: float = 0.5) -> pd.DataFrame:
        """Find value AGS bets."""

        analysis = self.analyze_market()

        return analysis[
            (analysis["ev"] > min_ev) &
            (analysis["confidence"] >= min_confidence)
        ][["player_name", "team", "ags_odds", "model_prob",
           "implied_prob", "edge", "ev"]]


class OverUnderAnalyzer:
    """Analyze over/under and BTTS markets."""

    def __init__(self, match_data: pd.DataFrame):
        self.matches = match_data

    def analyze_ou_market(self, line: float = 2.5) -> pd.DataFrame:
        """Analyze over/under market for given line."""

        df = self.matches.copy()

        # Total expected goals
        df["total_xg"] = df["home_xg"] + df["away_xg"]

        # Poisson probabilities: P(total > line) = 1 - P(total <= floor(line))
        df["p_over"] = 1 - poisson.cdf(np.floor(line), df["total_xg"])
        df["p_under"] = poisson.cdf(np.floor(line), df["total_xg"])

        # Market comparison
        df["implied_over"] = 1 / df["over_odds"]
        df["implied_under"] = 1 / df["under_odds"]

        # Edge calculation
        df["over_edge"] = df["p_over"] - df["implied_over"]
        df["under_edge"] = df["p_under"] - df["implied_under"]

        # EV calculation
        df["over_ev"] = (df["p_over"] * (df["over_odds"] - 1)) - (1 - df["p_over"])
        df["under_ev"] = (df["p_under"] * (df["under_odds"] - 1)) - (1 - df["p_under"])

        # Best bet
        df["best_bet"] = np.where(df["over_ev"] > 0.05, "OVER",
                         np.where(df["under_ev"] > 0.05, "UNDER", "PASS"))

        return df

    def analyze_btts(self) -> pd.DataFrame:
        """Analyze Both Teams to Score market."""

        df = self.matches.copy()

        # Probability each team scores at least once (Poisson: P(0) = e^-xG)
        df["p_home_scores"] = 1 - np.exp(-df["home_xg"])
        df["p_away_scores"] = 1 - np.exp(-df["away_xg"])

        # BTTS probability (assumes the two scoring processes are independent)
        df["model_btts"] = df["p_home_scores"] * df["p_away_scores"]
        df["model_no_btts"] = 1 - df["model_btts"]

        # Market comparison
        df["implied_btts"] = 1 / df["btts_yes_odds"]
        df["implied_no_btts"] = 1 / df["btts_no_odds"]

        # Edge
        df["btts_edge"] = df["model_btts"] - df["implied_btts"]
        df["no_btts_edge"] = df["model_no_btts"] - df["implied_no_btts"]

        return df

    def correct_score_probabilities(self, home_xg: float, away_xg: float,
                                   max_goals: int = 5) -> pd.DataFrame:
        """Calculate correct score probabilities using Poisson."""

        results = []

        for h in range(max_goals + 1):
            for a in range(max_goals + 1):
                prob = poisson.pmf(h, home_xg) * poisson.pmf(a, away_xg)
                results.append({
                    "home_goals": h,
                    "away_goals": a,
                    "score": f"{h}-{a}",
                    "probability": prob,
                    "fair_odds": 1 / prob if prob > 0 else float("inf")
                })

        return pd.DataFrame(results).sort_values("probability", ascending=False)

    def asian_handicap_probability(self, home_xg: float, away_xg: float,
                                  line: float) -> Dict:
        """Calculate Asian Handicap probabilities."""

        # Simulate many games using Poisson
        n_sims = 100000
        home_goals = np.random.poisson(home_xg, n_sims)
        away_goals = np.random.poisson(away_xg, n_sims)

        # Apply handicap
        adjusted_margin = home_goals - away_goals + line

        # Calculate outcomes
        home_wins = np.sum(adjusted_margin > 0) / n_sims
        away_wins = np.sum(adjusted_margin < 0) / n_sims
        pushes = np.sum(adjusted_margin == 0) / n_sims

        return {
            "line": line,
            "home_covers": home_wins,
            "away_covers": away_wins,
            "push": pushes,
            "home_fair_odds": 1 / home_wins if home_wins > 0 else float("inf"),
            "away_fair_odds": 1 / away_wins if away_wins > 0 else float("inf")
        }

# Example usage
print("Goal scorer and O/U analyzer ready!")

# Correct score example
analyzer = OverUnderAnalyzer(pd.DataFrame())
cs = analyzer.correct_score_probabilities(1.8, 1.2)
print("\nMost likely scores (Man City 1.8 xG vs Arsenal 1.2 xG):")
print(cs.head(10).to_string(index=False))
# R: Goal scorer market analysis
library(tidyverse)

# Analyze anytime goalscorer markets
analyze_ags_market <- function(player_data, market_odds) {

    player_data %>%
        left_join(market_odds, by = "player_name") %>%
        mutate(
            # Implied probability from odds
            implied_prob = 1 / ags_odds,

            # Model probability: P(at least 1 goal) = 1 - P(0 goals)
            # Using Poisson: P(0) = e^(-xG * mins/90)
            expected_goals = xg_per_90 * expected_mins / 90,
            model_prob = 1 - exp(-expected_goals),

            # Calculate edge
            edge = model_prob - implied_prob,

            # Expected value
            ev = (model_prob * (ags_odds - 1)) - (1 - model_prob),

            # Confidence based on sample size
            confidence = pmin(matches_played / 10, 1)
        ) %>%
        filter(!is.na(ags_odds)) %>%
        arrange(desc(ev))
}

# Over/Under goals model
analyze_ou_market <- function(match_data, line = 2.5) {

    match_data %>%
        mutate(
            # Total expected goals
            total_xg = home_xg + away_xg,

            # Probability of over using Poisson
            # P(total > line) = 1 - P(total <= floor(line))
            p_over = 1 - ppois(floor(line), lambda = total_xg),
            p_under = ppois(floor(line), lambda = total_xg),

            # Compare to market
            implied_over = 1 / over_odds,
            implied_under = 1 / under_odds,

            # Edge
            over_edge = p_over - implied_over,
            under_edge = p_under - implied_under,

            # Best bet direction
            best_bet = case_when(
                over_edge > 0.05 ~ "OVER",
                under_edge > 0.05 ~ "UNDER",
                TRUE ~ "PASS"
            )
        )
}

# BTTS (Both Teams to Score) market
analyze_btts <- function(match_data) {

    match_data %>%
        mutate(
            # P(home scores at least 1)
            p_home_scores = 1 - exp(-home_xg),
            # P(away scores at least 1)
            p_away_scores = 1 - exp(-away_xg),

            # P(BTTS) = P(home scores) * P(away scores)
            model_btts = p_home_scores * p_away_scores,
            model_no_btts = 1 - model_btts,

            # Compare to market
            implied_btts = 1 / btts_yes_odds,
            implied_no_btts = 1 / btts_no_odds,

            btts_edge = model_btts - implied_btts,
            no_btts_edge = model_no_btts - implied_no_btts
        )
}

# Correct score probabilities (Poisson)
correct_score_probs <- function(home_xg, away_xg, max_goals = 5) {

    scores <- expand_grid(
        home_goals = 0:max_goals,
        away_goals = 0:max_goals
    ) %>%
        mutate(
            # Independent Poisson probabilities
            prob = dpois(home_goals, home_xg) * dpois(away_goals, away_xg),
            score = paste0(home_goals, "-", away_goals)
        ) %>%
        arrange(desc(prob))

    scores
}

# Example
cs_probs <- correct_score_probs(home_xg = 1.8, away_xg = 1.2)
print(head(cs_probs, 10))
Output
Most likely scores (Man City 1.8 xG vs Arsenal 1.2 xG):
 home_goals  away_goals score  probability  fair_odds
          1           1   1-1       0.1075       9.30
          2           1   2-1       0.0968      10.33
          1           0   1-0       0.0896      11.16
          2           0   2-0       0.0806      12.40
          1           2   1-2       0.0645      15.50
          0           1   0-1       0.0597      16.74
          3           1   3-1       0.0581      17.22
          2           2   2-2       0.0581      17.22
          0           0   0-0       0.0498      20.09
          3           0   3-0       0.0484      20.66
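The same independent-Poisson score grid can be collapsed into 1X2 match-result probabilities by summing the regions h > a, h = a and h < a. A small sketch using the example xG values above (truncating at 10 goals, which captures essentially all of the mass):

```python
from scipy.stats import poisson

def match_result_probs(home_xg, away_xg, max_goals=10):
    """Sum independent-Poisson score probabilities into 1X2 probabilities."""
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson.pmf(h, home_xg) * poisson.pmf(a, away_xg)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return {"home": p_home, "draw": p_draw, "away": p_away}

probs = match_result_probs(1.8, 1.2)
print({k: round(v, 3) for k, v in probs.items()})
```

These probabilities can then be compared against 1X2 market odds using the same edge and EV calculations as earlier in the chapter.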

Bankroll Management

Proper bankroll management is more important than picking winners. Even the best models fail without disciplined stake sizing.

bankroll_management
# Python: Bankroll management system
import numpy as np
import pandas as pd
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from datetime import datetime

@dataclass
class Bet:
    """Represents a single bet."""
    date: datetime
    event: str
    selection: str
    odds: float
    stake: float
    stake_pct: float
    model_prob: float
    result: Optional[str] = None
    profit: Optional[float] = None

class BankrollManager:
    """Comprehensive bankroll management system."""

    def __init__(self, initial_bankroll: float, max_stake_pct: float = 0.05,
                 kelly_fraction: float = 0.25):
        self.initial = initial_bankroll
        self.current = initial_bankroll
        self.max_stake = max_stake_pct
        self.kelly_fraction = kelly_fraction
        self.history: List[Bet] = []

    def kelly_stake(self, model_prob: float, odds: float) -> float:
        """Calculate Kelly criterion stake."""
        b = odds - 1
        p = model_prob
        q = 1 - p

        # Full Kelly
        full_kelly = (b * p - q) / b

        if full_kelly <= 0:
            return 0

        # Apply fraction and cap
        stake_pct = min(full_kelly * self.kelly_fraction, self.max_stake)
        return max(0, stake_pct)

    def flat_stake(self, units: float = 1.0, unit_size: float = 0.01) -> float:
        """Flat staking approach."""
        return min(units * unit_size, self.max_stake)

    def place_bet(self, event: str, selection: str, odds: float,
                 model_prob: float, stake_method: str = "kelly") -> Bet:
        """Place a bet and record it."""

        if stake_method == "kelly":
            stake_pct = self.kelly_stake(model_prob, odds)
        else:
            stake_pct = self.flat_stake()

        stake = self.current * stake_pct

        bet = Bet(
            date=datetime.now(),
            event=event,
            selection=selection,
            odds=odds,
            stake=stake,
            stake_pct=stake_pct,
            model_prob=model_prob
        )

        self.history.append(bet)
        return bet

    def settle_bet(self, bet: Bet, won: bool) -> None:
        """Settle a bet and update bankroll."""
        bet.result = "WON" if won else "LOST"
        bet.profit = bet.stake * (bet.odds - 1) if won else -bet.stake
        self.current += bet.profit

    def get_stats(self) -> Dict:
        """Calculate comprehensive betting statistics."""

        settled = [b for b in self.history if b.result is not None]

        if not settled:
            return {"message": "No settled bets"}

        wins = [b for b in settled if b.result == "WON"]
        total_staked = sum(b.stake for b in settled)
        total_profit = sum(b.profit for b in settled)

        return {
            "total_bets": len(settled),
            "wins": len(wins),
            "losses": len(settled) - len(wins),
            "win_rate": len(wins) / len(settled),
            "total_staked": total_staked,
            "total_profit": total_profit,
            "roi": total_profit / total_staked if total_staked > 0 else 0,
            "bankroll_growth": (self.current - self.initial) / self.initial,
            "current_bankroll": self.current,
            "avg_odds": np.mean([b.odds for b in settled]),
            "avg_stake_pct": np.mean([b.stake_pct for b in settled])
        }

    def simulate_future(self, n_bets: int, avg_edge: float,
                       avg_odds: float, n_sims: int = 10000) -> Dict:
        """Monte Carlo simulation of future performance."""

        win_rate = (1 / avg_odds) + avg_edge
        results = []

        for _ in range(n_sims):
            bankroll = self.current
            max_bankroll = bankroll
            min_bankroll = bankroll

            for _ in range(n_bets):
                stake_pct = self.kelly_stake(win_rate, avg_odds)
                stake = bankroll * stake_pct

                if np.random.random() < win_rate:
                    bankroll += stake * (avg_odds - 1)
                else:
                    bankroll -= stake

                max_bankroll = max(max_bankroll, bankroll)
                min_bankroll = min(min_bankroll, bankroll)

                # Fractional stakes never hit exactly zero, so treat losing
                # 90% of the starting bankroll as ruin
                if bankroll <= 0.1 * self.current:
                    break

            results.append({
                "final_bankroll": bankroll,
                "max_bankroll": max_bankroll,
                "min_bankroll": min_bankroll,
                "ruined": bankroll <= 0.1 * self.current
            })

        df = pd.DataFrame(results)

        return {
            "mean_final": df["final_bankroll"].mean(),
            "median_final": df["final_bankroll"].median(),
            "percentile_5": df["final_bankroll"].quantile(0.05),
            "percentile_95": df["final_bankroll"].quantile(0.95),
            "risk_of_ruin": df["ruined"].mean(),
            "max_drawdown_mean": (df["max_bankroll"] - df["min_bankroll"]).mean() / df["max_bankroll"].mean()
        }

class ClosingLineValue:
    """Track Closing Line Value (CLV) for bet quality assessment."""

    def __init__(self):
        self.bets = []

    def add_bet(self, placed_odds: float, closing_odds: float,
                stake: float, won: bool) -> None:
        """Add a bet with placed and closing odds."""
        self.bets.append({
            "placed_odds": placed_odds,
            "closing_odds": closing_odds,
            "stake": stake,
            "won": won,
            # Positive CLV = you beat the closing price (placed odds > closing odds)
            "clv": (1 / closing_odds) - (1 / placed_odds)
        })

    def analyze(self) -> Dict:
        """Analyze CLV performance."""

        if not self.bets:
            return {"message": "No bets recorded"}

        df = pd.DataFrame(self.bets)

        return {
            "total_bets": len(df),
            "positive_clv_rate": (df["clv"] > 0).mean(),
            "mean_clv": df["clv"].mean(),
            "mean_clv_pct": df["clv"].mean() * 100,
            "win_rate": df["won"].mean(),
            "total_stake": df["stake"].sum(),
            "clv_by_outcome": {
                "winners": df[df["won"]]["clv"].mean() if df["won"].any() else 0,
                "losers": df[~df["won"]]["clv"].mean() if (~df["won"]).any() else 0
            }
        }

# Example usage
manager = BankrollManager(initial_bankroll=1000, kelly_fraction=0.25)

# Simulate some bets
print("Bankroll Management System")
print(f"Starting bankroll: £{manager.current:.2f}")

# Calculate recommended stake
edge_bet = manager.kelly_stake(model_prob=0.55, odds=2.10)
print(f"\nFor 55% model prob at 2.10 odds:")
print(f"  Kelly recommends: {edge_bet:.1%} of bankroll")
print(f"  Stake amount: £{manager.current * edge_bet:.2f}")
# R: Bankroll management system
library(tidyverse)

# Kelly Criterion with fractional approach
kelly_stake <- function(model_prob, odds, fraction = 0.25, max_stake = 0.05) {
    b <- odds - 1
    p <- model_prob
    q <- 1 - p

    # Full Kelly
    full_kelly <- (b * p - q) / b

    # Fractional Kelly (safer)
    fractional <- full_kelly * fraction

    # Cap at maximum stake
    stake <- max(0, min(fractional, max_stake))

    list(
        full_kelly = full_kelly,
        fractional_kelly = fractional,
        recommended_stake = stake
    )
}

# Bankroll tracking system
create_bankroll_tracker <- function(initial_bankroll) {

    tracker <- list(
        initial = initial_bankroll,
        current = initial_bankroll,
        history = tibble(
            date = as.Date(character()),
            bet_id = integer(),
            stake = numeric(),
            odds = numeric(),
            result = character(),
            profit = numeric(),
            bankroll = numeric()
        ),
        stats = list()
    )

    # Add bet function
    add_bet <- function(tracker, stake_pct, odds, won) {
        stake <- tracker$current * stake_pct
        profit <- if (won) stake * (odds - 1) else -stake
        new_bankroll <- tracker$current + profit

        new_row <- tibble(
            date = Sys.Date(),
            bet_id = nrow(tracker$history) + 1,
            stake = stake,
            odds = odds,
            result = if (won) "WON" else "LOST",
            profit = profit,
            bankroll = new_bankroll
        )

        tracker$history <- bind_rows(tracker$history, new_row)
        tracker$current <- new_bankroll
        tracker
    }

    tracker
}

# Calculate risk of ruin
# Note: with proportional stakes the bankroll never hits exactly zero,
# so "ruin" here means dropping below 10% of the starting bankroll
risk_of_ruin <- function(win_rate, avg_odds, stake_pct, n_simulations = 10000) {

    # Simulate betting sequences
    ruin_count <- 0

    for (i in 1:n_simulations) {
        bankroll <- 1.0
        n_bets <- 1000

        for (b in 1:n_bets) {
            stake <- bankroll * stake_pct
            won <- runif(1) < win_rate

            if (won) {
                bankroll <- bankroll + stake * (avg_odds - 1)
            } else {
                bankroll <- bankroll - stake
            }

            if (bankroll <= 0.1) {
                ruin_count <- ruin_count + 1
                break
            }
        }
    }

    ruin_count / n_simulations
}

# Expected growth rate
expected_growth <- function(win_rate, odds, stake_pct) {
    # G = p * log(1 + b*f) + q * log(1 - f)
    # where f = stake fraction, b = odds - 1

    b <- odds - 1
    p <- win_rate
    q <- 1 - p
    f <- stake_pct

    p * log(1 + b * f) + q * log(1 - f)
}

print("Bankroll management system ready!")
Output
Bankroll Management System
Starting bankroll: £1000.00

For 55% model prob at 2.10 odds:
  Kelly recommends: 3.5% of bankroll
  Stake amount: £35.23
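Why quarter Kelly? The expected log-growth formula from the R code above shows the trade-off directly: fractional Kelly gives up some growth rate in exchange for much smaller swings. A quick sketch with the example bet (55% model probability at 2.10 odds):

```python
import math

def log_growth(p, odds, f):
    """Expected log-growth per bet at stake fraction f."""
    b = odds - 1
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

p, odds = 0.55, 2.10
full_kelly = ((odds - 1) * p - (1 - p)) / (odds - 1)  # ~0.141

# Growth shrinks slowly as the stake fraction shrinks,
# while variance (and drawdowns) shrink much faster
for frac in (1.0, 0.5, 0.25):
    f = full_kelly * frac
    print(f"{frac:.2f} Kelly: stake {f:.1%}, growth {log_growth(p, odds, f):.5f}")
```

Quarter Kelly here keeps a meaningful share of the full-Kelly growth rate at a quarter of the stake, which is why a `kelly_fraction` around 0.25 is a common practical default.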

Responsible Gambling

No analytics chapter on betting is complete without addressing responsible gambling. This is non-negotiable content for ethical practice.

Warning Signs
  • Betting more than planned
  • Chasing losses with bigger bets
  • Borrowing money to bet
  • Lying about betting activity
  • Neglecting work, relationships, or health
  • Feeling anxious when not betting
  • Betting to escape problems
Healthy Practices
  • Set strict budget limits before betting
  • Never bet under emotional influence
  • Keep detailed records of all bets
  • Take regular breaks from betting
  • Never borrow to fund betting
  • Treat it as entertainment, not income
  • Use deposit limits and self-exclusion tools
Support Resources

If you or someone you know has a gambling problem, help is available:
  • National Gambling Helpline (GamCare, UK): 0808 8020 133, free and confidential
  • BeGambleAware: begambleaware.org - advice and self-help tools
  • Gamblers Anonymous: peer support meetings worldwide
  • National Problem Gambling Helpline (US): 1-800-GAMBLER

responsible_gambling
# Python: Self-assessment and limit tracking
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class GamblingLimits:
    """Track and enforce gambling limits."""
    daily_loss: float = 50.0
    weekly_loss: float = 200.0
    monthly_loss: float = 500.0
    session_time_mins: int = 60
    max_bets_per_day: int = 10

class ResponsibleGamblingTracker:
    """Track gambling behavior for responsible practices."""

    def __init__(self, limits: Optional[GamblingLimits] = None):
        self.limits = limits or GamblingLimits()
        self.sessions = []
        self.daily_results = {}

    def start_session(self, mood: str = "neutral") -> Dict:
        """Start a new gambling session."""

        session = {
            "id": len(self.sessions) + 1,
            "start_time": datetime.now(),
            "end_time": None,
            "bets": [],
            "profit_loss": 0,
            "mood_before": mood,
            "mood_after": None,
            "within_limits": True
        }

        self.sessions.append(session)

        # Check remaining limits
        status = self.check_limits()

        return {
            "session_id": session["id"],
            "status": status,
            "message": self._get_limit_message(status)
        }

    def check_limits(self) -> Dict:
        """Check current limit status."""

        today = datetime.now().date()
        week_ago = today - timedelta(days=7)
        month_ago = today - timedelta(days=30)

        # Calculate losses
        daily_loss = sum(
            s["profit_loss"] for s in self.sessions
            if s["start_time"].date() == today and s["profit_loss"] < 0
        )

        weekly_loss = sum(
            s["profit_loss"] for s in self.sessions
            if s["start_time"].date() >= week_ago and s["profit_loss"] < 0
        )

        monthly_loss = sum(
            s["profit_loss"] for s in self.sessions
            if s["start_time"].date() >= month_ago and s["profit_loss"] < 0
        )

        daily_bets = sum(
            len(s["bets"]) for s in self.sessions
            if s["start_time"].date() == today
        )

        return {
            "daily_loss": abs(daily_loss),
            "daily_remaining": max(0, self.limits.daily_loss - abs(daily_loss)),
            "weekly_loss": abs(weekly_loss),
            "weekly_remaining": max(0, self.limits.weekly_loss - abs(weekly_loss)),
            "monthly_loss": abs(monthly_loss),
            "monthly_remaining": max(0, self.limits.monthly_loss - abs(monthly_loss)),
            "bets_today": daily_bets,
            "bets_remaining": max(0, self.limits.max_bets_per_day - daily_bets),
            "should_stop": abs(daily_loss) >= self.limits.daily_loss
        }

    def _get_limit_message(self, status: Dict) -> str:
        """Generate appropriate warning message."""

        if status["should_stop"]:
            return "STOP: You have reached your daily loss limit. Please stop betting."

        if status["daily_remaining"] < self.limits.daily_loss * 0.2:
            return f"WARNING: Only £{status['daily_remaining']:.2f} remaining in daily limit."

        if status["bets_remaining"] <= 2:
            return f"NOTE: Only {status['bets_remaining']} bets remaining today."

        return "Within all limits. Remember to bet responsibly."

    def end_session(self, mood_after: str = "neutral",
                   notes: str = "") -> Dict:
        """End current session and record stats."""

        if not self.sessions:
            return {"error": "No active session"}

        session = self.sessions[-1]
        session["end_time"] = datetime.now()
        session["mood_after"] = mood_after
        session["notes"] = notes

        duration = (session["end_time"] - session["start_time"]).total_seconds() / 60

        return {
            "duration_mins": duration,
            "profit_loss": session["profit_loss"],
            "within_time_limit": duration <= self.limits.session_time_mins,
            "within_loss_limit": session["profit_loss"] > -self.limits.daily_loss
        }

    def pgsi_screening(self, responses: List[int]) -> Dict:
        """
        Problem Gambling Severity Index screening.

        responses: List of 9 responses, each 0-3
            0 = Never
            1 = Sometimes
            2 = Most of the time
            3 = Almost always
        """

        if len(responses) != 9:
            return {"error": "PGSI requires exactly 9 responses"}

        total = sum(responses)

        if total == 0:
            risk_level = "Non-problem gambling"
            recommendation = "Your gambling appears to be recreational and controlled."
        elif total <= 2:
            risk_level = "Low risk gambling"
            recommendation = "You show few signs of problem gambling, but stay aware."
        elif total <= 7:
            risk_level = "Moderate risk gambling"
            recommendation = "Consider setting stricter limits. Monitor your behavior."
        else:
            risk_level = "Problem gambling"
            recommendation = "Please seek professional support. Help is available."

        return {
            "score": total,
            "max_score": 27,
            "risk_level": risk_level,
            "recommendation": recommendation,
            "seek_help": total >= 8
        }

# Example usage
tracker = ResponsibleGamblingTracker()
session = tracker.start_session(mood="excited")
print(f"Session started: {session['message']}")

# Check limits
status = tracker.check_limits()
print(f"\nCurrent Status:")
print(f"  Daily remaining: £{status['daily_remaining']:.2f}")
print(f"  Weekly remaining: £{status['weekly_remaining']:.2f}")
print(f"  Bets remaining today: {status['bets_remaining']}")
# R: Self-assessment and limit tracking
library(tidyverse)

# Gambling behavior tracker
create_behavior_tracker <- function() {

    list(
        # Set limits
        limits = list(
            daily_loss = 50,
            weekly_loss = 200,
            monthly_loss = 500,
            session_time_mins = 60
        ),

        # Track sessions
        sessions = tibble(
            date = as.Date(character()),
            start_time = as.POSIXct(character()),
            end_time = as.POSIXct(character()),
            profit_loss = numeric(),
            mood_before = character(),
            mood_after = character(),
            stuck_to_limits = logical()
        ),

        # Check if approaching limits
        check_limits = function(self) {
            today <- Sys.Date()

            daily_total <- self$sessions %>%
                filter(date == today) %>%
                summarise(total = sum(profit_loss)) %>%
                pull(total)

            weekly_total <- self$sessions %>%
                filter(date >= today - 7) %>%
                summarise(total = sum(profit_loss)) %>%
                pull(total)

            list(
                daily_remaining = self$limits$daily_loss + daily_total,
                weekly_remaining = self$limits$weekly_loss + weekly_total,
                should_stop = daily_total <= -self$limits$daily_loss
            )
        }
    )
}

# Problem gambling screening (based on PGSI)
pgsi_screening <- function(responses) {
    # Responses should be 0-3 for each of 9 questions
    # 0 = Never, 1 = Sometimes, 2 = Most of the time, 3 = Almost always

    total_score <- sum(responses)

    risk_level <- case_when(
        total_score == 0 ~ "Non-problem gambling",
        total_score <= 2 ~ "Low risk gambling",
        total_score <= 7 ~ "Moderate risk gambling",
        TRUE ~ "Problem gambling"
    )

    list(
        score = total_score,
        risk_level = risk_level,
        recommendation = if (total_score >= 3)
            "Consider speaking to a professional about your gambling habits"
        else
            "Continue to monitor and maintain healthy gambling limits"
    )
}

print("Behavior tracking system ready")

Practice Exercises

Exercise 43.1: FPL xPoints Model

Build an expected points model for FPL using public xG/xA data. Validate against historical FPL scores to assess accuracy.
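As a starting point, the projection logic shown earlier in the chapter can be condensed into a single function. The point values are the real FPL scoring rules; the sample player inputs below are invented:

```python
import math

GOAL_POINTS = {"GKP": 6, "DEF": 6, "MID": 5, "FWD": 4}
CS_POINTS = {"GKP": 4, "DEF": 4, "MID": 1, "FWD": 0}

def xpoints(position, xg, xa, xgc, expected_minutes):
    """Expected FPL points from xG, xA and expected goals conceded."""
    minutes_pts = 2 if expected_minutes >= 60 else (1 if expected_minutes >= 1 else 0)
    goal_pts = xg * GOAL_POINTS[position]
    assist_pts = xa * 3
    # Clean sheet points require 60+ minutes; P(0 conceded) from Poisson
    cs_pts = math.exp(-xgc) * CS_POINTS[position] if expected_minutes >= 60 else 0
    return minutes_pts + goal_pts + assist_pts + cs_pts

# Invented premium midfielder: 0.6 xG, 0.3 xA, team xGC 1.2, nailed starter
print(round(xpoints("MID", 0.6, 0.3, 1.2, 90), 2))
```

Validate against several past seasons of actual FPL scores, not just one, since single-season variance is large.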

Exercise 43.2: Squad Optimizer

Implement the linear programming squad optimizer. Find the optimal £100m squad for a specific gameweek using your xPoints projections.
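Before reaching for lpSolve or PuLP, it helps to see the objective on a toy instance: choose the subset that maximises projected points subject to a budget. A brute-force sketch over an invented six-player pool (a real solver adds the 15-player squad size, position quotas, and the three-per-club rule as linear constraints):

```python
from itertools import combinations

# Invented (name, price_m, xpoints) pool; all values are illustrative
players = [
    ("A", 12.5, 7.1), ("B", 9.0, 5.8), ("C", 7.5, 5.2),
    ("D", 6.0, 4.4), ("E", 4.5, 3.9), ("F", 4.0, 3.1),
]
budget = 20.0

# Exhaustively score every affordable 3-player subset
best = max(
    (combo for combo in combinations(players, 3)
     if sum(p[1] for p in combo) <= budget),
    key=lambda combo: sum(p[2] for p in combo),
)
print([p[0] for p in best], round(sum(p[2] for p in best), 1))
```

Brute force is only viable for tiny pools; the full 15-from-~700 problem is exactly why the chapter uses integer linear programming.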

Exercise 43.3: Market Analysis

Collect historical betting odds and match results. Calculate implied probabilities and compare market calibration against your own model.
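The core calculation is converting decimal odds to implied probabilities and stripping the overround. A sketch using simple proportional normalisation (other de-margin methods exist, e.g. Shin's method); the 1X2 prices are hypothetical:

```python
def implied_probabilities(odds):
    """De-margin decimal odds by proportional normalisation."""
    raw = [1 / o for o in odds]
    total = sum(raw)
    return {
        "overround": total - 1,                  # the bookmaker's margin
        "fair_probs": [r / total for r in raw],  # rescaled to sum to 1
    }

# Hypothetical 1X2 prices: home 2.00, draw 3.50, away 4.00
result = implied_probabilities([2.00, 3.50, 4.00])
print(f"Overround: {result['overround']:.1%}")
print([f"{pr:.3f}" for pr in result["fair_probs"]])
```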

Exercise 43.4: FDR Calculator

Build a Fixture Difficulty Rating system using recent xG data. Create visualizations showing fixture swings for all Premier League teams over the next 10 gameweeks.

Exercise 43.5: Chip Strategy Planner

Analyze the remaining FPL gameweeks and recommend optimal chip timing. Consider Double Gameweeks, Blank Gameweeks, and fixture swings.

Exercise 43.6: Over/Under Model

Build a Poisson-based over/under model for match totals. Backtest against historical odds to evaluate if your model can find value.
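One convenient property for this exercise: if home and away goals are independent Poisson variables, their total is Poisson with the summed rate, so match-total probabilities need only a single distribution. A sketch with invented xG figures:

```python
import math

def p_over(line: float, lambda_home: float, lambda_away: float) -> float:
    """P(total goals > line). The sum of two independent Poisson variables
    is Poisson(lambda_home + lambda_away), so one pmf covers the match total."""
    lam = lambda_home + lambda_away
    p_at_or_under = sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(math.floor(line) + 1)
    )
    return 1 - p_at_or_under

# Invented match: home xG 1.5, away xG 1.2
print(f"P(Over 2.5): {p_over(2.5, 1.5, 1.2):.3f}")
```

When backtesting, compare these probabilities against de-margined market odds rather than raw odds, or the overround will mask any edge.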

Exercise 43.7: Bankroll Simulator

Create a Monte Carlo simulation to project bankroll growth under different staking strategies. Compare flat staking vs Kelly criterion with various edge assumptions.
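A minimal simulator can parameterise the staking rule as a function of the current bankroll, so flat and fractional-Kelly staking share one code path. A sketch with an assumed 55% win rate at odds of 2.10 (the same illustrative edge used earlier in the chapter):

```python
import numpy as np

def simulate_bankrolls(p, odds, stake_fn, n_bets=500, n_paths=2000, seed=42):
    """Final bankrolls (start = 1.0) under a staking rule stake_fn(bankrolls)."""
    rng = np.random.default_rng(seed)
    b = odds - 1
    bankrolls = np.ones(n_paths)
    for _ in range(n_bets):
        # Busted paths (bankroll 0) stop betting
        stakes = np.where(bankrolls > 0, stake_fn(bankrolls), 0.0)
        wins = rng.random(n_paths) < p
        bankrolls = np.maximum(bankrolls + np.where(wins, stakes * b, -stakes), 0)
    return bankrolls

# Assumed edge for illustration: 55% win rate at decimal odds 2.10
p, odds = 0.55, 2.10
kelly = ((odds - 1) * p - (1 - p)) / (odds - 1)
flat = simulate_bankrolls(p, odds, lambda br: np.full_like(br, 0.02))
frac = simulate_bankrolls(p, odds, lambda br: br * kelly * 0.25)
print(f"Median final: flat {np.median(flat):.2f}, quarter-Kelly {np.median(frac):.2f}")
```

Report medians and drawdown percentiles, not just means: proportional staking produces heavily skewed outcome distributions.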

Exercise 43.8: CLV Tracker

Track your betting results including both placed and closing odds. Calculate your Closing Line Value (CLV) and analyze whether positive CLV correlates with long-term profitability.
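One common definition expresses CLV as the percentage by which your price beat the closing price (definitions vary; some work in implied-probability space instead). A sketch with hypothetical odds:

```python
def clv_pct(placed_odds: float, closing_odds: float) -> float:
    """Closing Line Value: the % edge of your price over the closing price.
    Positive when you got a bigger price than the market closed at."""
    return placed_odds / closing_odds - 1

# Hypothetical bet taken at 2.10 that closed at 2.00
print(f"CLV: {clv_pct(2.10, 2.00):+.1%}")
```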

Summary

Essential Tools and Libraries

Category | R Libraries | Python Libraries | Purpose
Optimization | lpSolve, ROI | scipy.optimize, PuLP, cvxpy | Squad optimization, lineup selection
Data Analysis | tidyverse | pandas, numpy | Data manipulation and statistics
Statistical Modeling | stats, MASS | scipy.stats, statsmodels | Poisson models, probability calculations
Visualization | ggplot2 | matplotlib, plotly | Fixture tickers, performance charts
FPL API Access | fplr, httr2 | fpl, requests | Fetching FPL data
Web Scraping | rvest | beautifulsoup4, selenium | Odds data collection
Machine Learning | caret, mlr3 | scikit-learn | Prediction models, calibration
Simulation | base R | numpy (Monte Carlo) | Bankroll projections, risk of ruin

FPL Data Sources

  • Official FPL API: fantasy.premierleague.com/api/ - Player data, fixtures, gameweeks
  • Understat: xG/xA data for top 5 leagues
  • FBref: Comprehensive player statistics via StatsBomb
  • Fantasy Football Scout: Historical FPL performance data
  • FPL Review: Expected points projections

Betting Market Data Sources

  • Odds Portal: Historical odds across multiple bookmakers
  • Football-Data.co.uk: Free historical odds and results
  • Betfair API: Exchange odds and market depth (requires account)
  • Pinnacle: Sharpest odds, often used as benchmark
  • The Odds API: Real-time odds aggregation

Key Metrics Reference

Metric | Definition | Good Value
xPoints | Expected FPL points based on xG/xA | 6+ per gameweek for premiums
Value (pts/£) | xPoints divided by price in millions | 0.6+ for good value picks
FDR | Fixture Difficulty Rating (1-5) | 1-2 = Easy, 4-5 = Hard
Overround | Bookmaker margin on a market | ~3-5% for efficient markets
Expected Value (EV) | (Prob × Profit) − ((1 − Prob) × Loss) | Positive = theoretically profitable
CLV (Closing Line Value) | Difference between bet and closing odds | Positive CLV = beating the market
Kelly % | Optimal stake as % of bankroll | Use 25-50% of full Kelly
ROI | Profit / Total Staked | 3-5% long-term is excellent
Brier Score | Mean squared error of probabilities | Lower is better; ~0.20 is good
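The Brier score entry above is easy to make concrete: a constant 0.5 forecast always scores 0.25, so a model near 0.20 is genuinely beating naive guessing. A minimal sketch:

```python
def brier_score(probs, outcomes) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Two hypothetical forecasts: 70% home win (it happened), 20% away win (it didn't)
print(brier_score([0.7, 0.2], [1, 0]))  # (0.09 + 0.04) / 2
```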

Fantasy and betting analytics provide excellent ways to test prediction skills while enjoying the game. In the next chapter, we'll explore the unique challenges and opportunities of women's football analytics.