Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples
0 of 60 chapters completed (0%)

Injury prevention is where sports science meets data analytics. By monitoring training load, recovery status, and injury risk factors, clubs can reduce injury rates and keep key players available for crucial matches.

Training Load Metrics

Training load quantifies the physiological stress placed on players. Understanding and managing this load is crucial for performance optimization and injury prevention.

External Load
  • Total Distance - kilometers covered
  • High-Speed Running - distance >19.8 km/h
  • Sprint Distance - distance >25 km/h
  • Accelerations - high-intensity acceleration count
  • Decelerations - high-intensity deceleration count
  • Player Load - accelerometer-derived metric
Internal Load
  • Heart Rate - average and max HR
  • TRIMP - Training Impulse
  • sRPE - Session Rating of Perceived Exertion
  • HRV - Heart Rate Variability
  • Time in HR Zones - distribution across zones
  • Recovery HR - post-exercise recovery rate
load_metrics.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Load GPS training data
gps_data = pd.read_csv("training_gps.csv")
gps_data['date'] = pd.to_datetime(gps_data['session_date']).dt.date

# Calculate daily load metrics per player
daily_load = (
    gps_data
    .groupby(['player_id', 'date'])
    .agg({
        'total_distance': 'sum',
        'high_speed_running_distance': 'sum',
        'sprint_distance': 'sum',
        'high_intensity_accelerations': 'sum',
        'high_intensity_decelerations': 'sum',
        'player_load_au': 'sum',
        'duration_minutes': 'sum',
        'average_heart_rate': 'mean',
        'max_heart_rate': 'max'
    })
    .reset_index()
)

# Convert to standard units
daily_load['total_distance_km'] = daily_load['total_distance'] / 1000

# Calculate sRPE load
srpe_data = training_wellness.copy()
srpe_data['srpe_load'] = (
    srpe_data['session_rpe'] * srpe_data['session_duration_min']
)

# Combine data
combined_load = daily_load.merge(
    srpe_data[['player_id', 'date', 'srpe_load', 'resting_hr']],
    on=['player_id', 'date'],
    how='left'
)

def calculate_trimp(row, gender='M'):
    """
    Calculate TRIMP using Banister method.
    """
    avg_hr = row['average_heart_rate']
    max_hr = row['max_heart_rate']
    resting_hr = row.get('resting_hr', 60)
    duration = row['duration_minutes']

    if pd.isna(avg_hr) or pd.isna(max_hr):
        return np.nan

    hr_reserve_ratio = (avg_hr - resting_hr) / (max_hr - resting_hr)
    hr_reserve_ratio = np.clip(hr_reserve_ratio, 0, 1)

    y = 1.92 if gender == 'M' else 1.67

    trimp = duration * hr_reserve_ratio * 0.64 * np.exp(y * hr_reserve_ratio)
    return trimp

combined_load['trimp'] = combined_load.apply(calculate_trimp, axis=1)

# Visualize load distribution
fig, ax = plt.subplots(figsize=(12, 6))

for player_id in combined_load['player_id'].unique()[:10]:
    player_data = combined_load[combined_load['player_id'] == player_id]
    ax.plot(player_data['date'], player_data['total_distance_km'],
            alpha=0.3, color='gray')

# Team average
team_avg = combined_load.groupby('date')['total_distance_km'].mean()
ax.plot(team_avg.index, team_avg.values, color='red', linewidth=2,
        label='Team Average')

ax.set_xlabel('Date')
ax.set_ylabel('Total Distance (km)')
ax.set_title('Team Training Load Over Season')
ax.legend()
plt.tight_layout()
plt.show()
library(tidyverse)
library(lubridate)
library(slider)
library(zoo)  # For EMA function in EWMA calculations

# Load GPS training data
gps_data <- read_csv("training_gps.csv") %>%
  mutate(date = as_date(session_date))

# Calculate daily load metrics per player
daily_load <- gps_data %>%
  group_by(player_id, date) %>%
  summarise(
    total_distance_km = sum(total_distance) / 1000,
    hsr_distance_m = sum(high_speed_running_distance),
    sprint_distance_m = sum(sprint_distance),
    accelerations = sum(high_intensity_accelerations),
    decelerations = sum(high_intensity_decelerations),
    player_load = sum(player_load_au),
    session_duration_min = sum(duration_minutes),
    avg_hr = mean(average_heart_rate),
    max_hr = max(max_heart_rate),
    .groups = "drop"
  )

# Calculate sRPE load (RPE x duration)
srpe_data <- training_wellness %>%
  mutate(
    srpe_load = session_rpe * session_duration_min
  )

# Combine GPS and sRPE data
combined_load <- daily_load %>%
  left_join(srpe_data, by = c("player_id", "date"))

# Calculate TRIMP (simplified Banister method)
calculate_trimp <- function(avg_hr, max_hr, resting_hr, duration_min, gender = "M") {
  # Heart rate reserve ratio
  hr_reserve_ratio <- (avg_hr - resting_hr) / (max_hr - resting_hr)

  # Gender-specific coefficients
  if (gender == "M") {
    y <- 1.92
  } else {
    y <- 1.67
  }

  trimp <- duration_min * hr_reserve_ratio * 0.64 * exp(y * hr_reserve_ratio)
  return(trimp)
}

# Apply TRIMP calculation
combined_load <- combined_load %>%
  mutate(
    trimp = calculate_trimp(avg_hr, max_hr, resting_hr, session_duration_min)
  )

# Visualize load distribution
ggplot(combined_load, aes(x = date, y = total_distance_km)) +
  geom_line(aes(group = player_id), alpha = 0.2) +
  stat_summary(fun = mean, geom = "line", color = "red", size = 1.2) +
  labs(
    title = "Team Training Load Over Season",
    subtitle = "Individual players (gray) vs team average (red)",
    x = "Date",
    y = "Total Distance (km)"
  ) +
  theme_minimal()

Acute:Chronic Workload Ratio (ACWR)

The ACWR compares recent training load (acute, typically 7 days) to longer-term training load (chronic, typically 28 days). Research suggests injury risk increases when ACWR is outside the "sweet spot" of 0.8-1.3.

ACWR Zones
  • < 0.8: Undertraining - fitness loss, possible increased injury risk
  • 0.8 - 1.3: Sweet spot - optimal training adaptation
  • 1.3 - 1.5: Danger zone - elevated injury risk
  • > 1.5: High risk - significantly elevated injury probability
acwr_calculation.py
import pandas as pd
import numpy as np

def calculate_acwr(data, load_col, acute_days=7, chronic_days=28):
    """
    Calculate Acute:Chronic Workload Ratio using rolling averages.
    """
    data = data.sort_values(['player_id', 'date']).copy()

    # Group by player and calculate rolling averages
    def player_acwr(group):
        group = group.sort_values('date')

        # Acute load (7-day rolling mean)
        group['acute_load'] = (
            group[load_col]
            .rolling(window=acute_days, min_periods=1)
            .mean()
        )

        # Chronic load (28-day rolling mean)
        group['chronic_load'] = (
            group[load_col]
            .rolling(window=chronic_days, min_periods=1)
            .mean()
        )

        # ACWR
        group['acwr'] = group['acute_load'] / group['chronic_load']

        # Risk zone classification
        group['acwr_zone'] = pd.cut(
            group['acwr'],
            bins=[0, 0.8, 1.3, 1.5, float('inf')],
            labels=['Undertraining', 'Optimal', 'Danger', 'High Risk']
        )

        return group

    return data.groupby('player_id').apply(player_acwr).reset_index(drop=True)

# Calculate ACWR for total distance
player_acwr = calculate_acwr(combined_load, 'total_distance_km')

def calculate_ewma_acwr(data, load_col, acute_days=7, chronic_days=28):
    """
    Calculate ACWR using Exponentially Weighted Moving Average.
    More sensitive to recent load changes.
    """
    data = data.sort_values(['player_id', 'date']).copy()

    def player_ewma(group):
        group = group.sort_values('date')

        # EWMA parameters
        span_acute = acute_days
        span_chronic = chronic_days

        group['ewma_acute'] = (
            group[load_col]
            .ewm(span=span_acute, adjust=False)
            .mean()
        )
        group['ewma_chronic'] = (
            group[load_col]
            .ewm(span=span_chronic, adjust=False)
            .mean()
        )
        group['ewma_acwr'] = group['ewma_acute'] / group['ewma_chronic']

        return group

    return data.groupby('player_id').apply(player_ewma).reset_index(drop=True)

# Calculate EWMA ACWR
player_ewma = calculate_ewma_acwr(combined_load, 'total_distance_km')

def plot_player_acwr(data, player_id):
    """
    Plot ACWR timeline for a specific player.
    """
    player_data = data[data['player_id'] == player_id].copy()

    fig, ax = plt.subplots(figsize=(12, 6))

    # ACWR line
    ax.plot(player_data['date'], player_data['acwr'],
            color='steelblue', linewidth=2)

    # Risk zone lines
    ax.axhline(y=0.8, linestyle='--', color='orange', label='Lower threshold')
    ax.axhline(y=1.3, linestyle='--', color='orange', label='Upper threshold')
    ax.axhline(y=1.5, linestyle='--', color='red', label='High risk')

    # Shaded optimal zone
    ax.axhspan(0.8, 1.3, alpha=0.2, color='green', label='Optimal zone')

    ax.set_xlabel('Date')
    ax.set_ylabel('ACWR (Total Distance)')
    ax.set_title(f'ACWR Timeline - Player {player_id}')
    ax.set_ylim(0, 2)
    ax.legend(loc='upper right')

    plt.tight_layout()
    plt.show()

# Team ACWR summary
def team_acwr_summary(data):
    """
    Summarize team ACWR status.
    """
    latest = data.groupby('player_id').last().reset_index()

    summary = latest['acwr_zone'].value_counts()
    print("Current Team ACWR Status:")
    print(summary)

    # Players in danger zone
    danger = latest[latest['acwr_zone'].isin(['Danger', 'High Risk'])]
    if len(danger) > 0:
        print(f"\nPlayers requiring attention: {len(danger)}")
        print(danger[['player_id', 'acwr', 'acwr_zone']])

    return latest

team_status = team_acwr_summary(player_acwr)
library(slider)
library(zoo)

# Calculate rolling averages for ACWR
calculate_acwr <- function(data, load_col, acute_days = 7, chronic_days = 28) {
  data %>%
    arrange(player_id, date) %>%
    group_by(player_id) %>%
    mutate(
      # Acute load (rolling mean, last 7 days)
      acute_load = slide_dbl(
        .data[[load_col]],
        mean, na.rm = TRUE,
        .before = acute_days - 1,
        .complete = FALSE
      ),

      # Chronic load (rolling mean, last 28 days, excluding acute window)
      chronic_load = slide_dbl(
        .data[[load_col]],
        mean, na.rm = TRUE,
        .before = chronic_days - 1,
        .complete = FALSE
      ),

      # ACWR
      acwr = acute_load / chronic_load,

      # Risk zone classification
      acwr_zone = case_when(
        acwr < 0.8 ~ "Undertraining",
        acwr >= 0.8 & acwr <= 1.3 ~ "Optimal",
        acwr > 1.3 & acwr <= 1.5 ~ "Danger",
        acwr > 1.5 ~ "High Risk",
        TRUE ~ NA_character_
      )
    ) %>%
    ungroup()
}

# Calculate ACWR for different load metrics
player_acwr <- combined_load %>%
  calculate_acwr("total_distance_km") %>%
  rename(acwr_distance = acwr, zone_distance = acwr_zone) %>%
  calculate_acwr("hsr_distance_m") %>%
  rename(acwr_hsr = acwr, zone_hsr = acwr_zone) %>%
  calculate_acwr("srpe_load") %>%
  rename(acwr_srpe = acwr, zone_srpe = acwr_zone)

# Exponentially Weighted Moving Average (EWMA) ACWR
# More sensitive to recent load changes
calculate_ewma_acwr <- function(data, load_col, acute_days = 7, chronic_days = 28) {
  # Calculate decay factors
  lambda_acute <- 2 / (acute_days + 1)
  lambda_chronic <- 2 / (chronic_days + 1)

  data %>%
    arrange(player_id, date) %>%
    group_by(player_id) %>%
    mutate(
      ewma_acute = EMA(.data[[load_col]], n = acute_days),
      ewma_chronic = EMA(.data[[load_col]], n = chronic_days),
      ewma_acwr = ewma_acute / ewma_chronic
    ) %>%
    ungroup()
}

# Plot ACWR over time for a player
plot_player_acwr <- function(data, player_id) {
  player_data <- data %>% filter(player_id == !!player_id)

  ggplot(player_data, aes(x = date)) +
    # ACWR line
    geom_line(aes(y = acwr_distance), color = "steelblue", linewidth = 1) +
    # Risk zones
    geom_hline(yintercept = 0.8, linetype = "dashed", color = "orange") +
    geom_hline(yintercept = 1.3, linetype = "dashed", color = "orange") +
    geom_hline(yintercept = 1.5, linetype = "dashed", color = "red") +
    # Shaded optimal zone
    annotate("rect", xmin = min(player_data$date), xmax = max(player_data$date),
             ymin = 0.8, ymax = 1.3, fill = "green", alpha = 0.1) +
    labs(
      title = paste("ACWR Timeline - Player", player_id),
      x = "Date",
      y = "ACWR (Total Distance)"
    ) +
    scale_y_continuous(limits = c(0, 2)) +
    theme_minimal()
}

Injury Risk Prediction

Predictive models can estimate injury probability based on load history, player characteristics, and injury history.

injury_prediction.py
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score, classification_report
from imblearn.over_sampling import SMOTE
import shap

# Prepare injury prediction dataset
def prepare_injury_data(player_data, injuries):
    """
    Create dataset for injury prediction.
    """
    # Merge with injury dates
    injuries['injury_window_start'] = (
        pd.to_datetime(injuries['injury_date']) - pd.Timedelta(days=7)
    )

    injury_data = player_data.copy()

    # Label days before injury
    def label_injury_risk(row):
        player_injuries = injuries[injuries['player_id'] == row['player_id']]
        for _, inj in player_injuries.iterrows():
            if inj['injury_window_start'] <= row['date'] <= inj['injury_date']:
                return 1
        return 0

    injury_data['injury_next_7d'] = injury_data.apply(label_injury_risk, axis=1)

    return injury_data

injury_data = prepare_injury_data(player_acwr, injuries)

# Feature engineering
def engineer_injury_features(data):
    """
    Create features for injury prediction.
    """
    data = data.sort_values(['player_id', 'date']).copy()

    def player_features(group):
        group = group.sort_values('date')

        # Load variability (coefficient of variation)
        group['load_cv_7d'] = (
            group['total_distance_km'].rolling(7).std() /
            group['total_distance_km'].rolling(7).mean()
        )

        # Week-to-week change
        group['load_change'] = (
            (group['acute_load'] - group['acute_load'].shift(7)) /
            group['acute_load'].shift(7)
        )

        # Cumulative load
        group['cumulative_load_28d'] = (
            group['total_distance_km'].rolling(28).sum()
        )

        # Rolling averages for wellness
        if 'sleep_quality' in group.columns:
            group['avg_sleep_7d'] = group['sleep_quality'].rolling(7).mean()
            group['avg_fatigue_7d'] = group['fatigue_score'].rolling(7).mean()

        return group

    return data.groupby('player_id').apply(player_features).reset_index(drop=True)

injury_features = engineer_injury_features(injury_data)

# Prepare features and target
model_features = [
    'acwr', 'chronic_load', 'load_cv_7d', 'load_change',
    'cumulative_load_28d', 'age'
]

# Clean data
clean_data = injury_features.dropna(subset=model_features + ['injury_next_7d'])

X = clean_data[model_features]
y = clean_data['injury_next_7d']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Handle class imbalance with SMOTE
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Train model
injury_model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    random_state=42
)
injury_model.fit(X_train_balanced, y_train_balanced)

# Evaluate
y_pred_proba = injury_model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_pred_proba)
print(f"ROC-AUC: {auc:.3f}")

# Feature importance with SHAP
explainer = shap.TreeExplainer(injury_model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

# Create risk score for current squad
def calculate_injury_risk_score(current_data, model, features):
    """
    Calculate current injury risk for all players.
    """
    risk_scores = model.predict_proba(current_data[features])[:, 1]

    risk_df = current_data[['player_id', 'player_name']].copy()
    risk_df['injury_risk'] = risk_scores
    risk_df['risk_level'] = pd.cut(
        risk_scores,
        bins=[0, 0.2, 0.4, 0.6, 1.0],
        labels=['Low', 'Moderate', 'High', 'Very High']
    )

    return risk_df.sort_values('injury_risk', ascending=False)

squad_risk = calculate_injury_risk_score(latest_data, injury_model, model_features)
print(squad_risk.head(10))
library(caret)
library(randomForest)
library(pROC)

# Prepare injury prediction dataset
injury_data <- player_acwr %>%
  # Create injury outcome (injury within next 7 days)
  left_join(
    injuries %>%
      select(player_id, injury_date) %>%
      mutate(
        injury_start = injury_date - 7,
        injury_end = injury_date
      ),
    by = "player_id"
  ) %>%
  mutate(
    injury_next_7d = as.factor(ifelse(
      date >= injury_start & date <= injury_end, 1, 0
    ))
  ) %>%
  select(-injury_start, -injury_end, -injury_date) %>%
  distinct()

# Feature engineering
injury_features <- injury_data %>%
  group_by(player_id) %>%
  mutate(
    # Load features
    acwr_7d = acwr_distance,
    acwr_hsr = acwr_hsr,
    chronic_load_28d = chronic_load,

    # Load variability
    load_cv_7d = slide_dbl(total_distance_km, sd, .before = 6) /
                 slide_dbl(total_distance_km, mean, .before = 6),

    # Week-to-week change
    load_change = (acute_load - lag(acute_load, 7)) / lag(acute_load, 7),

    # Cumulative load
    cumulative_load_28d = slide_dbl(total_distance_km, sum, .before = 27),

    # Previous injury history
    days_since_last_injury = as.numeric(date - lag(injury_date)),

    # Sleep and wellness
    avg_sleep_quality_7d = slide_dbl(sleep_quality, mean, .before = 6),
    avg_fatigue_7d = slide_dbl(fatigue_score, mean, .before = 6)
  ) %>%
  ungroup() %>%
  filter(!is.na(acwr_7d))

# Train/test split
set.seed(42)
train_idx <- createDataPartition(injury_features$injury_next_7d, p = 0.7, list = FALSE)
train_data <- injury_features[train_idx, ]
test_data <- injury_features[-train_idx, ]

# Handle class imbalance with SMOTE
library(DMwR)
train_balanced <- SMOTE(injury_next_7d ~ ., data = train_data[, model_features])

# Train Random Forest model
model_features <- c(
  "acwr_7d", "acwr_hsr", "chronic_load_28d", "load_cv_7d",
  "load_change", "cumulative_load_28d", "days_since_last_injury",
  "avg_sleep_quality_7d", "avg_fatigue_7d", "age"
)

injury_model <- randomForest(
  injury_next_7d ~ .,
  data = train_balanced[, c(model_features, "injury_next_7d")],
  ntree = 500,
  mtry = 3
)

# Evaluate model
predictions <- predict(injury_model, test_data, type = "prob")[, "1"]
roc_result <- roc(test_data$injury_next_7d, predictions)
print(paste("AUC:", round(auc(roc_result), 3)))

# Variable importance
importance(injury_model) %>%
  as.data.frame() %>%
  rownames_to_column("feature") %>%
  arrange(desc(MeanDecreaseGini)) %>%
  head(10) %>%
  ggplot(aes(x = reorder(feature, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_bar(stat = "identity", fill = "coral") +
  coord_flip() +
  labs(title = "Injury Risk Factor Importance", x = "", y = "Importance") +
  theme_minimal()

Load Management Dashboard

A real-time dashboard helps coaches and sports scientists monitor player load and make informed decisions about training intensity and player availability.

load_dashboard.py
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
import plotly.graph_objects as go

# Initialize Dash app
app = dash.Dash(__name__)

# Layout
app.layout = html.Div([
    html.H1("Load Management Dashboard"),

    html.Div([
        html.Label("Select Player:"),
        dcc.Dropdown(
            id='player-dropdown',
            options=[{'label': name, 'value': name}
                     for name in player_acwr['player_name'].unique()],
            value=player_acwr['player_name'].iloc[0]
        ),
        dcc.DatePickerRange(
            id='date-range',
            start_date=datetime.now() - timedelta(days=30),
            end_date=datetime.now()
        )
    ], style={'width': '30%', 'padding': '20px'}),

    # KPI Cards
    html.Div([
        html.Div([
            html.H3(id='acwr-value'),
            html.P("Current ACWR")
        ], className='kpi-card'),
        html.Div([
            html.H3(id='chronic-load-value'),
            html.P("Chronic Load")
        ], className='kpi-card'),
        html.Div([
            html.H3(id='risk-value'),
            html.P("Injury Risk")
        ], className='kpi-card')
    ], style={'display': 'flex', 'justify-content': 'space-around'}),

    # Charts
    html.Div([
        dcc.Graph(id='acwr-timeline'),
        dcc.Graph(id='load-distribution')
    ]),

    html.Div([
        dcc.Graph(id='weekly-comparison'),
        dcc.Graph(id='wellness-trends')
    ])
])

@app.callback(
    [Output('acwr-value', 'children'),
     Output('chronic-load-value', 'children'),
     Output('risk-value', 'children'),
     Output('acwr-timeline', 'figure'),
     Output('load-distribution', 'figure')],
    [Input('player-dropdown', 'value'),
     Input('date-range', 'start_date'),
     Input('date-range', 'end_date')]
)
def update_dashboard(player, start_date, end_date):
    # Filter data
    mask = (
        (player_acwr['player_name'] == player) &
        (player_acwr['date'] >= start_date) &
        (player_acwr['date'] <= end_date)
    )
    player_data = player_acwr[mask]

    if len(player_data) == 0:
        return "N/A", "N/A", "N/A", {}, {}

    # Current values
    current_acwr = player_data['acwr'].iloc[-1]
    chronic_load = player_data['chronic_load'].iloc[-1]
    injury_risk = player_data.get('injury_risk_score', pd.Series([0.1])).iloc[-1] * 100

    # ACWR Timeline
    acwr_fig = go.Figure()
    acwr_fig.add_trace(go.Scatter(
        x=player_data['date'],
        y=player_data['acwr'],
        mode='lines',
        name='ACWR',
        line=dict(color='steelblue', width=2)
    ))
    # Add risk zones
    acwr_fig.add_hline(y=0.8, line_dash="dash", line_color="orange")
    acwr_fig.add_hline(y=1.3, line_dash="dash", line_color="orange")
    acwr_fig.add_hline(y=1.5, line_dash="dash", line_color="red")
    acwr_fig.add_hrect(y0=0.8, y1=1.3, fillcolor="green", opacity=0.1)
    acwr_fig.update_layout(title="ACWR Timeline", yaxis_range=[0, 2])

    # Load distribution
    load_fig = px.histogram(
        player_data, x='total_distance_km',
        title="Load Distribution"
    )

    return (
        f"{current_acwr:.2f}",
        f"{chronic_load:.0f}",
        f"{injury_risk:.0f}%",
        acwr_fig,
        load_fig
    )

if __name__ == '__main__':
    app.run_server(debug=True)
library(shiny)
library(shinydashboard)
library(plotly)

# Dashboard UI
ui <- dashboardPage(
  dashboardHeader(title = "Load Management"),
  dashboardSidebar(
    selectInput("player", "Player", choices = unique(player_acwr$player_name)),
    dateRangeInput("dates", "Date Range",
                   start = Sys.Date() - 30, end = Sys.Date())
  ),
  dashboardBody(
    fluidRow(
      # KPI boxes
      valueBoxOutput("acwr_box"),
      valueBoxOutput("chronic_load_box"),
      valueBoxOutput("injury_risk_box")
    ),
    fluidRow(
      box(title = "ACWR Timeline", plotlyOutput("acwr_plot"), width = 8),
      box(title = "Load Distribution", plotlyOutput("load_dist"), width = 4)
    ),
    fluidRow(
      box(title = "Weekly Load Comparison", plotlyOutput("weekly_comparison"), width = 6),
      box(title = "Wellness Trends", plotlyOutput("wellness_plot"), width = 6)
    )
  )
)

# Dashboard Server
server <- function(input, output) {

  player_data <- reactive({
    player_acwr %>%
      filter(
        player_name == input$player,
        date >= input$dates[1],
        date <= input$dates[2]
      )
  })

  output$acwr_box <- renderValueBox({
    current_acwr <- tail(player_data()$acwr_distance, 1)
    color <- case_when(
      current_acwr < 0.8 ~ "yellow",
      current_acwr <= 1.3 ~ "green",
      current_acwr <= 1.5 ~ "orange",
      TRUE ~ "red"
    )
    valueBox(
      round(current_acwr, 2), "Current ACWR",
      icon = icon("chart-line"),
      color = color
    )
  })

  output$chronic_load_box <- renderValueBox({
    valueBox(
      round(tail(player_data()$chronic_load, 1), 0),
      "Chronic Load (28d avg)",
      icon = icon("dumbbell"),
      color = "blue"
    )
  })

  output$injury_risk_box <- renderValueBox({
    risk <- tail(player_data()$injury_risk_score, 1) * 100
    color <- ifelse(risk < 20, "green", ifelse(risk < 50, "yellow", "red"))
    valueBox(
      paste0(round(risk, 0), "%"),
      "Injury Risk",
      icon = icon("shield-alt"),
      color = color
    )
  })

  output$acwr_plot <- renderPlotly({
    p <- ggplot(player_data(), aes(x = date, y = acwr_distance)) +
      geom_line(color = "steelblue", linewidth = 1) +
      geom_hline(yintercept = c(0.8, 1.3), linetype = "dashed", color = "orange") +
      geom_ribbon(aes(ymin = 0.8, ymax = 1.3), fill = "green", alpha = 0.1) +
      labs(x = "", y = "ACWR") +
      theme_minimal()

    ggplotly(p)
  })

  output$load_dist <- renderPlotly({
    plot_ly(player_data(), x = ~total_distance_km, type = "histogram",
            marker = list(color = "steelblue"))
  })
}

shinyApp(ui, server)

Return to Play Analytics

After injury, analytics can help determine when a player is ready to return to full training and match play by tracking fitness markers and comparing to pre-injury baselines.

return_to_play.py
def calculate_rtp_readiness(player_id, injury_date, current_date):
    """
    Calculate return-to-play readiness based on pre-injury baseline.
    """
    # Get pre-injury baseline (30 days before injury)
    baseline_mask = (
        (player_training['player_id'] == player_id) &
        (player_training['date'] >= injury_date - timedelta(days=30)) &
        (player_training['date'] < injury_date)
    )
    baseline_data = player_training[baseline_mask]

    baseline = {
        'distance': baseline_data['total_distance_km'].mean(),
        'hsr': baseline_data['hsr_distance_m'].mean(),
        'sprint': baseline_data['sprint_distance_m'].mean(),
        'accelerations': baseline_data['accelerations'].mean(),
        'top_speed': baseline_data['top_speed_kmh'].max()
    }

    # Get current metrics (last 7 days)
    current_mask = (
        (player_training['player_id'] == player_id) &
        (player_training['date'] >= current_date - timedelta(days=7)) &
        (player_training['date'] <= current_date)
    )
    current_data = player_training[current_mask]

    current = {
        'distance': current_data['total_distance_km'].mean(),
        'hsr': current_data['hsr_distance_m'].mean(),
        'sprint': current_data['sprint_distance_m'].mean(),
        'accelerations': current_data['accelerations'].mean(),
        'top_speed': current_data['top_speed_kmh'].max()
    }

    # Calculate readiness percentages
    readiness = pd.DataFrame({
        'metric': ['Total Distance', 'High-Speed Running', 'Sprint Distance',
                   'Accelerations', 'Top Speed'],
        'baseline': list(baseline.values()),
        'current': list(current.values())
    })
    readiness['pct_of_baseline'] = (
        readiness['current'] / readiness['baseline'] * 100
    )

    # Overall readiness
    overall_readiness = readiness['pct_of_baseline'].mean()

    # RTP criteria thresholds
    criteria = {
        'distance_ready': readiness.iloc[0]['pct_of_baseline'] >= 85,
        'hsr_ready': readiness.iloc[1]['pct_of_baseline'] >= 80,
        'sprint_ready': readiness.iloc[2]['pct_of_baseline'] >= 75,
        'speed_ready': readiness.iloc[4]['pct_of_baseline'] >= 90,
        'overall_ready': overall_readiness >= 85
    }

    recommendation = (
        "CLEARED for full training" if all(criteria.values())
        else "NOT READY - continue modified training"
    )

    return {
        'readiness': readiness,
        'overall_score': overall_readiness,
        'criteria': criteria,
        'recommendation': recommendation
    }

def visualize_rtp_progress(rtp_data):
    """
    Visualize return-to-play progress.
    """
    readiness = rtp_data['readiness']

    fig, ax = plt.subplots(figsize=(10, 6))

    colors = ['forestgreen' if pct >= 85 else 'orange'
              for pct in readiness['pct_of_baseline']]

    bars = ax.barh(readiness['metric'], readiness['pct_of_baseline'],
                   color=colors)

    # Reference lines
    ax.axvline(x=85, linestyle='--', color='red', label='85% threshold')
    ax.axvline(x=100, linestyle='-', color='green', label='Baseline')

    ax.set_xlabel('% of Pre-Injury Baseline')
    ax.set_title(f"Return to Play Readiness\n"
                 f"Overall: {rtp_data['overall_score']:.0f}%")
    ax.legend()

    # Add value labels
    for bar, pct in zip(bars, readiness['pct_of_baseline']):
        ax.text(bar.get_width() + 2, bar.get_y() + bar.get_height()/2,
                f'{pct:.0f}%', va='center')

    plt.tight_layout()
    plt.show()

# Example usage
rtp_result = calculate_rtp_readiness(
    player_id='P001',
    injury_date=datetime(2024, 1, 15),
    current_date=datetime(2024, 2, 20)
)
print(rtp_result['recommendation'])
visualize_rtp_progress(rtp_result)
# Return to play monitoring
calculate_rtp_readiness <- function(player_id, injury_date, current_date) {

  # Get pre-injury baseline (last 30 days before injury)
  baseline <- player_training %>%
    filter(
      player_id == !!player_id,
      date >= injury_date - 30,
      date < injury_date
    ) %>%
    summarise(
      baseline_distance = mean(total_distance_km),
      baseline_hsr = mean(hsr_distance_m),
      baseline_sprint = mean(sprint_distance_m),
      baseline_accel = mean(accelerations),
      baseline_speed = max(top_speed_kmh)
    )

  # Get current metrics (last 7 days)
  current <- player_training %>%
    filter(
      player_id == !!player_id,
      date >= current_date - 7,
      date <= current_date
    ) %>%
    summarise(
      current_distance = mean(total_distance_km),
      current_hsr = mean(hsr_distance_m),
      current_sprint = mean(sprint_distance_m),
      current_accel = mean(accelerations),
      current_speed = max(top_speed_kmh)
    )

  # Calculate readiness percentages
  readiness <- tibble(
    metric = c("Total Distance", "High-Speed Running", "Sprint Distance",
               "Accelerations", "Top Speed"),
    baseline = c(baseline$baseline_distance, baseline$baseline_hsr,
                 baseline$baseline_sprint, baseline$baseline_accel,
                 baseline$baseline_speed),
    current = c(current$current_distance, current$current_hsr,
                current$current_sprint, current$current_accel,
                current$current_speed),
    pct_of_baseline = current / baseline * 100
  )

  # Overall readiness score
  overall_readiness <- mean(readiness$pct_of_baseline)

  # RTP criteria check
  rtp_criteria <- list(
    distance_ready = readiness$pct_of_baseline[1] >= 85,
    hsr_ready = readiness$pct_of_baseline[2] >= 80,
    sprint_ready = readiness$pct_of_baseline[3] >= 75,
    speed_ready = readiness$pct_of_baseline[5] >= 90,
    overall_ready = overall_readiness >= 85
  )

  return(list(
    readiness = readiness,
    overall_score = overall_readiness,
    criteria = rtp_criteria,
    recommendation = ifelse(
      all(unlist(rtp_criteria)),
      "CLEARED for full training",
      "NOT READY - continue modified training"
    )
  ))
}

# Visualize RTP progress
visualize_rtp_progress <- function(rtp_data) {
  ggplot(rtp_data$readiness, aes(x = metric, y = pct_of_baseline)) +
    geom_bar(stat = "identity", aes(fill = pct_of_baseline >= 85)) +
    geom_hline(yintercept = 85, linetype = "dashed", color = "red") +
    geom_hline(yintercept = 100, linetype = "solid", color = "green") +
    scale_fill_manual(values = c("FALSE" = "orange", "TRUE" = "forestgreen")) +
    labs(
      title = "Return to Play Readiness",
      subtitle = paste("Overall:", round(rtp_data$overall_score, 0), "%"),
      x = "", y = "% of Pre-Injury Baseline"
    ) +
    theme_minimal() +
    theme(legend.position = "none") +
    coord_flip()
}

Practice Exercises

Exercise 22.1: Complete ACWR Monitoring System

Task: Build a comprehensive ACWR monitoring system that calculates both rolling average and EWMA-based ACWR for multiple load metrics, identifies players in danger zones, and generates daily team status reports.

Requirements:

  • Calculate ACWR using both methods (rolling average and EWMA)
  • Monitor total distance, HSR, sprint distance, and sRPE
  • Classify players into risk zones (under-training, optimal, danger, high-risk)
  • Generate team summary with flagged players
  • Create visualization comparing methods

acwr_monitoring_system.py
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt

# Simulate GPS training data
np.random.seed(42)
n_players = 25
n_days = 60

# Player info
players = [f'P{i:02d}' for i in range(1, n_players + 1)]
positions = np.random.choice(['GK', 'DEF', 'MID', 'FWD'], n_players, p=[0.1, 0.35, 0.35, 0.2])
player_info = pd.DataFrame({'player_id': players, 'position': positions})

# Generate dates
dates = pd.date_range(end=datetime.now(), periods=n_days, freq='D')

# Create training data
training_data = []
for player, position in zip(players, positions):
    for date in dates:
        dow = date.dayofweek
        is_match = dow == 5  # Saturday
        is_recovery = dow == 6  # Sunday

        base_dist = {'GK': 5.5, 'DEF': 9.5, 'MID': 11, 'FWD': 10}[position]

        if is_match:
            total_dist = base_dist * 1.2 + np.random.normal(0, 0.8)
            hsr = 1200 + np.random.normal(0, 200)
            sprint = 350 + np.random.normal(0, 80)
            srpe = 90 * 7.5 + np.random.normal(0, 50)
        elif is_recovery:
            total_dist = base_dist * 0.4 + np.random.normal(0, 0.3)
            hsr = 150 + np.random.normal(0, 50)
            sprint = 30 + np.random.normal(0, 15)
            srpe = 30 * 3 + np.random.normal(0, 20)
        else:
            total_dist = base_dist + np.random.normal(0, 1.2)
            hsr = 600 + np.random.normal(0, 150)
            sprint = 180 + np.random.normal(0, 50)
            srpe = 75 * 5 + np.random.normal(0, 40)

        training_data.append({
            'player_id': player,
            'position': position,
            'date': date,
            'total_distance': max(0, total_dist),
            'hsr_distance': max(0, hsr),
            'sprint_distance': max(0, sprint),
            'srpe_load': max(0, srpe)
        })

df = pd.DataFrame(training_data)

def calculate_rolling_acwr(data, load_col, acute_days=7, chronic_days=28):
    """Calculate ACWR using simple rolling averages."""
    data = data.sort_values(['player_id', 'date']).copy()

    result = []
    for player_id in data['player_id'].unique():
        player_data = data[data['player_id'] == player_id].copy()

        player_data['acute_load'] = player_data[load_col].rolling(
            window=acute_days, min_periods=1
        ).mean()
        player_data['chronic_load'] = player_data[load_col].rolling(
            window=chronic_days, min_periods=1
        ).mean()
        player_data['acwr_rolling'] = player_data['acute_load'] / player_data['chronic_load']

        result.append(player_data)

    return pd.concat(result)

def calculate_ewma_acwr(data, load_col, acute_days=7, chronic_days=28):
    """Calculate ACWR using exponentially weighted moving averages."""
    data = data.sort_values(['player_id', 'date']).copy()

    result = []
    for player_id in data['player_id'].unique():
        player_data = data[data['player_id'] == player_id].copy()

        player_data['ewma_acute'] = player_data[load_col].ewm(
            span=acute_days, adjust=False
        ).mean()
        player_data['ewma_chronic'] = player_data[load_col].ewm(
            span=chronic_days, adjust=False
        ).mean()
        player_data['acwr_ewma'] = player_data['ewma_acute'] / player_data['ewma_chronic']

        result.append(player_data)

    return pd.concat(result)

# Calculate ACWR for all metrics
metrics = ['total_distance', 'hsr_distance', 'sprint_distance', 'srpe_load']

acwr_df = df.copy()
for metric in metrics:
    # Rolling ACWR
    rolling_result = calculate_rolling_acwr(df, metric)
    acwr_df[f'{metric}_acute'] = rolling_result['acute_load']
    acwr_df[f'{metric}_chronic'] = rolling_result['chronic_load']
    acwr_df[f'{metric}_acwr_rolling'] = rolling_result['acwr_rolling']

    # EWMA ACWR
    ewma_result = calculate_ewma_acwr(df, metric)
    acwr_df[f'{metric}_ewma_acute'] = ewma_result['ewma_acute']
    acwr_df[f'{metric}_ewma_chronic'] = ewma_result['ewma_chronic']
    acwr_df[f'{metric}_acwr_ewma'] = ewma_result['acwr_ewma']

def classify_risk_zone(acwr):
    """Classify ACWR into risk zones."""
    if pd.isna(acwr) or np.isinf(acwr):
        return 'Insufficient Data'
    elif acwr < 0.8:
        return 'Under-training'
    elif acwr <= 1.3:
        return 'Optimal'
    elif acwr <= 1.5:
        return 'Danger'
    else:
        return 'High Risk'

# Add risk classifications
for metric in ['total_distance', 'hsr_distance', 'srpe_load']:
    acwr_df[f'{metric}_zone_rolling'] = acwr_df[f'{metric}_acwr_rolling'].apply(classify_risk_zone)
    acwr_df[f'{metric}_zone_ewma'] = acwr_df[f'{metric}_acwr_ewma'].apply(classify_risk_zone)

def generate_team_status(data, report_date=None):
    """Generate daily team status report."""
    if report_date is None:
        report_date = data['date'].max()

    daily_data = data[data['date'] == report_date]

    print("=" * 65)
    print(f"DAILY LOAD MONITORING REPORT - {report_date.strftime('%Y-%m-%d')}")
    print("=" * 65)

    # Team summary
    n_players = len(daily_data)
    n_optimal = (daily_data['total_distance_zone_rolling'] == 'Optimal').sum()
    n_danger = daily_data['total_distance_zone_rolling'].isin(['Danger', 'High Risk']).sum()
    n_undertrain = (daily_data['total_distance_zone_rolling'] == 'Under-training').sum()
    avg_acwr = daily_data['total_distance_acwr_rolling'].mean()

    print("\nTEAM SUMMARY:")
    print(f"  Players Monitored: {n_players}")
    print(f"  In Optimal Zone: {n_optimal}")
    print(f"  In Danger/High Risk: {n_danger}")
    print(f"  Under-training: {n_undertrain}")
    print(f"  Team Avg ACWR: {avg_acwr:.2f}")

    # Players requiring attention
    attention_mask = (
        daily_data['total_distance_zone_rolling'].isin(['Danger', 'High Risk']) |
        daily_data['hsr_distance_zone_rolling'].isin(['Danger', 'High Risk']) |
        daily_data['srpe_load_zone_rolling'].isin(['Danger', 'High Risk'])
    )
    attention_players = daily_data[attention_mask][[
        'player_id', 'position',
        'total_distance_acwr_rolling', 'total_distance_zone_rolling',
        'hsr_distance_acwr_rolling', 'hsr_distance_zone_rolling'
    ]]

    if len(attention_players) > 0:
        print("\nPLAYERS REQUIRING ATTENTION:")
        print(attention_players.to_string(index=False))
    else:
        print("\nNo players in danger zones.")

    return {'n_players': n_players, 'attention': attention_players}

# Generate report
report = generate_team_status(acwr_df)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Plot 1: Rolling vs EWMA comparison
latest = acwr_df[acwr_df['date'] == acwr_df['date'].max()]

colors = {'Under-training': 'blue', 'Optimal': 'green', 'Danger': 'orange',
          'High Risk': 'red', 'Insufficient Data': 'gray'}
scatter_colors = [colors.get(z, 'gray') for z in latest['total_distance_zone_rolling']]

axes[0].scatter(latest['total_distance_acwr_rolling'], latest['total_distance_acwr_ewma'],
                c=scatter_colors, s=80)
axes[0].plot([0.5, 2], [0.5, 2], 'k--', alpha=0.5)
axes[0].axhline(0.8, ls=':', color='orange')
axes[0].axhline(1.3, ls=':', color='orange')
axes[0].axvline(0.8, ls=':', color='orange')
axes[0].axvline(1.3, ls=':', color='orange')
axes[0].set_xlabel('Rolling Average ACWR')
axes[0].set_ylabel('EWMA ACWR')
axes[0].set_title('ACWR Comparison: Rolling vs EWMA')

# Add legend
for zone, color in colors.items():
    axes[0].scatter([], [], c=color, label=zone)
axes[0].legend(title='Risk Zone', loc='upper left')

# Plot 2: Player timeline example
player_data = acwr_df[(acwr_df['player_id'] == 'P01') &
                       (acwr_df['date'] >= acwr_df['date'].max() - timedelta(days=28))]

axes[1].plot(player_data['date'], player_data['total_distance_acwr_rolling'],
             label='Rolling', linewidth=2)
axes[1].plot(player_data['date'], player_data['total_distance_acwr_ewma'],
             label='EWMA', linewidth=2, linestyle='--')
axes[1].axhline(0.8, ls='--', color='orange', alpha=0.7)
axes[1].axhline(1.3, ls='--', color='orange', alpha=0.7)
axes[1].axhline(1.5, ls=':', color='red', alpha=0.7)
axes[1].axhspan(0.8, 1.3, alpha=0.1, color='green')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('ACWR')
axes[1].set_title('ACWR Timeline - Player P01')
axes[1].legend()
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
library(tidyverse)
library(slider)
library(lubridate)

# Simulate GPS training data for a squad over 60 days
set.seed(42)
n_players <- 25
n_days <- 60

# Generate player data
player_info <- tibble(
  player_id = paste0("P", sprintf("%02d", 1:n_players)),
  position = sample(c("GK", "DEF", "MID", "FWD"), n_players, replace = TRUE,
                    prob = c(0.1, 0.35, 0.35, 0.2))
)

# Generate daily training load data
training_data <- expand_grid(
  player_id = player_info$player_id,
  date = seq.Date(Sys.Date() - n_days + 1, Sys.Date(), by = "day")
) %>%
  left_join(player_info, by = "player_id") %>%
  mutate(
    # Simulate training day (lower load on match days and recovery days)
    day_of_week = wday(date),
    is_match_day = day_of_week == 7,  # Saturday matches
    is_recovery = day_of_week == 1,   # Sunday recovery

    # Base loads vary by position
    base_distance = case_when(
      position == "GK" ~ 5.5,
      position == "DEF" ~ 9.5,
      position == "MID" ~ 11,
      position == "FWD" ~ 10
    ),

    # Total distance (km)
    total_distance = case_when(
      is_match_day ~ base_distance * 1.2 + rnorm(n(), 0, 0.8),
      is_recovery ~ base_distance * 0.4 + rnorm(n(), 0, 0.3),
      TRUE ~ base_distance + rnorm(n(), 0, 1.2)
    ) %>% pmax(0),

    # High-speed running (m)
    hsr_distance = case_when(
      is_match_day ~ 1200 + rnorm(n(), 0, 200),
      is_recovery ~ 150 + rnorm(n(), 0, 50),
      TRUE ~ 600 + rnorm(n(), 0, 150)
    ) %>% pmax(0),

    # Sprint distance (m)
    sprint_distance = case_when(
      is_match_day ~ 350 + rnorm(n(), 0, 80),
      is_recovery ~ 30 + rnorm(n(), 0, 15),
      TRUE ~ 180 + rnorm(n(), 0, 50)
    ) %>% pmax(0),

    # sRPE (session RPE * duration)
    srpe_load = case_when(
      is_match_day ~ 90 * 7.5 + rnorm(n(), 0, 50),  # 90 min, RPE ~7.5
      is_recovery ~ 30 * 3 + rnorm(n(), 0, 20),     # 30 min, RPE ~3
      TRUE ~ 75 * 5 + rnorm(n(), 0, 40)             # 75 min, RPE ~5
    ) %>% pmax(0)
  )

# Calculate rolling average ACWR
calculate_rolling_acwr <- function(data, load_col, acute_days = 7, chronic_days = 28) {
  data %>%
    arrange(player_id, date) %>%
    group_by(player_id) %>%
    mutate(
      acute_load = slide_dbl(.data[[load_col]], mean, .before = acute_days - 1, .complete = FALSE),
      chronic_load = slide_dbl(.data[[load_col]], mean, .before = chronic_days - 1, .complete = FALSE),
      acwr_rolling = acute_load / chronic_load
    ) %>%
    ungroup() %>%
    select(player_id, date, acute_load, chronic_load, acwr_rolling)
}

# Calculate EWMA-based ACWR
calculate_ewma_acwr <- function(data, load_col, acute_days = 7, chronic_days = 28) {
  data %>%
    arrange(player_id, date) %>%
    group_by(player_id) %>%
    mutate(
      # EWMA with decay factor
      ewma_acute = pracma::movavg(.data[[load_col]], acute_days, type = "e"),
      ewma_chronic = pracma::movavg(.data[[load_col]], chronic_days, type = "e"),
      acwr_ewma = ewma_acute / ewma_chronic
    ) %>%
    ungroup() %>%
    select(player_id, date, ewma_acute, ewma_chronic, acwr_ewma)
}

# Manual EWMA implementation (if pracma not available)
calculate_ewma_manual <- function(data, load_col, acute_days = 7, chronic_days = 28) {
  lambda_a <- 2 / (acute_days + 1)
  lambda_c <- 2 / (chronic_days + 1)

  data %>%
    arrange(player_id, date) %>%
    group_by(player_id) %>%
    mutate(
      ewma_acute = accumulate(.data[[load_col]], ~ lambda_a * .y + (1 - lambda_a) * .x, .init = first(.data[[load_col]]))[-1],
      ewma_chronic = accumulate(.data[[load_col]], ~ lambda_c * .y + (1 - lambda_c) * .x, .init = first(.data[[load_col]]))[-1],
      acwr_ewma = ewma_acute / ewma_chronic
    ) %>%
    ungroup()
}

# Calculate ACWR for all metrics
metrics <- c("total_distance", "hsr_distance", "sprint_distance", "srpe_load")

acwr_results <- training_data

for (metric in metrics) {
  rolling <- calculate_rolling_acwr(training_data, metric)
  ewma <- calculate_ewma_manual(training_data, metric)

  acwr_results <- acwr_results %>%
    left_join(
      rolling %>%
        rename_with(~paste0(metric, "_", .), c("acute_load", "chronic_load", "acwr_rolling")),
      by = c("player_id", "date")
    ) %>%
    left_join(
      ewma %>%
        select(player_id, date, ewma_acute, ewma_chronic, acwr_ewma) %>%
        rename_with(~paste0(metric, "_", .), c("ewma_acute", "ewma_chronic", "acwr_ewma")),
      by = c("player_id", "date")
    )
}

# Classify risk zones
classify_risk_zone <- function(acwr) {
  case_when(
    is.na(acwr) | is.infinite(acwr) ~ "Insufficient Data",
    acwr < 0.8 ~ "Under-training",
    acwr <= 1.3 ~ "Optimal",
    acwr <= 1.5 ~ "Danger",
    TRUE ~ "High Risk"
  )
}

# Add risk classifications
acwr_results <- acwr_results %>%
  mutate(
    zone_distance_rolling = classify_risk_zone(total_distance_acwr_rolling),
    zone_distance_ewma = classify_risk_zone(total_distance_acwr_ewma),
    zone_hsr_rolling = classify_risk_zone(hsr_distance_acwr_rolling),
    zone_srpe_rolling = classify_risk_zone(srpe_load_acwr_rolling)
  )

# Generate daily team status report
generate_team_status <- function(data, report_date = Sys.Date()) {
  daily_data <- data %>%
    filter(date == report_date)

  cat("=" , rep("=", 60), "\n", sep = "")
  cat("DAILY LOAD MONITORING REPORT - ", format(report_date, "%Y-%m-%d"), "\n", sep = "")
  cat("=" , rep("=", 60), "\n\n", sep = "")

  # Team summary
  team_summary <- daily_data %>%
    summarise(
      n_players = n(),
      n_optimal = sum(zone_distance_rolling == "Optimal"),
      n_danger = sum(zone_distance_rolling %in% c("Danger", "High Risk")),
      n_undertrain = sum(zone_distance_rolling == "Under-training"),
      avg_acwr = mean(total_distance_acwr_rolling, na.rm = TRUE)
    )

  cat("TEAM SUMMARY:\n")
  cat("  Players Monitored:", team_summary$n_players, "\n")
  cat("  In Optimal Zone:", team_summary$n_optimal, "\n")
  cat("  In Danger/High Risk:", team_summary$n_danger, "\n")
  cat("  Under-training:", team_summary$n_undertrain, "\n")
  cat("  Team Avg ACWR:", round(team_summary$avg_acwr, 2), "\n\n")

  # Players requiring attention
  attention_players <- daily_data %>%
    filter(zone_distance_rolling %in% c("Danger", "High Risk") |
           zone_hsr_rolling %in% c("Danger", "High Risk") |
           zone_srpe_rolling %in% c("Danger", "High Risk")) %>%
    select(player_id, position,
           total_distance_acwr_rolling, zone_distance_rolling,
           hsr_distance_acwr_rolling, zone_hsr_rolling,
           srpe_load_acwr_rolling, zone_srpe_rolling)

  if (nrow(attention_players) > 0) {
    cat("PLAYERS REQUIRING ATTENTION:\n")
    print(attention_players)
  } else {
    cat("No players in danger zones.\n")
  }

  return(invisible(list(summary = team_summary, attention = attention_players)))
}

# Generate report for today
report <- generate_team_status(acwr_results)

# Visualize ACWR comparison (Rolling vs EWMA)
latest_data <- acwr_results %>%
  filter(date == max(date))

p1 <- ggplot(latest_data, aes(x = total_distance_acwr_rolling, y = total_distance_acwr_ewma)) +
  geom_point(aes(color = zone_distance_rolling), size = 3) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_hline(yintercept = c(0.8, 1.3), linetype = "dotted", color = "orange") +
  geom_vline(xintercept = c(0.8, 1.3), linetype = "dotted", color = "orange") +
  scale_color_manual(values = c(
    "Under-training" = "blue", "Optimal" = "green",
    "Danger" = "orange", "High Risk" = "red", "Insufficient Data" = "gray"
  )) +
  labs(
    title = "ACWR Comparison: Rolling Average vs EWMA",
    x = "Rolling Average ACWR", y = "EWMA ACWR",
    color = "Risk Zone"
  ) +
  theme_minimal()

# Individual player timeline
plot_player_acwr <- function(data, pid) {
  player_data <- data %>%
    filter(player_id == pid, date >= max(date) - 28)

  ggplot(player_data, aes(x = date)) +
    geom_line(aes(y = total_distance_acwr_rolling, color = "Rolling"), linewidth = 1) +
    geom_line(aes(y = total_distance_acwr_ewma, color = "EWMA"), linewidth = 1, linetype = "dashed") +
    geom_hline(yintercept = c(0.8, 1.3, 1.5), linetype = c("dashed", "dashed", "dotted"),
               color = c("orange", "orange", "red")) +
    annotate("rect", xmin = min(player_data$date), xmax = max(player_data$date),
             ymin = 0.8, ymax = 1.3, fill = "green", alpha = 0.1) +
    scale_color_manual(values = c("Rolling" = "steelblue", "EWMA" = "darkred")) +
    labs(
      title = paste("ACWR Timeline - Player", pid),
      x = "", y = "ACWR", color = "Method"
    ) +
    theme_minimal()
}

print(p1)
print(plot_player_acwr(acwr_results, "P01"))
Exercise 22.2: Comprehensive Injury Risk Prediction Model

Task: Build a machine learning model to predict soft tissue injuries (within next 7 days) using training load history, ACWR, wellness data, and player characteristics. Handle class imbalance and evaluate model performance.

Requirements:

  • Engineer features from load data (ACWR, load spikes, cumulative load)
  • Include wellness metrics (sleep, fatigue, muscle soreness)
  • Handle class imbalance using SMOTE or class weights
  • Evaluate with ROC-AUC, precision-recall curves
  • Create feature importance analysis
  • Generate daily injury risk scores for squad

injury_risk_prediction.py
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score, classification_report, roc_curve
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt

# Simulate comprehensive injury prediction dataset
np.random.seed(42)
n_records = 5000

injury_data = pd.DataFrame({
    'record_id': range(n_records),
    'player_id': np.random.choice([f'P{i:02d}' for i in range(1, 31)], n_records),
    'age': np.round(np.random.uniform(18, 35, n_records), 1),
    'position': np.random.choice(['GK', 'DEF', 'MID', 'FWD'], n_records, p=[0.1, 0.35, 0.35, 0.2]),

    # Training load metrics
    'acwr_distance': np.random.uniform(0.5, 2.0, n_records),
    'acwr_hsr': np.random.uniform(0.4, 2.2, n_records),
    'acwr_sprint': np.random.uniform(0.3, 2.5, n_records),
    'chronic_load_28d': np.random.uniform(200, 500, n_records),
    'acute_load_7d': np.random.uniform(50, 150, n_records),

    # Load variability
    'load_cv_7d': np.random.uniform(0.1, 0.6, n_records),
    'load_spike_3d': np.random.uniform(-30, 50, n_records),
    'cumulative_load_14d': np.random.uniform(80, 250, n_records),

    # Wellness metrics (1-10 scale)
    'sleep_quality_7d_avg': np.random.uniform(4, 9, n_records),
    'fatigue_score_7d_avg': np.random.uniform(3, 8, n_records),
    'muscle_soreness_7d_avg': np.random.uniform(3, 9, n_records),
    'mood_score_7d_avg': np.random.uniform(4, 9, n_records),
    'hrv_rmssd': np.random.uniform(30, 90, n_records),

    # Injury history
    'days_since_last_injury': np.concatenate([
        np.random.uniform(14, 365, int(n_records * 0.3)),
        np.full(int(n_records * 0.7), 365)
    ]),
    'injuries_last_12_months': np.random.choice([0, 1, 2, 3, 4], n_records, p=[0.5, 0.3, 0.15, 0.04, 0.01]),

    # Match load
    'matches_last_14d': np.random.choice([0, 1, 2, 3, 4, 5], n_records, p=[0.1, 0.25, 0.35, 0.2, 0.08, 0.02]),
    'consecutive_starts': np.random.choice(range(11), n_records)
})

# Shuffle days_since_last_injury
injury_data['days_since_last_injury'] = np.random.permutation(injury_data['days_since_last_injury'])

# Create injury outcome
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Risk factors
injury_data['acwr_risk'] = np.maximum(0, (np.maximum.reduce([
    injury_data['acwr_distance'],
    injury_data['acwr_hsr'],
    injury_data['acwr_sprint']
]) - 1.3) * 2)

injury_data['wellness_risk'] = (10 - (
    injury_data['sleep_quality_7d_avg'] +
    injury_data['fatigue_score_7d_avg'] +
    injury_data['muscle_soreness_7d_avg']
) / 3) / 5

injury_data['history_risk'] = (
    np.where(injury_data['days_since_last_injury'] < 60, 0.3, 0) +
    injury_data['injuries_last_12_months'] * 0.1
)

injury_data['spike_risk'] = np.maximum(0, injury_data['load_spike_3d'] / 50)
injury_data['age_risk'] = np.maximum(0, (injury_data['age'] - 28) / 20)

# Combined injury probability
injury_data['injury_prob'] = sigmoid(
    -3 +
    1.5 * injury_data['acwr_risk'] +
    1.2 * injury_data['wellness_risk'] +
    1.0 * injury_data['history_risk'] +
    0.8 * injury_data['spike_risk'] +
    0.5 * injury_data['age_risk'] +
    np.random.normal(0, 0.3, n_records)
)

injury_data['injury_next_7d'] = np.random.binomial(1, injury_data['injury_prob'])

print(f"Injury rate: {injury_data['injury_next_7d'].mean() * 100:.1f}%")

# Feature engineering
injury_data['any_acwr_danger'] = (
    (injury_data['acwr_distance'] > 1.3) |
    (injury_data['acwr_hsr'] > 1.3) |
    (injury_data['acwr_sprint'] > 1.3)
).astype(int)

injury_data['max_acwr'] = np.maximum.reduce([
    injury_data['acwr_distance'],
    injury_data['acwr_hsr'],
    injury_data['acwr_sprint']
])

injury_data['wellness_composite'] = (
    injury_data['sleep_quality_7d_avg'] +
    injury_data['fatigue_score_7d_avg'] +
    injury_data['muscle_soreness_7d_avg'] +
    injury_data['mood_score_7d_avg']
) / 4

injury_data['recent_injury'] = (injury_data['days_since_last_injury'] < 90).astype(int)

# Features for model
features = ['acwr_distance', 'acwr_hsr', 'acwr_sprint', 'chronic_load_28d',
            'load_cv_7d', 'load_spike_3d', 'cumulative_load_14d',
            'sleep_quality_7d_avg', 'fatigue_score_7d_avg', 'muscle_soreness_7d_avg',
            'hrv_rmssd', 'days_since_last_injury', 'injuries_last_12_months',
            'matches_last_14d', 'consecutive_starts', 'age',
            'any_acwr_danger', 'max_acwr', 'wellness_composite', 'recent_injury']

X = injury_data[features]
y = injury_data['injury_next_7d']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                     random_state=42, stratify=y)

# Handle class imbalance with SMOTE
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

print(f"Training set balanced: {pd.Series(y_train_balanced).value_counts().to_dict()}")

# Train Gradient Boosting model
model = GradientBoostingClassifier(
    n_estimators=300,
    max_depth=5,
    learning_rate=0.1,
    random_state=42
)
model.fit(X_train_balanced, y_train_balanced)

# Predictions
y_proba = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)

# Evaluation
auc_score = roc_auc_score(y_test, y_proba)
print(f"\nModel Performance:")
print(f"ROC-AUC: {auc_score:.3f}")

# Cross-validation
cv_scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(f"CV ROC-AUC: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No Injury', 'Injury']))

# Feature importance
importance_df = pd.DataFrame({
    'feature': features,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False).head(15)

print("\nTop 15 Risk Factors:")
print(importance_df.to_string(index=False))

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Feature importance
axes[0].barh(importance_df['feature'], importance_df['importance'], color='coral')
axes[0].set_xlabel('Importance')
axes[0].set_title('Injury Prediction - Feature Importance')
axes[0].invert_yaxis()

# ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_proba)
axes[1].plot(fpr, tpr, linewidth=2, label=f'AUC = {auc_score:.3f}')
axes[1].plot([0, 1], [0, 1], 'k--')
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate')
axes[1].set_title('Injury Prediction ROC Curve')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Squad risk report function
def generate_squad_risk_report(data, model, features):
    """Generate daily injury risk report for squad."""
    data = data.copy()
    data['injury_risk'] = model.predict_proba(data[features])[:, 1] * 100

    data['risk_level'] = pd.cut(
        data['injury_risk'],
        bins=[0, 20, 40, 100],
        labels=['LOW', 'MODERATE', 'HIGH']
    )

    print("\n=== SQUAD INJURY RISK REPORT ===")
    print(f"Date: {pd.Timestamp.now().strftime('%Y-%m-%d')}\n")

    for level in ['HIGH', 'MODERATE']:
        print(f"{level} RISK PLAYERS:")
        level_data = data[data['risk_level'] == level][[
            'player_id', 'position', 'age', 'injury_risk', 'max_acwr',
            'wellness_composite', 'days_since_last_injury'
        ]].sort_values('injury_risk', ascending=False)

        if len(level_data) > 0:
            print(level_data.round(2).to_string(index=False))
        else:
            print("None")
        print()

    print("Summary:")
    print(f"  High Risk: {(data['risk_level'] == 'HIGH').sum()} players")
    print(f"  Moderate Risk: {(data['risk_level'] == 'MODERATE').sum()} players")
    print(f"  Low Risk: {(data['risk_level'] == 'LOW').sum()} players")

    return data

# Simulate current squad
current_squad = injury_data.groupby('player_id').last().reset_index().head(25)
squad_report = generate_squad_risk_report(current_squad, model, features)
library(tidyverse)
library(caret)
library(randomForest)
library(pROC)
library(ROSE)

# Simulate comprehensive injury prediction dataset
set.seed(42)
n_records <- 5000

injury_data <- tibble(
  record_id = 1:n_records,
  player_id = sample(paste0("P", sprintf("%02d", 1:30)), n_records, replace = TRUE),
  age = round(runif(n_records, 18, 35), 1),
  position = sample(c("GK", "DEF", "MID", "FWD"), n_records, replace = TRUE, prob = c(0.1, 0.35, 0.35, 0.2)),

  # Training load metrics
  acwr_distance = runif(n_records, 0.5, 2.0),
  acwr_hsr = runif(n_records, 0.4, 2.2),
  acwr_sprint = runif(n_records, 0.3, 2.5),
  chronic_load_28d = runif(n_records, 200, 500),
  acute_load_7d = runif(n_records, 50, 150),

  # Load variability
  load_cv_7d = runif(n_records, 0.1, 0.6),
  load_spike_3d = runif(n_records, -30, 50),  # % change from previous week
  cumulative_load_14d = runif(n_records, 80, 250),

  # Wellness metrics (1-10 scale, higher = better)
  sleep_quality_7d_avg = runif(n_records, 4, 9),
  fatigue_score_7d_avg = runif(n_records, 3, 8),  # Higher = less fatigued
  muscle_soreness_7d_avg = runif(n_records, 3, 9),  # Higher = less sore
  mood_score_7d_avg = runif(n_records, 4, 9),
  hrv_rmssd = runif(n_records, 30, 90),  # Heart rate variability

  # Injury history
  days_since_last_injury = sample(c(runif(n_records * 0.3, 14, 365), rep(365, n_records * 0.7))),
  injuries_last_12_months = sample(0:4, n_records, replace = TRUE, prob = c(0.5, 0.3, 0.15, 0.04, 0.01)),

  # Match load
  matches_last_14d = sample(0:5, n_records, replace = TRUE, prob = c(0.1, 0.25, 0.35, 0.2, 0.08, 0.02)),
  consecutive_starts = sample(0:10, n_records, replace = TRUE)
)

# Create injury outcome (7-day window)
# Higher risk with: high ACWR, poor wellness, recent injury history, high load spike
injury_data <- injury_data %>%
  mutate(
    # Risk factors
    acwr_risk = pmax(0, (pmax(acwr_distance, acwr_hsr, acwr_sprint) - 1.3) * 2),
    wellness_risk = (10 - (sleep_quality_7d_avg + fatigue_score_7d_avg + muscle_soreness_7d_avg) / 3) / 5,
    history_risk = ifelse(days_since_last_injury < 60, 0.3, 0) + injuries_last_12_months * 0.1,
    spike_risk = pmax(0, load_spike_3d / 50),
    age_risk = pmax(0, (age - 28) / 20),

    # Combined injury probability
    injury_prob = plogis(-3 +
      1.5 * acwr_risk +
      1.2 * wellness_risk +
      1.0 * history_risk +
      0.8 * spike_risk +
      0.5 * age_risk +
      rnorm(n_records, 0, 0.3)
    ),

    # Binary outcome (approximately 8% injury rate)
    injury_next_7d = rbinom(n_records, 1, injury_prob)
  ) %>%
  mutate(injury_next_7d = as.factor(injury_next_7d))

cat("Injury rate:", mean(injury_data$injury_next_7d == "1") * 100, "%\n")

# Feature engineering
model_features <- injury_data %>%
  mutate(
    # Combined ACWR risk flag
    any_acwr_danger = as.numeric(acwr_distance > 1.3 | acwr_hsr > 1.3 | acwr_sprint > 1.3),
    max_acwr = pmax(acwr_distance, acwr_hsr, acwr_sprint),

    # Wellness composite
    wellness_composite = (sleep_quality_7d_avg + fatigue_score_7d_avg + muscle_soreness_7d_avg + mood_score_7d_avg) / 4,

    # Load trend
    load_trend = acute_load_7d / chronic_load_28d,

    # Recent injury flag
    recent_injury = as.numeric(days_since_last_injury < 90)
  )

# Select features for model
features <- c("acwr_distance", "acwr_hsr", "acwr_sprint", "chronic_load_28d",
              "load_cv_7d", "load_spike_3d", "cumulative_load_14d",
              "sleep_quality_7d_avg", "fatigue_score_7d_avg", "muscle_soreness_7d_avg",
              "hrv_rmssd", "days_since_last_injury", "injuries_last_12_months",
              "matches_last_14d", "consecutive_starts", "age",
              "any_acwr_danger", "max_acwr", "wellness_composite", "recent_injury")

# Split data
set.seed(42)
train_idx <- createDataPartition(model_features$injury_next_7d, p = 0.7, list = FALSE)
train_data <- model_features[train_idx, ]
test_data <- model_features[-train_idx, ]

# Handle class imbalance with ROSE (Random Over-Sampling Examples)
train_balanced <- ROSE(injury_next_7d ~ .,
                       data = train_data[, c(features, "injury_next_7d")],
                       seed = 42)$data

cat("Training set balanced:", table(train_balanced$injury_next_7d), "\n")

# Train Random Forest
rf_model <- randomForest(
  injury_next_7d ~ .,
  data = train_balanced,
  ntree = 500,
  mtry = 5,
  importance = TRUE
)

# Predictions
test_probs <- predict(rf_model, test_data, type = "prob")[, "1"]
test_preds <- predict(rf_model, test_data)

# Evaluation
roc_result <- roc(test_data$injury_next_7d, test_probs)
cat("\nModel Performance:\n")
cat("ROC-AUC:", round(auc(roc_result), 3), "\n")

# Confusion matrix at optimal threshold
optimal_threshold <- coords(roc_result, "best", ret = "threshold")
test_preds_optimal <- factor(ifelse(test_probs >= optimal_threshold, "1", "0"), levels = c("0", "1"))
cm <- confusionMatrix(test_preds_optimal, test_data$injury_next_7d, positive = "1")
print(cm)

# Feature importance
importance_df <- importance(rf_model) %>%
  as.data.frame() %>%
  rownames_to_column("feature") %>%
  arrange(desc(MeanDecreaseGini)) %>%
  head(15)

cat("\nTop 15 Risk Factors:\n")
print(importance_df)

# Plot feature importance
ggplot(importance_df, aes(x = reorder(feature, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_bar(stat = "identity", fill = "coral") +
  coord_flip() +
  labs(title = "Injury Prediction - Feature Importance",
       x = "", y = "Importance (Mean Decrease Gini)") +
  theme_minimal()

# Generate daily risk scores for squad
generate_squad_risk_report <- function(current_data, model) {
  current_data$injury_risk <- predict(model, current_data, type = "prob")[, "1"] * 100

  risk_report <- current_data %>%
    select(player_id, position, age, injury_risk, max_acwr, wellness_composite,
           days_since_last_injury) %>%
    mutate(
      risk_level = case_when(
        injury_risk >= 40 ~ "HIGH",
        injury_risk >= 20 ~ "MODERATE",
        TRUE ~ "LOW"
      )
    ) %>%
    arrange(desc(injury_risk))

  cat("\n=== SQUAD INJURY RISK REPORT ===\n")
  cat("Date:", format(Sys.Date(), "%Y-%m-%d"), "\n\n")

  cat("HIGH RISK PLAYERS:\n")
  high_risk <- risk_report %>% filter(risk_level == "HIGH")
  if (nrow(high_risk) > 0) {
    print(high_risk)
  } else {
    cat("None\n")
  }

  cat("\nMODERATE RISK PLAYERS:\n")
  mod_risk <- risk_report %>% filter(risk_level == "MODERATE")
  if (nrow(mod_risk) > 0) {
    print(mod_risk)
  } else {
    cat("None\n")
  }

  cat("\nSummary:\n")
  cat("  High Risk:", nrow(high_risk), "players\n")
  cat("  Moderate Risk:", nrow(mod_risk), "players\n")
  cat("  Low Risk:", nrow(risk_report) - nrow(high_risk) - nrow(mod_risk), "players\n")

  return(risk_report)
}

# Simulate current squad data
current_squad <- model_features %>%
  filter(player_id %in% sample(unique(player_id), 25)) %>%
  group_by(player_id) %>%
  slice_tail(n = 1) %>%
  ungroup()

squad_report <- generate_squad_risk_report(current_squad, rf_model)

# ROC curve plot
plot(roc_result, main = "Injury Prediction ROC Curve",
     print.auc = TRUE, auc.polygon = TRUE, grid = TRUE)
Exercise 22.3: Match-Day Availability & Readiness Dashboard

Task: Create a comprehensive match-day readiness report that integrates load status, injury risk, recovery metrics, and return-to-play progress to give coaches a clear picture of player availability and fitness status.

Requirements:

  • Combine ACWR status, injury risk score, and wellness metrics
  • Create "Match Readiness Score" (0-100)
  • Flag players recovering from injury with RTP progress
  • Generate squad depth chart with availability status
  • Recommend load modifications for at-risk players

matchday_readiness.py
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# Simulate squad data for match-day readiness
np.random.seed(42)
n_players = 25

positions = ['GK', 'GK'] + ['DEF']*5 + ['MID']*6 + ['FWD']*4 + ['DEF']*3 + ['MID']*3 + ['FWD']

squad_data = pd.DataFrame({
    'player_id': [f'P{i:02d}' for i in range(1, n_players + 1)],
    'player_name': [f'Player_{i}' for i in range(1, n_players + 1)],
    'position': positions,
    'age': np.random.randint(19, 35, n_players),

    # Current load status
    'acwr_distance': np.random.uniform(0.6, 1.6, n_players),
    'acwr_hsr': np.random.uniform(0.5, 1.7, n_players),
    'chronic_load_fitness': np.random.uniform(60, 95, n_players),

    # Wellness metrics
    'sleep_quality': np.random.randint(4, 10, n_players),
    'fatigue': np.random.randint(3, 10, n_players),
    'muscle_soreness': np.random.randint(3, 10, n_players),
    'mood': np.random.randint(5, 10, n_players),

    # Injury risk
    'injury_risk_score': np.random.uniform(5, 45, n_players),

    # Injury status
    'is_injured': np.random.choice([True, False], n_players, p=[0.15, 0.85]),

    # Recent matches
    'minutes_last_7d': np.random.randint(0, 271, n_players),
    'matches_last_14d': np.random.randint(0, 5, n_players)
})

# Add injury details for injured players
injury_types = ['Hamstring', 'Ankle', 'Groin', 'Calf', 'Knee']
squad_data['injury_type'] = np.where(
    squad_data['is_injured'],
    np.random.choice(injury_types, n_players),
    None
)
squad_data['days_injured'] = np.where(
    squad_data['is_injured'],
    np.random.randint(7, 46, n_players),
    0
)

# RTP metrics
squad_data['rtp_distance_pct'] = np.where(
    squad_data['is_injured'],
    np.random.uniform(50, 100, n_players),
    100
)
squad_data['rtp_hsr_pct'] = np.where(
    squad_data['is_injured'],
    np.random.uniform(40, 100, n_players),
    100
)
squad_data['rtp_sprint_pct'] = np.where(
    squad_data['is_injured'],
    np.random.uniform(30, 100, n_players),
    100
)
squad_data['rtp_cleared'] = np.where(
    squad_data['is_injured'],
    np.random.random(n_players) > 0.7,
    True
)

def calculate_readiness_score(data):
    """Calculate comprehensive match readiness score."""
    data = data.copy()

    # ACWR component
    data['acwr_score'] = 100 - np.abs(data['acwr_distance'] - 1.05) * 50 - np.abs(data['acwr_hsr'] - 1.05) * 30
    data['acwr_score'] = np.clip(data['acwr_score'], 0, 100)

    # Wellness component
    data['wellness_score'] = ((data['sleep_quality'] + data['fatigue'] +
                               data['muscle_soreness'] + data['mood']) / 4 - 1) / 9 * 100

    # Injury risk component
    data['injury_component'] = np.clip(100 - data['injury_risk_score'] * 1.5, 0, 100)

    # RTP score
    data['rtp_score'] = (data['rtp_distance_pct'] * 0.4 +
                         data['rtp_hsr_pct'] * 0.35 +
                         data['rtp_sprint_pct'] * 0.25)

    # Combined readiness score
    conditions = [
        data['is_injured'] & ~data['rtp_cleared'],
        data['is_injured'] & data['rtp_cleared']
    ]
    choices = [
        np.minimum(50, data['rtp_score'] * 0.5),
        0.3 * data['acwr_score'] + 0.25 * data['wellness_score'] +
        0.25 * data['injury_component'] + 0.2 * data['rtp_score']
    ]
    default = (0.3 * data['acwr_score'] + 0.3 * data['wellness_score'] +
               0.25 * data['injury_component'] + 0.15 * data['chronic_load_fitness'])

    data['match_readiness'] = np.select(conditions, choices, default)
    data['match_readiness'] = np.clip(data['match_readiness'], 0, 100).round(0)

    # Availability status
    conditions = [
        data['is_injured'] & ~data['rtp_cleared'],
        data['match_readiness'] >= 80,
        data['match_readiness'] >= 60,
        data['match_readiness'] >= 40
    ]
    choices = ['UNAVAILABLE', 'FIT', 'AVAILABLE (Monitor)', 'DOUBTFUL']
    data['availability'] = np.select(conditions, choices, default='UNAVAILABLE')

    # Load recommendation
    conditions = [
        data['is_injured'] & ~data['rtp_cleared'],
        (data['acwr_distance'] > 1.5) | (data['acwr_hsr'] > 1.5),
        data['acwr_distance'] < 0.8,
        data['injury_risk_score'] > 35,
        data['wellness_score'] < 50
    ]
    choices = [
        'Continue RTP protocol',
        'REDUCE LOAD - High ACWR',
        'INCREASE LOAD - Undertrained',
        'Modified training - High injury risk',
        'Light session - Poor wellness'
    ]
    data['load_recommendation'] = np.select(conditions, choices, default='Normal training')

    return data

squad_readiness = calculate_readiness_score(squad_data)

def generate_matchday_report(data, match_info="Next Match"):
    """Generate comprehensive match-day readiness report."""
    print("=" * 72)
    print("MATCH-DAY READINESS REPORT")
    print(f"Match: {match_info}")
    print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
    print("=" * 72)

    # Summary
    print("\nSQUAD SUMMARY:")
    print(f"  Total Squad: {len(data)}")
    print(f"  Fit: {(data['availability'] == 'FIT').sum()}")
    print(f"  Available (Monitor): {(data['availability'] == 'AVAILABLE (Monitor)').sum()}")
    print(f"  Doubtful: {(data['availability'] == 'DOUBTFUL').sum()}")
    print(f"  Unavailable: {(data['availability'] == 'UNAVAILABLE').sum()}")
    print(f"  Average Readiness: {data['match_readiness'].mean():.1f}%")
    print(f"  High Injury Risk: {(data['injury_risk_score'] > 30).sum()} players")

    # Position breakdown
    print("\nAVAILABILITY BY POSITION:")
    position_summary = data.groupby('position').agg({
        'player_id': 'count',
        'availability': lambda x: (x.isin(['FIT', 'AVAILABLE (Monitor)'])).sum()
    }).reset_index()
    position_summary.columns = ['Position', 'Total', 'Available']
    position_summary['Depth'] = position_summary.apply(
        lambda row: f"{row['Available']}/{row['Total']}", axis=1
    )
    print(position_summary.to_string(index=False))

    # Unavailable players
    unavailable = data[data['availability'] == 'UNAVAILABLE'][
        ['player_name', 'position', 'injury_type', 'days_injured', 'rtp_score']
    ]
    if len(unavailable) > 0:
        print("\nUNAVAILABLE PLAYERS:")
        print(unavailable.to_string(index=False))

    # Players requiring attention
    attention = data[
        (data['availability'].isin(['DOUBTFUL', 'AVAILABLE (Monitor)'])) &
        (~data['is_injured'])
    ][['player_name', 'position', 'match_readiness', 'injury_risk_score',
       'acwr_distance', 'wellness_score', 'load_recommendation']]

    if len(attention) > 0:
        print("\nPLAYERS REQUIRING ATTENTION:")
        print(attention.sort_values('injury_risk_score', ascending=False).round(1).to_string(index=False))

    # RTP progress
    rtp_players = data[data['is_injured']][
        ['player_name', 'position', 'injury_type', 'days_injured',
         'rtp_distance_pct', 'rtp_hsr_pct', 'rtp_sprint_pct', 'rtp_cleared']
    ]
    if len(rtp_players) > 0:
        print("\nRETURN-TO-PLAY PROGRESS:")
        print(rtp_players.round(1).to_string(index=False))

    return data

# Generate report
generate_matchday_report(squad_readiness, "Premier League - Matchday 15")

# Full squad readiness
print("\n\nFULL SQUAD READINESS:")
display_cols = ['player_name', 'position', 'match_readiness', 'availability',
                'acwr_distance', 'injury_risk_score', 'load_recommendation']
print(squad_readiness.sort_values(['position', 'match_readiness'],
      ascending=[True, False])[display_cols].round(2).to_string(index=False))

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot by position
colors = {'FIT': 'green', 'AVAILABLE (Monitor)': 'yellow',
          'DOUBTFUL': 'orange', 'UNAVAILABLE': 'red'}

positions_order = ['GK', 'DEF', 'MID', 'FWD']
for i, pos in enumerate(positions_order):
    pos_data = squad_readiness[squad_readiness['position'] == pos]
    jitter_x = i + np.random.uniform(-0.15, 0.15, len(pos_data))
    scatter_colors = [colors.get(a, 'gray') for a in pos_data['availability']]
    axes[0].scatter(jitter_x, pos_data['match_readiness'], c=scatter_colors, s=100, alpha=0.7)

axes[0].boxplot([squad_readiness[squad_readiness['position'] == p]['match_readiness']
                 for p in positions_order], positions=range(4), widths=0.5)
axes[0].set_xticks(range(4))
axes[0].set_xticklabels(positions_order)
axes[0].axhline(80, ls='--', color='green', alpha=0.5, label='FIT threshold')
axes[0].axhline(60, ls='--', color='orange', alpha=0.5, label='DOUBTFUL threshold')
axes[0].set_ylabel('Match Readiness Score (%)')
axes[0].set_title('Squad Match Readiness by Position')
axes[0].legend()

# Availability pie chart
availability_counts = squad_readiness['availability'].value_counts()
pie_colors = [colors.get(a, 'gray') for a in availability_counts.index]
axes[1].pie(availability_counts, labels=availability_counts.index, autopct='%1.0f%%',
            colors=pie_colors, startangle=90)
axes[1].set_title('Squad Availability Distribution')

plt.tight_layout()
plt.show()
library(tidyverse)

# Simulate squad data for match-day readiness
set.seed(42)
n_players <- 25

squad_data <- tibble(
  player_id = paste0("P", sprintf("%02d", 1:n_players)),
  player_name = paste0("Player_", 1:n_players),
  position = c("GK", "GK", rep("DEF", 5), rep("MID", 6), rep("FWD", 4),
               rep("DEF", 3), rep("MID", 3), "FWD"),
  age = round(runif(n_players, 19, 34), 0),

  # Current load status
  acwr_distance = runif(n_players, 0.6, 1.6),
  acwr_hsr = runif(n_players, 0.5, 1.7),
  chronic_load_fitness = runif(n_players, 60, 95),  # Fitness level

  # Wellness metrics (today, 1-10 scale)
  sleep_quality = sample(4:9, n_players, replace = TRUE),
  fatigue = sample(3:9, n_players, replace = TRUE),
  muscle_soreness = sample(3:9, n_players, replace = TRUE),
  mood = sample(5:9, n_players, replace = TRUE),

  # Injury risk (from prediction model, %)
  injury_risk_score = runif(n_players, 5, 45),

  # Injury status
  is_injured = sample(c(TRUE, FALSE), n_players, replace = TRUE, prob = c(0.15, 0.85)),
  injury_type = ifelse(is_injured, sample(c("Hamstring", "Ankle", "Groin", "Calf", "Knee"),
                                          n_players, replace = TRUE), NA),
  days_injured = ifelse(is_injured, sample(7:45, n_players, replace = TRUE), 0),

  # RTP metrics (for injured players)
  rtp_distance_pct = ifelse(is_injured, runif(n_players, 50, 100), 100),
  rtp_hsr_pct = ifelse(is_injured, runif(n_players, 40, 100), 100),
  rtp_sprint_pct = ifelse(is_injured, runif(n_players, 30, 100), 100),
  rtp_cleared = ifelse(is_injured, runif(n_players, 0, 1) > 0.7, TRUE),

  # Recent matches
  minutes_last_7d = round(runif(n_players, 0, 270), 0),
  matches_last_14d = sample(0:4, n_players, replace = TRUE)
)

# Calculate Match Readiness Score (0-100)
calculate_readiness_score <- function(data) {
  data %>%
    mutate(
      # ACWR component (optimal = 100, deviations reduce score)
      acwr_score = 100 - abs(acwr_distance - 1.05) * 50 - abs(acwr_hsr - 1.05) * 30,
      acwr_score = pmax(0, pmin(100, acwr_score)),

      # Wellness component (average of wellness metrics, scaled to 100)
      wellness_score = ((sleep_quality + fatigue + muscle_soreness + mood) / 4 - 1) / 9 * 100,

      # Injury risk component (inverted - lower risk = higher score)
      injury_component = 100 - injury_risk_score * 1.5,
      injury_component = pmax(0, pmin(100, injury_component)),

      # RTP component (for injured players)
      rtp_score = (rtp_distance_pct * 0.4 + rtp_hsr_pct * 0.35 + rtp_sprint_pct * 0.25),

      # Fitness level component
      fitness_score = chronic_load_fitness,

      # Combined readiness score
      match_readiness = case_when(
        is_injured & !rtp_cleared ~ pmin(50, rtp_score * 0.5),  # Injured, not cleared
        is_injured & rtp_cleared ~ 0.3 * acwr_score + 0.25 * wellness_score +
                                    0.25 * injury_component + 0.2 * rtp_score,  # Cleared but recovering
        TRUE ~ 0.3 * acwr_score + 0.3 * wellness_score +
               0.25 * injury_component + 0.15 * fitness_score
      ),
      match_readiness = round(pmax(0, pmin(100, match_readiness)), 0),

      # Availability status
      availability = case_when(
        is_injured & !rtp_cleared ~ "UNAVAILABLE",
        match_readiness >= 80 ~ "FIT",
        match_readiness >= 60 ~ "AVAILABLE (Monitor)",
        match_readiness >= 40 ~ "DOUBTFUL",
        TRUE ~ "UNAVAILABLE"
      ),

      # Load recommendation
      load_recommendation = case_when(
        is_injured & !rtp_cleared ~ "Continue RTP protocol",
        acwr_distance > 1.5 | acwr_hsr > 1.5 ~ "REDUCE LOAD - High ACWR",
        acwr_distance < 0.8 ~ "INCREASE LOAD - Undertrained",
        injury_risk_score > 35 ~ "Modified training - High injury risk",
        wellness_score < 50 ~ "Light session - Poor wellness",
        TRUE ~ "Normal training"
      )
    )
}

squad_readiness <- calculate_readiness_score(squad_data)

# Generate Match-Day Report
generate_matchday_report <- function(data, match_info = "Next Match") {
  cat("=" , rep("=", 70), "\n", sep = "")
  cat("MATCH-DAY READINESS REPORT\n")
  cat("Match:", match_info, "\n")
  cat("Generated:", format(Sys.time(), "%Y-%m-%d %H:%M"), "\n")
  cat("=" , rep("=", 70), "\n\n", sep = "")

  # Summary
  summary_stats <- data %>%
    summarise(
      total_squad = n(),
      fit = sum(availability == "FIT"),
      available_monitor = sum(availability == "AVAILABLE (Monitor)"),
      doubtful = sum(availability == "DOUBTFUL"),
      unavailable = sum(availability == "UNAVAILABLE"),
      avg_readiness = mean(match_readiness),
      high_risk_count = sum(injury_risk_score > 30)
    )

  cat("SQUAD SUMMARY:\n")
  cat("  Total Squad:", summary_stats$total_squad, "\n")
  cat("  Fit:", summary_stats$fit, "\n")
  cat("  Available (Monitor):", summary_stats$available_monitor, "\n")
  cat("  Doubtful:", summary_stats$doubtful, "\n")
  cat("  Unavailable:", summary_stats$unavailable, "\n")
  cat("  Average Readiness:", round(summary_stats$avg_readiness, 1), "%\n")
  cat("  High Injury Risk:", summary_stats$high_risk_count, "players\n\n")

  # Position breakdown
  cat("AVAILABILITY BY POSITION:\n")
  position_summary <- data %>%
    group_by(position) %>%
    summarise(
      total = n(),
      fit = sum(availability %in% c("FIT", "AVAILABLE (Monitor)")),
      unavailable = sum(availability == "UNAVAILABLE"),
      .groups = "drop"
    ) %>%
    mutate(depth = paste0(fit, "/", total))

  print(position_summary)

  # Unavailable players
  unavailable <- data %>%
    filter(availability == "UNAVAILABLE") %>%
    select(player_name, position, injury_type, days_injured, rtp_score)

  if (nrow(unavailable) > 0) {
    cat("\nUNAVAILABLE PLAYERS:\n")
    print(unavailable)
  }

  # Players requiring attention
  attention <- data %>%
    filter(availability %in% c("DOUBTFUL", "AVAILABLE (Monitor)"),
           !is_injured) %>%
    select(player_name, position, match_readiness, injury_risk_score,
           acwr_distance, wellness_score, load_recommendation) %>%
    arrange(desc(injury_risk_score))

  if (nrow(attention) > 0) {
    cat("\nPLAYERS REQUIRING ATTENTION:\n")
    print(attention)
  }

  # RTP Progress (injured players)
  rtp_players <- data %>%
    filter(is_injured) %>%
    select(player_name, position, injury_type, days_injured,
           rtp_distance_pct, rtp_hsr_pct, rtp_sprint_pct, rtp_cleared)

  if (nrow(rtp_players) > 0) {
    cat("\nRETURN-TO-PLAY PROGRESS:\n")
    print(rtp_players)
  }

  return(invisible(data))
}

# Generate report
generate_matchday_report(squad_readiness, "Premier League - Matchday 15")

# Full squad readiness table
cat("\n\nFULL SQUAD READINESS:\n")
squad_readiness %>%
  select(player_name, position, match_readiness, availability,
         acwr_distance, injury_risk_score, load_recommendation) %>%
  arrange(position, desc(match_readiness)) %>%
  print(n = n_players)

# Visualization
library(ggplot2)

# Readiness by position
ggplot(squad_readiness, aes(x = position, y = match_readiness, fill = availability)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, size = 2, alpha = 0.7) +
  scale_fill_manual(values = c(
    "FIT" = "green", "AVAILABLE (Monitor)" = "yellow",
    "DOUBTFUL" = "orange", "UNAVAILABLE" = "red"
  )) +
  labs(
    title = "Squad Match Readiness by Position",
    x = "Position", y = "Match Readiness Score (%)",
    fill = "Availability"
  ) +
  theme_minimal() +
  geom_hline(yintercept = c(60, 80), linetype = "dashed", alpha = 0.5)

Summary

Key Takeaways
  • Training load combines external (distance, speed) and internal (HR, RPE) metrics to quantify physiological stress
  • ACWR sweet spot of 0.8-1.3 minimizes injury risk; values above 1.5 significantly increase injury probability
  • EWMA-based ACWR is more sensitive to recent load changes than simple rolling averages
  • Injury prediction requires load history, player characteristics, and wellness data; class imbalance is a key challenge
  • Return to play decisions should be data-driven, comparing current performance to pre-injury baselines across multiple metrics