Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples
Learning Objectives
  • Understand different stakeholder perspectives and communication needs
  • Master data storytelling techniques for football analytics
  • Create effective visualizations for technical and non-technical audiences
  • Build compelling presentations for coaches, executives, and scouts
  • Write clear, actionable analytics reports
  • Design intuitive dashboards for real-time decision support
  • Handle objections and build trust with analytics skeptics
  • Develop a communication strategy for your analytics department

The Communication Challenge

The most sophisticated analysis is worthless if it doesn't influence decisions. Analytics communication is about translating complex statistical insights into actionable information that coaches, scouts, executives, and players can understand and trust. This is often the biggest barrier to analytics adoption in football.

The Analytics Gap

Many clubs have invested heavily in data infrastructure and analysts but struggle to see returns because insights don't reach decision-makers in usable form. The best analysts are not just technically skilled—they're effective communicators who understand their audience.

  • Coaches: want tactical insights, quickly
  • Executives: need business impact and ROI
  • Scouts: seek player comparisons and context
  • Players: want personal, actionable feedback

communication_framework.py
# Python: Communication Style Framework
import pandas as pd

# Define audience profiles and their preferences
audience_profiles = pd.DataFrame({
    "audience": ["Head Coach", "DOF", "Scout", "Owner", "Analyst"],
    "time_available": ["2 minutes", "10 minutes", "15 minutes", "5 minutes", "30+ minutes"],
    "technical_level": ["Low", "Medium", "Medium-High", "Low", "High"],
    "primary_concern": ["Tactical edge", "Value & risk", "Player fit", "ROI", "Methodology"],
    "preferred_format": ["Video + key stat", "Dashboard summary", "Detailed report",
                        "Executive summary", "Full technical doc"]
})

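def format_insight(insight, audience):
    """Format analytical insight for specific audience.

    Relies on create_headline, select_viz_type, determine_detail and
    create_cta, which are not defined in this listing.
    """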
    profile = audience_profiles[audience_profiles["audience"] == audience].iloc[0]

    formatted = {
        "headline": create_headline(insight, profile["primary_concern"]),
        "visualization": select_viz_type(insight, profile["technical_level"]),
        "detail_level": determine_detail(profile["time_available"]),
        "call_to_action": create_cta(insight, profile["audience"])
    }

    return formatted

print(audience_profiles)
# R: Communication Style Framework
library(tidyverse)

# Define audience profiles and their preferences
audience_profiles <- tribble(
    ~audience,    ~time_available, ~technical_level, ~primary_concern, ~preferred_format,
    "Head Coach", "2 minutes",     "Low",            "Tactical edge",  "Video + key stat",
    "DOF",        "10 minutes",    "Medium",         "Value & risk",   "Dashboard summary",
    "Scout",      "15 minutes",    "Medium-High",    "Player fit",     "Detailed report",
    "Owner",      "5 minutes",     "Low",            "ROI",            "Executive summary",
    "Analyst",    "30+ minutes",   "High",           "Methodology",    "Full technical doc"
)

# Map insight to appropriate format
format_insight <- function(insight, audience) {
    profile <- audience_profiles %>% filter(audience == !!audience)

    formatted <- list(
        headline = create_headline(insight, profile$primary_concern),
        visualization = select_viz_type(insight, profile$technical_level),
        detail_level = determine_detail(profile$time_available),
        call_to_action = create_cta(insight, profile$audience)
    )

    return(formatted)
}

print(audience_profiles)
Output
     audience time_available technical_level  primary_concern  preferred_format
0  Head Coach      2 minutes             Low    Tactical edge  Video + key stat
1         DOF     10 minutes          Medium    Value & risk  Dashboard summary
2       Scout     15 minutes     Medium-High      Player fit    Detailed report
3       Owner      5 minutes             Low             ROI  Executive summary
4     Analyst    30+ minutes            High     Methodology  Full technical doc
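
The `format_insight` function above calls four helpers (`create_headline`, `select_viz_type`, `determine_detail`, `create_cta`) that the listing does not define. The sketch below shows one way they might look. The insight keys (`finding`, `action`), the chart names, and the time thresholds are all assumptions made for illustration, not part of the original framework.

```python
import re

# Hypothetical helpers for format_insight(); every rule below is an
# illustrative assumption, not part of the original framework.

def create_headline(insight, primary_concern):
    """Lead with the audience's main concern, then the finding."""
    return f"{primary_concern}: {insight['finding']}"

def select_viz_type(insight, technical_level):
    """Pick a simpler chart for less technical audiences."""
    viz_by_level = {
        "Low": "annotated pitch map",
        "Medium": "trend line with benchmark",
        "Medium-High": "percentile bar chart",
        "High": "distribution plot with intervals",
    }
    return viz_by_level.get(technical_level, "simple bar chart")

def determine_detail(time_available):
    """Map a string such as '2 minutes' or '30+ minutes' to a detail level."""
    match = re.search(r"\d+", time_available)
    minutes = int(match.group()) if match else 0
    if minutes <= 5:
        return "headline only"
    if minutes <= 15:
        return "headline plus three findings"
    return "full analysis"

def create_cta(insight, audience):
    """Phrase the recommended action for the named audience."""
    return f"{audience}: {insight['action']}"

example_insight = {
    "finding": "pressing success rate down 12% over 8 matches",
    "action": "review pressing triggers in the next session",
}
print(create_headline(example_insight, "Tactical edge"))
print(determine_detail("2 minutes"))
print(determine_detail("30+ minutes"))
```

Keeping each decision in its own small function makes it easy to adjust one rule (for example, the time thresholds) without touching the others.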

Data Storytelling

Effective analytics communication follows narrative structures. Rather than presenting data dumps, tell a story with a clear beginning (context), middle (analysis), and end (recommendation).

The STAR Framework for Analytics Stories
Situation

Set the context: What's the problem or opportunity? Why does it matter now?

Task

Define the question: What did we analyze? What was our objective?

Analysis

Present findings: What did the data reveal? Key insights only.

Recommendation

Action items: What should we do? Be specific and actionable.

data_storytelling.py
# Python: Structure Analytics Story
import pandas as pd

def create_analytics_story(analysis_result, context):
    """Create structured analytics story using STAR framework."""
    story = {}

    # SITUATION: Set context
    story["situation"] = f"""Our {context["metric_name"]} has dropped by {abs(context["change_pct"])}%
    over the last {context["period"]} matches.
    This puts us {ordinal(context["league_rank"])} in the league for this metric."""

    # TASK: Define the question
    story["task"] = f"""We analyzed {len(analysis_result["data"])} matches to identify
    the root cause of this decline and potential solutions."""

    # ANALYSIS: Key findings (limit to 3)
    top_insights = analysis_result["insights"].nlargest(3, "impact_score")
    story["findings"] = [
        f"{i+1}. {row['category']}: {row['description']} (Impact: {row['impact_level']})"
        for i, (_, row) in enumerate(top_insights.iterrows())
    ]

    # RECOMMENDATION: Specific actions
    story["recommendation"] = [
        f"- {row['action']} ({row['expected_outcome']})"
        for _, row in analysis_result["recommendations"].iterrows()
    ]

    return story

# Example output
example_story = {
    "situation": "Our pressing success rate has dropped by 12% over the last 8 matches.",
    "task": "We analyzed pressing patterns across 24 matches.",
    "findings": [
        "1. Trigger timing: Pressing 0.5s too late on average",
        "2. Compactness: Team shape stretched 8m wider",
        "3. Recovery runs: 15% fewer recovery sprints"
    ],
    "recommendation": [
        "- Earlier trigger on opposition first touch",
        "- Narrower defensive width in middle third"
    ]
}

print("=== ANALYTICS STORY EXAMPLE ===\n")
print(f"SITUATION:\n{example_story['situation']}\n")
print(f"TASK:\n{example_story['task']}\n")
print("KEY FINDINGS:")
print("\n".join(example_story["findings"]))
print("\nRECOMMENDATIONS:")
print("\n".join(example_story["recommendation"]))
# R: Structure Analytics Story
library(tidyverse)

create_analytics_story <- function(analysis_result, context) {
    story <- list()

    # SITUATION: Set context
    story$situation <- sprintf(
        "Our %s has dropped by %d%% over the last %d matches.
        This puts us %s in the league for this metric.",
        context$metric_name,
        abs(context$change_pct),
        context$period,
        ordinal(context$league_rank)
    )

    # TASK: Define the question
    story$task <- sprintf(
        "We analyzed %d matches to identify the root cause of
        this decline and potential solutions.",
        nrow(analysis_result$data)
    )

    # ANALYSIS: Key findings (limit to 3)
    story$findings <- analysis_result$insights %>%
        arrange(desc(impact_score)) %>%
        head(3) %>%
        mutate(
            finding = sprintf(
                "%d. %s: %s (Impact: %s)",
                row_number(),
                category,
                description,
                impact_level
            )
        ) %>%
        pull(finding)

    # RECOMMENDATION: Specific actions
    story$recommendation <- analysis_result$recommendations %>%
        mutate(
            action = sprintf("- %s (%s)", action, expected_outcome)
        ) %>%
        pull(action)

    return(story)
}

# Example output structure
example_story <- list(
    situation = "Our pressing success rate has dropped by 12% over the last 8 matches.",
    task = "We analyzed pressing patterns across 24 matches.",
    findings = c(
        "1. Trigger timing: Pressing 0.5s too late on average",
        "2. Compactness: Team shape stretched 8m wider",
        "3. Recovery runs: 15% fewer recovery sprints"
    ),
    recommendation = c(
        "- Earlier trigger on opposition first touch",
        "- Narrower defensive width in middle third"
    )
)

cat("=== ANALYTICS STORY EXAMPLE ===\n\n")
cat("SITUATION:\n", example_story$situation, "\n\n", sep = "")  # sep = "" avoids a stray leading space
cat("TASK:\n", example_story$task, "\n\n", sep = "")
cat("KEY FINDINGS:\n")
cat(paste(example_story$findings, collapse = "\n"), "\n\n")
cat("RECOMMENDATIONS:\n")
cat(paste(example_story$recommendation, collapse = "\n"))
Output
=== ANALYTICS STORY EXAMPLE ===

SITUATION:
Our pressing success rate has dropped by 12% over the last 8 matches.

TASK:
We analyzed pressing patterns across 24 matches.

KEY FINDINGS:
1. Trigger timing: Pressing 0.5s too late on average
2. Compactness: Team shape stretched 8m wider
3. Recovery runs: 15% fewer recovery sprints

RECOMMENDATIONS:
- Earlier trigger on opposition first touch
- Narrower defensive width in middle third
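
The Python version of `create_analytics_story` calls an `ordinal` helper that is not defined in the listing. A minimal sketch of such a helper, assuming it should turn a league rank such as 14 into "14th", could look like this:

```python
def ordinal(n):
    """Return an English ordinal string, e.g. 1 -> '1st', 12 -> '12th'."""
    n = int(n)
    # Numbers ending in 11, 12 or 13 take 'th' despite ending in 1, 2 or 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

print(f"This puts us {ordinal(14)} in the league for this metric.")
```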

The Pyramid Principle

Start with the conclusion, then provide supporting evidence. Busy stakeholders may only read the first line—make it count.

Bad: Bottom-Up

"We analyzed 2,847 shots from this season. Using logistic regression, we modeled shot quality based on 15 variables including distance, angle, defender proximity... After controlling for various factors... The model suggests... Therefore, we recommend signing Player X."

The reader must wade through methodology to find the point.

Good: Top-Down (Pyramid)

"We should sign Player X—he would add 8-12 goals above our current options at the same cost. Here's why: [3 supporting points]. Detailed methodology available on request."

The key message is immediate. Details follow for those who want them.
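
The top-down ordering can be enforced in code when messages are generated automatically. The sketch below is one possible approach, not a prescribed format: `pyramid_message` and its parameters are hypothetical names, and the supporting points are placeholders because the example above does not spell them out.

```python
def pyramid_message(conclusion, supporting_points, detail_pointer=None, max_points=3):
    """Order a message top-down: conclusion, then evidence, then where to find detail."""
    lines = [conclusion, "", "Here's why:"]
    # Cap the evidence so the conclusion is not buried under supporting detail
    for i, point in enumerate(supporting_points[:max_points], start=1):
        lines.append(f"  {i}. {point}")
    if detail_pointer:
        lines.extend(["", detail_pointer])
    return "\n".join(lines)

message = pyramid_message(
    conclusion="We should sign Player X: he would add 8-12 goals above "
               "our current options at the same cost.",
    supporting_points=[
        "Supporting point 1 (placeholder)",
        "Supporting point 2 (placeholder)",
        "Supporting point 3 (placeholder)",
        "Supporting point 4 (placeholder, dropped by the cap)",
    ],
    detail_pointer="Detailed methodology available on request.",
)
print(message)
```

Because the conclusion is always the first line, a reader who stops after one sentence still gets the recommendation.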

Visualization for Different Audiences

The right visualization depends on your audience. Technical analysts may appreciate complex plots; coaches need simple, immediate clarity.

| Audience | Recommended Visualizations | Avoid |
|---|---|---|
| Coaches | Pitch maps, shot maps, video clips, simple bar charts, player comparison tables | Dense scatter plots, statistical distributions, complex multi-panel figures |
| Executives | Trend lines, KPI dashboards, financial projections, league rankings | Raw data tables, overly technical metrics without context |
| Scouts | Radar charts, percentile rankings, heat maps, similar player comparisons | Aggregate stats without per-90 normalization |
| Players | Personal highlight clips with stats overlay, progress charts, specific zone analysis | League-wide comparisons that might demotivate |

audience_visualizations.py
# Python: Audience-Appropriate Visualizations
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

def create_coach_shot_map(shots_data, player_name):
    """Create simple, clear shot map for coaches."""
    fig, ax = plt.subplots(figsize=(12, 8))

    # Pitch background: fill the pitch rectangle itself, because
    # ax.axis("off") below also hides the axes background patch
    ax.add_patch(patches.Rectangle((0, 0), 120, 80, facecolor="#2E7D32",
                                    edgecolor="white", linewidth=2, zorder=0))

    # Goal
    ax.add_patch(patches.Rectangle((114, 30), 6, 20, fill=False,
                                    edgecolor="white", linewidth=3))

    # Plot shots with colors by result
    colors = {"Goal": "#FFD700", "Saved": "white",
              "Blocked": "gray", "Off Target": "red"}

    for _, shot in shots_data.iterrows():
        color = colors.get(shot["result"], "white")
        size = shot["xg"] * 300 + 50
        ax.scatter(shot["x"], shot["y"], c=color, s=size,
                   alpha=0.8, edgecolors="black", linewidths=0.5)

    # Title and stats
    goals = (shots_data["result"] == "Goal").sum()
    total_xg = shots_data["xg"].sum()
    n_shots = len(shots_data)

    ax.text(60, 75, f"{player_name} - Shot Map",
            ha="center", fontsize=16, fontweight="bold", color="white")
    ax.text(60, 5, f"Goals: {goals} | xG: {total_xg:.1f} | Shots: {n_shots}",
            ha="center", fontsize=12, color="white")

    ax.set_xlim(0, 120)
    ax.set_ylim(0, 80)
    ax.set_aspect("equal")
    ax.axis("off")

    # Legend
    for result, color in colors.items():
        ax.scatter([], [], c=color, s=100, label=result, edgecolors="black")
    ax.legend(loc="lower center", ncol=4, frameon=False)

    plt.tight_layout()
    return fig

def create_executive_trend(performance_data, metric, target):
    """Create trend chart with target for executives."""
    fig, ax = plt.subplots(figsize=(10, 6))

    x = performance_data["matchweek"]
    y = performance_data[metric]

    # Target line
    ax.axhline(y=target, linestyle="--", color="#1B5E20",
               linewidth=2, label=f"Target: {target}")

    # Trend line
    ax.plot(x, y, color="#2E7D32", linewidth=2)

    # Points colored by above/below target
    above = y >= target
    ax.scatter(x[above], y[above], color="#1B5E20", s=80, zorder=5)
    ax.scatter(x[~above], y[~above], color="#D32F2F", s=80, zorder=5)

    # Current status
    current = y.iloc[-1]
    gap = current - target

    ax.set_title(f"Performance: {metric}", fontsize=14, fontweight="bold")
    ax.set_xlabel("Matchweek")
    ax.set_ylabel(metric)

    # Add summary annotation
    ax.text(0.02, 0.98,
            f"Current: {current:.1f} | Target: {target} | Gap: {gap:+.1f}",
            transform=ax.transAxes, fontsize=10, verticalalignment="top",
            bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5))

    ax.legend(loc="upper right")
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    return fig
# R: Audience-Appropriate Visualizations
library(tidyverse)
library(ggplot2)
library(patchwork)

# COACH VERSION: Simple, clear shot map
create_coach_shot_map <- function(shots_data, player_name) {
    ggplot(shots_data, aes(x = x, y = y)) +
        # Pitch background
        annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
                 fill = "#2E7D32", color = "white") +
        # Goal
        annotate("rect", xmin = 114, xmax = 120, ymin = 30, ymax = 50,
                 fill = NA, color = "white", linewidth = 2) +
        # Shots
        geom_point(aes(size = xg, color = result),
                   alpha = 0.8) +
        scale_color_manual(values = c("Goal" = "#FFD700", "Saved" = "white",
                                       "Blocked" = "gray", "Off Target" = "red")) +
        scale_size_continuous(range = c(2, 8), guide = "none") +
        # Simple annotations
        annotate("text", x = 60, y = 75,
                 label = sprintf("%s - Shot Map", player_name),
                 size = 6, fontface = "bold", color = "white") +
        annotate("text", x = 60, y = 5,
                 label = sprintf("Goals: %d | xG: %.1f | Shots: %d",
                                sum(shots_data$result == "Goal"),
                                sum(shots_data$xg),
                                nrow(shots_data)),
                 size = 4, color = "white") +
        coord_fixed() +
        theme_void() +
        theme(legend.position = "bottom",
              legend.text = element_text(color = "black"),
              plot.background = element_rect(fill = "white"))
}

# EXECUTIVE VERSION: Trend with context
create_executive_trend <- function(performance_data, metric, target) {
    ggplot(performance_data, aes(x = matchweek, y = !!sym(metric))) +
        geom_hline(yintercept = target, linetype = "dashed",
                   color = "#1B5E20", linewidth = 1) +
        geom_line(color = "#2E7D32", linewidth = 1.5) +
        geom_point(aes(color = !!sym(metric) >= target), size = 3) +
        scale_color_manual(values = c("TRUE" = "#1B5E20", "FALSE" = "#D32F2F"),
                          guide = "none") +
        annotate("text", x = max(performance_data$matchweek), y = target,
                 label = "Target", hjust = -0.1, color = "#1B5E20") +
        labs(title = sprintf("Performance: %s", metric),
             subtitle = sprintf("Current: %.1f | Target: %.1f | Gap: %+.1f",
                               tail(performance_data[[metric]], 1),
                               target,
                               tail(performance_data[[metric]], 1) - target),
             x = "Matchweek", y = metric) +
        theme_minimal() +
        theme(plot.title = element_text(face = "bold"))
}
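
The table above advises against showing scouts aggregate stats without per-90 normalization. The arithmetic is simply `stat / minutes * 90`. The sketch below applies it to a made-up DataFrame. The column names and the `min_minutes` cut-off, used here to suppress rates based on very little playing time, are assumptions for illustration.

```python
import pandas as pd

def add_per_90(df, stat_cols, minutes_col="minutes", min_minutes=450):
    """Add '<stat>_per90' columns; rows below min_minutes are left as NaN."""
    out = df.copy()
    enough = out[minutes_col] >= min_minutes
    for col in stat_cols:
        # Rate per 90 minutes played, masked where the sample is too small
        out[f"{col}_per90"] = (out[col] / out[minutes_col] * 90).where(enough)
    return out

# Hypothetical players with made-up totals
players = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "minutes": [2700, 900, 180],
    "shots": [60, 30, 8],
})
result = add_per_90(players, ["shots"])
print(result)
```

Player A has twice as many shots as Player B in total but the lower rate per 90 minutes, which is the kind of distortion normalization is meant to expose.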

The "So What?" Test

Every visualization should pass the "So What?" test. After viewing, the audience should immediately understand what it means for them.

so_what_test.py
# Python: Add "So What" Context to Visualizations
import matplotlib.pyplot as plt

def add_so_what_context(fig, ax, insight, action):
    """Add interpretive context to make visualization actionable."""
    # Add insight as subtitle
    ax.text(0.5, 1.02, insight,
            transform=ax.transAxes, ha="center", fontsize=10,
            style="italic", color="#1B5E20")

    # Add action as footer
    fig.text(0.1, 0.02, f"Recommendation: {action}",
             fontsize=9, color="#D32F2F", fontweight="bold")

    plt.subplots_adjust(bottom=0.12, top=0.88)
    return fig

# Example: xG difference chart with context
# match_data is assumed to be a DataFrame with "matchweek" and "xg_diff" columns
fig, ax = plt.subplots(figsize=(10, 6))

colors = ["#1B5E20" if x > 0 else "#D32F2F" for x in match_data["xg_diff"]]
ax.bar(match_data["matchweek"], match_data["xg_diff"], color=colors)
ax.axhline(y=0, color="black", linewidth=0.5)
ax.set_xlabel("Matchweek")
ax.set_ylabel("xG Difference")
# Extra padding keeps the title clear of the insight line added just above the axes
ax.set_title("xG Difference by Match", fontweight="bold", pad=20)

# Add the "So What?"
fig = add_so_what_context(
    fig, ax,
    insight="Creating more chances than conceding - underlying performance is strong",
    action="Maintain current tactical approach; results will likely improve"
)

plt.show()
# R: Add "So What" Context to Visualizations
add_so_what_context <- function(plot, insight, action) {
    # Add interpretive subtitle and actionable caption
    plot +
        labs(
            subtitle = insight,  # What does this mean?
            caption = paste("Recommendation:", action)  # What should we do?
        ) +
        theme(
            plot.subtitle = element_text(color = "#1B5E20", face = "italic"),
            plot.caption = element_text(color = "#D32F2F", face = "bold",
                                        hjust = 0, size = 10)
        )
}

# Example: Adding context to xG chart
xg_plot <- ggplot(match_data, aes(x = matchweek, y = xg_diff)) +
    geom_col(aes(fill = xg_diff > 0)) +
    scale_fill_manual(values = c("TRUE" = "#1B5E20", "FALSE" = "#D32F2F"),
                      guide = "none") +
    labs(title = "xG Difference by Match", x = "Matchweek", y = "xG Difference")

# Add the "So What?"
xg_plot_with_context <- add_so_what_context(
    xg_plot,
    insight = "We're creating more chances than conceding - underlying performance is strong",
    action = "Maintain current tactical approach; results will likely improve"
)

print(xg_plot_with_context)
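
Both listings above assume a `match_data` table already exists and pass in a hand-written insight. One option is to derive the insight line from the data, so the subtitle states its evidence. The sketch below uses made-up values, and `summarise_xg_diff` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical data: xG created minus xG conceded per match (made-up values)
match_data = pd.DataFrame({
    "matchweek": [1, 2, 3, 4, 5, 6],
    "xg_diff": [0.8, -0.3, 0.5, 1.1, -0.2, 0.6],
})

def summarise_xg_diff(df):
    """Return a one-line, evidence-backed insight for the chart subtitle."""
    n_matches = len(df)
    n_positive = int((df["xg_diff"] > 0).sum())
    mean_diff = df["xg_diff"].mean()
    return (f"Out-created the opposition in {n_positive} of {n_matches} matches "
            f"(average xG difference {mean_diff:+.2f})")

insight = summarise_xg_diff(match_data)
print(insight)
```

The resulting string can be passed as the `insight` argument of `add_so_what_context`.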

Presenting to Coaches

Coaches are time-poor and action-oriented. They need insights that directly inform training sessions, team selection, and match tactics. Build trust by speaking their language and respecting their expertise.

Do
  • Lead with video clips supported by data
  • Use football terminology, not statistical jargon
  • Provide specific, actionable recommendations
  • Acknowledge what you don't know
  • Respect their observational expertise
  • Keep it brief—2 minutes max for key points
  • Offer deeper dives on request
Don't
  • Lecture on statistical methodology
  • Present data that contradicts their observation without humility
  • Overwhelm with too many metrics
  • Use percentiles without context
  • Claim certainty where uncertainty exists
  • Ignore their questions or pushback
  • Present without understanding the tactical context
coach_presentation.py
# Python: Create Coach-Ready Presentation Packet
from dataclasses import dataclass
from typing import List

@dataclass
class CoachPacket:
    headline: str
    key_stat: str
    tactical_suggestion: str
    video_clips: List[str]
    key_numbers: List[str]

def create_coach_packet(match_analysis, opposition_name):
    """Create presentation packet optimized for coaching staff."""

    # 1. One-page summary
    headline = f"{opposition_name} tend to leave space " \
               f"{match_analysis['vulnerability_zone']} when pressing high"

    key_stat = f"They concede {match_analysis['xg_conceded_zone']:.2f} xG/90 " \
               f"from {match_analysis['vulnerability_zone']} - " \
               f"{match_analysis['above_avg_pct']:.0f}% above league average"

    tactical_suggestion = f"Consider {match_analysis['suggested_tactic']} " \
                          f"to exploit this. Video examples attached."

    # 2. Key numbers (max 5)
    key_numbers = [
        f"{row['metric_name']}: {row['value_formatted']}"
        for _, row in match_analysis["top_insights"].head(5).iterrows()
    ]

    packet = CoachPacket(
        headline=headline,
        key_stat=key_stat,
        tactical_suggestion=tactical_suggestion,
        video_clips=match_analysis["example_clips"][:3],
        key_numbers=key_numbers
    )

    return packet

# Example output
example_packet = CoachPacket(
    headline="Liverpool leave space behind fullbacks when pressing high",
    key_stat="They concede 0.42 xG/90 from wide areas - 35% above league average",
    tactical_suggestion="Quick switches of play to exploit advancing fullbacks",
    video_clips=["clip_001.mp4", "clip_002.mp4", "clip_003.mp4"],
    key_numbers=[
        "High press trigger: 65% of opposition goal kicks",
        "Recovery time: 4.2 seconds average",
        "xG from counters: 0.31/90 (league avg: 0.22)"
    ]
)

print("=== COACH PRESENTATION PACKET ===\n")
print(f"HEADLINE:\n{example_packet.headline}\n")
print(f"KEY STAT:\n{example_packet.key_stat}\n")
print(f"SUGGESTION:\n{example_packet.tactical_suggestion}\n")
print("KEY NUMBERS:")
for num in example_packet.key_numbers:
    print(f"- {num}")
# R: Create Coach-Ready Presentation Packet
library(tidyverse)
library(officer)
library(rvg)

create_coach_packet <- function(match_analysis, opposition_name) {
    packet <- list()

    # 1. One-page summary
    packet$summary <- list(
        headline = sprintf(
            "%s tend to leave space %s when pressing high",
            opposition_name,
            match_analysis$vulnerability_zone
        ),
        key_stat = sprintf(
            "They concede %.2f xG/90 from %s - %.0f%% above league average",
            match_analysis$xg_conceded_zone,
            match_analysis$vulnerability_zone,
            match_analysis$above_avg_pct
        ),
        tactical_suggestion = sprintf(
            "Consider %s to exploit this. Video examples attached.",
            match_analysis$suggested_tactic
        ),
        video_clips = match_analysis$example_clips[1:3]  # Top 3 clips
    )

    # 2. Simple pitch graphic
    packet$pitch_graphic <- create_zone_heatmap(
        zones = match_analysis$dangerous_zones,
        title = sprintf("Where %s are vulnerable", opposition_name),
        subtitle = "Red = higher xG conceded"
    )

    # 3. Key numbers (max 5)
    packet$key_numbers <- match_analysis$top_insights %>%
        head(5) %>%
        mutate(
            formatted = sprintf("%s: %s", metric_name, value_formatted)
        )

    return(packet)
}

# Example packet content
example_packet <- list(
    headline = "Liverpool leave space behind fullbacks when pressing high",
    key_stat = "They concede 0.42 xG/90 from wide areas - 35% above league average",
    tactical_suggestion = "Quick switches of play to exploit advancing fullbacks",
    key_numbers = c(
        "High press trigger: 65% of opposition goal kicks",
        "Recovery time: 4.2 seconds average",
        "xG from counters: 0.31/90 (league avg: 0.22)"
    )
)

cat("=== COACH PRESENTATION PACKET ===\n\n")
cat("HEADLINE:\n", example_packet$headline, "\n\n", sep = "")  # sep = "" avoids a stray leading space
cat("KEY STAT:\n", example_packet$key_stat, "\n\n", sep = "")
cat("SUGGESTION:\n", example_packet$tactical_suggestion, "\n\n", sep = "")
cat("KEY NUMBERS:\n")
cat(paste("-", example_packet$key_numbers, collapse = "\n"))
Output
=== COACH PRESENTATION PACKET ===

HEADLINE:
Liverpool leave space behind fullbacks when pressing high

KEY STAT:
They concede 0.42 xG/90 from wide areas - 35% above league average

SUGGESTION:
Quick switches of play to exploit advancing fullbacks

KEY NUMBERS:
- High press trigger: 65% of opposition goal kicks
- Recovery time: 4.2 seconds average
- xG from counters: 0.31/90 (league avg: 0.22)
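
The packet reports figures as "X% above league average", but the listing does not show how `above_avg_pct` is computed. A plausible definition, assumed here, is the relative difference `(value - league_avg) / league_avg * 100`. The sketch applies it to the counter-attack figures from the example packet (0.31 against a league average of 0.22).

```python
def pct_above_average(value, league_avg):
    """Percentage by which value exceeds league_avg (negative if below)."""
    if league_avg == 0:
        raise ValueError("league_avg must be non-zero")
    return (value - league_avg) / league_avg * 100

def describe_vs_average(value, league_avg):
    """Phrase the comparison in the plain wording used in the coach packet."""
    pct = pct_above_average(value, league_avg)
    direction = "above" if pct >= 0 else "below"
    return f"{abs(pct):.0f}% {direction} league average"

print(describe_vs_average(0.31, 0.22))
```

Stating the comparison as a relative difference alongside the raw values gives coaches the context that a bare number or percentile lacks.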

Writing Effective Reports

Written reports need clear structure, progressive disclosure of detail, and actionable conclusions. Different report types serve different purposes.

Executive Summary

Purpose: Quick overview for busy decision-makers

Structure:

  1. Bottom line: One sentence recommendation
  2. Key findings: 3 bullet points max
  3. Risk/confidence: What could go wrong
  4. Next steps: Specific actions needed
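
The four-part structure above can be captured in a small data class so that every summary comes out in the same order. This is a sketch under assumptions: the class name, field names, and rendered labels are hypothetical. The example content reuses the pressing story from earlier in the chapter, with a placeholder for the risk statement.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExecutiveSummary:
    bottom_line: str          # one-sentence recommendation
    key_findings: List[str]   # capped at three when rendered
    risks: str                # what could go wrong
    next_steps: List[str]     # specific actions needed

    def render(self) -> str:
        lines = [f"BOTTOM LINE: {self.bottom_line}", "", "KEY FINDINGS:"]
        lines += [f"  - {f}" for f in self.key_findings[:3]]
        lines += ["", f"RISK/CONFIDENCE: {self.risks}", "", "NEXT STEPS:"]
        lines += [f"  {i}. {s}" for i, s in enumerate(self.next_steps, start=1)]
        return "\n".join(lines)

summary = ExecutiveSummary(
    bottom_line="Tighten pressing triggers to reverse the drop in pressing success.",
    key_findings=[
        "Trigger timing: Pressing 0.5s too late on average",
        "Compactness: Team shape stretched 8m wider",
        "Recovery runs: 15% fewer recovery sprints",
    ],
    risks="(placeholder) state the main uncertainty here",
    next_steps=[
        "Earlier trigger on opposition first touch",
        "Narrower defensive width in middle third",
    ],
)
print(summary.render())
```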

Scouting Report

Purpose: Detailed player assessment for recruitment

Structure:

  1. Player overview: Bio, current situation, market value
  2. Statistical profile: Key metrics with percentile ranks
  3. Strengths: 3-4 areas with data + video evidence
  4. Weaknesses: 2-3 areas with honest assessment
  5. Fit assessment: How they'd fit our system
  6. Comparable players: Similar profiles (successful & unsuccessful)
  7. Recommendation: Sign/don't sign with confidence level

Technical Report

Purpose: Full methodology for analysts/archival

Structure:

  1. Executive summary: 1 page overview
  2. Methodology: Data sources, models, assumptions
  3. Analysis: Full results with all visualizations
  4. Validation: Model accuracy, backtesting
  5. Limitations: What this analysis can't tell us
  6. Appendices: Code, raw data references
report_generation.py
# Python: Generate Structured Report

def generate_scouting_report(player_analysis):
    """Generate structured scouting report."""
    report = {}

    # Section 1: Overview
    report["overview"] = f"""## Player Overview

**Name:** {player_analysis["name"]}
**Age:** {player_analysis["age"]} | **Position:** {player_analysis["position"]} | **Club:** {player_analysis["current_club"]}
**Contract Expires:** {player_analysis["contract_end"]} | **Est. Value:** {player_analysis["market_value"]}

{player_analysis["summary_paragraph"]}"""

    # Section 2: Statistical Profile
    metrics_display = []
    for _, row in player_analysis["key_metrics"].iterrows():
        metrics_display.append(
            f"- {row['metric_name']}: **{row['value']:.2f}** ({row['percentile']}th percentile)"
        )

    report["stats"] = f"""## Statistical Profile (vs {player_analysis["comparison_group"]})

{chr(10).join(metrics_display)}"""

    # Section 3: Strengths
    strengths_sections = []
    for _, strength in player_analysis["strengths"].iterrows():
        strengths_sections.append(f"""### {strength["title"]}
{strength["description"]}
*Evidence: {strength["video_link"]}*""")

    report["strengths"] = f"""## Strengths

{chr(10).join(strengths_sections)}"""

    # Section 4: Recommendation
    report["recommendation"] = f"""## Recommendation

**Verdict:** {player_analysis["verdict"]}
**Confidence:** {player_analysis["confidence"]}
**Risk Level:** {player_analysis["risk"]}

{player_analysis["final_comments"]}"""

    return report

def export_report_markdown(report, filename):
    """Export report to markdown file."""
    full_report = "\n\n".join([
        report["overview"],
        report["stats"],
        report["strengths"],
        report["recommendation"]
    ])

    # Explicit encoding so accented player names are written consistently
    with open(filename, "w", encoding="utf-8") as f:
        f.write(full_report)

    print(f"Report exported to {filename}")
# R: Generate Structured Report
library(tidyverse)
library(rmarkdown)

generate_scouting_report <- function(player_analysis) {
    report <- list()

    # Section 1: Overview
    # Build the template line by line: indented lines inside a string
    # literal would be rendered as a code block in Markdown
    report$overview <- sprintf(
        paste(
            "## Player Overview",
            "",
            "**Name:** %s",
            "**Age:** %s | **Position:** %s | **Club:** %s",
            "**Contract Expires:** %s | **Est. Value:** %s",
            "",
            "%s",
            sep = "\n"
        ),
        player_analysis$name,
        player_analysis$age,
        player_analysis$position,
        player_analysis$current_club,
        player_analysis$contract_end,
        player_analysis$market_value,
        player_analysis$summary_paragraph
    )

    # Section 2: Statistical Profile
    metrics_table <- player_analysis$key_metrics %>%
        mutate(
            display = sprintf("%s: **%.2f** (%sth percentile)",
                             metric_name, value, percentile)
        )

    report$stats <- sprintf(
        "## Statistical Profile (vs %s)\n\n%s",
        player_analysis$comparison_group,
        paste(metrics_table$display, collapse = "\n")
    )

    # Section 3: Strengths
    strengths <- player_analysis$strengths %>%
        mutate(
            section = sprintf(
                "### %s\n%s\n*Evidence: %s*\n",
                title, description, video_link
            )
        )

    report$strengths <- sprintf(
        "## Strengths\n%s",
        paste(strengths$section, collapse = "\n")
    )

    # Section 4: Recommendation
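    # Template built line by line to avoid indented text in the Markdown output
    report$recommendation <- sprintf(
        paste(
            "## Recommendation",
            "",
            "**Verdict:** %s",
            "**Confidence:** %s",
            "**Risk Level:** %s",
            "",
            "%s",
            sep = "\n"
        ),
        player_analysis$verdict,
        player_analysis$confidence,
        player_analysis$risk,
        player_analysis$final_comments
    )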

    return(report)
}

Dashboard Design

Dashboards provide real-time decision support. Good dashboard design follows principles of visual hierarchy, progressive disclosure, and actionability.

Layout Principles
  • Most important metrics top-left
  • Group related information
  • Consistent spacing and alignment
  • Clear visual hierarchy
  • Mobile-responsive design
Visual Design
  • Limited color palette (3-5 colors)
  • Semantic colors (red=bad, green=good)
  • Minimize chart junk
  • Clear labels and titles
  • Consistent styling across charts
Interactivity
  • Filters for different views
  • Drill-down capability
  • Tooltips for details
  • Export functionality
  • Real-time updates where needed
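
The "semantic colors" principle needs care for metrics where a lower value is the desirable one. For PPDA, a lower value indicates more intense pressing, so a team that wants to press would treat a fall as good. The sketch below is one way to handle this. The function name, the `tolerance` band, and the grey neutral color are assumptions, while the green and red reuse the hex values from the chapter's charts.

```python
GOOD, BAD, NEUTRAL = "#1B5E20", "#D32F2F", "#757575"

def semantic_color(value, benchmark, higher_is_better=True, tolerance=0.0):
    """Return green for good, red for bad, grey when within tolerance of the benchmark."""
    diff = value - benchmark
    if abs(diff) <= tolerance:
        return NEUTRAL
    is_good = diff > 0 if higher_is_better else diff < 0
    return GOOD if is_good else BAD

# xG above the benchmark is good; PPDA above the benchmark is bad for a pressing team
print(semantic_color(1.8, 1.5))
print(semantic_color(12.0, 10.0, higher_is_better=False))
```

Centralizing the rule in one function keeps the color meaning consistent across every chart on the dashboard.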
dashboard_design.py
# Python: Create Interactive Dashboard with Streamlit
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

def create_match_dashboard():
    """Create interactive match analytics dashboard."""
    st.set_page_config(page_title="Match Analytics", layout="wide")

    # Sidebar controls
    st.sidebar.title("Match Analytics")
    match = st.sidebar.selectbox("Select Match", ["Match 1", "Match 2", "Match 3"])
    metric = st.sidebar.selectbox("Primary Metric", ["xG", "Possession", "PPDA"])
    show_benchmark = st.sidebar.checkbox("Show League Average", True)

    # Load data based on selection
    match_data = load_match_data(match)

    # Row 1: Key metrics
    col1, col2, col3, col4 = st.columns(4)

    with col1:
        st.metric("Expected Goals",
                  f"{match_data['xg']:.2f}",
                  delta=f"{match_data['xg'] - 1.5:.2f} vs avg")

    with col2:
        st.metric("Possession",
                  f"{match_data['possession']:.0f}%")

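    with col3:
        # Show shots on target as a neutral annotation rather than a +/- change
        st.metric("Shots",
                  match_data["shots"],
                  delta=f"{match_data['shots_on_target']} on target",
                  delta_color="off")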

    with col4:
        st.metric("Pass Completion",
                  f"{match_data['pass_pct']:.1f}%")

    # Row 2: Main visualizations
    col_left, col_right = st.columns([2, 1])

    with col_left:
        st.subheader("xG Flow")
        fig = go.Figure()
        fig.add_trace(go.Scatter(
            x=match_data["xg_timeline"]["minute"],
            y=match_data["xg_timeline"]["cumulative_xg"],
            mode="lines",
            name="Our xG",
            line=dict(color="#1B5E20", width=2)
        ))
        if show_benchmark:
            fig.add_hline(y=1.5, line_dash="dash",
                         annotation_text="League Avg")
        st.plotly_chart(fig, use_container_width=True)

    with col_right:
        st.subheader("Shot Map")
        # create_shot_map is assumed to be a helper returning a matplotlib Figure
        shot_fig = create_shot_map(match_data["shots_data"])
        st.pyplot(shot_fig)

    # Row 3: Player table
    st.subheader("Player Performance")
    st.dataframe(match_data["player_stats"],
                 use_container_width=True,
                 hide_index=True)

# Run: streamlit run dashboard_design.py
if __name__ == "__main__":
    create_match_dashboard()
# R: Create Interactive Dashboard with Shiny
library(shiny)
library(shinydashboard)
library(plotly)

# Dashboard UI
ui <- dashboardPage(
    dashboardHeader(title = "Match Analytics"),

    dashboardSidebar(
        selectInput("match", "Select Match:",
                    choices = c("Match 1", "Match 2", "Match 3")),
        selectInput("metric", "Primary Metric:",
                    choices = c("xG", "Possession", "PPDA")),
        checkboxInput("show_benchmark", "Show League Average", TRUE)
    ),

    dashboardBody(
        # Row 1: Key metrics
        fluidRow(
            valueBoxOutput("xg_box", width = 3),
            valueBoxOutput("possession_box", width = 3),
            valueBoxOutput("shots_box", width = 3),
            valueBoxOutput("passes_box", width = 3)
        ),

        # Row 2: Main visualization
        fluidRow(
            box(title = "xG Flow", status = "primary", solidHeader = TRUE,
                width = 8, plotlyOutput("xg_flow_plot")),
            box(title = "Shot Map", status = "success", solidHeader = TRUE,
                width = 4, plotOutput("shot_map"))
        ),

        # Row 3: Detailed tables
        fluidRow(
            box(title = "Player Performance", width = 12,
                DT::dataTableOutput("player_table"))
        )
    )
)

# Dashboard Server
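server <- function(input, output) {
    # Reactive data sources used by the outputs below.
    # load_match_data() is a placeholder for your own data loader.
    match_data <- reactive(load_match_data(input$match))
    xg_data <- reactive(match_data()$xg_timeline)

    # Only two outputs are shown; the remaining boxes, the shot map and the
    # player table follow the same pattern.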
    output$xg_box <- renderValueBox({
        valueBox(
            sprintf("%.2f", match_data()$xg),
            "Expected Goals",
            icon = icon("futbol"),
            color = if (match_data()$xg > 1.5) "green" else "yellow"
        )
    })

    output$xg_flow_plot <- renderPlotly({
        plot_ly(xg_data(), x = ~minute, y = ~cumulative_xg,
                type = "scatter", mode = "lines",
                line = list(color = "#1B5E20", width = 2)) %>%
            layout(title = "Cumulative xG",
                   xaxis = list(title = "Minute"),
                   yaxis = list(title = "xG"))
    })
}

shinyApp(ui, server)
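The Streamlit listing above calls `load_match_data(match)` without defining it. To try the layout before real data is wired in, a stand-in along the following lines could be used. This is a sketch under stated assumptions: the function body, the synthetic numbers, and the exact shape of the returned dictionary are invented for illustration, and a real loader would read from the club's event-data store. The `create_shot_map` helper is not covered here.

```python
import random


def load_match_data(match_name, seed=42):
    """Return synthetic match data shaped like the dashboard expects.

    Hypothetical placeholder: every number below is randomly generated.
    """
    # Seeding with the match name keeps each match's fake data stable
    rng = random.Random(f"{match_name}-{seed}")

    shot_minutes = sorted(rng.sample(range(1, 91), k=12))
    shot_xg = [round(rng.uniform(0.02, 0.45), 2) for _ in shot_minutes]

    # Build the cumulative xG series plotted in the "xG Flow" chart
    cumulative, running = [], 0.0
    for xg in shot_xg:
        running += xg
        cumulative.append(round(running, 2))

    return {
        "xg": cumulative[-1],
        "possession": round(rng.uniform(40, 65), 1),
        "shots": len(shot_minutes),
        "shots_on_target": rng.randint(3, 8),
        "pass_pct": round(rng.uniform(72, 90), 1),
        "xg_timeline": {"minute": shot_minutes, "cumulative_xg": cumulative},
        "shots_data": [
            {"minute": m, "xg": xg,
             "x": round(rng.uniform(90, 118), 1),
             "y": round(rng.uniform(20, 60), 1)}
            for m, xg in zip(shot_minutes, shot_xg)
        ],
        "player_stats": [
            {"player": f"Player {i}", "minutes": 90,
             "passes": rng.randint(15, 70)}
            for i in range(1, 12)
        ],
    }


data = load_match_data("Match 1")
print(data["shots"], "shots, total xG", data["xg"])
```

Plain dictionaries and lists are used so the stub has no dependencies beyond the standard library; swapping in pandas DataFrames for `xg_timeline` and `player_stats` would work equally well with the dashboard code.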

Handling Objections and Building Trust

Analytics skepticism is common in football. Building trust requires patience, humility, and consistent delivery of actionable insights.

Common Objection | Response Strategy
"I've watched football for 30 years. I don't need numbers to tell me who's good." | Acknowledge their expertise. Position analytics as a complement: "Your eye test caught things the data confirms. But data can also flag things happening in parts of the pitch you weren't watching."
"Stats don't capture the intangibles: leadership, mentality." | Agree that some things are hard to quantify. "You're right. We focus on the measurable to free you up to evaluate those intangibles. Together it's a fuller picture."
"Your model said X would happen and it didn't." | Explain probabilities vs certainties. "A 70% chance means it won't happen 30% of the time. Over many decisions, following 70% probabilities wins more than gut feel."
"Football is too complex to reduce to numbers." | "You're right that models simplify. But every decision simplifies; the question is whether informed simplification beats uninformed intuition."
"We tried analytics before and it didn't work." | Ask what went wrong. Often it's communication, not the analytics. "Let's focus on the specific decisions you need support on and build from there."
credibility_tracking.py
# Python: Track Analytics Impact for Credibility Building
import pandas as pd
import uuid
from datetime import date

class RecommendationTracker:
    """Track analytics recommendations and outcomes for credibility."""

    def __init__(self):
        self.recommendations = pd.DataFrame(columns=[
            "id", "date", "category", "description", "confidence",
            "decision_maker", "was_followed", "outcome", "outcome_date"
        ])

    def log_recommendation(self, recommendation):
        """Log a new analytics recommendation."""
        new_rec = {
            "id": str(uuid.uuid4()),
            "date": date.today(),
            "category": recommendation["category"],
            "description": recommendation["description"],
            "confidence": recommendation["confidence"],
            "decision_maker": recommendation["presented_to"],
            "was_followed": None,
            "outcome": None,
            "outcome_date": None
        }
        self.recommendations = pd.concat([
            self.recommendations,
            pd.DataFrame([new_rec])
        ], ignore_index=True)
        return new_rec["id"]

    def update_outcome(self, rec_id, was_followed, outcome):
        """Update recommendation with outcome."""
        mask = self.recommendations["id"] == rec_id
        self.recommendations.loc[mask, "was_followed"] = was_followed
        self.recommendations.loc[mask, "outcome"] = outcome
        self.recommendations.loc[mask, "outcome_date"] = date.today()

    def generate_credibility_report(self):
        """Generate report showing analytics track record."""
        completed = self.recommendations.dropna(subset=["outcome"])

        # Key metric: outcomes when followed vs not
        followed_vs_not = completed.groupby("was_followed").agg(
            n=("id", "count"),
            positive_rate=("outcome", lambda x: (x == "positive").mean())
        ).reset_index()

        # By category
        by_category = completed.groupby(["category", "was_followed"]).agg(
            n=("id", "count"),
            positive_rate=("outcome", lambda x: (x == "positive").mean()),
            avg_confidence=("confidence", "mean")
        ).reset_index()

        print("=== ANALYTICS CREDIBILITY REPORT ===\n")
        print("Recommendations followed vs not followed:")
        print(followed_vs_not)
        print("\nBy category:")
        print(by_category)

        return {"followed_vs_not": followed_vs_not, "by_category": by_category}

# Usage
tracker = RecommendationTracker()
rec_id = tracker.log_recommendation({
    "category": "signing",
    "description": "Sign Player X - projects to add 0.15 xG/90",
    "confidence": 4,
    "presented_to": "Director of Football"
})
# Later...
tracker.update_outcome(rec_id, was_followed=True, outcome="positive")
# R: Track Analytics Impact for Credibility Building
library(tidyverse)

# Log analytics recommendations and outcomes
track_recommendation <- function(rec_db, recommendation) {
    new_rec <- data.frame(
        id = uuid::UUIDgenerate(),
        date = Sys.Date(),
        category = recommendation$category,  # signing, tactical, lineup
        description = recommendation$description,
        confidence = recommendation$confidence,  # 1-5
        decision_maker = recommendation$presented_to,
        was_followed = NA,  # To be filled later
        outcome = NA,       # To be filled later
        outcome_date = NA
    )

    rec_db <- rbind(rec_db, new_rec)
    return(rec_db)
}

# Update with outcome
update_outcome <- function(rec_db, rec_id, was_followed, outcome) {
    # .env$ tells dplyr to use the function arguments; without it the
    # same-named columns win and every row is left unchanged
    rec_db <- rec_db %>%
        mutate(
            was_followed = if_else(id == rec_id, .env$was_followed, was_followed),
            outcome = if_else(id == rec_id, .env$outcome, as.character(outcome)),
            outcome_date = if_else(id == rec_id, as.character(Sys.Date()),
                                   as.character(outcome_date))
        )
    return(rec_db)
}

# Generate credibility report
generate_credibility_report <- function(rec_db) {
    completed <- rec_db %>%
        filter(!is.na(outcome))

    summary <- completed %>%
        group_by(category, was_followed) %>%
        summarize(
            n = n(),
            pct_positive = mean(outcome == "positive"),
            avg_confidence = mean(confidence),
            .groups = "drop"
        )

    # Key metric: Do recommendations followed have better outcomes?
    followed_vs_not <- completed %>%
        group_by(was_followed) %>%
        summarize(
            n = n(),
            positive_rate = mean(outcome == "positive"),
            .groups = "drop"
        )

    cat("=== ANALYTICS CREDIBILITY REPORT ===\n\n")
    cat("Recommendations followed vs not followed:\n")
    print(followed_vs_not)

    cat("\nBy category:\n")
    print(summary)

    return(list(summary = summary, followed_vs_not = followed_vs_not))
}
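Because each logged recommendation carries a 1-5 confidence score, the same records can also answer a calibration question: do high-confidence calls actually turn out well more often? This check is not part of the listings above. The sketch below uses plain dictionaries instead of the tracker's DataFrame, and the function name and sample records are illustrative assumptions only.

```python
from collections import defaultdict


def positive_rate_by_confidence(records):
    """Map each confidence level to (n, share of positive outcomes).

    records: iterable of dicts with 'confidence' and 'outcome' keys;
    rows whose outcome is still None are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # confidence -> [n, n_positive]
    for rec in records:
        if rec["outcome"] is None:
            continue
        bucket = counts[rec["confidence"]]
        bucket[0] += 1
        bucket[1] += rec["outcome"] == "positive"
    return {level: (n, pos / n) for level, (n, pos) in sorted(counts.items())}


# Illustrative records only, not real club data
sample = [
    {"confidence": 5, "outcome": "positive"},
    {"confidence": 5, "outcome": "positive"},
    {"confidence": 4, "outcome": "positive"},
    {"confidence": 4, "outcome": "negative"},
    {"confidence": 2, "outcome": "negative"},
    {"confidence": 2, "outcome": None},  # still pending, ignored
]

for level, (n, rate) in positive_rate_by_confidence(sample).items():
    print(f"confidence {level}: n={n}, positive rate={rate:.0%}")
```

With only a handful of recommendations per level the rates will be noisy, so a table like this is better treated as a conversation starter than as proof.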

Building Trust Over Time

The Trust-Building Playbook
  1. Start small: Begin with low-stakes insights that prove your value without threatening anyone's role
  2. Be right about something: Identify a prediction you're confident in and track it visibly
  3. Admit when wrong: Acknowledge failures openly—it builds credibility more than claiming perfection
  4. Speak their language: Learn the terminology coaches and scouts use; translate your findings
  5. Make them look good: Position insights as supporting their decisions, not replacing them
  6. Be present: Attend training, watch matches live, show you understand the game beyond spreadsheets
  7. Track your record: Keep evidence of recommendations and outcomes to demonstrate value over time
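Tracking predictions visibly and admitting misses both rest on some simple probability arithmetic, the same arithmetic behind the "70% chance" reply to skeptics. The sketch below works it through. The 70% figure comes from the text; the 55% hit rate for unaided judgement and the 50-decision horizon are assumptions chosen purely for illustration.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p_model = 0.70  # stated confidence of each analytics call
p_gut = 0.55    # hypothetical hit rate for unaided judgement (assumption)
n = 50          # number of decisions considered (assumption)

# Chance that at least one of five 70% calls misses
p_some_miss = 1 - p_model**5

# Expected number of correct calls over n decisions
expected_model = n * p_model
expected_gut = n * p_gut

print(f"P(at least one miss in 5 calls): {p_some_miss:.2f}")
print(f"Expected correct over {n}: {expected_model:.1f} vs {expected_gut:.1f}")
print(f"P(majority of {n} correct at 70%): {prob_at_least(26, n, p_model):.3f}")
print(f"P(majority of {n} correct at 55%): {prob_at_least(26, n, p_gut):.3f}")
```

Under these assumptions, five well-calibrated 70% calls carry roughly an 83% chance of containing at least one miss, which is why a single failed prediction says little on its own, while the expected gap over 50 decisions (35 correct versus 27.5) is what a tracked record can eventually show.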

Developing a Communication Strategy

A systematic approach to analytics communication ensures consistent quality and builds organizational capability over time.

communication_strategy.py
# Python: Analytics Communication Strategy Framework
import pandas as pd

# Define communication channels and cadence
communication_framework = pd.DataFrame({
    "deliverable": ["Match Report", "Weekly Summary", "Transfer Target Report",
                    "Executive Dashboard", "Player Feedback", "Season Review"],
    "audience": ["Coaches", "Technical Director", "DOF, Scouts",
                 "CEO, Board", "Individual Players", "All Staff"],
    "frequency": ["Post-match", "Weekly", "On request",
                  "Monthly", "Monthly", "End of season"],
    "owner": ["Performance Analyst", "Head of Analytics", "Recruitment Analyst",
              "Head of Analytics", "Performance Analyst", "Analytics Team"],
    "format": ["1-page + video", "Dashboard", "5-page PDF",
               "Interactive", "1-on-1 meeting", "Presentation"]
})

# Quality checklist for deliverables
quality_checklist = {
    "before_release": [
        "Data accuracy verified by second analyst",
        "Key insight highlighted in first 30 seconds",
        "Visualizations pass the 5-second test",
        "Recommendation is specific and actionable",
        "Uncertainty/caveats clearly stated",
        "Proofread for typos and errors"
    ],
    "after_release": [
        "Collect feedback from recipients",
        "Track whether recommendation was followed",
        "Log outcome when available",
        "Schedule follow-up if needed"
    ]
}

print("Communication Framework:")
print(communication_framework)

print("\n=== QUALITY CHECKLIST ===")
print("\nBefore Release:")
for item in quality_checklist["before_release"]:
    print(f"[ ] {item}")

print("\nAfter Release:")
for item in quality_checklist["after_release"]:
    print(f"[ ] {item}")
# R: Analytics Communication Strategy Framework
library(tidyverse)

# Define communication channels and cadence
communication_framework <- tribble(
    ~deliverable, ~audience, ~frequency, ~owner, ~format,
    "Match Report", "Coaches", "Post-match", "Performance Analyst", "1-page + video",
    "Weekly Summary", "Technical Director", "Weekly", "Head of Analytics", "Dashboard",
    "Transfer Target Report", "DOF, Scouts", "On request", "Recruitment Analyst", "5-page PDF",
    "Executive Dashboard", "CEO, Board", "Monthly", "Head of Analytics", "Interactive",
    "Player Feedback", "Individual Players", "Monthly", "Performance Analyst", "1-on-1 meeting",
    "Season Review", "All Staff", "End of season", "Analytics Team", "Presentation"
)

# Quality checklist for deliverables
quality_checklist <- list(
    before_release = c(
        "Data accuracy verified by second analyst",
        "Key insight highlighted in first 30 seconds",
        "Visualizations pass the 5-second test",
        "Recommendation is specific and actionable",
        "Uncertainty/caveats clearly stated",
        "Proofread for typos and errors"
    ),

    after_release = c(
        "Collect feedback from recipients",
        "Track whether recommendation was followed",
        "Log outcome when available",
        "Schedule follow-up if needed"
    )
)

print("Communication Framework:")
print(communication_framework)

cat("\n=== QUALITY CHECKLIST ===\n")
cat("\nBefore Release:\n")
cat(paste("[ ]", quality_checklist$before_release, collapse = "\n"))
cat("\n\nAfter Release:\n")
cat(paste("[ ]", quality_checklist$after_release, collapse = "\n"))
Output
Communication Framework:
           deliverable            audience   frequency              owner           format
0         Match Report            Coaches  Post-match  Performance Analyst     1-page + video
1       Weekly Summary  Technical Director      Weekly    Head of Analytics        Dashboard
2  Transfer Target Report     DOF, Scouts  On request  Recruitment Analyst      5-page PDF
3   Executive Dashboard        CEO, Board     Monthly    Head of Analytics      Interactive
4      Player Feedback  Individual Players     Monthly  Performance Analyst  1-on-1 meeting
5        Season Review          All Staff  End of season     Analytics Team    Presentation

=== QUALITY CHECKLIST ===

Before Release:
[ ] Data accuracy verified by second analyst
[ ] Key insight highlighted in first 30 seconds
[ ] Visualizations pass the 5-second test
[ ] Recommendation is specific and actionable
[ ] Uncertainty/caveats clearly stated
[ ] Proofread for typos and errors

After Release:
[ ] Collect feedback from recipients
[ ] Track whether recommendation was followed
[ ] Log outcome when available
[ ] Schedule follow-up if needed
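The checklist only helps if someone enforces it. One lightweight option, sketched here as an assumption and not as part of the framework above, is a small gate that lists which "before release" items are still open for a given deliverable. The function name `release_blockers` and the example of completed items are hypothetical.

```python
def release_blockers(checklist, completed):
    """Return the checklist items not yet marked complete, in checklist order."""
    done = set(completed)
    return [item for item in checklist if item not in done]


before_release = [
    "Data accuracy verified by second analyst",
    "Key insight highlighted in first 30 seconds",
    "Visualizations pass the 5-second test",
    "Recommendation is specific and actionable",
    "Uncertainty/caveats clearly stated",
    "Proofread for typos and errors",
]

# Hypothetical state of a match report shortly before sending
completed = [
    "Data accuracy verified by second analyst",
    "Proofread for typos and errors",
]

blockers = release_blockers(before_release, completed)
if blockers:
    print("Not ready to send. Open items:")
    for item in blockers:
        print(f"[ ] {item}")
else:
    print("Ready to send.")
```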

Practice Exercises

Exercise 30.1: Automated Match Report Generator

Task: Build an automated system that generates comprehensive post-match reports with key metrics, visualizations, and narrative summaries tailored for different audiences.

Requirements:

  • Generate executive summary with key findings in plain language
  • Create visualizations for shots, passing networks, and territory control
  • Produce audience-specific versions (coach, analyst, media)
  • Include comparison against benchmarks and season averages
  • Export to HTML/PDF format with consistent branding
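As a starting point for the benchmark requirement, the comparison against season averages can be reduced to one plain-language line per metric. The sketch below is one possible approach, not the reference solution; the function name `benchmark_lines`, the 10% "in line with" threshold, and all of the numbers are assumptions made for illustration.

```python
def benchmark_lines(match_stats, season_avg, threshold=0.10):
    """Describe each metric relative to its season average.

    threshold is the relative change below which a metric is reported as
    'in line with' the average (an illustrative default).
    """
    lines = []
    for metric, value in match_stats.items():
        avg = season_avg.get(metric)
        if not avg:  # missing or zero average: nothing to compare against
            continue
        change = (value - avg) / avg
        if abs(change) < threshold:
            verdict = "in line with"
        elif change > 0:
            verdict = f"{change:.0%} above"
        else:
            verdict = f"{abs(change):.0%} below"
        lines.append(f"{metric}: {value:g} ({verdict} season average of {avg:g})")
    return lines


# Made-up numbers for demonstration
match_stats = {"xg": 2.1, "shots": 14, "pass_completion": 72.0}
season_avg = {"xg": 1.5, "shots": 13, "pass_completion": 84.0}

for line in benchmark_lines(match_stats, season_avg):
    print(line)
```

The same lines can be fed into whichever audience version needs them: the executive summary might keep only the metrics that moved past the threshold, while the analyst version lists everything.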

match_report_generator.py
import pandas as pd
import numpy as np
from statsbombpy import sb
import matplotlib.pyplot as plt
from matplotlib.patches import Arc, Rectangle, Circle
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

class MatchReportGenerator:
    """Automated match report generator with multiple output formats."""

    def __init__(self, match_id):
        self.match_id = match_id
        self.events = sb.events(match_id=match_id)
        self.lineups = sb.lineups(match_id=match_id)
        self.match_info = self._get_match_info()
        self.team_stats = self._calculate_team_stats()

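    def _get_match_info(self):
        """Extract basic match information from the event stream."""
        # The events table used here has no home/away column, so teams are
        # taken in order of first appearance; 'date' is the report date,
        # not the kick-off date.
        teams = self.events['team'].unique()
        return {
            'home_team': teams[0] if len(teams) > 0 else 'Team A',
            'away_team': teams[1] if len(teams) > 1 else 'Team B',
            'date': datetime.now().strftime('%Y-%m-%d')
        }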

    def _calculate_team_stats(self):
        """Calculate comprehensive team statistics."""
        stats = {}

        for team in [self.match_info['home_team'], self.match_info['away_team']]:
            team_events = self.events[self.events['team'] == team]
            shots = team_events[team_events['type'] == 'Shot']

            stats[team] = {
                'shots': len(shots),
                'shots_on_target': len(shots[shots['shot_outcome'].isin(['Goal', 'Saved'])]),
                'goals': len(shots[shots['shot_outcome'] == 'Goal']),
                'xg': shots['shot_statsbomb_xg'].sum() if 'shot_statsbomb_xg' in shots.columns else 0,
                'passes': len(team_events[team_events['type'] == 'Pass']),
                'pass_completion': self._calc_pass_completion(team_events),
                'possession_pct': len(team_events) / len(self.events) * 100,
                'pressures': len(team_events[team_events['type'] == 'Pressure']),
                # Count tackle duels; fall back to 0 if the column is absent
                'tackles': int(((team_events['type'] == 'Duel') &
                                (team_events['duel_type'] == 'Tackle')).sum())
                           if 'duel_type' in team_events.columns else 0,
                'interceptions': len(team_events[team_events['type'] == 'Interception'])
            }

        return stats

    def _calc_pass_completion(self, team_events):
        """Calculate pass completion percentage."""
        passes = team_events[team_events['type'] == 'Pass']
        if len(passes) == 0:
            return 0
        completed = passes['pass_outcome'].isna().sum()
        return (completed / len(passes)) * 100

    def generate_executive_summary(self):
        """Generate plain-language executive summary."""
        home = self.match_info['home_team']
        away = self.match_info['away_team']
        h_stats = self.team_stats[home]
        a_stats = self.team_stats[away]

        # Determine result
        if h_stats['goals'] > a_stats['goals']:
            result = f"{home} won"
        elif a_stats['goals'] > h_stats['goals']:
            result = f"{away} won"
        else:
            result = "The match ended in a draw"

        # xG winner
        xg_winner = home if h_stats['xg'] > a_stats['xg'] else away

        # Possession dominance
        if h_stats['possession_pct'] > 55:
            poss_leader = home
        elif a_stats['possession_pct'] > 55:
            poss_leader = away
        else:
            poss_leader = "Neither team"

        summary = f"""
MATCH OVERVIEW
==============

{result} {h_stats['goals']}-{a_stats['goals']}.

KEY TAKEAWAYS:
- {xg_winner} created the better chances (xG: {h_stats['xg']:.2f} vs {a_stats['xg']:.2f})
- {poss_leader} dominated possession ({h_stats['possession_pct']:.1f}% vs {a_stats['possession_pct']:.1f}%)
- Shot efficiency: {home} {h_stats['shots_on_target']}/{h_stats['shots']} on target vs {away} {a_stats['shots_on_target']}/{a_stats['shots']}

PERFORMANCE RATING:
{home}: {self._rate_performance(h_stats)}
{away}: {self._rate_performance(a_stats)}
        """
        return summary.strip()

    def _rate_performance(self, stats):
        """Rate team performance based on key metrics."""
        score = 0

        # xG efficiency
        if stats['goals'] >= stats['xg']:
            score += 2
        elif stats['goals'] >= stats['xg'] * 0.8:
            score += 1

        # Pass completion
        if stats['pass_completion'] > 85:
            score += 2
        elif stats['pass_completion'] > 75:
            score += 1

        # Shots on target ratio
        if stats['shots'] > 0 and stats['shots_on_target'] / stats['shots'] > 0.4:
            score += 1

        if score >= 5:
            return "Excellent"
        elif score >= 3:
            return "Good"
        elif score >= 2:
            return "Average"
        else:
            return "Below Par"

    def generate_coach_summary(self):
        """Generate tactical analysis for coaching staff."""
        home = self.match_info['home_team']
        away = self.match_info['away_team']

        # Pressing analysis by zone
        press_by_zone = self._analyze_pressing()

        # Progressive passes
        prog_passes = self._analyze_progressive_passes()

        summary = f"""
TACTICAL ANALYSIS
=================

PRESSING:
- {home}: {press_by_zone.get(home, {}).get('High', 0)} high presses
- {away}: {press_by_zone.get(away, {}).get('High', 0)} high presses

PROGRESSIVE PLAY:
- {home}: {prog_passes.get(home, 0)} progressive passes
- {away}: {prog_passes.get(away, 0)} progressive passes

RECOMMENDATIONS:
{self._generate_recommendations()}
        """
        return summary.strip()

    def _analyze_pressing(self):
        """Analyze pressing by zone."""
        presses = self.events[self.events['type'] == 'Pressure'].copy()
        if len(presses) == 0:
            return {}

        presses['zone'] = presses['location'].apply(
            lambda loc: 'High' if isinstance(loc, list) and loc[0] > 80
                        else ('Mid' if isinstance(loc, list) and loc[0] > 40 else 'Low')
        )

        result = {}
        for team in presses['team'].unique():
            team_presses = presses[presses['team'] == team]
            result[team] = team_presses.groupby('zone').size().to_dict()

        return result

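    def _analyze_progressive_passes(self):
        """Count progressive passes (more than 10 pitch units forward along x).

        StatsBomb coordinates run 0-120 along the pitch length, so one unit
        is roughly a yard, not a metre.
        """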
        passes = self.events[self.events['type'] == 'Pass'].copy()

        def calc_progress(row):
            if not isinstance(row['location'], list) or not isinstance(row.get('pass_end_location'), list):
                return 0
            return row['pass_end_location'][0] - row['location'][0]

        passes['progress'] = passes.apply(calc_progress, axis=1)

        result = {}
        for team in passes['team'].unique():
            team_passes = passes[passes['team'] == team]
            result[team] = (team_passes['progress'] > 10).sum()

        return result

    def _generate_recommendations(self):
        """Generate data-driven recommendations."""
        recs = []
        home = self.match_info['home_team']
        h_stats = self.team_stats[home]

        if h_stats['goals'] < h_stats['xg'] * 0.8:
            recs.append(f"- Review finishing; created {h_stats['xg']:.2f} xG but only scored {h_stats['goals']}")

        if h_stats['pass_completion'] < 75:
            recs.append(f"- Passing under pressure needs work ({h_stats['pass_completion']:.1f}% completion)")

        if not recs:
            recs.append("- Strong all-around performance, maintain current approach")

        return '\n'.join(recs)

    def create_shot_map(self, team_name, ax=None):
        """Create shot map visualization."""
        if ax is None:
            fig, ax = plt.subplots(figsize=(10, 7))

        # Draw pitch (simplified)
        ax.set_xlim(60, 120)
        ax.set_ylim(0, 80)
        ax.set_facecolor('darkgreen')
        ax.add_patch(Rectangle((102, 18), 18, 44, fill=False, color='white', lw=2))  # Box
        ax.axvline(x=120, color='white', lw=2)  # Goal line

        # Plot shots
        shots = self.events[(self.events['type'] == 'Shot') &
                           (self.events['team'] == team_name)]

        for _, shot in shots.iterrows():
            if isinstance(shot['location'], list):
                x, y = shot['location'][:2]  # ignore a z-coordinate if present
                xg = shot.get('shot_statsbomb_xg', 0.1)
                is_goal = shot.get('shot_outcome') == 'Goal'

                ax.scatter(x, y, s=xg*500,
                          c='red' if is_goal else 'steelblue',
                          alpha=0.7, edgecolors='white', linewidths=1)

        total_xg = shots['shot_statsbomb_xg'].sum() if 'shot_statsbomb_xg' in shots.columns else 0
        ax.set_title(f"{team_name} - Shot Map\nTotal xG: {total_xg:.2f}", fontsize=12, fontweight='bold')
        ax.set_xticks([])
        ax.set_yticks([])

        return ax

    def create_full_report(self, output_path='match_report.html'):
        """Generate complete HTML report."""
        # Create visualizations
        fig, axes = plt.subplots(1, 2, figsize=(14, 6))
        self.create_shot_map(self.match_info['home_team'], axes[0])
        self.create_shot_map(self.match_info['away_team'], axes[1])
        plt.tight_layout()
        plt.savefig('shots.png', dpi=150, bbox_inches='tight', facecolor='white')
        plt.close()

        # Build HTML
        html = f"""
<!DOCTYPE html>
<html>
<head>
    <title>Match Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; max-width: 900px; margin: 0 auto; padding: 20px; }}
        .header {{ background: #1B5E20; color: white; padding: 20px; text-align: center; }}
        .section {{ margin: 20px 0; padding: 15px; background: #f5f5f5; border-radius: 5px; }}
        pre {{ white-space: pre-wrap; font-family: inherit; }}
        img {{ max-width: 100%; }}
    </style>
</head>
<body>
    <div class="header">
        <h1>Match Report</h1>
        <h2>{self.match_info['home_team']} vs {self.match_info['away_team']}</h2>
    </div>

    <div class="section">
        <h3>Executive Summary</h3>
        <pre>{self.generate_executive_summary()}</pre>
    </div>

    <div class="section">
        <h3>Shot Maps</h3>
        <img src="shots.png" alt="Shot Maps">
    </div>

    <div class="section">
        <h3>Tactical Analysis</h3>
        <pre>{self.generate_coach_summary()}</pre>
    </div>

    <footer>
        <p>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}</p>
    </footer>
</body>
</html>
        """

        # UTF-8 keeps accented team names intact on every platform
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(html)

        print(f"Report saved to {output_path}")
        return output_path


# Example usage
print("=== MATCH REPORT GENERATOR ===\n")

# Demo with a sample match
try:
    # Get a sample match
    competitions = sb.competitions()
    la_liga = competitions[competitions['competition_name'] == 'La Liga'].iloc[0]
    matches = sb.matches(competition_id=la_liga['competition_id'],
                         season_id=la_liga['season_id'])
    sample_match = matches.iloc[0]['match_id']

    # Generate report
    reporter = MatchReportGenerator(match_id=sample_match)

    print("EXECUTIVE SUMMARY:")
    print(reporter.generate_executive_summary())

    print("\n" + "="*50 + "\n")

    print("COACH SUMMARY:")
    print(reporter.generate_coach_summary())

    # Create visualizations
    fig, ax = plt.subplots(figsize=(10, 7))
    reporter.create_shot_map(reporter.match_info['home_team'], ax)
    plt.savefig('sample_shot_map.png', dpi=150, bbox_inches='tight')
    plt.show()

except Exception as e:
    print(f"Demo output (sample data not available): {e}")
    print("\nTo use: reporter = MatchReportGenerator(match_id=12345)")
    print("reporter.generate_executive_summary()")
library(StatsBombR)
library(tidyverse)
library(ggplot2)
library(gridExtra)
library(knitr)

# Match Report Generator Class/Functions
generate_match_report <- function(match_id, competition_name = "La Liga") {

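  # Load match data (.env$ refers to the function argument; without it the
  # filter compares the column with itself and keeps every competition)
  competitions <- FreeCompetitions() %>%
    filter(competition_name == .env$competition_name)
  matches <- FreeMatches(competitions)
  # allclean() adds the location.x / pass.end_location.x columns used below
  events <- StatsBombFreeEvents(MatchesDF = matches %>% filter(match_id == !!match_id)) %>%
    allclean()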

  # Get match info
  match_info <- matches %>% filter(match_id == !!match_id)

  report <- list()
  report$match_info <- match_info

  # Calculate team statistics
  report$team_stats <- events %>%
    group_by(team.name) %>%
    summarise(
      shots = sum(type.name == "Shot"),
      shots_on_target = sum(type.name == "Shot" &
                            shot.outcome.name %in% c("Goal", "Saved")),
      goals = sum(type.name == "Shot" & shot.outcome.name == "Goal"),
      xg = sum(ifelse(type.name == "Shot", shot.statsbomb_xg, 0), na.rm = TRUE),
      passes = sum(type.name == "Pass"),
      pass_completion = mean(ifelse(type.name == "Pass", is.na(pass.outcome.name), NA), na.rm = TRUE) * 100,
      possession_pct = n() / nrow(events) * 100,
      pressures = sum(type.name == "Pressure"),
      tackles = sum(type.name == "Duel" & duel.type.name == "Tackle"),
      interceptions = sum(type.name == "Interception"),
      .groups = "drop"
    )

  # Key events
  report$key_events <- events %>%
    filter(type.name %in% c("Shot", "Goal", "Substitution")) %>%
    arrange(minute) %>%
    select(minute, team.name, player.name, type.name, shot.outcome.name)

  # Generate narrative summary
  home_team <- match_info$home_team.home_team_name
  away_team <- match_info$away_team.away_team_name
  home_stats <- report$team_stats %>% filter(team.name == home_team)
  away_stats <- report$team_stats %>% filter(team.name == away_team)

  report$executive_summary <- generate_executive_summary(
    home_team, away_team, home_stats, away_stats
  )

  report$coach_summary <- generate_coach_summary(
    events, home_team, away_team, home_stats, away_stats
  )

  report
}

# Executive Summary Generator (plain language)
generate_executive_summary <- function(home_team, away_team, home_stats, away_stats) {

  # Determine winner by xG
  xg_winner <- ifelse(home_stats$xg > away_stats$xg, home_team, away_team)
  xg_margin <- abs(home_stats$xg - away_stats$xg)

  # Determine actual winner
  if (home_stats$goals > away_stats$goals) {
    result <- paste(home_team, "won")
    goal_diff <- home_stats$goals - away_stats$goals
  } else if (away_stats$goals > home_stats$goals) {
    result <- paste(away_team, "won")
    goal_diff <- away_stats$goals - home_stats$goals
  } else {
    result <- "The match ended in a draw"
    goal_diff <- 0
  }

  # Build narrative
  summary <- paste0(
    "MATCH OVERVIEW\n",
    "==============\n\n",
    result, " ", home_stats$goals, "-", away_stats$goals, ".\n\n",

    "KEY TAKEAWAYS:\n",
    "- ", xg_winner, " created the better chances (xG: ",
    round(home_stats$xg, 2), " vs ", round(away_stats$xg, 2), ")\n",

    "- ", ifelse(home_stats$possession_pct > 55, home_team,
                 ifelse(away_stats$possession_pct > 55, away_team, "Neither team")),
    " dominated possession (",
    round(home_stats$possession_pct, 1), "% vs ",
    round(away_stats$possession_pct, 1), "%)\n",

    "- Shot efficiency: ", home_team, " ", home_stats$shots_on_target, "/",
    home_stats$shots, " on target vs ", away_team, " ",
    away_stats$shots_on_target, "/", away_stats$shots, "\n\n",

    "PERFORMANCE RATING:\n",
    home_team, ": ", rate_performance(home_stats), "\n",
    away_team, ": ", rate_performance(away_stats)
  )

  summary
}

# Performance rating helper
rate_performance <- function(stats) {
  score <- 0

  # xG efficiency
  if (stats$goals >= stats$xg) score <- score + 2
  else if (stats$goals >= stats$xg * 0.8) score <- score + 1

  # Pass completion
  if (stats$pass_completion > 85) score <- score + 2
  else if (stats$pass_completion > 75) score <- score + 1

  # Shots on target ratio (guard against a team with zero shots)
  if (stats$shots > 0 && stats$shots_on_target / stats$shots > 0.4) score <- score + 1

  case_when(
    score >= 5 ~ "Excellent",
    score >= 3 ~ "Good",
    score >= 2 ~ "Average",
    TRUE ~ "Below Par"
  )
}

# Coach Summary (tactical focus)
generate_coach_summary <- function(events, home_team, away_team, home_stats, away_stats) {

  # Pressing analysis
  press_by_zone <- events %>%
    filter(type.name == "Pressure") %>%
    mutate(
      zone = case_when(
        location.x > 80 ~ "High",
        location.x > 40 ~ "Mid",
        TRUE ~ "Low"
      )
    ) %>%
    group_by(team.name, zone) %>%
    summarise(presses = n(), .groups = "drop") %>%
    pivot_wider(names_from = zone, values_from = presses, values_fill = 0)

  # Progressive passes
  prog_passes <- events %>%
    filter(type.name == "Pass") %>%
    mutate(
      progress = pass.end_location.x - location.x
    ) %>%
    group_by(team.name) %>%
    summarise(
      progressive_passes = sum(progress > 10, na.rm = TRUE),
      avg_progress = mean(progress, na.rm = TRUE),
      .groups = "drop"
    )

  # Build coach brief
  summary <- paste0(
    "TACTICAL ANALYSIS\n",
    "=================\n\n",

    "PRESSING:\n",
    "- ", home_team, ": ", press_by_zone$High[press_by_zone$team.name == home_team],
    " high presses\n",
    "- ", away_team, ": ", press_by_zone$High[press_by_zone$team.name == away_team],
    " high presses\n\n",

    "PROGRESSIVE PLAY:\n",
    "- ", home_team, ": ", prog_passes$progressive_passes[prog_passes$team.name == home_team],
    " progressive passes\n",
    "- ", away_team, ": ", prog_passes$progressive_passes[prog_passes$team.name == away_team],
    " progressive passes\n\n",

    "RECOMMENDATIONS:\n",
    generate_recommendations(home_stats, away_stats, home_team, away_team)
  )

  summary
}

# Generate recommendations based on data
generate_recommendations <- function(home_stats, away_stats, home_team, away_team) {

  recs <- c()

  # paste0() is used below so the pieces join without the extra spaces paste() would add

  # If underperformed xG
  if (home_stats$goals < home_stats$xg * 0.8) {
    recs <- c(recs, paste0("- Review ", home_team, "'s finishing; created ",
                           round(home_stats$xg, 2), " xG but only scored ",
                           home_stats$goals))
  }

  # If low pass completion
  if (home_stats$pass_completion < 75) {
    recs <- c(recs, paste0("- ", home_team, " passing under pressure needs work (",
                           round(home_stats$pass_completion, 1), "% completion)"))
  }

  # If allowing too many shots
  if (away_stats$shots > 15) {
    recs <- c(recs, paste0("- Defensive shape allowed ", away_stats$shots,
                           " opposition shots"))
  }

  if (length(recs) == 0) {
    recs <- "- Strong all-around performance, maintain current approach"
  }

  paste(recs, collapse = "\n")
}

# Create visualization panel
create_report_visuals <- function(events, team_name) {

  # Shot map
  shots <- events %>%
    filter(type.name == "Shot", team.name == team_name)

  p1 <- ggplot(shots, aes(x = location.x, y = location.y)) +
    annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
             fill = "darkgreen", alpha = 0.3) +
    geom_point(aes(size = shot.statsbomb_xg,
                   color = shot.outcome.name == "Goal"),
               alpha = 0.7) +
    scale_color_manual(values = c("FALSE" = "steelblue", "TRUE" = "red"),
                       labels = c("No Goal", "Goal"), name = "Outcome") +
    scale_size_continuous(range = c(2, 8), name = "xG") +
    coord_fixed(ratio = 1) +  # equal units on both axes keep the 120 x 80 pitch in proportion
    labs(title = paste(team_name, "- Shot Map"),
         subtitle = paste("Total xG:", round(sum(shots$shot.statsbomb_xg, na.rm = TRUE), 2))) +
    theme_minimal() +
    theme(legend.position = "bottom")

  # Pass network simplified
  passes <- events %>%
    filter(type.name == "Pass", team.name == team_name, is.na(pass.outcome.name))

  player_positions <- passes %>%
    group_by(player.name) %>%
    summarise(
      x = mean(location.x),
      y = mean(location.y),
      passes = n(),
      .groups = "drop"
    ) %>%
    filter(passes >= 5)

  p2 <- ggplot(player_positions, aes(x = x, y = y)) +
    annotate("rect", xmin = 0, xmax = 120, ymin = 0, ymax = 80,
             fill = "darkgreen", alpha = 0.3) +
    geom_point(aes(size = passes), color = "steelblue", alpha = 0.7) +
    geom_text(aes(label = substr(player.name, 1, 10)), size = 2, vjust = -1) +
    scale_size_continuous(range = c(3, 10), name = "Passes") +
    coord_fixed(ratio = 1) +  # equal units on both axes keep the 120 x 80 pitch in proportion
    labs(title = paste(team_name, "- Average Positions"),
         subtitle = "Size = pass volume") +
    theme_minimal()

  # Combine
  grid.arrange(p1, p2, ncol = 2)
}

# Example usage
cat("=== MATCH REPORT GENERATOR ===\n\n")
cat("To generate a report, run:\n")
cat("report <- generate_match_report(match_id = 3773585)\n")
cat("cat(report$executive_summary)\n")
cat("cat(report$coach_summary)\n")
cat("create_report_visuals(events, 'Barcelona')\n")

# Demo with sample output
demo_output <- "
MATCH OVERVIEW
==============

Barcelona won 3-1.

KEY TAKEAWAYS:
- Barcelona created the better chances (xG: 2.34 vs 0.87)
- Barcelona dominated possession (67.2% vs 32.8%)
- Shot efficiency: Barcelona 5/12 on target vs Real Madrid 2/8

PERFORMANCE RATING:
Barcelona: Excellent
Real Madrid: Below Par
"
cat(demo_output)
Exercise 30.2: Interactive Player Comparison Dashboard

Task: Build an interactive dashboard that compares player performance across multiple metrics with percentile rankings, radar charts, and trend analysis.

Requirements:

  • Calculate percentile rankings against position-specific benchmarks
  • Create radar charts for visual comparison
  • Show performance trends over recent matches
  • Include context with league averages and top performer benchmarks
  • Generate exportable comparison reports
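
The first requirement rests on a percentile rank: the share of the benchmark group that a player matches or beats. The sketch below is a minimal, dependency-free illustration. The function name `percentile_rank` and the sample values are hypothetical, and ties are counted as "at or below" (the convention R's `ecdf` uses), which can differ slightly from other tie-handling rules.

```python
def percentile_rank(benchmark, value):
    """Percentage of benchmark values that are less than or equal to value."""
    values = [v for v in benchmark if v is not None]
    if not values:
        raise ValueError("benchmark group is empty")
    at_or_below = sum(1 for v in values if v <= value)
    return 100.0 * at_or_below / len(values)


# Hypothetical goals-per-90 figures for forwards with enough minutes
forward_goals_p90 = [0.10, 0.22, 0.25, 0.31, 0.35, 0.40, 0.48, 0.55]

rank = percentile_rank(forward_goals_p90, 0.40)
print(f"0.40 goals/90 sits at the {rank:.0f}th percentile of this group")
```

Because the benchmark list is filtered by position and minutes before ranking, the same raw figure can land at very different percentiles for a forward and a defender.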

player_comparison_dashboard.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

class PlayerComparisonDashboard:
    """Interactive dashboard for comparing player performance."""

    def __init__(self, player_data):
        """
        Initialize with player data DataFrame.

        Expected columns: player_id, player_name, position, minutes,
                         goals_p90, assists_p90, xg_p90, xa_p90, etc.
        """
        self.player_data = player_data
        self.metrics = ['goals_p90', 'assists_p90', 'xg_p90', 'xa_p90',
                       'passes_p90', 'key_passes_p90', 'dribbles_p90']

    def calculate_percentiles(self, player_ids, position='Forward', min_minutes=900):
        """Calculate percentile rankings against position peers."""

        # Filter benchmark population
        benchmark = self.player_data[
            (self.player_data['position'] == position) &
            (self.player_data['minutes'] >= min_minutes)
        ]

        # Get selected players
        selected = self.player_data[
            self.player_data['player_id'].isin(player_ids)
        ].copy()

        # Calculate percentiles
        for metric in self.metrics:
            if metric in benchmark.columns:
                selected[f'{metric}_pct'] = selected[metric].apply(
                    lambda x: stats.percentileofscore(benchmark[metric].dropna(), x)
                )

        return selected

    def create_radar_chart(self, percentile_data, ax=None):
        """Create radar chart comparing players."""

        if ax is None:
            fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(projection='polar'))

        # Prepare data
        pct_cols = [f'{m}_pct' for m in self.metrics if f'{m}_pct' in percentile_data.columns]
        # Build labels from pct_cols so label count always matches the plotted axes
        labels = [c.replace('_p90_pct', '').replace('_', ' ').title() for c in pct_cols]

        # Number of variables
        num_vars = len(pct_cols)
        angles = np.linspace(0, 2 * np.pi, num_vars, endpoint=False).tolist()
        angles += angles[:1]  # Complete the loop

        # Colors for each player
        colors = ['#1B5E20', '#C62828', '#1565C0', '#FF8F00']

        for idx, (_, player) in enumerate(percentile_data.iterrows()):
            values = player[pct_cols].values.tolist()
            values += values[:1]  # Complete the loop

            ax.plot(angles, values, 'o-', linewidth=2,
                   label=player['player_name'], color=colors[idx % len(colors)])
            ax.fill(angles, values, alpha=0.15, color=colors[idx % len(colors)])

        # Configure chart
        ax.set_xticks(angles[:-1])
        ax.set_xticklabels(labels, size=10)
        ax.set_ylim(0, 100)
        ax.set_yticks([25, 50, 75, 90])
        ax.set_yticklabels(['25%', '50%', '75%', '90%'], size=8)
        ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
        ax.set_title('Player Comparison - Percentile Rankings', size=14, fontweight='bold', y=1.08)

        return ax

    def create_percentile_bars(self, percentile_data, ax=None):
        """Create horizontal bar chart of percentile rankings."""

        if ax is None:
            fig, ax = plt.subplots(figsize=(10, 8))

        pct_cols = [f'{m}_pct' for m in self.metrics if f'{m}_pct' in percentile_data.columns]
        labels = [m.replace('_p90_pct', '').replace('_', ' ').title() for m in pct_cols]

        x = np.arange(len(labels))
        width = 0.25
        colors = ['#1B5E20', '#C62828', '#1565C0']

        for idx, (_, player) in enumerate(percentile_data.iterrows()):
            values = player[pct_cols].values
            offset = (idx - len(percentile_data) / 2 + 0.5) * width
            ax.barh(x + offset, values, width, label=player['player_name'],
                   color=colors[idx % len(colors)], alpha=0.8)

        # Add reference lines
        for pct in [25, 50, 75, 90]:
            ax.axvline(pct, color='gray', linestyle='--', alpha=0.5, linewidth=1)

        ax.set_yticks(x)
        ax.set_yticklabels(labels)
        ax.set_xlabel('Percentile')
        ax.set_xlim(0, 100)
        ax.legend(loc='lower right')
        ax.set_title('Percentile Rankings by Metric', fontweight='bold')

        # Add tier labels
        ax.text(12.5, len(labels), 'Poor', ha='center', va='bottom', fontsize=8, color='gray')
        ax.text(37.5, len(labels), 'Below\nAvg', ha='center', va='bottom', fontsize=8, color='gray')
        ax.text(62.5, len(labels), 'Avg', ha='center', va='bottom', fontsize=8, color='gray')
        ax.text(82.5, len(labels), 'Above\nAvg', ha='center', va='bottom', fontsize=8, color='gray')
        ax.text(95, len(labels), 'Elite', ha='center', va='bottom', fontsize=8, color='gray')

        return ax

    def generate_summary(self, percentile_data):
        """Generate text summary of comparison."""

        pct_cols = [f'{m}_pct' for m in self.metrics if f'{m}_pct' in percentile_data.columns]

        summary = "PLAYER COMPARISON SUMMARY\n"
        summary += "=" * 25 + "\n\n"

        for _, player in percentile_data.iterrows():
            values = {col.replace('_p90_pct', ''): player[col]
                     for col in pct_cols if col in player.index}

            # Sort to find strengths and weaknesses
            sorted_metrics = sorted(values.items(), key=lambda x: x[1], reverse=True)

            strengths = sorted_metrics[:2]
            weaknesses = sorted_metrics[-2:]

            summary += f"{player['player_name']}\n"
            summary += f"Strengths: {strengths[0][0]} ({strengths[0][1]:.0f}%), "
            summary += f"{strengths[1][0]} ({strengths[1][1]:.0f}%)\n"
            summary += f"Weaknesses: {weaknesses[0][0]} ({weaknesses[0][1]:.0f}%), "
            summary += f"{weaknesses[1][0]} ({weaknesses[1][1]:.0f}%)\n\n"

        return summary

    def create_full_dashboard(self, player_ids, position='Forward'):
        """Create complete comparison dashboard."""

        # Calculate percentiles
        percentile_data = self.calculate_percentiles(player_ids, position)

        # Create figure with subplots
        fig = plt.figure(figsize=(16, 10))

        # Radar chart
        ax1 = fig.add_subplot(121, projection='polar')
        self.create_radar_chart(percentile_data, ax1)

        # Percentile bars
        ax2 = fig.add_subplot(122)
        self.create_percentile_bars(percentile_data, ax2)

        plt.suptitle(f'Player Comparison Dashboard - {position}s', fontsize=16, fontweight='bold', y=1.02)
        plt.tight_layout()

        # Print summary
        print(self.generate_summary(percentile_data))

        return fig, percentile_data


# Demo with simulated data
np.random.seed(42)

# Create sample player data
n_players = 100
player_data = pd.DataFrame({
    'player_id': range(1, n_players + 1),
    'player_name': [f'Player_{i}' for i in range(1, n_players + 1)],
    'position': np.random.choice(['Forward', 'Midfielder', 'Defender'], n_players,
                                  p=[0.3, 0.4, 0.3]),
    'minutes': np.random.randint(500, 3000, n_players),
    'goals_p90': np.abs(np.random.normal(0.3, 0.15, n_players)),
    'assists_p90': np.abs(np.random.normal(0.2, 0.1, n_players)),
    'xg_p90': np.abs(np.random.normal(0.35, 0.12, n_players)),
    'xa_p90': np.abs(np.random.normal(0.18, 0.08, n_players)),
    'passes_p90': np.abs(np.random.normal(40, 10, n_players)),
    'key_passes_p90': np.abs(np.random.normal(1.5, 0.5, n_players)),
    'dribbles_p90': np.abs(np.random.normal(2, 0.8, n_players))
})

# Name specific players
player_data.loc[0, 'player_name'] = 'Messi Jr.'
player_data.loc[14, 'player_name'] = 'Ronaldo Jr.'
player_data.loc[41, 'player_name'] = 'Mbappe Jr.'

# Create dashboard
print("=== PLAYER COMPARISON DASHBOARD ===\n")

dashboard = PlayerComparisonDashboard(player_data)
fig, percentile_data = dashboard.create_full_dashboard(
    player_ids=[1, 15, 42],
    position='Forward'
)

plt.savefig('player_comparison_dashboard.png', dpi=150, bbox_inches='tight')
plt.show()

# Show the comparison data
print("\nPercentile Rankings:")
pct_cols = [col for col in percentile_data.columns if '_pct' in col]
print(percentile_data[['player_name'] + pct_cols].round(1).to_string(index=False))
player_comparison_dashboard.R
library(tidyverse)
library(fmsb)
library(ggplot2)
library(gridExtra)

# Player Comparison Dashboard
create_player_comparison_dashboard <- function(player_data, player_ids, position = "Forward") {

  # Filter to selected players
  selected <- player_data %>%
    filter(player_id %in% player_ids)

  # Get position benchmarks
  benchmarks <- player_data %>%
    filter(position == !!position, minutes >= 900) %>%
    summarise(across(where(is.numeric), list(
      mean = ~mean(., na.rm = TRUE),
      p90 = ~quantile(., 0.9, na.rm = TRUE),
      p10 = ~quantile(., 0.1, na.rm = TRUE)
    )))

  # Calculate percentiles for each player
  metrics <- c("goals_p90", "assists_p90", "xg_p90", "xa_p90",
               "passes_p90", "key_passes_p90", "dribbles_p90")

  percentile_data <- selected %>%
    rowwise() %>%
    mutate(across(all_of(metrics), ~{
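      # !!position forces the function argument; a bare `position` here would
      # resolve to the current row's own position column inside mutate()
      all_values <- player_data[[cur_column()]][player_data$position == !!position &
                                                  player_data$minutes >= 900]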
      ecdf(all_values)(.) * 100
    }, .names = "{.col}_pct")) %>%
    ungroup()

  list(
    players = selected,
    percentiles = percentile_data,
    benchmarks = benchmarks,
    radar_data = prepare_radar_data(percentile_data, metrics)
  )
}

# Prepare radar chart data
prepare_radar_data <- function(percentile_data, metrics) {

  metric_pcts <- paste0(metrics, "_pct")

  radar_df <- percentile_data %>%
    select(player_name, all_of(metric_pcts)) %>%
    column_to_rownames("player_name")

  # Radar chart needs max and min rows
  radar_df <- rbind(
    rep(100, ncol(radar_df)),  # Max
    rep(0, ncol(radar_df)),    # Min
    radar_df
  )

  # Clean column names for display
  colnames(radar_df) <- gsub("_p90_pct", "", colnames(radar_df)) %>%
    gsub("_", " ", .) %>%
    str_to_title()

  radar_df
}

# Create radar chart comparison
plot_radar_comparison <- function(radar_data, colors = c("#1B5E20", "#C62828", "#1565C0")) {

  n_players <- nrow(radar_data) - 2

  # Set up plot parameters
  par(mfrow = c(1, 1), mar = c(2, 2, 2, 2))

  radarchart(radar_data,
             axistype = 1,
             pcol = colors[1:n_players],
             pfcol = adjustcolor(colors[1:n_players], alpha = 0.2),
             plwd = 2,
             plty = 1,
             cglcol = "grey",
             cglty = 1,
             axislabcol = "grey",
             caxislabels = c("0%", "25%", "50%", "75%", "100%"),
             vlcex = 0.8,
             title = "Player Comparison - Percentile Ranks")

  legend("topright",
         legend = rownames(radar_data)[3:nrow(radar_data)],
         col = colors[1:n_players],
         lty = 1, lwd = 2,
         cex = 0.8)
}

# Create trend analysis
plot_performance_trends <- function(match_data, player_ids, metric = "xg") {

  trend_data <- match_data %>%
    filter(player_id %in% player_ids) %>%
    arrange(player_name, match_date) %>%
    group_by(player_name) %>%
    mutate(
      rolling_avg = zoo::rollmean(get(metric), k = 5, fill = NA, align = "right"),
      match_num = row_number()
    ) %>%
    ungroup()

  ggplot(trend_data, aes(x = match_num, y = rolling_avg, color = player_name)) +
    geom_line(linewidth = 1.2) +
    geom_point(aes(y = get(metric)), alpha = 0.4, size = 2) +
    labs(
      title = paste("Performance Trend -", str_to_title(metric)),
      subtitle = "5-match rolling average",
      x = "Match Number",
      y = metric,
      color = "Player"
    ) +
    theme_minimal() +
    theme(legend.position = "bottom") +
    scale_color_manual(values = c("#1B5E20", "#C62828", "#1565C0"))
}

# Create percentile bar chart
plot_percentile_bars <- function(percentile_data, metrics) {

  metric_pcts <- paste0(metrics, "_pct")

  long_data <- percentile_data %>%
    select(player_name, all_of(metric_pcts)) %>%
    pivot_longer(-player_name, names_to = "metric", values_to = "percentile") %>%
    mutate(
      metric = gsub("_p90_pct", "", metric) %>% gsub("_", " ", .) %>% str_to_title(),
      tier = case_when(
        percentile >= 90 ~ "Elite (90%+)",
        percentile >= 75 ~ "Above Avg (75-90%)",
        percentile >= 50 ~ "Average (50-75%)",
        percentile >= 25 ~ "Below Avg (25-50%)",
        TRUE ~ "Poor (<25%)"
      )
    )

  ggplot(long_data, aes(x = reorder(metric, percentile), y = percentile, fill = player_name)) +
    geom_col(position = "dodge", alpha = 0.8) +
    geom_hline(yintercept = c(25, 50, 75, 90), linetype = "dashed", alpha = 0.5) +
    coord_flip() +
    labs(
      title = "Percentile Rankings by Metric",
      x = NULL, y = "Percentile",
      fill = "Player"
    ) +
    scale_fill_manual(values = c("#1B5E20", "#C62828", "#1565C0")) +
    scale_y_continuous(breaks = c(0, 25, 50, 75, 90, 100)) +
    theme_minimal() +
    theme(legend.position = "bottom")
}

# Generate comparison summary
generate_comparison_summary <- function(dashboard_data) {

  players <- dashboard_data$percentiles

  summary <- "PLAYER COMPARISON SUMMARY\n"
  summary <- paste0(summary, "=========================\n\n")

  for (i in 1:nrow(players)) {
    p <- players[i, ]

    # Find strengths and weaknesses
    pct_cols <- grep("_pct$", names(p), value = TRUE)
    pct_values <- as.numeric(p[, pct_cols])
    names(pct_values) <- gsub("_p90_pct", "", pct_cols)

    top_2 <- sort(pct_values, decreasing = TRUE)[1:2]
    bottom_2 <- sort(pct_values)[1:2]

    summary <- paste0(summary, p$player_name, "\n")
    summary <- paste0(summary, "Strengths: ", paste(names(top_2), collapse = ", "),
                      " (", paste(round(top_2), collapse = "%, "), "%)\n")
    summary <- paste0(summary, "Weaknesses: ", paste(names(bottom_2), collapse = ", "),
                      " (", paste(round(bottom_2), collapse = "%, "), "%)\n\n")
  }

  cat(summary)
  invisible(summary)
}

# Demo with simulated data
set.seed(42)

# Create sample player data
n_players <- 100
player_data <- tibble(
  player_id = 1:n_players,
  player_name = paste0("Player_", 1:n_players),
  position = sample(c("Forward", "Midfielder", "Defender"), n_players, replace = TRUE,
                    prob = c(0.3, 0.4, 0.3)),
  minutes = sample(500:3000, n_players),
  goals_p90 = abs(rnorm(n_players, 0.3, 0.15)),
  assists_p90 = abs(rnorm(n_players, 0.2, 0.1)),
  xg_p90 = abs(rnorm(n_players, 0.35, 0.12)),
  xa_p90 = abs(rnorm(n_players, 0.18, 0.08)),
  passes_p90 = abs(rnorm(n_players, 40, 10)),
  key_passes_p90 = abs(rnorm(n_players, 1.5, 0.5)),
  dribbles_p90 = abs(rnorm(n_players, 2, 0.8))
)

# Create dashboard for 3 players
cat("=== PLAYER COMPARISON DASHBOARD ===\n\n")

# Select players to compare
compare_ids <- c(1, 15, 42)
player_data$player_name[compare_ids] <- c("Messi Jr.", "Ronaldo Jr.", "Mbappe Jr.")

dashboard <- create_player_comparison_dashboard(
  player_data, compare_ids, position = "Forward"
)

# Generate summary
generate_comparison_summary(dashboard)

# Create visualizations
par(mfrow = c(1, 1))
plot_radar_comparison(dashboard$radar_data)

# Percentile bars
metrics <- c("goals_p90", "assists_p90", "xg_p90", "xa_p90",
             "passes_p90", "key_passes_p90", "dribbles_p90")
p <- plot_percentile_bars(dashboard$percentiles, metrics)
print(p)
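
The Python solution for this exercise covers percentiles, the radar chart and the bar chart, while the trend requirement appears only in the R function `plot_performance_trends`. As a rough Python counterpart, the sketch below computes the same right-aligned 5-match rolling mean without any plotting. The function name `rolling_mean` and the per-match xG values are hypothetical.

```python
from collections import deque


def rolling_mean(values, window=5):
    """Right-aligned rolling mean; positions without a full window get None."""
    buffer = deque(maxlen=window)  # oldest value drops out automatically
    result = []
    for v in values:
        buffer.append(v)
        if len(buffer) == window:
            result.append(sum(buffer) / window)
        else:
            result.append(None)
    return result


# Hypothetical per-match xG for one player, oldest match first
match_xg = [0.2, 0.5, 0.1, 0.8, 0.4, 0.6, 0.3]

for match_num, (raw, avg) in enumerate(zip(match_xg, rolling_mean(match_xg)), start=1):
    shown = "n/a" if avg is None else f"{avg:.2f}"
    print(f"Match {match_num}: xG {raw:.1f}, 5-match avg {shown}")
```

Leaving the first four positions empty, rather than averaging over a shorter window, avoids presenting one or two matches as if they were a trend.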
Exercise 30.3: Pre-Match Opposition Scout Report Generator

Task: Build an automated system that generates comprehensive opposition scout reports with tactical analysis, key players to watch, and data-driven recommendations.

Requirements:

  • Analyze opposition's recent form and key metrics
  • Identify tactical patterns (formation, pressing style, build-up)
  • Highlight top performers and their tendencies
  • Generate specific tactical recommendations with supporting data
  • Create visual pitch diagrams showing key areas
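
Pressing style in a report like this is commonly summarised with PPDA (passes allowed per defensive action): opposition passes divided by the pressing team's defensive actions, with lower values indicating more intense pressing. Definitions vary on which pitch zones are counted. The sketch below is illustrative only: the function names, the event counts, and the cut-offs of 10 and 15 are assumptions for the example, not fixed standards.

```python
def ppda(opposition_passes, defensive_actions):
    """Passes allowed per defensive action; lower means more intense pressing."""
    if defensive_actions <= 0:
        raise ValueError("defensive_actions must be positive")
    return opposition_passes / defensive_actions


def classify_pressing(ppda_value, high_press_below=10.0, low_block_above=15.0):
    """Map a PPDA value to a coarse pressing label using illustrative cut-offs."""
    if ppda_value < high_press_below:
        return "High Press"
    if ppda_value > low_block_above:
        return "Low Block"
    return "Mid Block"


# Hypothetical counts per match: (opposition passes, defensive actions)
recent_matches = [(210, 25), (180, 15), (240, 30)]

values = [ppda(p, d) for p, d in recent_matches]
avg_ppda = sum(values) / len(values)
print(f"Average PPDA: {avg_ppda:.1f} -> {classify_pressing(avg_ppda)}")
```

Averaging over several matches before classifying keeps a single unusual game from flipping the label.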

scout_report_generator.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

class ScoutReportGenerator:
    """Automated opposition scout report generator."""

    def __init__(self, match_data, player_data):
        self.match_data = match_data
        self.player_data = player_data

    def generate_report(self, team_name):
        """Generate comprehensive scout report for opposition team."""

        report = {
            'team': team_name,
            'generated': datetime.now()
        }

        # Filter to team's recent matches
        team_matches = self.match_data[
            self.match_data['team'] == team_name
        ].sort_values('date', ascending=False).head(5)

        # Calculate team profile
        report['profile'] = self._calculate_team_profile(team_matches)

        # Key players
        report['key_players'] = self._identify_key_players(team_name)

        # Tactical patterns
        report['tactics'] = self._analyze_tactical_patterns(team_matches)

        # Recommendations
        report['recommendations'] = self._generate_recommendations(report)

        return report

    def _calculate_team_profile(self, matches):
        """Calculate team's recent profile."""

        recent_form = {
            'wins': (matches['result'] == 'W').sum(),
            'draws': (matches['result'] == 'D').sum(),
            'losses': (matches['result'] == 'L').sum(),
            'goals_for': matches['goals_for'].sum(),
            'goals_against': matches['goals_against'].sum(),
            'xg_for': matches['xg_for'].sum(),
            'xg_against': matches['xg_against'].sum(),
            'avg_possession': matches['possession'].mean()
        }

        style = {
            'avg_ppda': matches['ppda'].mean(),
            'avg_passes': matches['passes'].mean(),
            'avg_shots': matches['shots'].mean(),
            'avg_crosses': matches['crosses'].mean(),
            'pass_completion': matches['pass_completion'].mean(),
            'directness': (matches['direct_attacks'] / matches['passes'] * 100).mean()
        }

        return {'recent_form': recent_form, 'style': style}

    def _identify_key_players(self, team_name, top_n=5):
        """Identify most dangerous players."""

        team_players = self.player_data[
            (self.player_data['team'] == team_name) &
            (self.player_data['minutes'] >= 450)
        ].copy()

        # Calculate contribution score
        team_players['contribution_score'] = (
            team_players['goals'] * 3 +
            team_players['assists'] * 2 +
            team_players['key_passes'] * 0.5 +
            team_players['successful_dribbles'] * 0.3 +
            team_players['tackles_won'] * 0.2
        )

        # Assign threat level
        team_players = team_players.sort_values('contribution_score', ascending=False).head(top_n)

        threshold_high = team_players['contribution_score'].quantile(0.8)
        threshold_med = team_players['contribution_score'].quantile(0.5)

        team_players['threat_level'] = team_players['contribution_score'].apply(
            lambda x: 'High' if x > threshold_high else ('Medium' if x > threshold_med else 'Monitor')
        )

        return team_players[['player_name', 'position', 'minutes', 'goals', 'assists',
                            'xg', 'xa', 'key_passes', 'contribution_score', 'threat_level']]

    def _analyze_tactical_patterns(self, matches):
        """Analyze team's tactical patterns."""

        # Primary formation
        formation = matches['formation'].mode().iloc[0] if len(matches) > 0 else '4-3-3'

        # Build-up style
        if matches['passes'].mean() > 500 and matches['possession'].mean() > 55:
            build_up = 'Possession-based'
        elif matches['direct_attacks'].mean() > 20:
            build_up = 'Direct/Counter'
        else:
            build_up = 'Balanced'

        # Pressing style
        avg_ppda = matches['ppda'].mean()
        if avg_ppda < 10:
            pressing = 'High Press'
        elif avg_ppda > 15:
            pressing = 'Low Block'
        else:
            pressing = 'Mid Block'

        # Attack zones
        attack_zones = {
            'left_wing': matches['attacks_left'].mean(),
            'center': matches['attacks_center'].mean(),
            'right_wing': matches['attacks_right'].mean()
        }
        primary_attack = max(attack_zones, key=attack_zones.get)

        return {
            'formation': formation,
            'build_up_style': build_up,
            'pressing_style': pressing,
            'primary_attack_zone': primary_attack,
            'attack_zones': attack_zones,
            'set_piece_threat': matches['set_piece_goals'].mean() > 0.5
        }

    def _generate_recommendations(self, report):
        """Generate tactical recommendations based on analysis."""

        recs = {}

        # Based on pressing style
        if report['tactics']['pressing_style'] == 'High Press':
            recs['against_press'] = {
                'recommendation': 'Play through or over the press',
                'detail': 'They press high - use quick passing combinations or long balls behind',
                'supporting_data': f"PPDA: {report['profile']['style']['avg_ppda']:.1f}"
            }
        elif report['tactics']['pressing_style'] == 'Low Block':
            recs['against_block'] = {
                'recommendation': 'Patient build-up, exploit wide areas',
                'detail': 'They sit deep - circulate ball and stretch them horizontally',
                'supporting_data': f"PPDA: {report['profile']['style']['avg_ppda']:.1f}"
            }

        # Based on build-up style
        if report['tactics']['build_up_style'] == 'Possession-based':
            recs['disrupt_possession'] = {
                'recommendation': 'Press their build-up triggers',
                'detail': 'Target their center-backs and defensive midfielder when receiving',
                'supporting_data': f"Avg passes: {report['profile']['style']['avg_passes']:.0f}"
            }

        # Based on attack zones
        zones = report['tactics']['attack_zones']
        if report['tactics']['primary_attack_zone'] == 'left_wing':
            recs['defend_flank'] = {
                'recommendation': 'Reinforce right defensive side',
                'detail': 'Their primary threat comes down the left - double up if needed',
                'supporting_data': f"{zones['left_wing']:.1f}% attacks from left"
            }
        elif report['tactics']['primary_attack_zone'] == 'right_wing':
            recs['defend_flank'] = {
                'recommendation': 'Reinforce left defensive side',
                'detail': 'Their primary threat comes down the right - double up if needed',
                'supporting_data': f"{zones['right_wing']:.1f}% attacks from right"
            }

        # Key player focus
        key_player = report['key_players'].iloc[0]
        recs['key_man'] = {
            'recommendation': f"Man-mark or limit service to {key_player['player_name']}",
            'detail': f"Their most dangerous player with {int(key_player['goals'])}G and {int(key_player['assists'])}A",
            'supporting_data': f"Contribution score: {key_player['contribution_score']:.1f}"
        }

        return recs

    def format_report(self, report):
        """Format report for display."""

        output = f"""
═══════════════════════════════════════════════
  OPPOSITION SCOUT REPORT: {report['team'].upper()}
═══════════════════════════════════════════════

RECENT FORM (Last 5 Matches)
──────────────────────────────
Record: {report['profile']['recent_form']['wins']}W-{report['profile']['recent_form']['draws']}D-{report['profile']['recent_form']['losses']}L
Goals: {report['profile']['recent_form']['goals_for']} scored, {report['profile']['recent_form']['goals_against']} conceded
xG: {report['profile']['recent_form']['xg_for']:.2f} for, {report['profile']['recent_form']['xg_against']:.2f} against
Avg Possession: {report['profile']['recent_form']['avg_possession']:.1f}%

TACTICAL PROFILE
──────────────────────────────
Formation: {report['tactics']['formation']}
Build-up: {report['tactics']['build_up_style']}
Defensive: {report['tactics']['pressing_style']}
Primary Attack: {report['tactics']['primary_attack_zone']}

KEY PLAYERS TO WATCH
──────────────────────────────"""

        for idx, (_, player) in enumerate(report['key_players'].head(3).iterrows(), 1):
            output += f"""
{idx}. {player['player_name']} ({player['position']})
   {int(player['goals'])}G, {int(player['assists'])}A | Threat: {player['threat_level']}"""

        output += """

TACTICAL RECOMMENDATIONS
──────────────────────────────"""

        for rec_name, rec in report['recommendations'].items():
            output += f"""
▸ {rec['recommendation']}
  {rec['detail']}
  [{rec['supporting_data']}]
"""

        output += f"""
═══════════════════════════════════════════════
Generated: {report['generated'].strftime('%Y-%m-%d %H:%M')}
"""

        return output


# Demo with simulated data
np.random.seed(42)

# Create simulated match data
dates = [datetime.now() - timedelta(days=x*4) for x in range(20)]

match_data = pd.DataFrame({
    'match_id': range(1, 21),
    'date': dates,
    'team': ['Barcelona'] * 10 + ['Real Madrid'] * 10,
    'opponent': ['Real Madrid'] * 10 + ['Barcelona'] * 10,
    'result': np.random.choice(['W', 'D', 'L'], 20, p=[0.5, 0.25, 0.25]),
    'goals_for': np.random.randint(0, 5, 20),
    'goals_against': np.random.randint(0, 4, 20),
    'xg_for': np.random.uniform(0.8, 2.5, 20),
    'xg_against': np.random.uniform(0.5, 2.0, 20),
    'possession': np.random.uniform(45, 70, 20),
    'passes': np.random.randint(400, 600, 20),
    'pass_completion': np.random.uniform(75, 90, 20),
    'shots': np.random.randint(8, 20, 20),
    'crosses': np.random.randint(10, 30, 20),
    'ppda': np.random.uniform(8, 18, 20),
    'direct_attacks': np.random.randint(10, 25, 20),
    'attacks_left': np.random.uniform(25, 40, 20),
    'attacks_center': np.random.uniform(20, 35, 20),
    'attacks_right': np.random.uniform(25, 40, 20),
    'set_piece_goals': np.random.randint(0, 3, 20),
    'formation': np.random.choice(['4-3-3', '4-4-2', '3-5-2'], 20, p=[0.6, 0.3, 0.1])
})

# Create simulated player data
player_data = pd.DataFrame({
    'player_name': ['Lewandowski', 'Pedri', 'Gavi', 'Araujo', 'Raphinha',
                   'Vinicius Jr.', 'Bellingham', 'Rodrygo', 'Valverde', 'Modric'],
    'team': ['Barcelona'] * 5 + ['Real Madrid'] * 5,
    'position': ['Forward', 'Midfielder', 'Midfielder', 'Defender', 'Forward',
                'Forward', 'Midfielder', 'Forward', 'Midfielder', 'Midfielder'],
    'minutes': np.random.randint(800, 2500, 10),
    'goals': [15, 5, 3, 2, 8, 12, 10, 7, 4, 3],
    'assists': [4, 8, 6, 1, 5, 7, 6, 5, 8, 9],
    'xg': [12.5, 4.2, 2.8, 1.5, 7.2, 10.8, 8.5, 6.2, 3.5, 2.8],
    'xa': [3.2, 7.5, 5.8, 0.8, 4.5, 6.2, 5.5, 4.8, 7.2, 8.5],
    'key_passes': [25, 65, 48, 12, 42, 55, 52, 38, 58, 62],
    'successful_dribbles': [18, 35, 42, 8, 52, 85, 45, 55, 28, 22],
    'tackles_won': [12, 45, 52, 85, 18, 15, 42, 20, 55, 38]
})

# Generate report
print("=== OPPOSITION SCOUT REPORT GENERATOR ===\n")

generator = ScoutReportGenerator(match_data, player_data)
report = generator.generate_report('Real Madrid')
formatted_report = generator.format_report(report)
print(formatted_report)
# R: Opposition Scout Report Generator
library(tidyverse)  # attaches dplyr, tibble, and ggplot2

# Opposition Scout Report Generator
generate_scout_report <- function(team_name, match_data, player_data) {

  report <- list()
  report$team <- team_name
  report$generated <- Sys.time()

  # Filter to team's recent matches
  team_matches <- match_data %>%
    filter(team == team_name) %>%
    arrange(desc(date)) %>%
    head(5)

  # Calculate team profile
  report$profile <- calculate_team_profile(team_matches)

  # Key players
  report$key_players <- identify_key_players(player_data, team_name)

  # Tactical patterns
  report$tactics <- analyze_tactical_patterns(team_matches)

  # Recommendations
  report$recommendations <- generate_tactical_recommendations(report)

  report
}

# Team Profile Calculator
calculate_team_profile <- function(matches) {

  list(
    # Recent form
    recent_form = matches %>%
      summarise(
        wins = sum(result == "W"),
        draws = sum(result == "D"),
        losses = sum(result == "L"),
        goals_for = sum(goals_for),
        goals_against = sum(goals_against),
        xg_for = sum(xg_for),
        xg_against = sum(xg_against),
        avg_possession = mean(possession)
      ),

    # Style metrics
    style = matches %>%
      summarise(
        avg_ppda = mean(ppda, na.rm = TRUE),  # Pressing intensity
        avg_passes = mean(passes),
        avg_shots = mean(shots),
        avg_crosses = mean(crosses),
        pass_completion = mean(pass_completion),
        directness = mean(direct_attacks / passes * 100)  # direct attacks per 100 passes
      )
  )
}

# Key Players Identifier
identify_key_players <- function(player_data, team_name, top_n = 5) {

  team_players <- player_data %>%
    filter(team == team_name, minutes >= 450) %>%
    mutate(
      # Calculate contribution score
      contribution_score = goals * 3 + assists * 2 + key_passes * 0.5 +
                          successful_dribbles * 0.3 + tackles_won * 0.2
    ) %>%
    arrange(desc(contribution_score)) %>%
    head(top_n)

  team_players %>%
    select(player_name, position, minutes, goals, assists, xg, xa,
           key_passes, contribution_score) %>%
    mutate(
      # Thresholds are quantiles of the shortlisted players only, not the whole squad
      threat_level = case_when(
        contribution_score > quantile(contribution_score, 0.8) ~ "High",
        contribution_score > quantile(contribution_score, 0.5) ~ "Medium",
        TRUE ~ "Monitor"
      )
    )
}

# Tactical Pattern Analyzer
analyze_tactical_patterns <- function(matches) {

  # Determine primary formation
  formation <- matches$formation %>%
    table() %>%
    sort(decreasing = TRUE) %>%
    names() %>%
    head(1)

  # Build-up style
  if (mean(matches$passes) > 500 && mean(matches$possession) > 55) {
    build_up <- "Possession-based"
  } else if (mean(matches$direct_attacks) > 20) {
    build_up <- "Direct/Counter"
  } else {
    build_up <- "Balanced"
  }

  # Pressing style
  if (mean(matches$ppda, na.rm = TRUE) < 10) {
    pressing <- "High Press"
  } else if (mean(matches$ppda, na.rm = TRUE) > 15) {
    pressing <- "Low Block"
  } else {
    pressing <- "Mid Block"
  }

  # Attack zones
  attack_zones <- list(
    left_wing = mean(matches$attacks_left, na.rm = TRUE),
    center = mean(matches$attacks_center, na.rm = TRUE),
    right_wing = mean(matches$attacks_right, na.rm = TRUE)
  )

  primary_attack <- names(attack_zones)[which.max(unlist(attack_zones))]

  list(
    formation = formation,
    build_up_style = build_up,
    pressing_style = pressing,
    primary_attack_zone = primary_attack,
    attack_zones = attack_zones,
    set_piece_threat = mean(matches$set_piece_goals, na.rm = TRUE) > 0.5
  )
}

# Tactical Recommendations Generator
generate_tactical_recommendations <- function(report) {

  recs <- list()

  # Based on pressing style
  if (report$tactics$pressing_style == "High Press") {
    recs$against_press <- list(
      recommendation = "Play through or over the press",
      detail = "They press high - use quick passing combinations or long balls behind",
      supporting_data = paste("PPDA:", round(report$profile$style$avg_ppda, 1))
    )
  } else if (report$tactics$pressing_style == "Low Block") {
    recs$against_block <- list(
      recommendation = "Patient build-up, exploit wide areas",
      detail = "They sit deep - circulate ball and stretch them horizontally",
      supporting_data = paste("PPDA:", round(report$profile$style$avg_ppda, 1))
    )
  }

  # Based on build-up style
  if (report$tactics$build_up_style == "Possession-based") {
    recs$disrupt_possession <- list(
      recommendation = "Press their build-up triggers",
      detail = "Target their center-backs and defensive midfielder when receiving",
      supporting_data = paste("Avg passes:", round(report$profile$style$avg_passes))
    )
  }

  # Based on attack zones
  if (report$tactics$primary_attack_zone == "left_wing") {
    recs$defend_left <- list(
      recommendation = "Reinforce right defensive side",
      detail = "Their primary threat comes down the left - double up if needed",
      supporting_data = paste0(round(report$tactics$attack_zones$left_wing, 1), "% of attacks from left")
    )
  } else if (report$tactics$primary_attack_zone == "right_wing") {
    recs$defend_right <- list(
      recommendation = "Reinforce left defensive side",
      detail = "Their primary threat comes down the right - double up if needed",
      supporting_data = paste0(round(report$tactics$attack_zones$right_wing, 1), "% of attacks from right")
    )
  }

  # Set pieces
  if (report$tactics$set_piece_threat) {
    recs$set_pieces <- list(
      recommendation = "Extra attention on set pieces",
      detail = "Significant goal threat from corners and free kicks",
      supporting_data = "More than 0.5 set-piece goals per match"
    )
  }

  # Key player focus (skipped when no player meets the minutes threshold)
  if (nrow(report$key_players) > 0) {
    top_player <- report$key_players$player_name[1]
    recs$key_man <- list(
      recommendation = paste("Man-mark or limit service to", top_player),
      detail = paste("Their most dangerous player with",
                     report$key_players$goals[1], "goals and",
                     report$key_players$assists[1], "assists"),
      supporting_data = paste("Contribution score:", round(report$key_players$contribution_score[1], 1))
    )
  }

  recs
}

# Format report for output
format_scout_report <- function(report) {

  output <- paste0(
    "═══════════════════════════════════════════════\n",
    "  OPPOSITION SCOUT REPORT: ", toupper(report$team), "\n",
    "═══════════════════════════════════════════════\n\n",

    "RECENT FORM (Last 5 Matches)\n",
    "──────────────────────────────\n",
    "Record: ", report$profile$recent_form$wins, "W-",
    report$profile$recent_form$draws, "D-",
    report$profile$recent_form$losses, "L\n",
    "Goals: ", report$profile$recent_form$goals_for, " scored, ",
    report$profile$recent_form$goals_against, " conceded\n",
    "xG: ", round(report$profile$recent_form$xg_for, 2), " for, ",
    round(report$profile$recent_form$xg_against, 2), " against\n",
    "Avg Possession: ", round(report$profile$recent_form$avg_possession, 1), "%\n\n",

    "TACTICAL PROFILE\n",
    "──────────────────────────────\n",
    "Formation: ", report$tactics$formation, "\n",
    "Build-up: ", report$tactics$build_up_style, "\n",
    "Defensive: ", report$tactics$pressing_style, "\n",
    "Primary Attack: ", report$tactics$primary_attack_zone, "\n\n",

    "KEY PLAYERS TO WATCH\n",
    "──────────────────────────────\n"
  )

  # seq_len() yields an empty loop when there are no key players (1:0 would not)
  for (i in seq_len(min(3, nrow(report$key_players)))) {
    p <- report$key_players[i, ]
    output <- paste0(output,
                     i, ". ", p$player_name, " (", p$position, ")\n",
                     "   ", p$goals, "G, ", p$assists, "A | ",
                     "Threat: ", p$threat_level, "\n")
  }

  output <- paste0(output,
                   "\nTACTICAL RECOMMENDATIONS\n",
                   "──────────────────────────────\n")

  for (rec_name in names(report$recommendations)) {
    rec <- report$recommendations[[rec_name]]
    output <- paste0(output,
                     "▸ ", rec$recommendation, "\n",
                     "  ", rec$detail, "\n",
                     "  [", rec$supporting_data, "]\n\n")
  }

  output <- paste0(output,
                   "═══════════════════════════════════════════════\n",
                   "Generated: ", format(report$generated, "%Y-%m-%d %H:%M"), "\n")

  cat(output)
  invisible(output)
}

# Demo with simulated data
set.seed(42)

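# Simulated match data (each column is sampled independently,
# so result will not necessarily agree with the goals columns)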
match_data <- tibble(
  match_id = 1:20,
  date = seq(Sys.Date() - 40, Sys.Date(), length.out = 20),
  team = rep(c("Barcelona", "Real Madrid"), each = 10),
  opponent = rep(c("Real Madrid", "Barcelona"), each = 10),
  result = sample(c("W", "D", "L"), 20, replace = TRUE, prob = c(0.5, 0.25, 0.25)),
  goals_for = sample(0:4, 20, replace = TRUE),
  goals_against = sample(0:3, 20, replace = TRUE),
  xg_for = runif(20, 0.8, 2.5),
  xg_against = runif(20, 0.5, 2.0),
  possession = runif(20, 45, 70),
  passes = sample(400:600, 20, replace = TRUE),
  pass_completion = runif(20, 75, 90),
  shots = sample(8:20, 20, replace = TRUE),
  crosses = sample(10:30, 20, replace = TRUE),
  ppda = runif(20, 8, 18),
  direct_attacks = sample(10:25, 20, replace = TRUE),
  attacks_left = runif(20, 25, 40),
  attacks_center = runif(20, 20, 35),
  attacks_right = runif(20, 25, 40),
  set_piece_goals = sample(0:2, 20, replace = TRUE),
  formation = sample(c("4-3-3", "4-4-2", "3-5-2"), 20, replace = TRUE, prob = c(0.6, 0.3, 0.1))
)

# Simulated player data
player_data <- tibble(
  player_name = c("Lewandowski", "Pedri", "Gavi", "Araujo", "Raphinha",
                  "Vinicius Jr.", "Bellingham", "Rodrygo", "Valverde", "Modric"),
  team = rep(c("Barcelona", "Real Madrid"), each = 5),
  position = c("Forward", "Midfielder", "Midfielder", "Defender", "Forward",
               "Forward", "Midfielder", "Forward", "Midfielder", "Midfielder"),
  minutes = sample(800:2500, 10, replace = TRUE),
  goals = c(15, 5, 3, 2, 8, 12, 10, 7, 4, 3),
  assists = c(4, 8, 6, 1, 5, 7, 6, 5, 8, 9),
  xg = c(12.5, 4.2, 2.8, 1.5, 7.2, 10.8, 8.5, 6.2, 3.5, 2.8),
  xa = c(3.2, 7.5, 5.8, 0.8, 4.5, 6.2, 5.5, 4.8, 7.2, 8.5),
  key_passes = c(25, 65, 48, 12, 42, 55, 52, 38, 58, 62),
  successful_dribbles = c(18, 35, 42, 8, 52, 85, 45, 55, 28, 22),
  tackles_won = c(12, 45, 52, 85, 18, 15, 42, 20, 55, 38)
)

# Generate report
cat("=== OPPOSITION SCOUT REPORT GENERATOR ===\n\n")

report <- generate_scout_report("Real Madrid", match_data, player_data)
format_scout_report(report)
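
As a cross-check on the ranking logic, the contribution-score weights from identify_key_players can be applied directly to the simulated Real Madrid rows. The sketch below is a pandas re-implementation written for illustration, not the chapter's ScoutReportGenerator class. The names WEIGHTS and score_players are hypothetical, and the minutes filter is left out because every simulated player has at least 800 minutes.

```python
import pandas as pd

# Simulated Real Madrid rows copied from the demo data above
players = pd.DataFrame({
    "player_name": ["Vinicius Jr.", "Bellingham", "Rodrygo", "Valverde", "Modric"],
    "goals": [12, 10, 7, 4, 3],
    "assists": [7, 6, 5, 8, 9],
    "key_passes": [55, 52, 38, 58, 62],
    "successful_dribbles": [85, 45, 55, 28, 22],
    "tackles_won": [15, 42, 20, 55, 38],
})

# Same weights as the contribution score in identify_key_players
WEIGHTS = {
    "goals": 3.0,
    "assists": 2.0,
    "key_passes": 0.5,
    "successful_dribbles": 0.3,
    "tackles_won": 0.2,
}


def score_players(df: pd.DataFrame) -> pd.DataFrame:
    """Rank players by weighted contribution and attach a threat band."""
    scored = df.copy()
    scored["contribution_score"] = sum(scored[col] * w for col, w in WEIGHTS.items())
    scored = scored.sort_values("contribution_score", ascending=False).reset_index(drop=True)

    # Quantiles are taken over the shortlisted players only, as in the R code
    q80 = scored["contribution_score"].quantile(0.8)
    q50 = scored["contribution_score"].quantile(0.5)

    def threat(score: float) -> str:
        if score > q80:
            return "High"
        if score > q50:
            return "Medium"
        return "Monitor"

    scored["threat_level"] = scored["contribution_score"].apply(threat)
    return scored


ranked = score_players(players)
print(ranked[["player_name", "contribution_score", "threat_level"]].round(1))
```

With these inputs Vinicius Jr. ranks first with a score of about 106, followed by Bellingham at about 89.9. Because the thresholds are quantiles of only five scores, exactly one player lands in the High band, so the label is a relative ranking within the shortlist and not an absolute measure of danger.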

Chapter Summary

Key Takeaways
  • Know your audience: Coaches, executives, scouts, and players all need different formats and levels of detail
  • Lead with the conclusion: Use a pyramid structure, because busy readers often get no further than the first line
  • Tell a story: STAR framework (Situation, Task, Analysis, Recommendation) creates compelling narratives
  • Visualize appropriately: Simple for coaches, detailed for analysts—every chart should pass the "So What?" test
  • Build trust gradually: Start small, be right about something, admit when wrong, speak their language
  • Track your impact: Log recommendations and outcomes to demonstrate value over time
  • Systematize communication: Define deliverables, audiences, frequencies, and quality standards
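
The "track your impact" takeaway can be made concrete with a very small log. The following is a minimal sketch, assuming a pandas DataFrame with hypothetical columns (rec_id, audience, recommendation, adopted, outcome); the rows are invented purely for illustration and do not come from any real club.

```python
import pandas as pd

# Hypothetical log: every row, column name, and value is illustrative only
recommendation_log = pd.DataFrame({
    "rec_id": [1, 2, 3, 4],
    "audience": ["Head Coach", "DOF", "Head Coach", "Scout"],
    "recommendation": [
        "Press their build-up triggers",
        "Extend contract before the summer window",
        "Reinforce right defensive side",
        "Add left-back target to shortlist",
    ],
    "adopted": [True, False, True, True],
    # Outcome is filled in later; None means not yet known or not applicable
    "outcome": ["positive", None, "negative", "positive"],
})


def summarise_impact(log: pd.DataFrame) -> dict:
    """Summarise how often advice was adopted and how often it worked out."""
    adopted = log[log["adopted"]]
    resolved = adopted.dropna(subset=["outcome"])
    if len(resolved) > 0:
        positive_rate = float((resolved["outcome"] == "positive").mean())
    else:
        positive_rate = float("nan")
    return {
        "recommendations": len(log),
        "adoption_rate": float(log["adopted"].mean()),
        "positive_rate": positive_rate,
    }


summary = summarise_impact(recommendation_log)
print(summary)
```

Keeping adoption and outcome as separate columns lets you report the two questions independently: whether decision-makers act on the advice, and whether the advice they acted on turned out well.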
Communication Principles
  • Clarity over comprehensiveness
  • Actionable over interesting
  • Evidence over opinion
  • Humble confidence over certainty
  • Visual over verbal when possible
  • Progressive disclosure of detail
  • Consistent style and format
  • Follow up on recommendations
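
Progressive disclosure can be sketched with the same three fields that the scout report stores for each recommendation (recommendation, detail, supporting_data). The function name render_recommendation, the level names, and the PPDA value in the example are assumptions made for illustration.

```python
# Levels of detail, from least to most; these names are hypothetical
LEVELS = ("headline", "summary", "full")


def render_recommendation(rec: dict, level: str = "headline") -> str:
    """Return a recommendation at the requested level of detail."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")

    lines = [rec["recommendation"]]          # always shown: the conclusion
    if level in ("summary", "full"):
        lines.append("  " + rec["detail"])   # the reasoning behind it
    if level == "full":
        lines.append("  [" + rec["supporting_data"] + "]")  # the evidence
    return "\n".join(lines)


# Example entry; the PPDA figure is an illustrative value, not a real result
rec = {
    "recommendation": "Play through or over the press",
    "detail": "They press high - use quick passing combinations or long balls behind",
    "supporting_data": "PPDA: 9.4",
}

print(render_recommendation(rec))            # headline only, e.g. for a head coach
print(render_recommendation(rec, "full"))    # all three layers, e.g. for an analyst
```

The same underlying entry serves every audience, so the headline a coach sees never drifts out of step with the evidence an analyst reviews.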
Congratulations!

You've completed the Soccer Analytics textbook. The technical skills you've learned are powerful, but your impact will ultimately depend on how well you communicate insights to decision-makers. Keep practicing both the analysis and the communication—they're equally important for success in football analytics.