Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples

Working in Football Analytics

The football analytics industry has grown from a handful of pioneers to a global profession. This chapter provides practical guidance on building a career in football analytics, from developing essential skills to landing your first role.

The Analytics Job Landscape

Football analytics roles exist across clubs, federations, media companies, betting firms, and data providers. Understanding where opportunities exist helps you target your career development.

Club Roles
  • Data Analyst: Day-to-day analysis support
  • Performance Analyst: Video and tactical analysis
  • Recruitment Analyst: Player scouting support
  • First Team Analyst: Opposition and match prep
  • Head of Analytics: Department leadership

Other Sectors
  • Data Providers: StatsBomb, Opta, Wyscout
  • Media: The Athletic, ESPN, broadcasters
  • Betting: Odds compilation, trading
  • Agencies: Player representation analytics
  • Federations: National team support
# Python: Analyze job market trends
import pandas as pd
import matplotlib.pyplot as plt

def analyze_job_market():
    """Analyze skill requirements in football analytics jobs."""

    # Example job requirements frequency from postings
    job_requirements = pd.DataFrame({
        "skill": ["Python", "R", "SQL", "Data Visualization",
                  "Statistics", "Machine Learning", "Football Knowledge",
                  "Communication", "Video Analysis", "Tableau/Power BI"],
        "frequency": [0.85, 0.45, 0.78, 0.72, 0.68, 0.42, 0.95, 0.88, 0.35, 0.52]
    })

    # Sort by frequency
    job_requirements = job_requirements.sort_values("frequency", ascending=True)

    # Visualize
    fig, ax = plt.subplots(figsize=(10, 6))

    colors = plt.cm.Greens(job_requirements["frequency"])
    ax.barh(job_requirements["skill"], job_requirements["frequency"], color=colors)

    ax.set_xlabel("Frequency in Job Postings")
    ax.set_title("Skills Required in Football Analytics Jobs")
    ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))

    plt.tight_layout()
    plt.show()

    return job_requirements

# Salary ranges by role (approximate)
salary_data = pd.DataFrame({
    "role": ["Junior Analyst", "Data Analyst", "Senior Analyst",
             "Lead Analyst", "Head of Analytics"],
    "min_salary": [25000, 35000, 50000, 65000, 80000],
    "max_salary": [35000, 50000, 70000, 90000, 150000],
    "experience_years": ["0-2", "2-4", "4-6", "6-8", "8+"]
})

print("Approximate Salary Ranges (GBP):")
print(salary_data.to_string(index=False))
# R: Analyze job market trends
library(tidyverse)

# Summarise example skill frequencies (no scraping is performed here)
analyze_job_market <- function() {
  # Example job requirements frequency
  job_requirements <- tribble(
    ~skill, ~frequency,
    "Python", 0.85,
    "R", 0.45,
    "SQL", 0.78,
    "Data Visualization", 0.72,
    "Statistics", 0.68,
    "Machine Learning", 0.42,
    "Football Knowledge", 0.95,
    "Communication", 0.88,
    "Video Analysis", 0.35,
    "Tableau/Power BI", 0.52
  )

  # Visualize requirements
  job_requirements %>%
    mutate(skill = fct_reorder(skill, frequency)) %>%
    ggplot(aes(x = skill, y = frequency, fill = frequency)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(labels = scales::percent) +
    scale_fill_gradient(low = "#2E7D32", high = "#1B5E20") +
    labs(title = "Skills Required in Football Analytics Jobs",
         x = NULL, y = "Frequency in Job Postings") +
    theme_minimal() +
    theme(legend.position = "none")
}

analyze_job_market()
Output
Approximate Salary Ranges (GBP):
             role  min_salary  max_salary experience_years
   Junior Analyst       25000       35000              0-2
     Data Analyst       35000       50000              2-4
   Senior Analyst       50000       70000              4-6
     Lead Analyst       65000       90000              6-8
Head of Analytics       80000      150000               8+
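
To compare roles more directly, each range can be reduced to a midpoint and a width. The sketch below is illustrative only: it reuses the approximate figures from the table above, and the `midpoint`, `spread`, and `midpoint_step` column names are assumptions introduced here rather than part of the original code.

```python
import pandas as pd

# Approximate ranges copied from the table above (GBP).
salary_data = pd.DataFrame({
    "role": ["Junior Analyst", "Data Analyst", "Senior Analyst",
             "Lead Analyst", "Head of Analytics"],
    "min_salary": [25000, 35000, 50000, 65000, 80000],
    "max_salary": [35000, 50000, 70000, 90000, 150000],
})

# Hypothetical helper columns: centre and width of each range.
salary_data["midpoint"] = (salary_data["min_salary"] + salary_data["max_salary"]) / 2
salary_data["spread"] = salary_data["max_salary"] - salary_data["min_salary"]

# Increase in midpoint relative to the previous role (NaN for the first row).
salary_data["midpoint_step"] = salary_data["midpoint"].diff()

print(salary_data[["role", "midpoint", "spread", "midpoint_step"]].to_string(index=False))
```

On these figures the ranges widen with seniority: the junior range spans 10,000 GBP, while the Head of Analytics range spans 70,000 GBP.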

Essential Skills

Success in football analytics requires a blend of technical expertise, football knowledge, and soft skills. Here's a comprehensive breakdown of what you need to develop.

Category     Skill                  Importance  How to Develop
Technical    Python or R            Essential   Online courses, personal projects
Technical    SQL                    Essential   Practice with football databases
Technical    Statistics             Essential   Coursera/edX courses, apply to football
Technical    Data Visualization     High        Create football visualizations regularly
Domain       Football Knowledge     Essential   Watch matches analytically, read tactics blogs
Domain       Metrics Understanding  Essential   Study xG, xA, and advanced metrics
Domain       Data Sources           High        Work with StatsBomb, FBref, Wyscout
Soft Skills  Communication          Essential   Write articles, present findings
Soft Skills  Storytelling           High        Practice explaining complex concepts simply
Soft Skills  Collaboration          High        Contribute to open source, join communities
# Python: Skills assessment and learning path
import pandas as pd
import matplotlib.pyplot as plt

class SkillsAssessment:
    """Framework for assessing and developing analytics skills."""

    def __init__(self):
        self.skills = pd.DataFrame({
            "category": ["Technical"] * 5 + ["Domain"] * 3 + ["Soft"] * 3,
            "skill": [
                "Python/R Programming", "SQL Databases", "Statistics",
                "Machine Learning", "Data Visualization",
                "Football Tactics", "Advanced Metrics", "Data Sources",
                "Communication", "Presentation", "Project Management"
            ],
            "current_level": [3, 2, 3, 2, 4, 4, 3, 3, 3, 2, 2],
            "target_level": [5, 4, 4, 3, 5, 5, 5, 4, 5, 4, 3]
        })

        self.skills["gap"] = self.skills["target_level"] - self.skills["current_level"]

    def get_priority_skills(self):
        """Identify highest priority skills to develop."""
        return self.skills.nlargest(5, "gap")[["skill", "gap", "category"]]

    def create_learning_path(self):
        """Generate personalized learning recommendations."""

        recommendations = {
            "Python/R Programming": [
                "Complete 'Python for Data Analysis' by Wes McKinney",
                "Work through StatsBomb tutorials",
                "Build 3 personal football analytics projects"
            ],
            "SQL Databases": [
                "Complete Mode Analytics SQL tutorial",
                "Set up local PostgreSQL with football data",
                "Practice complex queries on FBref data"
            ],
            "Advanced Metrics": [
                "Read 'The Expected Goals Philosophy'",
                "Implement xG model from scratch",
                "Study StatsBomb methodology documentation"
            ],
            "Communication": [
                "Start a football analytics blog/Twitter",
                "Write monthly analysis pieces",
                "Present at local football analytics meetup"
            ]
        }

        priority_skills = self.get_priority_skills()["skill"].tolist()

        learning_path = []
        for skill in priority_skills:
            if skill in recommendations:
                learning_path.append({
                    "skill": skill,
                    "actions": recommendations[skill]
                })

        return learning_path

    def visualize_assessment(self):
        """Create radar chart of current skills."""
        from math import pi

        categories = self.skills["skill"].tolist()
        current = self.skills["current_level"].tolist()
        target = self.skills["target_level"].tolist()

        # Create radar chart
        angles = [n / float(len(categories)) * 2 * pi for n in range(len(categories))]
        angles += angles[:1]
        current += current[:1]
        target += target[:1]

        fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))

        ax.plot(angles, current, "o-", linewidth=2, label="Current")
        ax.fill(angles, current, alpha=0.25)
        ax.plot(angles, target, "o-", linewidth=2, label="Target")
        ax.fill(angles, target, alpha=0.25)

        ax.set_xticks(angles[:-1])
        ax.set_xticklabels(categories, size=8)
        ax.set_ylim(0, 5)
        ax.legend(loc="upper right")
        ax.set_title("Skills Assessment")

        plt.tight_layout()
        plt.show()

# Usage
assessment = SkillsAssessment()
print("Priority Skills to Develop:")
print(assessment.get_priority_skills())

print("\nLearning Path:")
for item in assessment.create_learning_path():
    print(f"\n{item['skill']}:")
    for action in item["actions"]:
        print(f"  - {action}")
# R: Skills assessment framework
library(tidyverse)

# Self-assessment framework
assess_skills <- function() {
  # Rate yourself 1-5 on each skill
  skills_assessment <- tribble(
    ~category, ~skill, ~current_level, ~target_level,
    "Technical", "Python/R Programming", 3, 5,
    "Technical", "SQL Databases", 2, 4,
    "Technical", "Statistics", 3, 4,
    "Technical", "Machine Learning", 2, 3,
    "Technical", "Data Visualization", 4, 5,
    "Domain", "Football Tactics", 4, 5,
    "Domain", "Advanced Metrics (xG, etc)", 3, 5,
    "Domain", "Data Sources", 3, 4,
    "Soft", "Communication", 3, 5,
    "Soft", "Presentation", 2, 4,
    "Soft", "Project Management", 2, 3
  )

  # Calculate gaps
  skills_assessment <- skills_assessment %>%
    mutate(
      gap = target_level - current_level,
      priority = case_when(
        gap >= 2 & category == "Technical" ~ "High",
        gap >= 2 ~ "Medium",
        gap == 1 ~ "Low",
        TRUE ~ "Achieved"
      )
    )

  # Visualize gaps
  skills_assessment %>%
    ggplot(aes(x = reorder(skill, gap), y = gap, fill = category)) +
    geom_col() +
    coord_flip() +
    facet_wrap(~priority, scales = "free_y") +
    labs(title = "Skills Gap Analysis",
         x = NULL, y = "Gap (Target - Current)") +
    theme_minimal()
}
Output
Priority Skills to Develop:
                  skill  gap   category
0  Python/R Programming    2  Technical
1         SQL Databases    2  Technical
6      Advanced Metrics    2     Domain
8         Communication    2       Soft
9          Presentation    2       Soft
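
The R version above also assigns each gap a priority label, which the Python class does not do. A minimal Python sketch of the same rule follows. The `label_priority` function name is introduced here for illustration, and the ratings are the sample values from the assessment, not real measurements.

```python
import pandas as pd

# Sample self-ratings copied from the assessment above (1-5 scale).
skills = pd.DataFrame({
    "category": ["Technical"] * 5 + ["Domain"] * 3 + ["Soft"] * 3,
    "skill": [
        "Python/R Programming", "SQL Databases", "Statistics",
        "Machine Learning", "Data Visualization",
        "Football Tactics", "Advanced Metrics", "Data Sources",
        "Communication", "Presentation", "Project Management",
    ],
    "current_level": [3, 2, 3, 2, 4, 4, 3, 3, 3, 2, 2],
    "target_level": [5, 4, 4, 3, 5, 5, 5, 4, 5, 4, 3],
})
skills["gap"] = skills["target_level"] - skills["current_level"]


def label_priority(row: pd.Series) -> str:
    """Mirror the R case_when rule: large technical gaps rank highest."""
    if row["gap"] >= 2 and row["category"] == "Technical":
        return "High"
    if row["gap"] >= 2:
        return "Medium"
    if row["gap"] == 1:
        return "Low"
    return "Achieved"


skills["priority"] = skills.apply(label_priority, axis=1)
print(skills[["skill", "gap", "priority"]].to_string(index=False))
```

With the sample ratings, only the two technical skills with a gap of 2 are labelled "High". The domain and soft skills with the same gap are labelled "Medium".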

Building Your Portfolio

A strong portfolio demonstrates your abilities better than any resume. Public work shows potential employers exactly what you can do and how you think about football.

# Python: Portfolio project tracker
import pandas as pd

class PortfolioTracker:
    """Track and plan portfolio projects."""

    def __init__(self):
        self.projects = pd.DataFrame({
            "project": [
                "Player Comparison Tool",
                "xG Model from Scratch",
                "Recruitment Dashboard",
                "Match Prediction Model",
                "Tactical Analysis Piece",
                "Pass Network Analysis",
                "Shot Map Generator",
                "Player Similarity Finder"
            ],
            "difficulty": [
                "Beginner", "Intermediate", "Intermediate", "Advanced",
                "Beginner", "Intermediate", "Beginner", "Advanced"
            ],
            "skills": [
                "Python, Visualization",
                "ML, Statistics, Feature Engineering",
                "SQL, Visualization, Domain Knowledge",
                "ML, Statistics, Model Evaluation",
                "Writing, Visualization, Tactics",
                "Network Analysis, Visualization",
                "Python, mplsoccer",
                "ML, Embeddings, Clustering"
            ],
            "hours_estimate": [10, 30, 25, 40, 8, 20, 6, 35],
            "impact": ["Medium", "High", "High", "High",
                      "Medium", "Medium", "Low", "High"]
        })

    def suggest_first_project(self):
        """Suggest best first project for beginners."""
        beginners = self.projects[self.projects["difficulty"] == "Beginner"]
        return beginners.sort_values("hours_estimate").iloc[0]

    def create_12_week_plan(self):
        """Create a 12-week portfolio building plan."""

        plan = [
            {"week": "1-2", "project": "Shot Map Generator",
             "goal": "Build automated shot maps with xG coloring"},
            {"week": "3-4", "project": "Tactical Analysis Piece",
             "goal": "Write and publish deep analysis article"},
            {"week": "5-7", "project": "Player Comparison Tool",
             "goal": "Interactive radar chart comparison tool"},
            {"week": "8-10", "project": "xG Model from Scratch",
             "goal": "Train, evaluate, and document xG model"},
            {"week": "11-12", "project": "Portfolio Website",
             "goal": "Consolidate all work into professional site"}
        ]

        return pd.DataFrame(plan)

    def generate_readme_template(self, project_name):
        """Generate README template for GitHub project."""

        template = f"""# {project_name}

## Overview
Brief description of what this project does and why it's useful.

## Data Sources
- Source 1: [Link]
- Source 2: [Link]

## Methodology
Explain your approach and any key decisions.

## Key Findings
Highlight 2-3 interesting insights from your analysis.

## Usage
```bash
# Example command to run the project
python main.py --team "Arsenal" --season 2023
```

## Visualizations
Include sample outputs/visualizations.

## Future Improvements
- Improvement 1
- Improvement 2

## Contact
- Twitter: @yourhandle
- Email: your.email@example.com
"""
        return template

# Usage
tracker = PortfolioTracker()

print("Suggested First Project:")
print(tracker.suggest_first_project())

print("\n12-Week Portfolio Plan:")
print(tracker.create_12_week_plan().to_string(index=False))
# R: Portfolio project ideas generator
library(tidyverse)

# Portfolio project framework
portfolio_projects <- tribble(
  ~project_type, ~difficulty, ~skills_demonstrated, ~data_source, ~description,
  "Player Comparison Tool", "Beginner", "R/Python, Visualization", "FBref",
    "Build radar charts comparing players across multiple metrics",

  "xG Model from Scratch", "Intermediate", "ML, Statistics, Feature Engineering", "StatsBomb",
    "Train and evaluate your own expected goals model",

  "Recruitment Dashboard", "Intermediate", "SQL, Visualization, Domain Knowledge", "FBref/TM",
    "Interactive tool for filtering and comparing players for recruitment",

  "Match Prediction Model", "Advanced", "ML, Statistics, Model Evaluation", "Multiple",
    "Build and backtest a match outcome prediction system",

  "Tactical Analysis Piece", "Beginner", "Writing, Visualization, Tactics", "StatsBomb",
    "Deep dive analysis of a team's tactics or a player's style",

  "Pass Network Analysis", "Intermediate", "Network Analysis, Visualization", "StatsBomb",
    "Analyze team passing patterns using graph theory",

  "Shot Map Generator", "Beginner", "R/Python, ggplot/mplsoccer", "Any xG source",
    "Automated shot map creation with xG coloring",

  "Player Similarity Finder", "Advanced", "ML, Embeddings, Clustering", "FBref",
    "Find similar players using machine learning techniques"
)

# Function to suggest projects based on current skill level
suggest_projects <- function(skill_level = "Beginner", focus_area = NULL) {
  projects <- portfolio_projects %>%
    filter(difficulty == skill_level)

  if (!is.null(focus_area)) {
    projects <- projects %>%
      filter(str_detect(skills_demonstrated, focus_area))
  }

  projects %>%
    select(project_type, skills_demonstrated, description)
}

# Get suggestions
suggest_projects("Intermediate", "ML")
Output
Suggested First Project:
project           Shot Map Generator
difficulty                  Beginner
skills             Python, mplsoccer
hours_estimate                     6
impact                           Low
Name: 6, dtype: object

12-Week Portfolio Plan:
 week                 project                                        goal
  1-2      Shot Map Generator  Build automated shot maps with xG coloring
  3-4 Tactical Analysis Piece     Write and publish deep analysis article
  5-7  Player Comparison Tool     Interactive radar chart comparison tool
 8-10   xG Model from Scratch      Train, evaluate, and document xG model
11-12       Portfolio Website Consolidate all work into professional site
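
The R code above filters the project list by difficulty and focus area, while the Python tracker only suggests a first project. A comparable Python helper might look like the sketch below. The `suggest_projects` name mirrors the R function, the table is a trimmed copy of the project list, and matching is a literal substring test rather than a regular expression, which is a simplification.

```python
from typing import Optional

import pandas as pd

# Trimmed copy of the project list above (names, difficulty, skills only).
projects = pd.DataFrame({
    "project": [
        "Player Comparison Tool", "xG Model from Scratch",
        "Recruitment Dashboard", "Match Prediction Model",
        "Tactical Analysis Piece", "Pass Network Analysis",
        "Shot Map Generator", "Player Similarity Finder",
    ],
    "difficulty": [
        "Beginner", "Intermediate", "Intermediate", "Advanced",
        "Beginner", "Intermediate", "Beginner", "Advanced",
    ],
    "skills": [
        "Python, Visualization",
        "ML, Statistics, Feature Engineering",
        "SQL, Visualization, Domain Knowledge",
        "ML, Statistics, Model Evaluation",
        "Writing, Visualization, Tactics",
        "Network Analysis, Visualization",
        "Python, mplsoccer",
        "ML, Embeddings, Clustering",
    ],
})


def suggest_projects(skill_level: str = "Beginner",
                     focus_area: Optional[str] = None) -> pd.DataFrame:
    """Return projects at one difficulty, optionally narrowed by a skill keyword."""
    matches = projects[projects["difficulty"] == skill_level]
    if focus_area is not None:
        # regex=False treats focus_area as a plain substring, not a pattern.
        matches = matches[matches["skills"].str.contains(focus_area, regex=False)]
    return matches[["project", "skills"]].reset_index(drop=True)


result = suggest_projects("Intermediate", "ML")
print(result.to_string(index=False))
```

Because the match is a substring test, a short keyword such as "ML" would also match any longer skill name that contains those letters. An exact match on the comma-separated skills would be stricter.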

Writing Effective Analysis

Written analysis is how you demonstrate both technical ability and communication skills. Follow this structure for compelling pieces:

  1. Hook: Start with an interesting question or finding
  2. Context: Why does this matter? Set the scene
  3. Methodology: Briefly explain your approach (don't overdo it)
  4. Findings: Present your analysis with clear visualizations
  5. So What: Explain the implications and actionable insights
  6. Limitations: Acknowledge what you can't conclude
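
The six-part structure above can be turned into a reusable outline, in the same spirit as the README template earlier in this chapter. This is a minimal sketch: `ARTICLE_STRUCTURE` and `build_outline` are hypothetical names, and the plain-text layout is just one possible format.

```python
# The six-part structure from the list above, as (heading, prompt) pairs.
ARTICLE_STRUCTURE = [
    ("Hook", "Start with an interesting question or finding"),
    ("Context", "Why does this matter? Set the scene"),
    ("Methodology", "Briefly explain your approach"),
    ("Findings", "Present your analysis with clear visualizations"),
    ("So What", "Explain the implications and actionable insights"),
    ("Limitations", "Acknowledge what you can't conclude"),
]


def build_outline(title: str) -> str:
    """Return a plain-text outline with one numbered section per step."""
    lines = [title, "=" * len(title), ""]
    for number, (heading, prompt) in enumerate(ARTICLE_STRUCTURE, start=1):
        lines.append(f"{number}. {heading}")
        lines.append(f"   TODO: {prompt}")
        lines.append("")
    # Drop the trailing blank line and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


outline = build_outline("Example article title")
print(outline)
```

Starting each piece from the same outline makes it harder to skip the later sections, such as the implications and the limitations.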

Networking and Community

Football analytics has a vibrant community. Building genuine connections can open doors that job applications alone cannot.

# Python: Networking strategy
import pandas as pd
from datetime import datetime

class NetworkingStrategy:
    """Strategic approach to building analytics network."""

    def __init__(self):
        self.connections = []
        self.activities = []

        self.communities = {
            "slack": {
                "name": "Football Slices",
                "focus": "General football analytics discussion",
                "activity": "Daily engagement, share work"
            },
            "twitter": {
                "name": "Football Analytics Twitter",
                "focus": "Quick insights, visualizations",
                "activity": "Post 2-3x/week, engage daily"
            },
            "conferences": {
                "name": "OptaPro, StatsBomb, Sloan",
                "focus": "Professional networking",
                "activity": "Attend 1-2 per year"
            },
            "meetups": {
                "name": "Local analytics meetups",
                "focus": "In-person connections",
                "activity": "Monthly attendance"
            }
        }

    def log_activity(self, activity_type: str, platform: str,
                    description: str, connection: str = None):
        """Log networking activity."""
        self.activities.append({
            "date": datetime.now(),
            "type": activity_type,
            "platform": platform,
            "description": description,
            "connection": connection
        })

    def weekly_engagement_plan(self) -> dict:
        """Generate weekly networking plan."""
        return {
            "monday": [
                "Share interesting viz or insight on Twitter",
                "Engage with 3 analytics posts"
            ],
            "tuesday": [
                "Post question or discussion in Slack",
                "Comment on recent analysis article"
            ],
            "wednesday": [
                "DM one new connection with genuine question",
                "Share relevant paper or resource"
            ],
            "thursday": [
                "Write short thread on recent analysis",
                "Attend virtual meetup/seminar if available"
            ],
            "friday": [
                "Share portfolio project update",
                "Summarize week's learnings"
            ],
            "weekend": [
                "Watch match and live-tweet observations",
                "Plan next week's content"
            ]
        }

    def outreach_template(self, connection_name: str,
                          their_work: str, your_question: str) -> str:
        """Generate genuine outreach message."""
        return f"""Hi {connection_name},

I really enjoyed your recent work on {their_work}. The approach to
[specific detail] was particularly interesting.

I'm working on a similar problem and wondered: {your_question}

No pressure to respond - I know you're busy. Just wanted to share
my appreciation for your work.

Best,
[Your name]

P.S. Here's my recent analysis if you're curious: [link]"""

    def conference_prep(self, conference_name: str) -> list:
        """Prepare for analytics conference."""
        return [
            f"Research {conference_name} speakers and their recent work",
            "Prepare 30-second intro of yourself and your work",
            "Bring business cards or prepare digital contact sharing",
            "List 5 specific people you want to meet",
            "Prepare 2-3 thoughtful questions for panels",
            "Plan social media engagement during event",
            "Schedule follow-up messages within 48 hours"
        ]

# Mentorship framework
class MentorshipFinder:
    """Find and approach potential mentors."""

    def identify_potential_mentors(self, your_goals: list) -> list:
        """Identify mentors based on career goals."""

        mentor_types = {
            "club_analyst": [
                "Current club analysts 2-3 years ahead",
                "Former analysts now in senior roles",
                "Heads of analytics departments"
            ],
            "data_science": [
                "Sports data scientists at tech companies",
                "Academic researchers in sports analytics",
                "Data provider employees"
            ],
            "media": [
                "Analytics journalists at major outlets",
                "Podcasters covering analytics",
                "Visualization specialists"
            ]
        }

        suggestions = []
        for goal in your_goals:
            if goal in mentor_types:
                suggestions.extend(mentor_types[goal])

        return suggestions

    def approach_template(self) -> str:
        """Template for mentor outreach."""
        return """Subject: Quick Question - Aspiring Analyst

Hi [Name],

I've been following your work for [time period] and particularly
admire [specific work/insight]. Your path from [their background]
to [current role] resonates with where I'm trying to go.

I'm currently [your situation] and working on [your projects].
I'd love to ask you one question: [single, specific question].

I understand you're busy, so even a brief response would be
invaluable. Happy to share more context if helpful.

Thank you for inspiring the community with your work.

Best,
[Your name]

[Link to your best public work]"""

networking = NetworkingStrategy()
print("Weekly Engagement Plan:")
for day, tasks in networking.weekly_engagement_plan().items():
    print(f"\n{day.title()}:")
    for task in tasks:
        print(f"  - {task}")
# R: Community engagement tracking
library(tidyverse)

# Key communities and resources
analytics_communities <- tribble(
    ~community, ~platform, ~focus, ~engagement_tip,
    "Football Slices Slack", "Slack", "General analytics", "Share work, ask questions",
    "OptaPro Forum", "In-person/virtual", "Professional networking", "Submit papers, attend conference",
    "StatsBomb Conference", "Annual event", "Industry insights", "Network at breaks",
    "Twitter/X Analytics", "Social media", "Quick insights, viz", "Engage with analysts' work",
    "r/FantasyPL", "Reddit", "FPL-focused", "Share models, help community",
    "Friends of Tracking", "YouTube/Academic", "Tracking data", "Watch seminars, contribute"
)

# Key people to follow (by area)
key_follows <- tribble(
    ~name, ~handle, ~area, ~reason,
    "Ted Knutson", "@mixedknuts", "Industry", "StatsBomb founder, hiring insights",
    "Tom Worville", "@Worville", "Club analytics", "Great visualizations, career advice",
    "Grace Robertson", "@GraceOnFootball", "Media analytics", "Athletic, presentation skills",
    "David Sumpter", "@Soccermatics", "Academic", "Research perspective, tutorials",
    "Jan Van Haaren", "@JanVanHaaren", "Academic/Club", "Research and practical insights"
)

# Engagement tracker
create_engagement_tracker <- function() {
    tibble(
        date = as.Date(character()),
        activity_type = character(),  # Post, Comment, DM, Event
        platform = character(),
        description = character(),
        connection_made = character()
    )
}
Output
Weekly Engagement Plan:

Monday:
  - Share interesting viz or insight on Twitter
  - Engage with 3 analytics posts

Tuesday:
  - Post question or discussion in Slack
  - Comment on recent analysis article

Wednesday:
  - DM one new connection with genuine question
  - Share relevant paper or resource

Thursday:
  - Write short thread on recent analysis
  - Attend virtual meetup/seminar if available

Friday:
  - Share portfolio project update
  - Summarize week's learnings

Weekend:
  - Watch match and live-tweet observations
  - Plan next week's content
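
A plan like this is easier to sustain if you review what you actually did. The sketch below tallies logged activity by platform and lists the connections made. The entries are hypothetical examples shaped like the records that `log_activity()` stores, and `summarize_activities` is a name introduced here, not part of the class above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log entries with the same keys that log_activity() stores.
activities = [
    {"date": datetime(2024, 1, 8), "type": "Post", "platform": "Twitter",
     "description": "Shared a shot map", "connection": None},
    {"date": datetime(2024, 1, 9), "type": "Comment", "platform": "Slack",
     "description": "Replied in a discussion thread", "connection": None},
    {"date": datetime(2024, 1, 10), "type": "DM", "platform": "Twitter",
     "description": "Asked a question about a metric", "connection": "Contact A"},
    {"date": datetime(2024, 1, 12), "type": "Event", "platform": "Meetup",
     "description": "Attended a local meetup", "connection": "Contact B"},
]


def summarize_activities(entries: list) -> dict:
    """Count entries per platform and list the connections that were logged."""
    per_platform = Counter(entry["platform"] for entry in entries)
    # Entries without a connection (None) are skipped.
    connections = sorted(
        {entry["connection"] for entry in entries if entry["connection"]}
    )
    return {"per_platform": dict(per_platform), "connections": connections}


summary = summarize_activities(activities)
print(summary["per_platform"])
print(summary["connections"])
```

The same function could be pointed at `NetworkingStrategy().activities` once real entries have been logged.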

Alternative Paths Into Analytics

There's no single route into football analytics. Understanding different paths helps you leverage your unique background.

Technical Path

Background: CS, statistics, data science

Advantage: Strong technical foundation

Gap to fill: Domain knowledge, football understanding

Strategy: Build football-specific projects, watch matches analytically, learn the language of the game

Football Path

Background: Coaching, playing, scouting

Advantage: Deep football knowledge

Gap to fill: Technical skills

Strategy: Learn Python/R basics, start with simple analysis, translate football knowledge to data questions

Media Path

Background: Journalism, content creation

Advantage: Communication skills

Gap to fill: Technical skills, methodology

Strategy: Add data literacy, collaborate with technical analysts, focus on storytelling with data

# Python: Career path planner
import pandas as pd

class CareerPathPlanner:
    """Plan transition into football analytics."""

    def __init__(self, background: str):
        self.background = background
        self.paths = self._define_paths()

    def _define_paths(self) -> dict:
        return {
            "technical": {
                "strengths": ["Programming", "Statistics", "ML"],
                "gaps": ["Football knowledge", "Communication to non-tech"],
                "recommended_focus": [
                    "Watch 3+ matches per week analytically",
                    "Read tactical blogs (Spielverlagerung, etc.)",
                    "Build football-specific projects",
                    "Practice explaining concepts simply"
                ],
                "timeline": "6-12 months to job-ready"
            },
            "football": {
                "strengths": ["Domain knowledge", "Stakeholder understanding"],
                "gaps": ["Programming", "Statistics"],
                "recommended_focus": [
                    "Python basics (3 months minimum)",
                    "Statistics fundamentals",
                    "Start with video analysis tools",
                    "Translate coaching questions to data questions"
                ],
                "timeline": "12-18 months to job-ready"
            },
            "media": {
                "strengths": ["Communication", "Storytelling", "Audience sense"],
                "gaps": ["Technical depth", "Methodology rigor"],
                "recommended_focus": [
                    "Learn data literacy fundamentals",
                    "Collaborate with technical analysts",
                    "Focus on data visualization",
                    "Build hybrid skill set"
                ],
                "timeline": "6-12 months to job-ready"
            },
            "academic": {
                "strengths": ["Research methods", "Writing", "Deep analysis"],
                "gaps": ["Practical application", "Speed/deadlines"],
                "recommended_focus": [
                    "Build applied projects",
                    "Practice quick-turnaround analysis",
                    "Engage with industry practitioners",
                    "Translate research to actionable insights"
                ],
                "timeline": "3-6 months to job-ready"
            }
        }

    def get_personalized_plan(self) -> dict:
        """Get plan based on background."""
        if self.background in self.paths:
            return self.paths[self.background]
        return self.paths["technical"]  # Default

    def create_6_month_roadmap(self) -> list:
        """Create a generic 6-month transition roadmap (the same for every background)."""

        return [
            {
                "month": "1-2",
                "focus": "Foundation",
                "tasks": [
                    "Complete basic Python/R course",
                    "Set up GitHub and start documenting learning",
                    "Begin watching matches with analytical focus",
                    "Join Football Slices Slack"
                ]
            },
            {
                "month": "3-4",
                "focus": "First Projects",
                "tasks": [
                    "Complete first portfolio project (shot map or simple viz)",
                    "Write first analysis blog post",
                    "Engage regularly on Twitter with analytics community",
                    "Learn StatsBomb open data"
                ]
            },
            {
                "month": "5-6",
                "focus": "Intermediate + Visibility",
                "tasks": [
                    "Complete intermediate project (xG model or player comparison)",
                    "Submit analysis to blog/publication",
                    "Start applying for entry-level roles or internships",
                    "Attend first analytics event (virtual or in-person)"
                ]
            }
        ]

    def identify_role_fit(self) -> list:
        """Suggest best-fit roles based on background."""
        role_suggestions = {
            "technical": [
                "Data Analyst at data provider",
                "ML Engineer at betting company",
                "Analytics developer at club"
            ],
            "football": [
                "Performance Analyst",
                "Video Analyst (with analytics focus)",
                "Scout with data component"
            ],
            "media": [
                "Analytics writer/journalist",
                "Content creator at data company",
                "Visualization specialist"
            ],
            "academic": [
                "Research role at federation",
                "R&D at data provider",
                "Consulting for clubs"
            ]
        }
        return role_suggestions.get(self.background, role_suggestions["technical"])

# Example usage
planner = CareerPathPlanner("technical")
print("Personalized Plan for Technical Background:")
plan = planner.get_personalized_plan()
print(f"\nStrengths: {plan['strengths']}")
print(f"Gaps to fill: {plan['gaps']}")
print(f"Timeline: {plan['timeline']}")

print("\n6-Month Roadmap:")
for phase in planner.create_6_month_roadmap():
    print(f"\nMonth {phase['month']}: {phase['focus']}")
    for task in phase["tasks"]:
        print(f"  - {task}")
# R: Career path analysis
library(tidyverse)

# Example career paths
career_paths <- tribble(
    ~person, ~start, ~transition, ~current, ~years_total,
    "Analyst A", "PhD Statistics", "Hobby projects + Twitter", "Head of Analytics", 6,
    "Analyst B", "Youth Coach", "UEFA B + self-taught Python", "First Team Analyst", 4,
    "Analyst C", "Software Engineer", "Side projects + networking", "Data Provider", 3,
    "Analyst D", "Sports Journalist", "Learn R + public analysis", "Analytics Writer", 2,
    "Analyst E", "Economics Graduate", "Masters + internship", "Club Data Analyst", 3,
    "Analyst F", "No degree", "Blog + Twitter + persistence", "Freelance Analyst", 5
)

# Analyze common factors in successful transitions
success_factors <- tribble(
    ~factor, ~importance, ~time_investment,
    "Public work portfolio", "Essential", "6-12 months",
    "Technical skills (Python/R)", "Essential", "3-6 months minimum",
    "Network in analytics community", "Very High", "Ongoing",
    "Football domain knowledge", "Essential", "Background + ongoing",
    "Persistence and patience", "Essential", "1-3 years typical",
    "Formal education", "Helpful but not required", "Variable"
)
Output
Personalized Plan for Technical Background:

Strengths: ['Programming', 'Statistics', 'ML']
Gaps to fill: ['Football knowledge', 'Communication to non-tech']
Timeline: 6-12 months to job-ready

6-Month Roadmap:

Month 1-2: Foundation
  - Complete basic Python/R course
  - Set up GitHub and start documenting learning
  - Begin watching matches with analytical focus
  - Join Football Slices Slack

Month 3-4: First Projects
  - Complete first portfolio project (shot map or simple viz)
  - Write first analysis blog post
  - Engage regularly on Twitter with analytics community
  - Learn StatsBomb open data

Month 5-6: Intermediate + Visibility
  - Complete intermediate project (xG model or player comparison)
  - Submit analysis to blog/publication
  - Start applying for entry-level roles or internships
  - Attend first analytics event (virtual or in-person)

Technical Interview Challenges

Many football analytics interviews include technical challenges. Here are common types and how to prepare for them.

interview_challenges
# Python: Interview challenge examples
import pandas as pd
import numpy as np

class InterviewChallenges:
    """Common football analytics interview challenges."""

    @staticmethod
    def challenge_1_pass_zones(events: pd.DataFrame) -> pd.DataFrame:
        """
        Challenge: Calculate pass completion by pitch zone.

        Input: Event data with columns (type, x, outcome)
        Output: Summary by defensive/middle/final third
        """
        passes = events[events["type"] == "Pass"].copy()

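        passes["zone"] = pd.cut(
            passes["x"],
            bins=[0, 33, 66, 100],
            labels=["Defensive Third", "Middle Third", "Final Third"],
            include_lowest=True  # keep passes with x == 0 in the first bin
        )

        # observed=False keeps all three zones in the result, even if empty
        summary = passes.groupby("zone", observed=False).agg(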
            total_passes=("type", "count"),
            completed=("outcome", lambda x: (x == "Complete").sum())
        ).reset_index()

        summary["completion_pct"] = (
            summary["completed"] / summary["total_passes"] * 100
        ).round(1)

        return summary

    @staticmethod
    def challenge_2_xg_calculation(shots: pd.DataFrame) -> pd.DataFrame:
        """
        Challenge: Build simple xG model and calculate team totals.

        Demonstrates: Feature engineering, model thinking
        """
        # Work on a copy so the caller's DataFrame is not modified
        shots = shots.copy()

        # Calculate shot features (goal centre at (100, 50) on a 0-100 pitch)
        goal_x, goal_y = 100, 50

        shots["distance"] = np.sqrt(
            (goal_x - shots["x"])**2 + (goal_y - shots["y"])**2
        )
        # Angle off the line through the goal centre, in radians (0 = central)
        shots["angle"] = np.abs(np.arctan2(
            goal_y - shots["y"],
            goal_x - shots["x"]
        ))

        # Simple hand-tuned xG heuristic (production would use trained model)
        is_header = (shots["body_part"] == "Head").astype(int)
        shots["xg"] = np.maximum(0,
            0.5 - shots["distance"] * 0.015 -
            shots["angle"] * 0.1 -
            0.03 * is_header  # small penalty for headers
        )

        # Aggregate by team
        team_xg = shots.groupby("team").agg(
            shots=("type", "count"),
            goals=("outcome", lambda x: (x == "Goal").sum()),
            total_xg=("xg", "sum")
        ).reset_index()

        team_xg["xg_difference"] = team_xg["goals"] - team_xg["total_xg"]

        return team_xg

    @staticmethod
    def challenge_3_sql_query() -> str:
        """
        Challenge: Write SQL for top progressive passers.

        Tests: SQL proficiency, metric understanding
        """
        return """
        WITH player_minutes AS (
            SELECT player_id, SUM(minutes_played) as total_minutes
            FROM minutes
            GROUP BY player_id
            HAVING SUM(minutes_played) >= 900
        )
        SELECT
            p.player_name,
            t.team_name,
            COUNT(*) as progressive_passes,
            ROUND(COUNT(*) * 90.0 / pm.total_minutes, 2) as prog_pass_p90
        FROM passes pa
        JOIN players p ON pa.player_id = p.player_id
        JOIN teams t ON pa.team_id = t.team_id
        JOIN player_minutes pm ON pa.player_id = pm.player_id
        WHERE pa.end_x - pa.start_x >= 10
          AND pa.start_x < 80  -- Doesn't start too close to goal
          AND pa.outcome = 'Complete'
        GROUP BY p.player_id, p.player_name, t.team_name, pm.total_minutes
        ORDER BY prog_pass_p90 DESC
        LIMIT 20;
        """

    @staticmethod
    def challenge_4_explain_concept() -> dict:
        """
        Challenge: Explain xG to a skeptical coach.

        Tests: Communication, stakeholder management
        """
        return {
            "opening": """
                I understand the concern about reducing football to numbers.
                xG isn't meant to replace what you see - it's meant to add
                context to help make decisions.
            """,

            "analogy": """
                Think of it like this: if a striker misses a chance from
                6 yards out, you know that's a bad miss. xG just puts a
                number on HOW bad - that was probably a 0.75 xG chance,
                meaning historically 75% of similar shots go in.
            """,

            "practical_use": """
                Where I've seen it help coaches:
                1. Identifying if finishing is sustainable (high goals, low xG)
                2. Seeing if the team creates enough quality chances
                3. Comparing attackers beyond just goal counts
            """,

            "limitations": """
                What xG doesn't tell us:
                - Whether the finish was good or bad execution
                - Defender positioning (in basic models)
                - Game context and pressure

                That's where your expertise fills the gap.
            """,

            "close": """
                I'd love to show you some examples from recent matches
                where xG helped identify something that wasn't obvious
                from the result. What match should we look at?
            """
        }

    @staticmethod
    def challenge_5_visualization(events: pd.DataFrame):
        """
        Challenge: Create an insightful visualization.

        Tests: Data viz skills, football understanding
        """
        import matplotlib.pyplot as plt
        from mplsoccer import Pitch

        shots = events[events["type"] == "Shot"]

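        # pitch_type must match the coordinate system of the event data;
        # the events also need an "xg" column for the marker sizes below
        pitch = Pitch(pitch_type="statsbomb")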
        fig, ax = pitch.draw(figsize=(12, 8))

        # Plot shots colored by outcome
        goals = shots[shots["outcome"] == "Goal"]
        misses = shots[shots["outcome"] != "Goal"]

        pitch.scatter(
            misses["x"], misses["y"],
            s=misses["xg"] * 500,
            c="gray", alpha=0.5,
            edgecolors="black",
            ax=ax, label="No Goal"
        )

        pitch.scatter(
            goals["x"], goals["y"],
            s=goals["xg"] * 500,
            c="red", alpha=0.8,
            edgecolors="black",
            ax=ax, label="Goal"
        )

        ax.legend()
        ax.set_title("Shot Map with xG (size = expected goals)")

        return fig

# Practice challenges
challenges = InterviewChallenges()

# Example: Explain xG structure
import textwrap

print("Challenge 4 - Explain xG to Coach:")
explanation = challenges.challenge_4_explain_concept()
for section, content in explanation.items():
    print(f"\n{section.upper()}:")
    # dedent removes the indentation carried by the triple-quoted strings
    print(textwrap.dedent(content).strip())
# R: Common interview challenges
library(tidyverse)

# Challenge 1: Data manipulation task
# "Given match event data, calculate pass completion by zone"

solve_pass_zones <- function(events) {
    events %>%
        filter(type == "Pass") %>%
        mutate(
            zone = case_when(
                x < 33 ~ "Defensive Third",
                x < 66 ~ "Middle Third",
                TRUE ~ "Final Third"
            )
        ) %>%
        group_by(zone) %>%
        summarise(
            total_passes = n(),
            completed = sum(outcome == "Complete"),
            completion_pct = completed / total_passes * 100,
            .groups = "drop"
        )
}

# Challenge 2: SQL query task
# "Write a query to find players with most progressive passes"
# Minutes are summed in a CTE first: joining the per-match minutes table
# directly to passes would repeat each match's minutes once per pass.
progressive_pass_query <- "
WITH player_minutes AS (
    SELECT player_id, SUM(minutes_played) AS total_minutes
    FROM minutes
    GROUP BY player_id
    HAVING SUM(minutes_played) >= 900  -- Minimum 10 full matches
)
SELECT
    p.player_name,
    t.team_name,
    COUNT(*) AS progressive_passes,
    ROUND(COUNT(*) * 90.0 / pm.total_minutes, 2) AS prog_per_90
FROM passes pa
JOIN players p ON pa.player_id = p.player_id
JOIN teams t ON pa.team_id = t.team_id
JOIN player_minutes pm ON pa.player_id = pm.player_id
WHERE pa.end_x - pa.start_x >= 10  -- Moves ball 10+ yards forward
  AND pa.end_x >= 60  -- Ends in opposition half
  AND pa.outcome = 'Complete'
GROUP BY p.player_id, p.player_name, t.team_name, pm.total_minutes
ORDER BY prog_per_90 DESC
LIMIT 20
"

# Challenge 3: Explain methodology
explain_xg <- function() {
    "
    xG (Expected Goals) quantifies shot quality by estimating
    the probability of a shot resulting in a goal.

    Key features:
    - Shot location (distance and angle to goal)
    - Body part (foot vs header)
    - Type of assist (through ball, cross, etc.)
    - Game state (open play vs set piece)

    Model training:
    - Historical shots labeled with outcome (goal/no goal)
    - Logistic regression or gradient boosting
    - Validated on held-out data

    Limitations:
    - Doesn't capture shot execution quality
    - Pre-shot movement not included in basic models
    - Small sample sizes at individual level

    Use cases:
    - Evaluating finishing (goals vs xG)
    - Assessing chance creation (team xG created)
    - Transfer valuation (underlying attacking quality)
    "
}
Output
Challenge 4 - Explain xG to Coach:

OPENING:
I understand the concern about reducing football to numbers.
xG isn't meant to replace what you see - it's meant to add
context to help make decisions.

ANALOGY:
Think of it like this: if a striker misses a chance from
6 yards out, you know that's a bad miss. xG just puts a
number on HOW bad - that was probably a 0.75 xG chance,
meaning historically 75% of similar shots go in.

PRACTICAL_USE:
Where I've seen it help coaches:
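1. Identifying if finishing is sustainable (high goals, low xG)
2. Seeing if the team creates enough quality chances
3. Comparing attackers beyond just goal counts

LIMITATIONS:
What xG doesn't tell us:
- Whether the finish was good or bad execution
- Defender positioning (in basic models)
- Game context and pressure

That's where your expertise fills the gap.

CLOSE:
I'd love to show you some examples from recent matches
where xG helped identify something that wasn't obvious
from the result. What match should we look at?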
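
The analogy in the explanation describes xG as the share of similar historical shots that were scored. A minimal sketch of that idea follows. The shot data is made up, and the distance bands and names such as `distance_band` and `conversion_by_band` are hypothetical; a real model would use many more features and a trained classifier.

```python
from collections import defaultdict

# Made-up historical shots: (distance to goal in yards, 1 if scored else 0).
HISTORICAL_SHOTS = [
    (5, 1), (6, 1), (4, 0), (6, 1),          # close range: 3 of 4 scored
    (12, 0), (14, 1), (11, 0), (15, 0),      # mid range: 1 of 4 scored
    (22, 0), (25, 0), (28, 0), (24, 1),      # long range: 1 of 8 scored
    (30, 0), (21, 0), (27, 0), (23, 0),
]


def distance_band(distance):
    """Assign a shot to a coarse distance band (hypothetical cut-offs)."""
    if distance < 8:
        return "close"
    if distance < 18:
        return "mid"
    return "long"


def conversion_by_band(shots):
    """Return the share of shots scored in each band: a crude xG lookup."""
    totals = defaultdict(int)
    goals = defaultdict(int)
    for distance, scored in shots:
        band = distance_band(distance)
        totals[band] += 1
        goals[band] += scored
    return {band: goals[band] / totals[band] for band in totals}


xg_lookup = conversion_by_band(HISTORICAL_SHOTS)

# A striker's recent shots: sum the band rates to get xG, then compare to goals.
striker_shots = [(5, 0), (13, 1), (26, 1)]
striker_xg = sum(xg_lookup[distance_band(d)] for d, _ in striker_shots)
striker_goals = sum(scored for _, scored in striker_shots)

print(xg_lookup)
print(f"goals={striker_goals}, xG={striker_xg:.3f}")
```

In this toy data the close-range band converts at 0.75, matching the figure in the analogy, and the striker has scored more goals than the summed xG of the shots taken. That is the "high goals, low xG" pattern a coach might question for sustainability.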

Career Progression and Negotiation

Understanding career progression helps you plan long-term and negotiate effectively.

Level | Typical Title | Experience | Responsibilities | Skills Focus
Entry | Junior Analyst / Intern | 0-2 years | Data prep, visualizations, ad-hoc requests | Technical execution, learning systems
Mid | Data Analyst | 2-4 years | Independent analysis, stakeholder presentations | Project ownership, communication
Senior | Senior Analyst | 4-6 years | Complex projects, methodology development | Strategic thinking, mentoring
Lead | Lead / Principal Analyst | 6-8 years | Team leadership, cross-department influence | Leadership, organizational impact
Executive | Head of Analytics / Director | 8+ years | Department strategy, budget, executive influence | Vision, business acumen, politics
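
The experience bands in the table can be turned into a small lookup, which may help when benchmarking your own position. This is only a sketch: the function name `level_for_experience` is hypothetical, and because the table's bands overlap at their edges, it assumes a boundary value (for example exactly 2 years) maps to the higher level.

```python
from bisect import bisect_right

# Lower experience bound (in years) for each level, taken from the table.
LEVEL_BOUNDS = [0, 2, 4, 6, 8]
LEVELS = ["Entry", "Mid", "Senior", "Lead", "Executive"]


def level_for_experience(years):
    """Return the table level whose experience band contains `years`.

    Assumption: boundary values map to the higher of the two levels.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    # bisect_right counts how many lower bounds are <= years
    return LEVELS[bisect_right(LEVEL_BOUNDS, years) - 1]


for years in (1, 2, 5, 7.5, 12):
    print(f"{years} years -> {level_for_experience(years)}")
```

Years of experience are only a rough guide; the responsibilities and skills columns matter at least as much when judging level.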
career_progression
# Python: Negotiation and career tools

class CareerPlanner:
    """Plan and track career progression."""

    def __init__(self):
        self.salary_data = {
            "Junior Analyst": (25000, 35000),
            "Data Analyst": (35000, 50000),
            "Senior Analyst": (50000, 70000),
            "Lead Analyst": (65000, 90000),
            "Head of Analytics": (80000, 150000)
        }

        self.location_multipliers = {
            "London": 1.20,
            "Manchester": 1.00,
            "Liverpool": 0.95,
            "Europe (top league city)": 1.10,
            "USA (major market)": 1.50,
            "Remote": 0.90
        }

    def estimate_salary(self, role: str, location: str) -> dict:
        """Estimate salary range for role and location."""
        base = self.salary_data.get(role, (30000, 45000))
        multiplier = self.location_multipliers.get(location, 1.0)

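        # round() avoids truncating values that land just below a whole
        # number because of floating-point multiplication
        return {
            "role": role,
            "location": location,
            "min_salary": round(base[0] * multiplier),
            "max_salary": round(base[1] * multiplier),
            "negotiation_target": round((base[0] + base[1]) / 2 * multiplier * 1.1)
        }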

    def negotiation_prep(self, current_offer: int, role: str,
                        location: str) -> dict:
        """Prepare for salary negotiation."""
        estimate = self.estimate_salary(role, location)

        return {
            "offer": current_offer,
            "market_range": (estimate["min_salary"], estimate["max_salary"]),
            "offer_position": self._offer_position(current_offer, estimate),
            "counter_suggestion": round(current_offer * 1.10),  # 10% counter
            "talking_points": [
                f"Market research shows similar roles pay £{estimate['min_salary']:,}-£{estimate['max_salary']:,}",
                "My specific skills in [skill] add value because...",
                "I've demonstrated impact through [portfolio work]",
                "I'm committed long-term and want to grow with the org"
            ],
            "non_salary_items": [
                "Conference attendance budget",
                "Professional development time",
                "Remote work flexibility",
                "Performance bonus structure",
                "Equipment/software budget"
            ]
        }

    def _offer_position(self, offer: int, estimate: dict) -> str:
        """Determine where offer falls in range."""
        min_s, max_s = estimate["min_salary"], estimate["max_salary"]
        if offer < min_s:
            return "Below market - strong case for counter"
        elif offer < (min_s + max_s) / 2:
            return "Lower half of range - room to negotiate"
        elif offer < max_s:
            return "Above average - negotiate smaller items"
        else:
            return "Top of range - focus on non-salary"

    def promotion_readiness(self, current_level: str) -> dict:
        """Assess readiness for next level."""
        requirements = {
            "Junior Analyst": {
                "next_level": "Data Analyst",
                "requirements": [
                    "2+ years experience",
                    "3+ completed independent projects",
                    "Positive stakeholder feedback",
                    "Basic mentoring of interns"
                ],
                "evidence_needed": [
                    "Portfolio of work",
                    "Written feedback from stakeholders",
                    "Examples of impact on decisions"
                ]
            },
            "Data Analyst": {
                "next_level": "Senior Analyst",
                "requirements": [
                    "4+ years experience",
                    "Led major project end-to-end",
                    "Developed new methodology or tool",
                    "Cross-team collaboration"
                ],
                "evidence_needed": [
                    "Case study of major project impact",
                    "Documentation of methodology contribution",
                    "Peer recognition"
                ]
            },
            "Senior Analyst": {
                "next_level": "Lead Analyst",
                "requirements": [
                    "6+ years experience",
                    "Managed/mentored junior analysts",
                    "Influenced department strategy",
                    "External visibility (conferences, papers)"
                ],
                "evidence_needed": [
                    "Team development examples",
                    "Strategic recommendations adopted",
                    "Industry recognition"
                ]
            }
        }

        return requirements.get(current_level, {})

# Usage example
planner = CareerPlanner()

# Salary estimate
estimate = planner.estimate_salary("Data Analyst", "London")
print("Salary Estimate for Data Analyst in London:")
print(f"  Range: £{estimate['min_salary']:,} - £{estimate['max_salary']:,}")
print(f"  Negotiation target: £{estimate['negotiation_target']:,}")

# Negotiation prep
prep = planner.negotiation_prep(40000, "Data Analyst", "London")
print(f"\nOffer position: {prep['offer_position']}")
print(f"Suggested counter: £{prep['counter_suggestion']:,}")
# R: Career planning framework
library(tidyverse)

# Define career goals and track progress
career_tracker <- tribble(
    ~goal_type, ~goal, ~target_date, ~status, ~notes,
    "Skill", "Advanced SQL", "2024-03", "Complete", "Passed Mode Analytics cert",
    "Portfolio", "xG Model", "2024-04", "In Progress", "70% complete",
    "Network", "Attend OptaPro", "2024-09", "Planned", "Submitted abstract",
    "Application", "Apply to 10 clubs", "2024-06", "In Progress", "4/10 sent",
    "Interview", "Practice mock interviews", "2024-05", "Not Started", ""
)

# Salary research framework
research_salary <- function(role, location, experience_years) {
    # Base ranges (GBP, adjust for location)
    base_ranges <- tribble(
        ~role, ~min_salary, ~max_salary,
        "Junior Analyst", 25000, 35000,
        "Data Analyst", 35000, 50000,
        "Senior Analyst", 50000, 70000,
        "Lead Analyst", 65000, 90000,
        "Head of Analytics", 80000, 150000
    )

    # Location adjustments
    location_multiplier <- case_when(
        location == "London" ~ 1.2,
        location == "Manchester" ~ 1.0,
        location == "Europe (big city)" ~ 1.1,
        location == "USA" ~ 1.5,
        TRUE ~ 1.0
    )

    base_ranges %>%
        filter(role == !!role) %>%
        mutate(
            adjusted_min = min_salary * location_multiplier,
            adjusted_max = max_salary * location_multiplier
        )
}
Output
Salary Estimate for Data Analyst in London:
  Range: £42,000 - £60,000
  Negotiation target: £56,100

Offer position: Below market - strong case for counter
Suggested counter: £44,000
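
The figures in this output follow from a few multiplications, which the sketch below reproduces step by step. It reuses the Data Analyst base range and London multiplier from the Python class. The variable names are illustrative, and the salary figures themselves are the chapter's example values, not market data.

```python
# Worked arithmetic for the "Data Analyst in London" example.
base_min, base_max = 35_000, 50_000   # Data Analyst base range (GBP)
london = 1.20                         # London location multiplier
offer = 40_000                        # offer used in the example

market_min = round(base_min * london)          # low end of adjusted range
market_max = round(base_max * london)          # high end of adjusted range
market_mid = (market_min + market_max) / 2     # midpoint of adjusted range

# Negotiation target: 10% above the location-adjusted midpoint.
target = round((base_min + base_max) / 2 * london * 1.1)

# Suggested counter: a flat 10% above the offer.
counter = round(offer * 1.10)

print(f"Market range: £{market_min:,} - £{market_max:,}")
print(f"Offer below market minimum by: £{market_min - offer:,}")
print(f"Negotiation target: £{target:,}")
print(f"Suggested counter: £{counter:,}")
print(f"Counter is below range midpoint: {counter < market_mid}")
```

Note that the flat 10% counter (£44,000) sits in the lower half of the adjusted range and well below the negotiation target (£56,100). When an offer is below the market minimum, anchoring the counter on the market range instead of the offer may be worth considering; which figure to use is a judgment call.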

Day-to-Day Reality

Understanding what analysts actually do helps set realistic expectations and prepare for the role.

Typical Week (Club Analyst)
  • Monday: Post-match analysis, player load data
  • Tuesday: Opposition report preparation
  • Wednesday: Training data collection
  • Thursday: Present opposition analysis to staff
  • Friday: Final prep, set piece analysis
  • Saturday: Match day - live data support
  • Sunday: Initial match review
Common Challenges
  • Tight deadlines around match schedules
  • Communicating with non-technical stakeholders
  • Data quality and availability issues
  • Balancing quick requests vs deep projects
  • Getting buy-in from traditional staff
  • Working weekends and irregular hours

Practice Exercises

Exercise 41.1: Skills Audit

Complete a thorough self-assessment of your technical, domain, and soft skills. Identify your top 3 gaps and create a 90-day learning plan to address them.

Exercise 41.2: Portfolio Project

Complete one portfolio project from scratch. Publish the code on GitHub with documentation and write an accompanying blog post explaining your methodology and findings.

Exercise 41.3: Mock Interview

Practice answering 5 technical and 3 behavioral interview questions out loud. Record yourself and review for clarity and confidence.

Exercise 41.4: Network Building

Identify 10 football analytics professionals to follow on Twitter. Engage meaningfully with their content for one month. Attend one virtual or in-person analytics event.

Summary

Building a career in football analytics requires patience, consistent skill development, and visibility through public work. In the next chapter, we'll explore analytics applications for broadcasters and media.