Working in Football Analytics
The football analytics industry has grown from a handful of pioneers to a global profession. This chapter provides practical guidance on building a career in football analytics, from developing essential skills to landing your first role.
Learning Objectives
- Understand the football analytics job market and roles
- Develop the technical and soft skills required
- Build an impressive portfolio of public work
- Navigate the job application and interview process
- Learn from successful career paths in the industry
- Understand the day-to-day reality of analytics roles
The Analytics Job Landscape
Football analytics roles exist across clubs, federations, media companies, betting firms, and data providers. Understanding where opportunities exist helps you target your career development.
- Data Analyst: Day-to-day analysis support
- Performance Analyst: Video and tactical analysis
- Recruitment Analyst: Player scouting support
- First Team Analyst: Opposition and match prep
- Head of Analytics: Department leadership
- Data Providers: StatsBomb, Opta, Wyscout
- Media: The Athletic, ESPN, broadcasters
- Betting: Odds compilation, trading
- Agencies: Player representation analytics
- Federations: National team support
# Python: Analyze job market trends
import pandas as pd
import matplotlib.pyplot as plt


def analyze_job_market():
    """Analyze skill requirements in football analytics jobs."""
    # Example job requirements frequency from postings
    job_requirements = pd.DataFrame({
        "skill": ["Python", "R", "SQL", "Data Visualization",
                  "Statistics", "Machine Learning", "Football Knowledge",
                  "Communication", "Video Analysis", "Tableau/Power BI"],
        "frequency": [0.85, 0.45, 0.78, 0.72, 0.68, 0.42, 0.95, 0.88, 0.35, 0.52]
    })

    # Sort by frequency
    job_requirements = job_requirements.sort_values("frequency", ascending=True)

    # Visualize
    fig, ax = plt.subplots(figsize=(10, 6))
    colors = plt.cm.Greens(job_requirements["frequency"])
    ax.barh(job_requirements["skill"], job_requirements["frequency"], color=colors)
    ax.set_xlabel("Frequency in Job Postings")
    ax.set_title("Skills Required in Football Analytics Jobs")
    ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0%}"))
    plt.tight_layout()
    plt.show()

    return job_requirements


# Salary ranges by role (approximate)
salary_data = pd.DataFrame({
    "role": ["Junior Analyst", "Data Analyst", "Senior Analyst",
             "Lead Analyst", "Head of Analytics"],
    "min_salary": [25000, 35000, 50000, 65000, 80000],
    "max_salary": [35000, 50000, 70000, 90000, 150000],
    "experience_years": ["0-2", "2-4", "4-6", "6-8", "8+"]
})

print("Approximate Salary Ranges (GBP):")
print(salary_data.to_string(index=False))

# R: Analyze job market trends
library(tidyverse)
library(rvest)

# Scrape football analytics job postings (conceptual)
analyze_job_market <- function() {
  # Example job requirements frequency
  job_requirements <- tribble(
    ~skill, ~frequency,
    "Python", 0.85,
    "R", 0.45,
    "SQL", 0.78,
    "Data Visualization", 0.72,
    "Statistics", 0.68,
    "Machine Learning", 0.42,
    "Football Knowledge", 0.95,
    "Communication", 0.88,
    "Video Analysis", 0.35,
    "Tableau/Power BI", 0.52
  )

  # Visualize requirements
  job_requirements %>%
    mutate(skill = fct_reorder(skill, frequency)) %>%
    ggplot(aes(x = skill, y = frequency, fill = frequency)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(labels = scales::percent) +
    scale_fill_gradient(low = "#2E7D32", high = "#1B5E20") +
    labs(title = "Skills Required in Football Analytics Jobs",
         x = NULL, y = "Frequency in Job Postings") +
    theme_minimal() +
    theme(legend.position = "none")
}

analyze_job_market()

Approximate Salary Ranges (GBP):
             role  min_salary  max_salary experience_years
   Junior Analyst       25000       35000              0-2
     Data Analyst       35000       50000              2-4
   Senior Analyst       50000       70000              4-6
     Lead Analyst       65000       90000              6-8
Head of Analytics       80000      150000               8+

Essential Skills
Success in football analytics requires a blend of technical expertise, football knowledge, and soft skills. Here's a comprehensive breakdown of what you need to develop.
| Category | Skill | Importance | How to Develop |
|---|---|---|---|
| Technical | Python or R | Essential | Online courses, personal projects |
| | SQL | Essential | Practice with football databases |
| | Statistics | Essential | Coursera/edX courses, apply to football |
| | Data Visualization | High | Create football visualizations regularly |
| Domain | Football Knowledge | Essential | Watch matches analytically, read tactics blogs |
| | Metrics Understanding | Essential | Study xG, xA, and advanced metrics |
| | Data Sources | High | Work with StatsBomb, FBref, Wyscout |
| Soft Skills | Communication | Essential | Write articles, present findings |
| | Storytelling | High | Practice explaining complex concepts simply |
| | Collaboration | High | Contribute to open source, join communities |
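The table lists xG among the metrics worth studying. The intuition behind it is a historical conversion rate: of all past shots taken from a similar situation, what share went in? The sketch below shows that idea in its simplest form, using only shot distance. The shot list is invented toy data, and the `distance_band` cut-offs and function names are assumptions made for illustration; production models use many more features (angle, body part, game situation).

```python
from collections import defaultdict

# Toy shots, invented for illustration: (distance to goal in metres, scored?)
toy_shots = [
    (5, True), (6, True), (8, False), (9, False),
    (12, True), (14, False), (16, False), (18, False),
    (22, False), (25, False), (28, True), (30, False),
]


def distance_band(distance_m: float) -> str:
    """Assign a shot to an assumed distance band (hypothetical cut-offs)."""
    if distance_m < 10:
        return "0-10m"
    if distance_m < 20:
        return "10-20m"
    return "20m+"


def empirical_xg(shots):
    """Return the conversion rate (goals / shots) for each distance band."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for distance_m, scored in shots:
        band = distance_band(distance_m)
        attempts[band] += 1
        goals[band] += int(scored)
    return {band: goals[band] / attempts[band] for band in attempts}


# Lookup table: band -> share of historical shots that were scored
xg_table = empirical_xg(toy_shots)

# Expected goals for a new set of shots is the sum of the per-shot values
new_shot_distances = [7, 15, 24]
expected_goals = sum(xg_table[distance_band(d)] for d in new_shot_distances)
print(xg_table)
print(expected_goals)
```

Replacing the lookup table with a fitted model (for example logistic regression on distance and angle) is the usual next step once the banded version makes sense.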
# Python: Skills assessment and learning path
import pandas as pd
import matplotlib.pyplot as plt


class SkillsAssessment:
    """Framework for assessing and developing analytics skills."""

    def __init__(self):
        self.skills = pd.DataFrame({
            "category": ["Technical"] * 5 + ["Domain"] * 3 + ["Soft"] * 3,
            "skill": [
                "Python/R Programming", "SQL Databases", "Statistics",
                "Machine Learning", "Data Visualization",
                "Football Tactics", "Advanced Metrics", "Data Sources",
                "Communication", "Presentation", "Project Management"
            ],
            "current_level": [3, 2, 3, 2, 4, 4, 3, 3, 3, 2, 2],
            "target_level": [5, 4, 4, 3, 5, 5, 5, 4, 5, 4, 3]
        })
        self.skills["gap"] = self.skills["target_level"] - self.skills["current_level"]

    def get_priority_skills(self):
        """Identify highest priority skills to develop."""
        return self.skills.nlargest(5, "gap")[["skill", "gap", "category"]]

    def create_learning_path(self):
        """Generate personalized learning recommendations."""
        recommendations = {
            "Python/R Programming": [
                "Complete 'Python for Data Analysis' by Wes McKinney",
                "Work through StatsBomb tutorials",
                "Build 3 personal football analytics projects"
            ],
            "SQL Databases": [
                "Complete Mode Analytics SQL tutorial",
                "Set up local PostgreSQL with football data",
                "Practice complex queries on FBref data"
            ],
            "Advanced Metrics": [
                "Read 'The Expected Goals Philosophy'",
                "Implement xG model from scratch",
                "Study StatsBomb methodology documentation"
            ],
            "Communication": [
                "Start a football analytics blog/Twitter",
                "Write monthly analysis pieces",
                "Present at local football analytics meetup"
            ]
        }

        priority_skills = self.get_priority_skills()["skill"].tolist()

        # Only skills with an entry in `recommendations` appear in the path
        learning_path = []
        for skill in priority_skills:
            if skill in recommendations:
                learning_path.append({
                    "skill": skill,
                    "actions": recommendations[skill]
                })
        return learning_path

    def visualize_assessment(self):
        """Create radar chart of current skills."""
        from math import pi

        categories = self.skills["skill"].tolist()
        current = self.skills["current_level"].tolist()
        target = self.skills["target_level"].tolist()

        # Create radar chart; repeat the first point to close each polygon
        angles = [n / float(len(categories)) * 2 * pi for n in range(len(categories))]
        angles += angles[:1]
        current += current[:1]
        target += target[:1]

        fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))
        ax.plot(angles, current, "o-", linewidth=2, label="Current")
        ax.fill(angles, current, alpha=0.25)
        ax.plot(angles, target, "o-", linewidth=2, label="Target")
        ax.fill(angles, target, alpha=0.25)
        ax.set_xticks(angles[:-1])
        ax.set_xticklabels(categories, size=8)
        ax.set_ylim(0, 5)
        ax.legend(loc="upper right")
        ax.set_title("Skills Assessment")
        plt.tight_layout()
        plt.show()


# Usage
assessment = SkillsAssessment()
print("Priority Skills to Develop:")
print(assessment.get_priority_skills())

print("\nLearning Path:")
for item in assessment.create_learning_path():
    print(f"\n{item['skill']}:")
    for action in item["actions"]:
        print(f"  - {action}")

# R: Skills assessment framework
library(tidyverse)

# Self-assessment framework
assess_skills <- function() {
  # Rate yourself 1-5 on each skill
  skills_assessment <- tribble(
    ~category, ~skill, ~current_level, ~target_level,
    "Technical", "Python/R Programming", 3, 5,
    "Technical", "SQL Databases", 2, 4,
    "Technical", "Statistics", 3, 4,
    "Technical", "Machine Learning", 2, 3,
    "Technical", "Data Visualization", 4, 5,
    "Domain", "Football Tactics", 4, 5,
    "Domain", "Advanced Metrics (xG, etc)", 3, 5,
    "Domain", "Data Sources", 3, 4,
    "Soft", "Communication", 3, 5,
    "Soft", "Presentation", 2, 4,
    "Soft", "Project Management", 2, 3
  )

  # Calculate gaps
  skills_assessment <- skills_assessment %>%
    mutate(
      gap = target_level - current_level,
      priority = case_when(
        gap >= 2 & category == "Technical" ~ "High",
        gap >= 2 ~ "Medium",
        gap == 1 ~ "Low",
        TRUE ~ "Achieved"
      )
    )

  # Visualize gaps
  skills_assessment %>%
    ggplot(aes(x = reorder(skill, gap), y = gap, fill = category)) +
    geom_col() +
    coord_flip() +
    facet_wrap(~priority, scales = "free_y") +
    labs(title = "Skills Gap Analysis",
         x = NULL, y = "Gap (Target - Current)") +
    theme_minimal()
}

Priority Skills to Develop:
               skill  gap   category
Python/R Programming    2  Technical
       SQL Databases    2  Technical
    Advanced Metrics    2     Domain
       Communication    2       Soft
        Presentation    2       Soft

Building Your Portfolio
A strong portfolio demonstrates your abilities better than any resume. Public work shows potential employers exactly what you can do and how you think about football.
Portfolio Essentials
- GitHub: Code repositories showing your technical skills
- Blog/Substack: Written analysis demonstrating communication
- Twitter/X: Quick insights and visualizations
- Tableau Public: Interactive dashboards
- Kaggle: Competition entries and notebooks
# Python: Portfolio project tracker
import pandas as pd
from datetime import datetime, timedelta


class PortfolioTracker:
    """Track and plan portfolio projects."""

    def __init__(self):
        self.projects = pd.DataFrame({
            "project": [
                "Player Comparison Tool",
                "xG Model from Scratch",
                "Recruitment Dashboard",
                "Match Prediction Model",
                "Tactical Analysis Piece",
                "Pass Network Analysis",
                "Shot Map Generator",
                "Player Similarity Finder"
            ],
            "difficulty": [
                "Beginner", "Intermediate", "Intermediate", "Advanced",
                "Beginner", "Intermediate", "Beginner", "Advanced"
            ],
            "skills": [
                "Python, Visualization",
                "ML, Statistics, Feature Engineering",
                "SQL, Visualization, Domain Knowledge",
                "ML, Statistics, Model Evaluation",
                "Writing, Visualization, Tactics",
                "Network Analysis, Visualization",
                "Python, mplsoccer",
                "ML, Embeddings, Clustering"
            ],
            "hours_estimate": [10, 30, 25, 40, 8, 20, 6, 35],
            "impact": ["Medium", "High", "High", "High",
                       "Medium", "Medium", "Low", "High"]
        })

    def suggest_first_project(self):
        """Suggest best first project for beginners."""
        # The beginner project with the smallest time estimate
        beginners = self.projects[self.projects["difficulty"] == "Beginner"]
        return beginners.sort_values("hours_estimate").iloc[0]

    def create_12_week_plan(self):
        """Create a 12-week portfolio building plan."""
        plan = [
            {"week": "1-2", "project": "Shot Map Generator",
             "goal": "Build automated shot maps with xG coloring"},
            {"week": "3-4", "project": "Tactical Analysis Piece",
             "goal": "Write and publish deep analysis article"},
            {"week": "5-7", "project": "Player Comparison Tool",
             "goal": "Interactive radar chart comparison tool"},
            {"week": "8-10", "project": "xG Model from Scratch",
             "goal": "Train, evaluate, and document xG model"},
            {"week": "11-12", "project": "Portfolio Website",
             "goal": "Consolidate all work into professional site"}
        ]
        return pd.DataFrame(plan)

    def generate_readme_template(self, project_name):
        """Generate README template for GitHub project."""
        # Template lines stay flush-left because they are string content
        template = f"""# {project_name}
## Overview
Brief description of what this project does and why it's useful.
## Data Sources
- Source 1: [Link]
- Source 2: [Link]
## Methodology
Explain your approach and any key decisions.
## Key Findings
Highlight 2-3 interesting insights from your analysis.
## Usage
```bash
# Example command to run the project
python main.py --team "Arsenal" --season 2023
```
## Visualizations
Include sample outputs/visualizations.
## Future Improvements
- Improvement 1
- Improvement 2
## Contact
- Twitter: @yourhandle
- Email: your.email@example.com
"""
        return template


# Usage
tracker = PortfolioTracker()
print("Suggested First Project:")
print(tracker.suggest_first_project())

print("\n12-Week Portfolio Plan:")
print(tracker.create_12_week_plan().to_string(index=False))

# R: Portfolio project ideas generator
library(tidyverse)

# Portfolio project framework
portfolio_projects <- tribble(
  ~project_type, ~difficulty, ~skills_demonstrated, ~data_source, ~description,
  "Player Comparison Tool", "Beginner", "R/Python, Visualization", "FBref",
    "Build radar charts comparing players across multiple metrics",
  "xG Model from Scratch", "Intermediate", "ML, Statistics, Feature Engineering", "StatsBomb",
    "Train and evaluate your own expected goals model",
  "Recruitment Dashboard", "Intermediate", "SQL, Visualization, Domain Knowledge", "FBref/TM",
    "Interactive tool for filtering and comparing players for recruitment",
  "Match Prediction Model", "Advanced", "ML, Statistics, Model Evaluation", "Multiple",
    "Build and backtest a match outcome prediction system",
  "Tactical Analysis Piece", "Beginner", "Writing, Visualization, Tactics", "StatsBomb",
    "Deep dive analysis of a team's tactics or a player's style",
  "Pass Network Analysis", "Intermediate", "Network Analysis, Visualization", "StatsBomb",
    "Analyze team passing patterns using graph theory",
  "Shot Map Generator", "Beginner", "R/Python, ggplot/mplsoccer", "Any xG source",
    "Automated shot map creation with xG coloring",
  "Player Similarity Finder", "Advanced", "ML, Embeddings, Clustering", "FBref",
    "Find similar players using machine learning techniques"
)

# Function to suggest projects based on current skill level
suggest_projects <- function(skill_level = "Beginner", focus_area = NULL) {
  projects <- portfolio_projects %>%
    filter(difficulty == skill_level)

  if (!is.null(focus_area)) {
    projects <- projects %>%
      filter(str_detect(skills_demonstrated, focus_area))
  }

  projects %>%
    select(project_type, skills_demonstrated, description)
}

# Get suggestions
suggest_projects("Intermediate", "ML")

Suggested First Project:
project: Shot Map Generator
difficulty: Beginner
skills: Python, mplsoccer
hours_estimate: 6
impact: Low

12-Week Portfolio Plan:
week   project                  goal
1-2    Shot Map Generator       Build automated shot maps with xG coloring
3-4    Tactical Analysis Piece  Write and publish deep analysis article
5-7    Player Comparison Tool   Interactive radar chart comparison tool
8-10   xG Model from Scratch    Train, evaluate, and document xG model
11-12  Portfolio Website        Consolidate all work into professional site

Writing Effective Analysis
Written analysis is how you demonstrate both technical ability and communication skills. Follow this structure for compelling pieces:
- Hook: Start with an interesting question or finding
- Context: Why does this matter? Set the scene
- Methodology: Briefly explain your approach (don't overdo it)
- Findings: Present your analysis with clear visualizations
- So What: Explain the implications and actionable insights
- Limitations: Acknowledge what you can't conclude
The Job Search Process
Finding football analytics roles requires a targeted approach. Many positions are never advertised publicly, so networking and visibility matter as much as formal applications.
# Python: Job search and interview preparation
import random
from datetime import datetime, timedelta

import pandas as pd


class JobSearchManager:
    """Manage your football analytics job search."""

    def __init__(self):
        self.applications = []
        self.interview_prep = self._load_interview_prep()

    def add_application(self, company, role, source, contact=None):
        """Track a new job application."""
        self.applications.append({
            "company": company,
            "role": role,
            "date_applied": datetime.now(),
            "source": source,
            "status": "Applied",
            "contact": contact,
            # Default reminder: follow up one week after applying
            "follow_up_date": datetime.now() + timedelta(days=7)
        })

    def get_pending_follow_ups(self):
        """Get applications needing follow-up."""
        today = datetime.now()
        return [app for app in self.applications
                if app["follow_up_date"] <= today
                and app["status"] == "Applied"]

    def _load_interview_prep(self):
        """Load interview preparation questions."""
        return {
            "technical": [
                {
                    "question": "Explain how xG is calculated and its limitations",
                    "key_points": [
                        "Historical conversion rates for shot locations",
                        "Features: distance, angle, body part, situation",
                        "Limitations: ignores finishing skill and shot placement; "
                        "basic models omit defender positions"
                    ]
                },
                {
                    "question": "How would you identify underperforming players?",
                    "key_points": [
                        "Compare actual output to expected (Goals vs xG)",
                        "Consider sample size and regression to mean",
                        "Look at underlying process metrics not just outcomes"
                    ]
                },
                {
                    "question": "Write SQL to find top progressive passers",
                    "key_points": [
                        "Define progressive pass (moves ball X yards toward goal)",
                        "Normalize per 90 minutes",
                        "Filter for minimum minutes played"
                    ]
                }
            ],
            "football": [
                {
                    "question": "How would you analyze a high pressing team?",
                    "key_points": [
                        "PPDA (Passes Per Defensive Action)",
                        "High turnovers and counter-pressing success",
                        "Pressing triggers and trap zones",
                        "Physical and tactical demands"
                    ]
                },
                {
                    "question": "Evaluate a #6 for our system",
                    "key_points": [
                        "Understand team's tactical requirements",
                        "Defensive metrics: tackles, interceptions, positioning",
                        "Ball progression: progressive passes, carries",
                        "Passing range and press resistance"
                    ]
                }
            ],
            "behavioral": [
                {
                    "question": "Explain xG to a skeptical coach",
                    "key_points": [
                        "Start with their concerns",
                        "Use concrete match examples",
                        "Focus on decision-making, not replacing intuition",
                        "Acknowledge limitations upfront"
                    ]
                },
                {
                    "question": "How do you prioritize competing requests?",
                    "key_points": [
                        "Understand urgency and impact",
                        "Communicate timelines clearly",
                        "Quick wins vs deep analysis",
                        "Manage expectations"
                    ]
                }
            ]
        }

    def practice_interview(self, category="technical"):
        """Get random interview question to practice."""
        questions = self.interview_prep.get(category, [])
        if questions:
            return random.choice(questions)
        return None


# Usage
manager = JobSearchManager()
manager.add_application(
    company="Example FC",
    role="Data Analyst",
    source="LinkedIn",
    contact="John Smith"
)

print("Interview Practice - Technical Question:")
q = manager.practice_interview("technical")
print(f"\nQ: {q['question']}")
print("\nKey points to cover:")
for point in q["key_points"]:
    print(f"  - {point}")

# R: Job search strategy
library(tidyverse)

# Where to find football analytics jobs
job_sources <- tribble(
  ~source, ~type, ~volume, ~competition, ~tips,
  "LinkedIn", "Public Posting", "Medium", "Very High", "Set alerts for 'football analyst'",
  "Twitter/X", "Network/Informal", "High", "Medium", "Follow club analysts, engage with content",
  "Club Websites", "Direct Application", "Low", "High", "Check careers pages of target clubs",
  "Analytics Community", "Referral", "Medium", "Low", "Attend conferences, join Slack groups",
  "Data Providers", "Public Posting", "Medium", "High", "StatsBomb, Opta, Second Spectrum",
  "Speculative Apps", "Cold Outreach", "N/A", "Low", "Research clubs without analytics departments"
)

# Application tracking framework
create_job_tracker <- function() {
  tibble(
    company = character(),
    role = character(),
    date_applied = as.Date(character()),
    source = character(),
    status = character(),  # Applied, Phone Screen, Interview, Offer, Rejected
    contact_name = character(),
    notes = character(),
    follow_up_date = as.Date(character())
  )
}

# Interview preparation topics
interview_topics <- tribble(
  ~category, ~topic, ~example_question,
  "Technical", "xG Understanding", "Explain how xG is calculated and its limitations",
  "Technical", "SQL Query", "Write a query to find top 5 progressive passers",
  "Technical", "Statistics", "How would you determine if a player is underperforming?",
  "Technical", "Coding Challenge", "Build a shot map from this dataset",
  "Football", "Tactical Analysis", "How would you analyze a high press?",
  "Football", "Current Events", "What did you notice about [recent match]?",
  "Football", "Player Evaluation", "Assess this player for our squad needs",
  "Soft Skills", "Communication", "Explain xG to a coach who is skeptical",
  "Soft Skills", "Problem Solving", "How would you approach [ambiguous project]?",
  "Soft Skills", "Prioritization", "Multiple stakeholders want different things..."
)

Interview Practice - Technical Question:

Q: How would you identify underperforming players?

Key points to cover:
  - Compare actual output to expected (Goals vs xG)
  - Consider sample size and regression to mean
  - Look at underlying process metrics not just outcomes

Networking and Community
Football analytics has a vibrant community. Building genuine connections can open doors that job applications alone cannot.
# Python: Networking strategy
from datetime import datetime, timedelta
from typing import Optional

import pandas as pd


class NetworkingStrategy:
    """Strategic approach to building analytics network."""

    def __init__(self):
        self.connections = []
        self.activities = []
        self.communities = {
            "slack": {
                "name": "Football Slices",
                "focus": "General football analytics discussion",
                "activity": "Daily engagement, share work"
            },
            "twitter": {
                "name": "Football Analytics Twitter",
                "focus": "Quick insights, visualizations",
                "activity": "Post 2-3x/week, engage daily"
            },
            "conferences": {
                "name": "OptaPro, StatsBomb, Sloan",
                "focus": "Professional networking",
                "activity": "Attend 1-2 per year"
            },
            "meetups": {
                "name": "Local analytics meetups",
                "focus": "In-person connections",
                "activity": "Monthly attendance"
            }
        }

    def log_activity(self, activity_type: str, platform: str,
                     description: str, connection: Optional[str] = None):
        """Log networking activity."""
        self.activities.append({
            "date": datetime.now(),
            "type": activity_type,
            "platform": platform,
            "description": description,
            "connection": connection
        })

    def weekly_engagement_plan(self) -> dict:
        """Generate weekly networking plan."""
        return {
            "monday": [
                "Share interesting viz or insight on Twitter",
                "Engage with 3 analytics posts"
            ],
            "tuesday": [
                "Post question or discussion in Slack",
                "Comment on recent analysis article"
            ],
            "wednesday": [
                "DM one new connection with genuine question",
                "Share relevant paper or resource"
            ],
            "thursday": [
                "Write short thread on recent analysis",
                "Attend virtual meetup/seminar if available"
            ],
            "friday": [
                "Share portfolio project update",
                "Summarize week's learnings"
            ],
            "weekend": [
                "Watch match and live-tweet observations",
                "Plan next week's content"
            ]
        }

    def outreach_template(self, connection_name: str,
                          their_work: str, your_question: str) -> str:
        """Generate genuine outreach message."""
        # Bracketed placeholders are meant to be filled in by hand
        return f"""Hi {connection_name},
I really enjoyed your recent work on {their_work}. The approach to
[specific detail] was particularly interesting.
I'm working on a similar problem and wondered: {your_question}
No pressure to respond - I know you're busy. Just wanted to share
my appreciation for your work.
Best,
[Your name]
P.S. Here's my recent analysis if you're curious: [link]"""

    def conference_prep(self, conference_name: str) -> list:
        """Prepare for analytics conference."""
        return [
            f"Research {conference_name} speakers and their recent work",
            "Prepare 30-second intro of yourself and your work",
            "Bring business cards or prepare digital contact sharing",
            "List 5 specific people you want to meet",
            "Prepare 2-3 thoughtful questions for panels",
            "Plan social media engagement during event",
            "Schedule follow-up messages within 48 hours"
        ]


# Mentorship framework
class MentorshipFinder:
    """Find and approach potential mentors."""

    def identify_potential_mentors(self, your_goals: list) -> list:
        """Identify mentors based on career goals."""
        mentor_types = {
            "club_analyst": [
                "Current club analysts 2-3 years ahead",
                "Former analysts now in senior roles",
                "Heads of analytics departments"
            ],
            "data_science": [
                "Sports data scientists at tech companies",
                "Academic researchers in sports analytics",
                "Data provider employees"
            ],
            "media": [
                "Analytics journalists at major outlets",
                "Podcasters covering analytics",
                "Visualization specialists"
            ]
        }

        # Goals without a matching key are silently skipped
        suggestions = []
        for goal in your_goals:
            if goal in mentor_types:
                suggestions.extend(mentor_types[goal])
        return suggestions

    def approach_template(self) -> str:
        """Template for mentor outreach."""
        return """Subject: Quick Question - Aspiring Analyst
Hi [Name],
I've been following your work for [time period] and particularly
admire [specific work/insight]. Your path from [their background]
to [current role] resonates with where I'm trying to go.
I'm currently [your situation] and working on [your projects].
I'd love to ask you one question: [single, specific question].
I understand you're busy, so even a brief response would be
invaluable. Happy to share more context if helpful.
Thank you for inspiring the community with your work.
Best,
[Your name]
[Link to your best public work]"""


networking = NetworkingStrategy()
print("Weekly Engagement Plan:")
for day, tasks in networking.weekly_engagement_plan().items():
    print(f"\n{day.title()}:")
    for task in tasks:
        print(f"  - {task}")

# R: Community engagement tracking
library(tidyverse)

# Key communities and resources
analytics_communities <- tribble(
  ~community, ~platform, ~focus, ~engagement_tip,
  "Football Slices Slack", "Slack", "General analytics", "Share work, ask questions",
  "OptaPro Forum", "In-person/virtual", "Professional networking", "Submit papers, attend conference",
  "StatsBomb Conference", "Annual event", "Industry insights", "Network at breaks",
  "Twitter/X Analytics", "Social media", "Quick insights, viz", "Engage with analysts' work",
  "r/FantasyPL", "Reddit", "FPL-focused", "Share models, help community",
  "Friends of Tracking", "YouTube/Academic", "Tracking data", "Watch seminars, contribute"
)

# Key people to follow (by area)
key_follows <- tribble(
  ~name, ~handle, ~area, ~reason,
  "Ted Knutson", "@mixedknuts", "Industry", "StatsBomb founder, hiring insights",
  "Tom Worville", "@Worville", "Club analytics", "Great visualizations, career advice",
  "Grace Robertson", "@GraceOnFootball", "Media analytics", "Athletic, presentation skills",
  "David Sumpter", "@Soccermatics", "Academic", "Research perspective, tutorials",
  "Jan Van Haaren", "@JanVanHaaren", "Academic/Club", "Research and practical insights"
)

# Engagement tracker
create_engagement_tracker <- function() {
  tibble(
    date = as.Date(character()),
    activity_type = character(),  # Post, Comment, DM, Event
    platform = character(),
    description = character(),
    connection_made = character()
  )
}

Weekly Engagement Plan:

Monday:
  - Share interesting viz or insight on Twitter
  - Engage with 3 analytics posts

Tuesday:
  - Post question or discussion in Slack
  - Comment on recent analysis article

Wednesday:
  - DM one new connection with genuine question
  - Share relevant paper or resource

Thursday:
  - Write short thread on recent analysis
  - Attend virtual meetup/seminar if available

Friday:
  - Share portfolio project update
  - Summarize week's learnings

Weekend:
  - Watch match and live-tweet observations
  - Plan next week's content

Alternative Paths Into Analytics
There's no single route into football analytics. Understanding different paths helps you leverage your unique background.
Background: CS, statistics, data science
Advantage: Strong technical foundation
Gap to fill: Domain knowledge, football understanding
Strategy: Build football-specific projects, watch matches analytically, learn the language of the game
Background: Coaching, playing, scouting
Advantage: Deep football knowledge
Gap to fill: Technical skills
Strategy: Learn Python/R basics, start with simple analysis, translate football knowledge to data questions
Background: Journalism, content creation
Advantage: Communication skills
Gap to fill: Technical skills, methodology
Strategy: Add data literacy, collaborate with technical analysts, focus on storytelling with data
# Python: Career path planner
import pandas as pd


class CareerPathPlanner:
    """Plan transition into football analytics."""

    def __init__(self, background: str):
        self.background = background
        self.paths = self._define_paths()

    def _define_paths(self) -> dict:
        return {
            "technical": {
                "strengths": ["Programming", "Statistics", "ML"],
                "gaps": ["Football knowledge", "Communication to non-tech"],
                "recommended_focus": [
                    "Watch 3+ matches per week analytically",
                    "Read tactical blogs (Spielverlagerung, etc.)",
                    "Build football-specific projects",
                    "Practice explaining concepts simply"
                ],
                "timeline": "6-12 months to job-ready"
            },
            "football": {
                "strengths": ["Domain knowledge", "Stakeholder understanding"],
                "gaps": ["Programming", "Statistics"],
                "recommended_focus": [
                    "Python basics (3 months minimum)",
                    "Statistics fundamentals",
                    "Start with video analysis tools",
                    "Translate coaching questions to data questions"
                ],
                "timeline": "12-18 months to job-ready"
            },
            "media": {
                "strengths": ["Communication", "Storytelling", "Audience sense"],
                "gaps": ["Technical depth", "Methodology rigor"],
                "recommended_focus": [
                    "Learn data literacy fundamentals",
                    "Collaborate with technical analysts",
                    "Focus on data visualization",
                    "Build hybrid skill set"
                ],
                "timeline": "6-12 months to job-ready"
            },
            "academic": {
                "strengths": ["Research methods", "Writing", "Deep analysis"],
                "gaps": ["Practical application", "Speed/deadlines"],
                "recommended_focus": [
                    "Build applied projects",
                    "Practice quick-turnaround analysis",
                    "Engage with industry practitioners",
                    "Translate research to actionable insights"
                ],
                "timeline": "3-6 months to job-ready"
            }
        }

    def get_personalized_plan(self) -> dict:
        """Get plan based on background."""
        if self.background in self.paths:
            return self.paths[self.background]
        return self.paths["technical"]  # Default

    def create_6_month_roadmap(self) -> list:
        """Create 6-month transition roadmap."""
        # The roadmap is the same for every background
        return [
            {
                "month": "1-2",
                "focus": "Foundation",
                "tasks": [
                    "Complete basic Python/R course",
                    "Set up GitHub and start documenting learning",
                    "Begin watching matches with analytical focus",
                    "Join Football Slices Slack"
                ]
            },
            {
                "month": "3-4",
                "focus": "First Projects",
                "tasks": [
                    "Complete first portfolio project (shot map or simple viz)",
                    "Write first analysis blog post",
                    "Engage regularly on Twitter with analytics community",
                    "Learn StatsBomb open data"
                ]
            },
            {
                "month": "5-6",
                "focus": "Intermediate + Visibility",
                "tasks": [
                    "Complete intermediate project (xG model or player comparison)",
                    "Submit analysis to blog/publication",
                    "Start applying for entry-level roles or internships",
                    "Attend first analytics event (virtual or in-person)"
                ]
            }
        ]

    def identify_role_fit(self) -> list:
        """Suggest best-fit roles based on background."""
        role_suggestions = {
            "technical": [
                "Data Analyst at data provider",
                "ML Engineer at betting company",
                "Analytics developer at club"
            ],
            "football": [
                "Performance Analyst",
                "Video Analyst (with analytics focus)",
                "Scout with data component"
            ],
            "media": [
                "Analytics writer/journalist",
                "Content creator at data company",
                "Visualization specialist"
            ],
            "academic": [
                "Research role at federation",
                "R&D at data provider",
                "Consulting for clubs"
            ]
        }
        return role_suggestions.get(self.background, role_suggestions["technical"])


# Example usage
planner = CareerPathPlanner("technical")
print("Personalized Plan for Technical Background:")
plan = planner.get_personalized_plan()
print(f"\nStrengths: {plan['strengths']}")
print(f"Gaps to fill: {plan['gaps']}")
print(f"Timeline: {plan['timeline']}")

print("\n6-Month Roadmap:")
for phase in planner.create_6_month_roadmap():
    print(f"\nMonth {phase['month']}: {phase['focus']}")
    for task in phase["tasks"]:
        print(f"  - {task}")

# R: Career path analysis
library(tidyverse)
# Example career paths
career_paths <- tribble(
~person, ~start, ~transition, ~current, ~years_total,
"Analyst A", "PhD Statistics", "Hobby projects + Twitter", "Head of Analytics", 6,
"Analyst B", "Youth Coach", "UEFA B + self-taught Python", "First Team Analyst", 4,
"Analyst C", "Software Engineer", "Side projects + networking", "Data Provider", 3,
"Analyst D", "Sports Journalist", "Learn R + public analysis", "Analytics Writer", 2,
"Analyst E", "Economics Graduate", "Masters + internship", "Club Data Analyst", 3,
"Analyst F", "No degree", "Blog + Twitter + persistence", "Freelance Analyst", 5
)
# Analyze common factors in successful transitions
success_factors <- tribble(
~factor, ~importance, ~time_investment,
"Public work portfolio", "Essential", "6-12 months",
"Technical skills (Python/R)", "Essential", "3-6 months minimum",
"Network in analytics community", "Very High", "Ongoing",
"Football domain knowledge", "Essential", "Background + ongoing",
"Persistence and patience", "Essential", "1-3 years typical",
"Formal education", "Helpful but not required", "Variable"
)

Personalized Plan for Technical Background:

Strengths: ['Programming', 'Statistics', 'ML']
Gaps to fill: ['Football knowledge', 'Communication to non-tech']
Timeline: 6-12 months to job-ready

6-Month Roadmap:

Month 1-2: Foundation
 - Complete basic Python/R course
 - Set up GitHub and start documenting learning
 - Begin watching matches with analytical focus
 - Join Football Slices Slack

Month 3-4: First Projects
 - Complete first portfolio project (shot map or simple viz)
 - Write first analysis blog post
 - Engage regularly on Twitter with analytics community
 - Learn StatsBomb open data

Month 5-6: Intermediate + Visibility
 - Complete intermediate project (xG model or player comparison)
 - Submit analysis to blog/publication
 - Start applying for entry-level roles or internships
 - Attend first analytics event (virtual or in-person)

Technical Interview Challenges
Many football analytics interviews include technical challenges. Here are common types and how to prepare for them.
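One way to prepare is to rehearse each challenge type on data small enough to check by hand. The sketch below does this for a progressive-passes-per-90 task. It is a minimal illustration, not a reference solution: the toy rows, the column names (`player`, `start_x`, `end_x`, `outcome`, `minutes_played`), the 0-100 pitch scale, and the helper `progressive_passes_p90` with its thresholds are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data: column names and the 0-100 pitch scale are assumptions
passes = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "start_x": [20, 40, 85, 10, 50, 30],
    "end_x": [35, 45, 97, 15, 70, 55],
    "outcome": ["Complete", "Complete", "Complete",
                "Incomplete", "Complete", "Complete"],
})
minutes = pd.DataFrame({"player": ["A", "B"], "minutes_played": [180, 90]})


def progressive_passes_p90(passes, minutes, min_gain=10, max_start_x=80):
    """Count completed passes that gain at least `min_gain` and scale per 90."""
    is_progressive = (
        (passes["end_x"] - passes["start_x"] >= min_gain)
        & (passes["start_x"] < max_start_x)
        & (passes["outcome"] == "Complete")
    )
    counts = (
        passes[is_progressive]
        .groupby("player")
        .size()
        .rename("progressive_passes")
        .reset_index()
    )
    # Sum minutes per player first so each player has exactly one row
    total_minutes = minutes.groupby("player", as_index=False)["minutes_played"].sum()
    # Left join keeps players with minutes but no progressive passes
    out = total_minutes.merge(counts, on="player", how="left")
    out["progressive_passes"] = out["progressive_passes"].fillna(0).astype(int)
    out["prog_pass_p90"] = (
        out["progressive_passes"] * 90 / out["minutes_played"]
    ).round(2)
    return out.sort_values("prog_pass_p90", ascending=False).reset_index(drop=True)


result = progressive_passes_p90(passes, minutes)
print(result)
```

With six rows, every number in the result can be verified by counting passes manually, which is also a useful habit to demonstrate in a live interview.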
# Python: Interview challenge examples
import textwrap

import pandas as pd
import numpy as np


class InterviewChallenges:
    """Common football analytics interview challenges."""

    @staticmethod
    def challenge_1_pass_zones(events: pd.DataFrame) -> pd.DataFrame:
        """
        Challenge: Calculate pass completion by pitch zone.
        Input: Event data with columns (type, x, outcome), x on a 0-100 scale
        Output: Summary by defensive/middle/final third
        """
        passes = events[events["type"] == "Pass"].copy()
        # include_lowest=True keeps passes played from x == 0 in the first bin
        passes["zone"] = pd.cut(
            passes["x"],
            bins=[0, 33, 66, 100],
            labels=["Defensive Third", "Middle Third", "Final Third"],
            include_lowest=True
        )
        summary = passes.groupby("zone").agg(
            total_passes=("type", "count"),
            completed=("outcome", lambda x: (x == "Complete").sum())
        ).reset_index()
        summary["completion_pct"] = (
            summary["completed"] / summary["total_passes"] * 100
        ).round(1)
        return summary

    @staticmethod
    def challenge_2_xg_calculation(shots: pd.DataFrame) -> pd.DataFrame:
        """
        Challenge: Build simple xG model and calculate team totals.
        Input: Shot data with columns (type, team, x, y, body_part, outcome)
        Demonstrates: Feature engineering, model thinking
        """
        shots = shots.copy()  # avoid mutating the caller's DataFrame
        # Shot features relative to the centre of the goal (0-100 pitch scale)
        goal_x, goal_y = 100, 50
        shots["distance"] = np.sqrt(
            (goal_x - shots["x"])**2 + (goal_y - shots["y"])**2
        )
        # Absolute angle (radians) away from a straight-on shot: 0 = central
        shots["angle"] = np.abs(np.arctan2(
            goal_y - shots["y"],
            goal_x - shots["x"]
        ))
        # Simple linear xG heuristic (production would use a trained model);
        # headers carry a small penalty, and values are floored at 0
        is_header = (shots["body_part"] == "Head").astype(int)
        shots["xg"] = np.maximum(
            0,
            0.5 - shots["distance"] * 0.015
            - shots["angle"] * 0.1
            - is_header * 0.03
        )
        # Aggregate by team
        team_xg = shots.groupby("team").agg(
            shots=("type", "count"),
            goals=("outcome", lambda x: (x == "Goal").sum()),
            total_xg=("xg", "sum")
        ).reset_index()
        team_xg["xg_difference"] = team_xg["goals"] - team_xg["total_xg"]
        return team_xg

    @staticmethod
    def challenge_3_sql_query() -> str:
        """
        Challenge: Write SQL for top progressive passers.
        Tests: SQL proficiency, metric understanding
        """
        return """
        WITH player_minutes AS (
            SELECT player_id, SUM(minutes_played) as total_minutes
            FROM minutes
            GROUP BY player_id
            HAVING SUM(minutes_played) >= 900
        )
        SELECT
            p.player_name,
            t.team_name,
            COUNT(*) as progressive_passes,
            ROUND(COUNT(*) * 90.0 / pm.total_minutes, 2) as prog_pass_p90
        FROM passes pa
        JOIN players p ON pa.player_id = p.player_id
        JOIN teams t ON pa.team_id = t.team_id
        JOIN player_minutes pm ON pa.player_id = pm.player_id
        WHERE pa.end_x - pa.start_x >= 10
            AND pa.start_x < 80  -- Doesn't start too close to goal
            AND pa.outcome = 'Complete'
        GROUP BY p.player_id, p.player_name, t.team_name, pm.total_minutes
        ORDER BY prog_pass_p90 DESC
        LIMIT 20;
        """

    @staticmethod
    def challenge_4_explain_concept() -> dict:
        """
        Challenge: Explain xG to a skeptical coach.
        Tests: Communication, stakeholder management
        """
        return {
            "opening": """
            I understand the concern about reducing football to numbers.
            xG isn't meant to replace what you see - it's meant to add
            context to help make decisions.
            """,
            "analogy": """
            Think of it like this: if a striker misses a chance from
            6 yards out, you know that's a bad miss. xG just puts a
            number on HOW bad - that was probably a 0.75 xG chance,
            meaning historically 75% of similar shots go in.
            """,
            "practical_use": """
            Where I've seen it help coaches:
            1. Identifying if finishing is sustainable (high goals, low xG)
            2. Seeing if the team creates enough quality chances
            3. Comparing attackers beyond just goal counts
            """,
            "limitations": """
            What xG doesn't tell us:
            - Whether the finish was good or bad execution
            - Defender positioning (in basic models)
            - Game context and pressure
            That's where your expertise fills the gap.
            """,
            "close": """
            I'd love to show you some examples from recent matches
            where xG helped identify something that wasn't obvious
            from the result. What match should we look at?
            """
        }

    @staticmethod
    def challenge_5_visualization(events: pd.DataFrame):
        """
        Challenge: Create an insightful visualization.
        Input: Event data with columns (type, x, y, outcome, xg)
        Tests: Data viz skills, football understanding
        """
        import matplotlib.pyplot as plt
        from mplsoccer import Pitch

        shots = events[events["type"] == "Shot"]
        # pitch_type must match the coordinate system of the event data
        pitch = Pitch(pitch_type="statsbomb")
        fig, ax = pitch.draw(figsize=(12, 8))
        # Plot shots colored by outcome, sized by xG
        goals = shots[shots["outcome"] == "Goal"]
        misses = shots[shots["outcome"] != "Goal"]
        pitch.scatter(
            misses["x"], misses["y"],
            s=misses["xg"] * 500,
            c="gray", alpha=0.5,
            edgecolors="black",
            ax=ax, label="No Goal"
        )
        pitch.scatter(
            goals["x"], goals["y"],
            s=goals["xg"] * 500,
            c="red", alpha=0.8,
            edgecolors="black",
            ax=ax, label="Goal"
        )
        ax.legend()
        ax.set_title("Shot Map with xG (size = expected goals)")
        return fig


# Practice challenges
challenges = InterviewChallenges()

# Example: Explain xG structure
print("Challenge 4 - Explain xG to Coach:")
explanation = challenges.challenge_4_explain_concept()
for section, content in explanation.items():
    print(f"\n{section.upper()}:")
    # dedent removes the source indentation inside the triple-quoted strings
    print(textwrap.dedent(content).strip())

# R: Common interview challenges
library(tidyverse)
# Challenge 1: Data manipulation task
# "Given match event data, calculate pass completion by zone"
solve_pass_zones <- function(events) {
events %>%
filter(type == "Pass") %>%
mutate(
zone = case_when(
x < 33 ~ "Defensive Third",
x < 66 ~ "Middle Third",
TRUE ~ "Final Third"
)
) %>%
group_by(zone) %>%
summarise(
total_passes = n(),
completed = sum(outcome == "Complete"),
completion_pct = completed / total_passes * 100,
.groups = "drop"
)
}
# Challenge 2: SQL query task
# "Write a query to find players with most progressive passes"
progressive_pass_query <- "
-- Sum minutes per player first. Joining per-match minutes to every pass row
-- would count a match's minutes once per pass and deflate the per-90 rate.
WITH player_minutes AS (
  SELECT player_id, SUM(minutes_played) AS total_minutes
  FROM minutes
  GROUP BY player_id
  HAVING SUM(minutes_played) >= 900  -- Minimum 10 full matches
)
SELECT
  p.player_name,
  t.team_name,
  COUNT(*) AS progressive_passes,
  ROUND(COUNT(*) * 90.0 / pm.total_minutes, 2) AS prog_per_90
FROM passes pa
JOIN players p ON pa.player_id = p.player_id
JOIN teams t ON pa.team_id = t.team_id
JOIN player_minutes pm ON pa.player_id = pm.player_id
WHERE pa.end_x - pa.start_x >= 10  -- Moves ball 10+ yards forward
  AND pa.end_x >= 60               -- Ends in opposition half (120-yard pitch)
  AND pa.outcome = 'Complete'
GROUP BY p.player_id, p.player_name, t.team_name, pm.total_minutes
ORDER BY prog_per_90 DESC
LIMIT 20
"
# Challenge 3: Explain methodology
explain_xg <- function() {
"
xG (Expected Goals) quantifies shot quality by estimating
the probability of a shot resulting in a goal.
Key features:
- Shot location (distance and angle to goal)
- Body part (foot vs header)
- Type of assist (through ball, cross, etc.)
- Shot situation (open play vs set piece)
Model training:
- Historical shots labeled with outcome (goal/no goal)
- Logistic regression or gradient boosting
- Validated on held-out data
Limitations:
- Doesn't capture shot execution quality
- Pre-shot movement not included in basic models
- Small sample sizes at individual level
Use cases:
- Evaluating finishing (goals vs xG)
- Assessing chance creation (team xG created)
- Transfer valuation (underlying attacking quality)
"
}

Challenge 4 - Explain xG to Coach:

OPENING:
I understand the concern about reducing football to numbers.
xG isn't meant to replace what you see - it's meant to add
context to help make decisions.

ANALOGY:
Think of it like this: if a striker misses a chance from
6 yards out, you know that's a bad miss. xG just puts a
number on HOW bad - that was probably a 0.75 xG chance,
meaning historically 75% of similar shots go in.

PRACTICAL_USE:
Where I've seen it help coaches:
1. Identifying if finishing is sustainable (high goals, low xG)
2. Seeing if the team creates enough quality chances
3. Comparing attackers beyond just goal counts

Career Progression and Negotiation
Understanding career progression helps you plan long-term and negotiate effectively.
| Level | Typical Title | Experience | Responsibilities | Skills Focus |
|---|---|---|---|---|
| Entry | Junior Analyst / Intern | 0-2 years | Data prep, visualizations, ad-hoc requests | Technical execution, learning systems |
| Mid | Data Analyst | 2-4 years | Independent analysis, stakeholder presentations | Project ownership, communication |
| Senior | Senior Analyst | 4-6 years | Complex projects, methodology development | Strategic thinking, mentoring |
| Lead | Lead / Principal Analyst | 6-8 years | Team leadership, cross-department influence | Leadership, organizational impact |
| Executive | Head of Analytics / Director | 8+ years | Department strategy, budget, executive influence | Vision, business acumen, politics |
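
The experience bands in the table can be encoded as a small lookup, which is handy when benchmarking your own position or tagging job postings by level. This is a minimal sketch: the `Level` tuple and `level_for_experience` helper are hypothetical names, and because the table's bands share their boundary years (2, 4, 6, 8), the code assumes a boundary year counts toward the higher level.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    name: str
    min_years: float
    max_years: Optional[float]  # None means open-ended (8+ years)


# Bands copied from the progression table above
LEVELS = [
    Level("Entry", 0, 2),
    Level("Mid", 2, 4),
    Level("Senior", 4, 6),
    Level("Lead", 6, 8),
    Level("Executive", 8, None),
]


def level_for_experience(years: float) -> str:
    """Return the level whose band contains `years`.

    Assumption: a boundary year (e.g. exactly 2) maps to the higher level.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    for level in LEVELS:
        if level.max_years is None or years < level.max_years:
            return level.name
    raise AssertionError("unreachable: last level is open-ended")


for years in (0, 2, 5.5, 8, 12):
    print(f"{years} years -> {level_for_experience(years)}")
```

Years of experience are only a rough proxy; the Responsibilities and Skills Focus columns describe what actually distinguishes one level from the next.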
# Python: Negotiation and career tools
class CareerPlanner:
    """Plan and track career progression."""

    def __init__(self):
        # Illustrative base salary ranges in GBP (min, max)
        self.salary_data = {
            "Junior Analyst": (25000, 35000),
            "Data Analyst": (35000, 50000),
            "Senior Analyst": (50000, 70000),
            "Lead Analyst": (65000, 90000),
            "Head of Analytics": (80000, 150000)
        }
        self.location_multipliers = {
            "London": 1.20,
            "Manchester": 1.00,
            "Liverpool": 0.95,
            "Europe (top league city)": 1.10,
            "USA (major market)": 1.50,
            "Remote": 0.90
        }

    def estimate_salary(self, role: str, location: str) -> dict:
        """Estimate salary range for role and location."""
        base = self.salary_data.get(role, (30000, 45000))
        multiplier = self.location_multipliers.get(location, 1.0)
        # round() rather than int() so float error cannot truncate a pound
        return {
            "role": role,
            "location": location,
            "min_salary": round(base[0] * multiplier),
            "max_salary": round(base[1] * multiplier),
            # Target 10% above the midpoint of the adjusted range
            "negotiation_target": round((base[0] + base[1]) / 2 * multiplier * 1.1)
        }

    def negotiation_prep(self, current_offer: int, role: str,
                         location: str) -> dict:
        """Prepare for salary negotiation."""
        estimate = self.estimate_salary(role, location)
        return {
            "offer": current_offer,
            "market_range": (estimate["min_salary"], estimate["max_salary"]),
            "offer_position": self._offer_position(current_offer, estimate),
            "counter_suggestion": round(current_offer * 1.10),  # 10% counter
            "talking_points": [
                "Market research shows similar roles pay £X-£Y",
                "My specific skills in [skill] add value because...",
                "I've demonstrated impact through [portfolio work]",
                "I'm committed long-term and want to grow with the org"
            ],
            "non_salary_items": [
                "Conference attendance budget",
                "Professional development time",
                "Remote work flexibility",
                "Performance bonus structure",
                "Equipment/software budget"
            ]
        }

    def _offer_position(self, offer: int, estimate: dict) -> str:
        """Determine where offer falls in range."""
        min_s, max_s = estimate["min_salary"], estimate["max_salary"]
        if offer < min_s:
            return "Below market - strong case for counter"
        elif offer < (min_s + max_s) / 2:
            return "Lower half of range - room to negotiate"
        elif offer < max_s:
            return "Above average - negotiate smaller items"
        else:
            return "Top of range - focus on non-salary"

    def promotion_readiness(self, current_level: str) -> dict:
        """Assess readiness for next level."""
        requirements = {
            "Junior Analyst": {
                "next_level": "Data Analyst",
                "requirements": [
                    "2+ years experience",
                    "3+ completed independent projects",
                    "Positive stakeholder feedback",
                    "Basic mentoring of interns"
                ],
                "evidence_needed": [
                    "Portfolio of work",
                    "Written feedback from stakeholders",
                    "Examples of impact on decisions"
                ]
            },
            "Data Analyst": {
                "next_level": "Senior Analyst",
                "requirements": [
                    "4+ years experience",
                    "Led major project end-to-end",
                    "Developed new methodology or tool",
                    "Cross-team collaboration"
                ],
                "evidence_needed": [
                    "Case study of major project impact",
                    "Documentation of methodology contribution",
                    "Peer recognition"
                ]
            },
            "Senior Analyst": {
                "next_level": "Lead Analyst",
                "requirements": [
                    "6+ years experience",
                    "Managed/mentored junior analysts",
                    "Influenced department strategy",
                    "External visibility (conferences, papers)"
                ],
                "evidence_needed": [
                    "Team development examples",
                    "Strategic recommendations adopted",
                    "Industry recognition"
                ]
            }
        }
        # Returns an empty dict for levels without a defined next step
        return requirements.get(current_level, {})


# Usage example
planner = CareerPlanner()

# Salary estimate
estimate = planner.estimate_salary("Data Analyst", "London")
print("Salary Estimate for Data Analyst in London:")
print(f"  Range: £{estimate['min_salary']:,} - £{estimate['max_salary']:,}")
print(f"  Negotiation target: £{estimate['negotiation_target']:,}")

# Negotiation prep
prep = planner.negotiation_prep(40000, "Data Analyst", "London")
print(f"\nOffer position: {prep['offer_position']}")
print(f"Suggested counter: £{prep['counter_suggestion']:,}")

# R: Career planning framework
library(tidyverse)
# Define career goals and track progress
career_tracker <- tribble(
~goal_type, ~goal, ~target_date, ~status, ~notes,
"Skill", "Advanced SQL", "2024-03", "Complete", "Passed Mode Analytics cert",
"Portfolio", "xG Model", "2024-04", "In Progress", "70% complete",
"Network", "Attend OptaPro", "2024-09", "Planned", "Submitted abstract",
"Application", "Apply to 10 clubs", "2024-06", "In Progress", "4/10 sent",
"Interview", "Practice mock interviews", "2024-05", "Not Started", ""
)
# Salary research framework
research_salary <- function(role, location, experience_years) {
  # experience_years is not used yet; the role title already implies a band
  # Base ranges (GBP, adjust for location)
  base_ranges <- tribble(
    ~role, ~min_salary, ~max_salary,
    "Junior Analyst", 25000, 35000,
    "Data Analyst", 35000, 50000,
    "Senior Analyst", 50000, 70000,
    "Lead Analyst", 65000, 90000,
    "Head of Analytics", 80000, 150000
  )
  # Location adjustments (same labels and multipliers as the Python version)
  location_multiplier <- case_when(
    location == "London" ~ 1.2,
    location == "Manchester" ~ 1.0,
    location == "Liverpool" ~ 0.95,
    location == "Europe (top league city)" ~ 1.1,
    location == "USA (major market)" ~ 1.5,
    location == "Remote" ~ 0.9,
    TRUE ~ 1.0
  )
  base_ranges %>%
    filter(role == !!role) %>%
    mutate(
      adjusted_min = min_salary * location_multiplier,
      adjusted_max = max_salary * location_multiplier
    )
}

Salary Estimate for Data Analyst in London:
  Range: £42,000 - £60,000
  Negotiation target: £56,100

Offer position: Below market - strong case for counter
Suggested counter: £44,000

Day-to-Day Reality
Understanding what analysts actually do helps set realistic expectations and prepare for the role.
A typical match week:
- Monday: Post-match analysis, player load data
- Tuesday: Opposition report preparation
- Wednesday: Training data collection
- Thursday: Present opposition analysis to staff
- Friday: Final prep, set piece analysis
- Saturday: Match day - live data support
- Sunday: Initial match review
Common challenges:
- Tight deadlines around match schedules
- Communicating with non-technical stakeholders
- Data quality and availability issues
- Balancing quick requests vs deep projects
- Getting buy-in from traditional staff
- Working weekends and irregular hours
Practice Exercises
Career Development Tasks
Complete these exercises to advance your analytics career:
Complete a thorough self-assessment of your technical, domain, and soft skills. Identify your top 3 gaps and create a 90-day learning plan to address them.
Complete one portfolio project from scratch. Publish the code on GitHub with documentation and write an accompanying blog post explaining your methodology and findings.
Practice answering 5 technical and 3 behavioral interview questions out loud. Record yourself and review for clarity and confidence.
Identify 10 football analytics professionals to follow on Twitter. Engage meaningfully with their content for one month. Attend one virtual or in-person analytics event.
Summary
Key Takeaways
- Football analytics roles exist across clubs, media, betting, and data providers
- Essential skills: Python/R, SQL, statistics, football knowledge, communication
- A public portfolio of work often carries more weight than formal credentials
- Networking is crucial - many roles aren't publicly advertised
- Prepare for interviews with both technical and behavioral questions
- The job involves tight deadlines, weekend work, and communicating with non-technical staff
Building a career in football analytics requires patience, consistent skill development, and visibility through public work. In the next chapter, we'll explore analytics applications for broadcasters and media.