Capstone - Complete Analytics System
Learning Objectives
- Understand the relationship between video and data analysis
- Build automated video tagging systems using event data
- Create video playlists from statistical queries
- Integrate video clips into scouting reports
- Design video-data workflows for coaching staff
Data tells you what happened; video shows you how and why. The most effective football analytics combines statistical insights with video evidence. This chapter explores how to bridge the gap between numbers and footage, creating workflows that leverage both for maximum impact.
The Video-Data Connection
Modern football clubs have access to vast amounts of video footage—every match, every training session, every youth game. The challenge isn't access to video; it's finding the right moments efficiently. Data analysis provides the filter; video provides the context.
Data tells you:
- What events occurred
- Where on the pitch
- When in the match
- Statistical outcomes (xG, etc.)
- Patterns across matches
Video tells you:
- How actions were executed
- Body positioning and technique
- Off-ball movements
- Decision-making process
- Tactical context
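To make the link concrete before building the full framework, here is a minimal sketch that turns one event record into a video review window. The event dict and buffer values are illustrative, not any provider's schema.

```python
# Minimal sketch: turn one event record into a video review window.
def event_to_window(event, buffer_before=5.0, buffer_after=3.0):
    """Map an event's match clock to a (start, end) window in seconds."""
    t = event["minute"] * 60 + event.get("second", 0)
    return max(0.0, t - buffer_before), t + buffer_after

event = {"minute": 23, "second": 41, "type": "Shot", "player": "A. Example"}
start, end = event_to_window(event)
print(f"Review {event['type']} by {event['player']}: "
      f"{int(start // 60):02d}:{int(start % 60):02d}-"
      f"{int(end // 60):02d}:{int(end % 60):02d}")
# → Review Shot by A. Example: 23:36-23:44
```

The classes below generalize exactly this idea: a clip is a window around an event, carrying the event's metadata as searchable tags.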
# Python: Video-data integration framework
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import pandas as pd

@dataclass
class VideoClip:
    """Represents a single video clip linked to event data."""
    clip_id: str
    match_id: str
    start_time: float  # seconds from match start
    end_time: float
    event_type: Optional[str] = None
    player: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    video_path: Optional[str] = None
    thumbnail_path: Optional[str] = None

    def format_timestamp(self) -> str:
        """Format start time as MM:SS."""
        minutes = int(self.start_time // 60)
        seconds = int(self.start_time % 60)
        return f"{minutes:02d}:{seconds:02d}"

    @property
    def duration(self) -> float:
        """Calculate clip duration in seconds."""
        return self.end_time - self.start_time

    def add_tag(self, tag: str):
        """Add a tag to the clip."""
        if tag not in self.tags:
            self.tags.append(tag)

    def to_dict(self) -> Dict:
        """Convert to dictionary."""
        return {
            "clip_id": self.clip_id,
            "match_id": self.match_id,
            "timestamp": self.format_timestamp(),
            "duration": self.duration,
            "event_type": self.event_type,
            "player": self.player,
            "tags": self.tags
        }

class VideoLibrary:
    """Manages video clips and provides search functionality."""

    def __init__(self):
        self.clips: Dict[str, VideoClip] = {}
        self.matches: pd.DataFrame = pd.DataFrame()

    def add_clip(self, clip: VideoClip):
        """Add a clip to the library."""
        self.clips[clip.clip_id] = clip

    def search_by_tags(self, tags: List[str],
                       match_all: bool = False) -> List[VideoClip]:
        """Search clips by tags."""
        results = []
        for clip in self.clips.values():
            if match_all:
                if all(tag in clip.tags for tag in tags):
                    results.append(clip)
            else:
                if any(tag in clip.tags for tag in tags):
                    results.append(clip)
        return results

    def get_player_clips(self, player_name: str,
                         event_type: Optional[str] = None) -> List[VideoClip]:
        """Get all clips for a specific player."""
        results = []
        for clip in self.clips.values():
            if clip.player == player_name:
                if event_type is None or clip.event_type == event_type:
                    results.append(clip)
        return results

    def get_match_clips(self, match_id: str) -> List[VideoClip]:
        """Get all clips from a specific match."""
        return [c for c in self.clips.values() if c.match_id == match_id]

    def to_dataframe(self) -> pd.DataFrame:
        """Convert library to DataFrame."""
        return pd.DataFrame([c.to_dict() for c in self.clips.values()])

# Initialize library
library = VideoLibrary()
print("Video library framework initialized")
# R: Video-data integration framework
library(tidyverse)
library(R6)

# Video Clip Class
VideoClip <- R6Class("VideoClip",
  public = list(
    clip_id = NULL,
    match_id = NULL,
    start_time = NULL,  # seconds from match start
    end_time = NULL,
    event_type = NULL,
    player = NULL,
    tags = NULL,
    video_path = NULL,

    initialize = function(clip_id, match_id, start_time, end_time,
                          event_type = NULL, player = NULL, tags = NULL) {
      self$clip_id <- clip_id
      self$match_id <- match_id
      self$start_time <- start_time
      self$end_time <- end_time
      self$event_type <- event_type
      self$player <- player
      self$tags <- tags %||% character()
    },

    # Format start time for display as MM:SS
    format_timestamp = function() {
      start_min <- floor(self$start_time / 60)
      start_sec <- floor(self$start_time %% 60)  # floor() so "%02d" accepts it
      sprintf("%02d:%02d", start_min, start_sec)
    },

    # Calculate duration in seconds
    duration = function() {
      self$end_time - self$start_time
    },

    add_tag = function(tag) {
      self$tags <- unique(c(self$tags, tag))
    }
  )
)

# Video Library Manager
VideoLibrary <- R6Class("VideoLibrary",
  public = list(
    clips = NULL,
    matches = NULL,

    initialize = function() {
      self$clips <- list()
      self$matches <- tibble()
    },

    # Add clip to library
    add_clip = function(clip) {
      self$clips[[clip$clip_id]] <- clip
    },

    # Search clips by tags
    search_by_tags = function(tags, match_all = FALSE) {
      results <- list()
      for (clip in self$clips) {
        if (match_all) {
          if (all(tags %in% clip$tags)) {
            results[[clip$clip_id]] <- clip
          }
        } else {
          if (any(tags %in% clip$tags)) {
            results[[clip$clip_id]] <- clip
          }
        }
      }
      return(results)
    },

    # Get clips for a player
    get_player_clips = function(player_name, event_type = NULL) {
      results <- list()
      for (clip in self$clips) {
        if (!is.null(clip$player) && clip$player == player_name) {
          if (is.null(event_type) || clip$event_type == event_type) {
            results[[clip$clip_id]] <- clip
          }
        }
      }
      return(results)
    }
  )
)

cat("Video library framework initialized\n")
Automated Video Tagging
Event data provides timestamps that can automatically generate video clips. By linking each event to its corresponding video moment, we can instantly find relevant footage without manual tagging.
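One practical wrinkle before the tagger itself: multiplying the match minute by 60 only lines up with the footage if the file starts exactly at kickoff and contains no halftime break. In practice you map the event clock through a per-period offset table. A sketch, with made-up offset values:

```python
# Sketch: map the event clock to a position in the video file.
# minute * 60 assumes the file starts at kickoff with no halftime break;
# real footage needs a per-period offset (these values are made up).
PERIOD_OFFSETS = {
    1: 12.0,    # seconds of pre-kickoff footage before period 1 starts
    2: 3600.0,  # video position where the second half kicks off
}

def event_to_video_seconds(minute, second, period):
    """Convert the match clock to seconds into the video file."""
    clock = minute * 60 + second
    if period == 2:
        clock -= 45 * 60  # the second-half clock restarts at 45:00
    return PERIOD_OFFSETS[period] + clock

print(event_to_video_seconds(10, 30, period=1))  # 642.0
print(event_to_video_seconds(60, 0, period=2))   # 4500.0
```

The tagger below keeps the simple minute-to-seconds mapping for readability; in production you would route timestamps through a function like this.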
# Python: Automated video tagging from event data
import pandas as pd
from typing import Dict, List, Optional

class VideoTagger:
    """Automatically generate video clips from event data."""

    def __init__(self, buffer_before: float = 5.0,
                 buffer_after: float = 3.0):
        self.buffer_before = buffer_before
        self.buffer_after = buffer_after

    def events_to_clips(self, events_df: pd.DataFrame,
                        match_id: str) -> List[VideoClip]:
        """Convert event data to video clips."""
        clips = []
        for _, event in events_df.iterrows():
            # Calculate event time in seconds
            event_seconds = event["minute"] * 60 + event.get("second", 0)
            # Create clip with a buffer either side of the event
            clip = VideoClip(
                clip_id=f"{match_id}_{event['event_id']}",
                match_id=match_id,
                start_time=max(0, event_seconds - self.buffer_before),
                end_time=event_seconds + self.buffer_after,
                event_type=event.get("type"),
                player=event.get("player"),
                tags=self._generate_auto_tags(event)
            )
            clips.append(clip)
        return clips

    def _generate_auto_tags(self, event: pd.Series) -> List[str]:
        """Generate automatic tags based on event properties."""
        tags = []
        # Event type
        if pd.notna(event.get("type")):
            tags.append(event["type"])
        # Outcome
        if pd.notna(event.get("outcome")):
            tags.append(event["outcome"])
        # Location zone
        if pd.notna(event.get("x")) and pd.notna(event.get("y")):
            zone = self._classify_pitch_zone(event["x"], event["y"])
            tags.append(zone)
        # Pressure (guard against NaN, which is truthy in Python)
        pressure = event.get("under_pressure", False)
        if pd.notna(pressure) and pressure:
            tags.append("under_pressure")
        # Body part
        if pd.notna(event.get("body_part")):
            tags.append(event["body_part"])
        # xG classification for shots
        if event.get("type") == "Shot" and pd.notna(event.get("xg")):
            xg = event["xg"]
            if xg >= 0.3:
                tags.append("big_chance")
            elif xg >= 0.1:
                tags.append("good_chance")
            else:
                tags.append("low_xg_shot")
        return list(set(tags))

    def _classify_pitch_zone(self, x: float, y: float) -> str:
        """Classify location into pitch zone."""
        # Assuming 120x80 pitch
        if x >= 102:
            zone = "box"
        elif x >= 80:
            zone = "final_third"
        elif x >= 40:
            zone = "middle_third"
        else:
            zone = "defensive_third"
        if y <= 26:
            side = "left"
        elif y >= 54:
            side = "right"
        else:
            side = "central"
        return f"{zone}_{side}"

# Extended tagging for specific analysis types
class AnalysisSpecificTagger(VideoTagger):
    """Extended tagger for specific analysis needs."""

    def tag_pressing_events(self, events_df: pd.DataFrame,
                            match_id: str) -> List[VideoClip]:
        """Tag pressing and counterpressing events."""
        pressing_events = events_df[
            events_df["type"].isin(["Pressure", "Ball Recovery", "Tackle"])
        ]
        clips = self.events_to_clips(pressing_events, match_id)
        # Add pressing-specific tags
        for clip in clips:
            clip.add_tag("pressing_analysis")
            # Flagging counterpresses (within 5s of losing the ball)
            # would additionally require possession-change data
        return clips

    def tag_set_pieces(self, events_df: pd.DataFrame,
                       match_id: str) -> List[VideoClip]:
        """Tag set piece events with extended context."""
        set_pieces = events_df[
            events_df["type"].isin(["Corner", "Free Kick", "Throw-in"])
        ]
        clips = []
        for _, event in set_pieces.iterrows():
            event_seconds = event["minute"] * 60 + event.get("second", 0)
            # Longer buffer for set pieces
            clip = VideoClip(
                clip_id=f"{match_id}_{event['event_id']}_setpiece",
                match_id=match_id,
                start_time=max(0, event_seconds - 3),
                end_time=event_seconds + 15,  # longer, to capture the outcome
                event_type=event.get("type"),
                player=event.get("player"),
                tags=["set_piece", event.get("type", "")]
            )
            clips.append(clip)
        return clips

tagger = VideoTagger()
print("Video tagger initialized")
# R: Automated video tagging from event data
library(tidyverse)

# Convert event data to video clips
events_to_clips <- function(events_df, match_id, buffer_before = 5, buffer_after = 3) {
  clips <- list()
  for (i in seq_len(nrow(events_df))) {
    event <- events_df[i, ]
    # Convert match minute to seconds (treat a missing second column as 0)
    sec <- event$second %||% 0
    if (is.na(sec)) sec <- 0
    event_seconds <- event$minute * 60 + sec
    # Create clip with buffer
    clip <- VideoClip$new(
      clip_id = paste(match_id, event$event_id, sep = "_"),
      match_id = match_id,
      start_time = max(0, event_seconds - buffer_before),
      end_time = event_seconds + buffer_after,
      event_type = event$type,
      player = event$player
    )
    # Add automatic tags based on event type
    clip$tags <- generate_auto_tags(event)
    clips[[clip$clip_id]] <- clip
  }
  return(clips)
}

# Generate automatic tags based on event properties
generate_auto_tags <- function(event) {
  # Missing columns come back NULL from data frames, absent values as NA
  has_value <- function(x) !is.null(x) && !is.na(x)
  tags <- character()
  # Event type tags
  tags <- c(tags, event$type)
  # Outcome tags
  if (has_value(event$outcome)) {
    tags <- c(tags, event$outcome)
  }
  # Location tags
  if (has_value(event$x) && has_value(event$y)) {
    zone <- classify_pitch_zone(event$x, event$y)
    tags <- c(tags, zone)
  }
  # Special situation tags
  if (has_value(event$under_pressure) && event$under_pressure) {
    tags <- c(tags, "under_pressure")
  }
  if (has_value(event$body_part)) {
    tags <- c(tags, event$body_part)
  }
  # xG tags for shots
  if (event$type == "Shot" && has_value(event$xg)) {
    if (event$xg >= 0.3) {
      tags <- c(tags, "big_chance")
    } else if (event$xg >= 0.1) {
      tags <- c(tags, "good_chance")
    } else {
      tags <- c(tags, "low_xg_shot")
    }
  }
  return(unique(tags))
}

# Classify pitch zone
classify_pitch_zone <- function(x, y) {
  # Assuming 120x80 pitch dimensions
  zone <- case_when(
    x >= 102 ~ "box",
    x >= 80 ~ "final_third",
    x >= 40 ~ "middle_third",
    TRUE ~ "defensive_third"
  )
  side <- case_when(
    y <= 26 ~ "left",
    y >= 54 ~ "right",
    TRUE ~ "central"
  )
  paste(zone, side, sep = "_")
}

cat("Auto-tagging functions defined\n")
Creating Video Playlists from Data
The real power of video-data integration is the ability to create custom playlists from statistical queries. "Show me all progressive passes into the box by our left-back" becomes a few lines of code.
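That query really is only a few lines of pandas. Here it is on a toy events table; the column names mirror the ones used in this chapter but are assumptions, not a specific provider's schema.

```python
import pandas as pd

# Toy events table standing in for a season of event data
events = pd.DataFrame({
    "player": ["L. Back", "L. Back", "R. Wing", "L. Back"],
    "type":   ["Pass", "Pass", "Pass", "Carry"],
    "progressive": [True, True, True, True],
    "end_x": [110, 75, 112, 108],  # 120x80 pitch; the box starts at x >= 102
    "end_y": [40, 30, 38, 50],
})

# "All progressive passes into the box by our left-back" as one filter
into_box = events[
    (events["player"] == "L. Back")
    & (events["type"] == "Pass")
    & events["progressive"]
    & (events["end_x"] >= 102)
    & events["end_y"].between(18, 62)  # inside the box laterally
]
print(len(into_box))  # → 1
```

The classes below wrap this pattern so the same filters also resolve to the matching video clips.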
# Python: Create video playlists from data queries
import pandas as pd
from typing import Dict, List, Optional
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class VideoPlaylist:
    """Collection of video clips for a specific purpose."""
    name: str
    description: str = ""
    clips: List[VideoClip] = field(default_factory=list)
    created_date: datetime = field(default_factory=datetime.now)

    def add_clip(self, clip: VideoClip):
        """Add a clip to the playlist."""
        self.clips.append(clip)

    def add_clips(self, clips: List[VideoClip]):
        """Add multiple clips."""
        self.clips.extend(clips)

    def sort_chronologically(self):
        """Sort clips by match, then start time."""
        self.clips.sort(key=lambda c: (c.match_id, c.start_time))

    @property
    def total_duration(self) -> float:
        """Total duration of all clips in seconds."""
        return sum(c.duration for c in self.clips)

    @property
    def clip_count(self) -> int:
        """Number of clips in playlist."""
        return len(self.clips)

    def to_dataframe(self) -> pd.DataFrame:
        """Export playlist as DataFrame."""
        return pd.DataFrame([c.to_dict() for c in self.clips])

    def export_for_video_software(self) -> List[Dict]:
        """Export in a format for video editing software."""
        return [
            {
                "clip_name": c.clip_id,
                "source_file": c.video_path,
                "in_point": c.start_time,
                "out_point": c.end_time,
                "duration": c.duration,
                "markers": c.tags
            }
            for c in self.clips
        ]

class PlaylistGenerator:
    """Generate playlists from data queries."""

    def __init__(self, library: VideoLibrary):
        self.library = library

    def create_from_query(self, events: pd.DataFrame,
                          name: str,
                          filters: Dict) -> VideoPlaylist:
        """Create playlist from filtered events."""
        playlist = VideoPlaylist(
            name=name,
            description=f"Auto-generated: {datetime.now()}"
        )
        # Apply filters
        filtered = events.copy()
        if filters.get("player"):
            filtered = filtered[filtered["player"] == filters["player"]]
        if filters.get("event_types"):
            filtered = filtered[filtered["type"].isin(filters["event_types"])]
        if filters.get("min_xg"):
            filtered = filtered[filtered["xg"] >= filters["min_xg"]]
        if filters.get("zone"):
            filtered = filtered[filtered["pitch_zone"] == filters["zone"]]
        if filters.get("team"):
            filtered = filtered[filtered["team"] == filters["team"]]
        # Find corresponding clips
        for _, event in filtered.iterrows():
            clip_id = f"{event['match_id']}_{event['event_id']}"
            if clip_id in self.library.clips:
                playlist.add_clip(self.library.clips[clip_id])
        playlist.sort_chronologically()
        return playlist

    def create_scouting_playlist(self, player_name: str,
                                 events: pd.DataFrame) -> VideoPlaylist:
        """Create comprehensive scouting playlist for a player."""
        key_events = ["Shot", "Goal", "Assist", "Key Pass",
                      "Dribble", "Progressive Pass", "Tackle", "Interception"]
        return self.create_from_query(
            events,
            name=f"Scouting: {player_name}",
            filters={
                "player": player_name,
                "event_types": key_events
            }
        )

    def create_big_chances_playlist(self, events: pd.DataFrame,
                                    team: Optional[str] = None) -> VideoPlaylist:
        """Create playlist of big chances (high xG shots)."""
        filters = {
            "event_types": ["Shot"],
            "min_xg": 0.3
        }
        if team:
            filters["team"] = team
        return self.create_from_query(
            events,
            name=f"Big Chances{f' - {team}' if team else ''}",
            filters=filters
        )

    def create_defensive_errors_playlist(self, events: pd.DataFrame,
                                         team: str) -> VideoPlaylist:
        """Create playlist of defensive errors and goals conceded."""
        # Get shots conceded with high xG
        shots_against = events[
            (events["type"] == "Shot") &
            (events["team"] != team) &
            (events["xg"] >= 0.15)
        ]
        playlist = VideoPlaylist(
            name=f"Defensive Review: {team}",
            description="High-quality chances conceded"
        )
        for _, event in shots_against.iterrows():
            clip_id = f"{event['match_id']}_{event['event_id']}"
            if clip_id in self.library.clips:
                playlist.add_clip(self.library.clips[clip_id])
        return playlist

print("Playlist generator ready")
# R: Create video playlists from data queries
library(tidyverse)
library(R6)

# Playlist class
VideoPlaylist <- R6Class("VideoPlaylist",
  public = list(
    name = NULL,
    description = NULL,
    clips = NULL,
    created_date = NULL,

    initialize = function(name, description = NULL) {
      self$name <- name
      self$description <- description
      self$clips <- list()
      self$created_date <- Sys.Date()
    },

    add_clip = function(clip) {
      self$clips[[length(self$clips) + 1]] <- clip
    },

    # Sort clips by timestamp
    sort_chronologically = function() {
      times <- sapply(self$clips, function(c) c$start_time)
      self$clips <- self$clips[order(times)]
    },

    # Get total duration
    total_duration = function() {
      sum(sapply(self$clips, function(c) c$duration()))
    },

    # Export playlist as a tibble
    export_playlist = function() {
      map_dfr(self$clips, function(clip) {
        tibble(
          clip_id = clip$clip_id,
          match_id = clip$match_id,
          timestamp = clip$format_timestamp(),
          duration = clip$duration(),
          event_type = clip$event_type,
          player = clip$player,
          tags = paste(clip$tags, collapse = ", ")
        )
      })
    }
  )
)

# Playlist generator from queries
create_playlist_from_query <- function(events, library, query_name, filters) {
  playlist <- VideoPlaylist$new(
    name = query_name,
    description = paste("Auto-generated:", Sys.time())
  )
  # Apply filters to events
  filtered_events <- events
  if (!is.null(filters$player)) {
    filtered_events <- filtered_events %>%
      filter(player == filters$player)
  }
  if (!is.null(filters$event_type)) {
    filtered_events <- filtered_events %>%
      filter(type %in% filters$event_type)
  }
  if (!is.null(filters$min_xg)) {
    filtered_events <- filtered_events %>%
      filter(xg >= filters$min_xg)
  }
  if (!is.null(filters$zone)) {
    filtered_events <- filtered_events %>%
      filter(pitch_zone == filters$zone)
  }
  # Find corresponding clips
  for (i in seq_len(nrow(filtered_events))) {
    event <- filtered_events[i, ]
    clip_id <- paste(event$match_id, event$event_id, sep = "_")
    if (clip_id %in% names(library$clips)) {
      playlist$add_clip(library$clips[[clip_id]])
    }
  }
  playlist$sort_chronologically()
  return(playlist)
}

# Example: Create scouting playlist for a player
create_scouting_playlist <- function(player_name, events, library) {
  # Key events for scouting
  key_types <- c("Shot", "Goal", "Assist", "Key Pass",
                 "Dribble", "Progressive Pass", "Tackle")
  filters <- list(
    player = player_name,
    event_type = key_types
  )
  create_playlist_from_query(
    events, library,
    query_name = paste("Scouting:", player_name),
    filters = filters
  )
}

cat("Playlist creation functions defined\n")
| Playlist Type | Filter Criteria | Use Case |
|---|---|---|
| Player Scouting | Player + key event types | Recruitment evaluation |
| Big Chances | Shots with xG >= 0.3 | Finishing analysis |
| Set Pieces | Corners, free kicks, throw-ins | Set piece coaching |
| Progressive Plays | Progressive passes/carries | Build-up analysis |
| Defensive Errors | High xG shots conceded | Defensive review |
| Pressing Triggers | Successful pressures + recoveries | Pressing coaching |
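A table like this can live in code as a set of reusable filter presets, so analysts don't retype the criteria each time. A sketch; the preset names and the lambda-based scheme are illustrative choices, not part of the framework above.

```python
import pandas as pd

# Encode the playlist table as reusable filter presets
PLAYLIST_PRESETS = {
    "big_chances": lambda df: df[(df["type"] == "Shot") & (df["xg"] >= 0.3)],
    "set_pieces": lambda df: df[df["type"].isin(["Corner", "Free Kick", "Throw-in"])],
    "progressive": lambda df: df[df["type"].isin(["Progressive Pass", "Progressive Carry"])],
}

events = pd.DataFrame({
    "type": ["Shot", "Shot", "Corner", "Progressive Pass"],
    "xg":   [0.42, 0.05, None, None],
})

for name, preset in PLAYLIST_PRESETS.items():
    print(name, len(preset(events)))
```

Each preset returns a filtered frame that can be handed straight to a playlist generator; NaN xG values compare as False, so non-shots fall out of `big_chances` automatically.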
Video Integration in Scouting Reports
Scouting reports are most effective when statistical profiles are backed by video evidence. Each claim should be supported by clips showing the behavior in action.
# Python: Integrate video into scouting reports
import pandas as pd
from typing import Dict, List, Optional
from dataclasses import dataclass

@dataclass
class InsightWithEvidence:
    """Statistical insight backed by video evidence."""
    category: str
    description: str
    stats: Dict
    clips: List[VideoClip]

class ScoutingReportWithVideo:
    """Scouting report integrating stats and video."""

    def __init__(self, player_name: str, position: str):
        self.player_name = player_name
        self.position = position
        self.insights: List[InsightWithEvidence] = []
        self.summary_clips: List[VideoClip] = []

    def add_insight(self, category: str, description: str,
                    stats: Dict, clips: List[VideoClip]):
        """Add insight with video evidence."""
        self.insights.append(InsightWithEvidence(
            category=category,
            description=description,
            stats=stats,
            clips=clips
        ))

    def generate_report(self) -> str:
        """Generate formatted report."""
        report = f"""
==========================================================
SCOUTING REPORT: {self.player_name} ({self.position})
==========================================================
"""
        for insight in self.insights:
            clip_summary = self._format_clip_list(insight.clips)
            report += f"""
{insight.category.upper()}
{"-" * len(insight.category)}
{insight.description}

Key Statistics:
{self._format_stats(insight.stats)}

Video Evidence ({len(insight.clips)} clips):
{clip_summary}
"""
        return report

    def _format_stats(self, stats: Dict) -> str:
        """Format statistics for display."""
        return "\n".join([f"  {k}: {v}" for k, v in stats.items()])

    def _format_clip_list(self, clips: List[VideoClip], max_clips: int = 5) -> str:
        """Format clip list for display."""
        if not clips:
            return "  No clips available"
        lines = []
        for clip in clips[:max_clips]:
            tags = ", ".join(clip.tags[:3])
            lines.append(f"  - {clip.format_timestamp()}: {clip.event_type} ({tags})")
        if len(clips) > max_clips:
            lines.append(f"  ... and {len(clips) - max_clips} more clips")
        return "\n".join(lines)

    def export_with_clips(self) -> Dict:
        """Export report data with clip references."""
        return {
            "player": self.player_name,
            "position": self.position,
            "insights": [
                {
                    "category": i.category,
                    "description": i.description,
                    "stats": i.stats,
                    "clip_ids": [c.clip_id for c in i.clips]
                }
                for i in self.insights
            ],
            "total_clips": sum(len(i.clips) for i in self.insights)
        }

class ScoutingReportGenerator:
    """Generate scouting reports with video integration."""

    def __init__(self, library: VideoLibrary):
        self.library = library

    def generate_report(self, player_name: str,
                        player_stats: pd.Series,
                        events: pd.DataFrame) -> ScoutingReportWithVideo:
        """Generate comprehensive scouting report."""
        position = player_stats.get("position", "Unknown")
        report = ScoutingReportWithVideo(player_name, position)
        # Finishing analysis (for attacking positions)
        if any(p in position for p in ["FW", "MF", "AM"]):
            self._add_finishing_insight(report, player_name, player_stats, events)
        # Chance creation
        self._add_creation_insight(report, player_name, player_stats, events)
        # Defensive contribution
        self._add_defensive_insight(report, player_name, player_stats, events)
        # Progression
        self._add_progression_insight(report, player_name, player_stats, events)
        return report

    def _add_finishing_insight(self, report: ScoutingReportWithVideo,
                               player_name: str, stats: pd.Series,
                               events: pd.DataFrame):
        """Add finishing analysis with video."""
        clips = self.library.get_player_clips(player_name, "Shot")
        goal_clips = [c for c in clips if "Goal" in c.tags]
        report.add_insight(
            category="Finishing",
            description=f"Finishing profile based on {len(clips)} shot clips",
            stats={
                "Goals": len(goal_clips),
                "Total Shots": len(clips),
                "xG": round(stats.get("xg", 0), 2),
                "Conversion Rate": f"{len(goal_clips) / max(1, len(clips)) * 100:.0f}%"
            },
            clips=clips
        )

    def _add_creation_insight(self, report, player_name, stats, events):
        """Add chance creation analysis."""
        creative_clips = [
            c for c in self.library.get_player_clips(player_name)
            if any(tag in c.tags for tag in ["Key Pass", "Assist", "Shot Assist"])
        ]
        report.add_insight(
            category="Chance Creation",
            description="Creative ability to unlock defenses",
            stats={
                "Assists": int(stats.get("assists", 0)),
                "xA": round(stats.get("xa", 0), 2),
                "Key Passes": len([c for c in creative_clips if "Key Pass" in c.tags])
            },
            clips=creative_clips
        )

    def _add_defensive_insight(self, report, player_name, stats, events):
        """Add defensive contribution analysis."""
        defensive_clips = [
            c for c in self.library.get_player_clips(player_name)
            if any(tag in c.tags for tag in ["Tackle", "Interception", "Block", "Clearance"])
        ]
        report.add_insight(
            category="Defensive Work",
            description="Contribution without the ball",
            stats={
                "Tackles Won": int(stats.get("tackles_won", 0)),
                "Interceptions": int(stats.get("interceptions", 0)),
                "Pressures": int(stats.get("pressures", 0))
            },
            clips=defensive_clips
        )

    def _add_progression_insight(self, report, player_name, stats, events):
        """Add ball progression analysis."""
        # Zone tags look like "final_third_left", so match on the prefix
        progressive_clips = [
            c for c in self.library.get_player_clips(player_name)
            if any(tag in c.tags for tag in ["Progressive Pass", "Progressive Carry"])
            or any(t.startswith("final_third") for t in c.tags)
        ]
        report.add_insight(
            category="Ball Progression",
            description="Ability to advance the ball effectively",
            stats={
                "Progressive Passes": int(stats.get("progressive_passes", 0)),
                "Progressive Carries": int(stats.get("progressive_carries", 0)),
                "Final Third Entries": len([
                    c for c in progressive_clips
                    if any(t.startswith("final_third") for t in c.tags)
                ])
            },
            clips=progressive_clips
        )

print("Scouting report generator with video ready")
# R: Integrate video into scouting reports
library(tidyverse)
library(R6)

# Scouting report with video evidence
ScoutingReportWithVideo <- R6Class("ScoutingReportWithVideo",
  public = list(
    player_name = NULL,
    position = NULL,
    stats_summary = NULL,
    video_evidence = NULL,

    initialize = function(player_name, position) {
      self$player_name <- player_name
      self$position <- position
      self$stats_summary <- list()
      self$video_evidence <- list()
    },

    # Add statistical insight with video evidence
    add_insight = function(category, stat_description, clips) {
      self$stats_summary[[category]] <- stat_description
      self$video_evidence[[category]] <- clips
    },

    # Generate report
    generate_report = function() {
      report <- sprintf("
==========================================================
SCOUTING REPORT: %s (%s)
==========================================================
", self$player_name, self$position)
      for (category in names(self$stats_summary)) {
        clips <- self$video_evidence[[category]]
        report <- paste0(report, sprintf("
%s
%s
%s

Video Evidence (%d clips):
%s
",
          toupper(category),
          paste(rep("-", nchar(category)), collapse = ""),
          self$stats_summary[[category]],
          length(clips),
          self$format_video_list(clips)
        ))
      }
      return(report)
    },

    format_video_list = function(clips) {
      if (length(clips) == 0) return("  No clips available")
      lines <- sapply(head(clips, 5), function(c) {
        sprintf("  - %s: %s (%s)", c$format_timestamp(), c$event_type,
                paste(head(c$tags, 3), collapse = ", "))
      })
      paste(lines, collapse = "\n")
    }
  )
)

# Auto-generate scouting report with video
generate_scouting_report_with_video <- function(player_name, player_stats,
                                                events, library) {
  # Get player position
  position <- player_stats$position[player_stats$player == player_name][1]
  report <- ScoutingReportWithVideo$new(player_name, position)
  # Finishing (for forwards/midfielders)
  if (grepl("FW|MF", position)) {
    shot_clips <- library$get_player_clips(player_name, "Shot")
    goal_clips <- Filter(function(c) "Goal" %in% c$tags, shot_clips)
    # Clips do not store xG, so take it from the player stats table
    player_xg <- sum(player_stats$xg[player_stats$player == player_name],
                     na.rm = TRUE)
    finishing_desc <- sprintf(
      "Goals: %d | xG: %.1f | Conversion: %.0f%%",
      length(goal_clips),
      player_xg,
      length(goal_clips) / max(1, length(shot_clips)) * 100
    )
    report$add_insight("Finishing", finishing_desc, shot_clips)
  }
  # Chance creation
  creative_clips <- library$search_by_tags(c("Key Pass", "Assist", "Shot Assist"))
  creative_clips <- Filter(function(c) identical(c$player, player_name), creative_clips)
  if (length(creative_clips) > 0) {
    creation_desc <- sprintf(
      "Key Passes: %d | Assists: %d",
      sum(sapply(creative_clips, function(c) "Key Pass" %in% c$tags)),
      sum(sapply(creative_clips, function(c) "Assist" %in% c$tags))
    )
    report$add_insight("Chance Creation", creation_desc, creative_clips)
  }
  return(report)
}
Video-Data Workflows for Coaching
Coaches need different video packages than scouts. Match preparation, halftime feedback, and individual player development all require tailored video-data workflows.
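Whatever the audience, delivery usually ends the same way: the clip's in/out points are used to cut the master recording, typically with ffmpeg. A sketch that builds (but does not run) the command; the file names are placeholders, and running it assumes ffmpeg is installed.

```python
# Sketch: turn a clip's in/out points into an ffmpeg cut command.
def ffmpeg_cut_cmd(source, start, end, out_path):
    """Build (not run) an ffmpeg command extracting [start, end) seconds."""
    return [
        "ffmpeg",
        "-ss", f"{start:.2f}",      # input seek to the clip start
        "-i", source,
        "-t", f"{end - start:.2f}", # clip duration
        "-c", "copy",               # stream copy: fast, cuts on keyframes
        out_path,
    ]

cmd = ffmpeg_cut_cmd("match_1234.mp4", 1416.0, 1424.0, "clip_shot_2341.mp4")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Stream copy avoids re-encoding but snaps to keyframes, so cuts can land a second or two early; re-encode (e.g. with `-c:v libx264`) when frame-accurate cuts matter more than speed.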
# Python: Coaching-specific video workflows
import pandas as pd
from typing import Dict, List
from dataclasses import dataclass

@dataclass
class VideoPackage:
    """Collection of themed video content for coaching."""
    title: str
    description: str
    sections: Dict[str, Dict]

class CoachingVideoWorkflows:
    """Video workflows tailored for coaching staff."""

    def __init__(self, library: VideoLibrary):
        self.library = library

    def create_prematch_package(self, opponent: str,
                                opponent_events: pd.DataFrame) -> VideoPackage:
        """Create pre-match video package for coaches."""
        package = VideoPackage(
            title=f"Pre-Match Analysis: vs {opponent}",
            description="Opponent analysis video package",
            sections={}
        )
        matches = opponent_events["match_id"].nunique()
        # Attacking threats
        attacking = opponent_events[
            (opponent_events["team"] == opponent) &
            (opponent_events["type"].isin(["Shot", "Key Pass", "Cross"]))
        ]
        package.sections["attacking"] = {
            "title": "Attacking Threats",
            "stats": {
                "shots_per_game": len(attacking) / max(1, matches),
                "xg_per_game": attacking["xg"].sum() / max(1, matches)
            },
            "clips": self.library.search_by_tags(["Shot", opponent]),
            "coaching_points": [
                "Watch for runs in behind from #9",
                "Left winger likes to cut inside",
                "Dangerous from set pieces"
            ]
        }
        # Set pieces
        set_pieces = opponent_events[
            (opponent_events["team"] == opponent) &
            (opponent_events["type"].isin(["Corner", "Free Kick"]))
        ]
        package.sections["set_pieces"] = {
            "title": "Set Piece Routines",
            "stats": {
                "corners_per_game": len(set_pieces[set_pieces["type"] == "Corner"]) / max(1, matches)
            },
            "clips": self.library.search_by_tags(["set_piece", opponent]),
            "coaching_points": [
                "Near post corner routine",
                "Short corner option"
            ]
        }
        # Defensive weaknesses
        against = opponent_events[
            (opponent_events["team"] != opponent) &
            (opponent_events["type"] == "Shot")
        ]
        package.sections["weaknesses"] = {
            "title": "Exploitable Weaknesses",
            "stats": {
                "xga_per_game": against["xg"].sum() / max(1, matches)
            },
            "clips": self.library.search_by_tags(["Goal", f"vs_{opponent}"]),
            "coaching_points": [
                "High line vulnerable to balls over the top",
                "Slow transition from attack to defense"
            ]
        }
        return package

    def create_halftime_clips(self, match_events: pd.DataFrame,
                              our_team: str) -> Dict:
        """Generate halftime video clips for tactical adjustments."""
        match_id = match_events["match_id"].iloc[0]
        first_half_cutoff = 45 * 60  # only clips from the first 45 minutes
        match_clips = self.library.get_match_clips(match_id)
        return {
            "our_chances": {
                "title": "Our Best Chances",
                "clips": [
                    c for c in match_clips
                    if "Shot" in c.tags and c.start_time < first_half_cutoff
                ]
            },
            "chances_conceded": {
                "title": "Chances Conceded",
                "clips": [
                    c for c in match_clips
                    if "Shot" in c.tags and our_team not in c.tags
                    and c.start_time < first_half_cutoff
                ]
            },
            "pressing": {
                "title": "Pressing Actions",
                "clips": self.library.search_by_tags(["Pressure", "Ball Recovery"])[:5]
            }
        }

    def create_player_feedback(self, player_name: str,
                               match_id: str,
                               events: pd.DataFrame) -> Dict:
        """Create individual player feedback package."""
        player_events = events[
            (events["match_id"] == match_id) &
            (events["player"] == player_name)
        ]
        player_clips = self.library.get_player_clips(player_name)
        match_clips = [c for c in player_clips if c.match_id == match_id]
        # Categorize clips
        positive_tags = {"Goal", "Assist", "Tackle Won", "Interception", "Success"}
        negative_tags = {"Miss", "Dispossession", "Error", "Lost"}
        return {
            "player": player_name,
            "match_id": match_id,
            "positives": {
                "title": "Excellent Moments",
                "clips": [c for c in match_clips
                          if any(t in positive_tags for t in c.tags)]
            },
            "improvements": {
                "title": "Areas to Work On",
                "clips": [c for c in match_clips
                          if any(t in negative_tags for t in c.tags)]
            },
            "summary": {
                "total_actions": len(player_events),
                "successful": len(player_events[player_events["outcome"] == "Success"]),
                "video_clips": len(match_clips)
            }
        }

    def create_weekly_development_package(self, player_name: str,
                                          events: pd.DataFrame,
                                          focus_area: str) -> Dict:
        """Create weekly development video package for a player."""
        focus_tags = {
            "finishing": ["Shot", "Goal", "big_chance"],
            "passing": ["Progressive Pass", "Key Pass", "Through Ball"],
            "defending": ["Tackle", "Interception", "Block"],
            "dribbling": ["Dribble", "Carry", "Take On"]
        }
        clips = self.library.search_by_tags(focus_tags.get(focus_area, []))
        player_clips = [c for c in clips if c.player == player_name]
        return {
            "player": player_name,
            "focus": focus_area,
            "your_clips": player_clips,
            "example_clips": clips[:10],  # best examples from any player
            "drills_focus": f"This week focus on {focus_area}",
            "clip_count": len(player_clips)
        }

print("Coaching video workflows ready")
# R: Coaching-specific video workflows
library(tidyverse)

# Pre-match video package for coaches
create_prematch_video_package <- function(opponent, opponent_events, library) {
  package <- list()
  n_matches <- max(1, n_distinct(opponent_events$match_id))
  # Opponent attacking patterns
  attacking_events <- opponent_events %>%
    filter(team == opponent, type %in% c("Shot", "Key Pass", "Cross"))
  package$attacking <- list(
    description = "Opponent Attacking Threats",
    stats = list(
      shots_per_game = nrow(attacking_events) / n_matches,
      xg_per_game = sum(attacking_events$xg, na.rm = TRUE) / n_matches
    ),
    clips = library$search_by_tags(c("Shot", opponent, "big_chance"))
  )
  # Set pieces
  set_piece_events <- opponent_events %>%
    filter(team == opponent, type %in% c("Corner", "Free Kick"))
  package$set_pieces <- list(
    description = "Opponent Set Piece Routines",
    stats = list(
      corners_per_game = sum(set_piece_events$type == "Corner") / n_matches,
      set_piece_goals = sum(set_piece_events$resulted_in_goal, na.rm = TRUE)
    ),
    clips = library$search_by_tags(c("set_piece", opponent))
  )
  # Defensive vulnerabilities
  goals_conceded <- opponent_events %>%
    filter(team != opponent, type == "Goal")
  package$weaknesses <- list(
    description = "Opponent Defensive Weaknesses",
    stats = list(
      goals_conceded = nrow(goals_conceded),
      xga_per_game = sum(opponent_events$xg[opponent_events$team != opponent],
                         na.rm = TRUE) / n_matches
    ),
    clips = library$search_by_tags(c("Goal", paste0("vs_", opponent)))
  )
  return(package)
}

# Post-match individual feedback
create_player_feedback_clips <- function(player_name, match_id, events, library) {
  player_events <- events %>%
    filter(match_id == !!match_id, player == player_name)
  feedback <- list()
  # Helper: look up the library clip for each event row
  clips_for <- function(event_rows) {
    map(seq_len(nrow(event_rows)), function(i) {
      event <- event_rows[i, ]
      clip_id <- paste(match_id, event$event_id, sep = "_")
      library$clips[[clip_id]]
    }) %>% compact()
  }
  # Positive moments
  positive_events <- player_events %>%
    filter(outcome %in% c("Success", "Goal", "Assist") |
           type %in% c("Goal", "Assist", "Tackle Won"))
  feedback$positives <- list(
    title = "Good Moments",
    clips = clips_for(positive_events)
  )
  # Areas to improve
  negative_events <- player_events %>%
    filter(outcome %in% c("Fail", "Lost", "Miss") |
           type %in% c("Dispossession", "Error"))
  feedback$improvements <- list(
    title = "Areas to Improve",
    clips = clips_for(negative_events)
  )
  return(feedback)
}
Practice Exercises
1. Using StatsBomb open data, build a complete video tagging system: convert all events from a match into video clips with automatic tags based on event type, location, and outcome.
2. Build a playlist generator that takes a player name and creates separate playlists for finishing, chance creation, defensive actions, and aerial duels. Include filtering by time period and competition.
3. Design a user interface (wireframe or prototype) for coaches to access video clips. Include search by player/event, playlist creation, annotation tools, and sharing capabilities.
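A starter sketch for the first exercise. The statsbombpy call is the usual way to pull StatsBomb open data, but it needs network access, so it is left commented out and an offline sample row stands in; `shot_statsbomb_xg` is the flattened xG column statsbombpy produces.

```python
import pandas as pd

# from statsbombpy import sb
# events = sb.events(match_id=3869685)  # any open-data match id works here

# Offline sample standing in for one fetched event row
events = pd.DataFrame([
    {"type": "Shot", "x": 108.0, "y": 36.0, "shot_statsbomb_xg": 0.34}
])

def auto_tags(row):
    """Tag one event by type, rough location, and shot quality."""
    tags = [row["type"]]
    if row["x"] >= 102:          # 120x80 pitch: the box starts at x >= 102
        tags.append("box")
    if row.get("shot_statsbomb_xg", 0) >= 0.3:
        tags.append("big_chance")
    return tags

print(auto_tags(events.iloc[0]))  # → ['Shot', 'box', 'big_chance']
```

From here the exercise is extending `auto_tags` to the full tag taxonomy in this chapter and wrapping each tagged event in a VideoClip.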
Summary
Key Takeaways
- Data + Video: Data tells what happened; video shows how and why
- Automated tagging: Event data provides timestamps for automatic clip generation
- Query-based playlists: Create custom video playlists from statistical queries
- Evidence-based reports: Back statistical claims with video evidence
- Tailored workflows: Different users need different video packages
Key Components
- Video clip data model with timestamps and tags
- Automated tagging from event data
- Playlist generation from filters
- Scouting reports with embedded evidence
- Coaching-specific video packages