Chapter 60

Capstone - Complete Analytics System

Intermediate 30 min read 5 sections 10 code examples
Learning Objectives
  • Understand the relationship between video and data analysis
  • Build automated video tagging systems using event data
  • Create video playlists from statistical queries
  • Integrate video clips into scouting reports
  • Design video-data workflows for coaching staff

Data tells you what happened; video shows you how and why. Football analysis is most persuasive when a statistical finding is paired with footage of it. This chapter shows how to link event data to video so that both can be used in the same workflow.

The Video-Data Connection

Modern football clubs have access to vast amounts of video footage: every match, every training session, every youth game. The challenge isn't access to video; it's finding the right moments efficiently. Data analysis provides the filter; video provides the context.

What Data Tells Us
  • What events occurred
  • Where on the pitch
  • When in the match
  • Statistical outcomes (xG, etc.)
  • Patterns across matches
What Video Shows Us
  • How actions were executed
  • Body positioning and technique
  • Off-ball movements
  • Decision-making process
  • Tactical context
video_framework.R / video_framework.py
# Python: Video-data integration framework
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import timedelta
import pandas as pd

@dataclass
class VideoClip:
    """Represents a single video clip linked to event data."""
    clip_id: str
    match_id: str
    start_time: float  # seconds from match start
    end_time: float
    event_type: Optional[str] = None
    player: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    video_path: Optional[str] = None
    thumbnail_path: Optional[str] = None

    def format_timestamp(self) -> str:
        """Format start time as MM:SS."""
        minutes = int(self.start_time // 60)
        seconds = int(self.start_time % 60)
        return f"{minutes:02d}:{seconds:02d}"

    @property
    def duration(self) -> float:
        """Calculate clip duration in seconds."""
        return self.end_time - self.start_time

    def add_tag(self, tag: str):
        """Add a tag to the clip."""
        if tag not in self.tags:
            self.tags.append(tag)

    def to_dict(self) -> Dict:
        """Convert to dictionary."""
        return {
            "clip_id": self.clip_id,
            "match_id": self.match_id,
            "timestamp": self.format_timestamp(),
            "duration": self.duration,
            "event_type": self.event_type,
            "player": self.player,
            "tags": self.tags
        }

class VideoLibrary:
    """Manages video clips and provides search functionality."""

    def __init__(self):
        self.clips: Dict[str, VideoClip] = {}
        self.matches: pd.DataFrame = pd.DataFrame()

    def add_clip(self, clip: VideoClip):
        """Add a clip to the library."""
        self.clips[clip.clip_id] = clip

    def search_by_tags(self, tags: List[str],
                       match_all: bool = False) -> List[VideoClip]:
        """Search clips by tags."""
        results = []
        for clip in self.clips.values():
            if match_all:
                if all(tag in clip.tags for tag in tags):
                    results.append(clip)
            else:
                if any(tag in clip.tags for tag in tags):
                    results.append(clip)
        return results

    def get_player_clips(self, player_name: str,
                         event_type: Optional[str] = None) -> List[VideoClip]:
        """Get all clips for a specific player."""
        results = []
        for clip in self.clips.values():
            if clip.player == player_name:
                if event_type is None or clip.event_type == event_type:
                    results.append(clip)
        return results

    def get_match_clips(self, match_id: str) -> List[VideoClip]:
        """Get all clips from a specific match."""
        return [c for c in self.clips.values() if c.match_id == match_id]

    def to_dataframe(self) -> pd.DataFrame:
        """Convert library to DataFrame."""
        return pd.DataFrame([c.to_dict() for c in self.clips.values()])

# Initialize library
library = VideoLibrary()
print("Video library framework initialized")
# R: Video-data integration framework
library(tidyverse)
library(R6)

# Video Clip Class
VideoClip <- R6Class("VideoClip",
  public = list(
    clip_id = NULL,
    match_id = NULL,
    start_time = NULL,  # seconds from match start
    end_time = NULL,
    event_type = NULL,
    player = NULL,
    tags = NULL,
    video_path = NULL,

    initialize = function(clip_id, match_id, start_time, end_time,
                         event_type = NULL, player = NULL, tags = NULL) {
      self$clip_id <- clip_id
      self$match_id <- match_id
      self$start_time <- start_time
      self$end_time <- end_time
      self$event_type <- event_type
      self$player <- player
      self$tags <- tags %||% character()
    },

    # Format timestamp for display
    format_timestamp = function() {
      # Coerce to integer: sprintf("%02d") errors on fractional seconds
      start_min <- as.integer(self$start_time %/% 60)
      start_sec <- as.integer(floor(self$start_time %% 60))
      sprintf("%02d:%02d", start_min, start_sec)
    },

    # Calculate duration
    duration = function() {
      self$end_time - self$start_time
    },

    add_tag = function(tag) {
      self$tags <- unique(c(self$tags, tag))
    }
  )
)

# Video Library Manager
VideoLibrary <- R6Class("VideoLibrary",
  public = list(
    clips = NULL,
    matches = NULL,

    initialize = function() {
      self$clips <- list()
      self$matches <- tibble()
    },

    # Add clip to library
    add_clip = function(clip) {
      self$clips[[clip$clip_id]] <- clip
    },

    # Search clips by tags
    search_by_tags = function(tags, match_all = FALSE) {
      results <- list()

      for (clip in self$clips) {
        if (match_all) {
          if (all(tags %in% clip$tags)) {
            results[[clip$clip_id]] <- clip
          }
        } else {
          if (any(tags %in% clip$tags)) {
            results[[clip$clip_id]] <- clip
          }
        }
      }

      return(results)
    },

    # Get clips for a player
    get_player_clips = function(player_name, event_type = NULL) {
      results <- list()

      for (clip in self$clips) {
        # identical() is safe when clip$player or clip$event_type is NULL
        if (identical(clip$player, player_name)) {
          if (is.null(event_type) || identical(clip$event_type, event_type)) {
            results[[clip$clip_id]] <- clip
          }
        }
      }

      return(results)
    }
  )
)

cat("Video library framework initialized\n")

Automated Video Tagging

Event data provides timestamps that can automatically generate video clips. By linking each event to its corresponding video moment, we can instantly find relevant footage without manual tagging.
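
One assumption is worth making explicit before the tagging code: it treats minute * 60 + second as the position in the video file. Match recordings often include pre-match footage and the half-time interval, so a per-period offset is usually needed. The sketch below is one possible way to handle that. The `kickoff_offsets` mapping is hypothetical metadata (the video-file second at which each period kicks off, noted by hand per match), and it assumes the event clock continues from 45:00 in the second half.

```python
from typing import Dict

# Nominal match-clock second at which each period starts (assumption:
# the data provider's clock continues from 45:00 in the second half).
PERIOD_CLOCK_START = {1: 0, 2: 45 * 60}


def match_clock_to_video_seconds(period: int, minute: int, second: float,
                                 kickoff_offsets: Dict[int, float]) -> float:
    """Map a match-clock time to a position in the video file.

    kickoff_offsets is hypothetical metadata: for each period, the
    video-file second at which that period kicks off.
    """
    clock_seconds = minute * 60 + second
    elapsed_in_period = clock_seconds - PERIOD_CLOCK_START[period]
    if elapsed_in_period < 0:
        raise ValueError("clock time is earlier than the start of the period")
    return kickoff_offsets[period] + elapsed_in_period


# Example: kick-off appears 312 s into the file, the second half at 3905 s.
offsets = {1: 312.0, 2: 3905.0}
first_half_pos = match_clock_to_video_seconds(1, 10, 30, offsets)
second_half_pos = match_clock_to_video_seconds(2, 47, 15, offsets)
```

The returned value, rather than the raw match clock, would then be the number to which the before and after buffers are applied.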

video_tagging.R / video_tagging.py
# Python: Automated video tagging from event data
import pandas as pd
from typing import List, Dict, Optional
from dataclasses import dataclass

class VideoTagger:
    """Automatically generate video clips from event data."""

    def __init__(self, buffer_before: float = 5.0,
                 buffer_after: float = 3.0):
        self.buffer_before = buffer_before
        self.buffer_after = buffer_after

    def events_to_clips(self, events_df: pd.DataFrame,
                        match_id: str) -> List[VideoClip]:
        """Convert event data to video clips."""
        clips = []

        for _, event in events_df.iterrows():
            # Calculate event time in seconds
            event_seconds = event["minute"] * 60 + event.get("second", 0)

            # Create clip
            clip = VideoClip(
                clip_id=f"{match_id}_{event['event_id']}",
                match_id=match_id,
                start_time=max(0, event_seconds - self.buffer_before),
                end_time=event_seconds + self.buffer_after,
                event_type=event.get("type"),
                player=event.get("player"),
                tags=self._generate_auto_tags(event)
            )

            clips.append(clip)

        return clips

    def _generate_auto_tags(self, event: pd.Series) -> List[str]:
        """Generate automatic tags based on event properties."""
        tags = []

        # Event type
        if pd.notna(event.get("type")):
            tags.append(event["type"])

        # Outcome
        if pd.notna(event.get("outcome")):
            tags.append(event["outcome"])

        # Location zone
        if pd.notna(event.get("x")) and pd.notna(event.get("y")):
            zone = self._classify_pitch_zone(event["x"], event["y"])
            tags.append(zone)

        # Pressure (NaN is truthy, so check for missing values explicitly)
        pressure = event.get("under_pressure")
        if pd.notna(pressure) and bool(pressure):
            tags.append("under_pressure")

        # Body part
        if pd.notna(event.get("body_part")):
            tags.append(event["body_part"])

        # xG classification for shots
        if event.get("type") == "Shot" and pd.notna(event.get("xg")):
            xg = event["xg"]
            if xg >= 0.3:
                tags.append("big_chance")
            elif xg >= 0.1:
                tags.append("good_chance")
            else:
                tags.append("low_xg_shot")

        # De-duplicate while preserving insertion order
        return list(dict.fromkeys(tags))

    def _classify_pitch_zone(self, x: float, y: float) -> str:
        """Classify location into pitch zone."""
        # Assuming 120x80 pitch
        if x >= 102:
            zone = "box"
        elif x >= 80:
            zone = "final_third"
        elif x >= 40:
            zone = "middle_third"
        else:
            zone = "defensive_third"

        if y <= 26:
            side = "left"
        elif y >= 54:
            side = "right"
        else:
            side = "central"

        return f"{zone}_{side}"

# Extended tagging for specific analysis types
class AnalysisSpecificTagger(VideoTagger):
    """Extended tagger for specific analysis needs."""

    def tag_pressing_events(self, events_df: pd.DataFrame,
                            match_id: str) -> List[VideoClip]:
        """Tag pressing and counterpressing events."""
        pressing_events = events_df[
            events_df["type"].isin(["Pressure", "Ball Recovery", "Tackle"])
        ]

        clips = self.events_to_clips(pressing_events, match_id)

        # Add pressing-specific tags
        for clip in clips:
            clip.add_tag("pressing_analysis")
            # Check if counterpress (within 5 sec of losing ball)
            # Would need possession change data

        return clips

    def tag_set_pieces(self, events_df: pd.DataFrame,
                       match_id: str) -> List[VideoClip]:
        """Tag set piece events with extended context."""
        set_pieces = events_df[
            events_df["type"].isin(["Corner", "Free Kick", "Throw-in"])
        ]

        clips = []
        for _, event in set_pieces.iterrows():
            event_seconds = event["minute"] * 60 + event.get("second", 0)

            # Longer buffer for set pieces
            clip = VideoClip(
                clip_id=f"{match_id}_{event['event_id']}_setpiece",
                match_id=match_id,
                start_time=max(0, event_seconds - 3),
                end_time=event_seconds + 15,  # Longer to capture outcome
                event_type=event.get("type"),
                player=event.get("player"),
                tags=["set_piece", event.get("type", "")]
            )
            clips.append(clip)

        return clips

tagger = VideoTagger()
print("Video tagger initialized")
# R: Automated video tagging from event data
library(tidyverse)

# Convert event data to video clips
events_to_clips <- function(events_df, match_id, buffer_before = 5, buffer_after = 3) {

  clips <- list()

  # seq_len() avoids iterating over c(1, 0) when events_df has no rows
  for (i in seq_len(nrow(events_df))) {
    event <- events_df[i, ]

    # Convert match minute to seconds
    event_seconds <- event$minute * 60 + (event$second %||% 0)

    # Create clip with buffer
    clip <- VideoClip$new(
      clip_id = paste(match_id, event$event_id, sep = "_"),
      match_id = match_id,
      start_time = max(0, event_seconds - buffer_before),
      end_time = event_seconds + buffer_after,
      event_type = event$type,
      player = event$player
    )

    # Add automatic tags based on event type
    clip$tags <- generate_auto_tags(event)

    clips[[clip$clip_id]] <- clip
  }


  return(clips)
}

# Generate automatic tags based on event properties
generate_auto_tags <- function(event) {
  # A value counts as present only if the column exists and is not NA
  has_value <- function(x) !is.null(x) && length(x) == 1 && !is.na(x)

  tags <- character()

  # Event type tags
  if (has_value(event$type)) {
    tags <- c(tags, event$type)
  }

  # Outcome tags
  if (has_value(event$outcome)) {
    tags <- c(tags, event$outcome)
  }

  # Location tags
  if (has_value(event$x) && has_value(event$y)) {
    zone <- classify_pitch_zone(event$x, event$y)
    tags <- c(tags, zone)
  }

  # Special situation tags (isTRUE() treats NULL and NA as FALSE)
  if (isTRUE(event$under_pressure)) {
    tags <- c(tags, "under_pressure")
  }

  if (has_value(event$body_part)) {
    tags <- c(tags, event$body_part)
  }

  # xG tags for shots
  if (has_value(event$type) && event$type == "Shot" && has_value(event$xg)) {
    if (event$xg >= 0.3) {
      tags <- c(tags, "big_chance")
    } else if (event$xg >= 0.1) {
      tags <- c(tags, "good_chance")
    } else {
      tags <- c(tags, "low_xg_shot")
    }
  }

  return(unique(tags))
}

# Classify pitch zone
classify_pitch_zone <- function(x, y) {
  # Assuming 120x80 pitch dimensions
  zone <- case_when(
    x >= 102 ~ "box",
    x >= 80 ~ "final_third",
    x >= 40 ~ "middle_third",
    TRUE ~ "defensive_third"
  )

  side <- case_when(
    y <= 26 ~ "left",
    y >= 54 ~ "right",
    TRUE ~ "central"
  )

  paste(zone, side, sep = "_")
}

cat("Auto-tagging functions defined\n")

Creating Video Playlists from Data

The real power of video-data integration is the ability to create custom playlists from statistical queries. "Show me all progressive passes into the box by our left-back" becomes a few lines of code.
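
As a rough illustration of that query, here is a sketch against a small made-up events table. The column names (`position`, `progressive`, `end_x`, `end_y`) and the penalty-area coordinates are assumptions based on a 120x80 pitch, not a specific provider's schema.

```python
import pandas as pd

# Made-up events for illustration only; real feeds use provider-specific columns.
events = pd.DataFrame({
    "match_id": ["m1", "m1", "m1", "m2"],
    "event_id": [101, 102, 103, 201],
    "type": ["Pass", "Pass", "Shot", "Pass"],
    "position": ["Left Back", "Left Back", "Striker", "Left Back"],
    "progressive": [True, False, False, True],
    "end_x": [108.0, 60.0, 118.0, 95.0],
    "end_y": [40.0, 10.0, 38.0, 30.0],
})

# Penalty area on a 120x80 pitch (assumption): x >= 102 and 18 <= y <= 62.
ends_in_box = (events["end_x"] >= 102) & events["end_y"].between(18, 62)

query = events[
    (events["type"] == "Pass")
    & (events["position"] == "Left Back")
    & events["progressive"]
    & ends_in_box
]

# Clip IDs follow the "<match_id>_<event_id>" convention used when tagging.
clip_ids = (query["match_id"] + "_" + query["event_id"].astype(str)).tolist()
print(clip_ids)
```

Only the first row satisfies every condition, so the resulting list holds a single clip ID that can be looked up in the video library.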

video_playlists.R / video_playlists.py
# Python: Create video playlists from data queries
import pandas as pd
from typing import List, Dict, Optional, Callable
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class VideoPlaylist:
    """Collection of video clips for a specific purpose."""
    name: str
    description: str = ""
    clips: List[VideoClip] = field(default_factory=list)
    created_date: datetime = field(default_factory=datetime.now)

    def add_clip(self, clip: VideoClip):
        """Add a clip to the playlist."""
        self.clips.append(clip)

    def add_clips(self, clips: List[VideoClip]):
        """Add multiple clips."""
        self.clips.extend(clips)

    def sort_chronologically(self):
        """Sort clips by start time."""
        self.clips.sort(key=lambda c: (c.match_id, c.start_time))

    @property
    def total_duration(self) -> float:
        """Total duration of all clips in seconds."""
        return sum(c.duration for c in self.clips)

    @property
    def clip_count(self) -> int:
        """Number of clips in playlist."""
        return len(self.clips)

    def to_dataframe(self) -> pd.DataFrame:
        """Export playlist as DataFrame."""
        return pd.DataFrame([c.to_dict() for c in self.clips])

    def export_for_video_software(self) -> List[Dict]:
        """Export in format for video editing software."""
        return [
            {
                "clip_name": c.clip_id,
                "source_file": c.video_path,
                "in_point": c.start_time,
                "out_point": c.end_time,
                "duration": c.duration,
                "markers": c.tags
            }
            for c in self.clips
        ]

class PlaylistGenerator:
    """Generate playlists from data queries."""

    def __init__(self, library: VideoLibrary):
        self.library = library

    def create_from_query(self, events: pd.DataFrame,
                          name: str,
                          filters: Dict) -> VideoPlaylist:
        """Create playlist from filtered events."""
        playlist = VideoPlaylist(
            name=name,
            description=f"Auto-generated: {datetime.now()}"
        )

        # Apply filters
        filtered = events.copy()

        if "player" in filters and filters["player"]:
            filtered = filtered[filtered["player"] == filters["player"]]

        if "event_types" in filters and filters["event_types"]:
            filtered = filtered[filtered["type"].isin(filters["event_types"])]

        if filters.get("min_xg") is not None:  # allow a threshold of 0
            filtered = filtered[filtered["xg"] >= filters["min_xg"]]

        if "zone" in filters and filters["zone"]:
            filtered = filtered[filtered["pitch_zone"] == filters["zone"]]

        if "team" in filters and filters["team"]:
            filtered = filtered[filtered["team"] == filters["team"]]

        # Find corresponding clips
        for _, event in filtered.iterrows():
            clip_id = f"{event['match_id']}_{event['event_id']}"
            if clip_id in self.library.clips:
                playlist.add_clip(self.library.clips[clip_id])

        playlist.sort_chronologically()
        return playlist

    def create_scouting_playlist(self, player_name: str,
                                  events: pd.DataFrame) -> VideoPlaylist:
        """Create comprehensive scouting playlist for a player."""
        key_events = ["Shot", "Goal", "Assist", "Key Pass",
                      "Dribble", "Progressive Pass", "Tackle", "Interception"]

        return self.create_from_query(
            events,
            name=f"Scouting: {player_name}",
            filters={
                "player": player_name,
                "event_types": key_events
            }
        )

    def create_big_chances_playlist(self, events: pd.DataFrame,
                                     team: Optional[str] = None) -> VideoPlaylist:
        """Create playlist of big chances (high xG shots)."""
        filters = {
            "event_types": ["Shot"],
            "min_xg": 0.3
        }
        if team:
            filters["team"] = team

        return self.create_from_query(
            events,
            name=f"Big Chances{f' - {team}' if team else ''}",
            filters=filters
        )

    def create_defensive_errors_playlist(self, events: pd.DataFrame,
                                          team: str) -> VideoPlaylist:
        """Create playlist of high-quality chances conceded (xG >= 0.15)."""
        # Shots by the opposition with xG at or above the threshold
        shots_against = events[
            (events["type"] == "Shot") &
            (events["team"] != team) &
            (events["xg"] >= 0.15)
        ]

        playlist = VideoPlaylist(
            name=f"Defensive Review: {team}",
            description="High-quality chances conceded"
        )

        for _, event in shots_against.iterrows():
            clip_id = f"{event['match_id']}_{event['event_id']}"
            if clip_id in self.library.clips:
                playlist.add_clip(self.library.clips[clip_id])

        playlist.sort_chronologically()
        return playlist

print("Playlist generator ready")
# R: Create video playlists from data queries
library(tidyverse)
library(R6)

# Playlist class
VideoPlaylist <- R6Class("VideoPlaylist",
  public = list(
    name = NULL,
    description = NULL,
    clips = NULL,
    created_date = NULL,

    initialize = function(name, description = NULL) {
      self$name <- name
      self$description <- description
      self$clips <- list()
      self$created_date <- Sys.Date()
    },

    add_clip = function(clip) {
      self$clips[[length(self$clips) + 1]] <- clip
    },

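    # Sort clips by match, then by timestamp within each match
    sort_chronologically = function() {
      if (length(self$clips) == 0) return(invisible(self))
      match_ids <- vapply(self$clips, function(c) as.character(c$match_id), character(1))
      times <- vapply(self$clips, function(c) as.numeric(c$start_time), numeric(1))
      self$clips <- self$clips[order(match_ids, times)]
      invisible(self)
    },

    # Get total duration (vapply returns numeric(0) for an empty playlist)
    total_duration = function() {
      sum(vapply(self$clips, function(c) c$duration(), numeric(1)))
    },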

    # Export playlist
    export_playlist = function(format = "csv") {
      clips_df <- map_dfr(self$clips, function(clip) {
        tibble(
          clip_id = clip$clip_id,
          match_id = clip$match_id,
          timestamp = clip$format_timestamp(),
          duration = clip$duration(),
          event_type = clip$event_type,
          player = clip$player,
          tags = paste(clip$tags, collapse = ", ")
        )
      })

      return(clips_df)
    }
  )
)

# Playlist generator from queries
create_playlist_from_query <- function(events, library, query_name, filters) {

  playlist <- VideoPlaylist$new(
    name = query_name,
    description = paste("Auto-generated:", Sys.time())
  )

  # Apply filters to events
  filtered_events <- events

  if (!is.null(filters$player)) {
    filtered_events <- filtered_events %>%
      filter(player == filters$player)
  }

  if (!is.null(filters$event_type)) {
    filtered_events <- filtered_events %>%
      filter(type %in% filters$event_type)
  }

  if (!is.null(filters$min_xg)) {
    filtered_events <- filtered_events %>%
      filter(xg >= filters$min_xg)
  }

  if (!is.null(filters$zone)) {
    filtered_events <- filtered_events %>%
      filter(pitch_zone == filters$zone)
  }

  # Find corresponding clips
  # seq_len() avoids iterating over c(1, 0) when no events match the filters
  for (i in seq_len(nrow(filtered_events))) {
    event <- filtered_events[i, ]
    clip_id <- paste(event$match_id, event$event_id, sep = "_")

    if (clip_id %in% names(library$clips)) {
      playlist$add_clip(library$clips[[clip_id]])
    }
  }

  playlist$sort_chronologically()
  return(playlist)
}

# Example: Create scouting playlist for a player
create_scouting_playlist <- function(player_name, events, library) {
  # Key events for scouting
  key_types <- c("Shot", "Goal", "Assist", "Key Pass",
                 "Dribble", "Progressive Pass", "Tackle", "Interception")

  filters <- list(
    player = player_name,
    event_type = key_types
  )

  playlist <- create_playlist_from_query(
    events, library,
    query_name = paste("Scouting:", player_name),
    filters = filters
  )

  return(playlist)
}

cat("Playlist creation functions defined\n")

Common Playlist Types

Playlist Type     | Filter Criteria                   | Use Case
------------------|-----------------------------------|-----------------------
Player Scouting   | Player + key event types          | Recruitment evaluation
Big Chances       | Shots with xG >= 0.3              | Finishing analysis
Set Pieces        | Corners, free kicks, throw-ins    | Set piece coaching
Progressive Plays | Progressive passes/carries        | Build-up analysis
Defensive Errors  | High xG shots conceded            | Defensive review
Pressing Triggers | Successful pressures + recoveries | Pressing coaching
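
One practical wrinkle with these playlists: because each clip carries a buffer before and after its event, two events a few seconds apart in the same match produce overlapping clips, and the playlist replays the same footage. The following is a minimal sketch of one way to merge such windows. It works on plain `(match_id, start_time, end_time)` tuples rather than the chapter's clip objects, and the function name and `gap` parameter are illustrative, not part of the framework.

```python
from typing import List, Tuple

Window = Tuple[str, float, float]  # (match_id, start_time, end_time)


def merge_overlapping_windows(windows: List[Window],
                              gap: float = 0.0) -> List[Window]:
    """Merge windows from the same match that overlap or sit within `gap` seconds."""
    merged: List[Window] = []
    # Sorting tuples orders by match_id first, then by start time.
    for match_id, start, end in sorted(windows):
        if merged and merged[-1][0] == match_id and start <= merged[-1][2] + gap:
            prev_match, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_match, prev_start, max(prev_end, end))
        else:
            merged.append((match_id, start, end))
    return merged


windows = [
    ("m1", 595.0, 603.0),    # pass at 10:00 with 5 s / 3 s buffers
    ("m1", 599.0, 607.0),    # shot at 10:04, overlaps the pass clip
    ("m1", 1200.0, 1208.0),
    ("m2", 600.0, 608.0),    # similar clock time, different match: kept separate
]
merged = merge_overlapping_windows(windows)
```

Merging loses the one-event-per-clip mapping, so it suits viewing playlists better than libraries where each clip must stay searchable by its own event.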

Video Integration in Scouting Reports

Scouting reports are most effective when statistical profiles are backed by video evidence. Each claim should be supported by clips showing the behavior in action.
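
One way to enforce that rule is to flag any insight whose clip count falls below a threshold before the report goes out. The sketch below uses plain dictionaries with `category` and `clip_ids` keys in place of the report classes in this chapter; the function name and the `min_clips` default of 3 are arbitrary assumptions.

```python
from typing import Dict, List


def find_unsupported_insights(insights: List[Dict],
                              min_clips: int = 3) -> List[str]:
    """Return the categories whose claims have fewer than min_clips clips."""
    return [
        insight["category"]
        for insight in insights
        if len(insight.get("clip_ids", [])) < min_clips
    ]


# Hypothetical report content for illustration.
insights = [
    {"category": "Finishing", "clip_ids": ["m1_7", "m1_31", "m2_12", "m3_4"]},
    {"category": "Chance Creation", "clip_ids": ["m2_18"]},
    {"category": "Defensive Work", "clip_ids": []},
]
unsupported = find_unsupported_insights(insights)
print(unsupported)
```

Categories returned by the check either need more footage or a softer claim in the written report.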

scouting_video.R / scouting_video.py
# Python: Integrate video into scouting reports
from typing import Dict, List, Optional
from dataclasses import dataclass, field
import pandas as pd

@dataclass
class InsightWithEvidence:
    """Statistical insight backed by video evidence."""
    category: str
    description: str
    stats: Dict
    clips: List[VideoClip]

class ScoutingReportWithVideo:
    """Scouting report integrating stats and video."""

    def __init__(self, player_name: str, position: str):
        self.player_name = player_name
        self.position = position
        self.insights: List[InsightWithEvidence] = []
        self.summary_clips: List[VideoClip] = []

    def add_insight(self, category: str, description: str,
                    stats: Dict, clips: List[VideoClip]):
        """Add insight with video evidence."""
        self.insights.append(InsightWithEvidence(
            category=category,
            description=description,
            stats=stats,
            clips=clips
        ))

    def generate_report(self) -> str:
        """Generate formatted report."""
        report = f"""
==========================================================
SCOUTING REPORT: {self.player_name} ({self.position})
==========================================================
"""
        for insight in self.insights:
            clip_summary = self._format_clip_list(insight.clips)
            report += f"""
{insight.category.upper()}
{"-" * len(insight.category)}
{insight.description}

Key Statistics:
{self._format_stats(insight.stats)}

Video Evidence ({len(insight.clips)} clips):
{clip_summary}
"""
        return report

    def _format_stats(self, stats: Dict) -> str:
        """Format statistics for display."""
        return "\n".join([f"  {k}: {v}" for k, v in stats.items()])

    def _format_clip_list(self, clips: List[VideoClip], max_clips: int = 5) -> str:
        """Format clip list for display."""
        if not clips:
            return "  No clips available"

        lines = []
        for clip in clips[:max_clips]:
            tags = ", ".join(clip.tags[:3])
            lines.append(f"  - {clip.format_timestamp()}: {clip.event_type} ({tags})")

        if len(clips) > max_clips:
            lines.append(f"  ... and {len(clips) - max_clips} more clips")

        return "\n".join(lines)

    def export_with_clips(self) -> Dict:
        """Export report data with clip references."""
        return {
            "player": self.player_name,
            "position": self.position,
            "insights": [
                {
                    "category": i.category,
                    "description": i.description,
                    "stats": i.stats,
                    "clip_ids": [c.clip_id for c in i.clips]
                }
                for i in self.insights
            ],
            "total_clips": sum(len(i.clips) for i in self.insights)
        }

class ScoutingReportGenerator:
    """Generate scouting reports with video integration."""

    def __init__(self, library: VideoLibrary):
        self.library = library

    def generate_report(self, player_name: str,
                        player_stats: pd.Series,
                        events: pd.DataFrame) -> ScoutingReportWithVideo:
        """Generate comprehensive scouting report."""

        position = player_stats.get("position", "Unknown")
        report = ScoutingReportWithVideo(player_name, position)

        # Finishing analysis (for attackers)
        if any(p in position for p in ["FW", "MF", "AM"]):
            self._add_finishing_insight(report, player_name, player_stats, events)

        # Chance creation
        self._add_creation_insight(report, player_name, player_stats, events)

        # Defensive contribution
        self._add_defensive_insight(report, player_name, player_stats, events)

        # Progression
        self._add_progression_insight(report, player_name, player_stats, events)

        return report

    def _add_finishing_insight(self, report: ScoutingReportWithVideo,
                               player_name: str, stats: pd.Series,
                               events: pd.DataFrame):
        """Add finishing analysis with video."""
        clips = self.library.get_player_clips(player_name, "Shot")
        goal_clips = [c for c in clips if "Goal" in c.tags]

        report.add_insight(
            category="Finishing",
            # Describe the evidence; leave the judgement to the reader
            description=f"{len(goal_clips)} goals from {len(clips)} clipped shots",
            stats={
                "Goals": len(goal_clips),
                "Total Shots": len(clips),
                "xG": round(stats.get("xg", 0), 2),
                "Conversion Rate": f"{len(goal_clips)/max(1,len(clips))*100:.0f}%"
            },
            clips=clips
        )

    def _add_creation_insight(self, report, player_name, stats, events):
        """Add chance creation analysis."""
        creative_clips = [
            c for c in self.library.get_player_clips(player_name)
            if any(tag in c.tags for tag in ["Key Pass", "Assist", "Shot Assist"])
        ]

        report.add_insight(
            category="Chance Creation",
            description="Creative ability to unlock defenses",
            stats={
                "Assists": int(stats.get("assists", 0)),
                "xA": round(stats.get("xa", 0), 2),
                "Key Passes": len([c for c in creative_clips if "Key Pass" in c.tags])
            },
            clips=creative_clips
        )

    def _add_defensive_insight(self, report, player_name, stats, events):
        """Add defensive contribution analysis."""
        defensive_clips = [
            c for c in self.library.get_player_clips(player_name)
            if any(tag in c.tags for tag in ["Tackle", "Interception", "Block", "Clearance"])
        ]

        report.add_insight(
            category="Defensive Work",
            description="Contribution without the ball",
            stats={
                "Tackles Won": int(stats.get("tackles_won", 0)),
                "Interceptions": int(stats.get("interceptions", 0)),
                "Pressures": int(stats.get("pressures", 0))
            },
            clips=defensive_clips
        )

    def _add_progression_insight(self, report, player_name, stats, events):
        """Add ball progression analysis."""
        def in_final_third(clip: VideoClip) -> bool:
            # Zone tags look like "final_third_left", so match on the prefix
            return any(t.startswith("final_third") for t in clip.tags)

        progressive_clips = [
            c for c in self.library.get_player_clips(player_name)
            if in_final_third(c)
            or any(tag in c.tags for tag in ["Progressive Pass", "Progressive Carry"])
        ]

        report.add_insight(
            category="Ball Progression",
            description="Ability to advance the ball effectively",
            stats={
                "Progressive Passes": int(stats.get("progressive_passes", 0)),
                "Progressive Carries": int(stats.get("progressive_carries", 0)),
                "Final Third Entries": len([c for c in progressive_clips if in_final_third(c)])
            },
            clips=progressive_clips
        )

print("Scouting report generator with video ready")
# R: Integrate video into scouting reports
library(tidyverse)
library(R6)

# Scouting report with video evidence
ScoutingReportWithVideo <- R6Class("ScoutingReportWithVideo",
  public = list(
    player_name = NULL,
    position = NULL,
    stats_summary = NULL,
    video_evidence = NULL,

    initialize = function(player_name, position) {
      self$player_name <- player_name
      self$position <- position
      self$stats_summary <- list()
      self$video_evidence <- list()
    },

    # Add statistical insight with video evidence
    add_insight = function(category, stat_description, clips) {
      self$stats_summary[[category]] <- stat_description
      self$video_evidence[[category]] <- clips
    },

    # Generate report
    generate_report = function() {
      report <- sprintf("
==========================================================
SCOUTING REPORT: %s (%s)
==========================================================

", self$player_name, self$position)

      for (category in names(self$stats_summary)) {
        clips <- self$video_evidence[[category]]
        # One argument per format specifier: heading, underline,
        # stat description, clip count, clip list
        report <- paste0(report, sprintf("
%s
%s

Statistical Evidence:
%s

Video Evidence: %d clips available
%s
",
          toupper(category),
          paste(rep("-", nchar(category)), collapse = ""),
          self$stats_summary[[category]],
          length(clips),
          self$format_video_list(clips)
        ))
      }

      return(report)
    },

    format_video_list = function(clips) {
      if (length(clips) == 0) return("  No clips available")

      # head() is safe when a clip has no tags (1:0 would index backwards)
      lines <- sapply(head(clips, 5), function(c) {
        sprintf("  - %s: %s (%s)", c$format_timestamp(), c$event_type,
                paste(head(c$tags, 3), collapse = ", "))
      })

      paste(lines, collapse = "\n")
    }
  )
)

# Auto-generate scouting report with video
generate_scouting_report_with_video <- function(player_name, player_stats,
                                                 events, library) {

  # Get player position
  position <- player_stats$position[player_stats$player == player_name][1]

  report <- ScoutingReportWithVideo$new(player_name, position)

  # Finishing (for forwards/midfielders)
  if (grepl("FW|MF", position)) {
    shot_clips <- library$get_player_clips(player_name, "Shot")
    goal_clips <- Filter(function(c) "Goal" %in% c$tags, shot_clips)

    finishing_desc <- sprintf(
      "Goals: %d | xG: %.1f | Conversion: %.0f%%",
      length(goal_clips),
      sum(sapply(shot_clips, function(c) c$xg %||% 0)),
      length(goal_clips) / max(1, length(shot_clips)) * 100
    )

    report$add_insight("Finishing", finishing_desc, shot_clips)
  }

  # Chance creation
  creative_clips <- library$search_by_tags(c("Key Pass", "Assist", "Shot Assist"))
  creative_clips <- Filter(function(c) c$player == player_name, creative_clips)

  if (length(creative_clips) > 0) {
    creation_desc <- sprintf(
      "Key Passes: %d | Assists: %d",
      sum(sapply(creative_clips, function(c) "Key Pass" %in% c$tags)),
      sum(sapply(creative_clips, function(c) "Assist" %in% c$tags))
    )
    report$add_insight("Chance Creation", creation_desc, creative_clips)
  }

  return(report)
}

Video-Data Workflows for Coaching

Coaches need different video packages than scouts. Match preparation, halftime feedback, and individual player development all require tailored video-data workflows.

coaching_workflows.R / coaching_workflows.py
# Python: Coaching-specific video workflows
from typing import Dict, List
import pandas as pd
from dataclasses import dataclass

@dataclass
class VideoPackage:
    """Collection of themed video content for coaching."""
    title: str
    description: str
    sections: Dict[str, Dict]

class CoachingVideoWorkflows:
    """Video workflows tailored for coaching staff."""

    def __init__(self, library: VideoLibrary):
        self.library = library

    def create_prematch_package(self, opponent: str,
                                 opponent_events: pd.DataFrame) -> VideoPackage:
        """Create pre-match video package for coaches."""

        package = VideoPackage(
            title=f"Pre-Match Analysis: vs {opponent}",
            description="Opponent analysis video package",
            sections={}
        )

        # Attacking threats
        attacking = opponent_events[
            (opponent_events["team"] == opponent) &
            (opponent_events["type"].isin(["Shot", "Key Pass", "Cross"]))
        ]

        matches = opponent_events["match_id"].nunique()
        package.sections["attacking"] = {
            "title": "Attacking Threats",
            "stats": {
                # Count shots only; `attacking` also holds key passes and crosses
                "shots_per_game": (attacking["type"] == "Shot").sum() / max(1, matches),
                "xg_per_game": attacking["xg"].sum() / max(1, matches)
            },
            "clips": self.library.search_by_tags(["Shot", opponent]),
            "coaching_points": [
                "Watch for runs in behind from #9",
                "Left winger likes to cut inside",
                "Dangerous from set pieces"
            ]
        }

        # Set pieces
        set_pieces = opponent_events[
            (opponent_events["team"] == opponent) &
            (opponent_events["type"].isin(["Corner", "Free Kick"]))
        ]

        package.sections["set_pieces"] = {
            "title": "Set Piece Routines",
            "stats": {
                "corners_per_game": len(set_pieces[set_pieces["type"] == "Corner"]) / max(1, matches)
            },
            "clips": self.library.search_by_tags(["set_piece", opponent]),
            "coaching_points": [
                "Near post corner routine",
                "Short corner option"
            ]
        }

        # Defensive weaknesses
        against = opponent_events[
            (opponent_events["team"] != opponent) &
            (opponent_events["type"] == "Shot")
        ]

        package.sections["weaknesses"] = {
            "title": "Exploitable Weaknesses",
            "stats": {
                "xga_per_game": against["xg"].sum() / max(1, matches)
            },
            "clips": self.library.search_by_tags(["Goal", f"vs_{opponent}"]),
            "coaching_points": [
                "High line vulnerable to balls over the top",
                "Slow transition from attack to defense"
            ]
        }

        return package

    def create_halftime_clips(self, match_events: pd.DataFrame,
                               our_team: str) -> Dict:
        """Generate halftime video clips for tactical adjustments."""

        match_id = match_events["match_id"].iloc[0]
        half_end = 45 * 60  # seconds; ignores first-half stoppage time

        # First-half shot clips from this match only
        first_half_shots = [
            c for c in self.library.get_match_clips(match_id)
            if "Shot" in c.tags and c.start_time < half_end
        ]

        return {
            "our_chances": {
                "title": "Our Best Chances",
                # Assumes each clip carries its team name as a tag
                "clips": [c for c in first_half_shots if our_team in c.tags]
            },
            "chances_conceded": {
                "title": "Chances Conceded",
                "clips": [c for c in first_half_shots if our_team not in c.tags]
            },
            "pressing": {
                "title": "Pressing Actions",
                # Restrict to this match's first half before taking the first five
                "clips": [
                    c for c in self.library.search_by_tags(["Pressure", "Ball Recovery"])
                    if c.match_id == match_id and c.start_time < half_end
                ][:5]
            }
        }

    def create_player_feedback(self, player_name: str,
                                match_id: str,
                                events: pd.DataFrame) -> Dict:
        """Create individual player feedback package."""

        player_events = events[
            (events["match_id"] == match_id) &
            (events["player"] == player_name)
        ]

        player_clips = self.library.get_player_clips(player_name)
        match_clips = [c for c in player_clips if c.match_id == match_id]

        # Categorize clips
        positive_tags = {"Goal", "Assist", "Tackle Won", "Interception", "Success"}
        negative_tags = {"Miss", "Dispossession", "Error", "Lost"}

        return {
            "player": player_name,
            "match_id": match_id,
            "positives": {
                "title": "Excellent Moments",
                "clips": [c for c in match_clips
                         if any(t in positive_tags for t in c.tags)]
            },
            "improvements": {
                "title": "Areas to Work On",
                "clips": [c for c in match_clips
                         if any(t in negative_tags for t in c.tags)]
            },
            "summary": {
                "total_actions": len(player_events),
                "successful": len(player_events[player_events["outcome"] == "Success"]),
                "video_clips": len(match_clips)
            }
        }

    def create_weekly_development_package(self, player_name: str,
                                           events: pd.DataFrame,
                                           focus_area: str) -> Dict:
        """Create weekly development video package for a player."""

        focus_tags = {
            "finishing": ["Shot", "Goal", "big_chance"],
            "passing": ["Progressive Pass", "Key Pass", "Through Ball"],
            "defending": ["Tackle", "Interception", "Block"],
            "dribbling": ["Dribble", "Carry", "Take On"]
        }

        clips = self.library.search_by_tags(focus_tags.get(focus_area, []))
        player_clips = [c for c in clips if c.player == player_name]

        return {
            "player": player_name,
            "focus": focus_area,
            "your_clips": player_clips,
            "example_clips": clips[:10],  # First 10 matching clips from any player (unranked)
            "drills_focus": f"This week focus on {focus_area}",
            "clip_count": len(player_clips)
        }

print("Coaching video workflows ready")
# R: Coaching-specific video workflows
library(tidyverse)

# Pre-match video package for coaches
create_prematch_video_package <- function(opponent, opponent_events, library) {

  package <- list()

  # Opponent attacking patterns
  attacking_events <- opponent_events %>%
    filter(team == opponent, type %in% c("Shot", "Key Pass", "Cross"))

  package$attacking <- list(
    description = "Opponent Attacking Threats",
    stats = list(
      # Count shots only; attacking_events also holds key passes and crosses
      shots_per_game = sum(attacking_events$type == "Shot") /
                       max(1, n_distinct(opponent_events$match_id)),
      xg_per_game = sum(attacking_events$xg, na.rm = TRUE) /
                    max(1, n_distinct(opponent_events$match_id))
    ),
    clips = library$search_by_tags(c("Shot", opponent, "big_chance"))
  )

  # Set pieces
  set_piece_events <- opponent_events %>%
    filter(team == opponent, type %in% c("Corner", "Free Kick"))

  package$set_pieces <- list(
    description = "Opponent Set Piece Routines",
    stats = list(
      # Divide by all matches played, not only those with a set piece
      corners_per_game = sum(set_piece_events$type == "Corner") /
                         max(1, n_distinct(opponent_events$match_id)),
      set_piece_goals = sum(set_piece_events$resulted_in_goal, na.rm = TRUE)
    ),
    clips = library$search_by_tags(c("set_piece", opponent))
  )

  # Defensive vulnerabilities
  goals_conceded <- opponent_events %>%
    filter(team != opponent, type == "Goal")

  package$weaknesses <- list(
    description = "Opponent Defensive Weaknesses",
    stats = list(
      goals_conceded = nrow(goals_conceded),
      xga_per_game = sum(opponent_events$xg[opponent_events$team != opponent],
                         na.rm = TRUE) / n_distinct(opponent_events$match_id)
    ),
    clips = library$search_by_tags(c("Goal", paste0("vs_", opponent)))
  )

  return(package)
}

# Post-match individual feedback
create_player_feedback_clips <- function(player_name, match_id, events, library) {

  player_events <- events %>%
    filter(match_id == !!match_id, player == player_name)

  feedback <- list()

  # Positive moments
  positive_events <- player_events %>%
    filter(outcome %in% c("Success", "Goal", "Assist") |
           type %in% c("Goal", "Assist", "Tackle Won"))

  feedback$positives <- list(
    title = "Good Moments",
    clips = map(seq_len(nrow(positive_events)), function(i) {
      event <- positive_events[i, ]
      clip_id <- paste(match_id, event$event_id, sep = "_")
      library$clips[[clip_id]]
    }) %>% compact()
  )

  # Areas to improve
  negative_events <- player_events %>%
    filter(outcome %in% c("Fail", "Lost", "Miss") |
           type %in% c("Dispossession", "Error"))

  feedback$improvements <- list(
    title = "Areas to Improve",
    clips = map(seq_len(nrow(negative_events)), function(i) {
      event <- negative_events[i, ]
      clip_id <- paste(match_id, event$event_id, sep = "_")
      library$clips[[clip_id]]
    }) %>% compact()
  )

  return(feedback)
}

Practice Exercises

Exercise 1: Build a Video Tagging System

Using StatsBomb open data, build a complete video tagging system. Convert all events from a match into video clips with automatic tags based on event type, location, and outcome.
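
One possible starting point is sketched below. It assumes each event has already been flattened into a dict with the hypothetical keys `id`, `type`, `minute`, `second`, `location`, `outcome` and `player`; raw StatsBomb JSON nests some of these, so a flattening step would come first. `TaggedClip`, `zone_tag`, `event_to_clip` and the pre-roll and post-roll defaults are illustrative names and values, not part of the chapter's library. The zone boundaries use StatsBomb's 120 x 80 pitch coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaggedClip:
    """Hypothetical stand-in for the chapter's VideoClip."""
    clip_id: str
    match_id: str
    start_time: float  # seconds on the match clock
    end_time: float
    event_type: str
    player: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def zone_tag(x: float, y: float) -> str:
    """Coarse location tag from StatsBomb-style coordinates (120 x 80)."""
    if x >= 102 and 18 <= y <= 62:
        return "penalty_area"
    if x >= 80:
        return "final_third"
    if x >= 40:
        return "middle_third"
    return "defensive_third"


def event_to_clip(event: Dict, match_id: str,
                  pre_roll: float = 5.0, post_roll: float = 8.0) -> TaggedClip:
    """Turn one flattened event into a clip with automatic tags."""
    # Match-clock time; mapping this to a video offset is left as an assumption
    t = event["minute"] * 60 + event["second"]

    tags = [event["type"]]
    if event.get("location"):
        tags.append(zone_tag(*event["location"]))
    if event.get("outcome"):
        tags.append(event["outcome"])

    return TaggedClip(
        clip_id=f"{match_id}_{event['id']}",
        match_id=match_id,
        start_time=max(0.0, t - pre_roll),  # never start before kick-off
        end_time=t + post_roll,
        event_type=event["type"],
        player=event.get("player"),
        tags=tags,
    )


example = {"id": "e1", "type": "Shot", "minute": 12, "second": 30,
           "location": [108.0, 40.0], "outcome": "Goal", "player": "Player A"}
clip = event_to_clip(example, "m1")
print(clip.clip_id, clip.start_time, clip.end_time, clip.tags)
```

The match clock is not the same as the position in the video file, because broadcasts include pre-match footage and a halftime break. A fuller solution would add a per-period offset before cutting clips.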

Exercise 2: Create a Scouting Playlist Generator

Build a playlist generator that takes a player name and creates separate playlists for: finishing, chance creation, defensive actions, and aerial duels. Include filtering by time period and competition.
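
As a rough sketch of the filtering logic, the example below groups a player's clips into the four playlists and applies optional date and competition filters. `ClipRecord`, `PLAYLIST_TAGS` and `build_playlists` are hypothetical names, and the tag groups are assumptions that should be adapted to whatever tags your own tagging system emits.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Optional

# Assumed mapping from playlist name to the tags that qualify a clip for it
PLAYLIST_TAGS = {
    "finishing": {"Shot", "Goal"},
    "chance_creation": {"Key Pass", "Assist"},
    "defensive_actions": {"Tackle", "Interception", "Block"},
    "aerial_duels": {"Aerial Duel"},
}


@dataclass
class ClipRecord:
    """Hypothetical clip record carrying the metadata needed for filtering."""
    clip_id: str
    player: str
    match_date: date
    competition: str
    tags: List[str] = field(default_factory=list)


def build_playlists(clips: Iterable[ClipRecord], player: str,
                    start: Optional[date] = None,
                    end: Optional[date] = None,
                    competition: Optional[str] = None) -> Dict[str, List[ClipRecord]]:
    """Group one player's clips into themed playlists, oldest first."""
    playlists: Dict[str, List[ClipRecord]] = {name: [] for name in PLAYLIST_TAGS}

    for clip in clips:
        if clip.player != player:
            continue
        if start is not None and clip.match_date < start:
            continue
        if end is not None and clip.match_date > end:
            continue
        if competition is not None and clip.competition != competition:
            continue
        # A clip may land in more than one playlist if its tags overlap groups
        for name, wanted in PLAYLIST_TAGS.items():
            if wanted.intersection(clip.tags):
                playlists[name].append(clip)

    for playlist in playlists.values():
        playlist.sort(key=lambda c: c.match_date)
    return playlists


clips = [
    ClipRecord("c1", "Player A", date(2024, 3, 2), "League", ["Shot", "Goal"]),
    ClipRecord("c2", "Player A", date(2024, 1, 14), "Cup", ["Key Pass"]),
    ClipRecord("c3", "Player A", date(2024, 2, 10), "League", ["Tackle"]),
    ClipRecord("c4", "Player B", date(2024, 2, 10), "League", ["Shot"]),
]
league = build_playlists(clips, "Player A", start=date(2024, 2, 1),
                         competition="League")
print({name: [c.clip_id for c in pl] for name, pl in league.items()})
```

Keeping the tag groups in one dictionary means a new playlist theme only needs a new entry, not a new code path.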

Exercise 3: Design a Coaching Video Interface

Design a user interface (wireframe or prototype) for coaches to access video clips. Include: search by player/event, playlist creation, annotation tools, and sharing capabilities.

Summary

Key Takeaways
  • Data + Video: Data tells what happened; video shows how and why
  • Automated tagging: Event data provides timestamps for automatic clip generation
  • Query-based playlists: Create custom video playlists from statistical queries
  • Evidence-based reports: Back statistical claims with video evidence
  • Tailored workflows: Different users need different video packages
Key Components
  • Video clip data model with timestamps and tags
  • Automated tagging from event data
  • Playlist generation from filters
  • Scouting reports with embedded evidence
  • Coaching-specific video packages