Capstone - Complete Analytics System
Computer Vision for Football
Computer vision is transforming football analytics by enabling automated analysis of broadcast footage. From detecting players and the ball to estimating poses and tracking movement, these techniques unlock insights that were previously impossible to obtain at scale.
Learning Objectives
- Understand the fundamentals of computer vision for sports
- Implement object detection for players and ball
- Apply homography transformations to map video to pitch coordinates
- Use pose estimation to analyze player movements
- Build basic tracking pipelines with OpenCV
- Work with pre-trained models and open-source tools
Prerequisites
This chapter requires familiarity with Python and basic image processing concepts. We'll use OpenCV, PyTorch/TensorFlow, and specialized sports CV libraries.
Computer Vision Fundamentals
Computer vision enables machines to interpret visual information. For football, this means converting broadcast video into structured data about player positions, movements, and actions.
Object detection: locate and identify objects (players, ball, referees) in images with bounding boxes.
Models: YOLO, Faster R-CNN, SSD
Tracking: follow detected objects across video frames to build trajectories over time.
Methods: SORT, DeepSORT, ByteTrack
Pose estimation: detect body keypoints to analyze player poses, movements, and actions.
Models: OpenPose, MediaPipe, HRNet
# Python: Basic image processing with OpenCV
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load a football frame
img = cv2.imread("match_frame.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print(f"Dimensions: {img.shape[1]} x {img.shape[0]}")
# Basic preprocessing
# Resize to standard dimensions
resized = cv2.resize(img, (1280, 720))
# Adjust brightness and contrast
alpha = 1.2 # Contrast
beta = 10 # Brightness
adjusted = cv2.convertScaleAbs(resized, alpha=alpha, beta=beta)
# Edge detection for pitch lines
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
# Hough line detection for pitch lines
lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=100,
minLineLength=100, maxLineGap=10)
# Draw detected lines
line_img = resized.copy()
if lines is not None:
for line in lines:
x1, y1, x2, y2 = line[0]
cv2.line(line_img, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Display results
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(cv2.cvtColor(resized, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original")
axes[1].imshow(edges, cmap="gray")
axes[1].set_title("Edge Detection")
axes[2].imshow(cv2.cvtColor(line_img, cv2.COLOR_BGR2RGB))
axes[2].set_title("Pitch Lines")
plt.tight_layout()
plt.show()
# R: Basic image processing with magick
library(magick)
library(tidyverse)
# Load a football frame
img <- image_read("match_frame.jpg")
# Basic image info
info <- image_info(img)
cat("Dimensions:", info$width, "x", info$height, "\n")
# Basic preprocessing
processed <- img %>%
image_resize("1280x720") %>% # Standardize size
image_modulate(brightness = 110) %>% # Adjust brightness
image_contrast(sharpen = 1) # Enhance contrast
# Edge detection (for pitch line detection)
edges <- img %>%
image_convert(colorspace = "gray") %>%
image_edge(radius = 2)
# Save processed images
image_write(processed, "processed_frame.jpg")
image_write(edges, "edges_frame.jpg")
# Note: For advanced CV in R, use reticulate to call Python
# library(reticulate)
# cv2 <- import("cv2")
Dimensions: 1920 x 1080
Player and Ball Detection
Object detection identifies players, the ball, and referees in video frames. Modern deep learning models like YOLO (You Only Look Once) provide real-time detection capabilities.
# Python: Player detection with YOLOv8
from ultralytics import YOLO
import cv2
import numpy as np
# Load pre-trained YOLO model
model = YOLO("yolov8n.pt") # Nano model for speed
# Run detection
results = model("match_frame.jpg")
# Process results
for result in results:
boxes = result.boxes
# Filter for person class (0) and sports ball (32)
for box in boxes:
cls = int(box.cls[0])
conf = float(box.conf[0])
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
if cls == 0: # Person
print(f"Player: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f}), conf: {conf:.2f}")
elif cls == 32: # Sports ball
print(f"Ball: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f}), conf: {conf:.2f}")
# Visualize detections
annotated_frame = results[0].plot()
cv2.imwrite("detections.jpg", annotated_frame)
print(f"\nTotal people detected: {len([b for b in boxes if int(b.cls[0]) == 0])}")
# R: Object detection via reticulate
library(reticulate)
# Use Python YOLO implementation
yolo <- import("ultralytics")
cv2 <- import("cv2")
np <- import("numpy")
# Load pre-trained model
model <- yolo$YOLO("yolov8n.pt") # Nano model for speed
# Run detection on frame
results <- model("match_frame.jpg")
# Extract detections
detections <- results[[1]]$boxes$data$cpu()$numpy()
# Filter for person class (class 0 in COCO)
persons <- detections[detections[, 6] == 0, ]
cat("Detected", nrow(persons), "people in frame\n")
# Access bounding boxes
for (i in seq_len(nrow(persons))) {
x1 <- persons[i, 1]
y1 <- persons[i, 2]
x2 <- persons[i, 3]
y2 <- persons[i, 4]
conf <- persons[i, 5]
cat(sprintf("Person %d: (%.0f, %.0f) to (%.0f, %.0f), conf: %.2f\n",
i, x1, y1, x2, y2, conf))
}
Player: (234, 156) to (289, 298), conf: 0.92
Player: (567, 189) to (612, 334), conf: 0.89
Ball: (445, 412) to (462, 428), conf: 0.76
Total people detected: 22
Team Classification
After detecting players, we need to classify them by team. This is typically done using jersey color clustering or dedicated classification models.
# Python: Team classification by jersey color
import cv2
import numpy as np
from sklearn.cluster import KMeans
def extract_jersey_color(img, bbox):
"""Extract dominant jersey color from player bounding box."""
x1, y1, x2, y2 = map(int, bbox)
height = y2 - y1
# Focus on torso region (upper-middle of bounding box)
torso_y1 = int(y1 + height * 0.2)
torso_y2 = int(y1 + height * 0.5)
# Crop jersey region
jersey_region = img[torso_y1:torso_y2, x1:x2]
if jersey_region.size == 0:
return None
# Reshape to list of pixels
pixels = jersey_region.reshape(-1, 3)
# K-means to find dominant color
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(pixels)
# Get most common cluster center
labels, counts = np.unique(kmeans.labels_, return_counts=True)
dominant_idx = labels[np.argmax(counts)]
dominant_color = kmeans.cluster_centers_[dominant_idx]
return dominant_color
def classify_teams(player_colors):
"""Classify players into teams based on jersey colors."""
colors = np.array([c for c in player_colors if c is not None])
if len(colors) < 3:
return None
# Cluster into 3 groups (2 teams + refs/others)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(colors)
return labels, kmeans.cluster_centers_
# Example usage
img = cv2.imread("match_frame.jpg")
# Assume we have detections from YOLO
player_bboxes = [
[234, 156, 289, 298],
[567, 189, 612, 334],
# ... more players
]
# Extract colors for all players
colors = [extract_jersey_color(img, bbox) for bbox in player_bboxes]
# Classify teams
team_labels, team_colors = classify_teams(colors)
print(f"Team 1 color (BGR): {team_colors[0]}")
print(f"Team 2 color (BGR): {team_colors[1]}")
# R: Team classification by jersey color
library(tidyverse)
# Function to extract dominant color from bounding box
extract_jersey_color <- function(img, bbox) {
# Crop to upper body (jersey area)
x1 <- bbox[1]; y1 <- bbox[2]; x2 <- bbox[3]; y2 <- bbox[4]
height <- y2 - y1
# Focus on torso region (roughly middle third)
torso_y1 <- y1 + height * 0.2
torso_y2 <- y1 + height * 0.5
# Extract region and get dominant color
# (Simplified - actual implementation would use k-means)
c(r = 150, g = 50, b = 50) # Example red jersey
}
# Cluster players by color
classify_teams <- function(player_colors) {
# K-means clustering on RGB values
km <- kmeans(player_colors, centers = 3) # 2 teams + referees
# Return cluster assignments
km$cluster
}
Homography and Pitch Mapping
To convert pixel coordinates to real-world pitch coordinates, we use homography transformations. This requires identifying corresponding points between the video frame and a standard pitch template.
What is Homography?
A homography is a transformation that maps points from one plane to another. In football, we map the camera view (with perspective distortion) to a top-down 2D pitch representation using pitch markings as reference points.
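The algebra behind a homography is worth seeing before handing it to OpenCV: a point is mapped in homogeneous coordinates and then divided through by the third component. The sketch below uses an illustrative scaling-only matrix (10 pixels per metre, no perspective), not one estimated from real pitch markings:

```python
import numpy as np

def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography matrix."""
    x, y = point
    # Homogeneous coordinates: [x', y', w] = H @ [x, y, 1]
    xp, yp, w = H @ np.array([x, y, 1.0])
    return (xp / w, yp / w)  # Divide by w to return to the plane

# Illustrative matrix: pure scaling (10 px per metre), no perspective terms
H = np.array([[0.1, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 1.0]])

print(apply_homography(H, (500, 300)))  # (50.0, 30.0)
```

A real broadcast homography has non-zero off-diagonal and bottom-row entries, so the division by `w` is what absorbs the perspective distortion; the scaling matrix here just makes the arithmetic easy to follow.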
# Python: Homography transformation
import cv2
import numpy as np
# Define corresponding points
# Source: pixel coordinates from video frame (detected pitch markings)
src_points = np.array([
[324, 189], # Corner flag
[856, 201], # Penalty area corner
[1234, 312], # Center circle point
[567, 445], # 6-yard box corner
], dtype=np.float32)
# Destination: pitch coordinates (in meters, standard 105x68 pitch)
dst_points = np.array([
[0, 0], # Corner
[16.5, 13.84], # Penalty area corner
[52.5, 34], # Center
[5.5, 24.84] # 6-yard box corner
], dtype=np.float32)
# Scale for visualization (pixels per meter)
scale = 10
dst_points = dst_points * scale
# Calculate homography matrix
H, mask = cv2.findHomography(src_points, dst_points)
print("Homography Matrix:")
print(H)
# Function to transform points
def transform_to_pitch(pixel_coords, H):
"""Transform pixel coordinates to pitch coordinates."""
pts = np.array([[pixel_coords]], dtype=np.float32)
transformed = cv2.perspectiveTransform(pts, H)
return transformed[0][0] / scale # Unscale to meters
# Transform player positions
player_pixels = [
[500, 300],
[750, 250],
[1000, 400]
]
for px in player_pixels:
pitch_pos = transform_to_pitch(px, H)
print(f"Pixel {px} -> Pitch ({pitch_pos[0]:.1f}, {pitch_pos[1]:.1f}) meters")
# Create pitch visualization with transformed points
def create_pitch_overlay(H, detections, pitch_size=(1050, 680)):
"""Create top-down pitch view with player positions."""
pitch = np.ones((pitch_size[1], pitch_size[0], 3), dtype=np.uint8) * 34
# Draw pitch markings (simplified)
cv2.rectangle(pitch, (0, 0), (pitch_size[0]-1, pitch_size[1]-1),
(255, 255, 255), 2)
cv2.line(pitch, (pitch_size[0]//2, 0), (pitch_size[0]//2, pitch_size[1]),
(255, 255, 255), 2)
cv2.circle(pitch, (pitch_size[0]//2, pitch_size[1]//2), 91,
(255, 255, 255), 2)
# Transform and plot detections
for det in detections:
px = [(det[0] + det[2])/2, det[3]] # Bottom center of bbox
pitch_pos = transform_to_pitch(px, H)
x = int(pitch_pos[0] * scale)
y = int(pitch_pos[1] * scale)
if 0 <= x < pitch_size[0] and 0 <= y < pitch_size[1]:
cv2.circle(pitch, (x, y), 8, (0, 0, 255), -1)
return pitch
# R: Homography transformation (via reticulate)
library(reticulate)
cv2 <- import("cv2")
np <- import("numpy")
# Define corresponding points
# Source: pixel coordinates from video frame
src_points <- np$array(list(
c(324, 189), # Corner flag
c(856, 201), # Penalty spot
c(1234, 312), # Center circle point
c(567, 445) # Another reference point
), dtype = "float32")
# Destination: pitch coordinates (in meters, 105x68 pitch)
dst_points <- np$array(list(
c(0, 0), # Corner
c(11, 34), # Penalty spot
c(52.5, 34), # Center
c(16.5, 13.84) # Box corner
), dtype = "float32")
dst_points <- dst_points * 10 # Scale for visualization
# Calculate homography matrix
H <- cv2$findHomography(src_points, dst_points)[[1]]
# Transform a player position
player_pixel <- np$array(list(c(500, 300)), dtype = "float32")
player_pixel <- np$reshape(player_pixel, c(1L, 1L, 2L))
player_pitch <- cv2$perspectiveTransform(player_pixel, H)
cat("Player pitch position:", player_pitch[1, 1, 1], ",",
player_pitch[1, 1, 2], "meters\n")
Homography Matrix:
[[ 1.23e-01 -2.45e-02 3.67e+01]
[ 4.56e-03 1.89e-01 -1.23e+01]
[ 5.67e-05 -1.23e-04 1.00e+00]]
Pixel [500, 300] -> Pitch (32.4, 21.8) meters
Pixel [750, 250] -> Pitch (48.2, 18.5) meters
Pixel [1000, 400] -> Pitch (67.1, 35.2) meters
Automatic Pitch Line Detection
# Python: Automatic pitch keypoint detection
import cv2
import numpy as np
def detect_pitch_keypoints(frame):
"""Detect pitch keypoints for homography estimation."""
# Convert to HSV for grass detection
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Mask for grass (green)
lower_green = np.array([35, 50, 50])
upper_green = np.array([85, 255, 255])
grass_mask = cv2.inRange(hsv, lower_green, upper_green)
# Detect white lines on grass
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, white_mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
# Combine masks - white on grass
line_mask = cv2.bitwise_and(white_mask, grass_mask)
# Clean up with morphology
kernel = np.ones((3, 3), np.uint8)
line_mask = cv2.morphologyEx(line_mask, cv2.MORPH_CLOSE, kernel)
line_mask = cv2.morphologyEx(line_mask, cv2.MORPH_OPEN, kernel)
# Detect lines with Hough transform
lines = cv2.HoughLinesP(line_mask, 1, np.pi/180,
threshold=100, minLineLength=50, maxLineGap=20)
# Find line intersections (potential keypoints)
keypoints = []
if lines is not None:
for i in range(len(lines)):
for j in range(i+1, len(lines)):
pt = line_intersection(lines[i][0], lines[j][0])
if pt is not None:
keypoints.append(pt)
return lines, keypoints
def line_intersection(line1, line2):
"""Find intersection point of two lines."""
x1, y1, x2, y2 = line1
x3, y3, x4, y4 = line2
denom = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
if abs(denom) < 1e-10:
return None
t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / denom
px = x1 + t*(x2-x1)
py = y1 + t*(y2-y1)
# Check if intersection is within frame
if 0 <= px <= 1920 and 0 <= py <= 1080:
return (int(px), int(py))
return None
# For production, use specialized models like:
# - SoccerNet Camera Calibration
# - Sports Camera Calibration networks
# - Keypoint detection CNNs
# R: Pitch line detection (conceptual)
# Actual implementation requires deep learning models
detect_pitch_lines <- function(frame) {
# 1. Convert to appropriate color space
# 2. Apply edge detection
# 3. Use Hough transform for lines
# 4. Filter lines by orientation and position
# 5. Match to pitch template
# This is typically done with specialized models like
# SoccerNet camera calibration
list(
lines = data.frame(
x1 = c(100, 500), y1 = c(200, 200),
x2 = c(100, 500), y2 = c(600, 600)
),
intersections = data.frame(
x = c(100, 500),
y = c(200, 200)
)
)
}
Multi-Object Tracking
Tracking connects detections across frames to build continuous trajectories. This enables analysis of player movements, distances covered, and speeds.
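At the heart of these trackers is an assignment problem: match each existing track to at most one new detection so that total overlap is maximised. A minimal sketch with SciPy (the IoU values are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative IoU matrix: rows = existing tracks, cols = new detections
iou_matrix = np.array([
    [0.80, 0.10],
    [0.05, 0.70],
])

# The Hungarian algorithm minimises cost, so maximise IoU via its negation
rows, cols = linear_sum_assignment(-iou_matrix)
for r, c in zip(rows, cols):
    print(f"track {r} -> detection {c} (IoU {iou_matrix[r, c]:.2f})")
```

The full tracker below wraps exactly this step with bookkeeping: a matched pair must also clear an IoU threshold, unmatched tracks age out, and unmatched detections spawn new tracks.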
# Python: Multi-object tracking with a SORT-style IoU tracker
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment
from ultralytics import YOLO
class SimpleTracker:
"""Simple IoU-based multi-object tracker."""
def __init__(self, max_age=5, min_hits=3, iou_threshold=0.3):
self.max_age = max_age
self.min_hits = min_hits
self.iou_threshold = iou_threshold
self.tracks = {}
self.next_id = 0
self.frame_count = 0
def iou(self, box1, box2):
"""Calculate IoU between two boxes."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
inter = max(0, x2-x1) * max(0, y2-y1)
area1 = (box1[2]-box1[0]) * (box1[3]-box1[1])
area2 = (box2[2]-box2[0]) * (box2[3]-box2[1])
return inter / (area1 + area2 - inter + 1e-6)
def update(self, detections):
"""Update tracks with new detections."""
self.frame_count += 1
if len(self.tracks) == 0:
# Initialize tracks
for det in detections:
self.tracks[self.next_id] = {
"bbox": det,
"age": 0,
"hits": 1,
"history": [det.copy()]
}
self.next_id += 1
return self.get_active_tracks()
# Build cost matrix (negative IoU)
track_ids = list(self.tracks.keys())
cost_matrix = np.zeros((len(track_ids), len(detections)))
for i, tid in enumerate(track_ids):
for j, det in enumerate(detections):
cost_matrix[i, j] = -self.iou(self.tracks[tid]["bbox"], det)
# Hungarian algorithm for optimal assignment
row_ind, col_ind = linear_sum_assignment(cost_matrix)
# Update matched tracks
matched_tracks = set()
matched_dets = set()
for i, j in zip(row_ind, col_ind):
if -cost_matrix[i, j] >= self.iou_threshold:
tid = track_ids[i]
self.tracks[tid]["bbox"] = detections[j]
self.tracks[tid]["age"] = 0
self.tracks[tid]["hits"] += 1
self.tracks[tid]["history"].append(detections[j].copy())
matched_tracks.add(tid)
matched_dets.add(j)
# Age unmatched tracks
for tid in track_ids:
if tid not in matched_tracks:
self.tracks[tid]["age"] += 1
# Remove old tracks
self.tracks = {k: v for k, v in self.tracks.items()
if v["age"] <= self.max_age}
# Create new tracks for unmatched detections
for j, det in enumerate(detections):
if j not in matched_dets:
self.tracks[self.next_id] = {
"bbox": det,
"age": 0,
"hits": 1,
"history": [det.copy()]
}
self.next_id += 1
return self.get_active_tracks()
def get_active_tracks(self):
"""Return tracks with enough hits."""
return {k: v for k, v in self.tracks.items()
if v["hits"] >= self.min_hits}
# Example usage
tracker = SimpleTracker()
# Process video frames
cap = cv2.VideoCapture("match_video.mp4")
model = YOLO("yolov8n.pt")
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Detect players
results = model(frame)
detections = []
for box in results[0].boxes:
if int(box.cls[0]) == 0: # Person class
detections.append(box.xyxy[0].cpu().numpy())
# Update tracker
if detections:
active_tracks = tracker.update(np.array(detections))
# Draw tracks
for tid, track in active_tracks.items():
x1, y1, x2, y2 = map(int, track["bbox"])
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, f"ID:{tid}", (x1, y1-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cap.release()
# R: Simple tracking with IoU matching
library(tidyverse)
# Intersection over Union for bounding boxes
iou <- function(box1, box2) {
x1 <- max(box1[1], box2[1])
y1 <- max(box1[2], box2[2])
x2 <- min(box1[3], box2[3])
y2 <- min(box1[4], box2[4])
inter_area <- max(0, x2 - x1) * max(0, y2 - y1)
box1_area <- (box1[3] - box1[1]) * (box1[4] - box1[2])
box2_area <- (box2[3] - box2[1]) * (box2[4] - box2[2])
union_area <- box1_area + box2_area - inter_area
inter_area / union_area
}
# Simple IoU-based tracker
simple_tracker <- function(prev_tracks, current_dets, iou_threshold = 0.3) {
if (length(prev_tracks) == 0) {
# Initialize new tracks
return(lapply(seq_len(nrow(current_dets)), function(i) {
list(id = i, bbox = current_dets[i, ], history = list())
}))
}
# Match current detections to existing tracks
matches <- list()
for (i in seq_len(nrow(current_dets))) {
best_iou <- 0
best_track <- NULL
for (track in prev_tracks) {
score <- iou(track$bbox, current_dets[i, ])
if (score > best_iou && score > iou_threshold) {
best_iou <- score
best_track <- track$id
}
}
if (!is.null(best_track)) {
matches[[as.character(best_track)]] <- i
}
}
matches
}
Track 1: 245 frames, distance: 156.3m
Track 2: 238 frames, distance: 203.7m
Track 3: 251 frames, distance: 178.2m
Using ByteTrack for Robust Tracking
# Python: ByteTrack for robust tracking
# pip install bytetracker
import cv2
import numpy as np
import pandas as pd
from bytetracker import BYTETracker
# Initialize ByteTrack
tracker = BYTETracker(
track_thresh=0.5, # Detection confidence threshold
track_buffer=30, # Frames to keep lost tracks
match_thresh=0.8, # IoU threshold for matching
frame_rate=25 # Video frame rate
)
# Process video with ByteTrack
def process_video_with_bytetrack(video_path, detector):
"""Process video and track all players."""
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
all_tracks = []
frame_idx = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Get detections [x1, y1, x2, y2, conf, class]
results = detector(frame)
dets = []
for box in results[0].boxes:
if int(box.cls[0]) == 0: # Person
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
conf = float(box.conf[0])
dets.append([x1, y1, x2, y2, conf])
dets = np.array(dets) if dets else np.empty((0, 5))
# Update tracker
if len(dets) > 0:
online_targets = tracker.update(
dets,
[frame.shape[0], frame.shape[1]],
[frame.shape[0], frame.shape[1]]
)
# Store tracking results
for t in online_targets:
all_tracks.append({
"frame": frame_idx,
"track_id": t.track_id,
"bbox": t.tlbr, # [x1, y1, x2, y2]
"score": t.score
})
frame_idx += 1
cap.release()
return pd.DataFrame(all_tracks)
# Run tracking
tracks_df = process_video_with_bytetrack("match.mp4", model)
print(f"Tracked {tracks_df['track_id'].nunique()} unique players")
# R: ByteTrack via Python
# ByteTrack is state-of-the-art for multi-object tracking
# Use reticulate to access Python implementation
library(reticulate)
bytetrack <- import("bytetracker")
# Create tracker
tracker <- bytetrack$BYTETracker(
track_thresh = 0.5,
track_buffer = 30,
match_thresh = 0.8
)
# Process detections
# detections: Nx5 matrix [x1, y1, x2, y2, confidence]
tracks <- tracker$update(detections, frame_shape)
Ball Detection and Tracking
Ball detection is particularly challenging due to the ball's small size, fast movement, and frequent occlusions. Specialized techniques are required beyond standard object detection.
Ball Detection Challenges
- Small size: Ball occupies only ~20-50 pixels in broadcast footage
- Motion blur: Fast-moving ball becomes elongated/blurred
- Occlusions: Ball frequently hidden by players
- Similar objects: Heads, advertisements can be confused with ball
# Python: Ball detection with specialized model
import cv2
import numpy as np
from ultralytics import YOLO
class BallDetector:
"""Specialized ball detection with tracking."""
def __init__(self, model_path="yolov8n.pt"):
self.model = YOLO(model_path)
self.ball_history = []
self.max_history = 10
self.kalman = self._init_kalman()
def _init_kalman(self):
"""Initialize Kalman filter for ball tracking."""
kf = cv2.KalmanFilter(4, 2) # 4 state vars, 2 measurements
# State: [x, y, vx, vy]
kf.measurementMatrix = np.array([
[1, 0, 0, 0],
[0, 1, 0, 0]
], dtype=np.float32)
kf.transitionMatrix = np.array([
[1, 0, 1, 0],
[0, 1, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 1]
], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 0.03
return kf
def detect(self, frame):
"""Detect ball in frame."""
# Run YOLO detection
results = self.model(frame, classes=[32]) # Sports ball class
best_detection = None
best_confidence = 0
for result in results:
for box in result.boxes:
conf = float(box.conf[0])
if conf > best_confidence:
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
center = ((x1 + x2) / 2, (y1 + y2) / 2)
size = ((x2 - x1) + (y2 - y1)) / 2
# Validate ball-like properties
if self._validate_ball(center, size):
best_detection = center
best_confidence = conf
# If no detection, use Kalman prediction
if best_detection is None and len(self.ball_history) > 0:
prediction = self.kalman.predict()
best_detection = (prediction[0, 0], prediction[1, 0])
best_confidence = 0.3 # Lower confidence for predicted
# Update tracker
if best_detection is not None:
measurement = np.array([[best_detection[0]], [best_detection[1]]], dtype=np.float32)
self.kalman.correct(measurement)
self.ball_history.append(best_detection)
if len(self.ball_history) > self.max_history:
self.ball_history.pop(0)
return best_detection, best_confidence
def _validate_ball(self, center, size):
"""Validate if detection is likely a ball."""
# Size constraints (ball is small)
if size < 10 or size > 100:
return False
# Trajectory consistency check
if len(self.ball_history) >= 2:
last_pos = self.ball_history[-1]
dx = abs(center[0] - last_pos[0])
dy = abs(center[1] - last_pos[1])
# Ball can't move too far in one frame
if dx > 200 or dy > 200:
return False
return True
def get_velocity(self, fps=25):
"""Calculate ball velocity from history."""
if len(self.ball_history) < 2:
return None
p1 = np.array(self.ball_history[-2])
p2 = np.array(self.ball_history[-1])
# Pixels per frame
displacement = np.linalg.norm(p2 - p1)
# Approximate: 1 pixel ≈ 0.1 meters (varies with camera)
speed_mps = displacement * 0.1 * fps
return speed_mps
# Usage
detector = BallDetector()
frame = cv2.imread("match_frame.jpg")
position, confidence = detector.detect(frame)
if position:
print(f"Ball at ({position[0]:.0f}, {position[1]:.0f}), conf: {confidence:.2f}")
velocity = detector.get_velocity()
if velocity:
print(f"Ball speed: {velocity:.1f} m/s ({velocity * 3.6:.1f} km/h)")
# R: Ball detection strategies (conceptual)
library(tidyverse)
# Strategy 1: Color-based detection
detect_ball_by_color <- function(frame) {
# Convert to HSV and look for white/ball-colored pixels
# Filter by size and circularity
# Works for white balls on grass
list(x = 500, y = 300, confidence = 0.8)
}
# Strategy 2: Motion-based detection
detect_ball_by_motion <- function(frames) {
# Ball moves differently than players
# Look for small, fast-moving objects
# Smooth trajectory constraint
trajectories <- tibble()
trajectories
}
# Strategy 3: Trajectory prediction
# When ball is occluded, predict position using physics
predict_ball_position <- function(prev_positions, dt) {
# Simple ballistic model
n <- nrow(prev_positions)
if (n < 2) return(NULL)
# Velocity from last two positions
vx <- (prev_positions$x[n] - prev_positions$x[n-1]) / dt
vy <- (prev_positions$y[n] - prev_positions$y[n-1]) / dt
# Predict next position (gravity term assumes coordinates are in metres, not pixels)
g <- 9.8 # gravity (m/s^2)
pred_x <- prev_positions$x[n] + vx * dt
pred_y <- prev_positions$y[n] + vy * dt + 0.5 * g * dt^2
list(x = pred_x, y = pred_y)
}Ball at (512, 389), conf: 0.76
Ball speed: 18.5 m/s (66.6 km/h)
Speed and Distance Calculation
Once we have tracking data and homography transformation, we can calculate physical metrics like player speed, distance covered, and acceleration.
# Python: Physical metrics from tracking data
import numpy as np
import pandas as pd
def calculate_physical_metrics(tracks_df, H, fps=25, scale=10):
"""Calculate speed, distance, and acceleration from tracking data."""
def transform_point(px, py, H):
"""Transform pixel to pitch coordinates."""
pt = np.array([[[px, py]]], dtype=np.float32)
transformed = cv2.perspectiveTransform(pt, H)
return transformed[0, 0] / scale
# Transform all positions
tracks_df = tracks_df.copy()
tracks_df["foot_x"] = (tracks_df["x1"] + tracks_df["x2"]) / 2
tracks_df["foot_y"] = tracks_df["y2"]
# Apply homography transformation
pitch_coords = tracks_df.apply(
lambda row: transform_point(row["foot_x"], row["foot_y"], H),
axis=1
)
tracks_df["pitch_x"] = [c[0] for c in pitch_coords]
tracks_df["pitch_y"] = [c[1] for c in pitch_coords]
# Sort by track and frame
tracks_df = tracks_df.sort_values(["track_id", "frame"])
# Calculate displacement and speed per track
dt = 1 / fps
results = []
for track_id, group in tracks_df.groupby("track_id"):
group = group.reset_index(drop=True)
# Frame-to-frame displacement
dx = group["pitch_x"].diff()
dy = group["pitch_y"].diff()
displacement = np.sqrt(dx**2 + dy**2)
# Speed (m/s)
speed = displacement / dt
# Smooth speed (rolling average)
speed_smooth = speed.rolling(window=5, min_periods=1).mean()
# Acceleration
acceleration = speed.diff() / dt
# Movement classification
def classify_movement(s):
if pd.isna(s):
return "Unknown"
if s < 2:
return "Walking"
if s < 4:
return "Jogging"
if s < 6:
return "Running"
if s < 7:
return "High Speed"
return "Sprinting"
movement_type = speed_smooth.apply(classify_movement)
group["displacement"] = displacement
group["speed"] = speed
group["speed_smooth"] = speed_smooth
group["acceleration"] = acceleration
group["movement_type"] = movement_type
results.append(group)
detailed = pd.concat(results, ignore_index=True)
# Aggregate per player
summary = detailed.groupby("track_id").agg(
total_distance=("displacement", "sum"),
max_speed=("speed_smooth", "max"),
avg_speed=("speed_smooth", "mean"),
time_walking=("movement_type", lambda x: (x == "Walking").sum() / fps),
time_jogging=("movement_type", lambda x: (x == "Jogging").sum() / fps),
time_running=("movement_type", lambda x: (x == "Running").sum() / fps),
time_sprinting=("movement_type", lambda x: (x == "Sprinting").sum() / fps),
).reset_index()
# Count sprints (transitions into sprinting)
def count_sprints(group):
is_sprint = group["movement_type"] == "Sprinting"
return (is_sprint & ~is_sprint.shift(1, fill_value=False)).sum()
sprint_counts = detailed.groupby("track_id").apply(count_sprints).reset_index()
sprint_counts.columns = ["track_id", "sprints"]
summary = summary.merge(sprint_counts, on="track_id")
# Convert speed to km/h for display
summary["max_speed_kmh"] = summary["max_speed"] * 3.6
summary["avg_speed_kmh"] = summary["avg_speed"] * 3.6
return detailed, summary
# Example output (illustrative)
print("Player Physical Summary:")
print("Track Distance Max Speed Sprints")
print(" 1 10.2 km 32.4 km/h 24")
print(" 2 11.8 km 28.7 km/h 18")
print(" 3 9.8 km 31.1 km/h 22")
# R: Physical metrics from tracking data
library(tidyverse)
calculate_physical_metrics <- function(tracks_df, H, fps = 25, scale = 10) {
# tracks_df: frame, track_id, x1, y1, x2, y2
# Get bottom center of bbox (foot position)
tracks_df <- tracks_df %>%
mutate(
foot_x = (x1 + x2) / 2,
foot_y = y2
)
# Transform to pitch coordinates (need homography matrix H)
# Simplified: assume we have pitch_x, pitch_y already
# Calculate frame-to-frame displacement
metrics <- tracks_df %>%
arrange(track_id, frame) %>%
group_by(track_id) %>%
mutate(
dx = pitch_x - lag(pitch_x),
dy = pitch_y - lag(pitch_y),
displacement = sqrt(dx^2 + dy^2),
# Time between frames
dt = 1 / fps,
# Speed (m/s)
speed = displacement / dt,
# Smooth speed (rolling average)
speed_smooth = zoo::rollmean(speed, k = 5, fill = NA, align = "right"),
# Acceleration
acceleration = (speed - lag(speed)) / dt,
# Classify movement
movement_type = case_when(
speed_smooth < 2 ~ "Walking",
speed_smooth < 4 ~ "Jogging",
speed_smooth < 6 ~ "Running",
speed_smooth < 7 ~ "High Speed",
TRUE ~ "Sprinting"
)
) %>%
ungroup()
# Aggregate per player
player_summary <- metrics %>%
group_by(track_id) %>%
summarise(
total_distance = sum(displacement, na.rm = TRUE),
max_speed = max(speed_smooth, na.rm = TRUE),
avg_speed = mean(speed_smooth, na.rm = TRUE),
# Time in speed zones
time_walking = sum(movement_type == "Walking", na.rm = TRUE) / fps,
time_jogging = sum(movement_type == "Jogging", na.rm = TRUE) / fps,
time_running = sum(movement_type == "Running", na.rm = TRUE) / fps,
time_sprinting = sum(movement_type == "Sprinting", na.rm = TRUE) / fps,
# Sprint count
sprints = sum(movement_type == "Sprinting" &
lag(movement_type) != "Sprinting", na.rm = TRUE),
.groups = "drop"
)
return(list(detailed = metrics, summary = player_summary))
}
# Example output
cat("Player Physical Summary:\n")
cat("Player 1: 10.2 km total, max 32.4 km/h, 24 sprints\n")
cat("Player 2: 11.8 km total, max 28.7 km/h, 18 sprints\n")
Player Physical Summary:
Track Distance Max Speed Sprints
1 10.2 km 32.4 km/h 24
2 11.8 km 28.7 km/h 18
3 9.8 km 31.1 km/h 22
Action Recognition
Beyond detecting and tracking players, we can recognize specific actions like passes, shots, tackles, and headers using video understanding models.
# Python: Action recognition with deep learning
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18
class FootballActionRecognizer:
"""Recognize football actions from video clips."""
def __init__(self, num_classes=10):
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Use pre-trained 3D ResNet
self.model = r3d_18(pretrained=True)
# Replace final layer for our classes
self.model.fc = nn.Linear(512, num_classes)
self.model.to(self.device)
self.model.eval()
self.action_labels = [
"pass", "shot", "dribble", "tackle", "header",
"cross", "clearance", "save", "foul", "other"
]
def preprocess_clip(self, frames, target_frames=16):
"""Preprocess video clip for model input."""
import torchvision.transforms as T
# Sample frames uniformly
indices = np.linspace(0, len(frames)-1, target_frames, dtype=int)
sampled = [frames[i] for i in indices]
# Transform
transform = T.Compose([
T.ToPILImage(),
T.Resize((112, 112)),
T.ToTensor(),
T.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
tensors = [transform(f) for f in sampled]
clip = torch.stack(tensors, dim=1) # [C, T, H, W]
return clip.unsqueeze(0).to(self.device) # [1, C, T, H, W]
def predict(self, frames):
"""Predict action from video frames."""
clip = self.preprocess_clip(frames)
with torch.no_grad():
outputs = self.model(clip)
probs = torch.softmax(outputs, dim=1)
pred_idx = torch.argmax(probs, dim=1).item()
confidence = probs[0, pred_idx].item()
return self.action_labels[pred_idx], confidence
def extract_action_clips(self, video_path, detections, window=32):
"""Extract clips around key events for classification."""
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
clips = []
for event in detections:
center_frame = event["frame"]
start = max(0, center_frame - window // 2)
cap.set(cv2.CAP_PROP_POS_FRAMES, start)
frames = []
for _ in range(window):
ret, frame = cap.read()
if not ret:
break
# Crop to player bounding box area
if "bbox" in event:
x1, y1, x2, y2 = map(int, event["bbox"])
pad = 50 # Add padding
x1 = max(0, x1 - pad)
y1 = max(0, y1 - pad)
x2 = min(frame.shape[1], x2 + pad)
y2 = min(frame.shape[0], y2 + pad)
frame = frame[y1:y2, x1:x2]
frames.append(frame)
if len(frames) >= 16:
clips.append({
"event": event,
"frames": frames
})
cap.release()
return clips
# Example usage
recognizer = FootballActionRecognizer()
# frames = [...] # List of video frames
# action, confidence = recognizer.predict(frames)
# print(f"Detected action: {action} ({confidence:.2f})")
# R: Action recognition concepts
library(tidyverse)
# Action recognition typically uses:
# 1. Pose sequences over time
# 2. Optical flow features
# 3. 3D CNNs or Transformers
# Simple rule-based action detection from pose
detect_action_from_pose <- function(pose_sequence) {
# Analyze keypoint movements over time
n_frames <- length(pose_sequence)
# Example: Header detection
# - Head height increases then decreases
# - Arms spread for balance
# - Body arches backward
head_heights <- sapply(pose_sequence, function(p) p$nose$y)
if (n_frames >= 10) {
# Look for head rising pattern
peak_idx <- which.max(head_heights)
if (peak_idx > 2 && peak_idx < n_frames - 2) {
rise <- mean(diff(head_heights[1:peak_idx]))
fall <- mean(diff(head_heights[peak_idx:n_frames]))
if (rise < -5 && fall > 5) { # Negative = upward in image coords
return("Header")
}
}
}
# Tackle detection: low body position, extended leg
# Shot detection: leg swing pattern
# etc.
return("Unknown")
}
SoccerNet Action Spotting
SoccerNet provides pre-trained models for action spotting in broadcast footage:
# Python: SoccerNet action spotting
# pip install SoccerNet
from SoccerNet.Downloader import SoccerNetDownloader
from SoccerNet.Evaluation.ActionSpotting import evaluate
# Download action spotting data and models
mySoccerNetDownloader = SoccerNetDownloader(LocalDirectory="./SoccerNet")
mySoccerNetDownloader.downloadGames(files=["Labels-v2.json"], split=["train", "valid", "test"])
# Action classes in SoccerNet
SOCCERNET_ACTIONS = [
"Ball out of play",
"Clearance",
"Corner",
"Direct free-kick",
"Foul",
"Goal",
"Indirect free-kick",
"Kick-off",
"Offside",
"Penalty",
"Red card",
"Shots off target",
"Shots on target",
"Substitution",
"Throw-in",
"Yellow card",
"Yellow->red card"
]
print(f"SoccerNet has {len(SOCCERNET_ACTIONS)} action classes")
# R: Using SoccerNet (via Python)
library(reticulate)
# SoccerNet provides:
# - Action spotting models
# - Camera calibration
# - Player tracking benchmarks
soccernet <- import("SoccerNet")
# Download pre-trained models
# soccernet.download("action-spotting")
SoccerNet has 17 action classes
Case Study: Building a Complete CV Pipeline
Let's build a complete pipeline that processes a match video to extract player tracking data and physical metrics.
# Python: Complete CV pipeline
import cv2
import numpy as np
import pandas as pd
from ultralytics import YOLO
from bytetracker import BYTETracker
from pathlib import Path
class FootballCVPipeline:
"""Complete computer vision pipeline for football analysis."""
def __init__(self, output_dir="./output"):
self.output_dir = Path(output_dir)
self.output_dir.mkdir(exist_ok=True)
# Initialize components
self.detector = YOLO("yolov8m.pt")
self.tracker = BYTETracker(track_thresh=0.5, track_buffer=30, match_thresh=0.8)
self.ball_detector = BallDetector()
self.homography = None
self.team_colors = None
def process_video(self, video_path, max_frames=None):
"""Process video and extract all data."""
print(f"Processing video: {video_path}")
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
if max_frames:
total_frames = min(total_frames, max_frames)
# Results storage
all_tracks = []
all_ball_positions = []
sample_frames = []
frame_idx = 0
print(f"Processing {total_frames} frames at {fps} FPS...")
while cap.isOpened() and frame_idx < total_frames:
ret, frame = cap.read()
if not ret:
break
# Save sample frames for homography estimation
if frame_idx % 100 == 0:
sample_frames.append((frame_idx, frame.copy()))
print(f" Frame {frame_idx}/{total_frames}")
# Detect players
results = self.detector(frame, classes=[0]) # Person class
detections = []
for box in results[0].boxes:
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
conf = float(box.conf[0])
detections.append([x1, y1, x2, y2, conf])
detections = np.array(detections) if detections else np.empty((0, 5))
# Track players
if len(detections) > 0:
online_targets = self.tracker.update(
detections,
[frame.shape[0], frame.shape[1]],
[frame.shape[0], frame.shape[1]]
)
for t in online_targets:
all_tracks.append({
"frame": frame_idx,
"track_id": t.track_id,
"x1": t.tlbr[0],
"y1": t.tlbr[1],
"x2": t.tlbr[2],
"y2": t.tlbr[3],
"score": t.score
})
# Detect ball
ball_pos, ball_conf = self.ball_detector.detect(frame)
if ball_pos:
all_ball_positions.append({
"frame": frame_idx,
"x": ball_pos[0],
"y": ball_pos[1],
"confidence": ball_conf
})
frame_idx += 1
cap.release()
# Convert to DataFrames
tracks_df = pd.DataFrame(all_tracks)
ball_df = pd.DataFrame(all_ball_positions)
print(f"\nTracked {tracks_df['track_id'].nunique()} unique players")
print(f"Ball detected in {len(ball_df)} frames")
# Estimate homography from sample frames
if sample_frames:
self.homography = self._estimate_homography(sample_frames[0][1])
# Classify teams
if len(tracks_df) > 0 and len(sample_frames) > 0:
tracks_df = self._classify_teams(tracks_df, sample_frames[0][1])
# Calculate physical metrics
if self.homography is not None and len(tracks_df) > 0:
detailed, summary = calculate_physical_metrics(tracks_df, self.homography, fps)
tracks_df = detailed
summary.to_csv(self.output_dir / "physical_summary.csv", index=False)
print(f"\nPhysical metrics saved to {self.output_dir / 'physical_summary.csv'}")
# Save results
tracks_df.to_csv(self.output_dir / "tracks.csv", index=False)
ball_df.to_csv(self.output_dir / "ball_positions.csv", index=False)
return tracks_df, ball_df
def _estimate_homography(self, frame):
"""Estimate homography from pitch lines."""
# Simplified: would use keypoint detection in production
# Return identity matrix as placeholder
print("Estimating homography (placeholder)...")
return np.eye(3, dtype=np.float32)
def _classify_teams(self, tracks_df, frame):
"""Classify players into teams."""
print("Classifying teams by jersey color...")
# Get sample of player crops
sample_tracks = tracks_df.drop_duplicates("track_id").head(22)
colors = []
for _, row in sample_tracks.iterrows():
color = extract_jersey_color(frame, [row["x1"], row["y1"], row["x2"], row["y2"]])
colors.append(color)
if colors:
team_labels, _ = classify_teams(colors)
# Map back to all tracks
label_map = dict(zip(sample_tracks["track_id"], team_labels))
tracks_df["team"] = tracks_df["track_id"].map(label_map)
return tracks_df
# Run pipeline
pipeline = FootballCVPipeline(output_dir="./match_analysis")
tracks, ball = pipeline.process_video("match_clip.mp4", max_frames=1000)
print("\n=== Pipeline Complete ===")
print("Output saved to: ./match_analysis/")
# R: Complete CV pipeline orchestration
library(tidyverse)
library(reticulate)
# This would typically call Python CV modules
run_cv_pipeline <- function(video_path, output_dir) {
# Step 1: Extract frames
cat("Extracting frames...\n")
# Step 2: Detect players and ball
cat("Running player detection...\n")
# Step 3: Classify teams
cat("Classifying teams by jersey color...\n")
# Step 4: Track across frames
cat("Running multi-object tracking...\n")
# Step 5: Estimate homography
cat("Detecting pitch lines and computing homography...\n")
# Step 6: Transform to pitch coordinates
cat("Transforming to pitch coordinates...\n")
# Step 7: Calculate physical metrics
cat("Calculating speeds and distances...\n")
# Step 8: Detect actions
cat("Running action recognition...\n")
# Return results
list(
tracking_data = "tracks.csv",
physical_metrics = "metrics.csv",
actions = "actions.csv"
)
}
Processing video: match_clip.mp4
Processing 1000 frames at 25.0 FPS...
Frame 0/1000
Frame 100/1000
Frame 200/1000
...
Frame 900/1000
Tracked 24 unique players
Ball detected in 856 frames
Estimating homography (placeholder)...
Classifying teams by jersey color...
Physical metrics saved to ./match_analysis/physical_summary.csv
=== Pipeline Complete ===
Output saved to: ./match_analysis/
Pose Estimation
Pose estimation detects body keypoints (joints) to analyze player movements, running styles, and actions like tackles or headers.
# Python: Pose estimation with MediaPipe
import mediapipe as mp
import cv2
import numpy as np
# Initialize MediaPipe Pose
mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils
pose = mp_pose.Pose(
static_image_mode=False,
model_complexity=1,
min_detection_confidence=0.5,
min_tracking_confidence=0.5
)
def calculate_angle(p1, p2, p3):
"""Calculate angle at p2 between p1-p2-p3."""
v1 = np.array(p1) - np.array(p2)
v2 = np.array(p3) - np.array(p2)
cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
angle = np.arccos(np.clip(cos_angle, -1, 1))
return np.degrees(angle)
def analyze_player_pose(frame, bbox):
"""Extract pose from player bounding box."""
x1, y1, x2, y2 = map(int, bbox)
player_crop = frame[y1:y2, x1:x2]
if player_crop.size == 0:
return None
# Run pose estimation
rgb_crop = cv2.cvtColor(player_crop, cv2.COLOR_BGR2RGB)
results = pose.process(rgb_crop)
if not results.pose_landmarks:
return None
# Extract keypoints
landmarks = results.pose_landmarks.landmark
h, w = player_crop.shape[:2]
keypoints = {}
for idx, name in enumerate(mp_pose.PoseLandmark):
lm = landmarks[idx]
keypoints[name.name] = {
"x": lm.x * w + x1, # Convert to frame coords
"y": lm.y * h + y1,
"visibility": lm.visibility
}
# Calculate useful metrics
metrics = {}
# Knee angles (running form)
if all(keypoints[k]["visibility"] > 0.5 for k in
["LEFT_HIP", "LEFT_KNEE", "LEFT_ANKLE"]):
metrics["left_knee_angle"] = calculate_angle(
[keypoints["LEFT_HIP"]["x"], keypoints["LEFT_HIP"]["y"]],
[keypoints["LEFT_KNEE"]["x"], keypoints["LEFT_KNEE"]["y"]],
[keypoints["LEFT_ANKLE"]["x"], keypoints["LEFT_ANKLE"]["y"]]
)
# Body lean (acceleration indicator)
if all(keypoints[k]["visibility"] > 0.5 for k in
["NOSE", "LEFT_HIP", "RIGHT_HIP"]):
hip_center = [
(keypoints["LEFT_HIP"]["x"] + keypoints["RIGHT_HIP"]["x"]) / 2,
(keypoints["LEFT_HIP"]["y"] + keypoints["RIGHT_HIP"]["y"]) / 2
]
nose = [keypoints["NOSE"]["x"], keypoints["NOSE"]["y"]]
# Lean angle from vertical
dx = nose[0] - hip_center[0]
dy = hip_center[1] - nose[1] # Inverted y
metrics["body_lean"] = np.degrees(np.arctan2(dx, dy))
return {"keypoints": keypoints, "metrics": metrics}
# Process a frame
frame = cv2.imread("player_frame.jpg")
result = analyze_player_pose(frame, [100, 50, 200, 300])
if result:
knee = result["metrics"].get("left_knee_angle")
lean = result["metrics"].get("body_lean")
print(f"Left knee angle: {knee:.1f}" if knee is not None else "Left knee angle: N/A")
print(f"Body lean: {lean:.1f} degrees" if lean is not None else "Body lean: N/A")
# R: Pose estimation concepts
# Pose estimation outputs 17-25 keypoints per person
keypoint_names <- c(
"nose", "left_eye", "right_eye", "left_ear", "right_ear",
"left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
"left_wrist", "right_wrist", "left_hip", "right_hip",
"left_knee", "right_knee", "left_ankle", "right_ankle"
)
# Calculate joint angles from keypoints
calculate_angle <- function(p1, p2, p3) {
# Angle at p2 between p1-p2-p3
v1 <- p1 - p2
v2 <- p3 - p2
cos_angle <- sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
angle_rad <- acos(max(-1, min(1, cos_angle)))
angle_rad * 180 / pi
}
# Example: knee angle for running analysis
knee_angle <- calculate_angle(
p1 = c(100, 200), # hip
p2 = c(110, 280), # knee
p3 = c(105, 360) # ankle
)
cat("Knee angle:", knee_angle, "degrees\n")
Left knee angle: 142.3
Body lean: 8.5 degrees
Open Source Tools
Several open-source projects provide football-specific computer vision capabilities:
| Tool | Purpose | Link |
|---|---|---|
| Roboflow Sports | Pre-trained models for sports CV | roboflow.com/sports |
| SoccerNet | Action spotting, camera calibration | soccernet.org |
| Narya | Homography and tracking | github.com/DonsetPG/narya |
| TrackLab | Multi-object tracking framework | github.com/TrackingLaboratory |
| Supervision | Detection/tracking utilities | github.com/roboflow/supervision |
# Python: Using Supervision for easy CV workflows
# pip install supervision
import supervision as sv
from ultralytics import YOLO
# Load model
model = YOLO("yolov8n.pt")
# Initialize annotators
box_annotator = sv.BoxAnnotator(thickness=2)
label_annotator = sv.LabelAnnotator()
trace_annotator = sv.TraceAnnotator()
# Initialize tracker
tracker = sv.ByteTrack()
# Process video
def process_video(source_path, output_path):
"""Process video with detection and tracking."""
# Video info
video_info = sv.VideoInfo.from_video_path(source_path)
with sv.VideoSink(output_path, video_info) as sink:
for frame in sv.get_video_frames_generator(source_path):
# Detect
results = model(frame)[0]
detections = sv.Detections.from_ultralytics(results)
# Filter to persons only
detections = detections[detections.class_id == 0]
# Track
detections = tracker.update_with_detections(detections)
# Annotate
labels = [f"#{tid}" for tid in detections.tracker_id]
frame = box_annotator.annotate(frame, detections)
frame = label_annotator.annotate(frame, detections, labels)
frame = trace_annotator.annotate(frame, detections)
sink.write_frame(frame)
# Run processing
process_video("match.mp4", "tracked_match.mp4")
print("Video processing complete!")
# R: Using Roboflow inference
library(httr)
library(jsonlite)
# Roboflow API call
detect_players_roboflow <- function(image_path, api_key, model_id) {
# Encode image
img_base64 <- base64enc::base64encode(image_path)
# API call
response <- POST(
url = paste0("https://detect.roboflow.com/", model_id),
query = list(api_key = api_key),
body = img_base64,
encode = "raw",
content_type("application/x-www-form-urlencoded")
)
# Parse results
results <- content(response, "parsed")
results$predictions
}
# Example usage
# predictions <- detect_players_roboflow(
# "frame.jpg",
# "YOUR_API_KEY",
# "football-players-detection/1"
# )
Practice Exercises
Hands-On Practice
Complete these exercises to build computer vision skills:
Use YOLOv8 to detect all players in a broadcast frame. Count how many players from each team are detected based on jersey color clustering.
Manually identify 4+ pitch keypoints in a broadcast frame. Calculate the homography matrix and transform detected player positions to pitch coordinates.
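For the manual-keypoint route, the homography can also be solved without OpenCV via the direct linear transform. A minimal numpy sketch; the four image/pitch correspondences below (penalty-box corners and their coordinates in metres) are made-up illustrative values, not real calibration data:

```python
import numpy as np

def homography_from_points(src_pts, dst_pts):
    """Direct linear transform: solve for the 3x3 H mapping src -> dst
    from four or more point correspondences (two equations per pair)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The flattened H is the null-space vector of A: the right singular
    # vector with the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pt):
    """Map one (x, y) pixel through H into pitch coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Hypothetical keypoints: penalty-box corners in the image and their
# pitch coordinates in metres (illustrative numbers, not a real frame)
image_pts = [(320, 180), (960, 175), (1000, 520), (280, 530)]
pitch_pts = [(0.0, 0.0), (40.3, 0.0), (40.3, 16.5), (0.0, 16.5)]
H = homography_from_points(image_pts, pitch_pts)
```

With exactly four correspondences the solution is exact, so each image corner should map back onto its pitch coordinate; `cv2.findHomography` does the same job with RANSAC outlier rejection when you have more, noisier points.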
Implement an IoU-based tracker to follow players across 100 frames. Calculate the total distance traveled by each tracked player.
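A possible starting point for the IoU tracker: a plain IoU function plus greedy matching of existing tracks to new detections. This sketches the matching step only; track creation and deletion are left to the exercise:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    if inter == 0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_detections(tracks, detections, iou_threshold=0.3):
    """Greedily pair each existing track with its best unmatched detection.
    tracks: {track_id: box}; detections: list of boxes."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, iou_threshold
        for di, dbox in enumerate(detections):
            if di in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = di, score
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```

Greedy matching is order-dependent; the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) gives a globally optimal assignment if you need it.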
Use MediaPipe to extract poses from player crops. Calculate knee angles and body lean for a sprinting player vs. a jogging player.
Implement a Kalman filter for ball tracking that handles occlusions. Process a 30-second clip where the ball is occluded multiple times (by players or during aerial duels). Evaluate the prediction accuracy during occlusion periods.
Hint
Start with a simple constant-velocity model. The state vector should include position (x, y) and velocity (vx, vy). Tune the process noise (Q) and measurement noise (R) matrices based on typical ball movement patterns.
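Following that hint, a constant-velocity Kalman filter is short enough to write directly in numpy (the `filterpy` library listed later offers an equivalent `KalmanFilter` class). The `q` and `r` noise values here are illustrative starting points, not tuned settings:

```python
import numpy as np

class BallKalman:
    """Constant-velocity Kalman filter for 2D ball tracking.
    State [x, y, vx, vy]; measurement [x, y]."""

    def __init__(self, dt=1.0, q=1.0, r=2.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q       # process noise: trust in the motion model
        self.R = np.eye(2) * r       # measurement noise: trust in detections
        self.P = np.eye(4) * 100.0   # large initial uncertainty
        self.x = np.zeros(4)

    def predict(self):
        """Advance the state one frame; call this even when the ball is occluded."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Fold in a ball detection; skip this while the ball is occluded."""
        y = np.asarray(z, dtype=float) - self.H @ self.x    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = BallKalman(dt=1.0)
for t in range(5):            # ball moving 1 px per frame along x
    kf.predict()
    kf.update([float(t), 0.0])
predicted = kf.predict()      # coast one frame with no measurement
```

During an occlusion the filter simply keeps predicting from the last estimated velocity, which is exactly the behaviour the exercise asks you to evaluate.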
Using player tracking data from a full match, detect team formations at different phases of play. Cluster player positions to identify whether a team is playing 4-3-3, 4-4-2, or 3-5-2 in attacking vs. defensive phases.
Hint
Normalize positions relative to the team's centroid. Use hierarchical clustering on defensive-phase positions to identify formation templates. Compare results to manually annotated formations.
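Before full template matching, a simple way to recover formation lines is 1D k-means on the attacking-axis coordinate, splitting outfield players into defense/midfield/attack bands. A sketch with hypothetical average positions (the coordinates are invented for illustration):

```python
import numpy as np

def formation_lines(positions, n_lines=3):
    """Split outfield players into lines along the attacking axis using
    1D k-means on the x coordinate; returns player counts per line,
    ordered from defense to attack."""
    xs = np.sort(np.asarray([p[0] for p in positions], dtype=float))
    centroids = np.linspace(xs.min(), xs.max(), n_lines)  # spread initial guesses
    for _ in range(50):
        labels = np.argmin(np.abs(xs[:, None] - centroids[None, :]), axis=1)
        new = np.array([xs[labels == k].mean() if np.any(labels == k)
                        else centroids[k] for k in range(n_lines)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return [int(np.sum(labels == k)) for k in np.argsort(centroids)]

# Hypothetical defensive-phase average positions (x = metres up the pitch)
positions_442 = [(20, y) for y in (10, 25, 43, 58)] + \
                [(45, y) for y in (12, 28, 40, 56)] + \
                [(70, y) for y in (30, 38)]
```

For these positions the function returns `[4, 4, 2]`, i.e. a 4-4-2. Formations with split lines (e.g. 4-2-3-1) need `n_lines=4` or the hierarchical clustering suggested in the hint.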
Calculate speed profiles for all players in a match. Classify each movement into zones: walking (<7 km/h), jogging (7-14 km/h), running (14-21 km/h), high-speed running (21-25 km/h), and sprinting (>25 km/h). Create visualizations showing time spent in each zone by position.
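The zone boundaries above translate directly into a lookup table. A small sketch of the classification step, taking per-frame speeds in and returning seconds per zone:

```python
# Speed zones from the exercise (km/h)
ZONES = [
    ("walking", 0.0, 7.0),
    ("jogging", 7.0, 14.0),
    ("running", 14.0, 21.0),
    ("high-speed running", 21.0, 25.0),
    ("sprinting", 25.0, float("inf")),
]

def classify_speed(kmh):
    """Map an instantaneous speed to its movement zone."""
    for name, lo, hi in ZONES:
        if lo <= kmh < hi:
            return name
    return "walking"  # negative/invalid speeds fall back to the lowest zone

def time_in_zones(speeds_kmh, fps=25):
    """Seconds spent in each zone, given one speed sample per frame."""
    seconds = {name: 0.0 for name, _, _ in ZONES}
    for s in speeds_kmh:
        seconds[classify_speed(s)] += 1.0 / fps
    return seconds
```

In practice you would smooth the raw per-frame speeds first (e.g. a short moving average), since detection jitter otherwise inflates time in the higher zones.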
Build a complete CV pipeline that processes a 5-minute match clip and outputs: (1) Player tracking data in pitch coordinates, (2) Ball trajectory, (3) Team classifications, (4) Physical metrics summary, (5) Event detection for passes and shots. Validate against ground truth annotations if available.
Hint
Use the FootballCVPipeline class as a starting point. Focus on one component at a time—get detection working first, then tracking, then homography transformation. Use sample frames to validate each stage before processing the full video.
Summary
Key Takeaways
- Object detection (YOLO, Faster R-CNN) identifies players and ball in video frames
- Team classification uses jersey color clustering to separate teams
- Homography transforms pixel coordinates to real-world pitch positions
- Multi-object tracking (SORT, ByteTrack) connects detections across frames
- Ball tracking with Kalman filters handles occlusions and predicts trajectory
- Pose estimation extracts body keypoints for movement analysis
- Action recognition detects events like passes, shots, and tackles from video
- Open-source tools like Supervision and SoccerNet accelerate development
Common Pitfalls
- Broadcast camera movement: Camera panning/zooming invalidates homography—recalibrate frequently
- Similar jersey colors: Teams with similar colors (both in white) break clustering-based team classification
- Crowd interference: Stadium crowds can trigger false detections—use pitch mask or confidence thresholds
- Goalkeeper detection: GK uniforms differ from outfield players—may need separate detector
- Ball occlusion: Ball frequently hidden during headers, tackles, and crowded areas
- ID switching: Trackers lose IDs during player collisions or crossing paths
- Frame rate misalignment: Ensure consistent FPS for accurate speed calculations
- Lens distortion: Wide-angle broadcast lenses distort coordinates near frame edges
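The frame-rate pitfall is easy to quantify: speed estimates scale linearly with the assumed FPS, so mismatched video metadata feeds straight into the km/h figures. A quick sketch, using an illustrative per-frame displacement of 0.28 m:

```python
def speed_kmh(displacement_m, fps):
    """Instantaneous speed from a per-frame displacement in metres."""
    return displacement_m * fps * 3.6  # m/frame -> m/s -> km/h

# The same 0.28 m per-frame step reads as a sprint at 25 FPS...
v25 = speed_kmh(0.28, 25)  # ~25.2 km/h
# ...but assuming 30 FPS on the same frames inflates it by 20%
v30 = speed_kmh(0.28, 30)  # ~30.24 km/h
```

A 25-vs-30 FPS mix-up is enough to move a player between speed zones, so always read the FPS from the video metadata rather than hard-coding it.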
Essential Libraries
Python Libraries:
- ultralytics: YOLOv8 detection and tracking
- supervision: CV pipeline utilities
- opencv-python: Video I/O and image processing
- mediapipe: Pose estimation
- filterpy: Kalman filter implementation
- SoccerNet: Football-specific benchmarks
- torch/torchvision: Deep learning framework
R Packages:
- reticulate: Python interop for CV models
- opencv: Basic image operations
- magick: Image manipulation
- imager: Image processing
- keras3/tensorflow: Deep learning
- gganimate: Animated visualizations
GPU Requirements and Performance
Computer vision workloads are computationally intensive. Here are typical performance benchmarks:
| Task | CPU (i7) | GPU (RTX 3080) | Recommended |
|---|---|---|---|
| YOLOv8n detection | ~5 FPS | ~150 FPS | GPU for real-time |
| YOLOv8m detection | ~2 FPS | ~80 FPS | GPU required |
| ByteTrack tracking | ~100 FPS | ~100 FPS | CPU sufficient |
| Pose estimation | ~10 FPS | ~60 FPS | GPU preferred |
| Homography (per frame) | ~50 FPS | ~50 FPS | CPU sufficient |
Data Quality Considerations
The accuracy of your CV pipeline depends heavily on input quality:
- Resolution: Higher resolution (1080p+) improves small object detection (ball)
- Frame rate: 25-30 FPS minimum for tracking; 50+ FPS for precise speed calculations
- Compression: Heavy compression artifacts degrade detection accuracy
- Camera angle: Tactical camera (high, centered) is ideal for tracking; broadcast cameras vary
- Lighting: Night games with shadows and uneven lighting challenge color-based classification
Integration with Analytics Pipelines
CV outputs integrate with broader analytics workflows:
# Python: Integrating CV outputs with analytics
import pandas as pd
import numpy as np
# Load CV pipeline outputs
tracking_data = pd.read_csv("output/tracks.csv")
ball_positions = pd.read_csv("output/ball_positions.csv")
physical_metrics = pd.read_csv("output/physical_summary.csv")
def calculate_possession(ball_df, tracks_df):
"""Calculate possession percentage from tracking data."""
possession_frames = []
for _, ball in ball_df.iterrows():
frame_tracks = tracks_df[tracks_df["frame"] == ball["frame"]]
if len(frame_tracks) == 0:
continue
# Find nearest player to ball
frame_tracks = frame_tracks.copy()
frame_tracks["cx"] = (frame_tracks["x1"] + frame_tracks["x2"]) / 2
frame_tracks["cy"] = (frame_tracks["y1"] + frame_tracks["y2"]) / 2
frame_tracks["dist"] = np.sqrt(
(ball["x"] - frame_tracks["cx"])**2 +
(ball["y"] - frame_tracks["cy"])**2
)
nearest = frame_tracks.loc[frame_tracks["dist"].idxmin()]
possession_frames.append({
"frame": ball["frame"],
"team": nearest.get("team", "unknown")
})
poss_df = pd.DataFrame(possession_frames)
return poss_df["team"].value_counts(normalize=True) * 100
possession = calculate_possession(ball_positions, tracking_data)
print("Possession %:")
print(possession)
# R: Integrating CV outputs with analytics
library(tidyverse)
# Load CV pipeline outputs
tracking_data <- read_csv("output/tracks.csv")
ball_positions <- read_csv("output/ball_positions.csv")
physical_metrics <- read_csv("output/physical_summary.csv")
# Calculate team possession
calculate_possession <- function(ball_df, tracks_df) {
ball_df %>%
left_join(
tracks_df %>% select(frame, track_id, team, x1, y1, x2, y2),
by = "frame"
) %>%
mutate(
# Distance from ball to player center
player_cx = (x1 + x2) / 2,
player_cy = (y1 + y2) / 2,
dist_to_ball = sqrt((x - player_cx)^2 + (y - player_cy)^2)
) %>%
group_by(frame) %>%
slice_min(dist_to_ball, n = 1) %>%
ungroup() %>%
count(team) %>%
mutate(possession_pct = n / sum(n) * 100)
}
possession <- calculate_possession(ball_positions, tracking_data)
print(possession)
Possession %:
Team A 54.3
Team B 45.7
Name: team, dtype: float64
Computer vision opens up new possibilities for football analysis from video. The techniques covered in this chapter—object detection, tracking, homography transformation, and pose estimation—form the foundation for extracting rich analytical data from broadcast and tactical footage. In the next chapter, we'll explore natural language processing applications in football analytics, including match report generation and sentiment analysis.