Capstone - Complete Analytics System
Computer Vision for Football
Computer vision is transforming football analytics by enabling automated analysis of broadcast footage. From detecting players and the ball to estimating poses and tracking movement, these techniques unlock insights that were previously impossible to obtain at scale.
Learning Objectives
- Understand the fundamentals of computer vision for sports
- Implement object detection for players and ball
- Apply homography transformations to map video to pitch coordinates
- Use pose estimation to analyze player movements
- Build basic tracking pipelines with OpenCV
- Work with pre-trained models and open-source tools
Prerequisites
This chapter requires familiarity with Python and basic image processing concepts. We'll use OpenCV, PyTorch/TensorFlow, and specialized sports CV libraries.
Computer Vision Fundamentals
Computer vision enables machines to interpret visual information. For football, this means converting broadcast video into structured data about player positions, movements, and actions.
Object detection: locate and identify objects (players, ball, referees) in images with bounding boxes.
Models: YOLO, Faster R-CNN, SSD
Tracking: follow detected objects across video frames to build trajectories over time.
Methods: SORT, DeepSORT, ByteTrack
Pose estimation: detect body keypoints to analyze player poses, movements, and actions.
Models: OpenPose, MediaPipe, HRNet
# Python: Basic image processing with OpenCV
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load a football frame
img = cv2.imread("match_frame.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print(f"Dimensions: {img.shape[1]} x {img.shape[0]}")
# Basic preprocessing
# Resize to standard dimensions
resized = cv2.resize(img, (1280, 720))
# Adjust brightness and contrast
alpha = 1.2 # Contrast
beta = 10 # Brightness
adjusted = cv2.convertScaleAbs(resized, alpha=alpha, beta=beta)
# Edge detection for pitch lines
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
# Hough line detection for pitch lines
lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=100,
minLineLength=100, maxLineGap=10)
# Draw detected lines
line_img = resized.copy()
if lines is not None:
for line in lines:
x1, y1, x2, y2 = line[0]
cv2.line(line_img, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Display results
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(cv2.cvtColor(resized, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original")
axes[1].imshow(edges, cmap="gray")
axes[1].set_title("Edge Detection")
axes[2].imshow(cv2.cvtColor(line_img, cv2.COLOR_BGR2RGB))
axes[2].set_title("Pitch Lines")
plt.tight_layout()
plt.show()
# R: Basic image processing with magick
library(magick)
library(tidyverse)
# Load a football frame
img <- image_read("match_frame.jpg")
# Basic image info
info <- image_info(img)
cat("Dimensions:", info$width, "x", info$height, "\n")
# Basic preprocessing
processed <- img %>%
image_resize("1280x720") %>% # Standardize size
image_modulate(brightness = 110) %>% # Adjust brightness
image_contrast(sharpen = 1) # Enhance contrast
# Edge detection (for pitch line detection)
edges <- img %>%
image_convert(colorspace = "gray") %>%
image_edge(radius = 2)
# Save processed images
image_write(processed, "processed_frame.jpg")
image_write(edges, "edges_frame.jpg")
# Note: For advanced CV in R, use reticulate to call Python
# library(reticulate)
# cv2 <- import("cv2")
Dimensions: 1920 x 1080
Player and Ball Detection
Object detection identifies players, the ball, and referees in video frames. Modern deep learning models like YOLO (You Only Look Once) provide real-time detection capabilities.
# Python: Player detection with YOLOv8
from ultralytics import YOLO
import cv2
import numpy as np
# Load pre-trained YOLO model
model = YOLO("yolov8n.pt") # Nano model for speed
# Run detection
results = model("match_frame.jpg")
# Process results
for result in results:
boxes = result.boxes
# Filter for person class (0) and sports ball (32)
for box in boxes:
cls = int(box.cls[0])
conf = float(box.conf[0])
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
if cls == 0: # Person
print(f"Player: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f}), conf: {conf:.2f}")
elif cls == 32: # Sports ball
print(f"Ball: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f}), conf: {conf:.2f}")
# Visualize detections
annotated_frame = results[0].plot()
cv2.imwrite("detections.jpg", annotated_frame)
print(f"\nTotal people detected: {len([b for b in boxes if int(b.cls[0]) == 0])}")
# R: Object detection via reticulate
library(reticulate)
# Use Python YOLO implementation
yolo <- import("ultralytics")
cv2 <- import("cv2")
np <- import("numpy")
# Load pre-trained model
model <- yolo$YOLO("yolov8n.pt") # Nano model for speed
# Run detection on frame
results <- model("match_frame.jpg")
# Extract detections
detections <- results[[1]]$boxes$data$cpu()$numpy()
# Filter for person class (class 0 in COCO)
persons <- detections[detections[, 6] == 0, ]
cat("Detected", nrow(persons), "people in frame\n")
# Access bounding boxes
for (i in seq_len(nrow(persons))) {
x1 <- persons[i, 1]
y1 <- persons[i, 2]
x2 <- persons[i, 3]
y2 <- persons[i, 4]
conf <- persons[i, 5]
cat(sprintf("Person %d: (%.0f, %.0f) to (%.0f, %.0f), conf: %.2f\n",
i, x1, y1, x2, y2, conf))
}
Player: (234, 156) to (289, 298), conf: 0.92
Player: (567, 189) to (612, 334), conf: 0.89
Ball: (445, 412) to (462, 428), conf: 0.76
Total people detected: 22
Team Classification
After detecting players, we need to classify them by team. This is typically done using jersey color clustering or dedicated classification models.
# Python: Team classification by jersey color
import cv2
import numpy as np
from sklearn.cluster import KMeans
def extract_jersey_color(img, bbox):
"""Extract dominant jersey color from player bounding box."""
x1, y1, x2, y2 = map(int, bbox)
height = y2 - y1
# Focus on torso region (upper-middle of bounding box)
torso_y1 = int(y1 + height * 0.2)
torso_y2 = int(y1 + height * 0.5)
# Crop jersey region
jersey_region = img[torso_y1:torso_y2, x1:x2]
if jersey_region.size == 0:
return None
# Reshape to list of pixels
pixels = jersey_region.reshape(-1, 3)
# K-means to find dominant color
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(pixels)
# Get most common cluster center
labels, counts = np.unique(kmeans.labels_, return_counts=True)
dominant_idx = labels[np.argmax(counts)]
dominant_color = kmeans.cluster_centers_[dominant_idx]
return dominant_color
def classify_teams(player_colors):
"""Classify players into teams based on jersey colors."""
colors = np.array([c for c in player_colors if c is not None])
if len(colors) < 3:
return None
# Cluster into 3 groups (2 teams + refs/others)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(colors)
return labels, kmeans.cluster_centers_
# Example usage
img = cv2.imread("match_frame.jpg")
# Assume we have detections from YOLO
player_bboxes = [
[234, 156, 289, 298],
[567, 189, 612, 334],
# ... more players
]
# Extract colors for all players
colors = [extract_jersey_color(img, bbox) for bbox in player_bboxes]
# Classify teams
team_labels, team_colors = classify_teams(colors)
print(f"Team 1 color (BGR): {team_colors[0]}")
print(f"Team 2 color (BGR): {team_colors[1]}")
# R: Team classification by jersey color
library(tidyverse)
# Function to extract dominant color from bounding box
extract_jersey_color <- function(img, bbox) {
# Crop to upper body (jersey area)
x1 <- bbox[1]; y1 <- bbox[2]; x2 <- bbox[3]; y2 <- bbox[4]
height <- y2 - y1
# Focus on torso region (roughly middle third)
torso_y1 <- y1 + height * 0.2
torso_y2 <- y1 + height * 0.5
# Extract region and get dominant color
# (Simplified - actual implementation would use k-means)
c(r = 150, g = 50, b = 50) # Example red jersey
}
# Cluster players by color
classify_teams <- function(player_colors) {
# K-means clustering on RGB values
km <- kmeans(player_colors, centers = 3) # 2 teams + referees
# Return cluster assignments
km$cluster
}
Homography and Pitch Mapping
To convert pixel coordinates to real-world pitch coordinates, we use homography transformations. This requires identifying corresponding points between the video frame and a standard pitch template.
What is Homography?
A homography is a transformation that maps points from one plane to another. In football, we map the camera view (with perspective distortion) to a top-down 2D pitch representation using pitch markings as reference points.
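The algebra behind a homography is worth seeing before handing it to OpenCV: a point is mapped in homogeneous coordinates and then divided through by the third component. The sketch below uses an illustrative scaling-only matrix (10 pixels per metre, no perspective), not one estimated from real pitch markings:

```python
import numpy as np

def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography matrix."""
    x, y = point
    # Homogeneous coordinates: [x', y', w] = H @ [x, y, 1]
    xp, yp, w = H @ np.array([x, y, 1.0])
    return (xp / w, yp / w)  # Divide by w to return to the plane

# Illustrative matrix: pure scaling (10 px per metre), no perspective terms
H = np.array([[0.1, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 1.0]])

print(apply_homography(H, (500, 300)))  # (50.0, 30.0)
```

A real broadcast homography has non-zero off-diagonal and bottom-row entries, so the division by `w` is what absorbs the perspective distortion; the scaling matrix here just makes the arithmetic easy to follow.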
# Python: Homography transformation
import cv2
import numpy as np
# Define corresponding points
# Source: pixel coordinates from video frame (detected pitch markings)
src_points = np.array([
[324, 189], # Corner flag
[856, 201], # Penalty area corner
[1234, 312], # Center circle point
[567, 445], # 6-yard box corner
], dtype=np.float32)
# Destination: pitch coordinates (in meters, standard 105x68 pitch)
dst_points = np.array([
[0, 0], # Corner
[16.5, 13.84], # Penalty area corner
[52.5, 34], # Center
[5.5, 24.84] # 6-yard box corner
], dtype=np.float32)
# Scale for visualization (pixels per meter)
scale = 10
dst_points = dst_points * scale
# Calculate homography matrix
H, mask = cv2.findHomography(src_points, dst_points)
print("Homography Matrix:")
print(H)
# Function to transform points
def transform_to_pitch(pixel_coords, H):
"""Transform pixel coordinates to pitch coordinates."""
pts = np.array([[pixel_coords]], dtype=np.float32)
transformed = cv2.perspectiveTransform(pts, H)
return transformed[0][0] / scale # Unscale to meters
# Transform player positions
player_pixels = [
[500, 300],
[750, 250],
[1000, 400]
]
for px in player_pixels:
pitch_pos = transform_to_pitch(px, H)
print(f"Pixel {px} -> Pitch ({pitch_pos[0]:.1f}, {pitch_pos[1]:.1f}) meters")
# Create pitch visualization with transformed points
def create_pitch_overlay(H, detections, pitch_size=(1050, 680)):
"""Create top-down pitch view with player positions."""
pitch = np.ones((pitch_size[1], pitch_size[0], 3), dtype=np.uint8) * 34
# Draw pitch markings (simplified)
cv2.rectangle(pitch, (0, 0), (pitch_size[0]-1, pitch_size[1]-1),
(255, 255, 255), 2)
cv2.line(pitch, (pitch_size[0]//2, 0), (pitch_size[0]//2, pitch_size[1]),
(255, 255, 255), 2)
cv2.circle(pitch, (pitch_size[0]//2, pitch_size[1]//2), 91,
(255, 255, 255), 2)
# Transform and plot detections
for det in detections:
px = [(det[0] + det[2])/2, det[3]] # Bottom center of bbox
pitch_pos = transform_to_pitch(px, H)
x = int(pitch_pos[0] * scale)
y = int(pitch_pos[1] * scale)
if 0 <= x < pitch_size[0] and 0 <= y < pitch_size[1]:
cv2.circle(pitch, (x, y), 8, (0, 0, 255), -1)
return pitch
# R: Homography transformation (via reticulate)
library(reticulate)
cv2 <- import("cv2")
np <- import("numpy")
# Define corresponding points
# Source: pixel coordinates from video frame
src_points <- np$array(list(
c(324, 189), # Corner flag
c(856, 201), # Penalty spot
c(1234, 312), # Center circle point
c(567, 445) # Another reference point
), dtype = "float32")
# Destination: pitch coordinates (in meters, 105x68 pitch)
dst_points <- np$array(list(
c(0, 0), # Corner
c(11, 34), # Penalty spot
c(52.5, 34), # Center
c(16.5, 13.84) # Box corner
), dtype = "float32")
dst_points <- dst_points * 10 # Scale for visualization
# Calculate homography matrix
H <- cv2$findHomography(src_points, dst_points)[[1]]
# Transform a player position
player_pixel <- np$array(list(c(500, 300)), dtype = "float32")
player_pixel <- np$reshape(player_pixel, c(1L, 1L, 2L))
player_pitch <- cv2$perspectiveTransform(player_pixel, H)
cat("Player pitch position:", player_pitch[1, 1, 1], ",",
player_pitch[1, 1, 2], "meters\n")
Homography Matrix:
[[ 1.23e-01 -2.45e-02 3.67e+01]
[ 4.56e-03 1.89e-01 -1.23e+01]
[ 5.67e-05 -1.23e-04 1.00e+00]]
Pixel [500, 300] -> Pitch (32.4, 21.8) meters
Pixel [750, 250] -> Pitch (48.2, 18.5) meters
Pixel [1000, 400] -> Pitch (67.1, 35.2) meters
Automatic Pitch Line Detection
# Python: Automatic pitch keypoint detection
import cv2
import numpy as np
def detect_pitch_keypoints(frame):
"""Detect pitch keypoints for homography estimation."""
# Convert to HSV for grass detection
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Mask for grass (green)
lower_green = np.array([35, 50, 50])
upper_green = np.array([85, 255, 255])
grass_mask = cv2.inRange(hsv, lower_green, upper_green)
# Detect white lines on grass
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, white_mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
# Combine masks - white on grass
line_mask = cv2.bitwise_and(white_mask, grass_mask)
# Clean up with morphology
kernel = np.ones((3, 3), np.uint8)
line_mask = cv2.morphologyEx(line_mask, cv2.MORPH_CLOSE, kernel)
line_mask = cv2.morphologyEx(line_mask, cv2.MORPH_OPEN, kernel)
# Detect lines with Hough transform
lines = cv2.HoughLinesP(line_mask, 1, np.pi/180,
threshold=100, minLineLength=50, maxLineGap=20)
# Find line intersections (potential keypoints)
keypoints = []
if lines is not None:
for i in range(len(lines)):
for j in range(i+1, len(lines)):
pt = line_intersection(lines[i][0], lines[j][0])
if pt is not None:
keypoints.append(pt)
return lines, keypoints
def line_intersection(line1, line2):
"""Find intersection point of two lines."""
x1, y1, x2, y2 = line1
x3, y3, x4, y4 = line2
denom = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
if abs(denom) < 1e-10:
return None
t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / denom
px = x1 + t*(x2-x1)
py = y1 + t*(y2-y1)
# Check if intersection is within frame
if 0 <= px <= 1920 and 0 <= py <= 1080:
return (int(px), int(py))
return None
# For production, use specialized models like:
# - SoccerNet Camera Calibration
# - Sports Camera Calibration networks
# - Keypoint detection CNNs
# R: Pitch line detection (conceptual)
# Actual implementation requires deep learning models
detect_pitch_lines <- function(frame) {
# 1. Convert to appropriate color space
# 2. Apply edge detection
# 3. Use Hough transform for lines
# 4. Filter lines by orientation and position
# 5. Match to pitch template
# This is typically done with specialized models like
# SoccerNet camera calibration
list(
lines = data.frame(
x1 = c(100, 500), y1 = c(200, 200),
x2 = c(100, 500), y2 = c(600, 600)
),
intersections = data.frame(
x = c(100, 500),
y = c(200, 200)
)
)
}
Multi-Object Tracking
Tracking connects detections across frames to build continuous trajectories. This enables analysis of player movements, distances covered, and speeds.
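At the heart of these trackers is an assignment problem: match each existing track to at most one new detection so that total overlap is maximised. A minimal sketch with SciPy (the IoU values are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative IoU matrix: rows = existing tracks, cols = new detections
iou_matrix = np.array([
    [0.80, 0.10],
    [0.05, 0.70],
])

# The Hungarian algorithm minimises cost, so maximise IoU via its negation
rows, cols = linear_sum_assignment(-iou_matrix)
for r, c in zip(rows, cols):
    print(f"track {r} -> detection {c} (IoU {iou_matrix[r, c]:.2f})")
```

The full tracker below wraps exactly this step with bookkeeping: a matched pair must also clear an IoU threshold, unmatched tracks age out, and unmatched detections spawn new tracks.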
# Python: Multi-object tracking with a SORT-style IoU tracker
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment
from ultralytics import YOLO
class SimpleTracker:
"""Simple IoU-based multi-object tracker."""
def __init__(self, max_age=5, min_hits=3, iou_threshold=0.3):
self.max_age = max_age
self.min_hits = min_hits
self.iou_threshold = iou_threshold
self.tracks = {}
self.next_id = 0
self.frame_count = 0
def iou(self, box1, box2):
"""Calculate IoU between two boxes."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
inter = max(0, x2-x1) * max(0, y2-y1)
area1 = (box1[2]-box1[0]) * (box1[3]-box1[1])
area2 = (box2[2]-box2[0]) * (box2[3]-box2[1])
return inter / (area1 + area2 - inter + 1e-6)
def update(self, detections):
"""Update tracks with new detections."""
self.frame_count += 1
if len(self.tracks) == 0:
# Initialize tracks
for det in detections:
self.tracks[self.next_id] = {
"bbox": det,
"age": 0,
"hits": 1,
"history": [det.copy()]
}
self.next_id += 1
return self.get_active_tracks()
# Build cost matrix (negative IoU)
track_ids = list(self.tracks.keys())
cost_matrix = np.zeros((len(track_ids), len(detections)))
for i, tid in enumerate(track_ids):
for j, det in enumerate(detections):
cost_matrix[i, j] = -self.iou(self.tracks[tid]["bbox"], det)
# Hungarian algorithm for optimal assignment
row_ind, col_ind = linear_sum_assignment(cost_matrix)
# Update matched tracks
matched_tracks = set()
matched_dets = set()
for i, j in zip(row_ind, col_ind):
if -cost_matrix[i, j] >= self.iou_threshold:
tid = track_ids[i]
self.tracks[tid]["bbox"] = detections[j]
self.tracks[tid]["age"] = 0
self.tracks[tid]["hits"] += 1
self.tracks[tid]["history"].append(detections[j].copy())
matched_tracks.add(tid)
matched_dets.add(j)
# Age unmatched tracks
for tid in track_ids:
if tid not in matched_tracks:
self.tracks[tid]["age"] += 1
# Remove old tracks
self.tracks = {k: v for k, v in self.tracks.items()
if v["age"] <= self.max_age}
# Create new tracks for unmatched detections
for j, det in enumerate(detections):
if j not in matched_dets:
self.tracks[self.next_id] = {
"bbox": det,
"age": 0,
"hits": 1,
"history": [det.copy()]
}
self.next_id += 1
return self.get_active_tracks()
def get_active_tracks(self):
"""Return tracks with enough hits."""
return {k: v for k, v in self.tracks.items()
if v["hits"] >= self.min_hits}
# Example usage
tracker = SimpleTracker()
# Process video frames
cap = cv2.VideoCapture("match_video.mp4")
model = YOLO("yolov8n.pt")
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Detect players
results = model(frame)
detections = []
for box in results[0].boxes:
if int(box.cls[0]) == 0: # Person class
detections.append(box.xyxy[0].cpu().numpy())
# Update tracker
if detections:
active_tracks = tracker.update(np.array(detections))
# Draw tracks
for tid, track in active_tracks.items():
x1, y1, x2, y2 = map(int, track["bbox"])
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, f"ID:{tid}", (x1, y1-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cap.release()
# R: Simple tracking with IoU matching
library(tidyverse)
# Intersection over Union for bounding boxes
iou <- function(box1, box2) {
x1 <- max(box1[1], box2[1])
y1 <- max(box1[2], box2[2])
x2 <- min(box1[3], box2[3])
y2 <- min(box1[4], box2[4])
inter_area <- max(0, x2 - x1) * max(0, y2 - y1)
box1_area <- (box1[3] - box1[1]) * (box1[4] - box1[2])
box2_area <- (box2[3] - box2[1]) * (box2[4] - box2[2])
union_area <- box1_area + box2_area - inter_area
inter_area / union_area
}
# Simple IoU-based tracker
simple_tracker <- function(prev_tracks, current_dets, iou_threshold = 0.3) {
if (length(prev_tracks) == 0) {
# Initialize new tracks
return(lapply(seq_len(nrow(current_dets)), function(i) {
list(id = i, bbox = current_dets[i, ], history = list())
}))
}
# Match current detections to existing tracks
matches <- list()
for (i in seq_len(nrow(current_dets))) {
best_iou <- 0
best_track <- NULL
for (track in prev_tracks) {
score <- iou(track$bbox, current_dets[i, ])
if (score > best_iou && score > iou_threshold) {
best_iou <- score
best_track <- track$id
}
}
if (!is.null(best_track)) {
matches[[as.character(best_track)]] <- i
}
}
matches
}
Track 1: 245 frames, distance: 156.3m
Track 2: 238 frames, distance: 203.7m
Track 3: 251 frames, distance: 178.2m
Using ByteTrack for Robust Tracking
# Python: ByteTrack for robust tracking
# pip install bytetracker
import cv2
import numpy as np
import pandas as pd
from bytetracker import BYTETracker
# Initialize ByteTrack
tracker = BYTETracker(
track_thresh=0.5, # Detection confidence threshold
track_buffer=30, # Frames to keep lost tracks
match_thresh=0.8, # IoU threshold for matching
frame_rate=25 # Video frame rate
)
# Process video with ByteTrack
def process_video_with_bytetrack(video_path, detector):
"""Process video and track all players."""
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
all_tracks = []
frame_idx = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Get detections [x1, y1, x2, y2, conf, class]
results = detector(frame)
dets = []
for box in results[0].boxes:
if int(box.cls[0]) == 0: # Person
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
conf = float(box.conf[0])
dets.append([x1, y1, x2, y2, conf])
dets = np.array(dets) if dets else np.empty((0, 5))
# Update tracker
if len(dets) > 0:
online_targets = tracker.update(
dets,
[frame.shape[0], frame.shape[1]],
[frame.shape[0], frame.shape[1]]
)
# Store tracking results
for t in online_targets:
all_tracks.append({
"frame": frame_idx,
"track_id": t.track_id,
"bbox": t.tlbr, # [x1, y1, x2, y2]
"score": t.score
})
frame_idx += 1
cap.release()
return pd.DataFrame(all_tracks)
# Run tracking
tracks_df = process_video_with_bytetrack("match.mp4", model)
print(f"Tracked {tracks_df['track_id'].nunique()} unique players")
# R: ByteTrack via Python
# ByteTrack is state-of-the-art for multi-object tracking
# Use reticulate to access Python implementation
library(reticulate)
bytetrack <- import("bytetracker")
# Create tracker
tracker <- bytetrack$BYTETracker(
track_thresh = 0.5,
track_buffer = 30,
match_thresh = 0.8
)
# Process detections
# detections: Nx5 matrix [x1, y1, x2, y2, confidence]
tracks <- tracker$update(detections, frame_shape)
Ball Detection and Tracking
Ball detection is particularly challenging due to the ball's small size, fast movement, and frequent occlusions. Specialized techniques are required beyond standard object detection.
Ball Detection Challenges
- Small size: Ball occupies only ~20-50 pixels in broadcast footage
- Motion blur: Fast-moving ball becomes elongated/blurred
- Occlusions: Ball frequently hidden by players
- Similar objects: Heads, advertisements can be confused with ball
# Python: Ball detection with specialized model
import cv2
import numpy as np
from ultralytics import YOLO
class BallDetector:
"""Specialized ball detection with tracking."""
def __init__(self, model_path="yolov8n.pt"):
self.model = YOLO(model_path)
self.ball_history = []
self.max_history = 10
self.kalman = self._init_kalman()
def _init_kalman(self):
"""Initialize Kalman filter for ball tracking."""
kf = cv2.KalmanFilter(4, 2) # 4 state vars, 2 measurements
# State: [x, y, vx, vy]
kf.measurementMatrix = np.array([
[1, 0, 0, 0],
[0, 1, 0, 0]
], dtype=np.float32)
kf.transitionMatrix = np.array([
[1, 0, 1, 0],
[0, 1, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 1]
], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 0.03
return kf
def detect(self, frame):
"""Detect ball in frame."""
# Run YOLO detection
results = self.model(frame, classes=[32]) # Sports ball class
best_detection = None
best_confidence = 0
for result in results:
for box in result.boxes:
conf = float(box.conf[0])
if conf > best_confidence:
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
center = ((x1 + x2) / 2, (y1 + y2) / 2)
size = ((x2 - x1) + (y2 - y1)) / 2
# Validate ball-like properties
if self._validate_ball(center, size):
best_detection = center
best_confidence = conf
# If no detection, use Kalman prediction
if best_detection is None and len(self.ball_history) > 0:
prediction = self.kalman.predict()
best_detection = (prediction[0, 0], prediction[1, 0])
best_confidence = 0.3 # Lower confidence for predicted
# Update tracker
if best_detection is not None:
measurement = np.array([[best_detection[0]], [best_detection[1]]], dtype=np.float32)
self.kalman.correct(measurement)
self.ball_history.append(best_detection)
if len(self.ball_history) > self.max_history:
self.ball_history.pop(0)
return best_detection, best_confidence
def _validate_ball(self, center, size):
"""Validate if detection is likely a ball."""
# Size constraints (ball is small)
if size < 10 or size > 100:
return False
# Trajectory consistency check
if len(self.ball_history) >= 2:
last_pos = self.ball_history[-1]
dx = abs(center[0] - last_pos[0])
dy = abs(center[1] - last_pos[1])
# Ball can't move too far in one frame
if dx > 200 or dy > 200:
return False
return True
def get_velocity(self, fps=25):
"""Calculate ball velocity from history."""
if len(self.ball_history) < 2:
return None
p1 = np.array(self.ball_history[-2])
p2 = np.array(self.ball_history[-1])
# Pixels per frame
displacement = np.linalg.norm(p2 - p1)
# Approximate: 1 pixel ≈ 0.1 meters (varies with camera)
speed_mps = displacement * 0.1 * fps
return speed_mps
# Usage
detector = BallDetector()
frame = cv2.imread("match_frame.jpg")
position, confidence = detector.detect(frame)
if position:
print(f"Ball at ({position[0]:.0f}, {position[1]:.0f}), conf: {confidence:.2f}")
velocity = detector.get_velocity()
if velocity:
print(f"Ball speed: {velocity:.1f} m/s ({velocity * 3.6:.1f} km/h)")
# R: Ball detection strategies (conceptual)
library(tidyverse)
# Strategy 1: Color-based detection
detect_ball_by_color <- function(frame) {
# Convert to HSV and look for white/ball-colored pixels
# Filter by size and circularity
# Works for white balls on grass
list(x = 500, y = 300, confidence = 0.8)
}
# Strategy 2: Motion-based detection
detect_ball_by_motion <- function(frames) {
# Ball moves differently than players
# Look for small, fast-moving objects
# Smooth trajectory constraint
trajectories <- tibble()
trajectories
}
# Strategy 3: Trajectory prediction
# When ball is occluded, predict position using physics
predict_ball_position <- function(prev_positions, dt) {
# Simple ballistic model
n <- nrow(prev_positions)
if (n < 2) return(NULL)
# Velocity from last two positions
vx <- (prev_positions$x[n] - prev_positions$x[n-1]) / dt
vy <- (prev_positions$y[n] - prev_positions$y[n-1]) / dt
# Predict next position (gravity term assumes coordinates are in metres, not pixels)
g <- 9.8 # gravity (m/s^2)
pred_x <- prev_positions$x[n] + vx * dt
pred_y <- prev_positions$y[n] + vy * dt + 0.5 * g * dt^2
list(x = pred_x, y = pred_y)
}Ball at (512, 389), conf: 0.76
Ball speed: 18.5 m/s (66.6 km/h)
Speed and Distance Calculation
Once we have tracking data and homography transformation, we can calculate physical metrics like player speed, distance covered, and acceleration.
# Python: Physical metrics from tracking data
import numpy as np
import pandas as pd
def calculate_physical_metrics(tracks_df, H, fps=25, scale=10):
"""Calculate speed, distance, and acceleration from tracking data."""
def transform_point(px, py, H):
"""Transform pixel to pitch coordinates."""
pt = np.array([[[px, py]]], dtype=np.float32)
transformed = cv2.perspectiveTransform(pt, H)
return transformed[0, 0] / scale
# Transform all positions
tracks_df = tracks_df.copy()
tracks_df["foot_x"] = (tracks_df["x1"] + tracks_df["x2"]) / 2
tracks_df["foot_y"] = tracks_df["y2"]
# Apply homography transformation
pitch_coords = tracks_df.apply(
lambda row: transform_point(row["foot_x"], row["foot_y"], H),
axis=1
)
tracks_df["pitch_x"] = [c[0] for c in pitch_coords]
tracks_df["pitch_y"] = [c[1] for c in pitch_coords]
# Sort by track and frame
tracks_df = tracks_df.sort_values(["track_id", "frame"])
# Calculate displacement and speed per track
dt = 1 / fps
results = []
for track_id, group in tracks_df.groupby("track_id"):
group = group.reset_index(drop=True)
# Frame-to-frame displacement
dx = group["pitch_x"].diff()
dy = group["pitch_y"].diff()
displacement = np.sqrt(dx**2 + dy**2)
# Speed (m/s)
speed = displacement / dt
# Smooth speed (rolling average)
speed_smooth = speed.rolling(window=5, min_periods=1).mean()
# Acceleration
acceleration = speed.diff() / dt
# Movement classification
def classify_movement(s):
if pd.isna(s):
return "Unknown"
if s < 2:
return "Walking"
if s < 4:
return "Jogging"
if s < 6:
return "Running"
if s < 7:
return "High Speed"
return "Sprinting"
movement_type = speed_smooth.apply(classify_movement)
group["displacement"] = displacement
group["speed"] = speed
group["speed_smooth"] = speed_smooth
group["acceleration"] = acceleration
group["movement_type"] = movement_type
results.append(group)
detailed = pd.concat(results, ignore_index=True)
# Aggregate per player
summary = detailed.groupby("track_id").agg(
total_distance=("displacement", "sum"),
max_speed=("speed_smooth", "max"),
avg_speed=("speed_smooth", "mean"),
time_walking=("movement_type", lambda x: (x == "Walking").sum() / fps),
time_jogging=("movement_type", lambda x: (x == "Jogging").sum() / fps),
time_running=("movement_type", lambda x: (x == "Running").sum() / fps),
time_sprinting=("movement_type", lambda x: (x == "Sprinting").sum() / fps),
).reset_index()
# Count sprints (transitions into sprinting)
def count_sprints(group):
is_sprint = group["movement_type"] == "Sprinting"
return (is_sprint & ~is_sprint.shift(1, fill_value=False)).sum()
sprint_counts = detailed.groupby("track_id").apply(count_sprints).reset_index()
sprint_counts.columns = ["track_id", "sprints"]
summary = summary.merge(sprint_counts, on="track_id")
# Convert speed to km/h for display
summary["max_speed_kmh"] = summary["max_speed"] * 3.6
summary["avg_speed_kmh"] = summary["avg_speed"] * 3.6
return detailed, summary
# Example output (illustrative)
print("Player Physical Summary:")
print("Track Distance Max Speed Sprints")
print(" 1 10.2 km 32.4 km/h 24")
print(" 2 11.8 km 28.7 km/h 18")
print(" 3 9.8 km 31.1 km/h 22")
# R: Physical metrics from tracking data
library(tidyverse)
calculate_physical_metrics <- function(tracks_df, H, fps = 25, scale = 10) {
# tracks_df: frame, track_id, x1, y1, x2, y2
# Get bottom center of bbox (foot position)
tracks_df <- tracks_df %>%
mutate(
foot_x = (x1 + x2) / 2,
foot_y = y2
)
# Transform to pitch coordinates (need homography matrix H)
# Simplified: assume we have pitch_x, pitch_y already
# Calculate frame-to-frame displacement
metrics <- tracks_df %>%
arrange(track_id, frame) %>%
group_by(track_id) %>%
mutate(
dx = pitch_x - lag(pitch_x),
dy = pitch_y - lag(pitch_y),
displacement = sqrt(dx^2 + dy^2),
# Time between frames
dt = 1 / fps,
# Speed (m/s)
speed = displacement / dt,
# Smooth speed (rolling average)
speed_smooth = zoo::rollmean(speed, k = 5, fill = NA, align = "right"),
# Acceleration
acceleration = (speed - lag(speed)) / dt,
# Classify movement
movement_type = case_when(
speed_smooth < 2 ~ "Walking",
speed_smooth < 4 ~ "Jogging",
speed_smooth < 6 ~ "Running",
speed_smooth < 7 ~ "High Speed",
TRUE ~ "Sprinting"
)
) %>%
ungroup()
# Aggregate per player
player_summary <- metrics %>%
group_by(track_id) %>%
summarise(
total_distance = sum(displacement, na.rm = TRUE),
max_speed = max(speed_smooth, na.rm = TRUE),
avg_speed = mean(speed_smooth, na.rm = TRUE),
# Time in speed zones
time_walking = sum(movement_type == "Walking", na.rm = TRUE) / fps,
time_jogging = sum(movement_type == "Jogging", na.rm = TRUE) / fps,
time_running = sum(movement_type == "Running", na.rm = TRUE) / fps,
time_sprinting = sum(movement_type == "Sprinting", na.rm = TRUE) / fps,
# Sprint count
sprints = sum(movement_type == "Sprinting" &
lag(movement_type) != "Sprinting", na.rm = TRUE),
.groups = "drop"
)
return(list(detailed = metrics, summary = player_summary))
}
# Example output
cat("Player Physical Summary:\n")
cat("Player 1: 10.2 km total, max 32.4 km/h, 24 sprints\n")
cat("Player 2: 11.8 km total, max 28.7 km/h, 18 sprints\n")
Player Physical Summary:
Track Distance Max Speed Sprints
1 10.2 km 32.4 km/h 24
2 11.8 km 28.7 km/h 18
3 9.8 km 31.1 km/h 22
Action Recognition
Beyond detecting and tracking players, we can recognize specific actions like passes, shots, tackles, and headers using video understanding models.
# Python: Action recognition with deep learning
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18
class FootballActionRecognizer:
"""Recognize football actions from video clips."""
def __init__(self, num_classes=10):
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Use pre-trained 3D ResNet
self.model = r3d_18(pretrained=True)
# Replace final layer for our classes
self.model.fc = nn.Linear(512, num_classes)
self.model.to(self.device)
self.model.eval()
self.action_labels = [
"pass", "shot", "dribble", "tackle", "header",
"cross", "clearance", "save", "foul", "other"
]
def preprocess_clip(self, frames, target_frames=16):
"""Preprocess video clip for model input."""
import torchvision.transforms as T
# Sample frames uniformly
indices = np.linspace(0, len(frames)-1, target_frames, dtype=int)
sampled = [frames[i] for i in indices]
# Transform
transform = T.Compose([
T.ToPILImage(),
T.Resize((112, 112)),
T.ToTensor(),
T.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
tensors = [transform(f) for f in sampled]
clip = torch.stack(tensors, dim=1) # [C, T, H, W]
return clip.unsqueeze(0).to(self.device) # [1, C, T, H, W]
def predict(self, frames):
"""Predict action from video frames."""
clip = self.preprocess_clip(frames)
with torch.no_grad():
outputs = self.model(clip)
probs = torch.softmax(outputs, dim=1)
pred_idx = torch.argmax(probs, dim=1).item()
confidence = probs[0, pred_idx].item()
return self.action_labels[pred_idx], confidence
def extract_action_clips(self, video_path, detections, window=32):
"""Extract clips around key events for classification."""
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
clips = []
for event in detections:
center_frame = event["frame"]
start = max(0, center_frame - window // 2)
cap.set(cv2.CAP_PROP_POS_FRAMES, start)
frames = []
for _ in range(window):
ret, frame = cap.read()
if not ret:
break
# Crop to player bounding box area
if "bbox" in event:
x1, y1, x2, y2 = map(int, event["bbox"])
pad = 50 # Add padding
x1 = max(0, x1 - pad)
y1 = max(0, y1 - pad)
x2 = min(frame.shape[1], x2 + pad)
y2 = min(frame.shape[0], y2 + pad)
frame = frame[y1:y2, x1:x2]
frames.append(frame)
if len(frames) >= 16:
clips.append({
"event": event,
"frames": frames
})
cap.release()
return clips
# Example usage
recognizer = FootballActionRecognizer()
# frames = [...] # List of video frames
# action, confidence = recognizer.predict(frames)
# print(f"Detected action: {action} ({confidence:.2f})")
# R: Action recognition concepts
library(tidyverse)
# Action recognition typically uses:
# 1. Pose sequences over time
# 2. Optical flow features
# 3. 3D CNNs or Transformers
# Simple rule-based action detection from pose
detect_action_from_pose <- function(pose_sequence) {
# Analyze keypoint movements over time
n_frames <- length(pose_sequence)
# Example: Header detection
# - Head height increases then decreases
# - Arms spread for balance
# - Body arches backward
head_heights <- sapply(pose_sequence, function(p) p$nose$y)
if (n_frames >= 10) {
# Look for head rising pattern
peak_idx <- which.max(head_heights)
if (peak_idx > 2 && peak_idx < n_frames - 2) {
rise <- mean(diff(head_heights[1:peak_idx]))
fall <- mean(diff(head_heights[peak_idx:n_frames]))
if (rise < -5 && fall > 5) { # Negative = upward in image coords
return("Header")
}
}
}
# Tackle detection: low body position, extended leg
# Shot detection: leg swing pattern
# etc.
return("Unknown")
}
SoccerNet Action Spotting
SoccerNet provides pre-trained models for action spotting in broadcast footage:
# Python: SoccerNet action spotting
# pip install SoccerNet
from SoccerNet.Downloader import SoccerNetDownloader
from SoccerNet.Evaluation.ActionSpotting import evaluate
# Download action spotting data and models
mySoccerNetDownloader = SoccerNetDownloader(LocalDirectory="./SoccerNet")
mySoccerNetDownloader.downloadGames(files=["Labels-v2.json"], split=["train", "valid", "test"])
# Action classes in SoccerNet
SOCCERNET_ACTIONS = [
"Ball out of play",
"Clearance",
"Corner",
"Direct free-kick",
"Foul",
"Goal",
"Indirect free-kick",
"Kick-off",
"Offside",
"Penalty",
"Red card",
"Shots off target",
"Shots on target",
"Substitution",
"Throw-in",
"Yellow card",
"Yellow->red card"
]
print(f"SoccerNet has {len(SOCCERNET_ACTIONS)} action classes")
# R: Using SoccerNet (via Python)
library(reticulate)
# SoccerNet provides:
# - Action spotting models
# - Camera calibration
# - Player tracking benchmarks
soccernet <- import("SoccerNet")
# Download pre-trained models
# soccernet.download("action-spotting")
SoccerNet has 17 action classes
Case Study: Building a Complete CV Pipeline
Let's build a complete pipeline that processes a match video to extract player tracking data and physical metrics.
# Python: Complete CV pipeline
import cv2
import numpy as np
import pandas as pd
from ultralytics import YOLO
from bytetracker import BYTETracker
from pathlib import Path
class FootballCVPipeline:
"""Complete computer vision pipeline for football analysis."""
def __init__(self, output_dir="./output"):
self.output_dir = Path(output_dir)
self.output_dir.mkdir(exist_ok=True)
# Initialize components
self.detector = YOLO("yolov8m.pt")
self.tracker = BYTETracker(track_thresh=0.5, track_buffer=30, match_thresh=0.8)
self.ball_detector = BallDetector()
self.homography = None
self.team_colors = None
def process_video(self, video_path, max_frames=None):
"""Process video and extract all data."""
print(f"Processing video: {video_path}")
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
if max_frames:
total_frames = min(total_frames, max_frames)
# Results storage
all_tracks = []
all_ball_positions = []
sample_frames = []
frame_idx = 0
print(f"Processing {total_frames} frames at {fps} FPS...")
while cap.isOpened() and frame_idx < total_frames:
ret, frame = cap.read()
if not ret:
break
# Save sample frames for homography estimation
if frame_idx % 100 == 0:
sample_frames.append((frame_idx, frame.copy()))
print(f" Frame {frame_idx}/{total_frames}")
# Detect players
results = self.detector(frame, classes=[0]) # Person class
detections = []
for box in results[0].boxes:
x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
conf = float(box.conf[0])
detections.append([x1, y1, x2, y2, conf])
detections = np.array(detections) if detections else np.empty((0, 5))
# Track players
if len(detections) > 0:
online_targets = self.tracker.update(
detections,
[frame.shape[0], frame.shape[1]],
[frame.shape[0], frame.shape[1]]
)
for t in online_targets:
all_tracks.append({
"frame": frame_idx,
"track_id": t.track_id,
"x1": t.tlbr[0],
"y1": t.tlbr[1],
"x2": t.tlbr[2],
"y2": t.tlbr[3],
"score": t.score
})
# Detect ball
ball_pos, ball_conf = self.ball_detector.detect(frame)
if ball_pos:
all_ball_positions.append({
"frame": frame_idx,
"x": ball_pos[0],
"y": ball_pos[1],
"confidence": ball_conf
})
frame_idx += 1
cap.release()
# Convert to DataFrames
tracks_df = pd.DataFrame(all_tracks)
ball_df = pd.DataFrame(all_ball_positions)
print(f"\nTracked {tracks_df['track_id'].nunique()} unique players")
print(f"Ball detected in {len(ball_df)} frames")
# Estimate homography from sample frames
if sample_frames:
self.homography = self._estimate_homography(sample_frames[0][1])
# Classify teams
if len(tracks_df) > 0 and len(sample_frames) > 0:
tracks_df = self._classify_teams(tracks_df, sample_frames[0][1])
# Calculate physical metrics
if self.homography is not None and len(tracks_df) > 0:
detailed, summary = calculate_physical_metrics(tracks_df, self.homography, fps)
tracks_df = detailed
summary.to_csv(self.output_dir / "physical_summary.csv", index=False)
print(f"\nPhysical metrics saved to {self.output_dir / 'physical_summary.csv'}")
# Save results
tracks_df.to_csv(self.output_dir / "tracks.csv", index=False)
ball_df.to_csv(self.output_dir / "ball_positions.csv", index=False)
return tracks_df, ball_df
def _estimate_homography(self, frame):
"""Estimate homography from pitch lines."""
# Simplified: would use keypoint detection in production
# Return identity matrix as placeholder
print("Estimating homography (placeholder)...")
return np.eye(3, dtype=np.float32)
def _classify_teams(self, tracks_df, frame):
"""Classify players into teams."""
print("Classifying teams by jersey color...")
# Get sample of player crops
sample_tracks = tracks_df.drop_duplicates("track_id").head(22)
colors = []
for _, row in sample_tracks.iterrows():
color = extract_jersey_color(frame, [row["x1"], row["y1"], row["x2"], row["y2"]])
colors.append(color)
if colors:
team_labels, _ = classify_teams(colors)
# Map back to all tracks
label_map = dict(zip(sample_tracks["track_id"], team_labels))
tracks_df["team"] = tracks_df["track_id"].map(label_map)
return tracks_df
# Run pipeline
pipeline = FootballCVPipeline(output_dir="./match_analysis")
tracks, ball = pipeline.process_video("match_clip.mp4", max_frames=1000)
print("\n=== Pipeline Complete ===")
print("Output saved to: ./match_analysis/")
# R: Complete CV pipeline orchestration
library(tidyverse)
library(reticulate)
# This would typically call Python CV modules
run_cv_pipeline <- function(video_path, output_dir) {
# Step 1: Extract frames
cat("Extracting frames...\n")
# Step 2: Detect players and ball
cat("Running player detection...\n")
# Step 3: Classify teams
cat("Classifying teams by jersey color...\n")
# Step 4: Track across frames
cat("Running multi-object tracking...\n")
# Step 5: Estimate homography
cat("Detecting pitch lines and computing homography...\n")
# Step 6: Transform to pitch coordinates
cat("Transforming to pitch coordinates...\n")
# Step 7: Calculate physical metrics
cat("Calculating speeds and distances...\n")
# Step 8: Detect actions
cat("Running action recognition...\n")
# Return results
list(
tracking_data = "tracks.csv",
physical_metrics = "metrics.csv",
actions = "actions.csv"
)
}
Processing video: match_clip.mp4
Processing 1000 frames at 25.0 FPS...
Frame 0/1000
Frame 100/1000
Frame 200/1000
...
Frame 900/1000
Tracked 24 unique players
Ball detected in 856 frames
Estimating homography (placeholder)...
Classifying teams by jersey color...
Physical metrics saved to ./match_analysis/physical_summary.csv
=== Pipeline Complete ===
Output saved to: ./match_analysis/
Pose Estimation
Pose estimation detects body keypoints (joints) to analyze player movements, running styles, and actions like tackles or headers.
# Python: Pose estimation with MediaPipe
import mediapipe as mp
import cv2
import numpy as np
# Initialize MediaPipe Pose
mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils
pose = mp_pose.Pose(
static_image_mode=False,
model_complexity=1,
min_detection_confidence=0.5,
min_tracking_confidence=0.5
)
def calculate_angle(p1, p2, p3):
"""Calculate angle at p2 between p1-p2-p3."""
v1 = np.array(p1) - np.array(p2)
v2 = np.array(p3) - np.array(p2)
cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
angle = np.arccos(np.clip(cos_angle, -1, 1))
return np.degrees(angle)
def analyze_player_pose(frame, bbox):
"""Extract pose from player bounding box."""
x1, y1, x2, y2 = map(int, bbox)
player_crop = frame[y1:y2, x1:x2]
if player_crop.size == 0:
return None
# Run pose estimation
rgb_crop = cv2.cvtColor(player_crop, cv2.COLOR_BGR2RGB)
results = pose.process(rgb_crop)
if not results.pose_landmarks:
return None
# Extract keypoints
landmarks = results.pose_landmarks.landmark
h, w = player_crop.shape[:2]
keypoints = {}
for idx, name in enumerate(mp_pose.PoseLandmark):
lm = landmarks[idx]
keypoints[name.name] = {
"x": lm.x * w + x1, # Convert to frame coords
"y": lm.y * h + y1,
"visibility": lm.visibility
}
# Calculate useful metrics
metrics = {}
# Knee angles (running form)
if all(keypoints[k]["visibility"] > 0.5 for k in
["LEFT_HIP", "LEFT_KNEE", "LEFT_ANKLE"]):
metrics["left_knee_angle"] = calculate_angle(
[keypoints["LEFT_HIP"]["x"], keypoints["LEFT_HIP"]["y"]],
[keypoints["LEFT_KNEE"]["x"], keypoints["LEFT_KNEE"]["y"]],
[keypoints["LEFT_ANKLE"]["x"], keypoints["LEFT_ANKLE"]["y"]]
)
# Body lean (acceleration indicator)
if all(keypoints[k]["visibility"] > 0.5 for k in
["NOSE", "LEFT_HIP", "RIGHT_HIP"]):
hip_center = [
(keypoints["LEFT_HIP"]["x"] + keypoints["RIGHT_HIP"]["x"]) / 2,
(keypoints["LEFT_HIP"]["y"] + keypoints["RIGHT_HIP"]["y"]) / 2
]
nose = [keypoints["NOSE"]["x"], keypoints["NOSE"]["y"]]
# Lean angle from vertical
dx = nose[0] - hip_center[0]
dy = hip_center[1] - nose[1] # Inverted y
metrics["body_lean"] = np.degrees(np.arctan2(dx, dy))
return {"keypoints": keypoints, "metrics": metrics}
# Process a frame
frame = cv2.imread("player_frame.jpg")
result = analyze_player_pose(frame, [100, 50, 200, 300])
if result:
knee = result["metrics"].get("left_knee_angle")
lean = result["metrics"].get("body_lean")
print(f"Left knee angle: {knee:.1f}" if knee is not None else "Left knee angle: N/A")
print(f"Body lean: {lean:.1f} degrees" if lean is not None else "Body lean: N/A")
# R: Pose estimation concepts
# Pose estimation outputs 17-25 keypoints per person
keypoint_names <- c(
"nose", "left_eye", "right_eye", "left_ear", "right_ear",
"left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
"left_wrist", "right_wrist", "left_hip", "right_hip",
"left_knee", "right_knee", "left_ankle", "right_ankle"
)
# Calculate joint angles from keypoints
calculate_angle <- function(p1, p2, p3) {
# Angle at p2 between p1-p2-p3
v1 <- p1 - p2
v2 <- p3 - p2
cos_angle <- sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
angle_rad <- acos(max(-1, min(1, cos_angle)))
angle_rad * 180 / pi
}
# Example: knee angle for running analysis
knee_angle <- calculate_angle(
p1 = c(100, 200), # hip
p2 = c(110, 280), # knee
p3 = c(105, 360) # ankle
)
cat("Knee angle:", knee_angle, "degrees\n")
Left knee angle: 142.3
Body lean: 8.5 degrees
Open Source Tools
Several open-source projects provide football-specific computer vision capabilities:
| Tool | Purpose | Link |
|---|---|---|
| Roboflow Sports | Pre-trained models for sports CV | roboflow.com/sports |
| SoccerNet | Action spotting, camera calibration | soccernet.org |
| Narya | Homography and tracking | github.com/DonsetPG/narya |
| TrackLab | Multi-object tracking framework | github.com/TrackingLaboratory |
| Supervision | Detection/tracking utilities | github.com/roboflow/supervision |
# Python: Using Supervision for easy CV workflows
# pip install supervision
import supervision as sv
from ultralytics import YOLO
# Load model
model = YOLO("yolov8n.pt")
# Initialize annotators
box_annotator = sv.BoxAnnotator(thickness=2)
label_annotator = sv.LabelAnnotator()
trace_annotator = sv.TraceAnnotator()
# Initialize tracker
tracker = sv.ByteTrack()
# Process video
def process_video(source_path, output_path):
"""Process video with detection and tracking."""
# Video info
video_info = sv.VideoInfo.from_video_path(source_path)
with sv.VideoSink(output_path, video_info) as sink:
for frame in sv.get_video_frames_generator(source_path):
# Detect
results = model(frame)[0]
detections = sv.Detections.from_ultralytics(results)
# Filter to persons only
detections = detections[detections.class_id == 0]
# Track
detections = tracker.update_with_detections(detections)
# Annotate
labels = [f"#{tid}" for tid in detections.tracker_id]
frame = box_annotator.annotate(frame, detections)
frame = label_annotator.annotate(frame, detections, labels)
frame = trace_annotator.annotate(frame, detections)
sink.write_frame(frame)
# Run processing
process_video("match.mp4", "tracked_match.mp4")
print("Video processing complete!")
# R: Using Roboflow inference
library(httr)
library(jsonlite)
# Roboflow API call
detect_players_roboflow <- function(image_path, api_key, model_id) {
# Encode image
img_base64 <- base64enc::base64encode(image_path)
# API call
response <- POST(
url = paste0("https://detect.roboflow.com/", model_id),
query = list(api_key = api_key),
body = img_base64,
encode = "raw",
content_type("application/x-www-form-urlencoded")
)
# Parse results
results <- content(response, "parsed")
results$predictions
}
# Example usage
# predictions <- detect_players_roboflow(
# "frame.jpg",
# "YOUR_API_KEY",
# "football-players-detection/1"
# )
Practice Exercises
Hands-On Practice
Complete these exercises to build computer vision skills:
Use YOLOv8 to detect all players in a broadcast frame. Count how many players from each team are detected based on jersey color clustering.
Manually identify 4+ pitch keypoints in a broadcast frame. Calculate the homography matrix and transform detected player positions to pitch coordinates.
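For the manual-keypoint route, the homography can also be solved without OpenCV via the direct linear transform. A minimal numpy sketch; the four image/pitch correspondences below (penalty-box corners and their coordinates in metres) are made-up illustrative values, not real calibration data:

```python
import numpy as np

def homography_from_points(src_pts, dst_pts):
    """Direct linear transform: solve for the 3x3 H mapping src -> dst
    from four or more point correspondences (two equations per pair)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The flattened H is the null-space vector of A: the right singular
    # vector with the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pt):
    """Map one (x, y) pixel through H into pitch coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Hypothetical keypoints: penalty-box corners in the image and their
# pitch coordinates in metres (illustrative numbers, not a real frame)
image_pts = [(320, 180), (960, 175), (1000, 520), (280, 530)]
pitch_pts = [(0.0, 0.0), (40.3, 0.0), (40.3, 16.5), (0.0, 16.5)]
H = homography_from_points(image_pts, pitch_pts)
```

With exactly four correspondences the solution is exact, so each image corner should map back onto its pitch coordinate; `cv2.findHomography` does the same job with RANSAC outlier rejection when you have more, noisier points.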
Implement an IoU-based tracker to follow players across 100 frames. Calculate the total distance traveled by each tracked player.
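A possible starting point for the IoU tracker: a plain IoU function plus greedy matching of existing tracks to new detections. This sketches the matching step only; track creation and deletion are left to the exercise:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    if inter == 0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_detections(tracks, detections, iou_threshold=0.3):
    """Greedily pair each existing track with its best unmatched detection.
    tracks: {track_id: box}; detections: list of boxes."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, iou_threshold
        for di, dbox in enumerate(detections):
            if di in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best, best_iou = di, score
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```

Greedy matching is order-dependent; the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) gives a globally optimal assignment if you need it.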
Use MediaPipe to extract poses from player crops. Calculate knee angles and body lean for a sprinting player vs. a jogging player.
Implement a Kalman filter for ball tracking that handles occlusions. Process a 30-second clip where the ball is occluded multiple times (by players or during aerial duels). Evaluate the prediction accuracy during occlusion periods.
Hint
Start with a simple constant-velocity model. The state vector should include position (x, y) and velocity (vx, vy). Tune the process noise (Q) and measurement noise (R) matrices based on typical ball movement patterns.
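Following that hint, a constant-velocity Kalman filter is short enough to write directly in numpy (the `filterpy` library listed later offers an equivalent `KalmanFilter` class). The `q` and `r` noise values here are illustrative starting points, not tuned settings:

```python
import numpy as np

class BallKalman:
    """Constant-velocity Kalman filter for 2D ball tracking.
    State [x, y, vx, vy]; measurement [x, y]."""

    def __init__(self, dt=1.0, q=1.0, r=2.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q       # process noise: trust in the motion model
        self.R = np.eye(2) * r       # measurement noise: trust in detections
        self.P = np.eye(4) * 100.0   # large initial uncertainty
        self.x = np.zeros(4)

    def predict(self):
        """Advance the state one frame; call this even when the ball is occluded."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Fold in a ball detection; skip this while the ball is occluded."""
        y = np.asarray(z, dtype=float) - self.H @ self.x    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = BallKalman(dt=1.0)
for t in range(5):            # ball moving 1 px per frame along x
    kf.predict()
    kf.update([float(t), 0.0])
predicted = kf.predict()      # coast one frame with no measurement
```

During an occlusion the filter simply keeps predicting from the last estimated velocity, which is exactly the behaviour the exercise asks you to evaluate.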
Using player tracking data from a full match, detect team formations at different phases of play. Cluster player positions to identify whether a team is playing 4-3-3, 4-4-2, or 3-5-2 in attacking vs. defensive phases.
Hint
Normalize positions relative to the team's centroid. Use hierarchical clustering on defensive-phase positions to identify formation templates. Compare results to manually annotated formations.
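Before full template matching, a simple way to recover formation lines is 1D k-means on the attacking-axis coordinate, splitting outfield players into defense/midfield/attack bands. A sketch with hypothetical average positions (the coordinates are invented for illustration):

```python
import numpy as np

def formation_lines(positions, n_lines=3):
    """Split outfield players into lines along the attacking axis using
    1D k-means on the x coordinate; returns player counts per line,
    ordered from defense to attack."""
    xs = np.sort(np.asarray([p[0] for p in positions], dtype=float))
    centroids = np.linspace(xs.min(), xs.max(), n_lines)  # spread initial guesses
    for _ in range(50):
        labels = np.argmin(np.abs(xs[:, None] - centroids[None, :]), axis=1)
        new = np.array([xs[labels == k].mean() if np.any(labels == k)
                        else centroids[k] for k in range(n_lines)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return [int(np.sum(labels == k)) for k in np.argsort(centroids)]

# Hypothetical defensive-phase average positions (x = metres up the pitch)
positions_442 = [(20, y) for y in (10, 25, 43, 58)] + \
                [(45, y) for y in (12, 28, 40, 56)] + \
                [(70, y) for y in (30, 38)]
```

For these positions the function returns `[4, 4, 2]`, i.e. a 4-4-2. Formations with split lines (e.g. 4-2-3-1) need `n_lines=4` or the hierarchical clustering suggested in the hint.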
Calculate speed profiles for all players in a match. Classify each movement into zones: walking (<7 km/h), jogging (7-14 km/h), running (14-21 km/h), high-speed running (21-25 km/h), and sprinting (>25 km/h). Create visualizations showing time spent in each zone by position.
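The zone boundaries above translate directly into a lookup table. A small sketch of the classification step, taking per-frame speeds in and returning seconds per zone:

```python
# Speed zones from the exercise (km/h)
ZONES = [
    ("walking", 0.0, 7.0),
    ("jogging", 7.0, 14.0),
    ("running", 14.0, 21.0),
    ("high-speed running", 21.0, 25.0),
    ("sprinting", 25.0, float("inf")),
]

def classify_speed(kmh):
    """Map an instantaneous speed to its movement zone."""
    for name, lo, hi in ZONES:
        if lo <= kmh < hi:
            return name
    return "walking"  # negative/invalid speeds fall back to the lowest zone

def time_in_zones(speeds_kmh, fps=25):
    """Seconds spent in each zone, given one speed sample per frame."""
    seconds = {name: 0.0 for name, _, _ in ZONES}
    for s in speeds_kmh:
        seconds[classify_speed(s)] += 1.0 / fps
    return seconds
```

In practice you would smooth the raw per-frame speeds first (e.g. a short moving average), since detection jitter otherwise inflates time in the higher zones.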
Build a complete CV pipeline that processes a 5-minute match clip and outputs: (1) Player tracking data in pitch coordinates, (2) Ball trajectory, (3) Team classifications, (4) Physical metrics summary, (5) Event detection for passes and shots. Validate against ground truth annotations if available.
Hint
Use the FootballCVPipeline class as a starting point. Focus on one component at a time—get detection working first, then tracking, then homography transformation. Use sample frames to validate each stage before processing the full video.
Summary
Key Takeaways
- Object detection (YOLO, Faster R-CNN) identifies players and ball in video frames
- Team classification uses jersey color clustering to separate teams
- Homography transforms pixel coordinates to real-world pitch positions
- Multi-object tracking (SORT, ByteTrack) connects detections across frames
- Ball tracking with Kalman filters handles occlusions and predicts trajectory
- Pose estimation extracts body keypoints for movement analysis
- Action recognition detects events like passes, shots, and tackles from video
- Open-source tools like Supervision and SoccerNet accelerate development
Common Pitfalls
- Broadcast camera movement: Camera panning/zooming invalidates homography—recalibrate frequently
- Similar jersey colors: Teams with similar colors (both in white) break clustering-based team classification
- Crowd interference: Stadium crowds can trigger false detections—use pitch mask or confidence thresholds
- Goalkeeper detection: GK uniforms differ from outfield players—may need separate detector
- Ball occlusion: Ball frequently hidden during headers, tackles, and crowded areas
- ID switching: Trackers lose IDs during player collisions or crossing paths
- Frame rate misalignment: Ensure consistent FPS for accurate speed calculations
- Lens distortion: Wide-angle broadcast lenses distort coordinates near frame edges
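The frame-rate pitfall is easy to quantify: speed estimates scale linearly with the assumed FPS, so mismatched video metadata feeds straight into the km/h figures. A quick sketch, using an illustrative per-frame displacement of 0.28 m:

```python
def speed_kmh(displacement_m, fps):
    """Instantaneous speed from a per-frame displacement in metres."""
    return displacement_m * fps * 3.6  # m/frame -> m/s -> km/h

# The same 0.28 m per-frame step reads as a sprint at 25 FPS...
v25 = speed_kmh(0.28, 25)  # ~25.2 km/h
# ...but assuming 30 FPS on the same frames inflates it by 20%
v30 = speed_kmh(0.28, 30)  # ~30.24 km/h
```

A 25-vs-30 FPS mix-up is enough to move a player between speed zones, so always read the FPS from the video metadata rather than hard-coding it.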
Essential Libraries
Python Libraries:
- ultralytics: YOLOv8 detection and tracking
- supervision: CV pipeline utilities
- opencv-python: Video I/O and image processing
- mediapipe: Pose estimation
- filterpy: Kalman filter implementation
- SoccerNet: Football-specific benchmarks
- torch/torchvision: Deep learning framework
R Packages:
- reticulate: Python interop for CV models
- opencv: Basic image operations
- magick: Image manipulation
- imager: Image processing
- keras3/tensorflow: Deep learning
- gganimate: Animated visualizations
GPU Requirements and Performance
Computer vision workloads are computationally intensive. Here are typical performance benchmarks:
| Task | CPU (i7) | GPU (RTX 3080) | Recommended |
|---|---|---|---|
| YOLOv8n detection | ~5 FPS | ~150 FPS | GPU for real-time |
| YOLOv8m detection | ~2 FPS | ~80 FPS | GPU required |
| ByteTrack tracking | ~100 FPS | ~100 FPS | CPU sufficient |
| Pose estimation | ~10 FPS | ~60 FPS | GPU preferred |
| Homography (per frame) | ~50 FPS | ~50 FPS | CPU sufficient |
Data Quality Considerations
The accuracy of your CV pipeline depends heavily on input quality:
- Resolution: Higher resolution (1080p+) improves small object detection (ball)
- Frame rate: 25-30 FPS minimum for tracking; 50+ FPS for precise speed calculations
- Compression: Heavy compression artifacts degrade detection accuracy
- Camera angle: Tactical camera (high, centered) is ideal for tracking; broadcast cameras vary
- Lighting: Night games with shadows and uneven lighting challenge color-based classification
Integration with Analytics Pipelines
CV outputs integrate with broader analytics workflows:
# Python: Integrating CV outputs with analytics
import pandas as pd
import numpy as np
# Load CV pipeline outputs
tracking_data = pd.read_csv("output/tracks.csv")
ball_positions = pd.read_csv("output/ball_positions.csv")
physical_metrics = pd.read_csv("output/physical_summary.csv")
def calculate_possession(ball_df, tracks_df):
"""Calculate possession percentage from tracking data."""
possession_frames = []
for _, ball in ball_df.iterrows():
frame_tracks = tracks_df[tracks_df["frame"] == ball["frame"]]
if len(frame_tracks) == 0:
continue
# Find nearest player to ball
frame_tracks = frame_tracks.copy()
frame_tracks["cx"] = (frame_tracks["x1"] + frame_tracks["x2"]) / 2
frame_tracks["cy"] = (frame_tracks["y1"] + frame_tracks["y2"]) / 2
frame_tracks["dist"] = np.sqrt(
(ball["x"] - frame_tracks["cx"])**2 +
(ball["y"] - frame_tracks["cy"])**2
)
nearest = frame_tracks.loc[frame_tracks["dist"].idxmin()]
possession_frames.append({
"frame": ball["frame"],
"team": nearest.get("team", "unknown")
})
poss_df = pd.DataFrame(possession_frames)
return poss_df["team"].value_counts(normalize=True) * 100
possession = calculate_possession(ball_positions, tracking_data)
print("Possession %:")
print(possession)
# R: Integrating CV outputs with analytics
library(tidyverse)
# Load CV pipeline outputs
tracking_data <- read_csv("output/tracks.csv")
ball_positions <- read_csv("output/ball_positions.csv")
physical_metrics <- read_csv("output/physical_summary.csv")
# Calculate team possession
calculate_possession <- function(ball_df, tracks_df) {
ball_df %>%
left_join(
tracks_df %>% select(frame, track_id, team, x1, y1, x2, y2),
by = "frame"
) %>%
mutate(
# Distance from ball to player center
player_cx = (x1 + x2) / 2,
player_cy = (y1 + y2) / 2,
dist_to_ball = sqrt((x - player_cx)^2 + (y - player_cy)^2)
) %>%
group_by(frame) %>%
slice_min(dist_to_ball, n = 1) %>%
ungroup() %>%
count(team) %>%
mutate(possession_pct = n / sum(n) * 100)
}
possession <- calculate_possession(ball_positions, tracking_data)
print(possession)
Possession %:
Team A 54.3
Team B 45.7
Name: team, dtype: float64
Computer vision opens up new possibilities for football analysis from video. The techniques covered in this chapter—object detection, tracking, homography transformation, and pose estimation—form the foundation for extracting rich analytical data from broadcast and tactical footage. In the next chapter, we'll explore natural language processing applications in football analytics, including match report generation and sentiment analysis.