Data Storytelling Team Project - Football

Author: Byeong-Hak Choe
Affiliation: SUNY Geneseo
Published: November 26, 2024

Data

The following data frames cover the National Football League (NFL) for the seasons from 2014-15 through 2023-24:

  • nfl_team_epa: Team’s mean expected points added (EPA) when the team was on offense and when the team was on defense
    • For details about EPA, see the Football Metrics section below.
  • nfl_field_goals: Play-by-play statistics at situations when field goals were attempted during the game
  • nfl_passers: Weekly mean EPA and completion percentage over expected (CPOE) among players who passed more than 44 times within a week
  • nfl_players_stat: Player statistics
  • nfl_receivers: Total EPA of players whose positions are either “WR”, “TE”, or “RB”, and top 10 players in terms of total EPA
Position | Full Name | Main Role | Skills Required
WR | Wide Receiver | Catch passes and gain yards or score TDs | Speed, agility, reliable hands
TE | Tight End | Block defenders and catch passes | Strength, versatility, reliable hands
RB | Running Back | Run the ball, catch passes, block | Speed, vision, agility
QB | Quarterback | Lead the offense and throw passes | Decision-making, accuracy, arm strength, leadership

NFL Team EPA (nfl_team_epa)

  • nfl_team_epa: Team’s mean EPA when the team was on offense and when the team was on defense
library(tidyverse)  # loads readr, which provides read_csv()
nfl_team_epa <- read_csv("http://bcdanl.github.io/data/nfl_team_epa.csv")


season: Starting year of the season (e.g., 2014 for the 2014-15 season)

team: Team abbreviation

off_epa: Offensive EPA

def_epa: Defensive EPA
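
As a rough sketch of how these four columns might be combined, the snippet below builds a small made-up data frame with the same column names and computes a net EPA per team-season. The team abbreviations, the numbers, and the net_epa column are illustrative assumptions, not values from nfl_team_epa.csv. It also assumes that def_epa is the mean EPA a team allowed on defense, so that lower is better; check the actual data before relying on that sign convention.

```r
library(dplyr)

# Toy rows with the same columns as nfl_team_epa (all values are made up)
toy_team_epa <- tibble(
  season  = c(2023, 2023, 2023),
  team    = c("AAA", "BBB", "CCC"),   # hypothetical team abbreviations
  off_epa = c(0.10, -0.05, 0.02),
  def_epa = c(-0.04, 0.06, 0.02)
)

# Assumption: def_epa is the mean EPA allowed on defense, so lower is better
toy_net <- toy_team_epa |>
  mutate(net_epa = off_epa - def_epa) |>   # hypothetical summary column
  arrange(desc(net_epa))

toy_net
```

Under that assumption, a team with a positive off_epa and a negative def_epa ends up at the top of the sorted result.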

NFL Field Goals (nfl_field_goals)

  • nfl_field_goals: Play-by-play statistics at situations when field goals were attempted during the game
    • A data frame with 10,047 observations of 23 variables.
nfl_field_goals <- read_csv("http://bcdanl.github.io/data/nfl_field_goals.csv")



  • fg_distance is the total distance of a field goal attempt, which is the distance from the line of scrimmage to the goal line plus 17 yards:
    • The extra 17 yards come from 10 yards (depth of the end zone) + 7 yards (the holder’s position behind the line of scrimmage).
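
The arithmetic behind the 17-yard adjustment can be sketched as a small helper. The function name and its yards_to_goal argument are hypothetical and are not column names taken from nfl_field_goals.csv.

```r
# Hypothetical helper: yards_to_goal is the distance (in yards) from the
# line of scrimmage to the goal line. It is not a column name from the data.
fg_distance_from <- function(yards_to_goal) {
  end_zone_depth <- 10  # the goal posts stand at the back of the end zone
  holder_offset  <- 7   # the holder sets up about 7 yards behind the line
  yards_to_goal + end_zone_depth + holder_offset
}

fg_distance_from(c(20, 33))  # 37 50
```

For instance, a kick snapped from the opponent’s 33-yard line is a 50-yard attempt.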


NFL Passer’s EPA and CPOE (nfl_passers)

  • nfl_passers: Weekly mean EPA and completion percentage over expected (CPOE) among players who passed more than 44 times within a week
    • A data frame with 1,098 rows and 22 variables:
nfl_passers <- read_csv("http://bcdanl.github.io/data/nfl_passers.csv")


NFL Player Statistics (nfl_players_stat)

  • nfl_players_stat: Player statistics
nfl_players_stat <- read_csv("http://bcdanl.github.io/data/nfl_players_stat.csv")


yards: rushing_yards + receiving_yards

rushing_yards: Yards gained when rushing with the ball (incl. scrambles and kneel downs). Also includes yards gained after obtaining a lateral on a play that started with a rushing attempt.

receiving_yards: Yards gained after a pass reception. Also includes yards gained after receiving a lateral on a play that started as a pass play.

touches: carries + receptions

carries: The number of official rush attempts (incl. scrambles and kneel downs). Rushes after a lateral reception do not count as carries.

receptions: The number of pass receptions. Lateral receptions officially do not count as receptions.

tds: rushing_tds + receiving_tds

rushing_tds: The number of rushing touchdowns (incl. scrambles). Also includes touchdowns after obtaining a lateral on a play that started with a rushing attempt.

receiving_tds: The number of touchdowns following a pass reception. Also includes touchdowns after receiving a lateral on a play that started as a pass play.
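
The three derived columns can be reproduced from their components. The sketch below uses a made-up two-row data frame. The player column and the yards_per_touch column are hypothetical additions for illustration, and none of the numbers come from nfl_players_stat.csv.

```r
library(dplyr)

# Toy rows (all values are made up; `player` is a hypothetical column)
toy_players <- tibble(
  player          = c("Player A", "Player B"),
  rushing_yards   = c(80, 5),
  receiving_yards = c(20, 110),
  carries         = c(18, 1),
  receptions      = c(3, 8),
  rushing_tds     = c(1, 0),
  receiving_tds   = c(0, 2)
)

toy_players <- toy_players |>
  mutate(
    yards   = rushing_yards + receiving_yards,
    touches = carries + receptions,
    tds     = rushing_tds + receiving_tds,
    yards_per_touch = yards / touches   # illustrative extra column
  )

toy_players
```

A check of this kind may be a useful first step before plotting, since it confirms that the totals in the data match their stated definitions.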

NFL Receivers (nfl_receivers)

  • nfl_receivers: Total EPA of players whose positions are either “WR”, “TE”, or “RB”, and top 10 players in terms of total EPA
nfl_receivers <- read_csv("http://bcdanl.github.io/data/nfl_receivers.csv")


epa_rank: Ranking in terms of tot_epa (the lower the epa_rank, the higher the tot_epa)

epa_rank_within_position: Ranking in terms of EPA within the group of the same position

n_received: the number of passes a player received

tot_epa: Total EPA within a season
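
One way such rankings could be built from tot_epa is sketched below on made-up rows. The player and position column names are assumptions, and whether the actual data handle ties the way dplyr’s min_rank() does is not stated in the source.

```r
library(dplyr)

# Toy rows (values are made up; `player` and `position` are assumed names)
toy_receivers <- tibble(
  player   = c("A", "B", "C", "D"),
  position = c("WR", "WR", "TE", "RB"),
  tot_epa  = c(55.2, 40.1, 47.3, 12.8)
)

toy_ranked <- toy_receivers |>
  # Rank 1 goes to the highest tot_epa across all players
  mutate(epa_rank = min_rank(desc(tot_epa))) |>
  # Repeat the ranking separately within each position
  group_by(position) |>
  mutate(epa_rank_within_position = min_rank(desc(tot_epa))) |>
  ungroup()

toy_ranked
```

In this toy example the tight end ranks second overall but first within the TE group.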


Football Metrics

Expected Points Added (EPA)

Expected Points Added (EPA) is a football analytics metric that measures the value of a play in terms of its impact on the team’s expected scoring. It quantifies how much a single play increases or decreases a team’s chances of scoring compared to the situation before the play.

How EPA Works

Every play in football occurs within a specific context (down, distance, field position, time remaining, and score). Historical data is used to calculate the expected points (EP) a team can expect to score from their current situation. EPA is the difference between the expected points after the play and before the play.

Formula:

\[ EPA = EP (\text{after the play}) - EP (\text{before the play}) \]

Key Insights:

  1. Positive EPA: The play improved the team’s scoring chances.
    • Example: A 20-yard pass on 3rd and 8 increases the likelihood of scoring.
  2. Negative EPA: The play reduced the team’s scoring chances.
    • Example: A sack or an interception harms the team’s scoring potential.
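
The formula and the two cases above can be illustrated with a short sketch. The EP values are made up for illustration. Real EP values come from a model fitted to historical play-by-play data.

```r
# EPA = EP(after the play) - EP(before the play)
epa <- function(ep_before, ep_after) {
  ep_after - ep_before
}

# Made-up EP values, for illustration only
epa(ep_before = 1.2, ep_after = 3.1)  # completed 20-yard pass: positive EPA
epa(ep_before = 1.2, ep_after = 0.3)  # sack: negative EPA
```

The same starting situation yields a positive or a negative EPA depending only on where the play leaves the offense.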

Why EPA Is Important

  • Contextual: Accounts for situational factors, making it more informative than raw stats like yards gained.
  • Play Evaluation: Helps determine the effectiveness of specific plays or players.
  • Strategic Decisions: Assists coaches and analysts in evaluating decisions like when to go for it on 4th down.

Applications

  • Offensive EPA: Evaluates how well a team’s offense increases scoring opportunities.
  • Defensive EPA: Measures how effectively a defense reduces the opposing team’s scoring potential.
  • Player Performance: Used to assess quarterbacks, running backs, wide receivers, and defenders by their contribution to scoring or preventing points.

Completion Percentage Over Expected (CPOE)

Completion Percentage Over Expected (CPOE) is a football analytics metric that evaluates a quarterback’s passing performance by comparing their actual completion percentage to the expected completion percentage based on the difficulty of their pass attempts.

How CPOE Works

  1. Actual Completion Percentage (COMP%):
    • The percentage of passes a quarterback completes.
  2. Expected Completion Percentage (xCOMP%):
    • Calculated based on factors like:
      • Distance of the throw (air yards)
      • Angle of the throw
      • Receiver separation
      • Defensive pressure
      • Game situation (e.g., down, distance, and field position)
    • Derived from historical data on similar passes.
  3. CPOE Formula: \[ CPOE = COMP\% - xCOMP\% \]
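
The three steps above can be sketched as follows. The function and argument names are hypothetical, and the expected completion percentage is simply supplied as a made-up number. In practice it would come from a model of pass difficulty.

```r
# CPOE = actual completion percentage - expected completion percentage
cpoe <- function(completions, attempts, expected_comp_pct) {
  actual_comp_pct <- 100 * completions / attempts
  actual_comp_pct - expected_comp_pct
}

# Made-up week: 30 completions on 45 attempts, with xCOMP% assumed to be 62.5
cpoe(completions = 30, attempts = 45, expected_comp_pct = 62.5)
```

Here the actual completion percentage (about 66.7) is above the assumed expectation, so the result is positive.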

Interpretation of CPOE

  • Positive CPOE: Indicates the quarterback is completing more passes than expected, showcasing accuracy and skill.
  • Negative CPOE: Indicates the quarterback is completing fewer passes than expected, potentially highlighting issues with accuracy or decision-making.

Why CPOE Is Useful

  • Isolates Skill: It accounts for the difficulty of throws, focusing on the quarterback’s performance rather than the system or play design.
  • Complementary Metric: Often paired with EPA/play to provide a comprehensive evaluation of a quarterback’s impact.
  • Game Context: Helps differentiate between quarterbacks who excel in challenging situations versus those whose stats are inflated by easy throws.