Data Storytelling Team Project - Football

Author: Byeong-Hak Choe
Affiliation: SUNY Geneseo
Published: November 26, 2024

Data

The following data frames cover the National Football League (NFL) for the seasons from 2014-15 through 2023-24:

  • nfl_team_epa: Team’s mean expected points added (EPA) when the team was on offense and when the team was on defense
    • For details about EPA, see the Football Metrics section below.
  • nfl_field_goals: Play-by-play statistics for plays on which a field goal was attempted
  • nfl_passers: Weekly mean EPA and completion percentage over expected (CPOE) for players who passed more than 44 times within a week
  • nfl_players_stat: Player statistics
  • nfl_receivers: Total EPA of the top 10 players, ranked by total EPA, among players whose position is “WR”, “TE”, or “RB”

Position | Full Name     | Main Role                                | Skills Required
WR       | Wide Receiver | Catch passes and gain yards or score TDs | Speed, agility, reliable hands
TE       | Tight End     | Block defenders and catch passes         | Strength, versatility, reliable hands
RB       | Running Back  | Run the ball, catch passes, block        | Speed, vision, agility
QB       | Quarterback   | Lead the offense and throw passes        | Decision-making, accuracy, arm strength, leadership

NFL Team EPA (nfl_team_epa)

  • nfl_team_epa: Team’s mean EPA when the team was on offense and when the team was on defense
library(readr)  # provides read_csv()
nfl_team_epa <- read_csv("http://bcdanl.github.io/data/nfl_team_epa.csv")
season  team  off_epa  def_epa
<dbl>   <chr>   <dbl>    <dbl>
2014    ARI   -0.0245  -0.0501
2014    ATL    0.0069   0.0748
2014    BAL    0.0669  -0.0321
2014    BUF   -0.0743  -0.1024
2014    CAR   -0.0067   0.0045
2014    CHI   -0.0498   0.0789
2014    CIN   -0.0176  -0.0107
2014    CLE   -0.0710  -0.0350
2014    DAL    0.1174   0.0030
2014    DEN    0.0851  -0.0569


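season: Starting year of the season (e.g., 2014 for the 2014-15 season)

team: Team abbreviation

off_epa: Offensive EPA (mean EPA per play when the team was on offense; higher is better)

def_epa: Defensive EPA (mean EPA per play allowed when the team was on defense; lower is better)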
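As a minimal sketch of how the two columns are read, the base R code below re-enters the ten preview rows shown above (rather than downloading the file) and picks out the best offense and the best defense. It assumes the usual sign convention, where a higher off_epa and a lower def_epa are better. The net_epa column (off_epa minus def_epa) is a hypothetical summary added here for illustration, not a variable in the data. With the full data, the read_csv() call above would replace the hand-entered rows.

```r
# Preview rows of nfl_team_epa (2014 season), re-entered by hand
team_epa <- data.frame(
  season  = rep(2014, 10),
  team    = c("ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE", "DAL", "DEN"),
  off_epa = c(-0.0245, 0.0069, 0.0669, -0.0743, -0.0067,
              -0.0498, -0.0176, -0.0710, 0.1174, 0.0851),
  def_epa = c(-0.0501, 0.0748, -0.0321, -0.1024, 0.0045,
              0.0789, -0.0107, -0.0350, 0.0030, -0.0569)
)

# Assumed convention: higher off_epa = better offense, lower def_epa = better defense
best_offense <- team_epa$team[which.max(team_epa$off_epa)]
best_defense <- team_epa$team[which.min(team_epa$def_epa)]

# Hypothetical net measure: larger values mean a team is better on both sides
team_epa$net_epa <- team_epa$off_epa - team_epa$def_epa
team_epa <- team_epa[order(-team_epa$net_epa), ]

print(c(best_offense = best_offense, best_defense = best_defense))
print(head(team_epa[, c("team", "off_epa", "def_epa", "net_epa")], 3))
```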

NFL Field Goals (nfl_field_goals)

  • nfl_field_goals: Play-by-play statistics for plays on which a field goal was attempted
    • A data frame with 10,047 observations of 23 variables.
nfl_field_goals <- read_csv("http://bcdanl.github.io/data/nfl_field_goals.csv")
game_date   time      down   yrdln    ydstogo  yardline_100  fg_distance  posteam  defteam  field_goal_result
<date>      <time>    <dbl>  <chr>    <dbl>    <dbl>         <dbl>        <chr>    <chr>    <chr>
2014-09-07  08:12:00  4      CHI 32   6        32            49           BUF      CHI      made
2014-09-07  09:39:00  4      BUF 23   11       23            40           CHI      BUF      made
2014-09-07  04:07:00  4      CHI 15   5        15            32           BUF      CHI      made
2014-09-07  00:35:00  4      BUF 19   1        19            36           CHI      BUF      made
2014-09-07  09:51:00  2      CHI 9    9        9             26           BUF      CHI      made
2014-09-07  00:05:00  3      TB 10    9        10            27           CAR      TB       made
2014-09-07  09:45:00  4      TB 30    17       30            47           CAR      TB       missed
2014-09-07  00:28:00  4      TB 15    2        15            32           CAR      TB       made
2014-09-07  10:30:00  4      BAL 31   2        31            48           CIN      BAL      made
2014-09-07  01:28:00  4      BAL 4    4        4             21           CIN      BAL      made



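  • fg_distance is the total distance of a field goal attempt, equal to yardline_100 + 17, where yardline_100 is the distance in yards from the line of scrimmage to the opponent’s goal line:
    • The extra 17 yards come from the 10-yard end zone (the goalposts stand on the end line) plus 7 yards from the line of scrimmage back to the holder’s spot.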
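A quick consistency check, assuming fg_distance equals yardline_100 plus the 17 yards described above: the ten preview rows are re-entered by hand in base R, and a hypothetical implied_distance column is compared with fg_distance. The same check, along with the make rate, could be run on the full nfl_field_goals data frame.

```r
# Preview rows of nfl_field_goals (subset of columns), re-entered by hand
fg <- data.frame(
  posteam      = c("BUF", "CHI", "BUF", "CHI", "BUF", "CAR", "CAR", "CAR", "CIN", "CIN"),
  yardline_100 = c(32, 23, 15, 19, 9, 10, 30, 15, 31, 4),
  fg_distance  = c(49, 40, 32, 36, 26, 27, 47, 32, 48, 21),
  field_goal_result = c("made", "made", "made", "made", "made",
                        "made", "missed", "made", "made", "made")
)

# 17 extra yards = 10-yard end zone + 7 yards back to the holder's spot
fg$implied_distance <- fg$yardline_100 + 17
all_match <- all(fg$implied_distance == fg$fg_distance)

# Share of attempts made in these rows
make_rate <- mean(fg$field_goal_result == "made")

print(all_match)
print(make_rate)
```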


NFL Passers’ EPA and CPOE (nfl_passers)

  • nfl_passers: Weekly mean EPA and completion percentage over expected (CPOE) for players who passed more than 44 times within a week
    • A data frame with 1,098 rows and 22 variables.
nfl_passers <- read_csv("http://bcdanl.github.io/data/nfl_passers.csv")
season  week   passer       epa       cpoe      n_passes  team   position  jersey_number  full_name
<dbl>   <dbl>  <chr>        <dbl>     <dbl>     <dbl>     <chr>  <chr>     <dbl>          <chr>
2014    1      A.Luck        0.1148    1.1793   56        IND    QB        12             Andrew Luck
2014    1      C.Henne      -0.2598   -6.7311   46        JAX    QB        7              Chad Henne
2014    1      J.Cutler     -0.1325    2.8643   51        CHI    QB        6              Jay Cutler
2014    1      J.Flacco     -0.0749  -11.8895   65        BLT    QB        5              Joe Flacco
2014    1      N.Foles      -0.2225   -1.0866   49        PHI    QB        9              Nick Foles
2014    1      T.Brady      -0.2110   -8.6263   60        NE     QB        12             Tom Brady
2014    2      A.Rodgers     0.2271    2.7334   47        GB     QB        12             Aaron Rodgers
2014    2      A.Smith       0.2550    0.2252   44        KC     QB        11             Alex Smith
2014    2      M.Ryan       -0.3360   -8.0864   46        ATL    QB        2              Matt Ryan
2014    2      M.Stafford   -0.0220   -6.2701   52        DET    QB        9              Matthew Stafford
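Because epa in nfl_passers is a weekly mean, a passer who throws often and one who throws less can have similar values. The sketch below assumes that multiplying epa by n_passes approximates the passer’s total EPA for that week; total_epa is a hypothetical derived column, and the rows are the preview rows shown above, re-entered in base R.

```r
# Preview rows of nfl_passers (2014, weeks 1-2), re-entered by hand
passers <- data.frame(
  week     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  passer   = c("A.Luck", "C.Henne", "J.Cutler", "J.Flacco", "N.Foles",
               "T.Brady", "A.Rodgers", "A.Smith", "M.Ryan", "M.Stafford"),
  epa      = c(0.1148, -0.2598, -0.1325, -0.0749, -0.2225,
               -0.2110, 0.2271, 0.2550, -0.3360, -0.0220),
  n_passes = c(56, 46, 51, 65, 49, 60, 47, 44, 46, 52)
)

# Assumption: epa is the mean EPA per pass that week, so
# mean EPA x number of passes approximates the week's total passing EPA
passers$total_epa <- passers$epa * passers$n_passes

# Rank passer-weeks by total EPA, largest first
ranked <- passers[order(-passers$total_epa),
                  c("week", "passer", "epa", "n_passes", "total_epa")]
print(head(ranked, 3))
```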


NFL Player Statistics (nfl_players_stat)

  • nfl_players_stat: Player statistics
nfl_players_stat <- read_csv("http://bcdanl.github.io/data/nfl_players_stat.csv")
season  player_id   player_name  recent_team  yards  rushing_yards  receiving_yards  touches  carries  receptions
<dbl>   <chr>       <chr>        <chr>        <dbl>  <dbl>          <dbl>            <dbl>    <dbl>    <dbl>
2014    00-0028009  D.Murray     DAL          2261   1845           416              449      392      57
2014    00-0030496  L.Bell       PIT          2215   1361           854              373      290      83
2014    00-0026184  M.Forte      CHI          1846   1038           808              368      266      102
2014    00-0027793  A.Brown      PIT          1711   13             1698             133      4        129
2014    00-0025399  M.Lynch      SEA          1673   1306           367              317      280      37
2014    00-0027874  D.Thomas     DEN          1619   0              1619             111      0        111
2014    00-0027944  J.Jones      ATL          1594   1              1593             105      1        104
2014    00-0026796  NA           HOU          1573   1246           327              298      260      38
2014    00-0030485  E.Lacy       GB           1566   1139           427              288      246      42
2014    00-0026373  NA           BAL          1529   1266           263              279      235      44


yards: rushing_yards + receiving_yards

rushing_yards: Yards gained when rushing with the ball (incl. scrambles and kneel downs). Also includes yards gained after obtaining a lateral on a play that started with a rushing attempt.

receiving_yards: Yards gained after a pass reception. Includes yards gained after receiving a lateral on a play that started as a pass play.

touches: carries + receptions

carries: The number of official rush attempts (incl. scrambles and kneel downs). Rushes after a lateral reception do not count as a carry.

receptions: The number of pass receptions. Lateral receptions do not officially count as receptions.

tds: rushing_tds + receiving_tds

rushing_tds: The number of rushing touchdowns (incl. scrambles). Also includes touchdowns after obtaining a lateral on a play that started with a rushing attempt.

receiving_tds: The number of touchdowns following a pass reception. Also includes touchdowns after receiving a lateral on a play that started as a pass play.
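The identities for yards and touches can be checked directly. This base R sketch re-enters the preview rows shown above and verifies both sums; it also counts missing player_name values (two in the preview), which is one reason to group or join on player_id rather than player_name.

```r
# Preview rows of nfl_players_stat (2014), re-entered by hand
players <- data.frame(
  player_id       = c("00-0028009", "00-0030496", "00-0026184", "00-0027793", "00-0025399",
                      "00-0027874", "00-0027944", "00-0026796", "00-0030485", "00-0026373"),
  player_name     = c("D.Murray", "L.Bell", "M.Forte", "A.Brown", "M.Lynch",
                      "D.Thomas", "J.Jones", NA, "E.Lacy", NA),
  yards           = c(2261, 2215, 1846, 1711, 1673, 1619, 1594, 1573, 1566, 1529),
  rushing_yards   = c(1845, 1361, 1038, 13, 1306, 0, 1, 1246, 1139, 1266),
  receiving_yards = c(416, 854, 808, 1698, 367, 1619, 1593, 327, 427, 263),
  touches         = c(449, 373, 368, 133, 317, 111, 105, 298, 288, 279),
  carries         = c(392, 290, 266, 4, 280, 0, 1, 260, 246, 235),
  receptions      = c(57, 83, 102, 129, 37, 111, 104, 38, 42, 44)
)

# Documented identities:
#   yards   = rushing_yards + receiving_yards
#   touches = carries + receptions
yards_ok   <- all(players$yards   == players$rushing_yards + players$receiving_yards)
touches_ok <- all(players$touches == players$carries + players$receptions)

# Rows whose player_name is missing (player_id is still present)
n_missing_names <- sum(is.na(players$player_name))

print(c(yards_ok = yards_ok, touches_ok = touches_ok))
print(n_missing_names)
```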

NFL Receivers (nfl_receivers)

  • nfl_receivers: Total EPA of the top 10 players, ranked by total EPA, among players whose position is “WR”, “TE”, or “RB”
nfl_receivers <- read_csv("http://bcdanl.github.io/data/nfl_receivers.csv")
season  receiver      position  epa_rank  epa_rank_within_position  n_received  tot_epa
<dbl>   <chr>         <chr>     <dbl>     <dbl>                     <dbl>       <dbl>
2014    J.Nelson      WR        1         1                         182         109.155604
2014    A.Brown       WR        2         2                         208          98.263152
2014    R.Cobb        WR        3         3                         153          91.941996
2014    E.Sanders     WR        4         4                         169          81.666310
2014    D.Bryant      WR        5         5                         156          81.495877
2014    J.Edelman     WR        6         6                         184          75.916749
2014    O.Beckham     WR        7         7                         141          66.936396
2014    J.Jones       WR        8         8                         171          64.560002
2014    R.Gronkowski  TE        9         1                         172          63.931129
2014    T.Hilton      WR        10        9                         166          63.020969


epa_rank: Ranking by tot_epa (1 = highest tot_epa)

epa_rank_within_position: Ranking by tot_epa among players of the same position

n_received: Number of passes the player received

tot_epa: Total EPA within a season
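Both rank columns can be reproduced from tot_epa, assuming rank 1 goes to the largest total. The sketch below applies base R’s rank() to the negated totals, and ave() to apply the same ranking within each position; rank_check and rank_within_check are hypothetical helper columns, and the rows are the preview rows shown above.

```r
# Preview rows of nfl_receivers (2014), re-entered by hand
receivers <- data.frame(
  receiver = c("J.Nelson", "A.Brown", "R.Cobb", "E.Sanders", "D.Bryant",
               "J.Edelman", "O.Beckham", "J.Jones", "R.Gronkowski", "T.Hilton"),
  position = c(rep("WR", 8), "TE", "WR"),
  epa_rank = 1:10,
  epa_rank_within_position = c(1:8, 1, 9),
  tot_epa  = c(109.155604, 98.263152, 91.941996, 81.666310, 81.495877,
               75.916749, 66.936396, 64.560002, 63.931129, 63.020969)
)

# Rank 1 = highest tot_epa, so rank the negated totals
receivers$rank_check <- rank(-receivers$tot_epa)

# Same ranking, computed separately within each position group
receivers$rank_within_check <- ave(-receivers$tot_epa, receivers$position, FUN = rank)

print(receivers[, c("receiver", "position", "epa_rank", "rank_check",
                    "epa_rank_within_position", "rank_within_check")])
```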


Football Metrics

Expected Points

Expected Points Added (EPA) is a football analytics metric that measures the value of a play in terms of its impact on the team’s expected scoring. It quantifies how much a single play increases or decreases the number of points a team can expect to score, relative to the situation before the play.

How EPA Works

Every play in football occurs within a specific context (down, distance, field position, time remaining, and score). Historical data are used to estimate the expected points (EP) for a given situation: the number of points a team can expect to score from it, based on the outcomes of similar past situations. EPA is the difference between the expected points after the play and before the play.

Formula:

$$\text{EPA} = \text{EP}(\text{after the play}) - \text{EP}(\text{before the play})$$

Key Insights:

  1. Positive EPA: The play improved the team’s scoring chances.
    • Example: A 20-yard pass on 3rd and 8 increases the likelihood of scoring.
  2. Negative EPA: The play reduced the team’s scoring chances.
    • Example: A sack or an interception harms the team’s scoring potential.
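A minimal sketch of the EPA formula and the sign rules above. The expected-points values are hypothetical numbers chosen for illustration, not output from an actual EP model, and epa() and epa_sign() are helper functions defined only for this example.

```r
# EPA = EP(after the play) - EP(before the play)
epa <- function(ep_before, ep_after) {
  ep_after - ep_before
}

# Describe the direction of a play's EPA
epa_sign <- function(x) {
  ifelse(x > 0, "positive: improved scoring outlook",
         ifelse(x < 0, "negative: hurt scoring outlook", "neutral"))
}

# Two hypothetical plays starting from the same situation
plays <- data.frame(
  play      = c("20-yard pass on 3rd and 8", "sack"),
  ep_before = c(1.5, 1.5),  # hypothetical EP before the snap
  ep_after  = c(2.9, 0.8)   # hypothetical EP after the play
)
plays$epa   <- epa(plays$ep_before, plays$ep_after)
plays$label <- epa_sign(plays$epa)
print(plays)
```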

Why EPA Is Important

  • Contextual: Accounts for situational factors, making it more informative than raw stats like yards gained.
  • Play Evaluation: Helps determine the effectiveness of specific plays or players.
  • Strategic Decisions: Assists coaches and analysts in evaluating decisions like when to go for it on 4th down.

Applications

  • Offensive EPA: Evaluates how well a team’s offense increases scoring opportunities.
  • Defensive EPA: Measures how effectively a defense reduces the opposing team’s scoring potential.
  • Player Performance: Used to assess quarterbacks, running backs, wide receivers, and defenders by their contribution to scoring or preventing points.

Completion Percentage Over Expected (CPOE)

Completion Percentage Over Expected (CPOE) is a football analytics metric that evaluates a quarterback’s passing performance by comparing their actual completion percentage to the expected completion percentage based on the difficulty of their pass attempts.

How CPOE Works

  1. Actual Completion Percentage (COMP%):
    • The percentage of passes a quarterback completes.
  2. Expected Completion Percentage (xCOMP%):
    • Calculated from factors such as:
      • Distance of the throw (air yards)
      • Angle of the throw
      • Receiver separation
      • Defensive pressure
      • Game situation (e.g., down, distance, and field position)
    • Derived from historical data on similar passes.
  3. CPOE formula: $\text{CPOE} = \text{COMP\%} - \text{xCOMP\%}$
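A minimal sketch of the CPOE calculation for one quarterback. The completion outcomes and modeled completion probabilities are hypothetical; in practice the probabilities would come from a model fit to historical passes. The result is in percentage points, which appears to be the scale of the cpoe column in nfl_passers (values such as -11.8895), though that is an inference from the preview rather than a documented fact.

```r
# complete: 1 if the pass was completed, 0 otherwise (hypothetical passes)
complete <- c(1, 1, 0, 1, 0, 1, 1, 0)
# cp: modeled completion probability of each pass (hypothetical values)
cp <- c(0.80, 0.65, 0.40, 0.70, 0.55, 0.90, 0.60, 0.50)

comp_pct  <- 100 * mean(complete)  # actual completion percentage (COMP%)
xcomp_pct <- 100 * mean(cp)        # expected completion percentage (xCOMP%)
cpoe      <- comp_pct - xcomp_pct  # CPOE in percentage points

print(c(COMP = comp_pct, xCOMP = xcomp_pct, CPOE = cpoe))
```

Here the quarterback completes 5 of 8 passes (62.5%) against an expected 63.75%, so CPOE is slightly negative.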

Interpretation of CPOE

  • Positive CPOE: Indicates the quarterback is completing more passes than expected, showcasing accuracy and skill.
  • Negative CPOE: Indicates the quarterback is completing fewer passes than expected, potentially highlighting issues with accuracy or decision-making.

Why CPOE Is Useful

  • Isolates Skill: It accounts for the difficulty of throws, focusing on the quarterback’s performance rather than the system or play design.
  • Complementary Metric: Often paired with EPA/play to provide a comprehensive evaluation of a quarterback’s impact.
  • Game Context: Helps differentiate between quarterbacks who excel in challenging situations versus those whose stats are inflated by easy throws.
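One simple way to pair the two metrics is to place each passer-week in a quadrant by the signs of epa and cpoe. The sketch below uses the nfl_passers preview rows shown earlier, re-entered in base R; the zero cut-off and the quadrant labels are illustrative choices, not part of the data.

```r
# Preview rows of nfl_passers (epa and cpoe only), re-entered by hand
passers <- data.frame(
  passer = c("A.Luck", "C.Henne", "J.Cutler", "J.Flacco", "N.Foles",
             "T.Brady", "A.Rodgers", "A.Smith", "M.Ryan", "M.Stafford"),
  epa  = c(0.1148, -0.2598, -0.1325, -0.0749, -0.2225,
           -0.2110, 0.2271, 0.2550, -0.3360, -0.0220),
  cpoe = c(1.1793, -6.7311, 2.8643, -11.8895, -1.0866,
           -8.6263, 2.7334, 0.2252, -8.0864, -6.2701)
)

# Quadrant label: is each metric above or below zero?
passers$quadrant <- paste(
  ifelse(passers$epa  > 0, "EPA+", "EPA-"),
  ifelse(passers$cpoe > 0, "CPOE+", "CPOE-")
)

print(table(passers$quadrant))
# Passer-weeks that were positive on both metrics
print(passers$passer[passers$quadrant == "EPA+ CPOE+"])
```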