Data Storytelling Team Project - Hockey

Author
Affiliation

Byeong-Hak Choe

SUNY Geneseo

Published

November 21, 2024

Data

The following lists data frames about NHL for the 2022-23 season:

  • nhl_players: NHL Player Statistics

    • 924 unique NHL Players
    • 328 NHL players’ salary information are missing.
  • nhl_games: NHL Game Logs

  • Data Source: MoneyPuck

    • There is a great website that makes a ton of hockey data available for download called MoneyPuck.
    • Almost all of their data are available as a .csv or a .zip file.
    • You can download this data using the links given on their download page. For example, here is how you could get all of the skater (i.e., nongoalie) information for the 2022–23 season:
  • Salary Data Source: PuckPedia

NHL Players (nhl_players)

  • nhl_players: NHL non-goalie player data
    • A data frame with 4620 observations and 158 variables:
nhl_players <- read_csv("http://bcdanl.github.io/data/nhl_players.csv")


NHL Game Logs (nhl_games)

  • nhl_games: NHL Game Logs
    • A data frame with 14000 observations and 110 variables:
nhl_games <- read_csv("http://bcdanl.github.io/data/nhl_games.csv")



Hockey Metrics

Corsi and Fenwick Metrics

Corsi and Fenwick are two of the earliest advanced analytics used to measure player effectiveness in hockey. For over a century, traditional metrics such as goals, assists, and penalty minutes were the main tools for evaluating players. The plus-minus rating, which tracks the number of goals scored by a player’s team minus the goals allowed while they are on the ice, added some depth. However, this metric has a significant flaw: it often reflects the quality of a player’s teammates. For instance, a lower-skilled player might have a great plus-minus rating simply by playing alongside scoring superstars like Auston Matthews or Sidney Crosby.

Corsi and Fenwick provide a more nuanced measure of effectiveness. Corsi calculates the difference in shot attempts (shots on goal, missed shots, and blocked shots) by a player’s team versus their opponents while the player is on the ice. Fenwick is similar but excludes blocked shots. Both metrics are typically applied to specific game situations, such as:

  • 5-on-5 (Even Strength): The most common game state, where each team has five skaters on the ice, excluding goalies. Corsi and Fenwick in this situation are considered the most reliable indicators of player performance since it minimizes the influence of special teams.
  • 5-on-4 (Power Play): A situation where the player’s team has a one-man advantage due to an opponent’s penalty. Metrics in this scenario provide insight into how effectively a player contributes to creating offensive opportunities.
  • 4-on-5 (Penalty Kill): A situation where the player’s team is at a one-man disadvantage. Here, Corsi and Fenwick can help measure a player’s ability to limit opponent shot attempts and maintain defensive stability.
  • Other Situations: Includes rarer scenarios like 4-on-4, 3-on-3 overtime, or 6-on-5 when a team pulls its goalie for an extra attacker. Metrics in these cases are less common but still useful for specific analyses.
  • All Situations: Aggregates data from every possible game state, offering a broader view of a player’s performance but often less precise for evaluating specific roles or responsibilities.

While these metrics are often expressed as raw differences, they can also be analyzed as percentages. For instance, we can calculate a player’s Corsi percentage by dividing the Corsi “for” (shot attempts by the player’s team) by the sum of the Corsi “for” and Corsi “against” (shot attempts by the opposing team). This provides a clearer picture of a player’s overall effectiveness on the ice.

Time on Ice by Position

It is rare for skaters to play even half of a game, unlike goalkeepers, who generally play the entire match. For this analysis, we will focus on skaters and exclude goalkeepers. Managing a team effectively requires ensuring players are not overly fatigued, especially during the final minutes of a game or during stretches with multiple games and minimal rest. To address this, we use the “time on ice” (TOI) metric. TOI varies by position. NHL teams typically have four offensive lines (left wing, center, right wing) but only three defensive pairs. As a result, defensemen tend to play more minutes both in individual games and throughout the season. Special situations such as power plays or penalty kills can also affect TOI distribution. For instance:

  • Power Plays (5-on-4): Offensive players on the power play units often have increased TOI in these scenarios to maximize scoring opportunities.
  • Penalty Kills (4-on-5): Defensive players and penalty kill specialists tend to play more minutes during penalties to mitigate scoring threats.
  • Overtime (3-on-3): In regular-season overtime, top players are typically deployed for longer shifts due to the fast-paced and open nature of play.

Understanding TOI in these different contexts helps coaches optimize player utilization and prevent fatigue over the course of a game or season.

General Terms Description

  • Expected Goals
    • The sum of the probabilities of unblocked shot attempts being goals.
    • For example, a rebound shot in the slot may be worth 0.5 expected goals, while a shot from the blueline while short handed may be worth 0.01 expected goals.
    • The expected value of each shot attempt is calculated by the MoneyPuck Expected Goals model.
    • Expected goals is commonly abbreviated as “xGoals”. Blocked shot attempts are valued at 0 xGoals.
    • See more here: http://moneypuck.com/about.htm#shotModel
  • Score Adjusted
    • Adjusts metrics to gives more credit to away teams and teams with large leads.
  • Flurry Adjusted
  • I_F
    • “Individual For”.
    • For stats credited to an individual.
    • For example, I_F_goals is the number of goals a player has scored
  • OnIce_F
    • “On-ice For”.
    • Every player on the ice on the team doing the event receives credit.
    • OnIce_F_goals is the number of goals the player’s team has scored while that player is on the ice, regardless of if they were the one who scored the goal or not.
  • OnIce_A
    • “On-ice Against”.
    • Every opposing team’s players on the ice on receives credit. OnIce_A_goals is the number of goals the player’s team has given up while the player is on the ice
  • OffIce_F
    • “Off-ice For”.
    • Every player on the bench of the team doing the event receives “credit”.
    • OffIce_F_goals is the number of goals the player’s team has scored while that player is on the bench.
  • OffIce_A
    • “Off-ice Against”.
    • Every player on the opposing team’s bench of the team doing the event receives “credit”.
    • OffIce_A_goals is the number of goals the player’s team has given up while that player is on the bench.
  • Low Danger Shots
    • Unblocked Shot attempts with a < 8% probability of being a goal. Low danger shots accounts for ~75% of shots and ~33% of goals
  • Medium Danger Shots
    • Unblocked Shot attempts with between >=8% but less than 20% probability of being a goal.
    • Medium danger shots account for ~20% of shots and ~33% of goals
  • High Danger Shots
    • Unblocked Shot attempts with >= 20% probability of being a goal.
    • High danger shots account for ~5% of shots and ~33% of goals
  • Created Expected Goals
Back to top