Classwork 9
Tree-based Models
Setup
Packages
library(tidyverse)
library(janitor)
library(rpart)
library(rpart.plot)
library(ranger)
# library(xgboost)
library(vip)
library(pdp)
library(ggthemes)
library(rmarkdown)
I downloaded the 2024 MLB batting statistics leaderboard from FanGraphs and created the following mlb_battings_2024 data.frame:
mlb_battings_2024 <- read_csv("https://bcdanl.github.io/data/mlb_battings_2024.csv")
Variable Description
| Variable | Description |
|---|---|
| g | Games Played: The number of games in which the player appeared. |
| pa | Plate Appearances: Total number of times the player appeared at the plate. |
| hr | Home Runs: Total number of home runs hit by the player. |
| r | Runs: Total number of runs scored by the player. |
| rbi | Runs Batted In (RBI): Number of runs the player batted in. |
| sb | Stolen Bases: Total number of bases stolen by the player. |
| bb_percent | Walk Percentage: The percentage of plate appearances that result in a base on balls. |
| k_percent | Strikeout Percentage: The percentage of plate appearances that end in a strikeout. |
| iso | Isolated Power (ISO): A measure of a player’s raw power, calculated as (SLG - AVG). |
| babip | Batting Average on Balls In Play (BABIP): The average when excluding home runs and strikeouts. |
| avg | Batting Average (AVG): The ratio of hits to official at-bats. |
| obp | On-Base Percentage (OBP): The frequency a player reaches base per plate appearance. |
| slg | Slugging Percentage (SLG): A weighted measure of total bases per at-bat. |
| w_oba | Weighted On-Base Average (wOBA): An advanced metric that measures a player’s overall offensive value. |
| xw_oba | Expected wOBA (xwOBA): A metric estimating wOBA based on the quality of contact. |
| w_rc | Weighted Runs Created (wRC): An advanced statistic that estimates the number of runs a player creates. |
| bs_r | Base Running Runs (BsR): A metric quantifying the value of a player’s base running. |
| off | Offensive Value: A composite metric or rating summarizing the player’s offensive contributions. |
| def | Defensive Value: A composite metric or rating summarizing the player’s defensive contributions. |
| war | Wins Above Replacement (WAR): An overall measure of a player’s total contributions to their team. |
- Consider the tree-based models: (1) decision trees, (2) random forest, and (3) gradient-boosted trees.
  - Outcome Variable: war
  - Predictors: All remaining variables
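Question 4 refers to an `x_train` object, so a train/test split is implied before any model is fit. The following is a minimal sketch of one way to set that up, assuming an 80/20 random split. The split ratio, the seed, and the names `toy`, `dtrain`, `dtest`, `x_test`, `y_train`, and `y_test` are assumptions for illustration and do not come from the assignment. The data frame `toy` is simulated and only stands in for mlb_battings_2024.

```r
# Simulated stand-in for mlb_battings_2024 (NOT real player data).
# Replace `toy` with the real data frame once it is loaded.
set.seed(2024)
n <- 200
toy <- data.frame(
  hr  = rpois(n, 15),
  obp = rnorm(n, mean = 0.32, sd = 0.03),
  def = rnorm(n, mean = 0, sd = 5)
)
toy$war <- 0.1 * toy$hr + 20 * (toy$obp - 0.32) + 0.1 * toy$def + rnorm(n, sd = 0.5)

# Assumed 80/20 random split into training and test rows
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
dtrain <- toy[train_idx, ]
dtest  <- toy[-train_idx, ]

# war is the outcome; every remaining column is a predictor
predictors <- setdiff(names(toy), "war")
x_train <- as.matrix(dtrain[, predictors])
y_train <- dtrain$war
x_test  <- as.matrix(dtest[, predictors])
y_test  <- dtest$war
```

Keeping the test rows out of every fitting and tuning step is what makes the test MSE comparison in Question 6 meaningful.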
Question 1
- Fit a regression tree model with a maximum depth of 3 (max_depth=3).
- Provide an interpretation of the leaf nodes.
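As a starting point, here is a hedged sketch of a depth-3 regression tree with rpart, which the Packages section loads. Note that rpart spells the depth limit `maxdepth` (inside `rpart.control()`), not `max_depth`. The simulated `toy` data frame is an assumption standing in for mlb_battings_2024, and the object names are illustrative only.

```r
library(rpart)

# Simulated stand-in for mlb_battings_2024 (NOT real player data)
set.seed(2024)
n <- 200
toy <- data.frame(
  hr  = rpois(n, 15),
  obp = rnorm(n, mean = 0.32, sd = 0.03),
  def = rnorm(n, mean = 0, sd = 5)
)
toy$war <- 0.1 * toy$hr + 20 * (toy$obp - 0.32) + 0.1 * toy$def + rnorm(n, sd = 0.5)

# method = "anova" fits a regression tree; the root node counts as depth 0,
# so maxdepth = 3 allows at most 2^3 = 8 leaves
tree_d3 <- rpart(war ~ ., data = toy, method = "anova",
                 control = rpart.control(maxdepth = 3))

# Leaves are the rows of tree_d3$frame with var == "<leaf>":
#   n    = number of training observations that fall in the leaf
#   yval = their mean war, which is the leaf's prediction
leaves <- tree_d3$frame[tree_d3$frame$var == "<leaf>", c("n", "yval")]
print(leaves)

# rpart.plot::rpart.plot(tree_d3) draws the tree if rpart.plot is installed
```

Each leaf can then be read as a rule: the sequence of splits leading to it describes a group of players, and `yval` is the predicted WAR for anyone in that group.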
Question 2
- Fit a regression tree model without imposing a maximum depth constraint.
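One way to read "without a maximum depth constraint" in rpart is to drop the complexity penalty with `cp = 0` and leave `maxdepth` at its default of 30, which is rarely binding. This is a sketch under that assumption, again on a simulated `toy` data frame rather than the real data.

```r
library(rpart)

# Simulated stand-in for mlb_battings_2024 (NOT real player data)
set.seed(2024)
n <- 200
toy <- data.frame(
  hr  = rpois(n, 15),
  obp = rnorm(n, mean = 0.32, sd = 0.03),
  def = rnorm(n, mean = 0, sd = 5)
)
toy$war <- 0.1 * toy$hr + 20 * (toy$obp - 0.32) + 0.1 * toy$def + rnorm(n, sd = 0.5)

# cp = 0 keeps every split regardless of how little it improves the fit;
# minsplit and minbucket (left at rpart's defaults) still stop the tree
# from splitting very small nodes
tree_full <- rpart(war ~ ., data = toy, method = "anova",
                   control = rpart.control(cp = 0))

# Count terminal nodes to see how large the unconstrained tree is
n_leaves <- sum(tree_full$frame$var == "<leaf>")
n_leaves
```

A tree grown this way usually overfits the training data, which motivates the pruning step in Question 3.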
Question 3
- Prune regression trees using cross-validation (CV).
- Plot the CV error versus the number of leaves.
- Plot the pruned tree with the lowest mean CV MSE.
- Compare the pruned tree with the tree from Question 1.
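The pruning steps above can be sketched with rpart's built-in cross-validation, which stores one row per candidate subtree in `cptable`. This is an illustration under assumptions: 10-fold CV, a simulated `toy` data frame in place of mlb_battings_2024, and hypothetical object names. The `xerror` column is the CV error divided by the root-node error, so the subtree that minimizes it also minimizes the CV MSE.

```r
library(rpart)

# Simulated stand-in for mlb_battings_2024 (NOT real player data)
set.seed(2024)
n <- 200
toy <- data.frame(
  hr  = rpois(n, 15),
  obp = rnorm(n, mean = 0.32, sd = 0.03),
  def = rnorm(n, mean = 0, sd = 5)
)
toy$war <- 0.1 * toy$hr + 20 * (toy$obp - 0.32) + 0.1 * toy$def + rnorm(n, sd = 0.5)

# Grow the full tree; xval = 10 requests 10-fold CV (folds are random, hence the seed)
set.seed(1)
tree_full <- rpart(war ~ ., data = toy, method = "anova",
                   control = rpart.control(cp = 0, xval = 10))

# One row per candidate subtree: CP, nsplit, rel error, xerror, xstd
cp_tab <- as.data.frame(tree_full$cptable)
cp_tab$n_leaves <- cp_tab$nsplit + 1  # a binary tree with k splits has k + 1 leaves

# CV error versus number of leaves
plot(cp_tab$n_leaves, cp_tab$xerror, type = "b",
     xlab = "Number of leaves", ylab = "Relative CV error")

# Prune back to the subtree with the lowest CV error
best_cp <- cp_tab$CP[which.min(cp_tab$xerror)]
tree_pruned <- prune(tree_full, cp = best_cp)
```

`plotcp(tree_full)` produces a similar diagnostic plot directly, and `rpart.plot::rpart.plot(tree_pruned)` draws the pruned tree for comparison with the depth-3 tree from Question 1.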
Question 4
- Fit and tune the random forest model with the following mtry values.
p <- ncol(x_train)
rf_grid <- tibble(mtry = seq(2, p, by = 2))
- Plot the variable importance measures.
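The assignment does not say how the `mtry` grid should be evaluated. The sketch below swaps in one simple option: comparing out-of-bag (OOB) MSE from ranger across the grid, rather than running a separate cross-validation. That choice, the 500 trees, the seeds, and the simulated `toy` data frame (standing in for mlb_battings_2024) are all assumptions. A plain `data.frame` is used for the grid so the block runs without the tidyverse.

```r
library(ranger)

# Simulated stand-in for mlb_battings_2024 (NOT real player data)
set.seed(2024)
n <- 200
toy <- data.frame(
  hr   = rpois(n, 15),
  sb   = rpois(n, 8),
  obp  = rnorm(n, mean = 0.32, sd = 0.03),
  slg  = rnorm(n, mean = 0.41, sd = 0.05),
  def  = rnorm(n, mean = 0, sd = 5),
  bs_r = rnorm(n, mean = 0, sd = 2)
)
toy$war <- 0.1 * toy$hr + 20 * (toy$obp - 0.32) + 0.1 * toy$def + rnorm(n, sd = 0.5)

# Same grid shape as in the assignment: mtry = 2, 4, ..., up to p
p <- ncol(toy) - 1
rf_grid <- data.frame(mtry = seq(2, p, by = 2))

# For regression forests, ranger reports the OOB mean squared error in $prediction.error
rf_grid$oob_mse <- sapply(rf_grid$mtry, function(m) {
  ranger(war ~ ., data = toy, mtry = m, num.trees = 500, seed = 1)$prediction.error
})
best_mtry <- rf_grid$mtry[which.min(rf_grid$oob_mse)]

# Refit at the selected mtry and request permutation importance
rf_best <- ranger(war ~ ., data = toy, mtry = best_mtry, num.trees = 500,
                  importance = "permutation", seed = 1)
imp <- sort(rf_best$variable.importance, decreasing = TRUE)

# Base-graphics importance plot, most important variable on top
barplot(rev(imp), horiz = TRUE, las = 1, xlab = "Permutation importance")
```

OOB error avoids refitting the forest once per fold, which keeps tuning cheap. If vip is installed, `vip::vip(rf_best)` gives a ggplot version of the same importance plot.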
Question 5
- Fit and tune the XGBoost model using the following hyperparameter grid:
xgb_grid <- tidyr::crossing(
nrounds = seq(20, 200, by = 20),
eta = c(0.025, 0.05, 0.1, 0.3),
max_depth = c(1, 2, 3, 4),
gamma = 0,
colsample_bytree = 1,
min_child_weight = 1,
subsample = 1
)
xgb_grid |>
  paged_table()
- Plot the variable importance measures.
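The grid's column names match the tuning parameters that caret uses for its `xgbTree` method, but the assignment does not name a tuning tool. The sketch below uses `xgb.cv()` from the xgboost package directly instead, which is an assumption. It also shrinks the grid so that it runs quickly, uses 5-fold CV, and works on a simulated `toy` data frame in place of mlb_battings_2024. One CV run per (`eta`, `max_depth`) pair covers every `nrounds` value at once, because the evaluation log records the CV error after each boosting round.

```r
library(xgboost)

# Simulated stand-in for mlb_battings_2024 (NOT real player data)
set.seed(2024)
n <- 200
toy <- data.frame(
  hr  = rpois(n, 15),
  obp = rnorm(n, mean = 0.32, sd = 0.03),
  def = rnorm(n, mean = 0, sd = 5)
)
toy$war <- 0.1 * toy$hr + 20 * (toy$obp - 0.32) + 0.1 * toy$def + rnorm(n, sd = 0.5)

x <- as.matrix(toy[, setdiff(names(toy), "war")])
dall <- xgb.DMatrix(data = x, label = toy$war)

# Reduced grid (assumption, for speed); the assignment's grid has more values
small_grid  <- expand.grid(eta = c(0.05, 0.3), max_depth = c(1, 3))
checkpoints <- seq(20, 100, by = 20)  # nrounds values to compare

make_params <- function(eta, max_depth) {
  list(objective = "reg:squarederror", eta = eta, max_depth = max_depth,
       gamma = 0, colsample_bytree = 1, min_child_weight = 1, subsample = 1)
}

results <- do.call(rbind, lapply(seq_len(nrow(small_grid)), function(i) {
  set.seed(1)  # same folds for every grid row
  cv <- xgb.cv(params = make_params(small_grid$eta[i], small_grid$max_depth[i]),
               data = dall, nrounds = max(checkpoints), nfold = 5, verbose = FALSE)
  log <- as.data.frame(cv$evaluation_log)
  # Row k of the log holds the mean test RMSE after k boosting rounds
  data.frame(eta = small_grid$eta[i], max_depth = small_grid$max_depth[i],
             nrounds = checkpoints, cv_rmse = log$test_rmse_mean[checkpoints])
}))

best <- results[which.min(results$cv_rmse), ]

# Refit at the selected combination and extract gain-based importance
fit <- xgb.train(params = make_params(best$eta, best$max_depth),
                 data = dall, nrounds = best$nrounds)
imp <- xgb.importance(model = fit)
```

`xgb.plot.importance(imp)` plots the importance table. Minimizing CV RMSE selects the same combination as minimizing its square, so the ranking is comparable to an MSE-based one.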
Question 6
- Compare the Mean Squared Errors (MSEs) on the test data across the different tree-based models.
- Analyze and discuss the differences in predictive performance among these models.
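For the comparison, the test MSE of each model is the average squared gap between held-out outcomes and that model's predictions. Here is a minimal sketch of a helper that applies the same formula to several prediction vectors. The names `mse`, `model_a`, and `model_b` and all numbers are made up for illustration and are not results from any fitted model.

```r
# Mean squared error between observed and predicted values
mse <- function(y, yhat) mean((y - yhat)^2)

# Made-up held-out outcomes and predictions (illustration only)
y_test <- c(1, 2, 3, 4)
preds <- list(
  model_a = c(1, 2, 3, 5),  # off by 1 on one observation
  model_b = c(2, 3, 4, 5)   # off by 1 on every observation
)

# Apply the same metric to every model's predictions and rank them
test_mse <- sapply(preds, mse, y = y_test)
sort(test_mse)
# model_a = 0.25, model_b = 1
```

In practice each element of `preds` would come from `predict()` on the same test set, so that the models are compared on identical held-out players.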
Discussion
Welcome to our Classwork 9 Discussion Board! 👋
This space is designed for you to engage with your classmates about the material covered in Classwork 9.
Whether you want to dig deeper into the material, share insights, or ask questions, this is the place to do it.
If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 9 materials or need clarification on any points, don’t hesitate to ask here.
All comments will be stored here.
Let’s collaborate and learn from each other!