Classwork 9

Tree-based Models

Author

Byeong-Hak Choe

Published

April 6, 2026

Modified

April 8, 2026

Setup

Packages

library(tidyverse)

library(janitor)
library(rpart)
library(rpart.plot)
library(ranger)
# library(xgboost)
library(vip)
library(pdp)

library(ggthemes)
library(rmarkdown)

I downloaded the 2024 MLB batting statistics leaderboard from FanGraphs and created the following mlb_battings_2024 data.frame:

mlb_battings_2024 <- read_csv("https://bcdanl.github.io/data/mlb_battings_2024.csv")

Variable Description

Variable     Description
g            Games Played: The number of games in which the player appeared.
pa           Plate Appearances: Total number of times the player appeared at the plate.
hr           Home Runs: Total number of home runs hit by the player.
r            Runs: Total number of runs scored by the player.
rbi          Runs Batted In (RBI): Number of runs the player batted in.
sb           Stolen Bases: Total number of bases stolen by the player.
bb_percent   Walk Percentage: The percentage of plate appearances that result in a base on balls.
k_percent    Strikeout Percentage: The percentage of plate appearances that end in a strikeout.
iso          Isolated Power (ISO): A measure of a player’s raw power, calculated as (SLG - AVG).
babip        Batting Average on Balls In Play (BABIP): Batting average on balls put in play, which excludes home runs and strikeouts.
avg          Batting Average (AVG): The ratio of hits to official at-bats.
obp          On-Base Percentage (OBP): The frequency a player reaches base per plate appearance.
slg          Slugging Percentage (SLG): A weighted measure of total bases per at-bat.
w_oba        Weighted On-Base Average (wOBA): An advanced metric that measures a player’s overall offensive value.
xw_oba       Expected wOBA (xwOBA): A metric estimating wOBA based on the quality of contact.
w_rc         Weighted Runs Created (wRC): An advanced statistic that estimates the number of runs a player creates.
bs_r         Base Running Runs (BsR): A metric quantifying the value of a player’s base running.
off          Offensive Value: A composite metric or rating summarizing the player’s offensive contributions.
def          Defensive Value: A composite metric or rating summarizing the player’s defensive contributions.
war          Wins Above Replacement (WAR): An overall measure of a player’s total contributions to their team.


  • Consider the tree-based models—(1) decision trees, (2) random forest, and (3) gradient-boosted trees:
    • Outcome Variable: war
    • Predictors: All remaining variables
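The questions below refer to training and test data (for example, x_train in Question 4), so a split is needed first. The following is a minimal sketch of one way to set that up. It runs on a small simulated data frame that mimics a few of the columns in the variable table, so the values are not real FanGraphs data; the 80/20 ratio, the seed, and the object names other than x_train are assumptions. With the real mlb_battings_2024, any non-numeric identifier columns, if present, would presumably need to be dropped first.

```r
# Simulated stand-in for mlb_battings_2024 (NOT real FanGraphs values);
# only a handful of the columns from the variable table are mimicked.
set.seed(1)
n <- 200
bat <- data.frame(
  hr         = rpois(n, 15),
  sb         = rpois(n, 8),
  bb_percent = runif(n, 0.04, 0.15),
  k_percent  = runif(n, 0.10, 0.35),
  w_oba      = rnorm(n, 0.315, 0.03),
  def        = rnorm(n, 0, 6)
)
bat$war <- 40 * (bat$w_oba - 0.315) + 0.1 * bat$def + rnorm(n, 2, 0.5)

# 80/20 random split (the ratio is an assumption, not part of the assignment)
train_idx <- sample(nrow(bat), size = round(0.8 * nrow(bat)))
train <- bat[train_idx, ]
test  <- bat[-train_idx, ]

# Outcome is war; predictors are all remaining columns
predictors <- setdiff(names(bat), "war")
x_train <- train[, predictors]
y_train <- train$war
x_test  <- test[, predictors]
y_test  <- test$war
```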


Question 1

  • Fit a regression tree model with a maximum depth of 3 (maxdepth = 3 in rpart.control()).
  • Provide an interpretation of the leaf nodes.
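As a rough illustration of the depth-limited fit, the sketch below uses rpart on a simulated training set (the data values and the name tree_d3 are assumptions, not the real leaderboard). Each leaf's yval is the mean of war among the training observations that fall into that leaf, which is what the tree predicts for any new player routed there.

```r
library(rpart)

# Simulated training data standing in for mlb_battings_2024 (NOT real values)
set.seed(1)
n <- 200
train <- data.frame(
  hr    = rpois(n, 15),
  sb    = rpois(n, 8),
  w_oba = rnorm(n, 0.315, 0.03),
  def   = rnorm(n, 0, 6)
)
train$war <- 40 * (train$w_oba - 0.315) + 0.1 * train$def + rnorm(n, 2, 0.5)

# Regression tree ("anova" method) with at most 3 levels of splits
tree_d3 <- rpart(
  war ~ ., data = train, method = "anova",
  control = rpart.control(maxdepth = 3)
)

# Leaf nodes: number of training rows (n) and predicted war (yval) per leaf
leaves <- tree_d3$frame[tree_d3$frame$var == "<leaf>", c("n", "yval")]
print(leaves)
```

Calling rpart.plot(tree_d3) from the rpart.plot package loaded in Setup draws the same tree with the split rules along each path.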


Question 2

  • Fit a regression tree model without imposing a maximum depth constraint.


Question 3

  • Prune regression trees using cross-validation (CV).
  • Plot the CV error versus the number of leaves.
  • Plot the pruned tree with the lowest mean CV MSE.
  • Compare the pruned tree with the tree from Question 1.
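One possible way to approach the pruning steps is through rpart's built-in cross-validation table, sketched below on simulated data (the values, seed, and object names are assumptions). Note that the xerror column in cptable is the CV error relative to the root node's error rather than a raw MSE; multiplying it by the root node's mean squared error converts it to the MSE scale, and the minimizing row is the same either way.

```r
library(rpart)

# Simulated training data standing in for mlb_battings_2024 (NOT real values)
set.seed(1)
n <- 200
train <- data.frame(
  hr    = rpois(n, 15),
  sb    = rpois(n, 8),
  w_oba = rnorm(n, 0.315, 0.03),
  def   = rnorm(n, 0, 6)
)
train$war <- 40 * (train$w_oba - 0.315) + 0.1 * train$def + rnorm(n, 2, 0.5)

# Grow a large tree: cp = 0 removes the complexity penalty, xval = 10 asks
# rpart for 10-fold CV error at each candidate subtree
set.seed(2)  # the CV folds are random
tree_full <- rpart(
  war ~ ., data = train, method = "anova",
  control = rpart.control(cp = 0, xval = 10)
)

cp_tab   <- tree_full$cptable
n_leaves <- cp_tab[, "nsplit"] + 1  # a tree with k splits has k + 1 leaves

# CV error (relative to the root node) versus number of leaves
plot(n_leaves, cp_tab[, "xerror"], type = "b",
     xlab = "Number of leaves", ylab = "Relative CV error")

# Prune back to the subtree with the lowest CV error
best_row    <- which.min(cp_tab[, "xerror"])
tree_pruned <- prune(tree_full, cp = cp_tab[best_row, "CP"])
```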


Question 4

  • Fit and tune the random forest model with the following mtry values.
p <- ncol(x_train)
rf_grid <- tibble(mtry = seq(2, p, by = 2))
  • Plot the variable importance measures.
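A minimal sketch of tuning mtry over that grid with ranger follows. It uses the out-of-bag (OOB) mean squared error as the tuning criterion, which is a swapped-in shortcut and not necessarily the cross-validation procedure intended for this classwork. The data are simulated, and num.trees, the seeds, and the permutation importance setting are assumptions.

```r
library(ranger)

# Simulated training data standing in for mlb_battings_2024 (NOT real values)
set.seed(1)
n <- 200
x_train <- data.frame(
  hr         = rpois(n, 15),
  sb         = rpois(n, 8),
  bb_percent = runif(n, 0.04, 0.15),
  k_percent  = runif(n, 0.10, 0.35),
  w_oba      = rnorm(n, 0.315, 0.03),
  def        = rnorm(n, 0, 6)
)
y_train <- 40 * (x_train$w_oba - 0.315) + 0.1 * x_train$def + rnorm(n, 2, 0.5)

p <- ncol(x_train)
mtry_grid <- seq(2, p, by = 2)

# For regression forests, prediction.error is the OOB mean squared error
oob_mse <- sapply(mtry_grid, function(m) {
  ranger(x = x_train, y = y_train, mtry = m,
         num.trees = 500, seed = 1)$prediction.error
})
best_mtry <- mtry_grid[which.min(oob_mse)]

# Refit at the selected mtry, this time recording permutation importance
rf_fit <- ranger(x = x_train, y = y_train, mtry = best_mtry,
                 num.trees = 500, importance = "permutation", seed = 1)
sort(rf_fit$variable.importance, decreasing = TRUE)
```

Because the refit stores importance scores, vip(rf_fit) from the vip package loaded in Setup should be able to plot them directly.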


Question 5

  • Fit and tune the XGBoost model using the following hyperparameter grid:
xgb_grid <- tidyr::crossing(
  nrounds = seq(20, 200, by = 20),
  eta = c(0.025, 0.05, 0.1, 0.3),
  max_depth = c(1, 2, 3, 4),
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)

xgb_grid |> 
  paged_table()
  • Plot the variable importance measures.
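The grid above can be searched in several ways; the sketch below is one hedged option that calls xgboost's own xgb.cv directly. Since boosting adds trees one round at a time, a single CV run up to 200 rounds already contains the CV error at every value in the nrounds grid, so only the eta and max_depth combinations need separate runs. The data are simulated, and the 5 folds, the seed, and all object names are assumptions.

```r
library(xgboost)

# Simulated training data standing in for mlb_battings_2024 (NOT real values)
set.seed(1)
n <- 200
x_train <- cbind(
  hr    = rpois(n, 15),
  sb    = rpois(n, 8),
  w_oba = rnorm(n, 0.315, 0.03),
  def   = rnorm(n, 0, 6)
)
y_train <- 40 * (x_train[, "w_oba"] - 0.315) + 0.1 * x_train[, "def"] +
  rnorm(n, 2, 0.5)
dtrain <- xgb.DMatrix(data = x_train, label = y_train)

# Fixed settings copied from the grid; eta and max_depth vary
make_params <- function(eta, max_depth) {
  list(objective = "reg:squarederror", eta = eta, max_depth = max_depth,
       gamma = 0, colsample_bytree = 1, min_child_weight = 1, subsample = 1)
}
grid <- expand.grid(eta = c(0.025, 0.05, 0.1, 0.3), max_depth = c(1, 2, 3, 4))
nrounds_grid <- seq(20, 200, by = 20)

results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  set.seed(1)  # reset the seed so fold assignment is comparable across rows
  cv <- xgb.cv(params = make_params(grid$eta[i], grid$max_depth[i]),
               data = dtrain, nrounds = max(nrounds_grid),
               nfold = 5, verbose = FALSE)
  # Read the CV RMSE at each nrounds value from the same run
  data.frame(eta = grid$eta[i], max_depth = grid$max_depth[i],
             nrounds = nrounds_grid,
             cv_rmse = cv$evaluation_log$test_rmse_mean[nrounds_grid])
}))
best <- results[which.min(results$cv_rmse), ]

# Refit on all training rows at the selected combination
xgb_fit <- xgb.train(params = make_params(best$eta, best$max_depth),
                     data = dtrain, nrounds = best$nrounds)
imp <- xgb.importance(model = xgb_fit)
print(imp)
```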


Question 6

  • Compare the Mean Squared Errors (MSEs) on the test data across the different tree-based models.
  • Analyze and discuss the differences in predictive performance among these models.
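The comparison amounts to computing the same test-set MSE for each model's predictions. The sketch below shows the pattern for two rpart trees plus a mean-only baseline, on simulated data (the values, the 80/20 split, and the helper name mse are assumptions). Predictions from the random forest and XGBoost fits would slot into the same vector in the same way.

```r
library(rpart)

# Simulated stand-in for mlb_battings_2024 (NOT real FanGraphs values)
set.seed(1)
n <- 200
bat <- data.frame(
  hr    = rpois(n, 15),
  sb    = rpois(n, 8),
  w_oba = rnorm(n, 0.315, 0.03),
  def   = rnorm(n, 0, 6)
)
bat$war <- 40 * (bat$w_oba - 0.315) + 0.1 * bat$def + rnorm(n, 2, 0.5)

train_idx <- sample(nrow(bat), size = round(0.8 * nrow(bat)))
train <- bat[train_idx, ]
test  <- bat[-train_idx, ]

# Mean squared error between observed and predicted outcomes
mse <- function(actual, predicted) mean((actual - predicted)^2)

tree_d3   <- rpart(war ~ ., data = train, method = "anova",
                   control = rpart.control(maxdepth = 3))
tree_deep <- rpart(war ~ ., data = train, method = "anova",
                   control = rpart.control(cp = 0))

# One entry per model, all evaluated on the same held-out rows
test_mse <- c(
  mean_only     = mse(test$war, mean(train$war)),
  tree_depth3   = mse(test$war, predict(tree_d3, newdata = test)),
  tree_unpruned = mse(test$war, predict(tree_deep, newdata = test))
)
sort(test_mse)
```

The mean-only baseline gives a reference point: a model whose test MSE is not clearly below it has learned little that generalizes.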



Discussion

Welcome to our Classwork 9 Discussion Board! 👋

This space is designed for you to engage with your classmates about the material covered in Classwork 9.

Whether you want to delve deeper into the material, share insights, or ask questions, this is the place to do it.

If you have any specific questions for Byeong-Hak (@bcdanl) regarding the Classwork 9 materials or need clarification on any points, don’t hesitate to ask here.

All comments will be stored here.

Let’s collaborate and learn from each other!
