Tree-based Model I: Decision Trees
March 23, 2026


A tree can be read as a set of decision rules over the predictors x and y: "IF x is true AND y is true, THEN …."
Objective at each split: find the best predictor and cutoff to partition the data into two regions, \(R_1\) and \(R_2\), so that the predictions within each resulting node are as accurate as possible.
For a regression tree, we choose the split that minimizes the sum of squared errors (SSE): \[ SSE = \sum_{i \in R_1} (y_i - c_1)^2 + \sum_{i \in R_2} (y_i - c_2)^2 \]
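The split search can be sketched in a few lines. This is an illustrative brute-force version for a single predictor (the names `best_split`, `x`, `y` are mine, not from any library): every candidate cutoff is tried, and \(c_1\), \(c_2\) are the region means, which minimize the SSE within each region.

```python
import numpy as np

def best_split(x, y):
    """Return (cutoff, sse) for the two-region split on x minimizing SSE."""
    best_cut, best_sse = None, np.inf
    for cut in np.unique(x)[:-1]:            # candidate cutoffs
        left, right = y[x <= cut], y[x > cut]
        # c1, c2 are the region means -- the optimal constant predictions
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_cut, best_sse = cut, sse
    return best_cut, best_sse

# Toy data with an obvious break between x = 3 and x = 10
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])
cut, sse = best_split(x, y)   # cut == 3.0
```

With several predictors, the same search runs over each predictor and the best (predictor, cutoff) pair wins.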
Regression: each split is chosen to make the observations within each child node as similar as possible in their outcome values.
Classification: a good split makes each child node as pure as possible, meaning observations within each region mostly belong to one class.
Each split is only locally (greedily) optimal, so we are NOT guaranteed to train a model that is globally optimal.
How do we control the complexity of the tree?
We can cap the depth of the tree (maxdepth), require a minimum number of observations in a node before attempting a split (minsplit), require a minimum number of observations in any terminal node (minbucket), or set a complexity parameter (cp).
We can grow a very large complicated tree, and then prune back to an optimal subtree using a cost complexity parameter \(\alpha\) (like \(\lambda\) for regularization)
\(\alpha\) penalizes objective as a function of the number of terminal nodes
e.g., we want to minimize \(SSE + \alpha \cdot (\# \text{ of terminal nodes})\)
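The penalized objective can be illustrated with a toy calculation. The SSE values below are made-up numbers (not from the notes), chosen so that larger subtrees fit better; the point is that a small \(\alpha\) selects a large subtree and a large \(\alpha\) selects a small one.

```python
# Illustrative (assumed) SSE for candidate subtrees, keyed by # of terminal nodes
subtrees = {1: 100.0, 2: 40.0, 4: 25.0, 8: 22.0}

def best_subtree(alpha):
    # Pick the subtree minimizing SSE + alpha * (# of terminal nodes)
    return min(subtrees, key=lambda T: subtrees[T] + alpha * T)

small_alpha = best_subtree(0.1)    # weak penalty  -> keeps the 8-leaf tree
large_alpha = best_subtree(20.0)   # strong penalty -> prunes to 2 leaves
```

In practice the software (e.g., rpart via cp) traces out this trade-off over a sequence of \(\alpha\) values and the best \(\alpha\) is chosen by cross-validation.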

Hyperparameters are parameters set before training a model — unlike regular model parameters (e.g., coefficients), they are not learned from the data.
Examples in a decision tree include cp, minsplit, and minbucket.
Hyperparameter tuning is the process of searching for the combination of hyperparameter values that produces the best model performance on unseen data.
A model trained with poorly chosen hyperparameters tends to overfit (too complex) or underfit (too simple).
Tuning finds the sweet spot that balances bias and variance.
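A common way to search for that sweet spot is a cross-validated grid search. The sketch below uses scikit-learn, whose `max_depth` and `ccp_alpha` parameters are the analogues of rpart's maxdepth and cp; the data and grid values are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Simulated data: a smooth signal plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 300)

# Candidate hyperparameter values (illustrative grid)
grid = {"max_depth": [2, 4, 8], "ccp_alpha": [0.0, 0.01, 0.1]}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
best = search.best_params_   # combination with best cross-validated MSE
```

Each grid point is scored on held-out folds, so the chosen combination is the one that generalizes best rather than the one that fits the training data best.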