Tree-based Model II: Ensemble Models
March 30, 2026
Analogy: The Overconfident Student
Random forests are an extension of bagging
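To make the bagging idea concrete, here is a minimal pure-Python sketch (not the lecture's code; the data and helper names are made up). Each "model" is deliberately trivial, just the mean of one bootstrap sample, so the example isolates the core mechanism: resample with replacement, fit many models, and average their predictions to reduce variance.

```python
import random

def bootstrap_sample(data, rng):
    """Draw a sample of the same size as `data`, with replacement."""
    return [rng.choice(data) for _ in data]

def mean(xs):
    return sum(xs) / len(xs)

def bagged_predict(data, n_models=200, seed=0):
    """Toy bagging: each 'model' is just the mean of one bootstrap sample;
    the ensemble averages those means, which lowers variance."""
    rng = random.Random(seed)
    preds = [mean(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return mean(preds)

data = [2.0, 4.0, 6.0, 8.0]
print(bagged_predict(data))  # close to the overall mean of 5.0
```

A random forest adds one ingredient on top of this: each tree also considers only a random subset of variables (`mtry`) at each split, which decorrelates the trees and makes the averaging more effective.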
For classification, accuracy = \(\frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}\).
For regression, "accuracy" is measured by \(R^{2}\).
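Both metrics are simple to compute by hand. A short illustrative sketch (plain Python, made-up data):

```python
def accuracy(y_true, y_pred):
    """Classification: fraction of correct predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def r_squared(y_true, y_pred):
    """Regression: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))          # 0.75
print(r_squared([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))   # 0.99
```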
Out-of-bag samples for observation \(x_1\)
Calculating variable importance of variable \(v_1\)
Note
Removing a variable entirely changes the model. Shuffling breaks the relationship between \(v_1\) and the outcome while keeping the variable present — a controlled experiment that isolates each variable’s contribution.
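The shuffling procedure can be sketched in a few lines of plain Python (an illustrative sketch, not the lecture's implementation; the toy model and data are made up): score the model once, shuffle one column, score again, and report the drop.

```python
import random

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance(model, X, y, col, seed=0):
    """Importance of column `col` = baseline score minus the score after
    shuffling that column (breaking its link to the outcome)."""
    baseline = r_squared(y, [model(row) for row in X])
    rng = random.Random(seed)
    shuffled_vals = [row[col] for row in X]
    rng.shuffle(shuffled_vals)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled_vals)]
    permuted = r_squared(y, [model(row) for row in X_perm])
    return baseline - permuted

# Toy "model": the outcome depends only on feature 0.
model = lambda row: 2 * row[0]
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 9.0], [4.0, 2.0]]
y = [2.0, 4.0, 6.0, 8.0]
print(permutation_importance(model, X, y, col=0))  # positive: shuffling hurts
print(permutation_importance(model, X, y, col=1))  # 0.0: the model ignores it
```

Because the model never looks at feature 1, shuffling it leaves every prediction unchanged, so its importance is exactly zero.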
Building up a gradient-boosted tree model
Figure: boosted regression trees as 0–1024 successive trees are added.
Tip
Update the model parameters in the direction of the descending gradient of the loss function (e.g., MSE, deviance).
We need to control how much we update in each step: the learning rate.
A moderate learning rate (eta) takes moderate, steady steps downhill to the minimum loss.
nrounds and eta Must Be Tuned Together
A smaller eta (learning rate) means each tree contributes less, so you need more trees (nrounds) to reach good performance. Keep eta small (e.g., 0.05–0.1) and then find the right nrounds via early stopping.
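The effect of the learning rate is easy to see on a one-parameter loss. A minimal sketch (illustrative only; the quadratic loss and step counts are made up): a moderate eta converges steadily, while a too-large eta overshoots and diverges.

```python
def gradient_descent(grad, x0, eta, n_steps):
    """Repeatedly step downhill: x <- x - eta * grad(x)."""
    x = x0
    for _ in range(n_steps):
        x = x - eta * grad(x)
    return x

# Loss L(x) = (x - 3)^2 with gradient 2*(x - 3); the minimum is at x = 3.
grad = lambda x: 2 * (x - 3)
print(gradient_descent(grad, x0=0.0, eta=0.1, n_steps=100))  # converges near 3
print(gradient_descent(grad, x0=0.0, eta=1.1, n_steps=10))   # too large: diverges
```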
XGBoost is one of the most popular open-source libraries for the gradient boosting algorithm. Its key hyperparameters:
- Number of trees (nrounds)
- Learning rate (eta), i.e. how much we update in each step; nrounds and eta really have to be tuned together
- Tree complexity (max_depth, min_child_weight)
- max_depth: the maximum depth a tree is allowed to grow. Larger values let the model capture more complex patterns, but can increase overfitting.
- min_child_weight: the minimum total instance weight required in a child node for a split to be allowed. Larger values make the model more conservative by preventing splits that create very small or weakly supported nodes.

Regularization (gamma) and early stopping

| Hyperparameter | Typical Starting Point | Effect of Increasing |
|---|---|---|
| nrounds | 100–500 (with early stopping) | More trees; use early stopping to avoid overfitting |
| eta | 0.05–0.1 | Slower learning; needs more nrounds |
| max_depth | 3–6 | Deeper trees; captures more interactions but risks overfitting |
| min_child_weight | 1–10 | More conservative splits; reduces overfitting |
| gamma | 0–1 | Higher penalty for additional splits; simpler trees |
A good default workflow: fix eta = 0.05, use early stopping to find the right nrounds, then tune max_depth and min_child_weight.
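To show how eta, the number of rounds, and early stopping interact, here is a toy gradient-boosting loop in plain Python (an illustrative sketch, not XGBoost; the stump learner, data, and stopping rule on training MSE are all simplifications — real early stopping monitors a validation set). Each round fits a one-split "tree" to the current residuals and adds it scaled by eta.

```python
def fit_stump(x, residuals):
    """Best single-split 'tree': pick the threshold minimizing SSE."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, eta=0.1, max_rounds=500, tol=1e-12):
    """Gradient boosting for squared error: each stump fits the residuals
    and is added scaled by eta.  Stops early once the training MSE no
    longer improves (a stand-in for validation-based early stopping)."""
    pred = [0.0] * len(y)
    prev_mse = float("inf")
    for n in range(max_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + eta * stump(xi) for pi, xi in zip(pred, x)]
        mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
        if prev_mse - mse < tol:
            return pred, n + 1
        prev_mse = mse
    return pred, max_rounds

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
pred, rounds_used = boost(x, y, eta=0.1)
print(rounds_used)  # stops well before max_rounds
```

With a small eta each stump contributes only a fraction of the residual, so many rounds are needed; raising eta reaches the same fit in fewer rounds but, on real data, at greater risk of overfitting.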
For random forests, the defaults plus tuning mtry are often enough to get strong performance. For gradient boosting, you must tune nrounds, eta, tree depth, and other settings together.

| | Random Forest | Gradient-Boosted Trees (XGBoost) |
|---|---|---|
| How trees are built | Independently (parallel) | Sequentially (one at a time) |
| Main error reduced | Variance | Bias |
| Tuning effort | Low (mtry and ntree) | Higher (eta, nrounds, max_depth, etc.) |
| Risk of overfitting | Lower (averaging stabilizes) | Higher (can memorize with too many trees) |
| Typical performance | Strong, robust baseline | Often best-in-class on tabular data |
| Interpretability | Variable importance plots | Variable importance + partial dependence |
Rule of thumb: Start with random forest for a quick, reliable baseline. Switch to XGBoost when you need to squeeze out maximum predictive accuracy and are willing to invest time in tuning.