Explain ensemble methods. How do random forests and gradient boosting work, and why do they often outperform a single strong model?
formulate your answer, then —
You said gradient boosting fits residuals — can you show exactly what happens in one boosting step when the loss is something other than squared error, like log loss?
formulate your answer, then —
tldr
Bagging (Random Forest) trains many de-correlated trees, each on a bootstrap sample with a random subset of features considered at each split, and averages their predictions: each tree has high variance and low bias, and averaging cancels much of the variance while leaving the bias roughly unchanged. Boosting (Gradient Boosting) trains shallow trees sequentially, each fit to the negative gradient of the loss with respect to the current ensemble's predictions (the pseudo-residuals), so every step chips away at the bias. Because each step fits the direction of steepest loss descent, gradient boosting generalizes to any differentiable loss: with squared error the pseudo-residuals are the ordinary residuals, and with binary log loss they are the label minus the predicted probability.
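A minimal sketch of one boosting step under binary log loss, assuming labels in {0, 1} and using sklearn's DecisionTreeRegressor as the weak learner; the function name boosting_step_logloss and the toy data are illustrative only, and real libraries (XGBoost, LightGBM) additionally use second-order (Newton) leaf values rather than this plain gradient step.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boosting_step_logloss(X, y, F, learning_rate=0.1, max_depth=3):
    """One gradient boosting step for binary log loss (y in {0, 1}).

    F holds the ensemble's current raw scores (log-odds), one per sample.
    """
    p = sigmoid(F)                        # current probability estimates
    pseudo_residuals = y - p              # negative gradient of log loss w.r.t. F
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, pseudo_residuals)         # weak learner approximates the gradient direction
    return F + learning_rate * tree.predict(X)

# Usage sketch: start from the base-rate log-odds, then take a few steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
F = np.full(len(y), np.log(y.mean() / (1 - y.mean())))  # F_0 = log-odds of base rate
for _ in range(10):
    F = boosting_step_logloss(X, y, F)
```

The only thing that changes when you swap the loss is the pseudo-residual formula; with squared error it would be y - F, and everything else in the step is identical.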
follow-up
- What hyperparameters matter most in gradient boosting and how would you tune them?
- How does XGBoost's regularization differ from that of a vanilla gradient-boosted tree, and why does it help?
- When would you choose a gradient boosted tree over a neural network for a tabular task?