mlprep / ML Breadth · medium · 12 min

Explain ensemble methods. How do random forests and gradient boosting work, and why do they often outperform a single strong model?

formulate your answer, then —

You said gradient boosting fits residuals — can you show exactly what happens in one boosting step when the loss is something other than squared error, like log loss?

formulate your answer, then —

tldr

Bagging (Random Forest) trains many deep, high-variance trees independently, each on a bootstrap sample of the data with a random subset of features considered at each split, and averages their predictions: averaging de-correlated trees reduces variance while keeping the low bias of deep trees. Boosting (Gradient Boosting) trains shallow trees sequentially; each new tree fits the negative gradient of the loss with respect to the current ensemble's predictions (the pseudo-residuals, which equal the ordinary residuals under squared error), so each step reduces bias. Because every tree fits the direction of steepest loss descent, gradient boosting generalizes to any differentiable loss. Both ensembles beat a single strong model because combining many diverse learners manages the bias-variance trade-off better than any one model can.
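
To make the log-loss step concrete, here is a minimal sketch of a single boosting iteration. Assumptions not in the original: binary labels in {0, 1}, predictions kept in log-odds space, sklearn's DecisionTreeRegressor as the base learner, and illustrative values for learning_rate and max_depth. Production libraries (e.g. XGBoost, LightGBM) additionally refit each leaf's value with a Newton step, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_step_log_loss(X, y, F, learning_rate=0.1, max_depth=3):
    """One boosting step under log loss (a sketch; names are illustrative)."""
    # For log loss L = -[y log p + (1 - y) log(1 - p)] with p = sigmoid(F),
    # the negative gradient of L w.r.t. F is y - p: the pseudo-residuals.
    p = 1.0 / (1.0 + np.exp(-F))   # current probability estimates
    pseudo_residuals = y - p       # negative gradient, the steepest-descent direction

    # Fit a small regression tree to the pseudo-residuals. The tree itself is
    # fit with squared error even though the ensemble loss is log loss; that
    # is the trick that makes gradient boosting loss-agnostic.
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, pseudo_residuals)

    # Shrunken additive update in log-odds space.
    return F + learning_rate * tree.predict(X), tree

# Usage: start from the log-odds of the base rate and take one step.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
F0 = np.full(len(y), np.log(y.mean() / (1 - y.mean())))  # initial log-odds
F1, tree1 = boosting_step_log_loss(X, y, F0)
```

Swapping the loss changes only the pseudo-residual line: under squared error it becomes y - F (the ordinary residuals), and under absolute error it becomes sign(y - F); the rest of the step is identical.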

follow-up

  • What hyperparameters matter most in gradient boosting and how would you tune them?
  • How does XGBoost's regularization differ from that of a vanilla gradient boosted tree, and why does it help?
  • When would you choose a gradient boosted tree over a neural network for a tabular task?