Compare MSE, MAE, and Huber loss — when would you use each?

Question

Compare MSE, MAE, and Huber loss for regression. What does each optimize for, and how do outliers affect each?

Think about:

- What squaring does to large errors vs. small errors.
- What the gradient of each loss looks like near zero and far from zero.
- What "median regression" vs. "mean regression" means.
- Why you might not want to minimize MSE if your labels have heavy tails.

Accepted Answer

**MSE — Mean Squared Error**

Squaring amplifies large errors quadratically. The model is trained to minimize the average squared