Walk me through the probability distributions you encounter most often in ML engineering — not just what they are, but where they show up in real systems and why they matter for modeling decisions.
You mentioned that using MSE assumes Gaussian residuals. In practice, most real-world data isn't perfectly Gaussian. How much does this assumption violation actually matter, and when should you switch to a different loss function?
tldr
Every ML loss function implies a distributional assumption: MSE→Gaussian, BCE→Bernoulli, CE→Categorical, Poisson NLL→Poisson. Minimizing the loss is then equivalent to maximizing the likelihood under that distribution, so matching the distribution to your data means the model optimizes for the right thing. Key patterns: Gaussian for symmetric continuous data, Poisson for counts, Exponential/Weibull for time-to-event, Beta for rates and proportions with small samples, LogNormal for positive heavy-tailed data. When the assumption breaks down (outliers, asymmetry, bounded targets, overdispersion), switch to the loss function that matches your data's actual distribution.
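A minimal numeric sketch of the MSE→Gaussian correspondence from the tldr: with a fixed noise scale (assumed σ=1 here for simplicity; the data is simulated), the average Gaussian negative log-likelihood is exactly half the MSE plus a constant that doesn't depend on the predictions, so minimizing MSE is minimizing Gaussian NLL.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.5, size=100)  # imperfect predictions

# Mean squared error
mse = np.mean((y_true - y_pred) ** 2)

# Average Gaussian negative log-likelihood with fixed sigma = 1:
# NLL = 0.5*log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2)
sigma = 1.0
nll = np.mean(
    0.5 * np.log(2 * np.pi * sigma**2)
    + (y_true - y_pred) ** 2 / (2 * sigma**2)
)

# NLL = 0.5*MSE + constant, so the two losses share a minimizer.
assert np.isclose(nll, 0.5 * mse + 0.5 * np.log(2 * np.pi))
```

The constant term is why frameworks can expose "MSE" rather than "Gaussian NLL": for a fixed σ they differ only by scale and shift, which gradient descent ignores.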
follow-up
- How would you decide between Poisson regression and Negative Binomial regression for count data? What diagnostic tells you which is appropriate?
- Explain how the reparameterization trick in VAEs relates to probability distributions — what problem does it solve?
- Your team wants to predict "time until user returns to the app." Why is standard regression inappropriate, and what distribution/model family would you use?
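On the first follow-up, a hedged sketch of the standard diagnostic: the variance-to-mean ratio (dispersion). Poisson data has variance ≈ mean (ratio ≈ 1); a ratio well above 1 signals overdispersion, which Negative Binomial handles by mixing Poisson rates over a Gamma distribution. The parameters and sample sizes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pure Poisson counts: variance should roughly equal the mean.
poisson_counts = rng.poisson(lam=4.0, size=10_000)

# Gamma-mixed Poisson counts (equivalently, Negative Binomial):
# each observation draws its own rate, inflating the variance.
rates = rng.gamma(shape=2.0, scale=2.0, size=10_000)  # mean 4, var 8
negbin_counts = rng.poisson(lam=rates)

def dispersion(x):
    """Variance-to-mean ratio: ~1 suggests Poisson, >>1 overdispersion."""
    return x.var() / x.mean()

print(dispersion(poisson_counts))  # close to 1 → Poisson is fine
print(dispersion(negbin_counts))   # well above 1 → prefer Negative Binomial
```

In a fitted model the same idea shows up as the Pearson chi-square statistic divided by the residual degrees of freedom; values much greater than 1 argue for the Negative Binomial.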