mlprep

Walk me through the probability distributions you encounter most often in ML engineering — not just what they are, but where they show up in real systems and why they matter for modeling decisions.

formulate your answer, then —

You mentioned that using MSE assumes Gaussian residuals. In practice, most real-world data isn't perfectly Gaussian. How much does this assumption violation actually matter, and when should you switch to a different loss function?

formulate your answer, then —

tldr

Every ML loss function implies a distributional assumption: MSE→Gaussian, BCE→Bernoulli, CE→Categorical, Poisson NLL→Poisson. Matching the loss to the data's actual distribution means the model maximizes the correct likelihood rather than a convenient proxy. Key patterns: Gaussian for symmetric continuous data, Poisson for counts, Exponential/Weibull for time-to-event, Beta for rates estimated from small samples, LogNormal for positive heavy-tailed data. When the assumption breaks (outliers, asymmetry, bounded targets), switch to the loss that matches the data — e.g. Huber or MAE when residuals are heavy-tailed, so a few outliers don't dominate the squared-error objective.
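The loss↔distribution mapping above can be checked numerically — each loss is the negative log-likelihood (NLL) of its distribution, up to additive constants. A minimal sketch (function names and the toy numbers are mine, not from any library):

```python
import numpy as np

def gaussian_nll(y, mu, sigma=1.0):
    # NLL of N(mu, sigma^2); as a function of mu this is 0.5*MSE plus a constant
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)

def bernoulli_nll(y, p):
    # NLL of Bernoulli(p) -- exactly binary cross-entropy
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def poisson_nll(y, lam):
    # NLL of Poisson(lam), dropping the log(y!) term that doesn't depend on lam
    return lam - y * np.log(lam)

y, mu = 2.0, 1.5
# The Gaussian NLL differs from squared error only by a constant in mu:
assert np.isclose(gaussian_nll(y, mu) - gaussian_nll(y, y), 0.5 * (y - mu) ** 2)
# BCE at y=1 is just -log(p):
assert np.isclose(bernoulli_nll(1.0, 0.9), -np.log(0.9))
# Poisson NLL is minimized when the predicted rate equals the observed count:
assert poisson_nll(3.0, 3.0) < poisson_nll(3.0, 2.5)
assert poisson_nll(3.0, 3.0) < poisson_nll(3.0, 3.5)
```

This is why "choosing a loss" and "choosing a distribution" are the same decision: gradient descent on the loss is maximum-likelihood estimation under the implied distribution.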

follow-up

  • How would you decide between Poisson regression and Negative Binomial regression for count data? What diagnostic tells you which is appropriate?
  • Explain how the reparameterization trick in VAEs relates to probability distributions — what problem does it solve?
  • Your team wants to predict "time until user returns to the app." Why is standard regression inappropriate, and what distribution/model family would you use?
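For the first follow-up, a rough sketch of the diagnostic to keep in mind (synthetic data, not a formal dispersion test): Poisson implies variance ≈ mean, so a variance/mean ratio well above 1 points to Negative Binomial.

```python
import numpy as np

rng = np.random.default_rng(0)

# True Poisson counts: variance should be close to the mean.
poisson_counts = rng.poisson(lam=4.0, size=10_000)

# Gamma-mixed Poisson is Negative Binomial: same mean, extra variance.
overdispersed = rng.poisson(rng.gamma(shape=2.0, scale=2.0, size=10_000))

def dispersion_ratio(x):
    # Crude diagnostic: ~1 is consistent with Poisson; >> 1 suggests
    # overdispersion, i.e. Negative Binomial fits better.
    return x.var() / x.mean()

print(dispersion_ratio(poisson_counts))  # close to 1
print(dispersion_ratio(overdispersed))   # well above 1
```

In practice you'd compute this on residual-adjusted counts or run a formal overdispersion test, but the variance-vs-mean check is the first thing to look at.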