mlprep

You have a dataset collected from production logs that you plan to use to train a new ML model. How do you detect whether it has sampling bias, and what techniques can you use to correct for it?

formulate your answer, then —

You mentioned inverse propensity weighting. In practice, how do you estimate the propensity scores when the selection mechanism is a complex recommendation model with billions of parameters? And what goes wrong when propensity estimates are inaccurate?

formulate your answer, then —

tldr

Production data is almost always biased: selection bias (the serving model filters which examples get logged), survivorship bias, popularity bias, temporal bias. Detect it with KS tests on feature distributions, the population stability index (PSI), propensity analysis, and per-subgroup evaluation. Correct it with inverse propensity weighting (upweight underrepresented examples), doubly-robust estimators (consistent if either the propensity model or the outcome model is correct), stratified resampling, or collecting unbiased exploration data. Log propensities at serving time when possible; estimating them retroactively is error-prone.
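
A minimal detection sketch, assuming you have an unbiased reference sample (e.g. a small uniformly logged holdout) to compare against. The synthetic samples, bin count, and thresholds are illustrative, not prescriptive:

```python
# Sketch: detect distribution shift between production logs and a
# reference sample using a KS test and PSI. Assumes numeric features;
# the reference/production samples here are synthetic stand-ins.
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, production, n_bins=10):
    """Population stability index:
    sum over bins of (prod_frac - ref_frac) * ln(prod_frac / ref_frac)."""
    # Bin edges from the reference distribution's quantiles.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    # Clip both samples into the reference range so every value lands in a bin.
    ref_counts = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)[0]
    prod_counts = np.histogram(np.clip(production, edges[0], edges[-1]), bins=edges)[0]
    # Smooth empty bins to avoid log(0) and division by zero.
    ref_frac = np.clip(ref_counts / len(reference), 1e-6, None)
    prod_frac = np.clip(prod_counts / len(production), 1e-6, None)
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)    # e.g. uniformly logged holdout
production = rng.normal(0.4, 1.2, 5000)   # e.g. model-filtered logs

ks_stat, p_value = ks_2samp(reference, production)
print(f"KS statistic={ks_stat:.3f}  p={p_value:.1e}")
print(f"PSI={psi(reference, production):.3f}")  # rule of thumb: >0.25 = major shift
```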

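A correction sketch in the same spirit: fit a propensity model for P(logged | x), weight by its inverse, then clip and self-normalize the weights, the standard guard against the variance blowup that near-zero propensity estimates cause (the failure mode the second question probes). The synthetic selection mechanism and the 0.05 clip threshold are assumptions:

```python
# Sketch: inverse propensity weighting with clipping and self-normalization.
# The population, selection mechanism, and thresholds are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Population features; selection probability depends on the first feature,
# so the logged subset over-represents large x[:, 0].
X = rng.normal(size=(20000, 4))
true_select_prob = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
selected = rng.random(20000) < true_select_prob

# Propensity model: P(logged | x), trained on selected vs. not-selected.
# In real production you rarely observe the not-selected side at all,
# which is why logging propensities at serving time beats estimating later.
prop_model = LogisticRegression().fit(X, selected)
p_hat = prop_model.predict_proba(X[selected])[:, 1]

# Clip to bound the variance contributed by tiny propensities, then
# self-normalize so the weights average to one over the logged set.
p_hat = np.clip(p_hat, 0.05, 1.0)
weights = 1.0 / p_hat
weights /= weights.mean()

# `weights` then multiplies the per-example training loss, e.g.
# loss = (weights * per_example_loss).mean()
print(f"weight range: [{weights.min():.2f}, {weights.max():.2f}]")
```
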
follow-up

  • How does Simpson's paradox relate to sampling bias, and can you give an ML example where ignoring a confounding variable reverses the apparent relationship?
  • Your recommendation model has a feedback loop — it only recommends items it's confident about, so it never collects data on uncertain items. How do you break this loop without hurting user experience? (a minimal exploration sketch follows this list)
  • When is reweighting insufficient and you fundamentally need new data collection to fix the bias?
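
For the feedback-loop follow-up, one common pattern is a small epsilon-greedy exploration slice with propensities logged at serving time, so the exploration data can later be reweighted exactly. A minimal sketch, where `score_items`, `log_impression`, and the epsilon value are hypothetical placeholders:

```python
# Sketch: epsilon-greedy exploration that logs each action's propensity at
# serving time, so the biased feedback loop yields reweightable data.
# `score_items`, `log_impression`, and EPSILON are illustrative placeholders.
import random

EPSILON = 0.05  # small exploration budget to cap the user-experience cost

def log_impression(user, item, propensity):
    # Placeholder: in production, write this alongside the impression log.
    print(f"user={user} item={item} propensity={propensity:.4f}")

def recommend(user, items, score_items):
    n = len(items)
    scores = score_items(user, items)        # model's confidence per item
    greedy = max(range(n), key=lambda i: scores[i])
    if random.random() < EPSILON:
        choice = random.randrange(n)         # uniform exploration slice
    else:
        choice = greedy
    # P(choice) under this policy: uniform mass plus the greedy spike.
    propensity = EPSILON / n + (1 - EPSILON) * (choice == greedy)
    log_impression(user, items[choice], propensity)
    return items[choice]

# Toy call; the scorer is a stand-in for the real recommendation model.
recommend("u1", ["a", "b", "c", "d"], lambda u, items: [0.9, 0.1, 0.3, 0.2])
```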