mlprep

Your model predicts that users who receive a discount are 30% more likely to purchase. A colleague says "let's give everyone discounts." What's wrong with this reasoning? Walk me through causal inference — why observational data is tricky, and how techniques like propensity score matching and difference-in-differences help.


tldr

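Correlation ≠ causation because of confounders: variables that affect both treatment assignment and outcome. Here the discounts were probably not handed out at random (for example, they may have gone to users already close to buying), so the 30% lift mixes the effect of the discount with the kind of user who received one, and there is no guarantee that giving everyone a discount reproduces it. RCTs (A/B tests) remove confounding via randomization. For observational data: propensity score matching balances observed confounders; DiD differences out stable group differences and shared time trends; IV uses exogenous variation in treatment supplied by an instrument. Each method rests on an identifying assumption that the data cannot fully verify (no unobserved confounders for matching, parallel trends for DiD, the exclusion restriction for IV), and violations of these assumptions are the main source of bias. Causal inference is needed for intervention decisions; correlation-based ML is fine for prediction.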
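A minimal numpy sketch of the three observational estimators on simulated data, to make the tldr concrete. Everything here is an assumption for illustration: the data-generating processes, the effect size `tau = 0.03`, the variable names, and the hypothetical helpers `fit_logistic` and `nn_match_att`. The propensity model is a hand-rolled Newton-Raphson logistic regression standing in for a library fit, matching is 1-nearest-neighbour with replacement and no caliper, and the IV estimate is the Wald ratio for a binary instrument. Because the data are simulated, ignorability, parallel trends and the exclusion restriction hold by construction; with real data they have to be argued for, not checked.

```python
import numpy as np

# Illustrative simulation: every data-generating process and parameter is hypothetical.
rng = np.random.default_rng(0)
tau = 0.03  # assumed true effect of a discount on purchase probability


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# --- Propensity score matching ------------------------------------------------
# `intent` is an OBSERVED confounder: marketing targets high-intent users,
# and high-intent users also buy more, so the naive gap is inflated.
n = 300_000
intent = rng.normal(size=n)
t = (rng.random(n) < sigmoid(-0.5 + 1.5 * intent)).astype(int)  # got a discount
p0 = sigmoid(-2.0 + 1.5 * intent)         # purchase prob. without discount
p1 = np.minimum(p0 + tau, 1.0)            # purchase prob. with discount
y = (rng.random(n) < np.where(t == 1, p1, p0)).astype(int)
true_att = (p1 - p0)[t == 1].mean()       # knowable only because we simulated
naive = y[t == 1].mean() - y[t == 0].mean()


def fit_logistic(X, target, n_iter=25):
    """Hypothetical helper: logistic regression via Newton-Raphson."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (target - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w


def nn_match_att(ps, t, y):
    """Hypothetical helper: ATT via 1-nearest-neighbour matching on the
    propensity score, with replacement and no caliper."""
    treated, control = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
    order = np.argsort(ps[control])
    c_ps, c_y = ps[control][order], y[control][order]
    t_ps = ps[treated]
    pos = np.searchsorted(c_ps, t_ps)                 # insertion points
    left = np.clip(pos - 1, 0, len(c_ps) - 1)
    right = np.clip(pos, 0, len(c_ps) - 1)
    closer_right = np.abs(c_ps[right] - t_ps) < np.abs(c_ps[left] - t_ps)
    match = np.where(closer_right, right, left)       # nearest control per treated unit
    return y[treated].mean() - c_y[match].mean()


X = np.column_stack([np.ones(n), intent])
ps = sigmoid(X @ fit_logistic(X, t))      # estimated propensity scores
psm = nn_match_att(ps, t, y)

# --- Difference-in-differences -------------------------------------------------
# Discounts launch in one region in the post period. The treated region buys
# more at baseline (+0.08); both regions share a seasonal lift (+0.05), so
# parallel trends hold by construction. Repeated cross-sections, not a panel.
m = 400_000
group = rng.integers(0, 2, m)             # 1 = treated region
post = rng.integers(0, 2, m)              # 1 = after launch
y_did = (rng.random(m) < 0.10 + 0.08 * group + 0.05 * post + tau * group * post).astype(int)


def cell_mean(g, s):
    return y_did[(group == g) & (post == s)].mean()


naive_post = cell_mean(1, 1) - cell_mean(0, 1)    # baseline gap leaks in: ~0.08 + tau
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# --- Instrumental variable (Wald estimator) ------------------------------------
# `u` is an UNOBSERVED confounder. `z` is a randomly assigned promo email that
# raises discount uptake and, by assumption, affects purchase only through it.
k = 1_000_000
u = rng.normal(size=k)
z = rng.integers(0, 2, k)
t_iv = (rng.random(k) < sigmoid(-1.0 + 1.5 * u + 1.5 * z)).astype(int)
p_iv = np.minimum(sigmoid(-2.0 + 1.5 * u) + tau * t_iv, 1.0)
y_iv = (rng.random(k) < p_iv).astype(int)
first_stage = t_iv[z == 1].mean() - t_iv[z == 0].mean()
wald = (y_iv[z == 1].mean() - y_iv[z == 0].mean()) / first_stage  # never reads `u`

print(f"PSM: naive {naive:.3f}, matched {psm:.3f}, true ATT {true_att:.3f}")
print(f"DiD: post-period gap {naive_post:.3f}, DiD {did:.3f}, true {tau:.3f}")
print(f"IV:  Wald {wald:.3f}, true {tau:.3f}")
```

In each scenario the naive comparison should sit well above `tau`, while the matched, DiD and Wald estimates land near it. If `intent` were dropped from the propensity model, the estimated scores would be constant and matching would collapse back to the naive gap, which is the "no unobserved confounders" caveat in practice.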

follow-up

  • What is the ignorability (unconfoundedness) assumption in propensity score methods, and how would you test whether it holds?
  • How does inverse propensity weighting (IPW) differ from propensity score matching?
  • When does optimizing an ML model on logged data create a feedback loop, and how do counterfactual methods fix it?
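On the IPW follow-up: as usually implemented, matching builds a comparison set by pairing each treated user with similar controls (reusing some, discarding others) and estimates the effect on the treated, whereas IPW keeps every unit and weights it by 1/e(x) if treated or 1/(1 - e(x)) if not, which targets the average effect over the whole population and can become unstable when e(x) is close to 0 or 1. A minimal numpy sketch under hypothetical assumptions: a single discrete observed confounder `segment` with made-up labels, treatment rates and purchase rates, and an assumed effect `tau = 0.03`. Propensities are the empirical treatment rate per segment, and the weights are normalised (Hájek).

```python
import numpy as np

# Hypothetical data: a discrete, observed confounder `segment`
# (0 = casual, 1 = returning, 2 = cart abandoner) drives both targeting and purchase.
rng = np.random.default_rng(1)
n = 300_000
tau = 0.03                                   # assumed true effect of the discount
segment = rng.choice(3, size=n, p=[0.5, 0.3, 0.2])
t = (rng.random(n) < np.array([0.1, 0.4, 0.8])[segment]).astype(int)
p0 = np.array([0.05, 0.20, 0.50])[segment]   # purchase prob. without discount
y = (rng.random(n) < p0 + tau * t).astype(int)

# Propensity score: empirical treatment rate within each segment
# (the saturated model for a single discrete covariate).
e = np.array([t[segment == s].mean() for s in range(3)])[segment]

# IPW keeps every unit and reweights it; normalised (Hajek) weights for stability.
w1, w0 = t / e, (1 - t) / (1 - e)
ipw_ate = (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive {naive:.3f} | IPW ATE {ipw_ate:.3f} | true {tau:.3f}")
```

With one discrete confounder this normalised IPW estimate coincides with a segment-stratified comparison; with continuous covariates e(x) would come from a fitted model, and extreme weights are often trimmed or stabilised.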