mlprep

Explain Simpson's paradox with a concrete example. When does it appear in ML workflows? How do you detect it, and what does it tell you about the relationship between correlation and causation?

formulate your answer, then —

tldr

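Simpson's paradox: a trend that holds in every subgroup reverses (or disappears) when the subgroups are pooled, because a confounder affects both group membership (e.g. which treatment a unit received) and outcome. Classic signal: one treatment better in every subgroup but worse overall. In ML: hides model performance gaps on subgroups, distorts feature importance (a coefficient's sign can flip depending on whether the confounder is in the model), and invalidates naive A/B comparisons when assignment isn't randomized or the arms are imbalanced on a confounder. Detect: compare per-subgroup and pooled results and check whether the direction agrees. Fix: stratify by confounders, and decide explicitly which causal question you're answering before choosing aggregate vs subgroup numbers; the data alone can't tell you which one is right.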
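concrete example (sketch)

A minimal sketch of the detect-and-stratify steps, using the widely cited kidney-stone counts (Charig et al., 1986) as the concrete example. The function names (`stratum_rates`, `pooled_rate`, `is_simpson_reversal`, `standardized_rate`) are illustrative, not from any library, and the reweighting shown is plain direct standardization, one of several possible adjustments. It assumes stone size is the only confounder worth controlling for.

```python
# (successes, trials) per treatment and stratum.
# Counts: kidney-stone study commonly used to illustrate the paradox.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rates(arm):
    """Success rate of one arm within each stratum."""
    return {k: s / n for k, (s, n) in counts[arm].items()}


def pooled_rate(arm):
    """Success rate of one arm with strata ignored (naive aggregate)."""
    successes = sum(s for s, _ in counts[arm].values())
    trials = sum(n for _, n in counts[arm].values())
    return successes / trials


def is_simpson_reversal(a, b):
    """True if one arm wins in every stratum but loses in the pooled data."""
    ra, rb = stratum_rates(a), stratum_rates(b)
    a_wins_all = all(ra[k] > rb[k] for k in ra)
    b_wins_all = all(rb[k] > ra[k] for k in ra)
    pa, pb = pooled_rate(a), pooled_rate(b)
    return (a_wins_all and pa < pb) or (b_wins_all and pb < pa)


def standardized_rate(arm):
    """Stratum rates reweighted to the combined stratum mix of all arms,
    so every arm is compared on the same small/large case mix."""
    totals = {k: sum(counts[x][k][1] for x in counts) for k in counts[arm]}
    grand_total = sum(totals.values())
    rates = stratum_rates(arm)
    return sum(rates[k] * totals[k] / grand_total for k in totals)


if __name__ == "__main__":
    for arm in counts:
        print(arm, stratum_rates(arm), pooled_rate(arm), standardized_rate(arm))
    print("reversal:", is_simpson_reversal("A", "B"))
```

Here A wins within both strata (about 93% vs 87% on small stones, 73% vs 69% on large) yet loses in the pooled numbers (78% vs about 83%), because A was given mostly to the harder large-stone cases. After reweighting both arms to the same case mix, A comes out ahead again.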

follow-up

  • How does randomization in A/B testing protect against Simpson's paradox?
  • If your model evaluation shows a Simpson's paradox across user segments, which metric do you report to stakeholders?
  • How does the backdoor criterion in causal graphs help identify which confounders to control for?