mlprep

Your A/B test returns p=0.001 — highly significant. Your PM is excited. What questions do you ask before declaring victory? Explain the difference between statistical significance and practical significance, and when a highly significant result can still be useless.


tldr

A p-value mixes effect size with sample size: with a large enough n, even a trivial effect becomes statistically significant. Effect size (Cohen's d, absolute or relative lift) answers the question that actually matters: how large is the improvement? The dangerous failure mode is declaring victory on a significant 0.001% lift that is too small to justify shipping. Define the minimum detectable effect (MDE) before the test; it is a business decision about the smallest effect worth shipping. Report a confidence interval on the effect magnitude, not just the p-value, and compare that interval to the MDE. Statistical significance is necessary but not sufficient for a ship decision.
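A minimal sketch of the point above, using only the Python standard library. The function names (`two_proportion_ztest`, `ship_decision`), the traffic numbers, and the MDE of 0.1 percentage points are all made up for illustration; a real experiment would plug in its own metric, variance estimate, and pre-registered MDE.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-sided z-test for a difference in conversion rates,
    plus an approximate 95% confidence interval on the absolute lift."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the test statistic (null: no difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    # Unpooled standard error for the confidence interval on the lift.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {"diff": diff, "relative_lift": diff / p_a,
            "p_value": p_value, "ci": ci}


def ship_decision(result, mde_abs, alpha=0.05):
    """Compare the confidence interval to a pre-defined MDE (absolute lift)."""
    low, high = result["ci"]
    if result["p_value"] >= alpha:
        return "not significant"
    if low >= mde_abs:
        return "ship"  # the whole interval clears the MDE
    if high < mde_abs:
        return "significant but below MDE"
    return "significant, inconclusive vs MDE"


# Hypothetical data: 10.00% vs 10.02% conversion.
MDE_ABS = 0.001  # assumed business threshold: 0.1 percentage points

huge = two_proportion_ztest(5_000_000, 50_000_000, 5_010_000, 50_000_000)
small = two_proportion_ztest(500, 5_000, 501, 5_000)

print(huge["p_value"], huge["ci"], ship_decision(huge, MDE_ABS))
print(small["p_value"], small["ci"], ship_decision(small, MDE_ABS))
```

With 50 million users per arm the 0.02-point lift is highly significant, yet the entire confidence interval sits below the assumed MDE, so the verdict is "significant but below MDE". With 5,000 users per arm the same observed rates are nowhere near significant. The lift is identical in both cases; only n changed.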

follow-up

  • If your A/B test shows p=0.04 and a 0.05% lift, how do you present this to your PM?
  • How does the concept of practical significance interact with CUPED — if you reduce variance, do you change effect size?
  • What is the difference between a one-sided and two-sided test, and when would you use each for a product experiment?