Your A/B test returns p=0.001 — highly significant. Your PM is excited. What questions do you ask before declaring victory? Explain the difference between statistical significance and practical significance, and when a highly significant result can still be useless.
tldr
A p-value reflects both effect size and sample size, so with a large enough n even a trivial effect becomes statistically significant. Effect size (Cohen's d, relative lift) measures what actually matters: how large is the improvement? The dangerous failure mode is declaring victory on a significant 0.001% lift. Pre-define the minimum detectable effect (MDE) before the test; it is a business decision about the smallest effect worth shipping. Report confidence intervals on the effect magnitude, not just p-values. Statistical significance is necessary but not sufficient for a ship decision.
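To make the gap between the two kinds of significance concrete, here is a minimal sketch using only the standard library. It runs a two-sided two-proportion z-test (normal approximation), builds a 95% confidence interval on the absolute lift, and compares that interval to a pre-registered MDE. The function names `two_proportion_test` and `ship_decision`, the traffic numbers, and the MDE of 0.1 percentage points are all hypothetical choices made for illustration; they do not come from a real experiment.

```python
import math


def two_proportion_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-sided z-test for a difference in conversion rates, plus a CI.

    Uses the normal approximation, which is reasonable for large samples.
    z_crit=1.96 corresponds to a 95% confidence interval.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a  # absolute lift of B over A

    # Pooled standard error for the hypothesis test (H0: p_a == p_b).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

    # Unpooled standard error for the confidence interval on the lift.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {
        "diff": diff,
        "relative_lift": diff / p_a,
        "p_value": p_value,
        "ci": ci,
    }


def ship_decision(result, mde_abs, alpha=0.05):
    """Classify a result against a pre-registered absolute MDE (hypothetical rule)."""
    ci_low, ci_high = result["ci"]
    if result["p_value"] >= alpha:
        return "inconclusive"
    if ci_high < mde_abs:
        # Even the optimistic end of the interval is below what we care about.
        return "significant but too small to matter"
    if ci_low >= mde_abs:
        return "significant and practically meaningful"
    return "significant, practical value uncertain"


# Illustrative numbers (made up): 50M users per arm, 10.00% vs 10.02% conversion.
huge_test = two_proportion_test(5_000_000, 50_000_000, 5_010_000, 50_000_000)
# Illustrative numbers (made up): 2,000 users per arm, 10% vs 13% conversion.
small_test = two_proportion_test(200, 2_000, 260, 2_000)

MDE_ABS = 0.001  # assumed business threshold: 0.1 percentage points

for name, res in [("huge_test", huge_test), ("small_test", small_test)]:
    print(name, f"p={res['p_value']:.4f}",
          f"lift={res['diff']:.4%}",
          f"ci=({res['ci'][0]:.4%}, {res['ci'][1]:.4%})",
          ship_decision(res, MDE_ABS))
```

With these inputs the 50M-per-arm test produces a p-value near 0.001, yet its whole confidence interval sits below the assumed MDE, while the much smaller test clears the MDE at a similar p-value. The decision rule compares the interval to the MDE, not the p-value to alpha alone, which is the point of the tldr above.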
follow-up
- If your A/B test shows p=0.04 and a 0.05% lift, how do you present this to your PM?
- How does the concept of practical significance interact with CUPED — if you reduce variance, do you change effect size?
- What is the difference between a one-sided and two-sided test, and when would you use each for a product experiment?