mlprep / ML Breadth (hard, 14 min)

Walk me through how you'd design and interpret an A/B test for a new ranking model. What are the common failure modes?


tldr

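A/B tests for ML models require user-level randomization, a primary metric and guardrail metrics specified before launch, enough power to detect the smallest effect worth shipping, and a fixed runtime decided in advance (no stopping early on a good-looking p-value) that spans whole weeks to cover weekly seasonality. Flat results are ambiguous: check that the treatment actually behaved differently (the new ranker changed what users saw), that the experiment had enough power, and that no confounders such as overlapping experiments or logging changes were present. Always check for sample ratio mismatch (observed arm sizes deviating from the intended split) before interpreting any results.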
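A minimal Python sketch of three of the checks above, using only the standard library. It assumes SHA-256 hashing of the user id plus a per-experiment salt for user-level bucketing, a one-degree-of-freedom chi-square goodness-of-fit test for sample ratio mismatch (p < 0.001 is a commonly used alarm threshold), and the normal-approximation sample-size formula for a two-sided two-proportion z-test as the power calculation. The function names (`assign_arm`, `srm_p_value`, `users_per_arm`) and all example numbers are hypothetical illustrations, not part of the original answer.

```python
import hashlib
import math
from statistics import NormalDist

# Hypothetical helpers; formulas and thresholds are common conventions, not from the source.


def assign_arm(user_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user so every request from that user sees the same arm."""
    # Salting with the experiment name gives independent assignments across experiments.
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # 10,000 buckets: 0.01% allocation granularity
    return "treatment" if bucket < treatment_share * 10_000 else "control"


def srm_p_value(n_control: int, n_treatment: int, treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) of observed arm sizes vs. the intended split."""
    total = n_control + n_treatment
    expected = (total * (1 - treatment_share), total * treatment_share)
    observed = (n_control, n_treatment)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def users_per_arm(baseline_rate: float, min_detectable_lift: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)  # lift is relative, e.g. 0.02 = +2%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    print(assign_arm("user-42", "ranker_v2_exp"))  # same arm on every call
    # 50,000 vs 51,500 users under an intended 50/50 split: p is well below 0.001,
    # so investigate the assignment or logging before reading any metric.
    print(srm_p_value(50_000, 51_500))
    # 10% baseline click-through, detect a +2% relative lift: roughly 356,000 users per arm.
    print(users_per_arm(baseline_rate=0.10, min_detectable_lift=0.02))
```

Hashing on the user id (rather than on the request or session) keeps each user in one arm for the whole run, which is what user-level randomization requires; changing the salt re-randomizes for a new experiment. The sample-size function assumes a binary per-user metric; a continuous per-user metric would need its own variance estimate, typically from historical data.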

follow-up

  • What is sample ratio mismatch and why does it invalidate an experiment?
  • Your experiment ran for two weeks and shows p=0.08 on the primary metric. Do you ship?
  • How do you handle experiments where the label takes 30 days to observe but you need a decision in two weeks?