How would you run an A/B test to evaluate whether a new ML model is better than the current one? What makes ML A/B tests different from standard product experiments?
You mentioned interference effects when users interact — how do you handle experimentation in systems where the model's output for one user affects other users?
tldr
A/B tests measure business outcomes on live traffic, the authoritative evaluation that offline metrics can't replicate. Randomize at the user level with consistent hashing, pre-register your primary metric, and size the experiment for sufficient statistical power before launch. ML A/B tests face interference effects (when users affect each other), novelty effects, and delayed metrics; cluster-based randomization contains interference but needs more traffic to reach the same power.
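
A minimal sketch of two of the steps above: deterministic user-level assignment via consistent hashing, and a back-of-the-envelope power calculation for a two-proportion test. The function names (`assign_variant`, `required_sample_size`) and the assumption that users carry a stable string `user_id` are illustrative, not from the original.

```python
import hashlib
import math

from scipy.stats import norm


def assign_variant(user_id: str, experiment_name: str,
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically assign a user to a variant via consistent hashing.

    Hashing user_id together with the experiment name keeps assignments
    stable across sessions while staying independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) / 16 ** len(digest)  # uniform value in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"


def required_sample_size(p_control: float, min_relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-proportion z-test."""
    p_treatment = p_control * (1 + min_relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = norm.ppf(power)           # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_treatment - p_control) ** 2
    return math.ceil(n)


# The same user always lands in the same bucket for a given experiment.
print(assign_variant("user_123", "new_ranker_v2"))

# Example: baseline CTR of 5%, detect a 2% relative lift at 80% power.
print(required_sample_size(0.05, 0.02))
```

Hashing the experiment name into the key is one common way to keep bucket assignments uncorrelated across concurrent experiments; the sample-size formula is the standard normal approximation and should be treated as a planning estimate, not an exact requirement.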
follow-up
- How would you handle a situation where your A/B test shows the new model wins on CTR but loses on a long-term engagement metric?
- What is a holdout group and why might you maintain one permanently in a recommendation system?
- How do you detect and correct for experiment contamination — users who were exposed to both variants?