How do you run A/B tests for ML models in production?

Question

Walk me through how you'd design and interpret an A/B test for a new ranking model. What are the common failure modes?

Think about: what you're randomizing, what the primary metric is, how long to run, what power you need, what guardrails you need, and why a flat result doesn't mean the model failed.

Accepted Answer

**Experiment design basics**

An A/B test splits traffic between control (old model) and treatment (new model). Randomization unit matters: randomize by user, not by request, to avoid the same user s