Your model's accuracy dropped 8% over 2 weeks. Walk through your complete debugging process. How do you isolate the cause and decide whether to rollback, hotfix, or retrain?
Formulate your own answer first, then compare.
tldr
Debug in order: (1) validate that the metric drop is real and understand its temporal shape; (2) check serving infrastructure: feature pipeline health, null rates, recent code changes; (3) compare current feature distributions against a training-time baseline using the Population Stability Index (PSI); (4) run segmented error analysis to find where the failure concentrates; (5) classify the cause: pipeline failure → rollback or hotfix, genuine drift → retrain. A cliff suggests a code or pipeline regression; a gradual slide suggests drift. Assess blast radius before rolling back. Post-incident: close the observability gap that let an 8% drop accumulate undetected.
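Step (3) above can be sketched concretely. A minimal PSI implementation, assuming two 1-D NumPy arrays of a single feature's values (a training-time baseline sample and a recent serving sample); the quantile-binning choice and the epsilon clipping are implementation decisions, not a fixed standard:

```python
import numpy as np

def psi(baseline, current, n_bins=10):
    """Population Stability Index between two 1-D samples of one feature.

    Bin edges are quantiles of the baseline, so each baseline bin holds
    ~1/n_bins of the mass; PSI then measures how far the current sample's
    bin proportions have moved from that reference.
    """
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the baseline range
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    eps = 1e-6  # guard against log(0) in empty bins
    base_pct = np.clip(base_pct, eps, None)
    curr_pct = np.clip(curr_pct, eps, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))
```

A common rule of thumb (a convention, not a hard standard): PSI < 0.1 is stable, 0.1–0.25 warrants investigation, and > 0.25 indicates a significant shift. Computing this per feature and sorting descending gives a cheap first-pass prioritization across a large feature set.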
follow-up
- How do you compute Population Stability Index and what threshold triggers investigation?
- Your feature pipeline has 200 features. How do you prioritize which ones to check first?
- You identify concept drift as the cause. How do you decide how much recent data to retrain on — and how do you validate the retrained model before deploying?