You have a dataset collected from production logs to train a new ML model. How do you detect whether it has sampling bias, and what techniques can you use to correct for it?
You mentioned inverse propensity weighting. In practice, how do you estimate the propensity scores when the selection mechanism is a complex recommendation model with billions of parameters? And what goes wrong when propensity estimates are inaccurate?
tldr
Production data is always biased: selection bias (the serving model filters which observations get logged), survivorship bias, popularity bias, temporal bias. Detect it with Kolmogorov-Smirnov (KS) tests on feature distributions, Population Stability Index (PSI), propensity analysis, and subgroup evaluation. Correct it with inverse propensity weighting (upweight underrepresented examples), doubly-robust estimators (unbiased if either the propensity model or the outcome model is correct), stratified resampling, or collecting unbiased exploration data. Log propensities at serving time when possible; estimating them retroactively is error-prone.
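A minimal detection sketch, assuming you have a small unbiased reference sample (e.g., a random/exploration slice) to compare against the production logs. The DataFrame inputs, feature names, binning choice, and the PSI > 0.2 rule of thumb are illustrative assumptions, not part of the answer above.

```python
import numpy as np
from scipy import stats


def psi(expected, actual, n_bins=10):
    """Population Stability Index between two 1-D samples, binned on the reference sample."""
    edges = np.histogram_bin_edges(expected, bins=n_bins)
    eps = 1e-6  # avoids log(0) / division by zero in empty bins
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual) + eps  # values outside the reference range are ignored here
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


def shift_report(reference_df, production_df, features):
    """Per-feature KS test and PSI; a large KS statistic or PSI above ~0.2 suggests sampling shift."""
    report = {}
    for col in features:
        ks_stat, p_value = stats.ks_2samp(reference_df[col], production_df[col])
        report[col] = {
            "ks": ks_stat,
            "p": p_value,
            "psi": psi(reference_df[col].to_numpy(), production_df[col].to_numpy()),
        }
    return report
```

And a sketch of inverse propensity weighting with clipping, assuming propensities P(example was logged) were recorded at serving time. Clipping and self-normalization are the usual guards against the exploding-variance failure mode when propensity estimates are inaccurate or near zero; the names below are illustrative.

```python
import numpy as np


def ipw_weights(propensities, clip=(0.01, 1.0), normalize=True):
    """Turn logged propensities into per-example training weights."""
    p = np.clip(np.asarray(propensities, dtype=float), clip[0], clip[1])
    w = 1.0 / p                        # rare (low-propensity) examples get upweighted
    if normalize:
        w = w * len(w) / w.sum()       # self-normalize so the mean weight stays ~1
    return w


# Usage with any weighted loss, e.g. a scikit-learn estimator that accepts sample_weight:
#   model.fit(X_train, y_train, sample_weight=ipw_weights(logged_propensities))
```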
follow-up
- How does Simpson's paradox relate to sampling bias, and can you give an ML example where ignoring a confounding variable reverses the apparent relationship?
- Your recommendation model has a feedback loop — it only recommends items it's confident about, so it never collects data on uncertain items. How do you break this loop without hurting user experience?
- When is reweighting insufficient and you fundamentally need new data collection to fix the bias?