mlprep
mlprep / ML Breadth · hard · 12 min

You train a model daily on logged features. At serving time the same features are computed in real-time. What can go wrong, and how do you catch it before it hurts users?

formulate your answer, then —

tldr

Train-serve skew: feature values differ between training and serving due to duplicated feature logic, time-window misalignment, late-arriving events, or different join semantics — not because the world changed. Detection: log the features actually used at serving time and compare their distributions against the training features. Prevention: a feature store with a shared code path for training retrieval and serving lookup; point-in-time correctness prevents future data leaking into training. Shadow validation plus distribution monitoring catch regressions before they hurt users.
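The "compare distributions" step can be sketched with a Population Stability Index check per feature. This is a minimal illustration, not the card's prescribed method; the 0.2 alarm threshold is a common rule of thumb, and the function name is my own:

```python
import numpy as np

def psi(train, serve, bins=10):
    """Population Stability Index between a feature's training sample
    and its logged serving sample. Near 0 = same distribution;
    > 0.2 is a common (rule-of-thumb) skew alarm threshold."""
    # Bin edges from training quantiles, open-ended at both tails
    edges = np.quantile(train, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    # Fraction of each sample falling in each bin
    t = np.histogram(train, edges)[0] / len(train)
    s = np.histogram(serve, edges)[0] / len(serve)
    # Clip away empty bins to keep the log finite
    t = np.clip(t, 1e-6, None)
    s = np.clip(s, 1e-6, None)
    return float(np.sum((s - t) * np.log(s / t)))
```

Run nightly over every feature, comparing the last day of serving logs against the training snapshot; a feature whose PSI jumps flags a skew or drift investigation.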

follow-up

  • What is point-in-time correctness in a feature store and why is it essential for preventing data leakage?
  • How would you detect train-serve skew automatically in a production ML system?
  • Your model's performance degrades two weeks after launch but training metrics still look fine. How do you distinguish train-serve skew from concept drift?
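Point-in-time correctness (first follow-up) amounts to an as-of join: each training label may only see the latest feature value computed at or before the label's timestamp. A minimal sketch with `pandas.merge_asof`; the tables and the `avg_spend_30d` feature are illustrative, not from the card:

```python
import pandas as pd

# Label events: (entity, label timestamp) pairs we build training rows for.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-10"]),
}).sort_values("ts")

# Feature snapshots: value as of the time it was computed.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-12"]),
    "avg_spend_30d": [10.0, 25.0, 7.0],
}).sort_values("ts")

# As-of join: for each label row, take the most recent feature value
# with feature.ts <= label.ts for that user -- never a future value.
training_set = pd.merge_asof(labels, features, on="ts", by="user_id")
```

User 2's label at 2024-01-10 gets no feature value (its only snapshot is from 2024-01-12, i.e. the future); a naive latest-value join would silently leak it, inflating offline metrics that serving can never reproduce.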