What is data leakage and how do you prevent it?

Question

What is data leakage in ML? Give me examples of how it appears, and how you'd catch and prevent it. Think about:

- The difference between target leakage (a feature computed from the target) and train-test leakage (test information flowing into training).
- What makes a seemingly valid feature actually leak future information.
- Why a model that "looks too good" should trigger suspicion.
- How leakage slips through group-based CV.

Accepted Answer

**What leakage is**

Data leakage occurs when information unavailable at pr