How does data augmentation improve generalization, and when can it hurt?

Question

Explain data augmentation as regularization. How do you decide which augmentations are valid for a task? Think about: invariances, label preservation, distribution shift, train-test mismatch, Mixup/CutMix, and why augmentation is domain-specific.

Accepted Answer

**The core idea**

Data augmentation creates transformed training examples that should preserve the label. It teaches the model invariances and increases effective training diversity. For images, common augmentations include crops, flips, color jitter, b