mlprep / ML Breadth / hard / 12 min

Explain mixed precision training. Why does it speed up deep learning, and what numerical problems do you need to handle?

Formulate your own answer before reading on.

tldr

Mixed precision trains with FP16/BF16 for most tensor operations while keeping FP32 where precision matters (master weights, large reductions, optimizer state). It is faster because half-precision halves memory traffic and activation storage and unlocks hardware matrix units (e.g. NVIDIA Tensor Cores) that run FP16/BF16 matmuls several times faster than FP32. FP16's narrow exponent range causes gradient underflow and overflow, so it usually needs loss scaling; BF16 keeps FP32's exponent range and is more numerically forgiving. Monitor overflows, NaNs, convergence, and final quality against an FP32 baseline.
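The underflow problem and the loss-scaling fix can be shown directly with NumPy's float16 type (a minimal numerical sketch, not framework code; the scale value 65536 is an illustrative choice):

```python
import numpy as np

# FP16's smallest positive subnormal is ~6e-8, so a tiny gradient
# simply underflows to zero when cast down.
tiny_grad = 1e-8
assert np.float16(tiny_grad) == 0.0

# FP16 also overflows early: its largest finite value is 65504.
assert np.isinf(np.float16(70000.0))

# Loss scaling: multiply the loss (and therefore all gradients) by a
# large constant before the backward pass so small gradients stay
# representable in FP16, then unscale in FP32 before the optimizer step.
scale = 65536.0  # 2**16, a common initial scale
scaled = np.float16(tiny_grad * scale)
assert scaled != 0.0                    # survives the FP16 cast
recovered = np.float32(scaled) / scale  # unscale in FP32
assert abs(recovered - tiny_grad) < 1e-10
```

Dynamic loss scaling automates the choice of `scale`: frameworks grow it while steps succeed and cut it (skipping the step) when an overflow/NaN appears in the gradients, which is what PyTorch's `GradScaler` does.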

follow-up

  • Why is BF16 often more stable than FP16?
  • What is dynamic loss scaling?
  • Which parts of Adam training are risky to store only in low precision?
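On the BF16 follow-up: BF16 keeps FP32's 8-bit exponent but only 8 mantissa bits, so it covers the same range with less precision. NumPy has no native bfloat16, but the effect can be simulated by truncating the low 16 bits of an FP32 value (a sketch; real BF16 conversion typically rounds to nearest rather than truncating):

```python
import numpy as np

def to_bf16(x):
    """Simulate BF16 by zeroing the low 16 bits of FP32 values
    (truncation; hardware usually rounds to nearest)."""
    bits = np.asarray(x, dtype=np.float32).reshape(-1).view(np.uint32)
    return (bits & 0xFFFF0000).view(np.float32)

# Same exponent range as FP32: a huge value stays finite in BF16...
assert np.isfinite(to_bf16(1e30)[0])
# ...while FP16's 5-bit exponent overflows to inf. This range headroom
# is why BF16 training usually needs no loss scaling.
assert np.isinf(np.float16(1e30))

# The trade-off is precision: 8 mantissa bits resolve only ~3 decimal
# digits, so nearby values collapse together.
assert float(to_bf16(1.01)[0]) != 1.01
```

This is also why optimizer state is risky in low precision: Adam's tiny second-moment updates can vanish entirely at BF16/FP16 resolution, so master weights and moments are usually kept in FP32.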