mlprep
mlprep / ML Breadth · hard · 10 min

What is focal loss? Why was it introduced, and how does it differ from cross-entropy? When would you reach for it?

Formulate your own answer before reading on.

tldr

Focal loss = α_t · (1 − p_t)^γ · cross-entropy, where p_t is the predicted probability of the true class and α_t is an optional per-class weight. The modulating factor (1 − p_t)^γ downweights easy, well-classified examples so hard examples dominate the gradient. γ = 0 recovers (α-weighted) CE; γ = 2 with α = 0.25 is the standard setting from the RetinaNet paper (Lin et al., 2017). Reach for it in dense detection (thousands of easy background anchors per image) or severe class imbalance where easy negatives drown out hard positives. Simple weighted cross-entropy is still the first thing to try.
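A minimal sketch of the binary case in NumPy, to make the downweighting concrete (function name and the clipping epsilon are my own; the defaults γ = 2, α = 0.25 follow the paper):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive class, y: labels in {0, 1}.
    gamma=0 reduces this to alpha-weighted cross-entropy.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)                # numerical safety for log
    p_t = np.where(y == 1, p, 1 - p)              # prob assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # per-class weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

With γ = 2, an easy positive at p = 0.9 is scaled by (1 − 0.9)² = 0.01 (a 100× downweight relative to CE), while a harder one at p = 0.6 is only scaled by 0.16 — which is exactly how the easy examples stop dominating the sum.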

follow-up

  • How would you set the α and γ hyperparameters? Is there a systematic way?
  • Focal loss was designed for single-stage detectors. Could you use it for NLP tasks with class imbalance?
  • What are alternatives to focal loss for handling class imbalance, and how do they compare?