mlprep / ML Breadth · medium · 12 min

Walk me through how modern object detection works. How would you frame the space of approaches?

You mentioned NMS to deduplicate detections. Can you walk through exactly how non-maximum suppression works and where it breaks down?

tldr

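Object detection is bounding-box regression plus classification, performed jointly. Two-stage detectors (e.g. Faster R-CNN) first propose regions, then classify and refine them; one-stage detectors (e.g. YOLO) predict boxes and classes directly from a grid. Anchor boxes are predefined candidate shapes that the model refines through predicted offsets. NMS deduplicates predictions greedily: keep the highest-confidence box, suppress the remaining boxes whose IoU with it exceeds a threshold, and repeat. It breaks down when distinct objects overlap heavily, as in crowded scenes, because a valid detection can be suppressed as if it were a duplicate.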
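
The greedy NMS procedure can be sketched in a few lines. This is a minimal, single-class illustration, assuming boxes in (x1, y1, x2, y2) corner format; the names `iou` and `nms`, the toy boxes, and the thresholds are made up for the example and do not come from any particular library.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed corner format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: returns indices of kept boxes, highest score first."""
    # Visit candidates in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections for illustration only.
boxes = [
    (0, 0, 10, 10),    # object A, best detection
    (1, 1, 11, 11),    # near-duplicate of A
    (4, 0, 14, 10),    # distinct object B that overlaps A
    (50, 50, 60, 60),  # isolated object C
]
scores = [0.9, 0.8, 0.7, 0.6]

kept_loose = nms(boxes, scores, iou_threshold=0.5)
kept_strict = nms(boxes, scores, iou_threshold=0.4)
```

With the 0.5 threshold the near-duplicate of A is removed and B survives. Lowering the threshold to 0.4 also suppresses B, which is the overlap failure mode: the algorithm has no way to tell a duplicate from a neighbouring object, only how much two boxes overlap.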

follow-up

  • How would you adapt an object detection pipeline to detect very small objects in high-resolution satellite imagery?
  • What is the precision-recall tradeoff in detection and how does mAP summarize it?
  • How does DETR's approach eliminate the need for anchor boxes and NMS, and what does it trade off?