mlprep

Explain bootstrapping. When would you use a bootstrap confidence interval instead of a CLT-based one? Walk me through the algorithm and give a concrete example of when bootstrap is the right tool.

tldr

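Bootstrap resamples your data with replacement B times (each resample the same size n as the original), recomputes the statistic on every resample, and uses the B values as an empirical estimate of the statistic's sampling distribution. A percentile interval then takes the α/2 and 1 − α/2 quantiles of those values. No variance formula is needed, so it works for the median, AUC, NDCG, F1, ratios, or any statistic with no closed-form variance. Prefer it over a CLT-based interval when no analytical variance formula exists, when the sample is skewed or too small to trust the normal approximation, or when the statistic is nonstandard. Limitations: bootstrap can't recover information that isn't in your original sample, so it doesn't fix a very small n, and the standard version assumes i.i.d. observations; use block bootstrap for time series.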
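
As a concrete illustration, here is a minimal sketch of a percentile bootstrap interval for the median of a skewed sample, a case where there is no simple variance formula to plug into a CLT interval. The function name `bootstrap_ci`, its parameters, and the latency values are all made up for this example; it uses only the Python standard library and is not a tuned implementation.

```python
import random
import statistics


def bootstrap_ci(data, stat_fn, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat_fn(data).

    Hypothetical helper for illustration; assumes i.i.d. observations.
    """
    rng = random.Random(seed)
    n = len(data)
    # Draw n points with replacement and recompute the statistic, n_boot times.
    replicates = sorted(
        stat_fn(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(int((1 - alpha / 2) * n_boot), n_boot - 1)
    return replicates[lo_idx], replicates[hi_idx]


# Made-up skewed sample (think request latencies in ms with two outliers).
latencies = [12, 15, 11, 14, 13, 250, 16, 12, 18, 14, 13, 400, 15, 17, 12]
low, high = bootstrap_ci(latencies, statistics.median)
print(f"median = {statistics.median(latencies)}, 95% CI = ({low}, {high})")
```

Passing the statistic as a function is the point of the design: swapping `statistics.median` for a ratio, F1, or AUC function changes nothing else in the procedure. With only 15 points the interval is still coarse, which reflects the limitation above: resampling cannot add information the sample does not contain.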

follow-up

  • Why does standard bootstrap fail for time series data, and how does block bootstrap address it?
  • How would you use bootstrap to test whether two ML models have significantly different AUC scores?
  • What is the BCa bootstrap and when does it give meaningfully better intervals than the percentile method?