mlprep
ML Breadth · hard · 14 min

You have logs from the current recommender. How can you estimate whether a new policy would perform better without fully launching it?

tldr

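Counterfactual (off-policy) evaluation estimates how a new policy would perform using only logs collected under the current policy. Inverse propensity scoring (IPS) reweights each logged outcome by the ratio of the new policy's probability of the logged action to the logged propensity, which corrects for the current policy's selection bias. Its main limitations are high variance when those ratios are large and support mismatch: actions the new policy favors but the current policy never took leave no data to reweight. Exploration traffic and doubly robust estimators improve reliability, but an A/B test is still needed before a launch decision.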
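A minimal sketch of the IPS reweighting described above, in plain Python. The function names (`ips_estimate`, `snips_estimate`) and the toy log are illustrative assumptions, not part of any particular library. The sketch assumes each logged row stores the reward, the probability the logging policy gave the action it took, and the probability the new policy would give that same action. The self-normalized variant is a common variance-reduction option, shown here for comparison.

```python
from typing import Sequence


def _weights(logged_propensities: Sequence[float],
             target_probs: Sequence[float]) -> list:
    """Importance weights: new-policy probability / logged propensity."""
    if len(logged_propensities) != len(target_probs):
        raise ValueError("inputs must have the same length")
    if any(p0 <= 0.0 for p0 in logged_propensities):
        # A zero propensity means the action could not have been logged.
        raise ValueError("logged propensities must be positive")
    return [p1 / p0 for p0, p1 in zip(logged_propensities, target_probs)]


def ips_estimate(rewards: Sequence[float],
                 logged_propensities: Sequence[float],
                 target_probs: Sequence[float]) -> float:
    """Plain IPS: average of weight * reward over logged rows."""
    w = _weights(logged_propensities, target_probs)
    if len(rewards) != len(w) or not w:
        raise ValueError("rewards must be non-empty and match the other inputs")
    return sum(wi * r for wi, r in zip(w, rewards)) / len(w)


def snips_estimate(rewards: Sequence[float],
                   logged_propensities: Sequence[float],
                   target_probs: Sequence[float]) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n.

    Slightly biased, but usually lower variance than plain IPS.
    """
    w = _weights(logged_propensities, target_probs)
    if len(rewards) != len(w) or sum(w) == 0.0:
        raise ValueError("need matching inputs and a positive weight sum")
    return sum(wi * r for wi, r in zip(w, rewards)) / sum(w)


# Toy log (made-up numbers): four impressions with click rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
logged_propensities = [0.5, 0.5, 0.25, 0.25]   # current policy
target_probs = [0.25, 0.25, 0.5, 0.5]          # new policy, same actions

print(ips_estimate(rewards, logged_propensities, target_probs))    # 0.625
print(snips_estimate(rewards, logged_propensities, target_probs))  # 0.5
```

The third row carries a weight of 2.0 because the new policy picks that action twice as often as the current one. A handful of rows with much larger weights is what drives the variance problem in practice.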

follow-up

  • Why do you need logged propensities for IPS?
  • What is the support problem in off-policy evaluation?
  • When would doubly robust estimation beat IPS?
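For the last follow-up, a hedged sketch of a doubly robust estimate in plain Python may help. It assumes a separately fitted reward model supplies two numbers per logged row: `q_logged`, its prediction for the action that was actually logged, and `q_target`, its expected prediction under the new policy's action distribution. The function and parameter names are hypothetical, and the numbers are made up.

```python
from typing import Sequence


def doubly_robust_estimate(rewards: Sequence[float],
                           logged_propensities: Sequence[float],
                           target_probs: Sequence[float],
                           q_logged: Sequence[float],
                           q_target: Sequence[float]) -> float:
    """Doubly robust estimate of the new policy's mean reward.

    Per row: model-based baseline (q_target) plus an importance-weighted
    correction on the model's residual for the logged action.
    """
    n = len(rewards)
    inputs = (logged_propensities, target_probs, q_logged, q_target)
    if n == 0 or any(len(x) != n for x in inputs):
        raise ValueError("all inputs must be non-empty and the same length")
    total = 0.0
    for r, p0, p1, q_a, q_pi in zip(rewards, *inputs):
        if p0 <= 0.0:
            raise ValueError("logged propensities must be positive")
        total += q_pi + (p1 / p0) * (r - q_a)
    return total / n


rewards = [1.0, 0.0, 1.0, 0.0]
logged_propensities = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.25, 0.25, 0.5, 0.5]

# With an all-zero reward model the correction term is the whole estimate,
# so this reduces to plain importance weighting of the raw rewards.
zeros = [0.0] * 4
print(doubly_robust_estimate(rewards, logged_propensities, target_probs,
                             zeros, zeros))  # 0.625

# With a model that predicts the logged rewards exactly, every residual is
# zero and the large weights no longer contribute any variance.
print(doubly_robust_estimate(rewards, logged_propensities, target_probs,
                             rewards, [0.5] * 4))  # 0.5
```

The two calls bracket the usual answer: the better the reward model, the smaller the residuals that get multiplied by large weights, so doubly robust estimation tends to beat IPS when the model is reasonably accurate and the weights are heavy-tailed. With correct propensities, the estimate stays unbiased even when the model is poor.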