How do you evaluate a new ranker using logs from an old policy?

Question

You have logs from the current recommender. How can you estimate whether a new policy would perform better without fully launching it? Think about: logged propensities, off-policy evaluation, support mismatch, IPS variance, doubly robust estimators, and why offline replay can be misleading.

Accepted Answer

**The problem**

Logs are generated by an old policy. You observe outcomes only for items that policy showed. A new ranker might show different items, but you do not know how users would have responded to thos