Walk me through a two-tower model for candidate retrieval. What are its limitations versus a single cross-attention model, and why do we use it anyway at scale?
formulate your answer first, then read on.
tldr
Two-tower: query tower and item tower encode independently; score = dot product in a shared embedding space. Key win: item embeddings are precomputed and indexed offline, so retrieval reduces to ANN / maximum inner-product search (milliseconds over 1B items). Key loss: no query-item interaction during encoding, so cross features (e.g., exact term matches, query-conditional item attributes) can't be captured. A cross-attention model must run a forward pass per (query, item) pair, i.e., O(corpus) compute per request, which is why two-tower is effectively the only architecture that scales retrieval to O(1B) candidates. Standard pattern: two-tower retrieval of the top-k, then a heavy cross-attention ranker over just those k.
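A minimal sketch of the two paths (offline item indexing, online query encoding + ANN lookup), assuming PyTorch and FAISS; the tower shapes, feature dims, and names like `Tower` / `query_tower` are illustrative, not from any particular system:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import faiss  # illustrative ANN library choice

class Tower(nn.Module):
    """MLP encoder mapping raw features into the shared d-dim space."""
    def __init__(self, in_dim: int, d: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, d))

    def forward(self, x):
        # L2-normalize so dot product == cosine; keeps the index well-behaved.
        return F.normalize(self.net(x), dim=-1)

query_tower = Tower(in_dim=64)  # encodes query/user features at request time
item_tower = Tower(in_dim=96)   # encodes item features, run offline

# Offline: embed the catalog once, build the index.
with torch.no_grad():
    item_emb = item_tower(torch.randn(10_000, 96)).numpy()  # stand-in catalog
index = faiss.IndexFlatIP(128)  # exact MIPS; prod would use HNSW / IVF-PQ
index.add(item_emb.astype(np.float32))

# Online: one query-tower forward pass + one index lookup, no per-item
# model calls. This is the whole scaling argument in two lines.
with torch.no_grad():
    q = query_tower(torch.randn(1, 64)).numpy().astype(np.float32)
scores, ids = index.search(q, 500)  # top-500 go to the cross-attention ranker
```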
follow-up
- How do you mine hard negatives for training a two-tower model, and why are in-batch negatives alone insufficient? (see the hard-negative sketch after this list)
- What is ColBERT's late interaction approach and how does it trade off expressiveness vs. retrieval cost compared to two-tower? (see the MaxSim sketch after this list)
- How do you handle the cold start problem for new items in a two-tower retrieval system? (see the cold-start sketch after this list)
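For the hard-negative follow-up, a hedged sketch of the usual loss shape: in-batch softmax treats the other rows' positives as negatives, which skew easy and popular, so ANN-mined hard negatives are concatenated as extra logit columns. The function name, temperature, and `[B, H, d]` layout are assumptions:

```python
import torch
import torch.nn.functional as F

def retrieval_loss(q, pos, hard=None, t=0.05):
    """q, pos: [B, d] normalized embeddings of aligned (query, positive) pairs.
    hard: optional [B, H, d] ANN-mined hard negatives per query."""
    logits = q @ pos.T / t  # [B, B]; row i's positive is column i,
                            # the other B-1 columns are in-batch negatives
    if hard is not None:
        # Extra columns of near-miss items the batch would never surface.
        hard_logits = torch.einsum("bd,bhd->bh", q, hard) / t  # [B, H]
        logits = torch.cat([logits, hard_logits], dim=1)
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# Mining loop (sketch): every N steps, re-embed the corpus, ANN-search each
# training query, and keep top-scoring non-positives as `hard`. In-batch
# negatives alone are random/popular items the model separates quickly, so
# the decision boundary near true positives stays undertrained.
```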
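For the ColBERT follow-up, a sketch contrasting single-vector scoring with late-interaction MaxSim; token counts and normalization are assumed:

```python
import torch

def two_tower_score(q_vec, d_vec):
    """One vector per side: a single dot product, cheapest to index."""
    return q_vec @ d_vec

def maxsim_score(q_toks, d_toks):
    """q_toks: [Lq, d], d_toks: [Ld, d] normalized per-token embeddings.
    Each query token picks its best document token; sum over query tokens.
    Still no cross-attention, so document token embeddings remain
    precomputable and indexable."""
    sim = q_toks @ d_toks.T            # [Lq, Ld] token-token similarities
    return sim.max(dim=1).values.sum()

# Tradeoff: the index stores Ld vectors per document instead of 1, and
# scoring costs Lq*Ld dot products instead of 1, so late interaction sits
# between two-tower (cheapest) and full cross-attention (most expressive).
```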
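For the cold-start follow-up, one common mitigation sketched under assumptions (the feature choices and dims are made up): build the item tower from content features instead of a learned item-ID embedding, so new items can be embedded and indexed immediately:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentItemTower(nn.Module):
    """Item tower over content features only: no per-item ID table, so a
    zero-interaction item still gets a usable embedding."""
    def __init__(self, text_dim=384, n_categories=1000, d=128):
        super().__init__()
        self.cat_emb = nn.Embedding(n_categories, 32)
        self.proj = nn.Linear(text_dim + 32, d)

    def forward(self, text_vec, category_id):
        x = torch.cat([text_vec, self.cat_emb(category_id)], dim=-1)
        return F.normalize(self.proj(x), dim=-1)

# New item arrives -> embed from metadata, add to the ANN index immediately;
# a common refinement is to blend in a learned ID embedding once the item
# accumulates interactions.
```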