Most production ranking systems use a multi-stage funnel rather than a single model that scores all candidates. Walk me through why, what each stage does, and what the tradeoffs are at each layer. Then tell me: what breaks if you skip a stage?
formulate your answer, then check it against the tldr below
tldr
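Multi-stage funnel: retrieval (millions → hundreds; optimize recall, typically with ANN search), pre-rank (hundreds → 100; cheap filtering model), rank (100 → 20; expensive model, optimize precision), re-rank (20 → final list; diversity + business rules). Candidate counts shrink at each stage, so per-candidate model capacity can grow while total latency stays within budget. Key failure modes: retrieval false negatives are unrecoverable, because no later stage can score an item it never sees; pre-rank degradation silently hurts ranker quality via distribution shift, since the ranker ends up scoring candidates unlike those it was trained on; skipping re-rank collapses diversity. Log exploration slices (small traffic samples where candidates bypass the usual upstream filtering) to detect cascade failures.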
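A minimal sketch of the funnel shape, to make the "unrecoverable false negative" point concrete. Everything here is an illustrative assumption: the function names (`top_k`, `run_funnel`), the stage sizes, and the toy scorers are hypothetical, and a full sort stands in for the ANN lookup a real retrieval stage would use.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[int], float]


def top_k(items: Sequence[int], score: Scorer, k: int) -> List[int]:
    """Keep the k highest-scoring items."""
    return sorted(items, key=score, reverse=True)[:k]


def run_funnel(
    corpus: Sequence[int],
    retrieval_score: Scorer,   # cheapest scorer, applied to the whole corpus
    prerank_score: Scorer,     # cheap model
    rank_score: Scorer,        # expensive model, applied to few candidates
    rerank: Callable[[List[int]], List[int]],
    k_retrieve: int = 500,
    k_prerank: int = 100,
    k_rank: int = 20,
) -> Dict[str, List[int]]:
    """Run each stage on the survivors of the previous one."""
    retrieved = top_k(corpus, retrieval_score, k_retrieve)
    preranked = top_k(retrieved, prerank_score, k_prerank)
    ranked = top_k(preranked, rank_score, k_rank)
    return {
        "retrieved": retrieved,
        "preranked": preranked,
        "ranked": ranked,
        "final": rerank(ranked),
    }


# Toy setup: the item id doubles as its true relevance, so 9999 is the best item.
corpus = list(range(10_000))


def true_relevance(item: int) -> float:
    return float(item)


def flawed_retrieval(item: int) -> float:
    # Hypothetical retrieval bug: the best item gets a very low score.
    return -1.0 if item == 9999 else float(item)


stages = run_funnel(
    corpus,
    retrieval_score=flawed_retrieval,
    prerank_score=true_relevance,
    rank_score=true_relevance,
    rerank=lambda items: items[:10],  # placeholder for diversity / business rules
)
```

Even though the pre-rank and rank scorers in this toy are perfect, item 9999 never reaches them, so it cannot appear in `stages["final"]`. The same structure shows why later stages can afford costlier scorers: `rank_score` runs on 100 items, while `retrieval_score` runs on all 10,000.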
follow-up
- How would you detect that pre-rank is degrading and causing downstream ranker quality loss?
- What is maximal marginal relevance and how does it balance relevance and diversity in re-ranking?
- How do you train a ranker end-to-end across stages when each stage filters the training distribution?
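
For the maximal marginal relevance follow-up, a minimal sketch of the usual greedy formulation: at each step, pick the remaining candidate that maximizes `lam * relevance - (1 - lam) * max similarity to anything already selected`. The function names, the cosine similarity choice, the `lam` parameter name, and the toy items are assumptions made for illustration, not a reference implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(
    relevance: Dict[str, float],
    embeddings: Dict[str, Sequence[float]],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedy MMR: lam=1.0 is pure relevance, lower lam favors diversity."""
    selected: List[str] = []
    remaining = list(relevance)  # list keeps tie-breaking deterministic

    while remaining and len(selected) < k:

        def mmr_score(candidate: str) -> float:
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max(
                (cosine(embeddings[candidate], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[candidate] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data: "b" is a near-duplicate of "a"; "c" is less relevant but different.
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.05], "c": [0.0, 1.0]}

diverse = mmr(relevance, embeddings, k=2, lam=0.5)      # ["a", "c"]
relevance_only = mmr(relevance, embeddings, k=2, lam=1.0)  # ["a", "b"]
```

With `lam=0.5` the near-duplicate `b` is penalized by its similarity to `a`, so `c` takes the second slot; with `lam=1.0` the penalty vanishes and the output is a plain relevance sort.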