mlprep

Design the ML system behind Amazon's product search. A user types "wireless headphones" — how do you decide which products to show and in what order? Walk me through retrieval, ranking, the signals that matter, and how you'd measure whether the system is working.

formulate your answer, then —

You mentioned personalization — using the user's purchase history in ranking. But most searches on Amazon are from returning users, and a user's past purchases shouldn't always influence future results. If I just bought a TV, I don't want TVs surfaced for the next year. How do you handle this?

formulate your answer, then —

Let's talk about sponsored products. How do you blend paid placements with organic results without destroying user trust?

formulate your answer, then —

How would you design the system for tail queries and new products with little interaction history?

formulate your answer, then —

tldr

Amazon search is best designed as a multi-stage marketplace ranking system: query understanding; lexical, dense, and category-based retrieval; multi-task ranking that predicts click, add-to-cart, purchase, and return; sponsored-organic blending with quality floors; personalization used only as a constrained tiebreaker; and exploration for tail queries and new products. Offline NDCG is only a proxy. Success is decided online, by conversion, abandonment, query reformulation, return rate, and revenue, with trust guardrails that a launch must not regress.
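
To make "sponsored-organic blending with quality floors" concrete, here is a minimal sketch, not a description of Amazon's actual system. Everything in it is an assumption for illustration: the `Candidate` type, the `blend` function, the fixed `ad_slots` positions, the 0.6 floor, and ranking eligible ads by bid times relevance. The key idea is that an ad must clear a relevance floor before its bid counts at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    product_id: str
    relevance: float   # hypothetical ranker score in [0, 1]
    bid: float = 0.0   # hypothetical per-click bid; 0 for organic items


def blend(organic, sponsored, ad_slots=(1, 4), quality_floor=0.6, k=6):
    """Fill up to k positions, reserving ad_slots for eligible sponsored items."""
    organic_ranked = sorted(organic, key=lambda c: c.relevance, reverse=True)
    # Quality floor: an ad below the floor is never shown, whatever it bids.
    eligible = sorted(
        (c for c in sponsored if c.relevance >= quality_floor),
        key=lambda c: c.bid * c.relevance,
        reverse=True,
    )
    ads, org = iter(eligible), iter(organic_ranked)
    shown, page = set(), []
    for position in range(k):
        pick = None
        if position in ad_slots:
            # Skip ads for products already on the page.
            pick = next((a for a in ads if a.product_id not in shown), None)
        if pick is None:
            # Not an ad slot, or no eligible ad left: fall back to organic.
            pick = next((o for o in org if o.product_id not in shown), None)
        if pick is None:
            break  # organic results exhausted
        shown.add(pick.product_id)
        page.append(pick)
    return page


organic = [
    Candidate("A", 0.9),
    Candidate("B", 0.8),
    Candidate("C", 0.7),
    Candidate("D", 0.5),
]
sponsored = [
    Candidate("S1", 0.3, bid=5.0),   # highest bid, but below the floor
    Candidate("S2", 0.75, bid=1.0),
    Candidate("B", 0.8, bid=0.5),    # also appears organically
]
page_ids = [c.product_id for c in blend(organic, sponsored)]
print(page_ids)
```

In this toy run the highest bidder, S1, never appears because it fails the floor, and an ad slot with no eligible ad is given back to organic results rather than filled with a weak ad. A production system would more likely learn slot positions and the floor from trust and revenue guardrails than hard-code them.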

follow-up

  • How would you design the query understanding layer? Specifically, how do you detect whether a query has navigational intent (the user wants one specific product) or exploratory intent (the user is browsing a category)?
  • Amazon has sponsored products that appear in search results. How do you blend organic ranking with paid placement without degrading search quality?
  • How would you detect search result manipulation — sellers gaming the ranking system through fake reviews or click fraud?