mlprep
ML Breadth · hard · 14 min

Explain the Wide & Deep architecture and why Google built it for app recommendations. Then walk me through how Deep & Cross Network (DCN) improves on it. What problem are both trying to solve, and when would you use each in a ranking system today?


tldr

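Both architectures target the same problem: ranking over sparse, high-cardinality categorical features, where the model must memorize specific feature co-occurrences and also generalize to combinations it has not seen. Wide & Deep jointly trains a memorization path (a linear model on hand-crossed features) and a generalization path (a DNN on embeddings). DCN replaces the hand-engineered crosses with a cross network that learns bounded-degree polynomial feature interactions automatically: each cross layer raises the maximum interaction degree by one and adds only O(d) parameters (a weight vector and a bias, where d is the input dimension). DCN v2 swaps the weight vector for a full d × d matrix per layer, costing O(d²) parameters (often low-rank factorized in practice) in exchange for richer interactions. Modern systems often use DCN v2 with multi-task heads rather than vanilla Wide & Deep.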
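To make the parameter-count difference concrete, here is a minimal NumPy sketch of the two cross-layer forms, forward pass only. The function names (`cross_v1`, `cross_v2`, `cross_network`), the sizes, and the random initialization are illustrative assumptions, not taken from any particular library or from the original papers' code.

```python
import numpy as np


def cross_v1(x0, xl, w, b):
    """DCN (v1) cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    w and b are vectors of length d, so the layer has 2d parameters.
    """
    # (xl @ w) yields one scalar per example, which rescales x0.
    return x0 * (xl @ w)[:, None] + b + xl


def cross_v2(x0, xl, W, b):
    """DCN v2 cross layer: x_{l+1} = x0 * (W xl + b) + xl (elementwise product).

    W is a full d x d matrix, so the layer has d^2 + d parameters.
    """
    return x0 * (xl @ W.T + b) + xl


def cross_network(x0, layers, layer_fn):
    """Stack cross layers; every layer multiplies by the original input x0,
    so each one raises the maximum interaction degree by one."""
    xl = x0
    for weight, bias in layers:
        xl = layer_fn(x0, xl, weight, bias)
    return xl


# Hypothetical sizes, chosen only for illustration.
d, batch, n_layers = 8, 4, 3
rng = np.random.default_rng(0)
x0 = rng.normal(size=(batch, d))  # stands in for concatenated embeddings + dense features

v1_layers = [(rng.normal(size=d) * 0.1, np.zeros(d)) for _ in range(n_layers)]
v2_layers = [(rng.normal(size=(d, d)) * 0.1, np.zeros(d)) for _ in range(n_layers)]

out_v1 = cross_network(x0, v1_layers, cross_v1)
out_v2 = cross_network(x0, v2_layers, cross_v2)

params_v1 = sum(w.size + b.size for w, b in v1_layers)  # n_layers * 2d
params_v2 = sum(W.size + b.size for W, b in v2_layers)  # n_layers * (d^2 + d)
print(out_v1.shape, out_v2.shape, params_v1, params_v2)
```

With these sizes, the v1 stack has 48 parameters and the v2 stack has 216, while both keep the output at the input width d. In a full model the cross network's output would typically be combined with a DNN (stacked or in parallel) before the prediction head; that part is omitted here.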

follow-up

  • How does DCN v2's matrix cross layer differ from dot-product self-attention for feature interaction?
  • What is DLRM and how does it handle the interaction between sparse embedding features and dense features differently from DCN?
  • How would you decide how many cross layers to use in DCN for a production ranking system?