Explain the Wide & Deep architecture and why Google built it for app recommendations. Then walk me through how Deep & Cross Network (DCN) improves on it. What problem are both trying to solve, and when would you use each in a ranking system today?
formulate your answer, then:
tldr
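Both architectures target the same problem: learning useful feature interactions over sparse, high-cardinality categorical inputs. Wide & Deep, built by Google for app recommendations on Google Play, jointly trains a memorization path (a linear model on hand-crafted cross-product features) and a generalization path (a DNN on embeddings). DCN replaces the hand-engineered crosses with a cross network that learns polynomial feature interactions automatically: each cross layer raises the maximum interaction degree by one and adds only O(d) parameters for input dimension d. DCN v2 uses a full d x d weight matrix per layer (O(d^2) parameters, optionally factored into low-rank form) for richer interactions. Modern systems typically favor DCN v2 with multi-task heads over vanilla Wide & Deep.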
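To make the cross-layer claim concrete, here is a minimal pure-Python sketch of one DCN v1 layer and one DCN v2 layer, assuming the standard update rules x_{l+1} = x0 * (w . x_l) + b + x_l for v1 and x_{l+1} = x0 * (W x_l + b) + x_l (elementwise product) for v2. The function names, the toy input, and the weights are illustrative assumptions, not taken from any library.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def cross_layer_v1(x0, xl, w, b):
    """DCN v1 (assumed form): x_{l+1} = x0 * (w . x_l) + b + x_l.

    w and b are vectors of length d, so the layer has 2d parameters.
    """
    s = dot(w, xl)  # a single scalar shared by every output coordinate
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_layer_v2(x0, xl, W, b):
    """DCN v2 (assumed form): x_{l+1} = x0 * (W x_l + b) + x_l, elementwise.

    W is a d x d matrix, so the layer has d^2 + d parameters.
    """
    proj = [dot(row, xl) + b_i for row, b_i in zip(W, b)]
    return [x0_i * p_i + xl_i for x0_i, p_i, xl_i in zip(x0, proj, xl)]


def params_v1(d):
    return 2 * d  # weight vector + bias


def params_v2(d):
    return d * d + d  # weight matrix + bias


# Toy input (hypothetical values): the first layer receives x_l = x0.
x0 = [1.0, 2.0, 3.0]
zero_bias = [0.0, 0.0, 0.0]

v1_out = cross_layer_v1(x0, x0, [0.5, -1.0, 0.25], zero_bias)

# With W = identity, v2 yields x_i^2 + x_i: explicit degree-2 terms.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v2_out = cross_layer_v2(x0, x0, identity, zero_bias)

print(v1_out, v2_out, params_v1(3), params_v2(3))
```

In v1 every output coordinate is scaled by the same scalar w . x_l, which is why the layer is cheap but limited in expressiveness; v2 gives each coordinate its own learned projection of x_l. Stacking L such layers produces interactions up to degree L + 1 in the original input.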
follow-up
- How does DCN v2's matrix cross layer differ from dot-product self-attention for feature interaction?
- What is DLRM and how does it handle the interaction between sparse embedding features and dense features differently from DCN?
- How would you decide how many cross layers to use in DCN for a production ranking system?