mlprep / ML Breadth · medium · 10 min

What's the difference between an embedding model like text-embedding-3-large and a generative model like GPT-4? Walk me through how each is trained, what the output represents, and when you'd use one vs the other for a real task.

tldr

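Generative models such as GPT-4 are trained autoregressively to predict the next token under causal attention, so the output at each step is a probability distribution over the vocabulary. Embedding models map a whole sequence to one fixed-size vector and are trained with a contrastive objective so that semantically similar texts land close together, typically with bidirectional attention. Raw GPT hidden states make poor semantic embeddings: under a causal mask each token sees only its left context, so the representations are asymmetric across positions, and they are optimized for next-token prediction, not similarity. Use embeddings for retrieval and search; use generative models for open-ended generation. In RAG, a bi-encoder retrieves candidates for recall and a cross-encoder reranks them for precision.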
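
To make the contrastive objective concrete, here is a minimal pure-Python sketch of an InfoNCE-style loss with in-batch negatives, plus the cosine-similarity lookup a bi-encoder uses at retrieval time. It is an illustration only: the function names, the toy 2-d vectors, and the temperature of 0.05 are assumptions made for this example, not the training recipe of text-embedding-3-large or any other specific model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(queries, docs, temperature=0.05):
    """Mean cross-entropy over the batch, where docs[i] is the positive
    for queries[i] and every other doc in the batch is a negative.
    The temperature value is an assumed, illustrative default."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(queries)

def top_k(query, corpus, k=2):
    """Indices of the k corpus vectors most similar to the query."""
    ranked = sorted(range(len(corpus)),
                    key=lambda j: cosine(query, corpus[j]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings": each query points in roughly the same direction
# as its paired document.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[0.9, 0.1], [0.1, 0.9]]
shuffled_docs = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce_loss(queries, aligned_docs)
loss_shuffled = info_nce_loss(queries, shuffled_docs)
nearest = top_k(queries[0], aligned_docs, k=1)
```

The loss is close to zero when each query sits nearest its own document and large when the pairs are swapped, which is the signal that pulls matching texts together during training. Every other document in the batch serves as a negative, so a larger batch gives each query more negatives to discriminate against. A cross-encoder reranker would instead score each (query, document) pair jointly and is not shown here.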

follow-up

  • Why do larger batch sizes help contrastive training for embedding models?
  • What is the Matryoshka representation learning objective, and why is it useful for variable-cost retrieval?
  • How would you evaluate embedding model quality without running end-to-end retrieval experiments?