Design a feature store for a company running dozens of ML models in production. What problems does it solve, what are its core components, and what are the consistency and freshness challenges you'd need to address?
Go deeper on point-in-time correctness. How exactly would training data be generated without leaking future information?
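An answer here might include a minimal sketch of a point-in-time lookup: for each labeled event, take the latest feature value recorded at or before the event time, never a later one. This is a pure-Python illustration with hypothetical data (the feature name `txn_count_7d` and the day-number timestamps are invented for the example), not a production implementation:

```python
from bisect import bisect_right

def point_in_time_value(history, event_time):
    """Return the latest feature value recorded at or before event_time.

    history: list of (timestamp, value) tuples sorted by timestamp.
    Returns None if no value existed yet -- the feature is treated as
    missing rather than leaking a future value into the training row.
    """
    times = [t for t, _ in history]
    idx = bisect_right(times, event_time)  # rightmost entry with t <= event_time
    if idx == 0:
        return None
    return history[idx - 1][1]

# Hypothetical feature history for one entity: 7-day transaction count,
# as (day, value) pairs sorted by day.
txn_count_7d = [(1, 3), (10, 8), (15, 4)]

point_in_time_value(txn_count_7d, 12)  # as of day 12 the value is 8, not the later 4
point_in_time_value(txn_count_7d, 0)   # nothing recorded yet -> None
```

The same backward-looking join is what a warehouse-side implementation (e.g. a correlated subquery or an `ASOF`-style join) has to express at scale.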
Now design online serving. A fraud model needs 50 features in under 20ms. What happens on the critical path?
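One shape a good answer takes: a batched read against the online store with a hard deadline, falling back to safe defaults when the store is slow or a feature is missing, so the model degrades rather than fails. A sketch under assumptions (the `multi_get` client method, the feature names, and the 15ms budget are all hypothetical):

```python
import concurrent.futures

def fetch_features(store, entity_id, feature_names, timeout_ms=15, defaults=None):
    """Fetch features on the prediction critical path with a hard deadline.

    If the store exceeds the budget or a feature is absent, substitute a
    default value so the model can still score (degraded but available).
    """
    defaults = defaults or {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(store.multi_get, entity_id, feature_names)
        try:
            values = future.result(timeout=timeout_ms / 1000)
        except concurrent.futures.TimeoutError:
            values = {}  # budget blown: serve defaults; a real system also emits a metric
    return {n: values.get(n, defaults.get(n, 0.0)) for n in feature_names}

class FakeStore:
    """Stand-in for a Redis/DynamoDB-style online store client."""
    def multi_get(self, entity_id, names):
        return {"txn_count_7d": 8.0}  # only one of the requested features is present

fetch_features(FakeStore(), "user-1", ["txn_count_7d", "avg_amount_30d"])
# -> {"txn_count_7d": 8.0, "avg_amount_30d": 0.0}
```

The design choice worth calling out: the fallback path must be exercised and monitored, because silently serving defaults shifts the feature distribution the model sees.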
How do you govern a feature store so dozens of teams can reuse features without creating hidden data quality problems?
tldr
A feature store design should cover shared feature definitions, offline historical retrieval, online low-latency serving, streaming materialization, point-in-time correctness, freshness SLAs, data quality monitoring, lineage, access control, and degraded behavior. The core ML-specific challenge is preserving the prediction-time information boundary while making feature reuse safe across many teams.
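The "shared feature definitions" component above can be made concrete with a minimal registry sketch. The metadata fields shown (owner, freshness SLA) are illustrative assumptions; real systems such as Feast add versioning, lineage, and access control on top:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    entity: str            # join key, e.g. "user_id"
    dtype: str             # declared type, validated on write
    owner: str             # team accountable for quality and SLAs
    freshness_sla_s: int   # max allowed staleness before alerting

class FeatureRegistry:
    """Single source of truth for feature definitions shared across teams."""
    def __init__(self):
        self._features = {}

    def register(self, feature: FeatureDefinition):
        if feature.name in self._features:
            raise ValueError(f"duplicate feature name: {feature.name}")
        self._features[feature.name] = feature

    def get(self, name: str) -> FeatureDefinition:
        return self._features[name]

registry = FeatureRegistry()
registry.register(FeatureDefinition(
    name="txn_count_7d", entity="user_id", dtype="int64",
    owner="fraud-team", freshness_sla_s=300,
))
```

Rejecting duplicate names at registration is one small governance lever: it forces teams to discover and reuse an existing feature rather than silently redefining it.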
follow-up
- How would you handle a feature that's computed from another feature (derived features)? What are the dependency management challenges?
- Walk me through how you'd implement point-in-time correct joins in BigQuery when generating model training data.
- How would you design the alerting system for a feature store to catch data quality issues before they silently degrade model performance?