Evaluation Stores: Closing the ML Data Flywheel?
Model Stores, Feature Stores, and now Evaluation Stores? Is the MLOps space going crazy, or is there a real need for these tools?
In this post, we take a look at the Evaluation Store concept as it stands in 2021.
Example Story
It all started with that ML system alert. 📉🔥🔥
The on-call MLEs pick up their pagers. The accuracy of a hyper-important model has been dropping for two days, and the sustained drop has triggered a rolling-window alert.
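To make that alert concrete, here is a minimal sketch of a rolling-window accuracy alert. The window size and threshold are made up for illustration; a real system would tune both and likely window by time rather than by count.

```python
from collections import deque

WINDOW_SIZE = 10_000       # last N scored predictions (illustrative)
ACCURACY_THRESHOLD = 0.90  # alert when rolling accuracy falls below this (illustrative)

window = deque(maxlen=WINDOW_SIZE)

def record_outcome(prediction, label) -> None:
    """Add one scored prediction (1 if correct, 0 otherwise) to the window."""
    window.append(int(prediction == label))

def rolling_accuracy() -> float:
    """Accuracy over the current window; optimistic default before any data."""
    return sum(window) / len(window) if window else 1.0

def should_alert() -> bool:
    # Only fire once the window is full, to avoid noisy cold-start alerts.
    return len(window) == WINDOW_SIZE and rolling_accuracy() < ACCURACY_THRESHOLD
```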
The MLEs check the system metrics of the serving infra, but nothing jumps out. ✅
Then they make sure the currently deployed model is the expected one: “model_10_01_2021__10_31_2021.pkl” ✅
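A filename match alone can be misleading. One way to harden this check, sketched below under the assumption that a content hash was stored in the registry at deploy time, is to compare digests rather than names:

```python
import hashlib
from pathlib import Path

def artifact_digest(path: str) -> str:
    """SHA-256 digest of a serialized model artifact."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify_deployed_model(deployed_path: str, registered_digest: str) -> bool:
    """True if the artifact on the serving host matches the one in the registry."""
    return artifact_digest(deployed_path) == registered_digest

# Hypothetical usage; the path and digest would come from your own registry:
# verify_deployed_model("/srv/models/model_10_01_2021__10_31_2021.pkl",
#                       registered_digest="<digest stored at deploy time>")
```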
Then they check that the volume of prediction requests is not too far off the recent trend. ✅
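That check can be as simple as a z-score against recent history. The sketch below uses hypothetical daily request counts and a 3-sigma band, both chosen for illustration:

```python
import statistics

def volume_looks_normal(history: list[int], current: int, k: float = 3.0) -> bool:
    """Flag the current request count if it sits more than k standard
    deviations away from the recent daily trend (a simple z-score check)."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    return abs(current - mean) <= k * std

# Hypothetical daily request counts for the last two weeks.
daily_counts = [98_000, 101_500, 99_200, 100_800, 97_600, 102_300, 100_100,
                99_900, 101_200, 98_700, 100_400, 99_500, 101_800, 100_600]
print(volume_looks_normal(daily_counts, current=100_250))  # True: volume is on-trend
```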
Then they retrieve the offline model metrics from the model store and compare this model against the previous version deployed last week. ✅
The offline metrics do not look too different on the experiment tracking dashboards. ✅
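Here is a minimal sketch of the kind of diff the MLEs might pull from their tracker: compare the two versions' offline metrics and surface only the ones that moved beyond a tolerance. The metric names, values, and tolerance are all hypothetical.

```python
def diff_metrics(current: dict[str, float], previous: dict[str, float],
                 tolerance: float = 0.01) -> dict[str, float]:
    """Return the metrics whose absolute change exceeds the tolerance."""
    return {name: current[name] - previous[name]
            for name in current.keys() & previous.keys()
            if abs(current[name] - previous[name]) > tolerance}

# Hypothetical offline metrics for the deployed model vs. last week's version.
current_run  = {"accuracy": 0.942, "auc": 0.981, "f1": 0.913}
previous_run = {"accuracy": 0.945, "auc": 0.979, "f1": 0.916}

print(diff_metrics(current_run, previous_run))  # {} -> offline metrics look fine
```

An empty diff matches the story so far: offline, the two models are indistinguishable, even though online accuracy is falling.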