Evaluation Stores: Closing the ML Data Flywheel?
Model Stores, Feature Stores, and now Evaluation Stores? Is the MLOps space going crazy, or is there a real need for these tools?
In this post, we take a look at the Evaluation Store concept as it stands in 2021.
Example Story
It all started with that ML system alert. 🚨🔥🔥
The on-call MLEs pick up their pagers. The accuracy of a business-critical model has been dropping for two days, which has tripped a rolling-window alert.
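As a concrete illustration, a rolling-window check along these lines could be what fired. This is a minimal sketch: the `events` schema, the two-day window, and the threshold are all assumptions, and it presumes ground-truth labels arrive with some delay.

```python
import pandas as pd

ROLLING_WINDOW = "2D"      # assumed: accuracy is averaged over a two-day window
ACCURACY_THRESHOLD = 0.92  # assumed alert threshold

def accuracy_alert(events: pd.DataFrame) -> bool:
    """Return True when rolling accuracy drops below the threshold.

    `events` is assumed to be datetime-indexed with a boolean `correct`
    column, i.e. whether each prediction matched its (delayed) label.
    """
    rolling_accuracy = events["correct"].astype(float).rolling(ROLLING_WINDOW).mean()
    return bool(rolling_accuracy.iloc[-1] < ACCURACY_THRESHOLD)
```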
The MLEs check the system metrics of the serving infra, but nothing pops out. ✅
Then they make sure that the currently deployed model is the expected one: "model_10_01_2021__10_31_2021.pkl". ✅
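Verifying the deployed artifact might look like the check below; the serving endpoint and the shape of its response are hypothetical stand-ins for whatever the serving infra actually exposes.

```python
import requests

EXPECTED_MODEL = "model_10_01_2021__10_31_2021.pkl"

# Hypothetical metadata endpoint exposed by the serving layer.
info = requests.get("http://model-serving.internal/v1/model-info").json()

assert info["artifact"] == EXPECTED_MODEL, (
    f"Deployed model is {info['artifact']}, expected {EXPECTED_MODEL}"
)
```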
Then they check that the volume of prediction requests is not too different from the trend. ✅
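The traffic check could be as simple as comparing the latest request count to its trailing average. In this sketch, the 7-day window and 25% tolerance are arbitrary assumptions.

```python
import pandas as pd

def volume_looks_normal(requests_per_hour: pd.Series, tolerance: float = 0.25) -> bool:
    """Flag the latest hourly request count against the trailing 7-day mean.

    `requests_per_hour` is assumed to be datetime-indexed. A large
    deviation would point at an upstream traffic issue rather than
    the model itself.
    """
    trend = requests_per_hour.rolling("7D").mean().iloc[-1]
    latest = requests_per_hour.iloc[-1]
    return abs(latest - trend) / trend <= tolerance
```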
Then they attempt to retrieve the offline model metrics from the model store and look for a way to compare this model against the previous version deployed last week. ✅
The offline metrics on the experiment tracking dashboards do not look meaningfully different. ✅
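For illustration, this is roughly what that comparison could look like with an MLflow-style tracking client; the run IDs and metric names are placeholders, not the team's actual setup.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Placeholder run IDs -- in practice these would come from the model
# registry / deployment record for the two versions.
current = client.get_run("CURRENT_RUN_ID").data.metrics
previous = client.get_run("PREVIOUS_RUN_ID").data.metrics

for name in ("accuracy", "f1"):  # assumed metric names
    print(f"{name}: {previous[name]:.3f} -> {current[name]:.3f}")
```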