Evaluation Stores: Closing the ML Data Flywheel?

Model Stores, Feature Stores, and now Evaluation Stores? Is the MLOps space going crazy, or is there a real need for these tools?

Moussa Taifi PhD
10 min read · Nov 29, 2021
Kandinsky, Squares with Concentric Circles, 1913 [9]

In this post, we take a look at the Evaluation Store concept as it stands in 2021.

Example Story

It all started with that ML system alert. 📉🔥🔥

The on-call MLEs pick up their pagers. The accuracy of a hyper-important model has been dropping for two days, and that drop has triggered a rolling-window alert.
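A rolling-window accuracy alert like the one in this story can be sketched in a few lines. This is a minimal illustration, assuming accuracy is computed once per day from labeled outcomes and that the alert fires when the mean over the last `window` days falls below a fixed `threshold`. The class name `RollingAccuracyAlert`, its parameters, and the readings below are hypothetical and not taken from any particular monitoring tool.

```python
from collections import deque


class RollingAccuracyAlert:
    """Hypothetical rolling-window alert: fires when the mean accuracy over the
    last `window` readings falls below `threshold`."""

    def __init__(self, window: int, threshold: float) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.threshold = threshold
        # A deque with maxlen keeps only the most recent `window` readings.
        self.readings = deque(maxlen=window)

    def observe(self, accuracy: float) -> bool:
        """Record one accuracy reading and return True if the alert should fire."""
        self.readings.append(accuracy)
        # Stay quiet until the window is full, so the alert never fires
        # on fewer than `window` readings.
        if len(self.readings) < self.window:
            return False
        rolling_mean = sum(self.readings) / len(self.readings)
        return rolling_mean < self.threshold


# Illustrative daily accuracy readings (made up for this example).
alert = RollingAccuracyAlert(window=2, threshold=0.90)
fired = [alert.observe(a) for a in [0.94, 0.93, 0.88, 0.86]]
print(fired)  # [False, False, False, True]
```

A fixed threshold is the simplest design; averaging over a window trades detection speed for fewer false pages, which is why the alert only fires once two consecutive days are bad on average.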

The MLEs check the system metrics of the serving infra, but nothing pops out. ✅

Next, they confirm that the currently deployed model is the expected one: “model_10_01_2021__10_31_2021.pkl”. ✅
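One way to make the "is this the expected model?" check stricter than a filename comparison is to also compare a content hash. The sketch below assumes the team recorded a SHA-256 digest of the artifact when it was trained; the functions `sha256_hex` and `is_expected_model` are hypothetical helpers for illustration, not part of any model store's API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_expected_model(
    deployed_name: str,
    artifact: bytes,
    expected_name: str,
    expected_sha256: str,
) -> bool:
    """True only if both the artifact name and its content hash match the
    record the team expects (for example, an entry kept at training time)."""
    # Checking the name alone would miss a file overwritten in place.
    return deployed_name == expected_name and sha256_hex(artifact) == expected_sha256
```

In practice, `artifact` could come from `pathlib.Path.read_bytes()` on the deployed file, and `expected_sha256` from wherever the digest was stored when the model was built.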

Then they check that the volume of prediction requests is in line with the usual trend. ✅
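A simple version of the request-volume check compares today's count against a trailing window of daily counts using a z-score. This is a sketch under loud assumptions: "trend" is approximated here by a flat trailing mean, so weekly seasonality and growth are ignored, and the function name `volume_is_anomalous` and its default `z_threshold` of 3.0 are hypothetical choices.

```python
from statistics import mean, stdev
from typing import Sequence


def volume_is_anomalous(history: Sequence[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag `today`'s request count if it lies more than `z_threshold` sample
    standard deviations from the mean of the trailing daily counts in `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical counts to estimate spread")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Perfectly flat history: any deviation at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > z_threshold


history = [1000, 1020, 980, 1010, 990]  # made-up daily request counts
print(volume_is_anomalous(history, 1005))  # False
print(volume_is_anomalous(history, 1100))  # True
```

A real system would more likely fit a seasonal baseline, but the shape of the check stays the same: estimate what "normal" looks like, then measure how far today sits from it.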

After that, they retrieve the offline model metrics from the model store and look for a way to compare this model with the previous version, deployed last week. ✅

On the experiment tracking dashboards, the offline metrics do not look very different from the previous version's. ✅
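The version-to-version comparison the MLEs are after can be sketched as a diff over two metric dictionaries. This assumes the offline metrics for each model version have already been fetched from the model store as plain `{name: value}` mappings; how that lookup works depends on the model store, so it is left out. The function `metric_deltas`, its `tolerance` parameter, and the example values are hypothetical.

```python
from typing import Dict, Mapping


def metric_deltas(
    current: Mapping[str, float],
    previous: Mapping[str, float],
    tolerance: float = 0.01,
) -> Dict[str, float]:
    """Return {metric: current - previous} for every metric present in both
    versions whose absolute change exceeds `tolerance`."""
    shared = sorted(current.keys() & previous.keys())
    deltas = {name: current[name] - previous[name] for name in shared}
    return {name: d for name, d in deltas.items() if abs(d) > tolerance}


# Made-up offline metrics for two model versions.
current = {"accuracy": 0.912, "auc": 0.950}
previous = {"accuracy": 0.915, "auc": 0.900, "f1": 0.80}
print(metric_deltas(current, previous))  # only "auc" moves by more than 0.01
```

Only metrics logged for both versions are compared, so a metric present in just one version (here `f1`) is skipped rather than raising an error.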

Senior Data Science Platform Engineer | CS PhD | Cloudamize-Appnexus-Xandr-AT&T-Microsoft | Books: www.moussataifi.com/books