Evaluation Stores: Closing the ML Data Flywheel?

Model Stores, Feature Stores, and now Evaluation Stores? Is the MLOps space going crazy, or is there a real need for these tools?

Moussa Taifi PhD
10 min read · Nov 29, 2021
Kandinsky, Squares with Concentric Circles, 1913 [9]

In this post, we take a look at the Evaluation Store concept as it stands in 2021.

Example Story

It all started with that ML system alert. 📉🔥🔥

The on-call MLEs pick up their pagers. They see that the accuracy on a hyper-important model has dropped for 2 days. The drop in accuracy triggers a rolling window alert.
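For readers who have not set one up, a rolling window alert of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the system from the story: the class name `RollingAccuracyAlert`, the window size, and the threshold are all assumptions made for the example, and it assumes ground-truth labels arrive alongside the predictions.

```python
from collections import deque


class RollingAccuracyAlert:
    """Hypothetical alert: fires when accuracy over the last `window`
    labeled predictions falls below `threshold`."""

    def __init__(self, window: int, threshold: float):
        # Bounded deque: the oldest outcome is dropped automatically.
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.threshold = threshold

    def observe(self, prediction, label) -> bool:
        """Record one labeled prediction; return True if the alert fires."""
        self.outcomes.append(1 if prediction == label else 0)
        # Stay quiet until the window is full, to avoid noisy early alerts.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.threshold


# Illustrative values only: a 4-prediction window and a 75% accuracy floor.
alert = RollingAccuracyAlert(window=4, threshold=0.75)
stream = [(1, 1), (1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]
fired = [alert.observe(pred, label) for pred, label in stream]
```

In a real deployment the window would more likely be time-based (for example, the last 48 hours) and the labels would often arrive with a delay, which is part of what makes this kind of alert hard to act on.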

The MLEs check the system metrics of the serving infra, but nothing pops out. ✅

Then they make sure that the currently deployed model is the expected one: “model_10_01_2021__10_31_2021.pkl”. ✅

Then they check that the volume of prediction requests has not strayed too far from the trend. ✅

Then they attempt to retrieve the offline model metrics from the model store. They check whether there is a way to compare this model with the previous version deployed last week. ✅

The offline metrics do not look too different on the experimentation tracking dashboards. ✅


