MLOps - Model Inference

Model inference is the process of running a trained model on a server, passing in user data, and getting a response back from the model. Many steps go into running a model in production and serving inferences from it. In this article I will explore some of the tools for serving models.

Let’s first take a look at some of the challenges and important capabilities for any model serving tool:

  1. Pre-processing/post-processing
  2. Prediction
  3. Scaling
  4. Monitoring
  5. Metrics tracking
  6. Canary model rollout
  7. Experiments
  8. Ensembles
  9. Transformers
  10. Model drift

Let’s look at KFServing, the model serving capability in Kubeflow. KFServing defines a custom resource definition (CRD), the InferenceService, which supports canary model rollouts, lets you define input pre-processing and output post-processing steps as Docker containers (transformers) in the CR, and lets you specify a custom Docker image for the prediction step.
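For the transformer piece, the kfserving Python SDK exposes a KFModel base class whose preprocess and postprocess hooks run around the prediction call. The sketch below is a minimal illustration of that interface; the model name, predictor host, and the scale_instance helper are placeholders I am assuming for the example, so check them against the SDK version you are using.

```python
from typing import Dict, List

import kfserving


def scale_instance(instance: List[float]) -> List[float]:
    # Placeholder pre-processing step: scale raw pixel values into [0, 1].
    return [float(x) / 255.0 for x in instance]


class ExampleTransformer(kfserving.KFModel):
    """Runs pre/post-processing in its own container in front of the predictor."""

    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host  # prediction calls are proxied to this host

    def preprocess(self, inputs: Dict) -> Dict:
        # Convert the raw user payload into the tensor format the predictor expects.
        return {"instances": [scale_instance(i) for i in inputs["instances"]]}

    def postprocess(self, inputs: Dict) -> Dict:
        # Pass the predictor's response through unchanged (reshape here if needed).
        return inputs


if __name__ == "__main__":
    transformer = ExampleTransformer("my-model", predictor_host="my-model-predictor-default")
    kfserving.KFServer().start(models=[transformer])
```

The image built from a script like this is what you reference in the transformer section of the InferenceService CR.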

  • Canary model rollout:

Using KFServing, you can specify a canary model (the new model) and the percentage of traffic the canary should receive. The percentage can then be increased, or the rollout updated through the CR, to shift more traffic to the new pods running the new model in production.
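As an illustration, here is a rough sketch of creating an InferenceService with a canary using the kfserving v1alpha2 Python SDK; the storage URIs, names, and namespace are placeholders, and the exact classes depend on the SDK/CRD version you have installed.

```python
from kubernetes import client

from kfserving import (KFServingClient, V1alpha2EndpointSpec, V1alpha2PredictorSpec,
                       V1alpha2TensorflowSpec, V1alpha2InferenceServiceSpec,
                       V1alpha2InferenceService, constants)

# Current production model (default) and the new candidate model (canary).
default_spec = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        tensorflow=V1alpha2TensorflowSpec(storage_uri="gs://my-bucket/models/v1")))
canary_spec = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        tensorflow=V1alpha2TensorflowSpec(storage_uri="gs://my-bucket/models/v2")))

isvc = V1alpha2InferenceService(
    api_version=constants.KFSERVING_GROUP + "/" + constants.KFSERVING_VERSION,
    kind=constants.KFSERVING_KIND,
    metadata=client.V1ObjectMeta(name="my-model", namespace="kubeflow"),
    spec=V1alpha2InferenceServiceSpec(
        default=default_spec,
        canary=canary_spec,
        canary_traffic_percent=10))  # send 10% of traffic to the canary

KFServingClient().create(isvc)
```

Once the canary looks healthy, the canary traffic percentage can be raised (or the canary promoted to the default) by patching the same CR.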

  • Model ensembles:

  • Model drift:

Once the model is launched in production, it might see data that was not represented in the training data, and its performance might not be what was expected. This situation, where the live data drifts away from the data on which the model was trained and evaluated, is known as model drift.
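Detecting drift usually comes down to monitoring the distribution of incoming features (or of the model's outputs) against the distribution seen at training time. The sketch below is a generic illustration, not a KFServing feature: it uses a two-sample Kolmogorov-Smirnov test from scipy to flag when a production feature column has drifted from its training distribution.

```python
import numpy as np
from scipy.stats import ks_2samp


def has_drifted(train_feature: np.ndarray, live_feature: np.ndarray,
                p_threshold: float = 0.01) -> bool:
    """Flag drift when the live values of a feature differ significantly
    from the training values (two-sample Kolmogorov-Smirnov test)."""
    _, p_value = ks_2samp(train_feature, live_feature)
    return p_value < p_threshold


# Example: compare a feature column observed in production against training data.
rng = np.random.default_rng(0)
train_col = rng.normal(loc=0.0, scale=1.0, size=10_000)
live_col = rng.normal(loc=0.5, scale=1.0, size=1_000)  # shifted distribution
print(has_drifted(train_col, live_col))  # True -> the feature has drifted
```

In practice a check like this would run on a rolling window of production traffic and alert (or trigger retraining) when drift is detected.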
