1.2.0
What's Changed
Simplified Interface for Custom Runtimes
MLServer now exposes an alternative “simplified” interface which can be used to write custom runtimes. This interface can be enabled by decorating your predict() method with the mlserver.codecs.decode_args decorator, and it lets you specify in the method signature both how you want your request payload to be decoded and how to encode the response back.
Based on the information provided in the method signature, MLServer will automatically decode the request payload into the different inputs specified as keyword arguments. Under the hood, this is implemented through MLServer’s codecs and content types system.
    from typing import List

    import numpy as np

    from mlserver import MLModel
    from mlserver.codecs import decode_args


    class MyCustomRuntime(MLModel):
        async def load(self) -> bool:
            # TODO: Replace with custom logic to load a model artifact
            self._model = load_my_custom_model()
            self.ready = True
            return self.ready

        @decode_args
        async def predict(self, questions: List[str], context: List[str]) -> np.ndarray:
            # TODO: Replace with custom logic to run inference
            return self._model.predict(questions, context)
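To illustrate the idea of decoding a payload from a method signature, here is a minimal, self-contained sketch. It is not MLServer's implementation and does not use its codecs: the `decode_by_signature` decorator, the `EchoRuntime` class and the dictionary-shaped payload are all hypothetical, and the method is synchronous only to keep the example short.

```python
import functools
from typing import Any, Callable, Dict, List, get_type_hints


def decode_by_signature(func: Callable) -> Callable:
    """Hypothetical stand-in for the idea behind decode_args (not MLServer code)."""
    # Read the annotated parameters once, ignoring the return annotation.
    hints = get_type_hints(func)
    hints.pop("return", None)

    @functools.wraps(func)
    def wrapper(self, payload: Dict[str, Any]):
        kwargs = {}
        for name, hint in hints.items():
            if name not in payload:
                raise KeyError(f"missing input: {name}")
            raw = payload[name]
            # Toy "codec": inputs annotated as List[str] are coerced to strings;
            # anything else is passed through untouched.
            kwargs[name] = [str(x) for x in raw] if hint == List[str] else raw
        return func(self, **kwargs)

    return wrapper


class EchoRuntime:
    @decode_by_signature
    def predict(self, questions: List[str], context: List[str]) -> List[str]:
        return [f"{q} | {c}" for q, c in zip(questions, context)]


# The raw payload is mapped onto keyword arguments by parameter name.
result = EchoRuntime().predict({"questions": ["who?"], "context": ["nobody"]})
```

In MLServer itself the equivalent mapping goes through its codecs and content types, so the decoding rules are richer than the single type check shown here.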
Built-in Templates for Custom Runtimes
To make it easier to write your own custom runtimes, MLServer now ships with a mlserver init command that will generate a templated project. This project will include a skeleton with folders, unit tests, Dockerfiles, etc. for you to fill.
Dynamic Loading of Custom Runtimes
MLServer now lets you load custom runtimes dynamically into a running instance of MLServer. Once you have your custom runtime ready, all you need to do is move it to your model folder, next to your model-settings.json configuration file.
For example, if we assume a flat model repository where each folder represents a model, you would end up with a folder structure like the one below:
.
├── models
│   └── sum-model
│       ├── model-settings.json
│       ├── models.py
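As a rough sketch, the layout above could be produced with a few lines of Python. The example assumes that the runtime class inside models.py is called MyCustomRuntime and that the `implementation` field of model-settings.json holds its import path; both values are illustrative, and the repository is created in a temporary directory.

```python
import json
import tempfile
from pathlib import Path

# Create a throwaway model repository matching the folder structure above.
repo_root = Path(tempfile.mkdtemp())
model_dir = repo_root / "models" / "sum-model"
model_dir.mkdir(parents=True)

settings = {
    "name": "sum-model",
    # Assumed import path: module models.py, class MyCustomRuntime.
    "implementation": "models.MyCustomRuntime",
}
(model_dir / "model-settings.json").write_text(json.dumps(settings, indent=2))

# Placeholder for the custom runtime code that sits next to the settings file.
(model_dir / "models.py").write_text("# custom runtime code goes here\n")

created = sorted(p.name for p in model_dir.iterdir())
```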
Batch Inference Client
This release of MLServer introduces a new mlserver infer command, which lets you run inference over a large batch of input data on the client side. Under the hood, this command streams a large set of inference requests from a specified input file, arranges them in microbatches, orchestrates the request / response lifecycle, and finally writes the obtained responses to an output file.
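The microbatching step described above can be pictured with a small generator. This is only an illustration of the technique, not the code behind mlserver infer: the `microbatches` helper is hypothetical, and it assumes an input with one JSON-encoded request per line.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def microbatches(lines: Iterable[str], batch_size: int) -> Iterator[List[dict]]:
    """Lazily group JSON lines into lists of at most batch_size requests."""
    # Parse lazily and skip blank lines, so the whole input never sits in memory.
    requests = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(requests, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for the lines of an input file (hypothetical request bodies).
sample = ['{"id": 1}', '{"id": 2}', '', '{"id": 3}']
batches = list(microbatches(sample, batch_size=2))
```

Because the generator accepts any iterable of strings, an open file handle can be passed directly in place of `sample`, and each yielded batch can then be sent as one unit of work.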
Parallel Inference Improvements
The 1.2.0 release of MLServer includes a number of fixes around the parallel inference pool, focused on improving the architecture to optimise memory usage and reduce latency. These changes include (but are not limited to):
- The main MLServer process won’t load an extra replica of the model anymore. Instead, all computing will occur on the parallel inference pool.
- The worker pool will now ensure that all requests are executed on each worker’s AsyncIO loop, thus optimising compute time vs IO time.
- Several improvements around logging from the inference workers.
Dropped support for Python 3.7
MLServer has now dropped support for Python 3.7. Going forward, only 3.8, 3.9 and 3.10 will be supported (with 3.8 being used in our official set of images).
Move to UBI Base Images
The official set of MLServer images now uses UBI 9 as a base image. This ensures support for running MLServer in OpenShift clusters, and provides a well-maintained baseline for our images.
Support for MLflow 2.0
In line with MLServer’s close relationship with the MLflow team, this release of MLServer introduces support for the recently released MLflow 2.0. This introduces changes to the drop-in MLflow “scoring protocol” support, in the MLflow runtime for MLServer, to ensure it’s aligned with MLflow 2.0.
MLServer is also shipped as a dependency of MLflow, therefore you can try it out today by installing MLflow as:
$ pip install mlflow[extras]
To learn more about how to use MLServer directly from the MLflow CLI, check out the MLflow docs.
New Contributors
- @johnpaulett made their first contribution in #633
- @saeid93 made their first contribution in #711
- @RafalSkolasinski made their first contribution in #720
- @dumaas made their first contribution in #742
- @Salehbigdeli made their first contribution in #776
- @regen100 made their first contribution in #839
Full Changelog: 1.1.0...1.2.0