Exposes a serialized machine learning model through an HTTP API written in Java.
This project is under active development.
The purpose of this project is to expose a generic HTTP API from serialized machine learning models.
Supported serialized models are:
- ONNX 1.5
- TensorFlow <=1.15 SavedModel or HDF5
- HuggingFace Tokenizer
- Maven for compiling the project
- Java 11 for running the project
The HDF5 serialization format is supported through a conversion into the SavedModel format. That conversion relies on the following dependencies:
- Python 3.7
- TensorFlow <=1.15 (pip install tensorflow)
For the HuggingFace tokenizer:
- Cargo (Rust stable)
If you use the API from the Docker image, this step is not necessary as it will be built within the image.
The TensorFlow module supports HDF5 files through an executable, h5_converter, which exports the model from an HDF5 file to a TensorFlow SavedModel (.pb).
To generate the converter, simply use the initialize_tensorflow goal of the Makefile:
make initialize_tensorflow
The generated executable can be found here: evaluator-tensorflow/h5_converter/dist/h5_converter
To build the Java binding, use the initialize_huggingface goal of the Makefile:
make initialize_huggingface
To install libtorch, use the initialize_torch goal of the Makefile:
make initialize_torch
Convert pyTorch model and more
Several profiles are available depending on the support you require for the built project.
- full: includes both TensorFlow and ONNX. Requires the ONNX support, HDF5 support and Torch support.
- tensorflow: only includes TensorFlow. Requires the HDF5 support.
- onnx: only includes ONNX. Requires the ONNX support.
- torch: only includes Torch. Requires the Torch support.
Set your desired profile:
export MAVEN_PROFILE=<your-profile>
If not specified, the default profile is set to full.
make test MAVEN_PROFILE=$MAVEN_PROFILE
make build MAVEN_PROFILE=$MAVEN_PROFILE
The JAR can then be found in api/target/api-*.jar
In the following command, replace <jar-path> with the path to your compiled JAR and <model-path> with the directory containing your serialized model.
java -Dfiles.path=<model-path> -jar <jar-path>
If you wish to load a model from an HDF5 file, you will need to specify the path to the executable generated in HDF5 support.
java -Dfiles.path=<model-path> -Devaluator.tensorflow.h5_converter.path=<path-to-h5-converter> -jar <jar-path>
Inside <model-path>, it will look for the first file ending with:
- .onnx for an ONNX model
- .pb for a TensorFlow SavedModel
- .h5 for an HDF5 model
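To check a model directory before launching the server, the lookup rule above can be approximated on the client side. This is only a sketch: the function name find_model_file is hypothetical, and scanning files in sorted name order is an assumption, since the order the server actually uses is not stated here.

```python
from pathlib import Path
from typing import Optional, Tuple

# Extensions listed in this README, mapped to the kind of model they indicate.
MODEL_EXTENSIONS = {
    ".onnx": "ONNX",
    ".pb": "TensorFlow SavedModel",
    ".h5": "HDF5",
}


def find_model_file(model_dir: str) -> Optional[Tuple[Path, str]]:
    """Return (path, kind) for the first recognised model file, or None.

    Assumption: files are visited in sorted name order and only the top
    level of the directory is inspected.
    """
    for path in sorted(Path(model_dir).iterdir()):
        kind = MODEL_EXTENSIONS.get(path.suffix)
        if path.is_file() and kind is not None:
            return path, kind
    return None
```

A result with kind "HDF5" is a hint that the h5_converter path also has to be passed on the launch command.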
On the launch command you can also specify the following parameters:
- -Dserver.port: the port the HTTP server listens on
- -Dswagger.title: the title that will be displayed on the Swagger UI
- -Dswagger.description: the description that will be displayed on the Swagger UI
make docker-build-api MAVEN_PROFILE=$MAVEN_PROFILE
It will build the docker image serving-runtime-$MAVEN_PROFILE:latest
In the following command, replace <model-path> with the absolute path to the directory containing your serialized model.
docker run --rm -it -p 8080:8080 -v <model-path>:/deployments/models serving-runtime-$MAVEN_PROFILE:latest
By default the API will be running on http://localhost:8080. Reaching this URL in your browser will display the Swagger UI describing the API for your model.
There are 2 routes available for each model:
- /describe: describes your model (its inputs, outputs and transformations)
- /eval: sends the expected inputs to the model and returns the expected outputs
Each serialized model takes a list of named tensors as inputs and also returns a list of named tensors as outputs.
A named tensor is an N-dimensional array with:
- An identifier name. Example: my-tensor-name
- A data type. Example: integer or double or string
- A shape. Example: (5) for a vector of length 5, (3, 2) for a matrix whose first dimension is of size 3 and second dimension is of size 2, etc.
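The three attributes above can be modelled on the client side as a small data class. This is an illustrative sketch, not the project's own representation: NamedTensor and infer_shape are hypothetical names, and the shape is inferred from a nested list that is assumed to be regular (every sub-list at a given depth has the same length).

```python
from dataclasses import dataclass
from typing import Any, Tuple


def infer_shape(values: Any) -> Tuple[int, ...]:
    """Return the shape of a regular nested list; a scalar has shape ()."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        if not values:
            break
        values = values[0]  # assumes all siblings have the same length
    return tuple(shape)


@dataclass
class NamedTensor:
    name: str    # identifier, e.g. "my-tensor-name"
    dtype: str   # data type, e.g. "integer", "double" or "string"
    values: Any  # nested lists holding the tensor content

    @property
    def shape(self) -> Tuple[int, ...]:
        return infer_shape(self.values)
```

For instance, NamedTensor("my-tensor-name", "double", [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]).shape evaluates to (3, 2).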
You can get the model inputs and outputs by calling the HTTP GET method on the /describe path of the model.
curl \
-X GET \
http://<your-model-url>/describe
You will get a JSON object describing the list of input tensors needed to query your model as well as the list of output tensors that will be returned.
{
"inputs": [
{
"name": "sepal_length",
"type": "float",
"shape": [-1]
},
{
"name": "sepal_width",
"type": "float",
"shape": [-1]
},
{
"name": "petal_length",
"type": "float",
"shape": [-1]
},
{
"name": "petal_width",
"type": "float",
"shape": [-1]
}
],
"outputs": [
{
"name": "output_label",
"type": "long",
"shape": [-1]
},
{
"name": "output_probability",
"type": "float",
"shape": [-1, 2]
}
]
}
In this example, the deployed model expects 4 tensors as inputs:
- sepal_length of shape (-1) (i.e. a vector of any size)
- sepal_width of shape (-1) (i.e. a vector of any size)
- petal_length of shape (-1) (i.e. a vector of any size)
- petal_width of shape (-1) (i.e. a vector of any size)
It will answer with 2 tensors as outputs:
- output_label of shape (-1) (i.e. a vector of any size)
- output_probability of shape (-1, 2) (i.e. a matrix whose first dimension is of any size and whose second dimension is of size 2)
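The -1 notation can be read as a wildcard when comparing a declared shape with the shape of actual data. The helper below is a minimal sketch of that reading; shape_matches is a hypothetical name and this check is not claimed to mirror the validation performed by the server.

```python
from typing import Sequence


def shape_matches(declared: Sequence[int], actual: Sequence[int]) -> bool:
    """Return True when `actual` fits `declared`, treating -1 as "any size"."""
    if len(declared) != len(actual):
        return False
    return all(d == -1 or d == a for d, a in zip(declared, actual))


# Shapes taken from the /describe example above.
print(shape_matches([-1], [7]))         # a vector of any size
print(shape_matches([-1, 2], [10, 2]))  # any number of rows, exactly 2 columns
print(shape_matches([-1, 2], [10, 3]))  # second dimension must be 2
```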
Once you know what kind of input tensors are needed by the model, fill the body of your HTTP query with your chosen tensor representation (see below) and send it to the model with a POST method on the path /eval.
Two headers are available for your query:
- The Content-Type header indicating the media type of the input tensor data contained in your message body.
- The (optional) Accept header indicating which media type you want to receive for the output tensors in the response body. If you don't provide one, the default Accept header will be application/json.
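As an illustration of how the two headers fit into a query, here is a sketch that prepares a POST request for /eval with Python's standard urllib. It assumes a JSON body; build_eval_request and the base URL used below are hypothetical, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Any, Dict


def build_eval_request(base_url: str, tensors: Dict[str, Any],
                       accept: str = "application/json") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the /eval route."""
    body = json.dumps(tensors).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/eval",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # media type of the body
            "Accept": accept,                    # wanted media type of the response
        },
    )


request = build_eval_request("http://localhost:8080", {"my-tensor-name": [1, 2, 3]})
print(request.get_method(), request.full_url)
```

Calling urllib.request.urlopen(request) against a running instance would then send the query.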
Supported values for the Content-Type header:
- application/json: A JSON document whose keys are the input tensor names and whose values are the n-dimensional JSON arrays matching your tensors.
- image/png: Byte content representing a PNG-encoded image.
- image/jpeg: Byte content representing a JPEG-encoded image.
image/png and image/jpeg are only available for models taking a single tensor as input. That tensor's shape should also be compatible with an image representation.
- multipart/form-data: A multipart body, each part of which is named by an input tensor.
Each part (i.e. tensor) in the multipart should have its own Content-Type.
Supported values for the Accept header:
- application/json: A JSON document whose keys are the output tensor names and whose values are the n-dimensional JSON arrays matching your tensors.
- image/png: Byte content representing a PNG-encoded image.
- image/jpeg: Byte content representing a JPEG-encoded image.
image/png and image/jpeg are only available for models returning a single tensor as output. That tensor's shape should also be compatible with an image representation.
- text/html: An HTML document displaying the output tensors representation.
- multipart/form-data: A multipart body, each part of which is named by an output tensor and contains that tensor's JSON representation.
If you want some of the output tensors in a multipart/form-data or text/html response to be interpreted as images, you can specify it as a parameter in the Accept header.
Example: the header text/html; tensor_1=image/png; tensor_2=image/png returns the global response as HTML content. Inside the HTML page, tensor_1 and tensor_2 are displayed as PNG images.
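Such a header value can be assembled programmatically. The helper below is a small sketch (build_accept_header is a hypothetical name) that joins the media type and the per-tensor image parameters in the same form as the example above.

```python
from typing import Dict, Optional


def build_accept_header(media_type: str,
                        image_tensors: Optional[Dict[str, str]] = None) -> str:
    """Join a media type with tensor=image-type parameters, separated by '; '."""
    parts = [media_type]
    for tensor_name, image_type in (image_tensors or {}).items():
        parts.append(f"{tensor_name}={image_type}")
    return "; ".join(parts)


print(build_accept_header("text/html",
                          {"tensor_1": "image/png", "tensor_2": "image/png"}))
# prints: text/html; tensor_1=image/png; tensor_2=image/png
```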
For a tensor to be interpretable as raw image data, it should have a compatible shape in your exported model. Here are the supported ones:
- (x, y, z, 1): Batch of x grayscale images with y pixels height and z pixels width
- (x, 1, y, z): Batch of x grayscale images with y pixels height and z pixels width
- (x, y, z, 3): Batch of x RGB images with y pixels height and z pixels width. The last dimension should be the array of (red, green, blue) components.
- (x, 3, y, z): Batch of x RGB images with y pixels height and z pixels width. The second dimension should be the array of (red, green, blue) components.
- (y, z, 1): Single grayscale image with y pixels height and z pixels width
- (1, y, z): Single grayscale image with y pixels height and z pixels width
- (y, z, 3): Single RGB image with y pixels height and z pixels width. The last dimension should be the array of (red, green, blue) components.
- (3, y, z): Single RGB image with y pixels height and z pixels width. The first dimension should be the array of (red, green, blue) components.
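The list above can be turned into a quick client-side check before requesting an image output. This is a sketch only: describe_image_shape is a hypothetical name, the labels "channels-last" and "channels-first" are this example's own wording, and checking the last dimension first for ambiguous shapes is an assumption rather than documented server behaviour.

```python
from typing import Optional, Sequence


def describe_image_shape(shape: Sequence[int]) -> Optional[str]:
    """Return a label for an image-compatible shape, or None otherwise."""
    if len(shape) == 4:
        prefix, dims = "batch", tuple(shape[1:])  # drop the batch dimension x
    elif len(shape) == 3:
        prefix, dims = "single", tuple(shape)
    else:
        return None
    colours = {1: "grayscale", 3: "RGB"}
    if dims[-1] in colours:  # (y, z, 1) or (y, z, 3)
        return f"{prefix} {colours[dims[-1]]} channels-last"
    if dims[0] in colours:   # (1, y, z) or (3, y, z)
        return f"{prefix} {colours[dims[0]]} channels-first"
    return None


print(describe_image_shape((10, 28, 28, 1)))
print(describe_image_shape((3, 224, 224)))
```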
In the following example, we want to receive a prediction from our model for the following item:
- sepal_length: 0.1
- sepal_width: 0.2
- petal_length: 0.3
- petal_width: 0.4
curl \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-X POST \
-d '{
"sepal_length": 0.1,
"sepal_width": 0.2,
"petal_length": 0.3,
"petal_width": 0.4
}' \
http://<your-model-url>/eval
- HTTP Status code: 200
- Header: Content-Type: application/json
{
"output_label": 0,
"output_probability": [0.88, 0.12]
}
In this example, our model predicts the output_label for our input item to be 0 with the following probabilities:
- 88% chance of being 0
- 12% chance of being 1
In the following example, we want to receive a prediction from our model for the following two items:
First Item
- sepal_length: 0.1
- sepal_width: 0.2
- petal_length: 0.3
- petal_width: 0.4
Second Item
- sepal_length: 0.2
- sepal_width: 0.3
- petal_length: 0.4
- petal_width: 0.5
Query
curl \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-X POST \
-d '{
"sepal_length": [0.1, 0.2],
"sepal_width": [0.2, 0.3],
"petal_length": [0.3, 0.4],
"petal_width": [0.4, 0.5]
}' \
http://<your-model-url>/eval
- HTTP Status code: 200
- Header: Content-Type: application/json
{
"output_label": [0, 1],
"output_probability": [
[0.88, 0.12],
[0.01, 0.99]
]
}
In this example, our model predicts the output_label for our first input item to be 0 with the following probabilities:
- 88% chance of being 0
- 12% chance of being 1
It also predicts the output_label for our second input item to be 1 with the following probabilities:
- 1% chance of being 0
- 99% chance of being 1
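The batched query groups values by tensor name, while the items are naturally described one by one. The two helpers below sketch the conversion in both directions; items_to_batch and split_batch_outputs are hypothetical names, and every item is assumed to carry the same tensor names.

```python
from typing import Any, Dict, List


def items_to_batch(items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of per-item dicts into one list of values per tensor name."""
    if not items:
        return {}
    names = list(items[0])  # assumes all items share these keys
    return {name: [item[name] for item in items] for name in names}


def split_batch_outputs(outputs: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn batched outputs back into one dict per item."""
    names = list(outputs)
    if not names:
        return []
    size = len(outputs[names[0]])
    return [{name: outputs[name][i] for name in names} for i in range(size)]


items = [
    {"sepal_length": 0.1, "sepal_width": 0.2, "petal_length": 0.3, "petal_width": 0.4},
    {"sepal_length": 0.2, "sepal_width": 0.3, "petal_length": 0.4, "petal_width": 0.5},
]
print(items_to_batch(items))
```

Applied to the response above, split_batch_outputs gives one dict per item, each with its own output_label and output_probability.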
- Contribute: https://github.com/ovh/serving-runtime/blob/master/CONTRIBUTING.md
- Report bugs: https://github.com/ovh/serving-runtime/issues
See https://github.com/ovh/serving-runtime/blob/master/LICENSE