Schema enforcement error when using `PandasCodec` inference request #1625

mfueller · 2024-03-06T15:34:29Z

Hi!

My goal is to serve a mlflow model with a signature via mlserver and observed some issue with the signature enforcement and the request generated by PandasCodec.

I followed the example from https://mlserver.readthedocs.io/en/latest/examples/mlflow/README.html

The model signature in the example is inferred by

model_signature = infer_signature(train_x, train_y)

and logged to mlflow via:

mlflow.sklearn.log_model(
                lr,
                "model",
                registered_model_name="ElasticnetWineModel",
                signature=model_signature,
            )

model serving is done with mlserver

mlserver start .

I can test the inference via the given plain json example

import requests

inference_request = {
    "inputs": [
        {
          "name": "fixed acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [7.4],
        },
        {
          "name": "volatile acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [0.7000],
        },
       ...
    ]
}

endpoint = "http://localhost:8080/v2/models/wine-classifier/infer"
response = requests.post(endpoint, json=inference_request)

response.json()

{'model_name': 'ElasticnetWineModel',
 'id': 'c0dbba4c-ac18-43ba-a408-3cf9536931bd',
 'parameters': {'content_type': 'np'},
 'outputs': [{'name': 'output-1',
   'shape': [1, 1],
   'datatype': 'FP64',
   'parameters': {'content_type': 'np'},
   'data': [5.576883936610762]}]}

However, if I create the inference request using PandasCodec like this

inference_request = PandasCodec.encode_request(test_x.head(1))
endpoint = "http://localhost:8080/v2/models/wine-classifier/infer"
response = requests.post(endpoint, json=inference_request.dict())
response.json()

I get the following error response:

{'error': "mlflow.exceptions.MlflowException: Failed to enforce schema of data '  fixed acidity volatile acidity citric acid  ...       pH sulphates  alcohol\n0       (10.1,)          (0.37,)     (0.34,)  ...  (3.17,)   (0.65,)  (10.6,)\n\n[1 rows x 11 columns]' with schema '['fixed acidity': double (required), 'volatile acidity': double (required), 'citric acid': double (required), 'residual sugar': double (required), 'chlorides': double (required), 'free sulfur dioxide': double (required), 'total sulfur dioxide': double (required), 'density': double (required), 'pH': double (required), 'sulphates': double (required), 'alcohol': double (required)]'. Error: Invalid object type at position 0"}

and mlserver shows the following stack trace:

endpoint-1  | 2024-03-06 15:09:15,804 [mlserver.parallel] ERROR - An error occurred calling method 'predict' from model 'ElasticnetWineModel'.
endpoint-1  | Traceback (most recent call last):
endpoint-1  |   File "lib.pyx", line 2374, in pandas._libs.lib.maybe_convert_numeric
endpoint-1  | TypeError: Invalid object type
endpoint-1  |
endpoint-1  | During handling of the above exception, another exception occurred:
endpoint-1  |
endpoint-1  | Traceback (most recent call last):
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlflow/pyfunc/__init__.py", line 471, in predict
endpoint-1  |     data = _enforce_schema(data, input_schema)
endpoint-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlflow/models/utils.py", line 954, in _enforce_schema
endpoint-1  |     return _enforce_named_col_schema(pf_input, input_schema)
endpoint-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlflow/models/utils.py", line 673, in _enforce_named_col_schema
endpoint-1  |     new_pf_input[name] = _enforce_mlflow_datatype(name, pf_input[name], input_type)
endpoint-1  |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlflow/models/utils.py", line 585, in _enforce_mlflow_datatype
endpoint-1  |     return pd.to_numeric(values, errors="raise")
endpoint-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/pandas/core/tools/numeric.py", line 222, in to_numeric
endpoint-1  |     values, new_mask = lib.maybe_convert_numeric(  # type: ignore[call-overload]  # noqa: E501
endpoint-1  |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
endpoint-1  |   File "lib.pyx", line 2416, in pandas._libs.lib.maybe_convert_numeric
endpoint-1  | TypeError: Invalid object type at position 0
endpoint-1  |
endpoint-1  | During handling of the above exception, another exception occurred:
endpoint-1  |
endpoint-1  | Traceback (most recent call last):
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlserver/parallel/worker.py", line 136, in _process_request
endpoint-1  |     return_value = await method(
endpoint-1  |                    ^^^^^^^^^^^^^
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlserver_mlflow/runtime.py", line 199, in predict
endpoint-1  |     model_output = self._model.predict(decoded_payload)
endpoint-1  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
endpoint-1  |   File "/usr/local/lib/python3.11/site-packages/mlflow/pyfunc/__init__.py", line 474, in predict
endpoint-1  |     raise MlflowException.invalid_parameter_value(
endpoint-1  | mlflow.exceptions.MlflowException: Failed to enforce schema of data '  fixed acidity volatile acidity citric acid  ...       pH sulphates  alcohol
endpoint-1  | 0       (10.1,)          (0.37,)     (0.34,)  ...  (3.17,)   (0.65,)  (10.6,)
endpoint-1  |
endpoint-1  | [1 rows x 11 columns]' with schema '['fixed acidity': double (required), 'volatile acidity': double (required), 'citric acid': double (required), 'residual sugar': double (required), 'chlorides': double (required), 'free sulfur dioxide': double (required), 'total sulfur dioxide': double (required), 'density': double (required), 'pH': double (required), 'sulphates': double (required), 'alcohol': double (required)]'. Error: Invalid object type at position 0

I had a look on the difference between the plain json schema and the resulting PandasCodec schema. It seems that the shape attribute for the input is different:

plain json example

...
{
          "name": "fixed acidity",
          "shape": [1],
          "datatype": "FP32",
          "data": [7.4],
}
...

PandasCodec result

inference_request = PandasCodec.encode_request(test_x.head(1))
inference_request.dict()

{'parameters': {'content_type': 'pd'},
 'inputs': [{'name': 'fixed acidity',
   'shape': [1, 1],
   'datatype': 'FP64',
   'data': [10.1]},
  {'name': 'volatile acidity',
...

The shape is [1] in the plain json example and [1,1] in the resulting PandasCodec request.

I fixed the shape information and tested it successfully with:

inference_request = PandasCodec.encode_request(test_x.head(1))

for key in inference_request.inputs:
    key.shape = [key.shape[0]]
    
inference_request.dict()

{'parameters': {'content_type': 'pd'},
 'inputs': [{'name': 'fixed acidity',
   'shape': [1],
   'datatype': 'FP64',
   'data': [10.1]},
...

endpoint = "http://localhost:8080/v2/models/ElasticnetWineModel/infer"
response = requests.post(endpoint, json=inference_request.dict())

response.json()

{'model_name': 'ElasticnetWineModel',
 'id': '5aa69a4a-864b-49cb-99fa-5d2b1afedb2e',
 'parameters': {'content_type': 'np'},
 'outputs': [{'name': 'output-1',
   'shape': [1, 1],
   'datatype': 'FP64',
   'parameters': {'content_type': 'np'},
   'data': [5.731344540042413]}]}

Version Infos:

python version: '3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801]'
mlflow version: '2.10.2'
mlserver version: '1.4.0'

Is there any issue in the generated shape information generated by PandasCodec or have I missed something?

Thanks a lot for any help!

The text was updated successfully, but these errors were encountered:

ReveStobinson · 2024-04-11T16:40:33Z

I have also had this issue, and it's unclear from the docs if this is intended or not. There is a disclaimer about this in the Numpy Array section of the docs, but it is not present in the Pandas DataFrame section just after it, and the presented JSON Payload has other errors that make it unreliable for determining what the correct value should be (I have an MR out for that docs change at #1679)

jesse-c · 2024-05-16T10:51:02Z

We've merged in a contribution in #1751 that should fix this. If you're able to use MLServer from master to get the latest change, could you please let us know if it fixes the issue for you.

ReveStobinson · 2024-05-20T17:14:14Z

We've merged in a contribution in #1751 that should fix this. If you're able to use MLServer from master to get the latest change, could you please let us know if it fixes the issue for you.

@jesse-c I just had a chance to test this with the new changes. TL;DR is that this works! But not quite in the way I expected it to, and I didn't think it would when I first serialized the new request.

Expand for more.

Using the same code snippet, the inference request looked the same with both the MLServer 1.5.0 release, and the current state of master:

import pandas as pd
import mlserver.grpc.converters as converters
from mlserver.codecs import PandasCodec

example_dict = {
    "input1": [132.6454, 131.315],
    "input2": [2.78412, 1.315],
    "input3": [12.9, 35.6687]
}
data = pd.DataFrame(example_dict)

inference_request = PandasCodec.encode_request(data, inject_batch=True)
grpc_inference_request = converters.ModelInferRequestConverter.from_types(inference_request,
                                                                          model_name=model_name, model_version=None)
inference_request.inputs

Output:

[RequestInput(name='input1', shape=[2, 1], datatype='FP64', parameters=None, data=TensorData(__root__=[132.6454, 131.315])),
 RequestInput(name='input2', shape=[2, 1], datatype='FP64', parameters=None, data=TensorData(__root__=[2.78412, 1.315])),
 RequestInput(name='input3', shape=[2, 1], datatype='FP64', parameters=None, data=TensorData(__root__=[12.9, 35.6687]))]

In 1.5.0, this would result in an error when trying to send it to the inference server, so I would have to amend the code snippet above to add two short lines to reshape the inputs:

inference_request = PandasCodec.encode_request(data, inject_batch=True)
for i in inference_request.inputs:
    i.shape = [i.shape[0]]

Which would yield a request that looked like this:

[RequestInput(name='input1', shape=[2], datatype='FP64', parameters=None, data=TensorData(root=[132.6454, 131.315])),
 RequestInput(name='input2', shape=[2], datatype='FP64', parameters=None, data=TensorData(root=[2.78412, 1.315])),
 RequestInput(name='input3', shape=[2], datatype='FP64', parameters=None, data=TensorData(root=[12.9, 35.6687]))]

And that version of the request would work in 1.5.0.

With the new changes on master, it appears that both requests are acceptable and yield the same results. I was expecting to see the PandasCodec change the input shapes to be 1-dimensional given 1-D input data, but it was implemented on the inference runtime side instead.

This change is a welcome one for sure, since it removes the need for those workaround lines in code that might have to handle multiple request types! But I wasn't sure it was going to work when I noticed that my requests looked the same as they did before, so that might be beneficial to mention in release notes somewhere.

Thank you for this!

This was referenced Apr 11, 2024

Docs specify Pandas DataFrame Codec input shapes incorrectly. #1678

Closed

Fix JSON input shapes #1679

Merged

lhnwrk mentioned this issue May 9, 2024

Fix Pandas codec decoding from numpy arrays #1751

Merged

jesse-c added bug Something isn't working codecs labels May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Schema enforcement error when using `PandasCodec` inference request #1625

Schema enforcement error when using `PandasCodec` inference request #1625

mfueller commented Mar 6, 2024

ReveStobinson commented Apr 11, 2024

jesse-c commented May 16, 2024

ReveStobinson commented May 20, 2024

Schema enforcement error when using PandasCodec inference request #1625

Schema enforcement error when using PandasCodec inference request #1625

Comments

mfueller commented Mar 6, 2024

Version Infos:

ReveStobinson commented Apr 11, 2024

jesse-c commented May 16, 2024

ReveStobinson commented May 20, 2024

Schema enforcement error when using `PandasCodec` inference request #1625

Schema enforcement error when using `PandasCodec` inference request #1625