Commit
Add parsing parameters to the HTTP and GRPC frontends (triton-inference-server#5490)

* Add parsing parameters to the HTTP frontend
* Add parameters to the GRPC server
* Add testing for parameters
* Fix up
* Add testing for async and streaming
* Add documentation and reserved parameters list
* Modify based on feedback
Showing 7 changed files with 522 additions and 51 deletions.
@@ -0,0 +1,87 @@
<!--
# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Parameters Extension

This document describes Triton's parameters extension. The parameters
extension allows an inference request to provide custom parameters that
cannot be provided as inputs. Because this extension is supported, Triton
reports "parameters" in the extensions field of its Server Metadata. This
extension uses the optional "parameters" field in the KServe Protocol in
[HTTP](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/#inference-request-json-object)
and
[GRPC](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/#parameters).

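Because support is advertised in Server Metadata, a client can check for the
extension before relying on it. A minimal sketch using the tritonclient
Python HTTP client (the server address is an assumption):

```
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Server Metadata lists the protocol extensions the server supports;
# "parameters" appears in this list when the extension is available.
metadata = client.get_server_metadata()
assert "parameters" in metadata["extensions"]
```
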
The following parameters are reserved for Triton's usage and should not be
used as custom parameters:

- sequence_id
- priority
- timeout
- sequence_start
- sequence_end
- All keys that start with the "triton_" prefix.
- headers

This applies to both the GRPC and HTTP endpoints: make sure not to use any of
the reserved parameters above, to avoid unexpected behavior. The reserved
parameters are not accessible in the Triton C-API.

## HTTP/REST

The following example shows how a request can include custom parameters.

```
POST /v2/models/mymodel/infer HTTP/1.1
Host: localhost:8000
Content-Type: application/json
Content-Length: <xx>
{
  "parameters" : { "my_custom_parameter" : 42 },
  "inputs" : [
    {
      "name" : "input0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs" : [
    {
      "name" : "output0"
    }
  ]
}
```
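
The same request can be issued from the tritonclient Python package, which
exposes this field through the `parameters` keyword argument. A sketch, with
the model name and input taken from the test model added later in this
commit:

```
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One FP32 input with a single element, matching the example model.
inputs = [httpclient.InferInput("INPUT0", [1], "FP32")]
inputs[0].set_data_from_numpy(np.asarray([1], dtype=np.float32))

# Custom parameters ride along with the request via the `parameters`
# keyword argument.
result = client.infer(model_name="parameter",
                      inputs=inputs,
                      parameters={"my_custom_parameter": 42})
```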

## GRPC

The `parameters` field in the ModelInferRequest message can be used to send
custom parameters.
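
The Python GRPC client accepts the same `parameters` keyword and packs it
into the ModelInferRequest message. A sketch mirroring the HTTP example
above (endpoint and model name are the ones used by the tests in this
commit):

```
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

inputs = [grpcclient.InferInput("INPUT0", [1], "FP32")]
inputs[0].set_data_from_numpy(np.asarray([1], dtype=np.float32))

# The client serializes this dict into the `parameters` field of the
# ModelInferRequest protobuf message.
result = client.infer(model_name="parameter",
                      inputs=inputs,
                      parameters={"my_custom_parameter": 42})
```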
@@ -0,0 +1,67 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import triton_python_backend_utils as pb_utils
import numpy as np


class TritonPythonModel:

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        inputs = [{'name': 'INPUT0', 'data_type': 'TYPE_FP32', 'dims': [1]}]
        outputs = [{'name': 'OUTPUT0', 'data_type': 'TYPE_STRING', 'dims': [1]}]

        config = auto_complete_model_config.as_dict()
        input_names = []
        output_names = []
        for input in config['input']:
            input_names.append(input['name'])
        for output in config['output']:
            output_names.append(output['name'])

        # Only add the expected input/output if the config does not
        # already define them.
        for input in inputs:
            if input['name'] not in input_names:
                auto_complete_model_config.add_input(input)
        for output in outputs:
            if output['name'] not in output_names:
                auto_complete_model_config.add_output(output)

        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config

    def execute(self, requests):
        # A simple model that puts the parameters in the request in the
        # output.
        responses = []
        for request in requests:
            output0 = np.asarray([request.parameters()], dtype=object)
            output_tensor = pb_utils.Tensor("OUTPUT0", output0)
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[output_tensor])
            responses.append(inference_response)

        return responses
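
Note that `request.parameters()` returns the request's parameters serialized
as a JSON string, which is why the test below decodes OUTPUT0 with
`json.loads` before comparing. A tiny illustration of that round trip (the
literal string here is hypothetical):

```
import json

# What the model writes into OUTPUT0 looks like:
serialized = '{"key1": "value1", "key2": "value2"}'

# ...and the client recovers the original parameters with:
assert json.loads(serialized) == {"key1": "value1", "key2": "value2"}
```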
@@ -0,0 +1,184 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import sys

sys.path.append("../common")

import numpy as np
import infer_util as iu
import test_util as tu
import tritonclient.http as httpclient
import tritonclient.grpc as grpcclient
import tritonclient.http.aio as asynchttpclient
import tritonclient.grpc.aio as asyncgrpcclient
from tritonclient.utils import InferenceServerException
from unittest import IsolatedAsyncioTestCase
import json
import unittest
import queue
from functools import partial


class InferenceParametersTest(IsolatedAsyncioTestCase):

    async def asyncSetUp(self):
        self.http = httpclient.InferenceServerClient(url='localhost:8000')
        self.async_http = asynchttpclient.InferenceServerClient(
            url='localhost:8000')
        self.grpc = grpcclient.InferenceServerClient(url='localhost:8001')
        self.async_grpc = asyncgrpcclient.InferenceServerClient(
            url='localhost:8001')

        self.parameter_list = []
        self.parameter_list.append({'key1': 'value1', 'key2': 'value2'})
        self.parameter_list.append({'key1': 1, 'key2': 2})
        self.parameter_list.append({'key1': True, 'key2': 'value2'})
        self.parameter_list.append({'triton_': True, 'key2': 'value2'})

        def callback(user_data, result, error):
            if error:
                user_data.put(error)
            else:
                user_data.put(result)

        self.grpc_callback = callback

    def create_inputs(self, client_type):
        inputs = []
        inputs.append(client_type.InferInput('INPUT0', [1], "FP32"))

        # Initialize the data
        inputs[0].set_data_from_numpy(np.asarray([1], dtype=np.float32))
        return inputs

    async def send_request_and_verify(self,
                                      client_type,
                                      client,
                                      is_async=False):

        inputs = self.create_inputs(client_type)
        for parameters in self.parameter_list:
            # The `triton_` prefix is reserved for Triton usage
            should_error = False
            if 'triton_' in parameters.keys():
                should_error = True

            if is_async:
                if should_error:
                    with self.assertRaises(InferenceServerException):
                        result = await client.infer(model_name='parameter',
                                                    inputs=inputs,
                                                    parameters=parameters)
                    return
                else:
                    result = await client.infer(model_name='parameter',
                                                inputs=inputs,
                                                parameters=parameters)

            else:
                if should_error:
                    with self.assertRaises(InferenceServerException):
                        result = client.infer(model_name='parameter',
                                              inputs=inputs,
                                              parameters=parameters)
                    return
                else:
                    result = client.infer(model_name='parameter',
                                          inputs=inputs,
                                          parameters=parameters)

            self.verify_outputs(result, parameters)

    def verify_outputs(self, result, parameters):
        result = result.as_numpy('OUTPUT0')
        self.assertEqual(json.loads(result[0]), parameters)

    async def test_grpc_parameter(self):
        await self.send_request_and_verify(grpcclient, self.grpc)

    async def test_http_parameter(self):
        await self.send_request_and_verify(httpclient, self.http)

    async def test_async_http_parameter(self):
        await self.send_request_and_verify(asynchttpclient,
                                           self.async_http,
                                           is_async=True)

    async def test_async_grpc_parameter(self):
        await self.send_request_and_verify(asyncgrpcclient,
                                           self.async_grpc,
                                           is_async=True)

    def test_http_async_parameter(self):
        inputs = self.create_inputs(httpclient)
        # Skip the parameter that returns an error
        parameter_list = self.parameter_list[:-1]
        for parameters in parameter_list:
            result = self.http.async_infer(model_name='parameter',
                                           inputs=inputs,
                                           parameters=parameters).get_result()
            self.verify_outputs(result, parameters)

    def test_grpc_async_parameter(self):
        user_data = queue.Queue()
        inputs = self.create_inputs(grpcclient)
        # Skip the parameter that returns an error
        parameter_list = self.parameter_list[:-1]
        for parameters in parameter_list:
            self.grpc.async_infer(model_name='parameter',
                                  inputs=inputs,
                                  parameters=parameters,
                                  callback=partial(self.grpc_callback,
                                                   user_data))
            result = user_data.get()
            # The callback delivers either a result or an
            # InferenceServerException; make sure it was not an error.
            self.assertNotIsInstance(result, InferenceServerException)
            self.verify_outputs(result, parameters)

    def test_grpc_stream_parameter(self):
        user_data = queue.Queue()
        self.grpc.start_stream(callback=partial(self.grpc_callback, user_data))
        inputs = self.create_inputs(grpcclient)
        # Skip the parameter that returns an error
        parameter_list = self.parameter_list[:-1]
        for parameters in parameter_list:
            self.grpc.async_stream_infer(model_name='parameter',
                                         inputs=inputs,
                                         parameters=parameters)
            result = user_data.get()
            self.assertNotIsInstance(result, InferenceServerException)
            self.verify_outputs(result, parameters)
        self.grpc.stop_stream()

    async def asyncTearDown(self):
        self.http.close()
        self.grpc.close()
        await self.async_grpc.close()
        await self.async_http.close()


if __name__ == '__main__':
    unittest.main()