Add parsing parameters to the HTTP and GRPC frontends (triton-inference-server#5490)

* Add parsing parameters to the HTTP frontend

* Add parameters to the GRPC server

* Add testing for parameters

* Fix up

* Add testing for async and streaming

* Add documentation and reserved parameters list

* Modify based on feedback
Tabrizian authored Mar 15, 2023
1 parent b81ba91 commit 81e5a39
Showing 7 changed files with 522 additions and 51 deletions.
1 change: 1 addition & 0 deletions docs/protocol/README.md
@@ -44,6 +44,7 @@ plus several extensions that are defined in the following documents:
- [Statistics extension](./extension_statistics.md)
- [Trace extension](./extension_trace.md)
- [Logging extension](./extension_logging.md)
- [Parameters extension](./extension_parameters.md)

For the GRPC protocol, the [protobuf
specification](https://github.com/triton-inference-server/common/blob/main/protobuf/grpc_service.proto)
87 changes: 87 additions & 0 deletions docs/protocol/extension_parameters.md
@@ -0,0 +1,87 @@
<!--
# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Parameters Extension

This document describes Triton's parameters extension. The
parameters extension allows an inference request to provide
custom parameters that cannot be provided as inputs. Because this extension is
supported, Triton reports "parameters" in the extensions field of its
Server Metadata. This extension uses the optional "parameters"
field in the KServe Protocol in
[HTTP](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/#inference-request-json-object)
and
[GRPC](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/#parameters).
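
A client can confirm that a server supports this extension by checking the
server metadata. A minimal sketch with the Python HTTP client, assuming a
Triton server listening on localhost:8000:

```
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
metadata = client.get_server_metadata()

# The server metadata response lists the supported extensions.
assert "parameters" in metadata["extensions"]
```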

The following parameters are reserved for Triton's usage and should not be
used as custom parameters:

- sequence_id
- priority
- timeout
- sequence_start
- sequence_end
- All keys that start with the "triton_" prefix.
- headers

Whether you use the GRPC or the HTTP endpoint, make sure not to use any of
the reserved parameter names listed above; using them can lead to unexpected
behavior. The reserved parameters are not accessible through the Triton C-API.
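
As a precaution, a client can validate its custom parameters against the
reserved names before sending a request. A hypothetical helper based on the
list above:

```
RESERVED_PARAMETERS = {"sequence_id", "priority", "timeout",
                       "sequence_start", "sequence_end", "headers"}


def check_custom_parameters(parameters):
    # Reject both the fixed reserved names and the reserved "triton_" prefix.
    for key in parameters:
        if key in RESERVED_PARAMETERS or key.startswith("triton_"):
            raise ValueError(f"'{key}' is reserved for Triton's usage")
```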

## HTTP/REST

The following example shows how a request can include custom parameters.

```
POST /v2/models/mymodel/infer HTTP/1.1
Host: localhost:8000
Content-Type: application/json
Content-Length: <xx>
{
  "parameters" : { "my_custom_parameter" : 42 },
  "inputs" : [
    {
      "name" : "input0",
      "shape" : [ 2, 2 ],
      "datatype" : "UINT32",
      "data" : [ 1, 2, 3, 4 ]
    }
  ],
  "outputs" : [
    {
      "name" : "output0"
    }
  ]
}
```
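
The same request can also be issued through the Python client library; the
tests added in this commit pass custom parameters with the `parameters`
keyword argument of `infer`. A minimal sketch, assuming a model named
`mymodel` with the input shown above:

```
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = [httpclient.InferInput("input0", [2, 2], "UINT32")]
inputs[0].set_data_from_numpy(np.asarray([[1, 2], [3, 4]], dtype=np.uint32))

# The custom parameters are sent in the "parameters" field of the request.
result = client.infer(model_name="mymodel",
                      inputs=inputs,
                      parameters={"my_custom_parameter": 42})
```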

## GRPC

The `parameters` field in the
ModelInferRequest message can be used to send custom parameters.
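
For clients that build the protobuf messages directly, each entry of the
`parameters` map is an InferParameter value. A minimal sketch, assuming the
generated bindings exposed by the Python GRPC client package:

```
from tritonclient.grpc import service_pb2

request = service_pb2.ModelInferRequest()
request.model_name = "mymodel"
# Each map entry is an InferParameter holding one of bool_param,
# int64_param, or string_param.
request.parameters["my_custom_parameter"].int64_param = 42
```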

67 changes: 67 additions & 0 deletions qa/L0_parameters/model_repository/parameter/1/model.py
@@ -0,0 +1,67 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import triton_python_backend_utils as pb_utils
import numpy as np


class TritonPythonModel:

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        inputs = [{'name': 'INPUT0', 'data_type': 'TYPE_FP32', 'dims': [1]}]
        outputs = [{'name': 'OUTPUT0', 'data_type': 'TYPE_STRING', 'dims': [1]}]

        config = auto_complete_model_config.as_dict()
        input_names = []
        output_names = []
        for input in config['input']:
            input_names.append(input['name'])
        for output in config['output']:
            output_names.append(output['name'])

        for input in inputs:
            if input['name'] not in input_names:
                auto_complete_model_config.add_input(input)
        for output in outputs:
            if output['name'] not in output_names:
                auto_complete_model_config.add_output(output)

        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config

    def execute(self, requests):
        # A simple model that echoes the parameters of the request in the
        # output.
        responses = []
        for request in requests:
            # request.parameters() returns the request parameters serialized
            # as a JSON string.
            output0 = np.asarray([request.parameters()], dtype=object)
            output_tensor = pb_utils.Tensor("OUTPUT0", output0)
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[output_tensor])
            responses.append(inference_response)

        return responses
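
In the Python backend, `request.parameters()` returns the request parameters
serialized as a JSON string, which is why the test below recovers the
original dictionary with `json.loads`.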
184 changes: 184 additions & 0 deletions qa/L0_parameters/parameters_test.py
@@ -0,0 +1,184 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import sys

sys.path.append("../common")

import numpy as np
import infer_util as iu
import test_util as tu
import tritonclient.http as httpclient
import tritonclient.grpc as grpcclient
import tritonclient.http.aio as asynchttpclient
import tritonclient.grpc.aio as asyncgrpcclient
from tritonclient.utils import InferenceServerException
from unittest import IsolatedAsyncioTestCase
import json
import unittest
import queue
from functools import partial


class InferenceParametersTest(IsolatedAsyncioTestCase):

    async def asyncSetUp(self):
        self.http = httpclient.InferenceServerClient(url='localhost:8000')
        self.async_http = asynchttpclient.InferenceServerClient(
            url='localhost:8000')
        self.grpc = grpcclient.InferenceServerClient(url='localhost:8001')
        self.async_grpc = asyncgrpcclient.InferenceServerClient(
            url='localhost:8001')

        self.parameter_list = []
        self.parameter_list.append({'key1': 'value1', 'key2': 'value2'})
        self.parameter_list.append({'key1': 1, 'key2': 2})
        self.parameter_list.append({'key1': True, 'key2': 'value2'})
        self.parameter_list.append({'triton_': True, 'key2': 'value2'})

        def callback(user_data, result, error):
            if error:
                user_data.put(error)
            else:
                user_data.put(result)

        self.grpc_callback = callback

    def create_inputs(self, client_type):
        inputs = []
        inputs.append(client_type.InferInput('INPUT0', [1], "FP32"))

        # Initialize the data
        inputs[0].set_data_from_numpy(np.asarray([1], dtype=np.float32))
        return inputs

    async def send_request_and_verify(self,
                                      client_type,
                                      client,
                                      is_async=False):

        inputs = self.create_inputs(client_type)
        for parameters in self.parameter_list:
            # The `triton_` prefix is reserved for Triton usage
            should_error = False
            if 'triton_' in parameters.keys():
                should_error = True

            if is_async:
                if should_error:
                    with self.assertRaises(InferenceServerException):
                        result = await client.infer(model_name='parameter',
                                                    inputs=inputs,
                                                    parameters=parameters)
                    return
                else:
                    result = await client.infer(model_name='parameter',
                                                inputs=inputs,
                                                parameters=parameters)
            else:
                if should_error:
                    with self.assertRaises(InferenceServerException):
                        result = client.infer(model_name='parameter',
                                              inputs=inputs,
                                              parameters=parameters)
                    return
                else:
                    result = client.infer(model_name='parameter',
                                          inputs=inputs,
                                          parameters=parameters)

            self.verify_outputs(result, parameters)

    def verify_outputs(self, result, parameters):
        result = result.as_numpy('OUTPUT0')
        self.assertEqual(json.loads(result[0]), parameters)

    async def test_grpc_parameter(self):
        await self.send_request_and_verify(grpcclient, self.grpc)

    async def test_http_parameter(self):
        await self.send_request_and_verify(httpclient, self.http)

    async def test_async_http_parameter(self):
        await self.send_request_and_verify(asynchttpclient,
                                           self.async_http,
                                           is_async=True)

    async def test_async_grpc_parameter(self):
        await self.send_request_and_verify(asyncgrpcclient,
                                           self.async_grpc,
                                           is_async=True)

    def test_http_async_parameter(self):
        inputs = self.create_inputs(httpclient)
        # Skip the parameter that returns an error
        parameter_list = self.parameter_list[:-1]
        for parameters in parameter_list:
            result = self.http.async_infer(model_name='parameter',
                                           inputs=inputs,
                                           parameters=parameters).get_result()
            self.verify_outputs(result, parameters)

    def test_grpc_async_parameter(self):
        user_data = queue.Queue()
        inputs = self.create_inputs(grpcclient)
        # Skip the parameter that returns an error
        parameter_list = self.parameter_list[:-1]
        for parameters in parameter_list:
            self.grpc.async_infer(model_name='parameter',
                                  inputs=inputs,
                                  parameters=parameters,
                                  callback=partial(self.grpc_callback,
                                                   user_data))
            result = user_data.get()
            self.assertNotIsInstance(result, InferenceServerException)
            self.verify_outputs(result, parameters)

    def test_grpc_stream_parameter(self):
        user_data = queue.Queue()
        self.grpc.start_stream(callback=partial(self.grpc_callback, user_data))
        inputs = self.create_inputs(grpcclient)
        # Skip the parameter that returns an error
        parameter_list = self.parameter_list[:-1]
        for parameters in parameter_list:
            self.grpc.async_stream_infer(model_name='parameter',
                                         inputs=inputs,
                                         parameters=parameters)
            result = user_data.get()
            self.assertNotIsInstance(result, InferenceServerException)
            self.verify_outputs(result, parameters)
        self.grpc.stop_stream()

    async def asyncTearDown(self):
        self.http.close()
        self.grpc.close()
        await self.async_grpc.close()
        await self.async_http.close()


if __name__ == '__main__':
    unittest.main()