Add GRPC option for restricted protocol access (triton-inference-server#5397)

* [WIP] simple example

* WIP

* Implement GRPC restricted protocol option

* Add GRPC restricted protocol test. Fix bug

* Add documentation for endpoint configuration

* Add missing protocol

* Change option documentation based on change in detail

* Update command line option to restrict protocol access

* minor fix

* Update doc

* Address comment

* fix up

* Address comment
GuanLuo authored Mar 21, 2023
1 parent ed554d8 commit 980bd76
Showing 8 changed files with 688 additions and 156 deletions.
52 changes: 51 additions & 1 deletion docs/customization_guide/inference_protocols.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -114,6 +114,56 @@ These options can be used to configure the KeepAlive settings:

For client-side documentation, see [Client-Side GRPC KeepAlive](https://github.com/triton-inference-server/client/blob/main/README.md#grpc-keepalive).

#### Limit Endpoint Access (BETA)

In some use cases, Triton users may want to restrict access to the protocols on a given endpoint.
For example, two separate protocol groups may be needed: one exposes the standard inference
protocols for user access, while the other exposes extension protocols for administrative
use and should not be accessible to non-admin users.

The following option can be specified to declare a restricted protocol group:

```
--grpc-restricted-protocol=<protocol_1>,<protocol_2>,...:<restricted-key>=<restricted-value>
```

The option can be specified multiple times to declare multiple groups of
protocols with different restriction settings.
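
For illustration, the following pair of options (mirroring this commit's test
setup; the key/value pairs are placeholders) declares two groups, one guarding
the model repository protocol and another guarding the inference and health
protocols:

```
tritonserver --grpc-restricted-protocol=model-repository:admin-key=admin-value \
             --grpc-restricted-protocol=inference,health:infer-key=infer-value ...
```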

* `protocols` : A comma-separated list of protocols to be included in this
group. Note that currently a given protocol is not allowed to be included in
multiple groups. The following protocols are currently recognized by all network
protocol types mentioned above:

* `health` : Health endpoint defined for [HTTP/REST](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#health) and [GRPC](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#health-1). For GRPC endpoint, this value also exposes [GRPC health check protocol](https://github.com/triton-inference-server/common/blob/main/protobuf/health.proto).
* `metadata` : Server / model metadata endpoints defined for [HTTP/REST](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#server-metadata) and [GRPC](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#server-metadata-1).
* `inference` : Inference endpoints defined for [HTTP/REST](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference) and [GRPC](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference-1).
* `shared-memory` : [Shared-memory endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_shared_memory.md).
* `model-config` : [Model configuration endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md).
* `model-repository` : [Model repository endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_repository.md).
* `statistics` : [Statistics endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_statistics.md).
* `trace` : [Trace endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_trace.md).
* `logging` : [Logging endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_logging.md).

* `restricted-key` : The key that determines which GRPC request header is checked
when a request to one of the specified protocols is received. The complete header
name will be of the form `triton-grpc-protocol-<restricted-key>`.

* `restricted-value` : The header value that must be matched for a request to
the specified protocols to proceed.

#### Example

To start the server with a subset of protocols restricted as in the use case
described above, the following command line arguments accept "standard inference"
requests without an additional header, while the remaining protocols require
`triton-grpc-protocol-<admin-key>=<admin-value>` to be specified in the header:

```
tritonserver --grpc-restricted-protocol=shared-memory,model-config,model-repository,statistics,trace:<admin-key>=<admin-value> ...
```
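
On the client side, the matching header must then accompany each call to a
restricted protocol. Below is a minimal sketch using the Python GRPC client, as
exercised by this commit's tests; the `admin-key`/`admin-value` pair is a
placeholder standing in for whatever was chosen at server startup:

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# The statistics protocol was restricted at startup, so the request must
# carry the matching triton-grpc-protocol-<restricted-key> header.
client.get_inference_statistics(
    "simple", headers={"triton-grpc-protocol-admin-key": "admin-value"})
```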


## In-Process Triton Server API

The Triton Inference Server provides a backwards-compatible C API that
12 changes: 10 additions & 2 deletions docs/protocol/README.md
@@ -36,17 +36,25 @@ plus several extensions that are defined in the following documents:

- [Binary tensor data extension](./extension_binary_data.md)
- [Classification extension](./extension_classification.md)
- [Model configuration extension](./extension_model_configuration.md)
- [Model repository extension](./extension_model_repository.md)
- [Schedule policy extension](./extension_schedule_policy.md)
- [Sequence extension](./extension_sequence.md)
- [Shared-memory extension](./extension_shared_memory.md)
- [Statistics extension](./extension_statistics.md)
- [Trace extension](./extension_trace.md)
- [Logging extension](./extension_logging.md)
- [Parameters extension](./extension_parameters.md)

Note that some extensions introduce new fields into the inference protocols,
while others define new protocols that Triton follows. Please refer
to the extension documents for details.

For the GRPC protocol, the [protobuf
specification](https://github.com/triton-inference-server/common/blob/main/protobuf/grpc_service.proto)
is also available. In addition, you can find the GRPC health checking protocol protobuf
specification [here](https://github.com/triton-inference-server/common/blob/main/protobuf/health.proto).
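
Because the GRPC endpoint exposes the standard GRPC health checking protocol, a
plain gRPC health client can query it. Here is a minimal sketch, assuming the
`grpcio-health-checking` package is installed and Triton's GRPC endpoint is
listening on its default port 8001:

```python
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

channel = grpc.insecure_channel("localhost:8001")
stub = health_pb2_grpc.HealthStub(channel)

# An empty service name asks about the server's overall health.
response = stub.Check(health_pb2.HealthCheckRequest(service=""))
print(response.status)  # 1 == SERVING
```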

You can configure the Triton endpoints, which implement the protocols, to
restrict access to some protocols and to control network settings. Please refer
to the [protocol customization guide](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#httprest-and-grpc-protocols) for details.
154 changes: 154 additions & 0 deletions qa/L0_grpc/python_unit_test.py
@@ -0,0 +1,154 @@
#!/usr/bin/env python
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import unittest
import numpy as np
import time

import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

# For stream infer test
from functools import partial
import queue


class UserData:

    def __init__(self):
        self._completed_requests = queue.Queue()


def callback(user_data, result, error):
    if error:
        user_data._completed_requests.put(error)
    else:
        user_data._completed_requests.put(result)


class RestrictedProtocolTest(unittest.TestCase):

    def setUp(self):
        self.client_ = grpcclient.InferenceServerClient(url="localhost:8001")
        self.model_name_ = "simple"
        self.prefix_ = "triton-grpc-protocol-"

    # Other, unspecified protocols should not be restricted
    def test_sanity(self):
        self.client_.get_inference_statistics("simple")
        self.client_.get_inference_statistics(
            "simple", headers={self.prefix_ + "infer-key": "infer-value"})

    # The health, inference, and model repository protocols are restricted.
    # health and inference expect the "triton-grpc-protocol-infer-key : infer-value"
    # header; model repository expects "triton-grpc-protocol-admin-key : admin-value".
    def test_model_repository(self):
        with self.assertRaisesRegex(InferenceServerException,
                                    "This protocol is restricted"):
            self.client_.unload_model(
                self.model_name_,
                headers={self.prefix_ + "infer-key": "infer-value"})
        # Request goes through and gets the actual transaction error
        with self.assertRaisesRegex(
                InferenceServerException,
                "explicit model load / unload is not allowed"):
            self.client_.unload_model(
                self.model_name_,
                headers={self.prefix_ + "admin-key": "admin-value"})

    def test_health(self):
        with self.assertRaisesRegex(InferenceServerException,
                                    "This protocol is restricted"):
            self.client_.is_server_live()
        self.client_.is_server_live(
            headers={self.prefix_ + "infer-key": "infer-value"})

    def test_infer(self):
        # setup
        inputs = [
            grpcclient.InferInput('INPUT0', [1, 16], "INT32"),
            grpcclient.InferInput('INPUT1', [1, 16], "INT32")
        ]
        inputs[0].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
        inputs[1].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))

        # This test only cares whether the request goes through
        with self.assertRaisesRegex(InferenceServerException,
                                    "This protocol is restricted"):
            self.client_.infer(model_name=self.model_name_,
                               inputs=inputs,
                               headers={'test': '1'})
        self.client_.infer(model_name=self.model_name_,
                           inputs=inputs,
                           headers={self.prefix_ + "infer-key": "infer-value"})

    def test_stream_infer(self):
        # setup
        inputs = [
            grpcclient.InferInput('INPUT0', [1, 16], "INT32"),
            grpcclient.InferInput('INPUT1', [1, 16], "INT32")
        ]
        inputs[0].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
        inputs[1].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
        user_data = UserData()
        # The server can't interfere with whether GRPC creates the stream;
        # it is notified only after the stream is established, and only then
        # can it access the metadata to decide whether to continue the stream.
        # So the client always perceives the stream as successfully created
        # and can only check its health at a later time.
        self.client_.start_stream(partial(callback, user_data),
                                  headers={'test': '1'})
        # wait for sufficient round-trip time
        time.sleep(1)
        with self.assertRaisesRegex(InferenceServerException,
                                    "The stream is no longer in valid state"):
            self.client_.async_stream_infer(model_name=self.model_name_,
                                            inputs=inputs)
        # callback should have recorded the error detail
        self.assertFalse(user_data._completed_requests.empty())
        with self.assertRaisesRegex(InferenceServerException,
                                    "This protocol is restricted"):
            raise user_data._completed_requests.get()

        self.assertTrue(user_data._completed_requests.empty())

        # Stop, then start a new stream with the proper header
        self.client_.stop_stream()
        self.client_.start_stream(
            partial(callback, user_data),
            headers={self.prefix_ + "infer-key": "infer-value"})
        self.client_.async_stream_infer(model_name=self.model_name_,
                                        inputs=inputs)
        # wait for the response
        time.sleep(1)
        self.assertFalse(user_data._completed_requests.empty())
        self.assertNotEqual(type(user_data._completed_requests.get()),
                            InferenceServerException)


if __name__ == "__main__":
    unittest.main()
39 changes: 39 additions & 0 deletions qa/L0_grpc/test.sh
@@ -137,6 +137,7 @@ else
    SIMPLE_CUSTOM_ARGS_CLIENT=../clients/simple_grpc_custom_args_client
    CC_UNIT_TEST=../clients/cc_client_test
fi
PYTHON_UNIT_TEST=python_unit_test.py

# Add string_dyna_sequence model to repo
cp -r ${MODELDIR}/simple_dyna_sequence ${MODELDIR}/simple_string_dyna_sequence
@@ -571,6 +572,44 @@ fi
kill $SERVER_PID
wait $SERVER_PID

# Repeated protocol, not allowed
SERVER_ARGS="--model-repository=${MODELDIR} \
             --grpc-restricted-protocol=model-repository,health:k1=v1 \
             --grpc-restricted-protocol=metadata,health:k2=v2"
run_server
EXPECTED_MSG="protocol 'health' can not be specified in multiple config group"
if [ "$SERVER_PID" != "0" ]; then
    echo -e "\n***\n*** Expect fail to start $SERVER\n***"
    kill $SERVER_PID
    wait $SERVER_PID
    RET=1
elif [ `grep -c "${EXPECTED_MSG}" ${SERVER_LOG}` != "1" ]; then
    echo -e "\n***\n*** Failed. Expected ${EXPECTED_MSG} to be found in log\n***"
    cat $SERVER_LOG
    RET=1
fi

# Test restricted protocols
SERVER_ARGS="--model-repository=${MODELDIR} \
             --grpc-restricted-protocol=model-repository:admin-key=admin-value \
             --grpc-restricted-protocol=inference,health:infer-key=infer-value"
run_server
if [ "$SERVER_PID" == "0" ]; then
    echo -e "\n***\n*** Failed to start $SERVER\n***"
    cat $SERVER_LOG
    exit 1
fi
set +e
python $PYTHON_UNIT_TEST RestrictedProtocolTest > $CLIENT_LOG 2>&1
if [ $? -ne 0 ]; then
    cat $CLIENT_LOG
    echo -e "\n***\n*** Python GRPC Restricted Protocol Test Failed\n***"
    RET=1
fi
set -e
kill $SERVER_PID
wait $SERVER_PID

if [ $RET -eq 0 ]; then
    echo -e "\n***\n*** Test Passed\n***"
else