Add GRPC option for restricted protocol access (triton-inference-server#5397)

* [WIP] simple example
* WIP
* Implement GRPC restricted protocol option
* Add GRPC restricted protocol test. Fix bug
* Add documentation for endpoint configuration
* Add missing protocol
* Change option documentation based on change in detail
* Update command line option to restrict protocol access
* minor fix
* Update doc
* Address comment
* fix up
* Address comment
mailmahee · Mar 21, 2023 · 980bd76
1 parent ed554d8
commit 980bd76
Showing 8 changed files with 688 additions and 156 deletions.
diff --git a/docs/customization_guide/inference_protocols.md b/docs/customization_guide/inference_protocols.md
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -114,6 +114,56 @@ These options can be used to configure the KeepAlive settings:
 
 For client-side documentation, see [Client-Side GRPC KeepAlive](https://github.com/triton-inference-server/client/blob/main/README.md#grpc-keepalive).
 
+#### Limit Endpoint Access (BETA)
+
+In some use cases, Triton users may want to restrict access to the protocols on a given endpoint.
+For example, there may be a need for two separate protocol groups: one that exposes the standard inference
+protocols for user access, and another that exposes other extension protocols for administrative
+use and should not be accessible to non-admin users.
+
+The following option can be specified to declare a restricted protocol group:
+
+```
+--grpc-restricted-protocol=<protocol_1>,<protocol_2>,...:<restricted-key>=<restricted-value>
+```
+
+The option can be specified multiple times to specify multiple groups of
+protocols with different restriction settings.
+
+* `protocols` : A comma-separated list of protocols to be included in this
+group. Note that currently a given protocol is not allowed to be included in
+multiple groups. The following protocols are currently recognized by all network
+protocol types mentioned above:
+
+  * `health` : Health endpoint defined for [HTTP/REST](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#health) and [GRPC](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#health-1). For GRPC endpoint, this value also exposes [GRPC health check protocol](https://github.com/triton-inference-server/common/blob/main/protobuf/health.proto).
+  * `metadata` : Server / model metadata endpoints defined for [HTTP/REST](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#server-metadata) and [GRPC](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#server-metadata-1).
+  * `inference` : Inference endpoints defined for [HTTP/REST](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference) and [GRPC](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference-1).
+  * `shared-memory` : [Shared-memory endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_shared_memory.md).
+  * `model-config` : [Model configuration endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md).
+  * `model-repository` : [Model repository endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_repository.md).
+  * `statistics` : [Statistics endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_statistics.md).
+  * `trace` : [Trace endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_trace.md).
+  * `logging` : [Logging endpoint](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_logging.md).
+
+* `restricted-key` : Key that determines which GRPC request header is checked when a
+request to the protocol is received. The complete header name has the form
+`triton-grpc-protocol-<restricted-key>`.
+
+* `restricted-value` : The value that the header must match in order for a request
+to the specified protocols to proceed.
+
+#### Example
+
+To start the server with a subset of protocols restricted, as in the use case
+described above, the following command line argument can be set so that the server
+accepts "standard inference" requests without an additional header and accepts requests to
+the protocols listed in the option only with `triton-grpc-protocol-<admin-key>=<admin-value>` specified in the header:
+
+```
+tritonserver --grpc-restricted-protocol=shared-memory,model-config,model-repository,statistics,trace:<admin-key>=<admin-value> ...
+```
+
+
 ## In-Process Triton Server API
 
 The Triton Inference Server provides a backwards-compatible C API that
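
The option format documented in the hunk above (`<protocol_1>,<protocol_2>,...:<restricted-key>=<restricted-value>`, header name `triton-grpc-protocol-<restricted-key>`, and no protocol in more than one group) can be illustrated with a small sketch. This is not Triton's actual option parser, which is not shown in this excerpt; the helper names `parse_restricted_protocol` and `build_groups` are hypothetical, and the error text simply reuses the message expected by `qa/L0_grpc/test.sh`.

```python
# Illustrative sketch only: NOT Triton's option parser.
# The function names below are hypothetical helpers for this example.
HEADER_PREFIX = "triton-grpc-protocol-"


def parse_restricted_protocol(option_value):
    """Split '<p1>,<p2>:<key>=<value>' into (protocols, header_name, value)."""
    protocols_part, _, restriction = option_value.partition(":")
    key, sep, value = restriction.partition("=")
    protocols = [p for p in protocols_part.split(",") if p]
    if not protocols or not sep or not key:
        raise ValueError(f"malformed option value: {option_value!r}")
    return protocols, HEADER_PREFIX + key, value


def build_groups(option_values):
    """Map each protocol to its (header_name, value), rejecting duplicates."""
    groups = {}
    for option_value in option_values:
        protocols, header_name, value = parse_restricted_protocol(option_value)
        for protocol in protocols:
            if protocol in groups:
                raise ValueError(
                    f"protocol '{protocol}' can not be specified "
                    "in multiple config group")
            groups[protocol] = (header_name, value)
    return groups
```

With the two options used by the restricted protocol test, `build_groups` would map `health` and `inference` to the `triton-grpc-protocol-infer-key` header and `model-repository` to the `triton-grpc-protocol-admin-key` header, while listing `health` in two options raises an error.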

diff --git a/docs/protocol/README.md b/docs/protocol/README.md
@@ -36,17 +36,25 @@ plus several extensions that are defined in the following documents:
 
 - [Binary tensor data extension](./extension_binary_data.md)
 - [Classification extension](./extension_classification.md)
-- [Model configuration extension](./extension_model_configuration.md)
-- [Model repository extension](./extension_model_repository.md)
 - [Schedule policy extension](./extension_schedule_policy.md)
 - [Sequence extension](./extension_sequence.md)
 - [Shared-memory extension](./extension_shared_memory.md)
+- [Model configuration extension](./extension_model_configuration.md)
+- [Model repository extension](./extension_model_repository.md)
 - [Statistics extension](./extension_statistics.md)
 - [Trace extension](./extension_trace.md)
 - [Logging extension](./extension_logging.md)
 - [Parameters extension](./extension_parameters.md)
 
+Note that some extensions introduce new fields onto the inference protocols,
+while other extensions define new protocols that Triton follows. Please refer
+to the extension documents for details.
+
 For the GRPC protocol, the [protobuf
 specification](https://github.com/triton-inference-server/common/blob/main/protobuf/grpc_service.proto)
 is also available. In addition, you can find the GRPC health checking protocol protobuf
 specification [here](https://github.com/triton-inference-server/common/blob/main/protobuf/health.proto).
+
+You can configure the Triton endpoints, which implement the protocols, to
+restrict access to some protocols and to control network settings. Please refer
+to the [protocol customization guide](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#httprest-and-grpc-protocols) for details.
diff --git a/qa/L0_grpc/python_unit_test.py b/qa/L0_grpc/python_unit_test.py
@@ -0,0 +1,154 @@
+#!/usr/bin/env python
+# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import unittest
+import numpy as np
+import time
+
+import tritonclient.grpc as grpcclient
+from tritonclient.utils import InferenceServerException
+
+# For stream infer test
+from functools import partial
+import queue
+
+
+class UserData:
+
+    def __init__(self):
+        self._completed_requests = queue.Queue()
+
+
+def callback(user_data, result, error):
+    if error:
+        user_data._completed_requests.put(error)
+    else:
+        user_data._completed_requests.put(result)
+
+
+class RestrictedProtocolTest(unittest.TestCase):
+
+    def setUp(self):
+        self.client_ = grpcclient.InferenceServerClient(url="localhost:8001")
+        self.model_name_ = "simple"
+        self.prefix_ = "triton-grpc-protocol-"
+
+    # Other unspecified protocols should not be restricted
+    def test_sanity(self):
+        self.client_.get_inference_statistics("simple")
+        self.client_.get_inference_statistics(
+            "simple", headers={self.prefix_ + "infer-key": "infer-value"})
+
+    # The health, inference, and model repository protocols are restricted.
+    # health and inference expect the "triton-grpc-protocol-infer-key : infer-value" header,
+    # model repository expects "triton-grpc-protocol-admin-key : admin-value".
+    def test_model_repository(self):
+        with self.assertRaisesRegex(InferenceServerException,
+                                    "This protocol is restricted"):
+            self.client_.unload_model(
+                self.model_name_,
+                headers={self.prefix_ + "infer-key": "infer-value"})
+        # Request goes through and gets the actual transaction error
+        with self.assertRaisesRegex(
+                InferenceServerException,
+                "explicit model load / unload is not allowed"):
+            self.client_.unload_model(
+                self.model_name_,
+                headers={self.prefix_ + "admin-key": "admin-value"})
+
+    def test_health(self):
+        with self.assertRaisesRegex(InferenceServerException,
+                                    "This protocol is restricted"):
+            self.client_.is_server_live()
+        self.client_.is_server_live(
+            headers={self.prefix_ + "infer-key": "infer-value"})
+
+    def test_infer(self):
+        # setup
+        inputs = [
+            grpcclient.InferInput('INPUT0', [1, 16], "INT32"),
+            grpcclient.InferInput('INPUT1', [1, 16], "INT32")
+        ]
+        inputs[0].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
+        inputs[1].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
+
+        # This test only cares whether the request goes through
+        with self.assertRaisesRegex(InferenceServerException,
+                                    "This protocol is restricted"):
+            self.client_.infer(model_name=self.model_name_,
+                               inputs=inputs,
+                               headers={'test': '1'})
+        self.client_.infer(model_name=self.model_name_,
+                           inputs=inputs,
+                           headers={self.prefix_ + "infer-key": "infer-value"})
+
+    def test_stream_infer(self):
+        # setup
+        inputs = [
+            grpcclient.InferInput('INPUT0', [1, 16], "INT32"),
+            grpcclient.InferInput('INPUT1', [1, 16], "INT32")
+        ]
+        inputs[0].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
+        inputs[1].set_data_from_numpy(np.ones(shape=(1, 16), dtype=np.int32))
+        user_data = UserData()
+        # The server can't control whether GRPC creates the stream. The
+        # server is notified only after the stream is established, and only
+        # then can it access the metadata to decide whether to continue
+        # the stream.
+        # So the client will always perceive the stream as
+        # successfully created and can only check its health at a later time.
+        self.client_.start_stream(partial(callback, user_data),
+                                  headers={'test': '1'})
+        # wait for sufficient round-trip time
+        time.sleep(1)
+        with self.assertRaisesRegex(InferenceServerException,
+                                    "The stream is no longer in valid state"):
+            self.client_.async_stream_infer(model_name=self.model_name_,
+                                            inputs=inputs)
+        # callback should record error detail
+        self.assertFalse(user_data._completed_requests.empty())
+        with self.assertRaisesRegex(InferenceServerException,
+                                    "This protocol is restricted"):
+            raise user_data._completed_requests.get()
+
+        self.assertTrue(user_data._completed_requests.empty())
+
+        # Stop and start new stream with proper header
+        self.client_.stop_stream()
+        self.client_.start_stream(
+            partial(callback, user_data),
+            headers={self.prefix_ + "infer-key": "infer-value"})
+        self.client_.async_stream_infer(model_name=self.model_name_,
+                                        inputs=inputs)
+        # wait for response
+        time.sleep(1)
+        self.assertFalse(user_data._completed_requests.empty())
+        self.assertNotEqual(type(user_data._completed_requests.get()),
+                            InferenceServerException)
+
+
+if __name__ == "__main__":
+    unittest.main()
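
The behavior that `RestrictedProtocolTest` expects can be summarized as a small decision function: a protocol that appears in no group is unrestricted, and a restricted protocol is allowed only when the request metadata carries the matching header and value. The following is a minimal sketch under those assumptions; the name `is_allowed` and the `GROUPS` table are hypothetical, and the server-side implementation is not part of this excerpt.

```python
# Illustrative model of the access check exercised by the test above.
# NOT Triton's server code; `GROUPS` and `is_allowed` are hypothetical.
PREFIX = "triton-grpc-protocol-"

# Mirrors the server arguments used for the test:
#   --grpc-restricted-protocol=model-repository:admin-key=admin-value
#   --grpc-restricted-protocol=inference,health:infer-key=infer-value
GROUPS = {
    "model-repository": (PREFIX + "admin-key", "admin-value"),
    "inference": (PREFIX + "infer-key", "infer-value"),
    "health": (PREFIX + "infer-key", "infer-value"),
}


def is_allowed(protocol, metadata, groups=GROUPS):
    """Return True if a request to `protocol` with `metadata` may proceed."""
    restriction = groups.get(protocol)
    if restriction is None:
        # Protocols that are not listed in any group are not restricted.
        return True
    header_name, expected_value = restriction
    return metadata.get(header_name) == expected_value
```

Under this model, a statistics request passes with or without headers, a health request without headers is rejected, and a model repository request carrying only the `infer-key` header is rejected because that protocol belongs to the `admin-key` group.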
diff --git a/qa/L0_grpc/test.sh b/qa/L0_grpc/test.sh
@@ -137,6 +137,7 @@ else
     SIMPLE_CUSTOM_ARGS_CLIENT=../clients/simple_grpc_custom_args_client
     CC_UNIT_TEST=../clients/cc_client_test
 fi
+PYTHON_UNIT_TEST=python_unit_test.py
 
 # Add string_dyna_sequence model to repo
 cp -r ${MODELDIR}/simple_dyna_sequence ${MODELDIR}/simple_string_dyna_sequence
@@ -571,6 +572,44 @@ fi
 kill $SERVER_PID
 wait $SERVER_PID
 
+# Repeated protocol, not allowed
+SERVER_ARGS="--model-repository=${MODELDIR} \
+             --grpc-restricted-protocol=model-repository,health:k1=v1 \
+             --grpc-restricted-protocol=metadata,health:k2=v2"
+run_server
+EXPECTED_MSG="protocol 'health' can not be specified in multiple config group"
+if [ "$SERVER_PID" != "0" ]; then
+    echo -e "\n***\n*** Expect fail to start $SERVER\n***"
+    kill $SERVER_PID
+    wait $SERVER_PID
+    RET=1
+elif [ `grep -c "${EXPECTED_MSG}" ${SERVER_LOG}` != "1" ]; then
+    echo -e "\n***\n*** Failed. Expected ${EXPECTED_MSG} to be found in log\n***"
+    cat $SERVER_LOG
+    RET=1
+fi
+
+# Test restricted protocols
+SERVER_ARGS="--model-repository=${MODELDIR} \
+             --grpc-restricted-protocol=model-repository:admin-key=admin-value \
+             --grpc-restricted-protocol=inference,health:infer-key=infer-value"
+run_server
+if [ "$SERVER_PID" == "0" ]; then
+    echo -e "\n***\n*** Failed to start $SERVER\n***"
+    cat $SERVER_LOG
+    exit 1
+fi
+set +e
+python $PYTHON_UNIT_TEST RestrictedProtocolTest > $CLIENT_LOG 2>&1
+if [ $? -ne 0 ]; then
+    cat $CLIENT_LOG
+    echo -e "\n***\n*** Python GRPC Restricted Protocol Test Failed\n***"
+    RET=1
+fi
+set -e
+kill $SERVER_PID
+wait $SERVER_PID
+
 if [ $RET -eq 0 ]; then
     echo -e "\n***\n*** Test Passed\n***"
 else