Add load-model=* tests and documentation (triton-inference-server#4256)

* Start tweaking lifecycle test to include --load-model=* test

* Tweak SERVER_ARGS when testing load-model=* with another load-model argument

* Set return value instead of exiting on negative server test

* Add documentation on load-model=* to load on all models on startup in explicit model control mode

* Typo and copyright update

* Add help text on --load-model=* to main.cc

* Add pattern matching disclaimer
rmccorm4 authored Apr 20, 2022
1 parent 11410df commit 478a9e0
Showing 4 changed files with 104 additions and 14 deletions.
22 changes: 12 additions & 10 deletions docs/model_management.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
# Copyright 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -47,18 +47,20 @@
protocol](protocol/extension_model_repository.md) will have no effect
and will return an error response.

This model control mode is selected by specifying
--model-control-mode=none when starting Triton. This is the default
`--model-control-mode=none` when starting Triton. This is the default
model control mode. Changing the model repository while Triton is
running must be done carefully, as explained in [Modifying the Model
Repository](#modifying-the-model-repository).
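
For illustration, a minimal invocation under the default NONE mode might look like this (the repository path is a placeholder, not from this commit):

```shell
# NONE is the default, so --model-control-mode=none is optional here.
# /opt/tritonserver/models is a placeholder model repository path.
tritonserver --model-repository=/opt/tritonserver/models \
             --model-control-mode=none
```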

## Model Control Mode EXPLICIT

At startup, Triton loads only those models specified explicitly with
the --load-model command-line option. If --load-model is not specified
then no models are loaded at startup. Models that Triton is not able
to load will be marked as UNAVAILABLE and will not be available for
inferencing.
At startup, Triton loads only those models specified explicitly with the
`--load-model` command-line option. To load ALL models at startup, specify
`--load-model=*` as the ONLY `--load-model` argument. Specifying
`--load-model=*` in conjunction with another `--load-model` argument will
result in an error. If `--load-model` is not specified, no models are loaded
at startup. Models that Triton is not able to load will be marked as
UNAVAILABLE and will not be available for inferencing.
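
The `--load-model=*` rule can be sketched in shell. The `validate_load_models` helper below is hypothetical and is not part of Triton; it only mirrors the rule described above:

```shell
# Hypothetical helper: mirrors the rule that '*' must be the ONLY
# --load-model value; combining it with named models is an error.
validate_load_models() {
    local has_wildcard=0 count=0
    for model in "$@"; do
        count=$((count + 1))
        if [ "$model" = "*" ]; then
            has_wildcard=1
        fi
    done
    if [ "$has_wildcard" -eq 1 ] && [ "$count" -gt 1 ]; then
        echo "error: --load-model=* cannot be combined with other --load-model arguments" >&2
        return 1
    fi
    return 0
}

validate_load_models "*"                           # accepted: load all models
validate_load_models onnx_float32_float32_float32  # accepted: load one model
```

Note that `*` is quoted in the calls above so the shell does not glob-expand it before the helper sees it.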

After startup, all model load and unload actions must be initiated
explicitly by using the [model control
@@ -71,7 +73,7 @@
newly loaded model will replace the already loaded model without any
loss in availability for the model.

This model control mode is enabled by specifying
--model-control-mode=explicit. Changing the model repository while
`--model-control-mode=explicit`. Changing the model repository while
Triton is running must be done carefully, as explained in [Modifying
the Model Repository](#modifying-the-model-repository).

@@ -91,7 +93,7 @@
the model.

Changes to the model repository may not be detected immediately
because Triton polls the repository periodically. You can control the
polling interval with the --repository-poll-secs option. The console
polling interval with the `--repository-poll-secs` option. The console
log or the [model ready
protocol](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md)
or the index operation of the [model control
@@ -109,7 +111,7 @@
protocol](protocol/extension_model_repository.md) will have no effect
and will return an error response.

This model control mode is enabled by specifying
--model-control-mode=poll and by setting --repository-poll-secs to a
`--model-control-mode=poll` and by setting `--repository-poll-secs` to a
non-zero value when starting Triton. Changing the model repository
while Triton is running must be done carefully, as explained in
[Modifying the Model Repository](#modifying-the-model-repository).
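
As an illustration (the repository path and interval below are placeholders, not from this commit), POLL mode might be enabled like this:

```shell
# Placeholder path; poll the repository for changes every 30 seconds.
tritonserver --model-repository=/opt/tritonserver/models \
             --model-control-mode=poll \
             --repository-poll-secs=30
```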
4 changes: 2 additions & 2 deletions qa/L0_lifecycle/lifecycle_test.py
@@ -2094,7 +2094,7 @@ def test_multiple_model_repository_control_startup_models(self):
self.assertTrue(False, "unexpected error {}".format(ex))

def test_model_repository_index(self):
# use model control EXPLIT and --load-model to load a subset of models
# use model control EXPLICIT and --load-model to load a subset of models
# in model repository
tensor_shape = (1, 16)
model_bases = ['graphdef', 'savedmodel', "simple_savedmodel"]
@@ -2119,7 +2119,7 @@ def test_model_repository_index(self):
self.assertTrue(False, "unexpected error {}".format(ex))

# Check model repository index
# All models should be in ready state except savedmodel_float32_float32_float32
# All models should be in ready state except onnx_float32_float32_float32
# which appears in two repositories.
model_bases.append("simple_graphdef")
try:
84 changes: 84 additions & 0 deletions qa/L0_lifecycle/test.sh
@@ -1196,6 +1196,90 @@
wait $SERVER_PID

LOG_IDX=$((LOG_IDX+1))

# Test loading all models on startup in EXPLICIT model control mode, re-use
# existing LifeCycleTest.test_multiple_model_repository_control_startup_models
# unit test
rm -fr models models_0 config.pbtxt.*
mkdir models models_0
# Ensemble models in the second repository
for i in plan onnx ; do
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models/.
cp -r $DATADIR/qa_ensemble_model_repository/qa_model_repository/simple_${i}_float32_float32_float32 models_0/.
sed -i "s/max_batch_size:.*/max_batch_size: 1/" models/${i}_float32_float32_float32/config.pbtxt
sed -i "s/max_batch_size:.*/max_batch_size: 1/" models_0/simple_${i}_float32_float32_float32/config.pbtxt
done

# savedmodel doesn't load because it is duplicated in 2 repositories
for i in savedmodel ; do
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models/.
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models_0/.
done

SERVER_ARGS="--model-repository=`pwd`/models --model-repository=`pwd`/models_0 \
--model-control-mode=explicit \
--strict-readiness=false \
--strict-model-config=false --exit-on-error=false \
--load-model=*"
SERVER_LOG="./inference_server_$LOG_IDX.log"
run_server
if [ "$SERVER_PID" == "0" ]; then
echo -e "\n***\n*** Failed to start $SERVER\n***"
cat $SERVER_LOG
exit 1
fi

rm -f $CLIENT_LOG
set +e
python $LC_TEST LifeCycleTest.test_multiple_model_repository_control_startup_models >>$CLIENT_LOG 2>&1
if [ $? -ne 0 ]; then
cat $CLIENT_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
else
check_test_results $TEST_RESULT_FILE 1
if [ $? -ne 0 ]; then
cat $CLIENT_LOG
echo -e "\n***\n*** Test Result Verification Failed\n***"
RET=1
fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID

LOG_IDX=$((LOG_IDX+1))

# Test loading all models on startup in EXPLICIT model control mode AND
# an additional --load-model argument, it should fail
rm -fr models
mkdir models
for i in onnx ; do
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models/.
sed -i "s/max_batch_size:.*/max_batch_size: 1/" models/${i}_float32_float32_float32/config.pbtxt
done

# --load-model=* can not be used with any other --load-model arguments
# as it's unclear what the user's intentions are.
SERVER_ARGS="--model-repository=`pwd`/models --model-repository=`pwd`/models_0 \
--model-control-mode=explicit \
--strict-readiness=true \
--exit-on-error=true \
--load-model=* \
--load-model=onnx_float32_float32_float32"
SERVER_LOG="./inference_server_$LOG_IDX.log"
run_server
if [ "$SERVER_PID" != "0" ]; then
echo -e "\n***\n*** Failed: $SERVER started successfully when it was expected to fail\n***"
cat $SERVER_LOG
RET=1

kill $SERVER_PID
wait $SERVER_PID
fi

LOG_IDX=$((LOG_IDX+1))

# LifeCycleTest.test_model_repository_index
rm -fr models models_0 config.pbtxt.*
mkdir models models_0
8 changes: 6 additions & 2 deletions src/main.cc
@@ -519,8 +519,12 @@ std::vector<Option> options_
"specified."},
{OPTION_STARTUP_MODEL, "load-model", Option::ArgStr,
"Name of the model to be loaded on server startup. It may be specified "
"multiple times to add multiple models. Note that this option will only "
"take affect if --model-control-mode=explicit is true."},
"multiple times to add multiple models. To load ALL models at startup, "
"specify '*' as the model name with --load-model=* as the ONLY "
"--load-model argument, this does not imply any pattern matching. "
"Specifying --load-model=* in conjunction with another --load-model "
"argument will result in error. Note that this option will only take "
"effect if --model-control-mode=explicit is true."},
// FIXME: fix the default to execution_count once RL logic is complete.
{OPTION_RATE_LIMIT, "rate-limit", Option::ArgStr,
"Specify the mode for rate limiting. Options are \"execution_count\" "
