Add load-model=* tests and documentation (triton-inference-server#4256)

* Start tweaking lifecycle test to include --load-model=* test

* Tweak SERVER_ARGS when testing load-model=* with another load-model argument

* Set return value instead of exiting on negative server test

* Add documentation on load-model=* to load on all models on startup in explicit model control mode

* Typo and copyright update

* Add help text on --load-model=* to main.cc

* Add pattern matching disclaimer
rmccorm4 authored Apr 20, 2022
1 parent 11410df commit 478a9e0
Showing 4 changed files with 104 additions and 14 deletions.
22 changes: 12 additions & 10 deletions docs/model_management.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
# Copyright 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -47,18 +47,20 @@
protocol](protocol/extension_model_repository.md) will have no effect
and will return an error response.

This model control mode is selected by specifying
--model-control-mode=none when starting Triton. This is the default
`--model-control-mode=none` when starting Triton. This is the default
model control mode. Changing the model repository while Triton is
running must be done carefully, as explained in [Modifying the Model
Repository](#modifying-the-model-repository).
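
For illustration, a minimal invocation under the default NONE mode might look like this (the repository path is a placeholder, not from this commit):

```shell
# NONE is the default, so --model-control-mode=none is optional here.
# /opt/tritonserver/models is a placeholder model repository path.
tritonserver --model-repository=/opt/tritonserver/models \
             --model-control-mode=none
```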

## Model Control Mode EXPLICIT

At startup, Triton loads only those models specified explicitly with
the --load-model command-line option. If --load-model is not specified
then no models are loaded at startup. Models that Triton is not able
to load will be marked as UNAVAILABLE and will not be available for
inferencing.
At startup, Triton loads only those models specified explicitly with the
`--load-model` command-line option. To load ALL models at startup, specify
`--load-model=*` as the ONLY `--load-model` argument. Specifying
`--load-model=*` in conjunction with another `--load-model` argument will
result in an error. If `--load-model` is not specified, no models are loaded
at startup. Models that Triton is not able to load will be marked as
UNAVAILABLE and will not be available for inferencing.
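
The `--load-model=*` rule can be sketched in shell. The `validate_load_models` helper below is hypothetical and is not part of Triton; it only mirrors the rule described above:

```shell
# Hypothetical helper: mirrors the rule that '*' must be the ONLY
# --load-model value; combining it with named models is an error.
validate_load_models() {
    local has_wildcard=0 count=0
    for model in "$@"; do
        count=$((count + 1))
        if [ "$model" = "*" ]; then
            has_wildcard=1
        fi
    done
    if [ "$has_wildcard" -eq 1 ] && [ "$count" -gt 1 ]; then
        echo "error: --load-model=* cannot be combined with other --load-model arguments" >&2
        return 1
    fi
    return 0
}

validate_load_models "*"                           # accepted: load all models
validate_load_models onnx_float32_float32_float32  # accepted: load one model
```

Note that `*` is quoted in the calls above so the shell does not glob-expand it before the helper sees it.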

After startup, all model load and unload actions must be initiated
explicitly by using the [model control
@@ -71,7 +73,7 @@
newly loaded model will replace the already loaded model without any
loss in availability for the model.

This model control mode is enabled by specifying
--model-control-mode=explicit. Changing the model repository while
`--model-control-mode=explicit`. Changing the model repository while
Triton is running must be done carefully, as explained in [Modifying
the Model Repository](#modifying-the-model-repository).

@@ -91,7 +93,7 @@
the model.

Changes to the model repository may not be detected immediately
because Triton polls the repository periodically. You can control the
polling interval with the --repository-poll-secs option. The console
polling interval with the `--repository-poll-secs` option. The console
log or the [model ready
protocol](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md)
or the index operation of the [model control
@@ -109,7 +111,7 @@
protocol](protocol/extension_model_repository.md) will have no effect
and will return an error response.

This model control mode is enabled by specifying
--model-control-mode=poll and by setting --repository-poll-secs to a
`--model-control-mode=poll` and by setting `--repository-poll-secs` to a
non-zero value when starting Triton. Changing the model repository
while Triton is running must be done carefully, as explained in
[Modifying the Model Repository](#modifying-the-model-repository).
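
As an illustration (the repository path and interval below are placeholders, not from this commit), POLL mode might be enabled like this:

```shell
# Placeholder path; poll the repository for changes every 30 seconds.
tritonserver --model-repository=/opt/tritonserver/models \
             --model-control-mode=poll \
             --repository-poll-secs=30
```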
4 changes: 2 additions & 2 deletions qa/L0_lifecycle/lifecycle_test.py
@@ -2094,7 +2094,7 @@ def test_multiple_model_repository_control_startup_models(self):
self.assertTrue(False, "unexpected error {}".format(ex))

def test_model_repository_index(self):
# use model control EXPLIT and --load-model to load a subset of models
# use model control EXPLICIT and --load-model to load a subset of models
# in model repository
tensor_shape = (1, 16)
model_bases = ['graphdef', 'savedmodel', "simple_savedmodel"]
@@ -2119,7 +2119,7 @@ def test_model_repository_index(self):
self.assertTrue(False, "unexpected error {}".format(ex))

# Check model repository index
# All models should be in ready state except savedmodel_float32_float32_float32
# All models should be in ready state except onnx_float32_float32_float32
# which appears in two repositories.
model_bases.append("simple_graphdef")
try:
84 changes: 84 additions & 0 deletions qa/L0_lifecycle/test.sh
@@ -1196,6 +1196,90 @@
wait $SERVER_PID

LOG_IDX=$((LOG_IDX+1))

# Test loading all models on startup in EXPLICIT model control mode, re-use
# existing LifeCycleTest.test_multiple_model_repository_control_startup_models
# unit test
rm -fr models models_0 config.pbtxt.*
mkdir models models_0
# Ensemble models in the second repository
for i in plan onnx ; do
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models/.
cp -r $DATADIR/qa_ensemble_model_repository/qa_model_repository/simple_${i}_float32_float32_float32 models_0/.
sed -i "s/max_batch_size:.*/max_batch_size: 1/" models/${i}_float32_float32_float32/config.pbtxt
sed -i "s/max_batch_size:.*/max_batch_size: 1/" models_0/simple_${i}_float32_float32_float32/config.pbtxt
done

# savedmodel doesn't load because it is duplicated in 2 repositories
for i in savedmodel ; do
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models/.
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models_0/.
done

SERVER_ARGS="--model-repository=`pwd`/models --model-repository=`pwd`/models_0 \
--model-control-mode=explicit \
--strict-readiness=false \
--strict-model-config=false --exit-on-error=false \
--load-model=*"
SERVER_LOG="./inference_server_$LOG_IDX.log"
run_server
if [ "$SERVER_PID" == "0" ]; then
echo -e "\n***\n*** Failed to start $SERVER\n***"
cat $SERVER_LOG
exit 1
fi

rm -f $CLIENT_LOG
set +e
python $LC_TEST LifeCycleTest.test_multiple_model_repository_control_startup_models >>$CLIENT_LOG 2>&1
if [ $? -ne 0 ]; then
cat $CLIENT_LOG
echo -e "\n***\n*** Test Failed\n***"
RET=1
else
check_test_results $TEST_RESULT_FILE 1
if [ $? -ne 0 ]; then
cat $CLIENT_LOG
echo -e "\n***\n*** Test Result Verification Failed\n***"
RET=1
fi
fi
set -e

kill $SERVER_PID
wait $SERVER_PID

LOG_IDX=$((LOG_IDX+1))

# Test loading all models on startup in EXPLICIT model control mode AND
# an additional --load-model argument, it should fail
rm -fr models
mkdir models
for i in onnx ; do
cp -r $DATADIR/qa_model_repository/${i}_float32_float32_float32 models/.
sed -i "s/max_batch_size:.*/max_batch_size: 1/" models/${i}_float32_float32_float32/config.pbtxt
done

# --load-model=* can not be used with any other --load-model arguments
# as it's unclear what the user's intentions are.
SERVER_ARGS="--model-repository=`pwd`/models --model-repository=`pwd`/models_0 \
--model-control-mode=explicit \
--strict-readiness=true \
--exit-on-error=true \
--load-model=* \
--load-model=onnx_float32_float32_float32"
SERVER_LOG="./inference_server_$LOG_IDX.log"
run_server
if [ "$SERVER_PID" != "0" ]; then
echo -e "\n***\n*** Failed: $SERVER started successfully when it was expected to fail\n***"
cat $SERVER_LOG
RET=1

kill $SERVER_PID
wait $SERVER_PID
fi

LOG_IDX=$((LOG_IDX+1))

# LifeCycleTest.test_model_repository_index
rm -fr models models_0 config.pbtxt.*
mkdir models models_0
8 changes: 6 additions & 2 deletions src/main.cc
@@ -519,8 +519,12 @@ std::vector<Option> options_
"specified."},
{OPTION_STARTUP_MODEL, "load-model", Option::ArgStr,
"Name of the model to be loaded on server startup. It may be specified "
"multiple times to add multiple models. Note that this option will only "
"take affect if --model-control-mode=explicit is true."},
"multiple times to add multiple models. To load ALL models at startup, "
"specify '*' as the model name with --load-model=* as the ONLY "
"--load-model argument, this does not imply any pattern matching. "
"Specifying --load-model=* in conjunction with another --load-model "
"argument will result in error. Note that this option will only take "
"effect if --model-control-mode=explicit is true."},
// FIXME: fix the default to execution_count once RL logic is complete.
{OPTION_RATE_LIMIT, "rate-limit", Option::ArgStr,
"Specify the mode for rate limiting. Options are \"execution_count\" "
