Add test for sequence model instance update #5831

Merged: 22 commits, Jul 24, 2023

Changes from 18 commits
Commits (22)
- `7637d14` Add test for sequence model instance update (kthui, May 19, 2023)
- `8eda66f` Add gap for file timestamp update (kthui, May 24, 2023)
- `62bff4d` Update test for non-blocking sequence update (kthui, Jun 27, 2023)
- `e386e28` Update documentation (kthui, Jun 27, 2023)
- `46595ca` Remove mentioning increase instance count case (kthui, Jun 28, 2023)
- `1998f33` Add more documentation for scheduler update test (kthui, Jul 13, 2023)
- `37ab460` Update test for non-blocking batcher removal (kthui, Jul 14, 2023)
- `453d302` Add polling due to async scheduler destruction (kthui, Jul 17, 2023)
- `2b76ddb` Use _ as private (kthui, Jul 18, 2023)
- `0d3b784` Fix typo (kthui, Jul 19, 2023)
- `017a76c` Add docs on instance count decrease (kthui, Jul 19, 2023)
- `c8456ad` Fix typo (kthui, Jul 19, 2023)
- `c9d7b5f` Separate direct and oldest to different test cases (kthui, Jul 20, 2023)
- `dcb55f0` Separate nested tests in a loop into multiple test cases (kthui, Jul 20, 2023)
- `0268918` Refactor scheduler update test (kthui, Jul 20, 2023)
- `38b0ade` Improve doc on handling future test failures (kthui, Jul 20, 2023)
- `e05434d` Merge branch 'main' of github.com:triton-inference-server/server into… (kthui, Jul 20, 2023)
- `f3a9f75` Address pre-commit (kthui, Jul 20, 2023)
- `99f5935` Add best effort to reset model state after a single test case failure (kthui, Jul 21, 2023)
- `fae2e1a` Remove reset model method to make harder for chaining multiple test c… (kthui, Jul 21, 2023)
- `ded51b4` Remove description on model state clean up (kthui, Jul 21, 2023)
- `9e44efc` Merge branch 'main' of github.com:triton-inference-server/server into… (kthui, Jul 21, 2023)
docs/user_guide/model_management.md (21 changes: 13 additions & 8 deletions)
@@ -212,9 +212,8 @@
 repository, copy in the new shared libraries, and then reload the
 model.
 * If only the model instance configuration on the 'config.pbtxt' is modified
-(i.e. increasing/decreasing the instance count) for non-sequence models,
-then Triton will update the model rather then reloading it, when either a load
-request is received under
+(i.e. increasing/decreasing the instance count), then Triton will update the
+model rather than reloading it, when either a load request is received under
 [Model Control Mode EXPLICIT](#model-control-mode-explicit) or change to the
 'config.pbtxt' is detected under
 [Model Control Mode POLL](#model-control-mode-poll).
@@ -225,11 +224,17 @@
 configuration, so its presence in the model directory may be detected as a new file
 and cause the model to fully reload when only an update is expected.
 
-* If a sequence model is updated with in-flight sequence(s), Triton does not
-guarantee any remaining request(s) from the in-flight sequence(s) will be routed
-to the same model instance for processing. It is currently the responsibility of
-the user to ensure any in-flight sequence(s) is complete before updating a
-sequence model.
+* If a sequence model is *updated* (i.e. decreasing the instance count), Triton
+will wait until the in-flight sequence is completed (or timed-out) before the
+instance behind the sequence is removed.
+  * If the instance count is decreased, arbitrary instance(s) are selected among
+idle instances and instances with in-flight sequence(s) for removal.
+
+* If a sequence model is *reloaded* with in-flight sequence(s) (i.e. changes to
+the model file), Triton does not guarantee any remaining request(s) from the
+in-flight sequence(s) will be routed to the same model instance for processing.
+It is currently the responsibility of the user to ensure any in-flight
+sequence(s) are completed before reloading a sequence model.
 
 ## Concurrently Loading Models
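The in-place update described in the first changed hunk can be exercised from a client. A minimal sketch, assuming a server started with `--model-control-mode=explicit` reachable at `localhost:8000` and an illustrative model name `my_model` (the name is not from this PR):

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# After editing only the instance count in my_model/config.pbtxt, a load
# request triggers an in-place update of the model rather than a full reload.
client.load_model("my_model")
assert client.is_model_ready("my_model")
```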

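For the instance-count decrease on a sequence model covered by the second hunk, one way to trigger it without editing files on disk is the `config` override parameter of `load_model`. A sketch under assumed names and values (`my_sequence_model` and the count change from 2 to 1 are illustrative; the fields follow the model-config schema):

```python
import json

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Override config that drops the instance count to 1. Per the updated docs,
# Triton selects an arbitrary instance (idle or holding an in-flight
# sequence) and waits for any in-flight sequence on it to complete or time
# out before removing that instance.
config = json.dumps(
    {
        "instance_group": [{"count": 1, "kind": "KIND_CPU"}],
        "sequence_batching": {"direct": {}},
    }
)
client.load_model("my_sequence_model", config=config)
```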
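For the *reload* case, the updated docs leave sequence draining to the user. A sketch of ending an in-flight sequence before reloading, assuming a gRPC endpoint at `localhost:8001` and a hypothetical model with a single FP32 input named `INPUT0` and sequence ID 42 in flight:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

inp = grpcclient.InferInput("INPUT0", [1, 1], "FP32")
inp.set_data_from_numpy(np.zeros((1, 1), dtype=np.float32))

# Mark the last request of sequence 42 with sequence_end=True so the
# sequence completes cleanly, then reload the edited model.
client.infer(
    "my_sequence_model",
    [inp],
    sequence_id=42,
    sequence_start=False,
    sequence_end=True,
)
client.load_model("my_sequence_model")
```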