Add test for sequence model instance update (#5831)
* Add test for sequence model instance update

* Add gap for file timestamp update

* Update test for non-blocking sequence update

* Update documentation

* Remove mentioning increase instance count case

* Add more documentation for scheduler update test

* Update test for non-blocking batcher removal

* Add polling due to async scheduler destruction

* Use _ as private

* Fix typo

* Add docs on instance count decrease

* Fix typo

* Separate direct and oldest to different test cases

* Separate nested tests in a loop into multiple test cases

* Refactor scheduler update test

* Improve doc on handling future test failures

* Address pre-commit

* Add best effort to reset model state after a single test case failure

* Remove reset model method to make harder for chaining multiple test cases as one

* Remove description on model state clean up
kthui committed Jul 24, 2023
1 parent 9bc9ad6 commit 0f84995
Showing 6 changed files with 340 additions and 154 deletions.
21 changes: 13 additions & 8 deletions docs/user_guide/model_management.md
@@ -212,9 +212,8 @@ repository, copy in the new shared libraries, and then reload the
 model.

 * If only the model instance configuration on the 'config.pbtxt' is modified
-(i.e. increasing/decreasing the instance count) for non-sequence models,
-then Triton will update the model rather then reloading it, when either a load
-request is received under
+(i.e. increasing/decreasing the instance count), then Triton will update the
+model rather then reloading it, when either a load request is received under
 [Model Control Mode EXPLICIT](#model-control-mode-explicit) or change to the
 'config.pbtxt' is detected under
 [Model Control Mode POLL](#model-control-mode-poll).
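
To illustrate the kind of change this hunk describes, here is a minimal sketch that edits only the instance count in a `config.pbtxt` text and leaves everything else untouched. The helper `set_instance_count`, the model name `my_model`, and the example configuration are hypothetical and not part of this commit. The regular expression assumes a single, simply formatted `instance_group` entry.

```python
import re


def set_instance_count(config_text: str, new_count: int) -> str:
    """Return config_text with the first instance_group `count` replaced.

    Hypothetical helper: assumes `count:` appears inside the first
    `instance_group [ { ... } ]` entry and that the entry has no nested braces.
    """
    pattern = re.compile(r"(instance_group\s*\[\s*\{[^}]*?\bcount\s*:\s*)\d+")
    # A callable replacement avoids ambiguity between the group reference
    # and the digits of the new count.
    updated, replaced = pattern.subn(
        lambda m: m.group(1) + str(new_count), config_text, count=1
    )
    if replaced == 0:
        raise ValueError("no instance_group count field found")
    return updated


# Hypothetical configuration used only for illustration.
EXAMPLE_CONFIG = """\
name: "my_model"
instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]
"""

updated = set_instance_count(EXAMPLE_CONFIG, 4)
print(updated)
```

Per the documentation text above, such an edit takes effect once a load request is received in EXPLICIT mode or once the change to 'config.pbtxt' is detected in POLL mode. The sketch only prepares the file contents and does not contact a server.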
@@ -225,11 +224,17 @@ request is received under
 configuration, so its presence in the model directory may be detected as a new file
 and cause the model to fully reload when only an update is expected.

-* If a sequence model is updated with in-flight sequence(s), Triton does not
-guarantee any remaining request(s) from the in-flight sequence(s) will be routed
-to the same model instance for processing. It is currently the responsibility of
-the user to ensure any in-flight sequence(s) is complete before updating a
-sequence model.
+* If a sequence model is *updated* (i.e. decreasing the instance count), Triton
+will wait until the in-flight sequence is completed (or timed-out) before the
+instance behind the sequence is removed.
+* If the instance count is decreased, arbitrary instance(s) are selected among
+idle instances and instances with in-flight sequence(s) for removal.
+
+* If a sequence model is *reloaded* with in-flight sequence(s) (i.e. changes to
+the model file), Triton does not guarantee any remaining request(s) from the
+in-flight sequence(s) will be routed to the same model instance for processing.
+It is currently the responsibility of the user to ensure any in-flight
+sequence(s) are completed before reloading a sequence model.

 ## Concurrently Loading Models


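The update behaviour documented in this diff (an instance selected for removal stays alive until its in-flight sequence completes) can be pictured with a small toy model. This is a sketch of the documented semantics only, not Triton's scheduler. `ToyInstance`, `ToySequenceModel`, and all method names are hypothetical, sequence timeouts are not modelled, and the "arbitrary" selection is replaced by simply taking the trailing instances so that the example is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyInstance:
    name: str
    sequence_id: Optional[int] = None  # in-flight sequence bound to this instance
    pending_removal: bool = False      # selected for removal, waiting to drain


class ToySequenceModel:
    """Toy stand-in for a sequence model with several instances."""

    def __init__(self, count: int) -> None:
        self.instances: Dict[str, ToyInstance] = {
            f"inst_{i}": ToyInstance(f"inst_{i}") for i in range(count)
        }

    def names(self) -> List[str]:
        return list(self.instances)

    def start_sequence(self, sequence_id: int) -> str:
        # Bind the sequence to the first idle instance not marked for removal.
        for inst in self.instances.values():
            if inst.sequence_id is None and not inst.pending_removal:
                inst.sequence_id = sequence_id
                return inst.name
        raise RuntimeError("no free instance available")

    def end_sequence(self, sequence_id: int) -> None:
        # Completing a sequence frees its instance; a drained instance that
        # was marked for removal is removed at this point.
        for inst in list(self.instances.values()):
            if inst.sequence_id == sequence_id:
                inst.sequence_id = None
                if inst.pending_removal:
                    del self.instances[inst.name]
                return
        raise KeyError(sequence_id)

    def decrease_to(self, new_count: int) -> None:
        kept = [i for i in self.instances.values() if not i.pending_removal]
        # Toy selection: the trailing instances, whether idle or busy.
        for inst in kept[new_count:]:
            if inst.sequence_id is None:
                del self.instances[inst.name]   # idle: remove immediately
            else:
                inst.pending_removal = True     # busy: wait for the sequence


model = ToySequenceModel(count=2)
first = model.start_sequence(1)
second = model.start_sequence(2)

model.decrease_to(1)            # selects the busy trailing instance
after_decrease = model.names()  # both instances still exist

model.end_sequence(2)           # sequence 2 completes, its instance is removed
after_end = model.names()
```

In this toy run, the instance serving sequence 2 survives the decrease and disappears only when that sequence ends, while sequence 1 keeps its instance throughout. A *reload* has no such guarantee according to the documentation, which is why the text leaves draining in-flight sequences to the user in that case.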