Connects the rate limiter to the scheduling pipeline #3388
Merged
Model: resnet50v1.5_savedmodel; Instance Count: 2
Case 1: --rate-limit=off:
Avg request latency: 16645 usec (overhead 47 usec + queue 7450 usec + compute input 136 usec + compute infer 8994 usec + compute output 18 usec)
Case 2: --rate-limit=execution_count, sufficient resources to run two instances concurrently:
Avg request latency: 15629 usec (overhead 47 usec + queue 6979 usec + compute input 137 usec + compute infer 8448 usec + compute output 18 usec)
Case 3: --rate-limit=execution_count, available resources support only one instance at a time:
Avg request latency: 25244 usec (overhead 50 usec + queue 18561 usec + compute input 163 usec + compute infer 6452 usec + compute output 18 usec)
The rate limiter appears to be working, but there is some overhead in the rate-limiting logic. For an efficient rate-limiting implementation, I would expect:
queue time for Case 3 = queue time of Case 2 + compute time of Case 2,
which comes out to 15582 usec. However, we see a Case 3 queue time of 18561 usec, an additional overhead of about 2.98 ms.
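The expectation above can be checked with a quick back-of-the-envelope calculation; this is just a sketch of the arithmetic, using the latency components reported for Case 2 and Case 3 (the dictionary keys are illustrative names, not part of any API):

```python
# Case 2 latency components from the report above, in usec.
case2 = {
    "queue": 6979,
    "compute_input": 137,
    "compute_infer": 8448,
    "compute_output": 18,
}

# Total compute time for Case 2: input + infer + output.
compute_case2 = (
    case2["compute_input"] + case2["compute_infer"] + case2["compute_output"]
)  # 8603 usec

# Ideal Case 3 queue time: a request waits for the previous request's
# queue time plus one full execution, since only one instance can run.
expected_queue_case3 = case2["queue"] + compute_case2  # 15582 usec

# Measured Case 3 queue time from the report above.
measured_queue_case3 = 18561

# Extra queue time attributable to rate-limiter overhead.
overhead_usec = measured_queue_case3 - expected_queue_case3  # 2979 usec

print(f"expected queue: {expected_queue_case3} usec")
print(f"overhead: {overhead_usec} usec (~{overhead_usec / 1000:.2f} ms)")
```

This reproduces the ~3 ms gap between the ideal serialized schedule and the measured queue time.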
I will add testing and perform tuning as a separate ticket.