
Connects the rate limiter to the scheduling pipeline #3388

Merged 3 commits on Sep 22, 2021
Conversation

tanmayv25 (Contributor) commented:
Model: resnet50v1.5_savedmodel; Instance Count: 2

Case 1: --rate-limit=off:

Avg request latency: 16645 usec (overhead 47 usec + queue 7450 usec + compute input 136 usec + compute infer 8994 usec + compute output 18 usec)

Case 2: --rate-limit=execution_count, sufficient resources to run two instances concurrently:

Avg request latency: 15629 usec (overhead 47 usec + queue 6979 usec + compute input 137 usec + compute infer 8448 usec + compute output 18 usec)

Case 3: --rate-limit=execution_count, available resources support only one instance at a time:

Avg request latency: 25244 usec (overhead 50 usec + queue 18561 usec + compute input 163 usec + compute infer 6452 usec + compute output 18 usec)
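For context on how Case 3 was constrained, resource limits for the rate limiter are declared per instance group in the model's config.pbtxt. The sketch below is illustrative only: the resource name `R1` and the counts are made up, and the field layout follows Triton's rate-limiter model-configuration schema as I understand it, not a config taken from this PR.

```protobuf
# Hypothetical config.pbtxt fragment: two instances that each need 2 units
# of a resource "R1". If the server only exposes 2 units of R1, the rate
# limiter can schedule just one instance at a time (the Case 3 scenario).
instance_group [
  {
    count: 2
    kind: KIND_GPU
    rate_limiter {
      resources [
        {
          name: "R1"
          count: 2
        }
      ]
    }
  }
]
```

With --rate-limit=off this section is ignored and both instances run concurrently whenever requests are queued.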

Rate limiter seems to be working.

There is some overhead in the rate-limiting logic. For an efficient rate-limiting implementation, I would expect:

queue time of Case 3 = queue time of Case 2 + total compute time of Case 2 (input + infer + output),

which comes out to 6979 + 137 + 8448 + 18 = 15582 usec. However, we see a queue time of 18561 usec, an additional overhead of 2979 usec (~2.98 ms).
I will add testing and perform tuning in a separate ticket.
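The expectation above is simple arithmetic over the perf numbers already quoted; a minimal sketch that reproduces it (component values copied from the Case 2 and Case 3 latency breakdowns above):

```python
# Latency components (usec) from the Case 2 run above.
case2 = {
    "queue": 6979,
    "compute_input": 137,
    "compute_infer": 8448,
    "compute_output": 18,
}
# Observed queue time (usec) from the Case 3 run above.
case3_queue = 18561

# With resources for only one instance, a request should wait for
# Case 2's queue time plus one full Case 2 execution.
expected_case3_queue = (
    case2["queue"]
    + case2["compute_input"]
    + case2["compute_infer"]
    + case2["compute_output"]
)
overhead_usec = case3_queue - expected_case3_queue

print(expected_case3_queue)  # 15582
print(overhead_usec)         # 2979 (~2.98 ms of rate-limiter overhead)
```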

@tanmayv25 tanmayv25 merged commit 4a7bd92 into main Sep 22, 2021
@tanmayv25 tanmayv25 deleted the tanmayv-rl branch September 22, 2021 06:28