
Connects the rate limiter to the scheduling pipeline #3388

Merged 3 commits on Sep 22, 2021
Conversation

tanmayv25 (Contributor) commented:
Model: resnet50v1.5_savedmodel; Instance Count: 2

Case 1: --rate-limit=off:

Avg request latency: 16645 usec (overhead 47 usec + queue 7450 usec + compute input 136 usec + compute infer 8994 usec + compute output 18 usec)

Case 2: --rate-limit=execution_count, sufficient resources to run two instances concurrently:

Avg request latency: 15629 usec (overhead 47 usec + queue 6979 usec + compute input 137 usec + compute infer 8448 usec + compute output 18 usec)

Case 3: --rate-limit=execution_count, available resources support only one instance at a time:

Avg request latency: 25244 usec (overhead 50 usec + queue 18561 usec + compute input 163 usec + compute infer 6452 usec + compute output 18 usec)
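For context on how Case 3 was constrained, resource limits for the rate limiter are declared per instance group in the model's config.pbtxt. The sketch below is illustrative only: the resource name `R1` and the counts are made up, and the field layout follows Triton's rate-limiter model-configuration schema as I understand it, not a config taken from this PR.

```protobuf
# Hypothetical config.pbtxt fragment: two instances that each need 2 units
# of a resource "R1". If the server only exposes 2 units of R1, the rate
# limiter can schedule just one instance at a time (the Case 3 scenario).
instance_group [
  {
    count: 2
    kind: KIND_GPU
    rate_limiter {
      resources [
        {
          name: "R1"
          count: 2
        }
      ]
    }
  }
]
```

With --rate-limit=off this section is ignored and both instances run concurrently whenever requests are queued.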

Rate limiter seems to be working.

There is some overhead in the rate-limiting logic. For an efficient rate-limiting implementation, I would expect:

queue time of Case 3 = queue time of Case 2 + total compute time of Case 2 (input + infer + output),

which comes out to 6979 + 137 + 8448 + 18 = 15582 usec. However, we see a queue time of 18561 usec, an additional overhead of 2979 usec (~2.98 ms).
I will add testing and perform tuning in a separate ticket.
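The expectation above is simple arithmetic over the perf numbers already quoted; a minimal sketch that reproduces it (component values copied from the Case 2 and Case 3 latency breakdowns above):

```python
# Latency components (usec) from the Case 2 run above.
case2 = {
    "queue": 6979,
    "compute_input": 137,
    "compute_infer": 8448,
    "compute_output": 18,
}
# Observed queue time (usec) from the Case 3 run above.
case3_queue = 18561

# With resources for only one instance, a request should wait for
# Case 2's queue time plus one full Case 2 execution.
expected_case3_queue = (
    case2["queue"]
    + case2["compute_input"]
    + case2["compute_infer"]
    + case2["compute_output"]
)
overhead_usec = case3_queue - expected_case3_queue

print(expected_case3_queue)  # 15582
print(overhead_usec)         # 2979 (~2.98 ms of rate-limiter overhead)
```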

@tanmayv25 tanmayv25 merged commit 4a7bd92 into main Sep 22, 2021
@tanmayv25 tanmayv25 deleted the tanmayv-rl branch September 22, 2021 06:28