Open
Labels: enhancement (new feature or request), internal (filed by core contributor or associate)
Description
Often when we benchmark a new model or hardware, the goal is to determine the maximum RPS or tokens per second that the server can sustain while meeting a given SLO. We should add a new feature similar to "sweep" that, instead of running linearly spaced constant-RPS benchmarks, does something like a binary search to find the peak load the server can handle while meeting a defined latency SLO.
We would need config options for the SLO, supporting p99 or p95 thresholds on ITL (inter-token latency) and TTFT (time to first token).
I have a rough PoC in progress on this branch: https://github.com/dagrayvid/guidellm/tree/goodput, but I wanted to open this issue to discuss the idea further and track progress.
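
To make the proposal concrete, here is a minimal Python sketch of how an SLO-bounded binary search could work. It is not the code on the branch above and does not use any GuideLLM API: `LatencySLO`, `meets_slos`, `find_max_rps`, the `run_at_rate` callback, and the metric names `"ttft"` and `"itl"` are all hypothetical. It assumes that SLO compliance is monotonic in load (if the server passes at some rate, it also passes at every lower rate) and that `low` is a rate known to pass.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical SLO option: a percentile of one metric must not exceed a limit."""
    metric: str        # assumed metric key, e.g. "ttft" or "itl"
    percentile: float  # e.g. 95 or 99
    max_ms: float      # latency limit in milliseconds


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slos(latencies_ms: Dict[str, Sequence[float]],
               slos: Sequence[LatencySLO]) -> bool:
    """True if every SLO holds for the measured latencies of one run."""
    return all(
        percentile(latencies_ms[slo.metric], slo.percentile) <= slo.max_ms
        for slo in slos
    )


def find_max_rps(run_at_rate: Callable[[float], Dict[str, Sequence[float]]],
                 slos: Sequence[LatencySLO],
                 low: float = 0.0,
                 high: float = 64.0,
                 tolerance: float = 0.5) -> float:
    """Binary-search the highest constant rate in [low, high) that meets all SLOs.

    `run_at_rate(rate)` stands in for one constant-rate benchmark run and
    returns latency samples per metric. Returns `low` if no tested rate passes.
    """
    best = low
    while high - low > tolerance:
        mid = (low + high) / 2
        if meets_slos(run_at_rate(mid), slos):
            best = low = mid   # passed: the limit is at or above mid
        else:
            high = mid         # failed: the limit is below mid
    return best


def fake_run(rate: float) -> Dict[str, Sequence[float]]:
    # Toy stand-in for a real benchmark: latency grows linearly with rate.
    return {"ttft": [10.0 * rate] * 100, "itl": [2.0 * rate] * 100}


slos = [LatencySLO("ttft", 99, 200.0), LatencySLO("itl", 95, 50.0)]
max_rps = find_max_rps(fake_run, slos, low=0.0, high=64.0, tolerance=0.5)
print(f"max sustainable rate is about {max_rps} RPS")
```

In the toy model the TTFT limit is the binding one (it is exceeded above 20 RPS), so the search converges on that value. With real measurements the pass/fail signal is noisy near the limit, so repeated runs per rate or a wider tolerance may be needed. The upper bound `high` is never tested itself, so it should be chosen above the expected peak.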