Add new benchmark mode to search for peak goodput under an SLO #197

@dagrayvid

When we benchmark a new model or new hardware, the goal is often to determine the maximum request rate (RPS) or token throughput (tokens per second) that the server can sustain under a given SLO. We should add a new benchmark mode similar to "sweep" that, instead of doing linearly spaced constant-RPS runs, uses something like a binary search to find the peak load the server can handle while still meeting a defined latency SLO.
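
To make the search idea concrete, here is a minimal sketch of a binary search over request rate. It is not taken from the PoC branch: `find_peak_rate`, `meets_slo`, and `fake_run` are hypothetical names, and the sketch assumes the SLO outcome is monotonic in load (it holds below some rate and fails above it), which real runs may only approximate because of noise.

```python
from typing import Callable


def find_peak_rate(
    meets_slo: Callable[[float], bool],
    low: float,
    high: float,
    tolerance: float = 0.5,
) -> float:
    """Return the highest rate in [low, high] found to meet the SLO.

    Hypothetical helper, not a guidellm API. `meets_slo(rate)` is assumed
    to run one constant-rate benchmark and report whether the SLO held.
    """
    if not meets_slo(low):
        return 0.0  # even the lowest rate violates the SLO
    if meets_slo(high):
        return high  # server is not saturated inside the search range
    # Invariant from here on: the SLO holds at `low` and fails at `high`.
    while high - low > tolerance:
        mid = (low + high) / 2
        if meets_slo(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a real benchmark run: pretend the server copes up to 37.3 RPS.
def fake_run(rate: float) -> bool:
    return rate <= 37.3


peak = find_peak_rate(fake_run, low=1.0, high=64.0, tolerance=0.5)
print(f"peak sustainable rate is roughly {peak:.2f} RPS")
```

Compared with a linear sweep, this needs about log2((high - low) / tolerance) benchmark runs to narrow the peak to within `tolerance`, at the cost of relying on the monotonicity assumption.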

We would need config options for defining the SLO, for example p95 or p99 thresholds on ITL (inter-token latency) and TTFT (time to first token).
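
As a rough illustration of what such an SLO option could carry, the sketch below models one latency target and checks it against measured samples. The `LatencySLO` class, its field names, and the nearest-rank percentile helper are all assumptions made for discussion, not existing guidellm config.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical SLO definition, e.g. 'p99 TTFT <= 200 ms'."""

    metric: str          # assumed values: "ttft" or "itl"
    pct: float           # e.g. 95 or 99
    threshold_ms: float  # maximum allowed latency at that percentile

    def is_met(self, samples_ms: Sequence[float]) -> bool:
        return nearest_rank_percentile(samples_ms, self.pct) <= self.threshold_ms


# Synthetic TTFT samples of 1..100 ms, purely for illustration.
ttft_samples = [float(ms) for ms in range(1, 101)]

strict = LatencySLO(metric="ttft", pct=99, threshold_ms=98.0)
relaxed = LatencySLO(metric="ttft", pct=99, threshold_ms=99.0)
print(strict.is_met(ttft_samples), relaxed.is_met(ttft_samples))
```

A run would count as meeting the SLO only if every configured target (for instance one for TTFT and one for ITL) is met; how to combine several targets is left open here.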

I have a rough PoC of this in progress on this branch: https://github.com/dagrayvid/guidellm/tree/goodput. I wanted to open this issue to discuss the idea further and track progress.

Labels: enhancement (New feature or request), internal (filed by core contributor or associate)
