Feat/max error rate - continued #238

AlonKellner-RedHat · 2025-07-23T10:04:42Z

@markurtz This is a continuation of the work done in #171 by @markVaykhansky with the review comments fixed and a few additions.

The main differences from the original PR are:

E2E tests of the new features, using the vLLM simulator.
3 new fields in the output report run_stats: window_error_rate, termination_reason and status.
A simplified, "chunked" windowing mechanism: every GUIDELLM__ERROR_CHECK_WINDOW_SIZE completed requests (successful or errored), it checks whether the ratio of errors in those requests (or the error count, if --max-error>1) is greater than --max-error.

E2E tests

By default, these check whether the vLLM simulator is available in the local environment. If it is not, they are skipped and log a warning with the command required to build the vLLM simulator.
The vLLM simulator is built from source in the tests/e2e/vllm-sim.Dockerfile and extracted to bin/llm-d-inference-sim.
The tests use a VllmSimServer class that wraps the vLLM simulator with start and stop methods. They then call the guidellm command and assert on the values generated in the output report.
The 2 test modules have varying run times:

test_successful_benchmark (2 tests) - 14.12s
test_max_error_benchmark (1 test) - 15.98s

The max error test takes longer because it lets guidellm start sending requests before killing the vLLM sim, and then waits for the guidellm command to stop gracefully.
The successful tests share a single vLLM sim instance, so they are more efficient.
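
The wrapper pattern described above can be sketched roughly as follows. This is a minimal illustration, not the VllmSimServer class from this PR: the names SimServer and simulator_available are hypothetical, and the command is passed in by the caller, so no simulator flags are assumed. The usage at the bottom launches a dummy Python process in place of the simulator.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence


def simulator_available(binary: Path) -> bool:
    """Return True if the binary exists; a test would skip otherwise."""
    return binary.is_file()


class SimServer:
    """Hypothetical stand-in for a server wrapper with start/stop methods."""

    def __init__(self, command: Sequence[str]) -> None:
        self.command = list(command)
        self.process: Optional[subprocess.Popen] = None

    def start(self) -> None:
        # Launch the server as a child process.
        self.process = subprocess.Popen(self.command)

    def is_running(self) -> bool:
        return self.process is not None and self.process.poll() is None

    def stop(self, timeout: float = 5.0) -> None:
        if self.process is None:
            return
        # Ask politely first, then force-kill if it does not exit in time.
        self.process.terminate()
        try:
            self.process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.process.kill()
            self.process.wait()
        self.process = None


# Usage with a dummy long-running process instead of the real simulator.
server = SimServer([sys.executable, "-c", "import time; time.sleep(30)"])
server.start()
was_running = server.is_running()
server.stop()
```

Taking the command as a parameter keeps the wrapper independent of how the simulator binary is invoked, and a module-scoped fixture around start and stop is one way several tests could share a single instance.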

The new `run_stats` fields

window_error_rate - the error rate within the error check window at the time of run completion (may be a partial window if --max-error was not reached). This value indicates why --max-error was reached, unlike the error_rate field, which is global and does not necessarily reflect that --max-error was reached.
termination_reason - one of "max_seconds_reached", "max_requests_reached", "max_error_reached" or "interrupted". This is important for future features such as over-saturation termination or target metric margin termination.
status - one of "success", "error" or "interrupted". This is a simplification of the termination_reason field that helps differentiate success states from error states. For example, over-saturation termination is an error state and target metric margin termination is a success state.
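
As a rough illustration of how status could be derived from termination_reason, here is a small sketch. The string values come from the description above, but the mapping itself is an assumption: it treats reaching a configured limit as success and reaching --max-error as an error. The names TerminationReason, STATUS_BY_REASON and status_for are hypothetical.

```python
from enum import Enum


class TerminationReason(str, Enum):
    MAX_SECONDS_REACHED = "max_seconds_reached"
    MAX_REQUESTS_REACHED = "max_requests_reached"
    MAX_ERROR_REACHED = "max_error_reached"
    INTERRUPTED = "interrupted"


# Assumed mapping, not confirmed by the PR: hitting a configured limit
# counts as success, hitting --max-error counts as an error.
STATUS_BY_REASON = {
    TerminationReason.MAX_SECONDS_REACHED: "success",
    TerminationReason.MAX_REQUESTS_REACHED: "success",
    TerminationReason.MAX_ERROR_REACHED: "error",
    TerminationReason.INTERRUPTED: "interrupted",
}


def status_for(reason: str) -> str:
    """Collapse a termination_reason string into a coarse status."""
    return STATUS_BY_REASON[TerminationReason(reason)]
```

A new reason such as over-saturation termination would then only need one more entry in the mapping.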

"Chunked" error checking

The precise logic is as follows:

Accumulate the total number of completed (errored or successful) requests and the total number of errored requests.
When the number of completed requests reaches GUIDELLM__ERROR_CHECK_WINDOW_SIZE, check whether max_error is reached.
If max_error>1, check whether the accumulated number of errored requests > max_error. Otherwise, check whether the accumulated errored requests divided by the accumulated completed requests > max_error.
If max_error is reached, break and shut down.
If max_error is not reached, reset the accumulated counts to 0 and keep going.

This is simple and handles all different cases nicely.
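
The steps above can be sketched in a few lines of Python. This is an illustration of the described logic only, not the code in this PR: ChunkedErrorChecker and its fields are hypothetical names, and window_size stands in for GUIDELLM__ERROR_CHECK_WINDOW_SIZE.

```python
from dataclasses import dataclass


@dataclass
class ChunkedErrorChecker:
    """Hypothetical helper; names are illustrative, not GuideLLM's API."""

    max_error: float  # a ratio, or an absolute error count if > 1
    window_size: int  # stands in for GUIDELLM__ERROR_CHECK_WINDOW_SIZE
    completed: int = 0
    errored: int = 0

    def record(self, is_error: bool) -> bool:
        """Record one completed request; return True if the run should stop."""
        self.completed += 1
        self.errored += int(is_error)
        if self.completed < self.window_size:
            return False  # chunk not full yet, nothing to check
        if self.max_error > 1:
            exceeded = self.errored > self.max_error
        else:
            exceeded = self.errored / self.completed > self.max_error
        if not exceeded:
            # Start a fresh chunk.
            self.completed = 0
            self.errored = 0
        return exceeded

    @property
    def window_error_rate(self) -> float:
        """Error rate of the current (possibly partial) chunk."""
        return self.errored / self.completed if self.completed else 0.0


checker = ChunkedErrorChecker(max_error=0.5, window_size=4)
outcomes = [False, True, False, False,  # chunk 1: 1/4 errors, reset
            True, True, True, False]    # chunk 2: 3/4 errors, stop
stops = [checker.record(is_error) for is_error in outcomes]
# stops is False for the first seven requests and True for the eighth.
```

Leaving the counters untouched when the threshold is exceeded keeps window_error_rate at the value that triggered the stop, which matches how the field is described above.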

Co-authored-by: Mark Vakhansky <mvakhans@redhat.com>
Signed-off-by: AlonKellner-RedHat <akellner@redhat.com>

sjmonson · 2025-10-18T00:35:02Z

Obsolete

AlonKellner-RedHat changed the base branch from feat/max-error-rate to main July 23, 2025 10:11

AlonKellner-RedHat force-pushed the feat/max-error-rate branch from 0c2978e to c3453df Compare July 23, 2025 11:02

Squashed changes from feat/max-error-rate over main

a1fe883

Co-authored-by: Mark Vakhansky <mvakhans@redhat.com> Signed-off-by: AlonKellner-RedHat <akellner@redhat.com>

AlonKellner-RedHat force-pushed the feat/max-error-rate branch from c3453df to a1fe883 Compare July 23, 2025 11:10

AlonKellner-RedHat added 2 commits July 24, 2025 07:55

Merge branch 'main' into feat/max-error-rate

5d92286

fix: tox -e style

d01f528

Signed-off-by: AlonKellner-RedHat <akellner@redhat.com>

This was referenced Jul 27, 2025

Over-Saturation stopping #242

Open

Margin Of Error (MOE) stopping #244

Open

sjmonson self-requested a review July 28, 2025 15:57

sjmonson mentioned this pull request Aug 14, 2025

Refactor of the Scheduler package to enable or setup for: distributed benchmarks, multi-turn requests, advanced stopping criteria, large concurrency, multi modality, evaluations #251

Closed

sjmonson mentioned this pull request Aug 21, 2025

Scheduler refactor: base/main PR for final merge #286

Closed

4 tasks

sjmonson closed this Oct 18, 2025
