Conversation

@pythongiant

@pythongiant pythongiant commented Dec 18, 2025

What does this PR do?

This PR fixes two correctness issues in the continuous batching result consumption logic.

First, it improves ContinuousBatchingManager.get_result to avoid starvation and incorrect timeout behavior when multiple requests share a single output queue. Instead of immediately re-queuing mismatched results and returning, the updated logic temporarily defers non-matching outputs while continuing to search for a matching one, reinserting deferred results afterward. This preserves FIFO ordering, ensures fair consumption across requests, and respects the specified timeout under concurrent access.
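A minimal sketch of this deferral logic (hypothetical names and result type; the actual ContinuousBatchingManager internals differ). The key points are the deadline computed once up front, so repeated queue reads share one timeout budget, and the finally block that re-inserts deferred results in their original relative order:

```python
import queue
import time
from collections import namedtuple

# Stand-in for whatever object the real output queue carries.
Result = namedtuple("Result", ["request_id", "data"])

def get_result(output_queue, request_id, timeout=None):
    """Drain the shared queue until a result for `request_id` appears.

    Results belonging to other requests are set aside (not immediately
    re-queued) and restored afterward, so no request is starved and the
    overall timeout is respected across multiple queue reads.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    deferred, match = [], None
    try:
        while match is None:
            remaining = (
                None if deadline is None
                else max(0.0, deadline - time.monotonic())
            )
            result = output_queue.get(timeout=remaining)
            if result.request_id == request_id:
                match = result
            else:
                deferred.append(result)  # another request's output
    except queue.Empty:
        pass  # timed out without finding a match; return None below
    finally:
        # Restore deferred results in the order they were removed.
        for item in deferred:
            output_queue.put(item)
    return match
```

Note that deferred items go to the back of the queue, so ordering is preserved among the deferred results themselves rather than globally; that matches the fairness goal here since each consumer only cares about its own request's results.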

Second, it fixes request_id_iter so that iteration stops when a request reaches a terminal FINISHED state, rather than only stopping on cancellation. This prevents infinite iteration after normal request completion and aligns the iterator’s behavior with expected streaming semantics.
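The shape of that fix can be sketched as follows (hypothetical enum and manager interface, not the actual transformers API): the loop condition checks membership in a set of terminal states rather than testing for cancellation alone.

```python
from enum import Enum, auto

class RequestStatus(Enum):
    """Stand-in for the real request status enum."""
    PENDING = auto()
    RUNNING = auto()
    FINISHED = auto()
    CANCELLED = auto()

# Iteration must stop on ANY terminal state, not only cancellation.
TERMINAL_STATES = {RequestStatus.FINISHED, RequestStatus.CANCELLED}

def request_id_iter(manager, request_id):
    """Yield results for one request until it reaches a terminal state."""
    while manager.status(request_id) not in TERMINAL_STATES:
        result = manager.get_result(request_id)
        if result is not None:
            yield result
```

With only the cancellation check, a normally completed request would keep the loop spinning forever; including FINISHED in the terminal set is what ends the stream cleanly.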

These changes are local, backward-compatible, and do not alter the public API or benchmarking behavior.

Fixes #42943

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@remi-or @ArthurZucker @McPatate

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42942&sha=b30808

@remi-or remi-or assigned remi-or and unassigned remi-or Dec 18, 2025
@McPatate
Member

This isn't how we plan to solve the outstanding issue, although I do appreciate you taking a look.

Given we're in a multi-threaded context, I was thinking of holding a hashmap of running requests with their req_id as key; calling get_result would then wait until completion or timeout. But this may need some careful attention to avoid breaking things.
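A minimal sketch of this idea, using per-request futures (hypothetical class and method names, not the planned transformers implementation): the generation thread resolves the future keyed by req_id, and get_result just blocks on it with a timeout.

```python
import threading
from concurrent.futures import Future, TimeoutError as FutureTimeout

class RequestTable:
    """Hashmap of running requests keyed by req_id (sketch only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._futures = {}  # req_id -> Future holding the final result

    def register(self, req_id):
        with self._lock:
            self._futures[req_id] = Future()

    def complete(self, req_id, result):
        # Called from the generation thread when a request finishes.
        with self._lock:
            fut = self._futures.get(req_id)
        if fut is not None:
            fut.set_result(result)

    def get_result(self, req_id, timeout=None):
        # Wait until completion or timeout; no shared-queue scanning.
        with self._lock:
            fut = self._futures[req_id]
        try:
            return fut.result(timeout=timeout)
        except FutureTimeout:
            return None
```

Because each consumer waits only on its own future, the starvation and re-queue problems of a single shared output queue do not arise in this design.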

Feel free to refactor in a direction that looks more like how we intend to address the problem, otherwise we'll close the PR.

@pythongiant
Author

pythongiant commented Dec 18, 2025

> This isn't how we plan to solve the outstanding issue, although I do appreciate you taking a look.
>
> Given we're in a multi-threaded context, I was thinking of holding a hashmap of running requests with their req_id as key; calling get_result would then wait until completion or timeout. But this may need some careful attention to avoid breaking things.
>
> Feel free to refactor in a direction that looks more like how we intend to address the problem, otherwise we'll close the PR.

Instead of pushing everything directly into output_queue, maybe we could route results through a small dispatcher? Would that fit the intended design better?
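A rough sketch of what such a dispatcher could look like (hypothetical names, not a proposal for the actual transformers API): one thread drains the shared output queue and routes each item to a per-request queue, so consumers never see, or have to re-queue, other requests' results.

```python
import queue
import threading

class Dispatcher:
    """Route (req_id, result) items from a shared queue to per-request queues."""

    def __init__(self, output_queue):
        self._output_queue = output_queue
        self._per_request = {}
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _channel(self, req_id):
        # setdefault under the lock: safe whether the consumer or the
        # dispatcher thread touches a req_id first.
        with self._lock:
            return self._per_request.setdefault(req_id, queue.Queue())

    def _run(self):
        while True:
            item = self._output_queue.get()
            if item is None:  # sentinel to shut the dispatcher down
                break
            req_id, result = item
            self._channel(req_id).put(result)

    def get_result(self, req_id, timeout=None):
        try:
            return self._channel(req_id).get(timeout=timeout)
        except queue.Empty:
            return None
```

This keeps the producer side unchanged (it still writes to one queue) while giving each consumer an isolated channel, which is another way to get the wait-until-completion-or-timeout behavior described above.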



Development

Successfully merging this pull request may close these issues.

Continuous batching: output queue requeue starvation and request-scoped iterator does not terminate on completion

3 participants