Description
There are two related correctness issues in the continuous batching result consumption logic that can lead to unfairness and non-terminating iterators under concurrent workloads.
1. Starvation and incorrect timeout handling in get_result
ContinuousBatchingManager.get_result currently retrieves a single item from the shared output queue and immediately re-queues it if the request_id does not match, returning None afterward. Under concurrent requests, this can lead to:
- Starvation when mismatched outputs are repeatedly re-queued
- Timeout semantics that are not respected, since re-queueing returns early instead of continuing to search within the remaining timeout
- Unfair consumption behavior that depends on queue ordering rather than request progress
This behavior is observable when multiple streaming requests are active and results are interleaved in the output queue.
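The problematic pattern can be sketched as follows. This is a minimal, illustrative reduction (dict-based outputs and a standalone function, not the actual ContinuousBatchingManager method): one item is taken from the shared queue, a mismatched item is re-queued at the tail, and the call returns None without using the remaining timeout.

```python
import queue

def get_result_current(output_queue, request_id, timeout=None):
    # Sketch of the reported behavior: take ONE item, re-queue on
    # mismatch, and give up immediately instead of continuing to
    # search within the remaining timeout.
    try:
        result = output_queue.get(timeout=timeout)
    except queue.Empty:
        return None
    if result["request_id"] != request_id:
        output_queue.put(result)  # re-queued at the tail...
        return None               # ...and we return early
    return result

q = queue.Queue()
q.put({"request_id": "req-A", "text": "hello"})
q.put({"request_id": "req-A", "text": "again"})
q.put({"request_id": "req-B", "text": "world"})

# With req-A outputs ahead of req-B in the queue, the req-B consumer
# sees None on every call until the queue happens to rotate its item
# to the head -- progress depends on queue ordering, not on the request.
print(get_result_current(q, "req-B", timeout=0.1))  # None
print(get_result_current(q, "req-B", timeout=0.1))  # None
```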
2. request_id_iter does not terminate after normal completion
request_id_iter currently exits only when a request is cancelled or the generation thread terminates. For requests that complete normally, the iterator continues polling indefinitely after the final FINISHED output has been yielded.
This can result in:
- Infinite iteration loops for request-scoped consumers
- Unexpected blocking behavior in streaming-style usage
- Reliance on caller-side logic to manually stop iteration
The iterator should terminate once a terminal FINISHED output is observed for the given request.
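A terminating iterator can be sketched like this (simplified and hypothetical: dict outputs with a "status" field stand in for GenerationOutput, and the cancellation/thread-liveness checks of the real iterator are elided):

```python
import queue

FINISHED = "finished"  # illustrative terminal-state marker

def request_id_iter_sketch(output_queue, request_id, timeout=0.1):
    """Yield outputs for one request and stop after its FINISHED output.

    Simplified sketch: the real iterator would also exit on request
    cancellation or when the generation thread terminates.
    """
    while True:
        try:
            out = output_queue.get(timeout=timeout)
        except queue.Empty:
            continue  # real code would also check thread liveness here
        if out["request_id"] != request_id:
            output_queue.put(out)  # not ours: put it back
            continue
        yield out
        if out["status"] == FINISHED:
            return  # terminal state observed: stop iterating

q = queue.Queue()
q.put({"request_id": "r1", "status": "running", "token": "a"})
q.put({"request_id": "r1", "status": FINISHED, "token": "b"})

# The loop ends on its own after the FINISHED output, with no
# caller-side stopping logic required.
tokens = [out["token"] for out in request_id_iter_sketch(q, "r1")]
print(tokens)  # ['a', 'b']
```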
Expected behavior
get_result should fairly search for a matching result within the specified timeout, without starvation or early return. request_id_iter should stop iterating once the request reaches a terminal finished state, in addition to stopping on cancellation or thread termination.
Proposed fix
A minimal, backward-compatible fix (#42942) can:
- Defer re-queuing mismatched outputs until a matching result is found or the timeout expires
- Explicitly terminate request_id_iter when a GenerationOutput reports a finished state
This preserves existing APIs, streaming semantics, and benchmarking behavior while fixing the correctness issues.
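The deferred re-queue strategy for the first fix can be sketched as below. Names and the dict-based output shape are illustrative, not the exact #42942 implementation; the key points are that mismatched outputs are held aside rather than re-queued one by one, and the search continues against a single deadline until a match is found or the timeout expires.

```python
import queue
import time

def get_result_fixed(output_queue, request_id, timeout=None):
    """Search the queue for a matching result within the overall timeout.

    Mismatched outputs are held back and restored afterwards, so other
    consumers still see them and no item is lost on any exit path.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    held_back = []
    match = None
    try:
        while True:
            remaining = None if deadline is None else deadline - time.monotonic()
            if remaining is not None and remaining <= 0:
                break  # overall timeout expired
            try:
                out = output_queue.get(timeout=remaining)
            except queue.Empty:
                break
            if out["request_id"] == request_id:
                match = out
                break
            held_back.append(out)  # defer re-queueing until we are done
    finally:
        for out in held_back:  # restore other requests' outputs
            output_queue.put(out)
    return match

q = queue.Queue()
q.put({"request_id": "req-A", "text": "hello"})
q.put({"request_id": "req-B", "text": "world"})

# req-B's result is found on the first call, even though req-A's
# output was ahead of it; req-A's output goes back on the queue.
print(get_result_fixed(q, "req-B", timeout=0.5)["text"])  # world
```

Note that held-back items are re-queued behind anything enqueued in the meantime, which is acceptable here because consumers match on request_id rather than on queue position.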
Environment
- Transformers version: main
- Feature: continuous batching
- Device: CPU / CUDA (independent of backend)
Additional context
These issues are easiest to reproduce with multiple concurrent streaming requests sharing a single ContinuousBatchingManager.