@yewentao256 yewentao256 commented Sep 23, 2025

Purpose

Fixes #25494

Test Plan

============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  0.90      
Total input tokens:                      129999    
Total generated tokens:                  1         
Request throughput (req/s):              1.11      
Output token throughput (tok/s):         1.11      
Peak output token throughput (tok/s):    1.00      
Peak concurrent requests:                1.00      
Total Token throughput (tok/s):          144353.62 
---------------Time to First Token----------------
Mean TTFT (ms):                          899.48    
Median TTFT (ms):                        899.48    
P99 TTFT (ms):                           899.48    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.00      
Median TPOT (ms):                        0.00      
P99 TPOT (ms):                           0.00      
---------------Inter-token Latency----------------
Mean ITL (ms):                           0.00      
Median ITL (ms):                         0.00      
P99 ITL (ms):                            0.00      
==================================================

Signed-off-by: yewentao256 <zhyanwentao@126.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request attempts to fix an AssertionError related to num_reqs > max_num_reqs in uniform batches by modifying a call to _dummy_run. While the intention is correct, the proposed change introduces a new ZeroDivisionError under the same conditions that caused the original error. The root cause appears to be in how _dummy_run handles cases where the maximum number of requests is zero, which is not addressed by this change. A more robust solution would involve modifying _dummy_run to gracefully handle this edge case for all its call paths.
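The kind of guard suggested above can be illustrated with a minimal sketch. This is not vLLM's `_dummy_run`; the helper name `split_tokens_across_requests` and its parameters are hypothetical and exist only to show how a zero request limit can be handled before any division takes place.

```python
def split_tokens_across_requests(num_tokens: int, max_num_reqs: int) -> list[int]:
    """Hypothetical helper (not vLLM code): spread num_tokens over at most
    max_num_reqs requests, returning the token count for each request."""
    if num_tokens < 0 or max_num_reqs < 0:
        raise ValueError("num_tokens and max_num_reqs must be non-negative")

    # Cap the request count so it never exceeds the configured maximum.
    num_reqs = min(num_tokens, max_num_reqs)

    # Guard: with zero requests there is nothing to divide, so return early
    # instead of evaluating a division by zero.
    if num_reqs == 0:
        return []

    base, remainder = divmod(num_tokens, num_reqs)
    # The first `remainder` requests take one extra token so the sizes
    # sum to num_tokens.
    return [base + 1 if i < remainder else base for i in range(num_reqs)]


print(split_tokens_across_requests(10, 3))  # [4, 3, 3]
print(split_tokens_across_requests(5, 0))   # []
```

Placing the check inside the helper rather than at each call site means every call path gets the same behavior, which is the point the review makes about handling the edge case in one place.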

yewentao256 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Sep 23, 2025

mgoin commented Sep 24, 2025

Superseded by #25505

mgoin closed this Sep 24, 2025

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1

Development

Successfully merging this pull request may close these issues.

[Bug]: AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch
