V1 - don't look for a bucket we know doesn't exist #1606
Conversation
/run-gaudi-tests
Pull Request Overview
This PR introduces an early exit in the 2D prompt bucketing logic when the batch size exceeds the maximum allowed, and updates the merge check to handle the new sentinel return values.
- Added a pre-check in _bucketize_2d_prompt to return (None, None, None) for oversized batches (a rough sketch of this logic follows after this list).
- Updated _can_merge_prefill_contents to treat any None in the bucketing result as a non-mergeable case.
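The change can be pictured roughly as follows. This is a minimal, self-contained sketch of the idea, not the actual hpu_model_runner.py code; the bucket lists, the rounding helper, and the free-standing function signatures are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Illustrative stand-ins for the runner's configured prompt buckets and limits.
PROMPT_BS_BUCKETS: List[int] = [1, 2, 4, 8]
PROMPT_SEQ_BUCKETS: List[int] = [128, 256, 512, 1024]
MAX_PREFILL_BATCH_SIZE = 8
BLOCK_SIZE = 128


def _round_up_to_bucket(value: int, buckets: List[int]) -> Optional[int]:
    """Return the smallest bucket >= value, or None if no bucket is large enough."""
    for bucket in buckets:
        if value <= bucket:
            return bucket
    return None


def bucketize_2d_prompt(
    batch_size: int, seq_len: int
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    # New pre-check: a batch larger than the prefill limit can never match a
    # bucket, so return sentinels instead of searching for one.
    if batch_size > MAX_PREFILL_BATCH_SIZE:
        return (None, None, None)
    padded_bs = _round_up_to_bucket(batch_size, PROMPT_BS_BUCKETS)
    padded_seq = _round_up_to_bucket(seq_len, PROMPT_SEQ_BUCKETS)
    num_blocks = None if padded_seq is None else padded_seq // BLOCK_SIZE
    return (padded_bs, padded_seq, num_blocks)


def can_merge_prefill_contents(merged_batch_size: int, merged_seq_len: int) -> bool:
    # Updated check: any None in the bucketing result means the merged batch
    # would not fit a valid bucket, so the prefills must not be merged.
    bucket = bucketize_2d_prompt(merged_batch_size, merged_seq_len)
    return all(dim is not None for dim in bucket)
```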
Comments suppressed due to low confidence (2)
vllm/v1/worker/hpu_model_runner.py:969
- [nitpick] Consider renaming the variable bs to batch_size for improved readability and to make its purpose immediately clear.
if bs > self.max_prefill_batch_size:
vllm/v1/worker/hpu_model_runner.py:969
- Add a unit test to verify that _bucketize_2d_prompt returns (None, None, None) when the batch size exceeds max_prefill_batch_size, ensuring this new branch is covered (a sketch of such a test is shown below).
if bs > self.max_prefill_batch_size:
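A test along those lines might look like the following. It is written against the simplified sketch shown earlier rather than the real HPUModelRunner, whose construction and method names may differ.

```python
def test_bucketize_2d_prompt_rejects_oversized_batch():
    # A batch size above the prefill limit should short-circuit to sentinels.
    oversized = MAX_PREFILL_BATCH_SIZE + 1
    assert bucketize_2d_prompt(oversized, 128) == (None, None, None)
    # The merge check must treat the sentinel result as non-mergeable.
    assert can_merge_prefill_contents(oversized, 128) is False
```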
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
/run-gaudi-tests
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
ripped from: HabanaAI/vllm-fork#1606; fixes a weird bucketing anomaly where bs=1 prefills would be padded to bs=2 and trigger a recompilation.
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
No description provided.