-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move jagged for-loop to lib #49
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR moves the jagged for‐loop logic from sglang into the hip-attn library while making the API backward-compatible by deprecating the is_prefill parameter and introducing new configuration options.
- Migrate for-loop logic from sglang to hip-attn library
- Deprecate is_prefill in favor of is_decode and add additional parameters (e.g., hip_config, rope_range)
- Update model offload cache to support optional batch IDs and variable-length sequence inputs
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/hip_attn/v1_2/paged_hip.py | Moves jagged input handling logic and updates deprecation warnings |
src/hip_attn/v1_2/model_offload_cache.py | Adjusts function signatures and adds support for optional parameters |
Comments suppressed due to low confidence (2)
src/hip_attn/v1_2/model_offload_cache.py:187
- The parameter 'extend_seq_lens_cpu' is declared as Optional[List[int]] but is used without a None-check. Adding validation to ensure it is not None before iterating can prevent potential runtime errors.
for idx_batch, seq_len in enumerate(extend_seq_lens_cpu):
src/hip_attn/v1_2/model_offload_cache.py:193
- Since 'cache_k' (and similarly 'cache_v') is now optional, ensure that these variables are not None before performing slicing operations to avoid runtime errors.
cache_k[start_len : start_len + seq_len].unsqueeze(0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
It seems the argument rope_range
is newly introduced in this PR. Is it related to MLA?
It is not newly introduced in this PR, but Copilot thinks it is. It was introduced in the previous PR, but it was not applied in some of the internal function calls, so I fixed it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Currently, the prefill in sglang is done by running a for-loop along the batch axis. However, in the future, this might change.
So, I move the for-loop logic from sglang into the hip-attn library.
This change is backward-compatible.