Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move jagged for-loop to lib #49

Merged
merged 2 commits into from
Mar 29, 2025
Merged

Conversation

daniel-geon-park
Copy link
Collaborator

Currently, the prefill in sglang is done by running a for-loop along the batch axis. However, in the future, this might change.

So, I move the for-loop logic from sglang into the hip-attn library.

This change is backward-compatible.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR moves the jagged for‐loop logic from sglang into the hip-attn library while making the API backward-compatible by deprecating the is_prefill parameter and introducing new configuration options.

  • Migrate for-loop logic from sglang to hip-attn library
  • Deprecate is_prefill in favor of is_decode and add additional parameters (e.g., hip_config, rope_range)
  • Update model offload cache to support optional batch IDs and variable-length sequence inputs

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/hip_attn/v1_2/paged_hip.py Moves jagged input handling logic and updates deprecation warnings
src/hip_attn/v1_2/model_offload_cache.py Adjusts function signatures and adds support for optional parameters
Comments suppressed due to low confidence (2)

src/hip_attn/v1_2/model_offload_cache.py:187

  • The parameter 'extend_seq_lens_cpu' is declared as Optional[List[int]] but is used without a None-check. Adding validation to ensure it is not None before iterating can prevent potential runtime errors.
for idx_batch, seq_len in enumerate(extend_seq_lens_cpu):

src/hip_attn/v1_2/model_offload_cache.py:193

  • Since 'cache_k' (and similarly 'cache_v') is now optional, ensure that these variables are not None before performing slicing operations to avoid runtime errors.
cache_k[start_len : start_len + seq_len].unsqueeze(0)

@DeepAuto-AI DeepAuto-AI deleted a comment from Copilot bot Mar 28, 2025
Copy link
Collaborator

@gmlwns2000 gmlwns2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
It seems the argument rope_range is newly introduced in this PR. Is it related to MLA?

@daniel-geon-park
Copy link
Collaborator Author

LGTM. It seems the argument rope_range is newly introduced in this PR. Is it related to MLA?

It is not newly introduced in this PR, but Copilot thinks it is. It was introduced in the previous PR, but it was not applied in some of the internal function calls, so I fixed it.

Copy link
Contributor

@kbumsik kbumsik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kbumsik kbumsik merged commit 40d7c85 into deepauto/dev Mar 29, 2025
1 check passed
@kbumsik kbumsik deleted the feat/move-jagged-forloop-to-lib branch March 29, 2025 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants