Remove packed tensor functions and improve dynamic mask attention #142
Conversation
Removes FlashDMAttnQKVPackedFunc, FlashDMAttnVarlenQKVPackedFunc, FlashDMAttnKVPackedFunc, and FlashDMAttnVarlenKVPackedFunc classes along with their corresponding helper functions to simplify the interface. Adds sequence length padding to ensure compatibility with hardware requirements by padding sequences to multiples of 128 tokens and properly handling the unpadding in backward passes. Fixes mask and bias tensor dimension alignment to use number of key heads instead of query heads for better multi-query and grouped-query attention support.
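A hedged sketch of the head-dimension fix, with shapes assumed from the description rather than taken from the PR's code:

```python
import torch

# Assumed layout: query (batch, q_len, num_heads, head_dim),
# key/value (batch, k_len, num_kv_heads, head_dim).
batch, q_len, k_len, head_dim = 2, 256, 256, 64
num_heads, num_kv_heads = 8, 2  # grouped-query attention: 4 query heads per KV head

query = torch.randn(batch, q_len, num_heads, head_dim, dtype=torch.float16)
key = torch.randn(batch, k_len, num_kv_heads, head_dim, dtype=torch.float16)
value = torch.randn(batch, k_len, num_kv_heads, head_dim, dtype=torch.float16)

# Mask and bias are allocated per KV head (not per query head), so they line up
# with the key/value tensors and broadcast across the grouped query heads.
attn_mask = torch.ones(batch, num_kv_heads, q_len, k_len, dtype=torch.bool)
attn_bias = torch.zeros(batch, num_kv_heads, q_len, k_len, dtype=torch.float16)
```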
- Adds a comprehensive docstring explaining function parameters and return values
- Separates query and key length variables for better clarity and accuracy
- Removes redundant kwargs to prevent duplicate parameter passing
- Updates the implementation parameter to use an explicit string value instead of a config reference
- Adds support for the keep_window_size parameter from the module configuration
Removes complex padding/unpadding logic and varlen functions in favor of a streamlined approach that handles attention masking directly through bias manipulation. Adds support for keep_window_size parameter to enable top-k attention window selection when key length exceeds the specified window size. Updates function signature to include key_length parameter and removes unnecessary helper functions for index manipulation and tensor reshaping.
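A minimal sketch of the top-k window selection, assuming the bias is shaped (batch, num_kv_heads, query_len, key_len) and that masking is expressed by writing the dtype minimum into positions outside the window (the helper name is illustrative, not the PR's API):

```python
import torch

def apply_keep_window(attn_bias: torch.Tensor, keep_window_size: int) -> torch.Tensor:
    """Keep only the top-`keep_window_size` bias values per query position along
    the key dimension and set everything else to the dtype minimum, which the
    softmax then treats as masked out."""
    key_length = attn_bias.size(-1)
    if key_length <= keep_window_size:
        return attn_bias  # window already covers all keys, nothing to mask
    min_dtype = torch.finfo(attn_bias.dtype).min
    values, indices = attn_bias.topk(keep_window_size, dim=-1)
    masked = torch.full_like(attn_bias, min_dtype)
    return masked.scatter(-1, indices, values)
```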
Pull Request Overview
This PR removes packed tensor functions and improves the dynamic mask attention implementation. The changes simplify the interface by eliminating QKV and KV packed function variants, while enhancing the core attention function with new features like window-based attention and improved parameter handling.
- Removed multiple packed tensor function classes and their corresponding public API functions
- Enhanced dynamic mask attention with new parameters like keep_window_size and improved mask handling
- Added sequence padding for hardware compatibility requirements (128-token alignment); a small padding sketch follows this list
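A minimal sketch of the 128-token alignment (hypothetical helper; the PR's actual padding code may differ), assuming inputs shaped (batch, seqlen, num_heads, head_dim):

```python
import torch
import torch.nn.functional as F

def pad_seqlen_to_multiple(x: torch.Tensor, multiple: int = 128):
    """Pad the sequence dimension (dim=1) up to the next multiple of `multiple`.
    Returns the padded tensor and the original length, so the forward output
    (and the gradients in the backward pass) can be sliced back to the real length."""
    seqlen = x.size(1)
    pad_len = (-seqlen) % multiple
    if pad_len == 0:
        return x, seqlen
    # F.pad pairs apply from the last dim backwards: head_dim, num_heads, seqlen.
    return F.pad(x, (0, 0, 0, 0, 0, pad_len)), seqlen

q_padded, q_len = pad_seqlen_to_multiple(torch.randn(2, 200, 8, 64))
print(q_padded.shape)  # torch.Size([2, 256, 8, 64]); slice [:, :q_len] after the kernel
```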
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| flash_dmattn/integrations/modeling_flash_dynamic_mask_attention_utils.py | Removed packed tensor utility functions and improved dynamic mask attention implementation with new windowing features |
| flash_dmattn/integrations/flash_dynamic_mask_attention.py | Enhanced wrapper function with better documentation and support for new parameters |
| flash_dmattn/flash_dmattn_interface.py | Removed packed tensor function classes and added sequence padding for hardware alignment |
Referenced code:

    min_dtype = torch.finfo(dtype).min
    batch_size, _, num_kv_heads, _ = key_states.shape

    if not all(k in globals() for k in ("_flash_fn")):
Copilot AI · Sep 1, 2025
The all() call here does not do what it appears to: ("_flash_fn") is a parenthesized string rather than a one-element tuple, so the generator iterates over its individual characters. This should be simplified to a direct membership check.
Suggested change:

    - if not all(k in globals() for k in ("_flash_fn")):
    + if "_flash_fn" not in globals():
Eliminate packed tensor functions to simplify the interface and add sequence padding for hardware compatibility. Enhance the dynamic mask attention function with clearer parameters and improved implementation, while streamlining the overall logic and supporting new configuration options.