@yzh119 yzh119 commented Nov 25, 2024

This PR fixes issue #634, which was introduced by #592.
A 16-byte vectorized read/write requires the address to be aligned to 16 bytes.
When num_warps is not a multiple of 4 (4 * sizeof(float) = 16), the address smem + num_warps might not be aligned to 16 bytes.

We can fix this by shifting the start offset of vectorized read/write to smem + ceil_div(num_warps, 4) * 4 to force the alignment.
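
The offset arithmetic can be sketched on the host side as below. This is only an illustration, not the code from this PR: the helper names (`ceil_div`, `aligned_vec_offset`, `is_vec_aligned`) are hypothetical, and it assumes that `smem` is a `float` buffer whose base address is itself 16-byte aligned.

```cpp
#include <cstddef>

// Assumption: 4-byte floats, so one 16-byte vector holds 4 floats.
static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

constexpr std::size_t kVecBytes = 16;                              // one vectorized access
constexpr std::size_t kFloatsPerVec = kVecBytes / sizeof(float);   // 4

// Integer division rounded up (hypothetical helper name).
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
  return (a + b - 1) / b;
}

// Offset in floats, relative to smem, where the vectorized region starts:
// num_warps rounded up to the next multiple of 4.
constexpr std::size_t aligned_vec_offset(std::size_t num_warps) {
  return ceil_div(num_warps, kFloatsPerVec) * kFloatsPerVec;
}

// Whether a float offset from a 16-byte-aligned base is itself 16-byte aligned.
constexpr bool is_vec_aligned(std::size_t offset_in_floats) {
  return (offset_in_floats * sizeof(float)) % kVecBytes == 0;
}

// With 3 warps, smem + 3 sits at byte 12 and is misaligned;
// the rounded-up offset is 4 floats (byte 16) and is aligned.
static_assert(!is_vec_aligned(3), "smem + 3 is not 16-byte aligned");
static_assert(aligned_vec_offset(3) == 4, "3 rounds up to 4");
static_assert(is_vec_aligned(aligned_vec_offset(3)), "rounded offset is aligned");
```

Rounding up costs at most 3 extra floats of shared memory, so any shared memory size computed from num_warps has to use the rounded value as well.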

cc @ovowei @Abatom

@yzh119 yzh119 merged commit db9c48d into main Nov 25, 2024
Atream added a commit to kvcache-ai/custom_flashinfer that referenced this pull request Dec 4, 2024
yzh119 pushed a commit that referenced this pull request Dec 4, 2024
…646)

Fix smem_size in FusedAddRMSNorm, which was missed in #636
Fix issue #645
@zhyncs zhyncs deleted the bugfix-634 branch December 12, 2024 06:52