[GQA] Add regional atomic add to slightly boost performance #1093
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run … We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Walkthrough
This PR refactors the backward-pass kernel of the GQA flash attention example. It replaces per-element atomic-add loops with vectorized, slice-based (regional) atomic adds for the dQ, dV, and dK tensors. Control flow is unchanged; only the accumulation steps are restructured so that they operate on contiguous slices rather than on individual elements.
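The accumulation change described in the walkthrough can be modeled in plain Python. This is a single-threaded toy model, not TileLang code and not the PR's implementation. It assumes (hypothetically) that each query head in a GQA group produces a partial dK tile for the shared KV head, and that a region add is lowered to vectorized atomics covering `VEC_WIDTH` contiguous floats per instruction. It shows that both forms produce the same sums while issuing fewer atomic operations in the regional form.

```python
"""Toy model of per-element vs. regional accumulation into dK.

Hypothetical assumptions: VEC_WIDTH floats per vectorized atomic, and
each query head in a group contributes one partial tile. Python lists
stand in for GPU buffers; nothing here is actually atomic.
"""
from typing import List, Tuple

VEC_WIDTH = 4  # assumed elements per vectorized atomic instruction


def add_per_element(dst: List[float], tile: List[float], offset: int) -> int:
    """Scalar loop: one atomic op per element. Returns ops issued."""
    ops = 0
    for i, v in enumerate(tile):
        dst[offset + i] += v  # stands in for one scalar atomic add
        ops += 1
    return ops


def add_region(dst: List[float], tile: List[float], offset: int) -> int:
    """Region add: the contiguous slice is updated in VEC_WIDTH chunks."""
    ops = 0
    for start in range(0, len(tile), VEC_WIDTH):
        chunk = tile[start:start + VEC_WIDTH]
        for i, v in enumerate(chunk):
            dst[offset + start + i] += v
        ops += 1  # one vectorized atomic per chunk (tail chunk may be short)
    return ops


def accumulate_group(partials: List[List[float]],
                     use_region: bool) -> Tuple[List[float], int]:
    """Sum the partial dK tiles from every query head in one GQA group."""
    dk = [0.0] * len(partials[0])
    add = add_region if use_region else add_per_element
    ops = 0
    for tile in partials:
        ops += add(dk, tile, 0)
    return dk, ops


if __name__ == "__main__":
    groups = 4     # query heads sharing one KV head (assumed)
    tile_len = 16  # flattened tile size (assumed, tiny for illustration)
    partials = [[float(h * tile_len + i) for i in range(tile_len)]
                for h in range(groups)]
    dk_scalar, ops_scalar = accumulate_group(partials, use_region=False)
    dk_region, ops_region = accumulate_group(partials, use_region=True)
    assert dk_scalar == dk_region  # same result, same addition order
    print(ops_scalar, ops_region)  # 64 16
```

The results match because each element receives the same additions in the same order; only the number of issued atomic operations changes. On real hardware, how a region add is lowered (and which vector widths are available for atomics) depends on the compiler and GPU architecture.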
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
The changes are a targeted refactor of atomic operations and tensor-slicing patterns within a single example file. Reviewing them requires understanding atomic semantics, vectorized memory operations, and tensor indexing, but the change is confined to one optimization path and introduces no new branching.
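One way to make the "tensor indexing correctness" point concrete is to check that every regional write lands in the right KV head and row range. This is a hypothetical sketch, not taken from the PR. It assumes the standard GQA mapping (query head `h` uses KV head `h // groups`, with `groups = heads // head_kv`) and that block `by` owns rows `[by * block_n, (by + 1) * block_n)` of dK/dV, clipped to the sequence length. The helper names are illustrative only.

```python
"""Hypothetical index check for regional accumulation in a GQA backward pass.

Assumes query head h maps to KV head h // groups, and block `by` owns
rows [by * block_n, (by + 1) * block_n) of dK/dV, clipped to seq_len.
"""
from collections import Counter
from typing import Tuple


def kv_head_for(q_head: int, heads: int, head_kv: int) -> int:
    """Map a query head to the KV head it shares (standard GQA grouping)."""
    assert heads % head_kv == 0, "GQA requires heads divisible by head_kv"
    groups = heads // head_kv
    return q_head // groups


def region_rows(by: int, block_n: int, seq_len: int) -> Tuple[int, int]:
    """Half-open row range written by block `by`, clipped at seq_len."""
    start = by * block_n
    return start, min(start + block_n, seq_len)


def coverage(heads: int, head_kv: int, seq_len: int, block_n: int) -> Counter:
    """Count how many (query head, block) regions write each (kv_head, row)."""
    hits: Counter = Counter()
    n_blocks = (seq_len + block_n - 1) // block_n  # ceil division
    for h in range(heads):
        kv = kv_head_for(h, heads, head_kv)
        for by in range(n_blocks):
            lo, hi = region_rows(by, block_n, seq_len)
            for row in range(lo, hi):
                hits[(kv, row)] += 1
    return hits


if __name__ == "__main__":
    hits = coverage(heads=8, head_kv=2, seq_len=100, block_n=32)
    # Every row of every KV head is covered, and each one is written by
    # all 4 query heads of its group, so concurrent updates must be atomic.
    assert len(hits) == 2 * 100
    assert set(hits.values()) == {4}
    print(len(hits), sorted(set(hits.values())))  # 200 [4]
```

Under these assumptions, each dK/dV element receives exactly `groups` contributions. If those contributions come from thread blocks running concurrently, they have to be combined with atomic adds, whether per element or per region.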
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)