
Use enable_gqa in place of repeat_kv #641

Draft: wants to merge 1 commit into base: gh/awgu/18/base


awgu
Contributor

@awgu awgu commented Oct 22, 2024

Stack from ghstack (oldest at bottom):

Saves some memory even when compile is enabled; MFU is roughly the same.

Pending some investigation from @drisspg.

It currently looks like cuDNN attention with enable_gqa=True may be recomputing the repeat_kv expansion in backward, which leads to memory savings but no change in computation.

awgu added a commit that referenced this pull request Oct 22, 2024
ghstack-source-id: e8781d3e797737c073bc487197ce804bce15502c
Pull Request resolved: #641
@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 22, 2024
@awgu awgu marked this pull request as ready for review October 22, 2024 15:14
@awgu awgu requested review from drisspg and tianyu-l October 22, 2024 15:14
@fegin
Contributor

fegin commented Oct 22, 2024

cc @XilunWu: we may want to test whether CP (context parallel) still works after this PR lands.

@drisspg
Contributor

drisspg commented Oct 22, 2024

Let's wait before landing this. I'm still working on the cuDNN backend integration for this feature and want to make sure everything works as expected.

@awgu awgu marked this pull request as draft October 22, 2024 22:34
Labels
CLA Signed This label is managed by the Meta Open Source bot.