Readme Update for FSDP #980
Conversation
Updating the README to indicate flash attention support for Llama2 70B with FSDP. This improves both performance and memory footprint.
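For readers landing on this thread, below is a minimal, hardware-agnostic sketch of the attention pattern the README change documents. It uses PyTorch's generic `scaled_dot_product_attention` as a stand-in for the fused/flash SDPA kernel on Gaudi; the module name (`ToyCausalAttention`) and sizes are illustrative assumptions, and the FSDP sharding used in the real example (applied on top of blocks like this via the trainer's FSDP settings) is omitted here.

```python
# Minimal sketch only -- not the optimum-habana implementation.
# A toy causal self-attention block using PyTorch's fused
# scaled_dot_product_attention as a stand-in for the flash/fused
# SDPA kernel used on Gaudi.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyCausalAttention(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.dim, self.heads = dim, heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim).
        q, k, v = (y.view(b, t, self.heads, -1).transpose(1, 2) for y in (q, k, v))
        # A fused/flash kernel computes attention without materializing the
        # full (seq x seq) score matrix, which is where the memory-footprint
        # reduction comes from; larger models and sequences benefit the most.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, t, self.dim))


if __name__ == "__main__":
    x = torch.randn(2, 128, 512)
    print(ToyCausalAttention()(x).shape)  # torch.Size([2, 128, 512])
```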
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM.
@regisss Please review this README change and merge it when we transition to Synapse 1.16.0. We have tested this change on G2 with Synapse 1.16.0, and it gives a small boost in performance.
Does this work with 1.15?
@regisss This is for 1.16; it does not work properly for 1.15 |
Fused SDPA leads to a small drop in performance for the 7B model, so I am updating the test case accordingly. For the 70B model we see a performance improvement.
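The size-dependent tradeoff mentioned above can be explored with a rough timing sketch like the one below. It is only an illustration of how such a crossover might be measured: it uses PyTorch's generic SDPA rather than the Gaudi kernels, and the helper names (`naive_attention`, `avg_time`) and shapes are assumptions, not the benchmark behind this PR.

```python
# Rough, illustrative timing sketch (not the benchmark used for this PR).
# Compares naive attention, which materializes the full score matrix, with
# PyTorch's fused scaled_dot_product_attention.
import time
import torch
import torch.nn.functional as F


def naive_attention(q, k, v):
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v


def avg_time(fn, *args, iters: int = 5) -> float:
    fn(*args)  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    for seq in (256, 1024, 2048):
        q, k, v = (torch.randn(1, 8, seq, 128) for _ in range(3))
        print(
            f"seq={seq:5d}  naive={avg_time(naive_attention, q, k, v):.4f}s"
            f"  fused={avg_time(F.scaled_dot_product_attention, q, k, v):.4f}s"
        )
    # Whether the fused path wins depends on shapes and hardware, which is
    # consistent with the 7B regression vs. 70B improvement reported above.
```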
@regisss Can we merge this now?