Flash Attention for Neuron #883
> New file (+129 lines), line 1: `from absl import logging`
Copyright?
Please also add comments documenting the file.
> `def _mha_forward(query, key, value, bias, causal, softmax_scale):`
Can we get support for segment IDs and dropout as well? Both are widely needed nowadays.
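For context on the segment-ID request: segment IDs let packed sequences share one attention call by restricting attention to tokens within the same segment. A minimal sketch (not from this PR; `segment_mask` is a hypothetical helper name) of how such a mask is typically built:

```python
import numpy as np

def segment_mask(q_segment_ids, kv_segment_ids):
    # Tokens may only attend within their own segment; cross-segment
    # positions are masked out. Shapes: [T] and [S] -> bool mask [T, S].
    return q_segment_ids[:, None] == kv_segment_ids[None, :]

# Two sequences packed into one: segment 0 (2 tokens), segment 1 (2 tokens).
ids = np.array([0, 0, 1, 1])
mask = segment_mask(ids, ids)
```

The resulting boolean mask would be combined with the attention bias (masked positions set to a large negative value) before the softmax.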
Segment ID support is in progress and will be added soon.
ruomingp left a comment:
Will defer to @kelvin-zou for approval.
This has been implemented in a new PR #939. That PR addresses all comments in this PR but is from a different fork of Axlearn.
This PR adds support for a flash attention kernel for Neuron, implemented through the Neuron Kernel Interface (NKI).
The flash attention kernel works with TRN1 and TRN2.
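To clarify what the kernel computes: flash attention produces the same result as standard attention but tiles the computation so the full score matrix is never materialized. A reference (non-flash) sketch of the semantics, using the same argument names as the `_mha_forward` signature reviewed above (this is an illustrative reimplementation, not code from the PR):

```python
import numpy as np

def mha_reference(query, key, value, bias, causal, softmax_scale):
    """Reference attention: the output a flash/NKI kernel must match,
    computed naively with the full [T, S] score matrix."""
    scores = (query @ key.T) * softmax_scale  # attention logits [T, S]
    if bias is not None:
        scores = scores + bias
    if causal:
        t, s = scores.shape
        keep = np.tril(np.ones((t, s), dtype=bool))  # lower-triangular mask
        scores = np.where(keep, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ value

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out = mha_reference(q, k, v, bias=None, causal=True, softmax_scale=8 ** -0.5)
```

With `causal=True`, the first query position can attend only to itself, so the first output row equals the first value row; a hardware kernel is typically validated against exactly this kind of reference.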