Scatter on Cuda #6533
with ib.new_scope():
    bx = te.thread_axis("blockIdx.x")
    ib.scope_attr(bx, "thread_extent", 1)
If bx is not used in this context, we can remove the above two lines.
If I don't define at least one block in the scope, the generated CUDA code fails to compile.
LGTM, just one minor nitpick. @Laurawly could you please take another look?
@zhiics Can we merge this? I want to send a PR to enable
Yeah, let's merge this. Before that, we probably want to rebase again in case something has broken in the meantime.
fix lint fix pylint again
I rebased. I'll keep an eye on it this evening to make sure it still passes CI. Thanks!
Thanks @mbrookhart @Laurawly @tkonolige @masahi
* working cuda scatter fix lint fix pylint again
* cuda scatter with threading
* add dynamic shape tests
* remove unused variable
I was unable to get scatter working with TE schedules on CUDA because of the two loops that update values in place, so I resorted to using ir_builder directly.
Attempts to better parallelize the algorithm have produced some strange behavior. If I can get a correct and faster implementation working, I'll submit another PR.
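For readers unfamiliar with the op, here is a minimal pure-Python sketch of what I understand the 1-D case to compute. It is only a reference for the semantics, not the ir_builder code in this PR; the function name `scatter_1d_reference` and the negative-index wraparound are assumptions made for illustration.

```python
def scatter_1d_reference(data, indices, updates):
    """Hypothetical reference for 1-D scatter: copy data, then overwrite."""
    n = len(data)
    out = [None] * n

    # Loop 1: copy the input into the output buffer.
    for i in range(n):
        out[i] = data[i]

    # Loop 2: overwrite selected positions in place.
    # Assumption: negative indices wrap around, as in Python indexing.
    for i in range(len(indices)):
        idx = indices[i]
        if idx < 0:
            idx += n
        out[idx] = updates[i]

    return out


print(scatter_1d_reference([0, 0, 0, 0], [1, 3], [9, 8]))
```

Both loops write to the same output buffer, and the second must run after the first. When `indices` contains duplicates, the result also depends on the order of the writes in the second loop (here the last write wins), which is one reason a naive parallel version can give inconsistent results.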
Thanks to @tkonolige for very helpful discussions. @notoraptor, this may be useful for your scatter_add work.
cc @zhiics