@MasterJH5574
Contributor

This PR fixes a bug in the LowerThreadAllreduce pass.

Prior to this PR, the thread mask was not set correctly in multi-group settings: when the reduction extent was 32, the thread mask was always 0. The bug went unnoticed because the CUDA program still produces the correct result even when the mask is 0. Either way, a zero mask is dangerous and should be fixed.
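To make the expected mask values concrete, here is a minimal C++ sketch of a per-group lane mask. It is not the code from this PR: `GroupLaneMask` and its parameters are hypothetical names, and it assumes each reduction group occupies `reduce_extent` consecutive lanes inside a single 32-lane warp. The sketch builds the mask in 64 bits because shifting a 32-bit value by 32 is undefined behavior in C++. That is one way a naive mask computation could go wrong at extent 32, though the description above does not say this is what happened in the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not TVM's API): returns the mask of the warp lanes that
// belong to reduction group `group_index`, where every group spans
// `reduce_extent` consecutive lanes.
// Assumptions: 1 <= reduce_extent <= 32, and the group lies within one warp.
inline uint32_t GroupLaneMask(int reduce_extent, int group_index) {
  assert(reduce_extent >= 1 && reduce_extent <= 32);
  assert(group_index >= 0 && (group_index + 1) * reduce_extent <= 32);
  // Use a 64-bit intermediate so that reduce_extent == 32 never shifts a
  // 32-bit value by its full width (undefined behavior in C++).
  const uint64_t ones = (uint64_t{1} << reduce_extent) - 1;
  // Move the run of ones to the lanes owned by this group.
  return static_cast<uint32_t>(ones << (reduce_extent * group_index));
}
```

Under these assumptions, `GroupLaneMask(32, 0)` covers the full warp (`0xFFFFFFFF`) rather than 0, and `GroupLaneMask(16, 1)` covers the upper half (`0xFFFF0000`). A nonzero mask matters because the CUDA `*_sync` warp intrinsics expect each calling thread to have its own bit set in the mask.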

@tvm-bot
Collaborator

tvm-bot commented Jul 14, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@MasterJH5574
Contributor Author

cc @yzh119

@MasterJH5574 MasterJH5574 force-pushed the tvm-dev/2023-07-14-allreduce-mask-fix branch from 437e6ff to c7cb4ac Compare July 14, 2023 23:40
Member

@yzh119 yzh119 left a comment

Thanks @MasterJH5574 for spotting the bug!

@MasterJH5574 MasterJH5574 marked this pull request as draft July 15, 2023 01:35
@tqchen tqchen marked this pull request as ready for review July 15, 2023 16:19
@tqchen tqchen merged commit 9af8efc into apache:main Jul 15, 2023
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 24, 2023
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 27, 2023
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 30, 2023