-
Notifications
You must be signed in to change notification settings - Fork 791
Add complex group algorithms #7120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -46,4 +46,16 @@ enum GroupOperation { | |||
ExclusiveScan = 2, | |||
}; | |||
|
|||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add bfloat16 here? It is still 'experimental', but will be moved out soon. We can wait for that to happen. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has already got rather long in the tooth, so I'd like to defer that change for later
6c674fa
to
29279e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, but would like other reviewers to approve
Thanks for the reviews and apologies for the silence. I left this while dependent patches were merged upstream. |
74007dd
to
f7e1823
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay folks. I had to get some other thorny changes merged before this actually works.
I think I've addressed everything so far, so if you have the bandwidth I think it's ready for review
Many thanks!
@@ -46,4 +46,16 @@ enum GroupOperation { | |||
ExclusiveScan = 2, | |||
}; | |||
|
|||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has already got rather long in the tooth, so I'd like to defer that change for later
f7e1823
to
f38b675
Compare
f38b675
to
2d88880
Compare
Extends the implementation of group algorithms by adding support for complex types. Operations that can be done element-wise are implemented in headers and multiplication is implemented in CUDA PI. This involved changing the macros in CUDA PI to accommodate complex and defining complex types to use in PI interface. The scratch space for moving data between workgroups in the PI also had to be doubled.
2d88880
to
59a3806
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'm not entriely sure ptx-nvidia
is the right folder for the collectives.cl additions as I don't see anything nvptx specific unless __clc__get_group_scratch_complex_*
is exclusively meant to be implemented for nvptx. (has been a while for me...)
friendly ping @sergey-semenov @asudarsa @intel/llvm-gatekeepers Could we get some help/reviews on this PR, it's been open for a long time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the late review, LGTM
Add complex group algorithms
Extends the implementation of group algorithms by adding support for
complex types. Operations that can be done element-wise are implemented
in headers and multiplication is implemented in CUDA PI. This involved
changing the macros in CUDA PI to accommodate complex and defining
complex types to use in PI interface. The scratch space for moving data
between workgroups in the PI also had to be doubled.
This patch depends on two fixes to the NVPTX backend which are exposed by the new code generation.
Tests are in intel/llvm-test-suitegit