Skip to content

Add complex group algorithms #7120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 24, 2023
Merged

Conversation

ldrumm
Copy link
Contributor

@ldrumm ldrumm commented Oct 19, 2022

Add complex group algorithms

Extends the implementation of group algorithms by adding support for
complex types. Operations that can be done element-wise are implemented
in headers and multiplication is implemented in CUDA PI. This involved
changing the macros in CUDA PI to accommodate complex and defining
complex types to use in PI interface. The scratch space for moving data
between workgroups in the PI also had to be doubled.

This patch depends on two fixes to the NVPTX backend which are exposed by the new code generation.

Tests are in intel/llvm-test-suitegit

@@ -46,4 +46,16 @@ enum GroupOperation {
ExclusiveScan = 2,
};

typedef struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add bfloat16 here? It is still 'experimental', but will be moved out soon. We can wait for that to happen. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has already got rather long in the tooth, so I'd like to defer that change for later

@ldrumm ldrumm force-pushed the complex_group_algo branch 3 times, most recently from 6c674fa to 29279e0 Compare October 21, 2022 18:00
Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, but would like other reviewers to approve

@ldrumm
Copy link
Contributor Author

ldrumm commented Feb 7, 2023

Thanks for the reviews and apologies for the silence. I left this while dependent patches were merged upstream.

Copy link
Contributor Author

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay folks. I had to get some other thorny changes merged before this actually works.

I think I've addressed everything so far, so if you have the bandwidth I think it's ready for review

Many thanks!

@@ -46,4 +46,16 @@ enum GroupOperation {
ExclusiveScan = 2,
};

typedef struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has already got rather long in the tooth, so I'd like to defer that change for later

@ldrumm ldrumm force-pushed the complex_group_algo branch from f7e1823 to f38b675 Compare March 10, 2023 11:12
@ldrumm ldrumm temporarily deployed to aws March 10, 2023 12:32 — with GitHub Actions Inactive
@ldrumm ldrumm temporarily deployed to aws March 10, 2023 13:42 — with GitHub Actions Inactive
@ldrumm ldrumm temporarily deployed to aws March 10, 2023 14:08 — with GitHub Actions Inactive
@ldrumm ldrumm temporarily deployed to aws March 10, 2023 14:09 — with GitHub Actions Inactive
@ldrumm ldrumm force-pushed the complex_group_algo branch from f38b675 to 2d88880 Compare March 10, 2023 17:22
@ldrumm ldrumm temporarily deployed to aws March 10, 2023 17:48 — with GitHub Actions Inactive
@ldrumm ldrumm temporarily deployed to aws March 10, 2023 18:21 — with GitHub Actions Inactive
Extends the implementation of group algorithms by adding support for
complex types. Operations that can be done element-wise are implemented
in headers and multiplication is implemented in CUDA PI. This involved
changing the macros in CUDA PI to accommodate complex and defining
complex types to use in PI interface. The scratch space for moving data
between workgroups in the PI also had to be doubled.
@ldrumm ldrumm force-pushed the complex_group_algo branch from 2d88880 to 59a3806 Compare March 14, 2023 12:39
@ldrumm ldrumm temporarily deployed to aws March 14, 2023 13:07 — with GitHub Actions Inactive
@ldrumm ldrumm temporarily deployed to aws March 14, 2023 13:38 — with GitHub Actions Inactive
@romanovvlad romanovvlad requested review from sergey-semenov and removed request for romanovvlad March 16, 2023 15:17
Copy link
Contributor

@Naghasan Naghasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm not entriely sure ptx-nvidia is the right folder for the collectives.cl additions as I don't see anything nvptx specific unless __clc__get_group_scratch_complex_* is exclusively meant to be implemented for nvptx. (has been a while for me...)

@npmiller
Copy link
Contributor

friendly ping @sergey-semenov @asudarsa @intel/llvm-gatekeepers

Could we get some help/reviews on this PR, it's been open for a long time.

Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the late review, LGTM

@againull againull merged commit fc039fd into intel:sycl Mar 24, 2023
@ldrumm ldrumm deleted the complex_group_algo branch March 28, 2023 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants