Skip to content

[SYCL] Support per-object file compilation #7595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Dec 12, 2022
Merged

[SYCL] Support per-object file compilation #7595

merged 23 commits into from
Dec 12, 2022

Conversation

sarnex
Copy link
Contributor

@sarnex sarnex commented Nov 30, 2022

This change adds per-object compilation support for SYCL, also called non-relocatable device code mode. This is already supported in clang for HIP and CUDA.

It adds a new option -f[no-]sycl-rdc. The default is -fsycl-rdc, which compiles code as today. Passing -fno-sycl-rdc activates the new mode. This is just an alias to the existing flag used by AMD/CUDA, f[no-]-gpu-rdc.

The main implication is that we no longer link all device code together into one big module before post link.
Instead, we execute all jobs after device linking on a per-object file basis.
This means sycl-post-link and the later jobs execute multiple times, since we no longer have one big module.

This can result in large improvement performance in the compiler runtime and memory usage, we see a max memory usage reduction for QUDA with -g from over 250GB to 4GB and a large compiler runtime improvement as well.

Error cases:

  1. Cross-object dependencies. Since we don't link device code together, each object file must be independent. I added an error in Sema to error if the user passes this flag and has cross-object dependencies.
  2. Invalid architecture in fat object. We currently warn gracefully about this, in per-object-file mode llvm-foreach throws an error customers won't understand, so error out in that case instead of warning.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex changed the title [SYCL] Support non-relocatable device code [SYCL] Support per-object file compilation Nov 30, 2022
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Today, the basic flow of the SYCL device-side driver is as follows

1) Link all device code from all compiler inputs together
2) Link 1) against SYCL device libraries
3) Run sycl-post-link on 2)
4) Run llvm-spirv on 3)
5) Run clang-offload-wrapper on 4)

Step 1 can create a performance bottleneck when you have a huge number of kernels.
If none of the kernels use globals or SYCL_EXTERNAL functions, we can actually split it up to be the following:

For object file:
   1) Link this object file's device code with SYCL device libraries
   2) Run sycl-post-link on 1)
   3) Run llvm-spirv on 2)
   4) Run clang-offload-wrapper on 4)

Since we don't link all device code together, each step runs on smaller IR which results in
compiler runtime and compiler memory usage benefits.

Note that in order to do the above per-object-file, we need to break up fat static arhives. We do
this by using the ForEachWrappingAction action. This allows us to run commands on each item inside the
fat static archive

The driver flow when a static archive is involved is the most complex case and looks like the below:

1) spriv-to-ir-wrapper on fat static archive
2) Link all SYCL device libraries together without any user device code into a single device library BC
3) llvm-foreach:
  3a): llvm-link current object file with 2)
4) file-table-tform replacing tempfilelist from 1) with output of 3)
5) llvm-foreach:
  5a) sycl-post-link on 4)
6) llvm-foreach:
  6a) Extract BC file column from 5) output table
7) file-table-tform on 6) merging all BC file columns into a single big column
8) llvm-spirv on 7) (does llvm-foreach internally)
9) file-table-tform on 5) output table, replacing BC column with spirv column from 8)
10) clang-offload-wrapper with 9)

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
I don't really like this commit, but I see the following requirements
1) Keep the default for GPURelocatableDeviceCode to false
2) For sycl, if no cc1 option is specified, GPURelocatableDeviceCode should be true

Let me know if anyone has any better ideas

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex marked this pull request as ready for review December 1, 2022 21:41
@sarnex sarnex requested review from a team as code owners December 1, 2022 21:41
@sarnex sarnex requested review from mdtoguchi and bader December 1, 2022 21:41
@sarnex sarnex requested a review from premanandrao December 2, 2022 20:31
Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex requested a review from AlexeySachkov December 6, 2022 16:50
@sarnex
Copy link
Contributor Author

sarnex commented Dec 8, 2022

@AlexeySachkov @mdtoguchi Any more feedback on this bad boy? I think I addressed all feedback. Thanks!

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more questions here and minor comments.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex requested a review from premanandrao December 9, 2022 16:44
@sarnex
Copy link
Contributor Author

sarnex commented Dec 9, 2022

@premanandrao Do you mind re-checking the FE changes? I had to rework them

@sarnex sarnex requested a review from AlexeySachkov December 9, 2022 16:49
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no further concerns/comments, thanks

@sarnex sarnex removed the request for review from premanandrao December 9, 2022 21:37
@sarnex
Copy link
Contributor Author

sarnex commented Dec 9, 2022

Thanks, with the CFE re-review complete we are now ready to merge!

@premanandrao
Copy link
Contributor

@premanandrao Do you mind re-checking the FE changes? I had to rework them

Thanks, I have no further concerns.

@sarnex
Copy link
Contributor Author

sarnex commented Dec 12, 2022

@intel/llvm-gatekeepers Mind merging this one? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants