-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL] Change linkage to linkonce_odr unless SYCL_EXTERNAL. #10317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think the right linkage type for this case is "internal". |
|
Not exactly. AFAIK, there is a difference in how optimizations handle these linkage types. I think we might want to prefer
I implied that we use Anyway, I don't mind merging this patch as-is and make a separate patch if we prove that using |
That is true, there is indeed a difference in how some other optimizations handle these linkage types. The unreferenced function removal was the only thing I was thinking of, and that handles the two linkage types the same, but other optimizations are different.
Ah, gotcha. In this function we have to account for the worst possibility, we do not have a lot of information to do better, but I actually see there is one case where |
I have updated as said, and used the idea of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime changes LGTM. I worry slightly that there was a reason we split the aspects and srcloc checks, but if it becomes a problem we can split the test in two.
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
// SYCL: Device code is not generally limited to one TU, but anything accessed | ||
// from another translation unit is required to be annotated with the | ||
// SYCL_EXTERNAL keyword. For any function or variable that does not have | ||
// this, linkonce suffices. If -fno-sycl-rdc, though, we know there is only | ||
// one translation unit and can so mark them internal instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SYCL: Device code is not generally limited to one TU, but anything accessed | |
// from another translation unit is required to be annotated with the | |
// SYCL_EXTERNAL keyword. For any function or variable that does not have | |
// this, linkonce suffices. If -fno-sycl-rdc, though, we know there is only | |
// one translation unit and can so mark them internal instead. | |
// SYCL: Device code is not generally limited to one translation unit, but anything accessed | |
// from another translation unit is required to be annotated with the | |
// SYCL_EXTERNAL macro. For any function or variable that does not have | |
// this, linkonce_odr suffices. If -fno-sycl-rdc is passed, we know there is only | |
// one translation unit and can so mark functions internal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly applied, thank you. The one thing is that the last "them" refers to both functions and variables, so replacing it with just "functions" does not cover it entirely. Are you okay with leaving that as "them", or would you prefer I repeat that as "functions and variables"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with how it is applied now, thank you.
d8d6d6f
to
5b5190d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFE change LGTM.
@@ -1,7 +1,9 @@ | |||
// RUN: %clang_cc1 -triple spir64-unknown-unknown -disable-llvm-passes -fsycl-is-device -emit-llvm %s -o - | FileCheck %s | |||
// RUN: %clang_cc1 -triple spir64-unknown-unknown -disable-llvm-passes -fsycl-is-device -fgpu-rdc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-RDC | |||
// RUN: %clang_cc1 -triple spir64-unknown-unknown -disable-llvm-passes -fsycl-is-device -fno-gpu-rdc -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-NORDC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to extend all failing tests with -fno-gpu-rdc
case. For example, the purpose of this test is to check llvm loop metadata and we don't want this test to fail on changing linkage type. The right fix for this test is to drop checking linkage type in CHECK-LABEL:
patterns.
I would expect 1-2 tests to check the linkage type, but all other tests should ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note though that the test was already checking the linkage type, that is not new behaviour of the test. When something like that changes, is it more common to just update the test to whatever is generated by the new version, or is it more common to update the test to allow any result? In upstream LLVM, it would be the former, so I went with that approach. If DPC++ usually does it differently, happy to make that change.
I can definitely agree that this is not what this particular test is about, and see your point that checking both -fgpu-rdc
and -fno-gpu-rdc
is overkill. Going over the tests, I think for device_global.cpp
and sycl-cuda-host-device-functions.cu
, checking both -fgpu-rdc
and -fno-gpu-rdc
makes sense for the test, and if we have those two, we cover both functions and variables. For the others, I will take out the -fno-gpu-rdc
versions of the tests, and then check with you once that is ready if you would still like me to additionally loosen the checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Looking over the new diff, I do actually see a lot of those affected tests using {{.*}}
in various places already, so will just update them the way you suggested under the assumption that this is indeed a difference in test styles. If it turns out you actually like the current version after all, thanks to Git, it's easy to go back. :)
f7743c9
to
ab36075
Compare
Thanks for the approvals. The CI failure for HIP AMDGPU looks unrelated to this PR and has shown up in other PRs as well (e.g. https://github.com/intel/llvm/actions/runs/5525573620/job/14962770834?pr=10313). The CI failure for Intel CPU/GEN9 GPU, I am not sure what is happening there but I see there have been relevant changes to CI, so before looking into that, I rebased to re-trigger CI and see what it shows now. Aside from checking the CI results, is this waiting on anything else from me? |
I think the |
Thanks Steffen. These are new tests that were not yet part on the commit that I was originally working on (or, at least, not yet enabled). I agree that they look like they are caused by this patch and am looking into them. The errors that were showing up earlier have disappeared, so I will disregard those as likely unrelated and since fixed. |
I think there is a design decision to be made here. The whole idea of
I think 1 makes for a poor user experience. 2 and 3 are the same from the user's point of view, if done correctly. I think 2 makes most sense, but requires bigger changes. I think 3 should be a simple short change that's sufficient to get this merged, and 2 can be done later if so desired. I will try to update the PR accordingly, but if you would prefer another direction, please let me know. |
I don't think we want option 1. either, as it means something different for Note that the way avoided the compiler messing with internal |
That |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change seems sane to me and the no-rdc-related changes/statements seem correct, so that part lgtm. i am not an expert on llvm linkage types so ill leave that to others
In getLLVMLinkageForDeclarator, CUDA device functions are given internal linkage to allow improved IPO. The same optimizations would also be useful for SYCL. However, as multiple translation units are a possibility here, it cannot be done exactly the same way. There are a number of extra considerations that need to be made. - If a device function or object is annotated with the SYCL_EXTERNAL macro, it may be called from a different module and its handling should not be touched. - If a device function or object is not annotated with the SYCL_EXTERNAL macro, it may not be called from a different translation unit unless that other translation unit provides its own definition of it. As such, if other optimizations remove all references in the current module, the function or object itself can be removed, and other passes should be aware that it is legal to remove the function or object. - If two kernels defined in different translation units call into the same non-kernel inline or template device function, these different copies of the device function would previously be merged together, and should continue to be. The linkage that satisfies these criteria is linkonce_odr, so this commit marks functions and objects so, unless the sycl_kernel or sycl_device attributes are present which indicate that it must remain externally accessible. If -fno-sycl-rdc is used, these concerns go away and we can use internal linkage instead.
Rebased to resolve a conflict after the removal of no-opaque-pointer tests. |
ping |
…ntel#10317)" This reverts commit 4cb4fec.
…ERNAL. (intel#10317)"" This reverts commit 3d74cc1.
In getLLVMLinkageForDeclarator, CUDA device functions are given internal linkage to allow improved IPO. The same optimizations would also be useful for SYCL. However, as multiple translation units are a possibility here, it cannot be done exactly the same way. There are a number of extra considerations that need to be made.
If a device function or object is annotated with the SYCL_EXTERNAL macro, it may be called from a different module and its handling should not be touched.
If a device function or object is not annotated with the SYCL_EXTERNAL macro, it may not be called from a different translation unit unless that other translation unit provides its own definition of it. As such, if other optimizations remove all references in the current module, the function or object itself can be removed, and other passes should be aware that it is legal to remove the function or object.
If two kernels defined in different translation units call into the same non-kernel inline or template device function, these different copies of the device function would previously be merged together, and should continue to be.
The linkage that satisfies these criteria is linkonce_odr, so this commit marks functions and objects so, unless the sycl_kernel or sycl_device attributes are present which indicate that it must remain externally accessible.