-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[OpenMP] Build OpenMP DeviceRTL on SPIR-V #121600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@llvm/pr-subscribers-offload @llvm/pr-subscribers-libc Author: Nick Sarnie (sarnex) ChangesWe actually can build it today with the current OMP FE. We want to minimize device-specific code with SPIR-V, so I used the I also made a minor change in the RPC code called when building the RTL, Full diff: https://github.com/llvm/llvm-project/pull/121600.diff 2 Files Affected:
diff --git a/libc/shared/rpc_util.h b/libc/shared/rpc_util.h
index 9406de59f63b71..6b9df43776edac 100644
--- a/libc/shared/rpc_util.h
+++ b/libc/shared/rpc_util.h
@@ -157,7 +157,7 @@ RPC_ATTRS void sleep_briefly() {
asm("nanosleep.u32 64;" ::: "memory");
#elif defined(__AMDGPU__) && defined(RPC_TARGET_IS_GPU)
__builtin_amdgcn_s_sleep(2);
-#elif __has_builtin(__builtin_ia32_pause)
+#elif __has_builtin(__builtin_ia32_pause) && !defined(__SPIRV__)
__builtin_ia32_pause();
#elif __has_builtin(__builtin_arm_isb)
__builtin_arm_isb(0xf);
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
index 22940264f9b19a..e707429a5196cf 100644
--- a/offload/DeviceRTL/CMakeLists.txt
+++ b/offload/DeviceRTL/CMakeLists.txt
@@ -264,6 +264,9 @@ compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=n
add_custom_target(omptarget.devicertl.nvptx)
compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
+add_custom_target(omptarget.devicertl.spirv64)
+compileDeviceRTLLibrary(spirv64 spirv64)
+
# Archive all the object files generated above into a static library
add_library(omptarget.devicertl STATIC)
set_target_properties(omptarget.devicertl PROPERTIES
|
Joseph said he's going to rework the DeviceRTL files soon, so let me know if I should hold off on/rework this. Also let me know if I should put this under a CMake option or something while it's experimental. |
libc/shared/rpc_util.h
Outdated
@@ -157,7 +157,7 @@ RPC_ATTRS void sleep_briefly() { | |||
asm("nanosleep.u32 64;" ::: "memory"); | |||
#elif defined(__AMDGPU__) && defined(RPC_TARGET_IS_GPU) | |||
__builtin_amdgcn_s_sleep(2); | |||
#elif __has_builtin(__builtin_ia32_pause) | |||
#elif __has_builtin(__builtin_ia32_pause) && !defined(__SPIRV__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? This seems like a massive bug that needs to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me debug more into it, will reply here when I have a better answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is showing up it means that the x64 builtins are being emitted, which is very bad. I'm going to guess something got crossed while defining the target info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debugged into it quickly before and found the problem was that it seems there's some map in the preprocessor between builtins and features those builtins require, and that feature map was empty and when it's empty __has_builtin
returns true
, but I don't actually know what that feature map repesents or why it's empty, will do a real investigation on this which I should have done to begin with :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This turned out to be pretty annoying, I have a draft PR here that attempts to fix it, will mark it ready for review once CI passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's okay to have this target-check workaround until we resolve the discussion on the __has_builtin
PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you have it not just implicitly assumes the host is x64. If we're dedicated to working around that then you'd just need to put an empty ifdef
at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, I just pushed a commit that hopefully fixes what you meant. If you feel strongly about not doing the workaround I'm happy to wait, but since IMO it's relatively simple and self-contained I was thinking we could not block the PR with that.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
libc/shared/rpc_util.h
Outdated
@@ -152,12 +152,14 @@ template <typename T> class optional { | |||
|
|||
/// Suspend the thread briefly to assist the thread scheduler during busy loops. | |||
RPC_ATTRS void sleep_briefly() { | |||
#if defined(__NVPTX__) && defined(RPC_TARGET_IS_GPU) | |||
#if defined(__SPIRV__) | |||
// It's unsupported and __has_builtin doesn't always work as we expect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a FIXME: __has_builtin does not work on offloading targets.
or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely easier to support this now that I merged all the architecture into one. We'll need a tester at some point but as long as we're making forward progress.
Thanks for that, we had an earlier version of this internally before your patches and it was definitely something. |
Hi @jdoerfert @shiltian, do you guys have any feedback on this? If not, I'll merge this tomorrow. Thanks! |
You know I'm surprised this works, considering we don't even have |
Right, that's the kind of stuff I was referencing when I was saying we need work in the FE. The DeviceRTL won't be able to look up/write to variables like I agree this won't have use to anyone trying to seriously use SPIR-V with OpenMP, but it is useful for LLVM developers working on OMP/SPIR-V. Maybe it's better to put this under a default-off CMake option? |
Unsure, I might just remove |
Sure, I can test it if you end up trying that. |
Also this probably wouldn't do anything because we don't have any of the SPIR-V builtins to do much of anything, so we should probably hold off on landing it since it can't be used. |
Sounds good, will mark as draft for now. |
We actually can build it today with the current OMP FE.
We want to minimize device-specific code with SPIR-V, so I used the
spirv64
triple instead ofspirv64-intel
which is currently the only planned SPIR-V based OMP offloading triple. The existing driver code finds the library fine.I also made a minor change in the RPC code called when building the RTL,
__has_builtin(__builtin_ia32_pause)
was returning true and then we were getting an assert saying it's not supported on this target, it seems there are no features associated with__builtin_ia32_pause
so__has_builtin
always returns true on SPIR-V, so just add a arch check so it does nothing as there's no sleep we can use.