Skip to content

[OpenMP] Build OpenMP DeviceRTL on SPIR-V #121600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

[OpenMP] Build OpenMP DeviceRTL on SPIR-V #121600

wants to merge 4 commits into from

Conversation

sarnex
Copy link
Member

@sarnex sarnex commented Jan 3, 2025

We actually can build it today with the current OMP FE.

We want to minimize device-specific code with SPIR-V, so I used the spirv64 triple instead of spirv64-intel which is currently the only planned SPIR-V based OMP offloading triple. The existing driver code finds the library fine.

I also made a minor change in the RPC code called when building the RTL, __has_builtin(__builtin_ia32_pause) was returning true and then we were getting an assert saying it's not supported on this target, it seems there are no features associated with __builtin_ia32_pause so __has_builtin always returns true on SPIR-V, so just add a arch check so it does nothing as there's no sleep we can use.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex changed the title [OpenMP] Build OpenMP DeviceRTL on SPIRV64 [OpenMP] Build OpenMP DeviceRTL on SPIR-V Jan 3, 2025
@sarnex sarnex marked this pull request as ready for review January 3, 2025 21:47
@llvmbot
Copy link
Member

llvmbot commented Jan 3, 2025

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-libc

Author: Nick Sarnie (sarnex)

Changes

We actually can build it today with the current OMP FE.

We want to minimize device-specific code with SPIR-V, so I used the spirv64 triple instead of spirv64-intel which is currently the only planned SPIR-V based OMP offloading triple. The existing driver code finds the library fine.

I also made a minor change in the RPC code called when building the RTL, __has_builtin(__builtin_ia32_pause) was returning true and then we were getting an assert saying it's not supported on this target, it seems there are no features associated with __builtin_ia32_pause so __has_builtin always returns true on SPIR-V, so just add a arch check so it does nothing as there's no sleep we can use.


Full diff: https://github.com/llvm/llvm-project/pull/121600.diff

2 Files Affected:

  • (modified) libc/shared/rpc_util.h (+1-1)
  • (modified) offload/DeviceRTL/CMakeLists.txt (+3)
diff --git a/libc/shared/rpc_util.h b/libc/shared/rpc_util.h
index 9406de59f63b71..6b9df43776edac 100644
--- a/libc/shared/rpc_util.h
+++ b/libc/shared/rpc_util.h
@@ -157,7 +157,7 @@ RPC_ATTRS void sleep_briefly() {
     asm("nanosleep.u32 64;" ::: "memory");
 #elif defined(__AMDGPU__) && defined(RPC_TARGET_IS_GPU)
   __builtin_amdgcn_s_sleep(2);
-#elif __has_builtin(__builtin_ia32_pause)
+#elif __has_builtin(__builtin_ia32_pause) && !defined(__SPIRV__)
   __builtin_ia32_pause();
 #elif __has_builtin(__builtin_arm_isb)
   __builtin_arm_isb(0xf);
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
index 22940264f9b19a..e707429a5196cf 100644
--- a/offload/DeviceRTL/CMakeLists.txt
+++ b/offload/DeviceRTL/CMakeLists.txt
@@ -264,6 +264,9 @@ compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=n
 add_custom_target(omptarget.devicertl.nvptx)
 compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
 
+add_custom_target(omptarget.devicertl.spirv64)
+compileDeviceRTLLibrary(spirv64 spirv64)
+
 # Archive all the object files generated above into a static library
 add_library(omptarget.devicertl STATIC)
 set_target_properties(omptarget.devicertl PROPERTIES

@sarnex
Copy link
Member Author

sarnex commented Jan 3, 2025

Joseph said he's going to rework the DeviceRTL files soon, so let me know if I should hold off on/rework this.

Also let me know if I should put this under a CMake option or something while it's experimental.

@lntue lntue requested a review from jhuber6 January 4, 2025 16:24
@@ -157,7 +157,7 @@ RPC_ATTRS void sleep_briefly() {
asm("nanosleep.u32 64;" ::: "memory");
#elif defined(__AMDGPU__) && defined(RPC_TARGET_IS_GPU)
__builtin_amdgcn_s_sleep(2);
#elif __has_builtin(__builtin_ia32_pause)
#elif __has_builtin(__builtin_ia32_pause) && !defined(__SPIRV__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? This seems like a massive bug that needs to be fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me debug more into it, will reply here when I have a better answer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is showing up it means that the x64 builtins are being emitted, which is very bad. I'm going to guess something got crossed while defining the target info.

Copy link
Member Author

@sarnex sarnex Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debugged into it quickly before and found the problem was that it seems there's some map in the preprocessor between builtins and features those builtins require, and that feature map was empty and when it's empty __has_builtin returns true, but I don't actually know what that feature map repesents or why it's empty, will do a real investigation on this which I should have done to begin with :P

Copy link
Member Author

@sarnex sarnex Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out to be pretty annoying, I have a draft PR here that attempts to fix it, will mark it ready for review once CI passes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's okay to have this target-check workaround until we resolve the discussion on the __has_builtin PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you have it not just implicitly assumes the host is x64. If we're dedicated to working around that then you'd just need to put an empty ifdef at the top.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, I just pushed a commit that hopefully fixes what you meant. If you feel strongly about not doing the workaround I'm happy to wait, but since IMO it's relatively simple and self-contained I was thinking we could not block the PR with that.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@@ -152,12 +152,14 @@ template <typename T> class optional {

/// Suspend the thread briefly to assist the thread scheduler during busy loops.
RPC_ATTRS void sleep_briefly() {
#if defined(__NVPTX__) && defined(RPC_TARGET_IS_GPU)
#if defined(__SPIRV__)
// It's unsupported and __has_builtin doesn't always work as we expect.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a FIXME: __has_builtin does not work on offloading targets. or something along those lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely easier to support this now that I merged all the architecture into one. We'll need a tester at some point but as long as we're making forward progress.

@sarnex
Copy link
Member Author

sarnex commented Jan 7, 2025

Thanks for that, we had an earlier version of this internally before your patches and it was definitely something.

@sarnex
Copy link
Member Author

sarnex commented Jan 8, 2025

Hi @jdoerfert @shiltian, do you guys have any feedback on this? If not, I'll merge this tomorrow. Thanks!

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 8, 2025

You know I'm surprised this works, considering we don't even have spirv64 hooked up to emit CGOpenMPRuntimeGPU. Do we have anything that confirms the code generated here makes sense? Maybe we should hold off on landing before then.

@sarnex
Copy link
Member Author

sarnex commented Jan 8, 2025

You know I'm surprised this works, considering we don't even have spirv64 hooked up to emit CGOpenMPRuntimeGPU. Do we have anything that confirms the code generated here makes sense? Maybe we should hold off on landing before then.

Right, that's the kind of stuff I was referencing when I was saying we need work in the FE. The DeviceRTL won't be able to look up/write to variables like kernel_environment because the FE doesn't generate them yet.

I agree this won't have use to anyone trying to seriously use SPIR-V with OpenMP, but it is useful for LLVM developers working on OMP/SPIR-V. Maybe it's better to put this under a default-off CMake option?

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 8, 2025

Unsure, I might just remove -fopenmp from the DeviceRTL while I'm at it, but I don't know how SPIRV will handle that since it works fine for AMD / NVIDIA.

@sarnex
Copy link
Member Author

sarnex commented Jan 8, 2025

Sure, I can test it if you end up trying that.

@jhuber6
Copy link
Contributor

jhuber6 commented Jan 9, 2025

Also this probably wouldn't do anything because we don't have any of the SPIR-V builtins to do much of anything, so we should probably hold off on landing it since it can't be used.

@sarnex
Copy link
Member Author

sarnex commented Jan 9, 2025

Sounds good, will mark as draft for now.

@sarnex sarnex marked this pull request as draft January 9, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants