[SYCL][CUDA] Add initial support for FP atomics #3276
Conversation
Generates native FP32 and FP64 atomics with the following flags:

-DSYCL_USE_NATIVE_FP_ATOMICS -Xsycl-target-backend --cuda-gpu-arch=sm_60

Several known issues:
- __spirv_AtomicFAddExt is not inlined, so order and scope do not propagate
- Generated PTX does not respect order or scope (defaults to relaxed)
- Fatal error when compiling with --cuda-gpu-arch <= sm_50

A complete implementation of this feature requires libspirv to be made aware of __nvvm_reflect, so that NVVMReflect can be used to branch on __CUDA_ARCH.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
A note to reviewers: this covers a common and important use case (relaxed floating-point atomics), significantly improving performance for that case. It is related to #2853, but it shouldn't be considered a complete implementation and shouldn't close that issue.
@Naghasan and @steffenlarsen
libclc/ptx-nvidiacl/libspirv/SPV_EXT_shader_atomic_float_add/atomicfaddext.cl
Good work! To my knowledge, this should get https://github.com/intel/llvm-test-suite/blob/intel/SYCL/AtomicRef/add-native.cpp working with the CUDA BE. Have you tried removing
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Thanks! The test passes on my local machine, but only with the required flags. I wasn't sure whether to commit changes to the test or not. I'm not sure how to pass the required flags only when the test is being run on the CUDA backend, and we know that the test will still fail on older NVIDIA GPUs. I convinced myself that we shouldn't claim CUDA support until the test passes for the default GPU arch -- does that make sense?
Makes sense to me, though I'm not sure that is achievable. I vaguely remember the default arch being bumped up to sm_50 due to atomics, so I wouldn't be surprised if floating-point (or at least 64-bit floating-point) atomics aren't supported before sm_60. If that is the case, as long as the error is clear about the architecture problem, I don't think it's necessarily an issue. In that case, this should only become a problem if people use floating-point atomics on hardware that does not support them; otherwise they would just need to add the flag.

As for the test, what happens if you add the --cuda-gpu-arch=sm_60 flag to the general command? Do other backends just ignore it? If so, it could be an option, though it is arguably hacky.

In the future we could consider being smarter about the default architecture, e.g. by selecting the one that suits a device on the machine building it, falling back to sm_50 (or other) if there is no such device. The architecture flag should override this. Just spitballing. 😄
You're right that architectures prior to sm_60 won't be able to generate native floating-point atomic instructions, but that doesn't mean that they can't support the compilation path currently enabled by the flag.

I may be wrong about this, but I think that with something like an NVVMReflect pass it would be possible to generate a compare-and-swap loop for sm_50 inside of libspirv. Then the same path could work on older architectures as well.
Good idea -- I hadn't thought to try this. The flag gets a little longer (because the target has to be specified explicitly), but this seems to work.

I'll defer to @bader on this. I don't know which NVIDIA architecture(s) are used for automated tests, and we might have to change the documentation if some tests are known to fail on older NVIDIA GPUs.
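For reference, the longer spelling being described is presumably something like the following; the exact target-triple name is an assumption (it has changed across DPC++ versions), and the source file name is illustrative.

```shell
# Hypothetical invocation sketch; flag spellings follow the PR description,
# and the nvptx64-nvidia-cuda triple is an assumption.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
  -DSYCL_USE_NATIVE_FP_ATOMICS \
  -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_60 \
  add-native.cpp -o add-native
```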
Sounds like a good idea to me! I've been developing on sm_60, and didn't realize I was compiling for sm_50 until I encountered the fatal error.
According to CI logs, the automation system uses a Titan RTX, which is the sm_75 architecture - http://ci.llvm.intel.com:8010/#/builders/37/builds/7236/steps/8/logs/stdio. We should probably build the test for multiple targets. Passing a CUDA back-end-specific option while compiling for any other target might lead to a compilation failure - the option will be passed to that device's back-end compiler.
If that is possible, that would be great! I don't know much about that pass, so I will trust your judgement. We could still consider having a warning, though, as people might be tricked by this if they have hardware that supports sm_60+. If it is not possible, then I think having it fail at compile time would be better, as - as far as I remember - it gives a nice error message about the architecture being off, and you can adjust it then and there.
Test for functionality from intel/llvm#3276 Signed-off-by: John Pennycook <john.pennycook@intel.com>