-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Specialize atomic fetch_add for floating point types #2765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Specialize atomic fetch_add for floating point types #2765
Conversation
9668b1b
to
d141d43
Compare
c7961b1
to
7579487
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me.
One quick question to make sure I understand: am I right in thinking that we now default to the native FAdd implementation for all our devices? So the old compare-exchange implementation will now only be tested for CUDA devices?
@bader We should add a TODO somewhere to implement AtomicFAddEXT in the CUDA backend.
I suggest adding a new issue with "cuda" tag - https://github.com/intel/llvm/issues. |
Yes, this is aimed for all device BEs. I'll create an issue for CUDA once the extension support is pulled down into this repository. |
7579487
to
8fd82f0
Compare
02e8042
to
d8712b5
Compare
@Pennycook, could you please check if Also, should the IR checks in |
/summary:run |
56da357
to
fc3ffef
Compare
That commit says
Good question. If we're documenting |
That was an unfortunate typo on my side, sorry for that.
Fair enough. And yet I tend to agree that:
My personal approach would be to rely on the E2E tests passing with correct macro presence/absence. Should there be a lack of consensus among the reviewers, though, I'm prepared to lay this back and make all testing stages as detailed as possible. How would you recommend treating this? @AlexeySotkin, @AlexeySachkov, could you comment as well, please? |
fc3ffef
to
f7a585a
Compare
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
New macro name is __SYCL_USE_NATIVE_FP_ATOMICS__ Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
…here yet Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Make the user-exposed macro consistent with naming standards for C++ libraries, ours in particular. https://eel.is/c++draft/lex.name Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
e484016
to
be71125
Compare
Updates:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Minor implementation details aside, this is a follow-up to intel#2765. The end-to-end tests are already done, the latest update being intel/llvm-test-suite#118. Signed-off-by: Artem Gindinson <amgindinson@gmail.com>
Minor implementation details aside, this is a follow-up to #2765. The end-to-end tests are already done, the latest update being intel/llvm-test-suite#118. Signed-off-by: Artem Gindinson <amgindinson@gmail.com>
The new EXT/SPV_EXT_shader_atomic_float_add SPIR-V extension
allows us to further specialize
atomic::fetch_add()
forfloating point types. In device mode, we'll now be creating
an external call to a built-in-like
__spirv_AtomicFAddEXT()
.This is similar to what is done for other atomic binary
instructions, e.g. the integer specialization of
fetch_add()
being mapped onto
__spirv_AtomicIAdd()
.Furthermore,
atomic::fetch_sub()
is also re-implementedto use
__spirv_AtomicFAddEXT()
, the added operand beinga negation of the original one.
The new implementation can be exposed if a dedicated macro is
defined:
SYCL_USE_NATIVE_FP_ATOMICS
. Otherwise, a fallbackis used, where the atomic operation is done via spinlock emulation.
At the moment of committing this, only Intel GPUs support the
"native" implementation, which relies on a SPIR-V extension.
Tests for the feature have been finalized in
intel/llvm-test-suite#104.
Signed-off-by: Artem Gindinson artem.gindinson@intel.com