-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Add faster reduction implementations using atomic or/and intel… #1615
Conversation
…::reduce() Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
@Pennycook could you review the patch? |
@@ -761,6 +785,48 @@ class __SYCL_EXPORT handler { | |||
#endif | |||
} | |||
|
|||
/// Implements parallel_for() accepting nd_range and 1 reduction variable | |||
/// having 'read_write' access mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Minor] Suggest mentioning that reduction should support "fast atomics".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vlad, doesn't this comment already tell that (see the line 3 and 4 of this comment section)?
/// Implements parallel_for() accepting nd_range and 1 reduction variable
/// having 'read_write' access mode.
/// This version uses fast sycl::atomic operations to update user's reduction
/// variable at the end of each work-group work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the names fast_atomic
and fast_reduce
going forward -- what we're really testing here is whether SYCL provides native
atomics or reductions for those types (since a device is not required to guarantee that their implementation is fast).
I don't feel strongly enough about this to block merging this PR, but we might want to revisit this naming convention when tuning for additional platforms.
If for some device those 'native' atomics happen to work slowly, then the good move is to exclude such 'native' atomic from 'fast' atomics list and use different algorithm not using them. Such exclusion would require some additional changes and maybe dynamic/runtime checks on HOST (I did not think much about it yet). |
That's a good point. I agree -- as long as we continue to update the logic and use these specializations only if the features are expected to improve performance, "fast" is a good name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…_docs * origin/sycl: (6482 commits) [SYCL][NFC] Clean formatting in Markdown documents (intel#1635) [SYCL][Doc] Remove obsolete parens from README (intel#1637) [SYCL] Fix failing ABI tests when LLVM_LIBDIR_SUFFIX is set (intel#1605) [SYCL] Fix warnings in libdevice (intel#1630) [SYCL][CUDA] Triage and clean LIT (intel#1620) [SYCL][NFC] Fix GCC 8 compilation warnings (intel#1631) [SYCL] Minor fixes in LowerWGScope [SYCL] PI: correct default interoperability plugin selection [SYCL] Add faster reduction implementations using atomic or/and intel::reduce() (intel#1615) [SYCL] Add sycl-ls utility for listing devices discovered/selected by SYCL RT (intel#1575) [SYCL] Fix getDeviceFromHandler declarations (intel#1626) [SPIR-V] Correct/improve declaration of SPIR-V builtins (intel#1519) [SYCL][USM] Improve USM allocator test and fix improper behavior. (intel#1538) [SYCL] Fix failing ABI LITs (intel#1622) [SYCL] Add support for MSVC internal math functions in device library (intel#1441) [SYCL] Add runtime library versioning (intel#1604) [SYCL] Check weak symbols in ABI dumps (intel#1609) [NFC][SYCL] Improve kernel metadata test (intel#1610) Revert "[SYCL] XFAIL LIT test due to duplicate diagnostic" (intel#1460) [SYCL] Move the reduction command group funcs out of handler.hpp (intel#1602) ...
…::reduce()
Signed-off-by: Vyacheslav N Klochkov vyacheslav.n.klochkov@intel.com