-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Add support for half type #78
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge contribution!
I think they are a few details to improve.
Some other refactoring stuff could be postpone for the future.
sycl/include/CL/sycl/types.hpp
Outdated
#include <CL/sycl/multi_ptr.hpp> | ||
|
||
// 4.10.1: Scalar data types | ||
// 4.10.2: SYCL vector types | ||
|
||
#ifdef __SYCL_DEVICE_ONLY__ | ||
typedef _Float16 half; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace all the typedef
by using
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately in this file so many typedefs so I will replace typedef by using where it does violate style of near code. Replacing of all typedefs by usings seems not in scope of this PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I need to make a PR to fix some old coding style. :-)
typename std::enable_if< | ||
std::is_fundamental<Ty>::value || | ||
std::is_same<typename std::remove_const<Ty>::type, half>::value, | ||
vec &>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another meta-function to factorize out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I need to enable this operator only if Rhs has scalar type (or half) and distinguish unoptimized version of it from optimized since optimized version is not available for host half. If you have suggestion as we can re-write it more pretty, please let me know.
@@ -391,7 +427,10 @@ template <typename Type, int NumElements> class vec { | |||
} | |||
|
|||
template <typename Ty = DataT> | |||
typename std::enable_if<std::is_fundamental<Ty>::value, vec &>::type | |||
typename std::enable_if< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to comment the functions you add/change :-)
const EnableIfNotHostHalf<Ty> Arg1, const DataT Arg2, const DataT Arg3, | ||
const DataT Arg4, const DataT Arg5, const DataT Arg6, const DataT Arg7, | ||
const DataT Arg8, const DataT Arg9, const DataT ArgA, const DataT ArgB, | ||
const DataT ArgC, const DataT ArgD, const DataT ArgE, const DataT ArgF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very ad-hoc to deal with all the cases.
Any hope to have a variadic template solution to handle all the cases?
But I agree it seems out of the scope of this PR since it is history...
87a4dd7
to
82ad152
Compare
uint16_t Exp16 = 0; | ||
uint16_t Frac16 = Frac32 >> 13; | ||
|
||
if (__builtin_expect(Exp32 == 0xff || Exp32Diff > 15, 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expected :-) compiler to have __builtin_expect()
available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tell the compiler that chances are that this branch will not be reached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. But the problem I see is that it is not standard C++ and it is not guarded by any compiler-specific preprocessor guard...
Is it to be executed only on the device side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation only works on host side. On device side, we use _Float16
from clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So see my question above which is not about the meaning of __builtin_expect()
but about what are the compilers supporting this extension? MSVC, ICC, GCC, Clang,... ?
The problem is that SYCL allows to use a compiler different from Clang/LLVM for the host side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could argue that this is SYCL runtime library (not part of the SYCL headers), which can be deployed to user in binary form and "a compiler for the host side" is not relevant, but it's quite simple thing to fix by checking if __builtin_extect
is supported: https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand then that you envision that the SYCL host runtime should be different according to the host compiler?
I could argue that this seems difficult to manage at scale.
So I prefer to agree with the end of your sentence. :-)
sycl/test/basic_tests/half_type.cpp
Outdated
|
||
inline bool bitwise_comparison_fp32(const half val, const uint32_t exp) { | ||
const float fp32 = static_cast<float>(val); | ||
return *reinterpret_cast<const uint32_t *>(&fp32) == exp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right you need the reference.
Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Half type is defined on SYCL device as _Float16. This type is mangled by clang as "DF16_". OclCxxRewrite doesn't support this mangling and skips all OpenCL built-ins with _Float16 type. So these built-ins are not translated properly to SPIRV without this change. Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
82ad152
to
1e6ab92
Compare
Summary: feature coverage is a useful signal that is available during the merge process, but was not printed previously. Output example: ``` $ ./fuzzer -use_value_profile=1 -merge=1 new_corpus/ seed_corpus/ INFO: Seed: 1676551929 INFO: Loaded 1 modules (2380 inline 8-bit counters): 2380 [0x90d180, 0x90dacc), INFO: Loaded 1 PC tables (2380 PCs): 2380 [0x684018,0x68d4d8), MERGE-OUTER: 180 files, 78 in the initial corpus MERGE-OUTER: attempt 1 INFO: Seed: 1676574577 INFO: Loaded 1 modules (2380 inline 8-bit counters): 2380 [0x90d180, 0x90dacc), INFO: Loaded 1 PC tables (2380 PCs): 2380 [0x684018,0x68d4d8), INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes MERGE-INNER: using the control file '/tmp/libFuzzerTemp.111754.txt' MERGE-INNER: 180 total files; 0 processed earlier; will process 180 files now #1 pulse cov: 134 ft: 330 exec/s: 0 rss: 37Mb #2 pulse cov: 142 ft: 462 exec/s: 0 rss: 38Mb #4 pulse cov: 152 ft: 651 exec/s: 0 rss: 38Mb #8 pulse cov: 152 ft: 943 exec/s: 0 rss: 38Mb #16 pulse cov: 520 ft: 2783 exec/s: 0 rss: 39Mb #32 pulse cov: 552 ft: 3280 exec/s: 0 rss: 41Mb #64 pulse cov: 576 ft: 3641 exec/s: 0 rss: 50Mb #78 LOADED cov: 602 ft: 3936 exec/s: 0 rss: 88Mb #128 pulse cov: 611 ft: 3996 exec/s: 0 rss: 93Mb #180 DONE cov: 611 ft: 4016 exec/s: 0 rss: 155Mb MERGE-OUTER: succesfull in 1 attempt(s) MERGE-OUTER: the control file has 39741 bytes MERGE-OUTER: consumed 0Mb (37Mb rss) to parse the control file MERGE-OUTER: 9 new files with 80 new features added; 9 new coverage edges ``` Reviewers: hctim, morehouse Reviewed By: morehouse Subscribers: delcypher, #sanitizers, llvm-commits, kcc Tags: #llvm, #sanitizers Differential Revision: https://reviews.llvm.org/D66030 llvm-svn: 368617
…… (#67069) We noticed some performance issue while in lldb-vscode for grabing the name of the SBValue. Profiling shows SBValue::GetName() can cause synthetic children provider of shared/unique_ptr to deference underlying object and complete it type. This patch lazily moves the dereference from synthetic child provider's Update() method to GetChildAtIndex() so that SBValue::GetName() won't trigger the slow code path. Here is the culprit slow code path: ``` ... frame #59: 0x00007ff4102e0660 liblldb.so.15`SymbolFileDWARF::CompleteType(this=<unavailable>, compiler_type=0x00007ffdd9829450) at SymbolFileDWARF.cpp:1567:25 [opt] ... frame #67: 0x00007ff40fdf9bd4 liblldb.so.15`lldb_private::ValueObject::Dereference(this=0x0000022bb5dfe980, error=0x00007ffdd9829970) at ValueObject.cpp:2672:41 [opt] frame #68: 0x00007ff41011bb0a liblldb.so.15`(anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::Update(this=0x000002298fb94380) at LibStdcpp.cpp:403:40 [opt] frame #69: 0x00007ff41011af9a liblldb.so.15`lldb_private::formatters::LibStdcppSharedPtrSyntheticFrontEndCreator(lldb_private::CXXSyntheticChildren*, std::shared_ptr<lldb_private::ValueObject>) [inlined] (anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::LibStdcppSharedPtrSyntheticFrontEnd(this=0x000002298fb94380, valobj_sp=<unavailable>) at LibStdcpp.cpp:371:5 [opt] ... frame #78: 0x00007ff40fdf6e42 liblldb.so.15`lldb_private::ValueObject::CalculateSyntheticValue(this=0x000002296c66a500) at ValueObject.cpp:1836:27 [opt] frame #79: 0x00007ff40fdf1939 liblldb.so.15`lldb_private::ValueObject::GetSyntheticValue(this=<unavailable>) at ValueObject.cpp:1867:3 [opt] frame #80: 0x00007ff40fc89008 liblldb.so.15`ValueImpl::GetSP(this=0x0000022c71b90de0, stop_locker=0x00007ffdd9829d00, lock=0x00007ffdd9829d08, error=0x00007ffdd9829d18) at SBValue.cpp:141:46 [opt] frame #81: 0x00007ff40fc7d82a liblldb.so.15`lldb::SBValue::GetSP(ValueLocker&) const [inlined] ValueLocker::GetLockedSP(this=0x00007ffdd9829d00, in_value=<unavailable>) at SBValue.cpp:208:21 [opt] frame #82: 0x00007ff40fc7d817 liblldb.so.15`lldb::SBValue::GetSP(this=0x00007ffdd9829d90, locker=0x00007ffdd9829d00) const at SBValue.cpp:1047:17 [opt] frame #83: 0x00007ff40fc7da6f liblldb.so.15`lldb::SBValue::GetName(this=0x00007ffdd9829d90) at SBValue.cpp:294:32 [opt] ... ``` Differential Revision: https://reviews.llvm.org/D159542
No description provided.