-
Notifications
You must be signed in to change notification settings - Fork 769
[ESIMD] Implement gather(acc) accepting compile-time properties #12414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
33d2fbd
to
515a418
Compare
515a418
to
252380c
Compare
252380c
to
116fbdc
Compare
Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com> Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
116fbdc
to
04edb74
Compare
gather(AccessorTy acc, simd<detail::DeviceAccessorOffsetT, N> offsets, | ||
detail::DeviceAccessorOffsetT glob_offset = 0, simd_mask<N> mask = 1) { | ||
// Dev note: the argument \p glob_offset of this function does not have | ||
// the default value to not conflict with more generic variant (acc-ga-3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// the default value to not conflict with more generic variant (acc-ga-3) | |
// a default value to not conflict with more generic variant (acc-ga-3) |
detail::DeviceAccessorOffsetT glob_offset = 0, simd_mask<N> mask = 1) { | ||
// Dev note: the argument \p glob_offset of this function does not have | ||
// the default value to not conflict with more generic variant (acc-ga-3) | ||
// defined lower. This restriction though requires adding an additional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// defined lower. This restriction though requires adding an additional | |
// defined below. This restriction though requires adding an additional |
#else | ||
return detail::gather_impl<T, N, AccessorTy>(acc, offsets, glob_offset, mask); | ||
#endif | ||
if constexpr (sizeof(T) > 4 || !((N == 1 || N == 8 || N == 16 || N == 32))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit you can ignore: personally i find it hard to read ! on outside of large condition, i prefer ! applied to each element, but thats just me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the comment, but here I'll keep it as is. It is read as `T is big && N is Not from set{1,8,16,32).
This comment here just preserves the old/previous checks for gather. I may need to re-visit it a bit later, perhaps N=4 is also supported for gen12+genx_intrinsics case.
/// PropertyListT props = {}); // (acc-ga-2) | ||
/// simd<T, N> gather(AccessorT acc, simd<OffsetT, N / VS> byte_offsets, | ||
/// PropertyListT props = {}); // (acc-ga-3) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Whitespace difference between here and other variants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added that white space intentionally to make it a bit more readable by splitting to groups of 3.
Actually this comment made me to think critically at this block and I noticed acc-ga-7,8,9 absent here.
I'll add them and add couple more grouping sentences to this comment/block.
/// Supported platforms: DG2, PVC only - Temporary restriction for the variant | ||
/// with pass_thru operand. The only exception: DG2/PVC is not required if | ||
/// stateless memory mode is enforced via -fsycl-esimd-force-stateless-mem and | ||
/// VS == 1 and no L1/L2 cache hints passed and __ESIMD_GATHER_SCATTER_LLVM_IR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// VS == 1 and no L1/L2 cache hints passed and __ESIMD_GATHER_SCATTER_LLVM_IR | |
/// VS == 1 and no L1/L2 cache hints passed and the __ESIMD_GATHER_SCATTER_LLVM_IR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's have it. I added it here and in one more place (acc-ga-2)
else // ByteOffset - simd_view | ||
Vals = gather<T, N>(InAcc, ByteOffsetsView, Pred); | ||
} | ||
} else { // UsePassThru is false, UseMask is false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must have been very painful to write
} | ||
} // end if (VS == 1) | ||
Vals.copy_to(Out + GlobalID * N); | ||
// scatter(Out, ByteOffsets.template select<NOffsets, 1>(), Vals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove commented code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, removed.
// Check VS > 1. GPU supports only dwords and qwords in this mode. | ||
if constexpr (sizeof(T) >= 4) { | ||
// TODO: This test case causes flaky fail. Enable it after the issue | ||
// in GPU driver is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an internal tracker for this or a version we know works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I created a tracker before uploading this PR.
GPU generated code where the mask is used but is not initialized.
// The test verifies esimd::gather() functions accepting ACCESSOR | ||
// and optional compile-time esimd::properties. | ||
// The gather() calls in this test do not use cache-hint properties | ||
// or VS > 1 (number of loads per offset) to not impose using PVC features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit is not accurate I think in the part about not using PVC features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed. Thank you.
Passed &= testACC<uint32_t, TestFeatures>(Q); | ||
Passed &= testACC<float, TestFeatures>(Q); | ||
Passed &= testACC<ext::intel::experimental::esimd::tfloat32, TestFeatures>(Q); | ||
#ifdef __ESIMD_FORCE_STATELESS_MEM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be running this test and the previous in stateless mode as well? It seems we are not right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch. I'll add the tests here soon.
Need also to add E2E tests for stateless mode Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
No description provided.