Skip to content

[ESIMD] Implement gather(acc) accepting compile-time properties #12414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 23, 2024

Conversation

v-klochkov
Copy link
Contributor

No description provided.

Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
@v-klochkov v-klochkov force-pushed the esimd_unified_gather_acc branch from 116fbdc to 04edb74 Compare January 20, 2024 01:57
@v-klochkov v-klochkov marked this pull request as ready for review January 20, 2024 01:59
@v-klochkov v-klochkov requested a review from a team as a code owner January 20, 2024 01:59
@v-klochkov v-klochkov requested review from sarnex and fineg74 January 22, 2024 18:33
sarnex
sarnex previously approved these changes Jan 22, 2024
gather(AccessorTy acc, simd<detail::DeviceAccessorOffsetT, N> offsets,
detail::DeviceAccessorOffsetT glob_offset = 0, simd_mask<N> mask = 1) {
// Dev note: the argument \p glob_offset of this function does not have
// the default value to not conflict with more generic variant (acc-ga-3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the default value to not conflict with more generic variant (acc-ga-3)
// a default value to not conflict with more generic variant (acc-ga-3)

detail::DeviceAccessorOffsetT glob_offset = 0, simd_mask<N> mask = 1) {
// Dev note: the argument \p glob_offset of this function does not have
// the default value to not conflict with more generic variant (acc-ga-3)
// defined lower. This restriction though requires adding an additional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// defined lower. This restriction though requires adding an additional
// defined below. This restriction though requires adding an additional

#else
return detail::gather_impl<T, N, AccessorTy>(acc, offsets, glob_offset, mask);
#endif
if constexpr (sizeof(T) > 4 || !((N == 1 || N == 8 || N == 16 || N == 32))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit you can ignore: personally i find it hard to read ! on outside of large condition, i prefer ! applied to each element, but thats just me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the comment, but here I'll keep it as is. It is read as `T is big && N is Not from set{1,8,16,32).
This comment here just preserves the old/previous checks for gather. I may need to re-visit it a bit later, perhaps N=4 is also supported for gen12+genx_intrinsics case.

/// PropertyListT props = {}); // (acc-ga-2)
/// simd<T, N> gather(AccessorT acc, simd<OffsetT, N / VS> byte_offsets,
/// PropertyListT props = {}); // (acc-ga-3)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Whitespace difference between here and other variants

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that white space intentionally to make it a bit more readable by splitting to groups of 3.
Actually this comment made me to think critically at this block and I noticed acc-ga-7,8,9 absent here.
I'll add them and add couple more grouping sentences to this comment/block.

/// Supported platforms: DG2, PVC only - Temporary restriction for the variant
/// with pass_thru operand. The only exception: DG2/PVC is not required if
/// stateless memory mode is enforced via -fsycl-esimd-force-stateless-mem and
/// VS == 1 and no L1/L2 cache hints passed and __ESIMD_GATHER_SCATTER_LLVM_IR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// VS == 1 and no L1/L2 cache hints passed and __ESIMD_GATHER_SCATTER_LLVM_IR
/// VS == 1 and no L1/L2 cache hints passed and the __ESIMD_GATHER_SCATTER_LLVM_IR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's have it. I added it here and in one more place (acc-ga-2)

else // ByteOffset - simd_view
Vals = gather<T, N>(InAcc, ByteOffsetsView, Pred);
}
} else { // UsePassThru is false, UseMask is false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must have been very painful to write

}
} // end if (VS == 1)
Vals.copy_to(Out + GlobalID * N);
// scatter(Out, ByteOffsets.template select<NOffsets, 1>(), Vals);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove commented code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, removed.

// Check VS > 1. GPU supports only dwords and qwords in this mode.
if constexpr (sizeof(T) >= 4) {
// TODO: This test case causes flaky fail. Enable it after the issue
// in GPU driver is fixed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an internal tracker for this or a version we know works?

Copy link
Contributor Author

@v-klochkov v-klochkov Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I created a tracker before uploading this PR.
GPU generated code where the mask is used but is not initialized.

// The test verifies esimd::gather() functions accepting ACCESSOR
// and optional compile-time esimd::properties.
// The gather() calls in this test do not use cache-hint properties
// or VS > 1 (number of loads per offset) to not impose using PVC features.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit is not accurate I think in the part about not using PVC features

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed. Thank you.

Passed &= testACC<uint32_t, TestFeatures>(Q);
Passed &= testACC<float, TestFeatures>(Q);
Passed &= testACC<ext::intel::experimental::esimd::tfloat32, TestFeatures>(Q);
#ifdef __ESIMD_FORCE_STATELESS_MEM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be running this test and the previous in stateless mode as well? It seems we are not right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. I'll add the tests here soon.

@sarnex sarnex dismissed their stale review January 22, 2024 20:21

did not mean to approve

Need also to add E2E tests for stateless mode

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
@v-klochkov v-klochkov merged commit 655ab10 into intel:sycl Jan 23, 2024
@v-klochkov v-klochkov deleted the esimd_unified_gather_acc branch January 23, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants