-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][ESIMD] Add support for align flags for simd::copy_from/to operations #4848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tions This patch adds new parameter to simd<>::copy_from/copy_to which allows to specify memory alignment for the load/store address. Depending on the provided alignment operation is lowered to a appropriate low-level memory operation with matching alignment constraints. Signed-off-by: Sergey Dmitriev <serguei.n.dmitriev@intel.com>
sycl/include/sycl/ext/intel/experimental/esimd/detail/simd_obj_impl.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/intel/experimental/esimd/detail/simd_obj_impl.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/intel/experimental/esimd/detail/simd_obj_impl.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/intel/experimental/esimd/detail/simd_obj_impl.hpp
Outdated
Show resolved
Hide resolved
please also add a reference to an E2E test to this PR |
@Pennycook, @rolandschulz - you might be interested in this too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, LGTM, but some tests fail in Jenkins.
Yes, some tests are failing, I will check why. |
constexpr unsigned Align = Flags::template alignment<T1>; | ||
|
||
simd<T, N> Tmp; | ||
if constexpr (Align >= OperandSize::DWORD && Size % OperandSize::OWORD == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be Align >= OperandSize::DWORD or Align >= OperandSize::OWORD here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Block load requires offset to be at least dword aligned, so it should be OperandSize::DWORD. block_load() then will check what alignment is and will use either aligned load if alignment is >= OperandSize::OWORD or unaligned otherwise.
@sndmitriev, friendly reminder. Is there E2E test yet? |
Yes, it is here intel/llvm-test-suite#560 |
I expect to see two failures on precommit testing SYCL :: ESIMD/reduction.cpp Test passes on GPU but fails on host. Looks like host implementation of gather<int16_t, …> works incorrectly in some cases. I will try to create a small reproducer for this problem. SYCL :: ESIMD/kmeans/kmeans.cpp Compiler cannot resolve an ambiguity between array and load constructors for simd object
I can update test to resolve this ambiguity, but I am not sure if this is the right way to fix this problem. Do you have any suggestions? |
The problem with int16_t gather on host can be reproduced on the following test
|
We might need to remove the array-based constructor until we can find a better solution :( I noticed lately that it does not get called in supposed context anyway (tests somehow did not catch that). Unless removal of initializer_list-based constructor makes compiler pick-up the array-based one in wanted cases. Can the ambiguity be resolved on the ESIMD library side? |
I am not sure how this can be resolved in the ESIMD library. Maybe it makes sense to remove load constructor from this patch until we find the right solution for the ambiguity problem? |
I just noticed that on kmeans ambiguity was actually reported for simd::copy_to/from rather than for the constructors, so please ignore my notes about the constructors. The ambiguity has to be resolved for copy_to/from overloads (pointer version vs array specialization). |
This patch adds new parameter to simd<>::copy_from/copy_to which allows
to specify memory alignment for the load/store address. Depending on the
provided alignment, operation is lowered to an appropriate low-level memory
operation with matching alignment constraints.
Signed-off-by: Sergey Dmitriev serguei.n.dmitriev@intel.com