Skip to content

[SYCL][ESIMD] Add support for align flags for simd::copy_from/to operations #4848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Nov 16, 2021

Conversation

sndmitriev
Copy link
Contributor

This patch adds new parameter to simd<>::copy_from/copy_to which allows
to specify memory alignment for the load/store address. Depending on the
provided alignment, operation is lowered to an appropriate low-level memory
operation with matching alignment constraints.

Signed-off-by: Sergey Dmitriev serguei.n.dmitriev@intel.com

…tions

This patch adds new parameter to simd<>::copy_from/copy_to which allows
to specify memory alignment for the load/store address. Depending on the
provided alignment operation is lowered to a appropriate low-level memory
operation with matching alignment constraints.

Signed-off-by: Sergey Dmitriev <serguei.n.dmitriev@intel.com>
@kbobrovs
Copy link
Contributor

please also add a reference to an E2E test to this PR

@kbobrovs kbobrovs requested a review from kychendev October 29, 2021 05:16
@kbobrovs
Copy link
Contributor

@Pennycook, @rolandschulz - you might be interested in this too.

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, LGTM, but some tests fail in Jenkins.

@sndmitriev
Copy link
Contributor Author

Yes, some tests are failing, I will check why.

constexpr unsigned Align = Flags::template alignment<T1>;

simd<T, N> Tmp;
if constexpr (Align >= OperandSize::DWORD && Size % OperandSize::OWORD == 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be Align >= OperandSize::DWORD or Align >= OperandSize::OWORD here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block load requires offset to be at least dword aligned, so it should be OperandSize::DWORD. block_load() then will check what alignment is and will use either aligned load if alignment is >= OperandSize::OWORD or unaligned otherwise.

@kbobrovs
Copy link
Contributor

please also add a reference to an E2E test to this PR

@sndmitriev, friendly reminder. Is there E2E test yet?

@sndmitriev
Copy link
Contributor Author

please also add a reference to an E2E test to this PR

@sndmitriev, friendly reminder. Is there E2E test yet?

Yes, it is here intel/llvm-test-suite#560

@sndmitriev
Copy link
Contributor Author

sndmitriev commented Nov 10, 2021

I expect to see two failures on precommit testing

SYCL :: ESIMD/reduction.cpp

Test passes on GPU but fails on host. Looks like host implementation of gather<int16_t, …> works incorrectly in some cases. I will try to create a small reproducer for this problem.

SYCL :: ESIMD/kmeans/kmeans.cpp

Compiler cannot resolve an ambiguity between array and load constructors for simd object

/// Construct from an array. To allow e.g. simd_mask_type<N> m({1,0,0,1,...}).
template <int N1, class = std::enable_if_t<N1 == N>>
simd_obj_impl(const Ty(&&Arr)[N1]) noexcept

/// Load constructor.
template <typename Flags = element_aligned_tag,
typename = std::enable_if_t<is_simd_flag_type_v<Flags>>>
simd_obj_impl(const Ty *ptr, Flags = {}) noexcept {

I can update test to resolve this ambiguity, but I am not sure if this is the right way to fix this problem. Do you have any suggestions?

@sndmitriev
Copy link
Contributor Author

The problem with int16_t gather on host can be reproduced on the following test

$ cat gather16.cpp 
#include <CL/sycl.hpp>
#include <array>
#include <iostream>
#include <numeric>

#include <sycl/ext/intel/experimental/esimd.hpp>

using namespace cl::sycl;
using namespace sycl::ext::intel::experimental;
using namespace sycl::ext::intel::experimental::esimd;

int main(void) {
  constexpr int N = 8;
  using T = int16_t;

  std::array<T, N> Ref;
  std::iota(Ref.begin(), Ref.end(), 1);

  simd<T, N> Tst = gather<T, N>(Ref.data(), simd<uint32_t, N>(0u, sizeof(T)));

  for (int I = 0; I < N; ++I)
    if (Ref[I] != Tst[I]) {
      std::cout << "failed at " << I << ": " << Tst[I] << " (Tst) != " << Ref[I] << " (Ref)\n";
      return 1;
    }

  std::cout << "passed\n";
  return 0;
}
$ clang++ -fsycl gather16.cpp 
$ ./a.out 
failed at 1: 0 (Tst) != 2 (Ref)
$ 

@kbobrovs
Copy link
Contributor

I can update test to resolve this ambiguity, but I am not sure if this is the right way to fix this problem. Do you have any suggestions?

We might need to remove the array-based constructor until we can find a better solution :( I noticed lately that it does not get called in supposed context anyway (tests somehow did not catch that). Unless removal of initializer_list-based constructor makes compiler pick-up the array-based one in wanted cases. Can the ambiguity be resolved on the ESIMD library side?

@sndmitriev
Copy link
Contributor Author

I am not sure how this can be resolved in the ESIMD library. Maybe it makes sense to remove load constructor from this patch until we find the right solution for the ambiguity problem?

@sndmitriev
Copy link
Contributor Author

I just noticed that on kmeans ambiguity was actually reported for simd::copy_to/from rather than for the constructors, so please ignore my notes about the constructors. The ambiguity has to be resolved for copy_to/from overloads (pointer version vs array specialization).

@sndmitriev sndmitriev marked this pull request as ready for review November 12, 2021 09:47
@kbobrovs kbobrovs merged commit 27f5c12 into intel:sycl Nov 16, 2021
@sndmitriev sndmitriev deleted the sndmitriev/copy_align branch November 16, 2021 03:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants