Skip to content

[SYCL][InvokeSIMD] Add basic numerics test for simd_mask #8976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 7, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions sycl/test-e2e/InvokeSimd/Spec/simd_mask_merge.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
// TODO: enable on Windows once driver is ready
// REQUIRES: gpu && linux
// UNSUPPORTED: cuda || hip
//
// Check that full compilation works:
// RUN: %clangxx -fsycl -fno-sycl-device-code-split-esimd -Xclang -fsycl-allow-func-ptr %s -o %t.out
// RUN: env IGC_VCSaveStackCallLinkage=1 IGC_VCDirectCallsOnly=1 %GPU_RUN_PLACEHOLDER %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, can you please elaborate, why these env vars are needed?
What is VISALTO?

Copy link
Contributor Author

@sarnex sarnex Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so VCSaveStackCallLinkage=1 IGC_VCDirectCallsOnly=1 are actually needed to get IGC to run with invoke_simd, all the tests have them. i'm not sure why it doens't work automatically but we do need to pass them

for VISALTO, i just copied it from another test but it looks like that's testing with some optimization flags for link time optimization IGC, so that seems to be more testing an IGC feature and not super relevant to us, so i'll remove this, thanks

#include <sycl/detail/boost/mp11.hpp>
#include <sycl/ext/intel/esimd.hpp>
#include <sycl/ext/oneapi/experimental/invoke_simd.hpp>
#include <sycl/sycl.hpp>

#include <functional>
#include <iostream>
#include <type_traits>
using namespace sycl;
using namespace sycl::ext::oneapi::experimental;
namespace esimd = sycl::ext::intel::esimd;
constexpr int VL = 16;

[[intel::device_indirectly_callable]] simd<float, VL>
SIMD_CALLEE(simd<float, VL> va, simd_mask<float, VL> mask) SYCL_ESIMD_FUNCTION {
esimd::simd<float, VL> ret(0);
esimd::simd_mask<VL> emask;
for(int i = 0; i < VL; i++)
emask[i] = static_cast<bool>(mask[i]);
ret.merge(va, !emask);
return ret;
}

int main() {
sycl::queue q;
auto dev = q.get_device();

std::cout << "Running on " << dev.get_info<sycl::info::device::name>()
<< "\n";
constexpr unsigned Size = 1024;
constexpr unsigned GroupSize = 4 * VL;

std::array<float, Size> A;
std::array<float, Size> C;
std::array<bool, Size> M;

for (unsigned i = 0; i < Size; ++i) {
A[i] = i;
C[i] = 0;
M[i] = i % 2;
}

sycl::buffer<float> ABuf(A);
sycl::buffer<float> CBuf(C);
sycl::buffer<bool> MBuf(M);

sycl::range<1> GlobalRange{Size};
// Number of workitems in each workgroup.
sycl::range<1> LocalRange{GroupSize};

sycl::nd_range<1> Range(GlobalRange, LocalRange);

try {
auto e = q.submit([&](handler &cgh) {
sycl::accessor A_acc{ABuf, cgh, sycl::read_only};
sycl::accessor C_acc{CBuf, cgh, sycl::write_only};
sycl::accessor M_acc{MBuf, cgh, sycl::read_only};
cgh.parallel_for(Range, [=](nd_item<1> ndi) {
sub_group sg = ndi.get_sub_group();
uint32_t wi_id = ndi.get_global_linear_id();
float res = invoke_simd(sg, SIMD_CALLEE, A_acc[wi_id], M_acc[wi_id]);
C_acc[wi_id] = res;
});
});
e.wait();
} catch (sycl::exception const &e) {

std::cout << "SYCL exception caught: " << e.what() << '\n';
return e.code().value();
}

int err_cnt = 0;
sycl::host_accessor A_acc(ABuf);
sycl::host_accessor C_acc(CBuf);

for (unsigned i = 0; i < Size; ++i) {
if ((i % 2 == 0) && A_acc[i] != C_acc[i]) {
if (++err_cnt < 10) {
std::cout << "failed at index " << i << ", " << C_acc[i]
<< " != " << A_acc[i] << "\n";
}
}
if ((i % 2 == 1) && C_acc[i] != 0.0f) {
if (++err_cnt < 10) {
std::cout << "failed at index " << i << ", " << C_acc[i] << " != 0\n";
}
}
}
if (err_cnt > 0) {
std::cout << " pass rate: "
<< ((float)(Size - err_cnt) / (float)Size) * 100.0f << "% ("
<< (Size - err_cnt) << "/" << Size << ")\n";
}

std::cout << (err_cnt > 0 ? "FAILED\n" : "Passed\n");
return err_cnt == 0;
}