Skip to content

[SYCL] Improve parallel_for error handling #2117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 6, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 25 additions & 15 deletions sycl/source/detail/error_handling/enqueue_kernel.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,17 +49,22 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
}
}

size_t VerSize = 0;
Plugin.call<PiApiKind::piDeviceGetInfo>(Device, PI_DEVICE_INFO_VERSION, 0,
nullptr, &VerSize);
// Some of the error handling below is special for particular OpenCL
// versions. If this is an OpenCL backend, get the version.
const char *OpenClVer = nullptr;
string_class OpenClVerStr;
if (Platform.get_backend() == cl::sycl::backend::opencl) {
assert(VerSize >= 10 &&
size_t OclVerSize = 0;
Plugin.call<PiApiKind::piDeviceGetInfo>(Device, PI_DEVICE_INFO_VERSION, 0,
nullptr, &OclVerSize);
assert(OclVerSize >= 10 &&
"Unexpected device version string"); // strlen("OpenCL X.Y")
OpenClVerStr.assign(OclVerSize, '\0');
Plugin.call<PiApiKind::piDeviceGetInfo>(Device, PI_DEVICE_INFO_VERSION,
OclVerSize, &OpenClVerStr.front(),
nullptr);
OpenClVer = &OpenClVerStr[7]; // strlen("OpenCL ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First this comment is unclear since we do not know that it refers to 7.
OpenClVerStr.assign(OclVerSize, '\0');: my eyes are bleeding... :-)
I wonder whether it should be OclVerSize + 1.
And why a std::string at the first place instead of a local array?
Perhaps this kind of information should be cached in the queue or at least the device?
Perhaps the PI interface could be user-friendly?
While writing assembly code, OpenClVer = OpenClVerStr + 7; is simpler.
More seriously this is a motivation to move to more modern std::string_view in the future...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are good comments. I changed the OpenCL version checking here and in the other file to use the string class in a more normal way. I don't think we want to cache this information as that would spread OpenCL knowledge to even more parts of the code. Also, this is an error handling path, so performance isn't so critical.

P.S.: Sorry about your eyes. :-)

}
string_class VerStr(VerSize, '\0');
Plugin.call<PiApiKind::piDeviceGetInfo>(Device, PI_DEVICE_INFO_VERSION,
VerSize, &VerStr.front(), nullptr);
const char *Ver = &VerStr[7]; // strlen("OpenCL ")

size_t CompileWGSize[3] = {0};
Plugin.call<PiApiKind::piKernelGetGroupInfo>(
Expand All @@ -72,13 +77,14 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
// reqd_work_group_size attribute is used to declare the work-group size
// for kernel in the program source.
if (Platform.get_backend() == cl::sycl::backend::opencl) {
if (!HasLocalSize && (Ver[0] == '1' || (Ver[0] == '2' && Ver[2] == '0')))
if (!HasLocalSize && (OpenClVer[0] == '1' ||
(OpenClVer[0] == '2' && OpenClVer[2] == '0'))) {
throw sycl::nd_range_error(
"OpenCL 1.x and 2.0 requires to pass local size argument even if "
"required work-group size was specified in the program source",
PI_INVALID_WORK_GROUP_SIZE);
}
}
// Any OpenCL version:
// PI_INVALID_WORK_GROUP_SIZE if local_work_size is specified and does not
// match the required work-group size for kernel in the program source.
if (NDRDesc.LocalSize[0] != CompileWGSize[0] ||
Expand All @@ -90,11 +96,11 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
PI_INVALID_WORK_GROUP_SIZE);
}
if (Platform.get_backend() == cl::sycl::backend::opencl) {
if (Ver[0] == '1') {
if (OpenClVer[0] == '1') {
// OpenCL 1.x:
// PI_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the
// total number of work-items in the work-group computed as
// local_work_size[0] * ... * local_work_size[work_dim 1] is greater
// local_work_size[0] * ... * local_work_size[work_dim - 1] is greater
// than the value specified by PI_DEVICE_MAX_WORK_GROUP_SIZE in
// table 4.3
size_t MaxWGSize = 0;
Expand All @@ -109,10 +115,10 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
std::to_string(MaxWGSize),
PI_INVALID_WORK_GROUP_SIZE);
} else {
// RELEVENT // OpenCL 2.x:
// OpenCL 2.x:
// PI_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the
// total number of work-items in the work-group computed as
// local_work_size[0] * ... * local_work_size[work_dim 1] is greater
// local_work_size[0] * ... * local_work_size[work_dim - 1] is greater
// than the value specified by PI_KERNEL_GROUP_INFO_WORK_GROUP_SIZE in
// table 5.21.
size_t KernelWGSize = 0;
Expand All @@ -127,6 +133,8 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
std::to_string(KernelWGSize) + " for this kernel",
PI_INVALID_WORK_GROUP_SIZE);
}
} else {
// TODO: Should probably have something similar for the other backends
}

if (HasLocalSize) {
Expand All @@ -147,7 +155,7 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
NDRDesc.LocalSize[2] > NDRDesc.GlobalSize[2]);

if (NonUniformWGs) {
if (Ver[0] == '1') {
if (OpenClVer[0] == '1') {
// OpenCL 1.x:
// PI_INVALID_WORK_GROUP_SIZE if local_work_size is specified and
// number of workitems specified by global_work_size is not evenly
Expand Down Expand Up @@ -212,6 +220,8 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
// else unknown. fallback (below)
}
}
} else {
// TODO: Decide what checks (if any) we need for the other backends
}
throw sycl::nd_range_error(
"Non-uniform work-groups are not supported by the target device",
Expand Down
74 changes: 58 additions & 16 deletions sycl/test/basic_tests/parallel_for_range.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
// XFAIL: cuda || level0
// CUDA exposes broken hierarchical parallelism.
// LEVEL0 crashes with many of the negative tests.

// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
Expand Down Expand Up @@ -28,6 +29,16 @@ int main() {

string_class DeviceVendorName = D.get_info<info::device::vendor>();
auto DeviceType = D.get_info<info::device::device_type>();
bool IsOpenCL = (D.get_platform().get_backend() == backend::opencl);

string_class OCLVersionStr;
const char *OCLVersion = nullptr;
if (IsOpenCL) {
OCLVersionStr = D.get_info<info::device::version>();
assert(OCLVersionStr.size() >= 10 &&
"Unexpected device version string"); // strlen("OpenCL X.Y")
OCLVersion = &OCLVersionStr[7]; // strlen("OpenCL ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this comment

}

// parallel_for, (16, 16, 16) global, (8, 8, 8) local, reqd_wg_size(4, 4, 4)
// -> fail
Expand Down Expand Up @@ -61,41 +72,70 @@ int main() {
return 1;
}

string_class OCLVersionStr = D.get_info<info::device::version>();
assert(OCLVersionStr.size() >= 10 &&
"Unexpected device version string"); // strlen("OpenCL X.Y")
const char *OCLVersion = &OCLVersionStr[7]; // strlen("OpenCL ")
if (OCLVersion[0] == '1' || (OCLVersion[0] == '2' && OCLVersion[2] == '0')) {
if (IsOpenCL && (OCLVersion[0] == '1' ||
(OCLVersion[0] == '2' && OCLVersion[2] == '0'))) {
// OpenCL 1.x or 2.0
// parallel_for, (16, 16, 16) global, null local, reqd_wg_size(4, 4, 4) //
// -> fail
try {
Q.submit([&](handler &CGH) {
CGH.parallel_for<class ReqdWGSizeNegativeB>(
CGH.parallel_for<class ReqdWGSizeNoLocalNegative>(
range<3>(16, 16, 16), [=](item<3>) { reqd_wg_size_helper(); });
});
Q.wait_and_throw();
std::cerr
<< "Test case ReqdWGSizeNegativeB failed: no exception has been "
"thrown\n";
<< "Test case ReqdWGSizeNoLocalNegative failed: no exception has "
"been thrown\n";
return 1; // We shouldn't be here, exception is expected
} catch (nd_range_error &E) {
if (string_class(E.what()).find(
"OpenCL 1.x and 2.0 requires to pass local size argument even if "
"required work-group size was specified in the program source") ==
string_class::npos) {
std::cerr
<< "Test case ReqdWGSizeNegativeB failed: unexpected exception: "
<< "Test case ReqdWGSizeNoLocalNegative failed: unexpected "
"nd_range_error exception: "
<< E.what() << std::endl;
return 1;
}
} catch (runtime_error &E) {
std::cerr
<< "Test case ReqdWGSizeNegativeB failed: unexpected exception: "
<< "Test case ReqdWGSizeNoLocalNegative failed: unexpected "
"runtime_error exception: "
<< E.what() << std::endl;
return 1;
} catch (...) {
std::cerr << "Test case ReqdWGSizeNegativeB failed: something unexpected "
"has been caught"
std::cerr << "Test case ReqdWGSizeNoLocalNegative failed: something "
"unexpected has been caught"
<< std::endl;
return 1;
}
} else if (IsOpenCL) {
// TODO: The behavior when OpenCL > 2.0 needs to be investigated. This
// seems to fail differently than the case when OpenCL is 1.x or 2.0.
} else {
// Backends other than OpenCL
// parallel_for, (16, 16, 16) global, null local, reqd_wg_size(4, 4, 4)
// -> pass
try {
Q.submit([&](handler &CGH) {
CGH.parallel_for<class ReqdWGSizeNoLocalPositive>(
range<3>(16, 16, 16), [=](item<3>) { reqd_wg_size_helper(); });
});
Q.wait_and_throw();
} catch (nd_range_error &E) {
std::cerr << "Test case ReqdWGSizeNoLocalPositive failed: unexpected "
"nd_range_error exception: "
<< E.what() << std::endl;
return 1;
} catch (runtime_error &E) {
std::cerr << "Test case ReqdWGSizeNoLocalPositive failed: unexpected "
"runtime_error exception: "
<< E.what() << std::endl;
return 1;
} catch (...) {
std::cerr << "Test case ReqdWGSizeNoLocalPositive failed: something "
"unexpected has been caught"
<< std::endl;
return 1;
}
Expand Down Expand Up @@ -127,7 +167,7 @@ int main() {
return 1;
}

if (OCLVersion[0] == '1') {
if (IsOpenCL && OCLVersion[0] == '1') {
// OpenCL 1.x

// CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and
Expand Down Expand Up @@ -283,7 +323,7 @@ int main() {

// CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the
// total number of work-items in the work-group computed as
// local_work_size[0] * ... * local_work_size[work_dim 1] is greater
// local_work_size[0] * ... * local_work_size[work_dim - 1] is greater
// than the value specified by CL_DEVICE_MAX_WORK_GROUP_SIZE in
// table 4.3
size_t MaxDeviceWGSize = D.get_info<info::device::max_work_group_size>();
Expand Down Expand Up @@ -317,13 +357,13 @@ int main() {
<< std::endl;
return 1;
}
} else if (OCLVersion[0] == '2') {
} else if (IsOpenCL && OCLVersion[0] == '2') {
// OpenCL 2.x

// OpenCL 2.x:
// CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the
// total number of work-items in the work-group computed as
// local_work_size[0] * ... * local_work_size[work_dim 1] is greater
// local_work_size[0] * ... * local_work_size[work_dim - 1] is greater
// than the value specified by CL_KERNEL_WORK_GROUP_SIZE in table 5.21.
{
program P(Q.get_context());
Expand Down Expand Up @@ -800,6 +840,8 @@ int main() {
return 1;
}
}
} else {
// TODO: Add tests for other backends
}

// local size has a 0-based range -- no SIGFPEs, we hope
Expand Down
110 changes: 60 additions & 50 deletions sycl/test/basic_tests/reqd_work_group_size.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
// XFAIL: cuda || opencl
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
Expand All @@ -20,44 +19,9 @@ int main() {
queue Q(AsyncHandler);
device D(Q.get_device());

string_class DeviceVendorName = D.get_info<info::device::vendor>();
auto DeviceType = D.get_info<info::device::device_type>();
bool IsOpenCL = (D.get_platform().get_backend() == backend::opencl);

// parallel_for, (16, 16, 16) global, (8, 8, 8) local, reqd_wg_size(4, 4, 4)
// -> fail
try {
Q.submit([&](handler &CGH) {
CGH.parallel_for<class ReqdWGSizeNegativeA>(
nd_range<3>(range<3>(16, 16, 16), range<3>(8, 8, 8)),
[=](nd_item<3>) [[intel::reqd_work_group_size(4, 4, 4)]]{

});
});
Q.wait_and_throw();
std::cerr << "Test case ReqdWGSizeNegativeA failed: no exception has been "
"thrown\n";
return 1; // We shouldn't be here, exception is expected
} catch (nd_range_error &E) {
if (string_class(E.what()).find(
"Specified local size doesn't match the required work-group size "
"specified in the program source") == string_class::npos) {
std::cerr
<< "Test case ReqdWGSizeNegativeA failed 1: unexpected exception: "
<< E.what() << std::endl;
return 1;
}
} catch (runtime_error &E) {
std::cerr << "Test case ReqdWGSizeNegativeA failed 2: unexpected exception: "
<< E.what() << std::endl;
return 1;
} catch (...) {
std::cerr << "Test case ReqdWGSizeNegativeA failed: something unexpected "
"has been caught"
<< std::endl;
return 1;
}

// Positive test-cases that should pass on any underlying OpenCL runtime
// Positive test case: Specify local size that matches required size.
// parallel_for, (8, 8, 8) global, (4, 4, 4) local, reqd_wg_size(4, 4, 4) ->
// pass
try {
Expand All @@ -68,11 +32,13 @@ int main() {
});
Q.wait_and_throw();
} catch (nd_range_error &E) {
std::cerr << "Test case ReqdWGSizePositiveA failed: unexpected exception: "
std::cerr << "Test case ReqdWGSizePositiveA failed: unexpected "
"nd_range_error exception: "
<< E.what() << std::endl;
return 1;
} catch (runtime_error &E) {
std::cerr << "Test case ReqdWGSizePositiveA failed: unexpected exception: "
std::cerr << "Test case ReqdWGSizePositiveA failed: unexpected "
"runtime_error exception: "
<< E.what() << std::endl;
return 1;
} catch (...) {
Expand All @@ -82,24 +48,68 @@ int main() {
return 1;
}

// Kernel that has a required WG size, but no local size is specified.
//
// TODO: This fails on OpenCL and should be investigated.
if (!IsOpenCL) {
try {
Q.submit([&](handler &CGH) {
CGH.parallel_for<class ReqdWGSizeNoLocalPositive>(
range<3>(16, 16, 16),
[=](item<3>) [[intel::reqd_work_group_size(4, 4, 4)]]{});
});
Q.wait_and_throw();
} catch (nd_range_error &E) {
std::cerr << "Test case ReqdWGSizeNoLocalPositive failed: unexpected "
"nd_range_error exception: "
<< E.what() << std::endl;
return 1;
} catch (runtime_error &E) {
std::cerr
<< "Test case ReqdWGSizeNoLocalPositive: unexpected runtime_error "
"exception: "
<< E.what() << std::endl;
return 1;
} catch (...) {
std::cerr << "Test case ReqdWGSizeNoLocalPositive failed: something "
"unexpected has been caught"
<< std::endl;
return 1;
}
}

// Negative test case: Specify local size that does not match required size.
// parallel_for, (16, 16, 16) global, (8, 8, 8) local, reqd_wg_size(4, 4, 4)
// -> fail
try {
Q.submit([&](handler &CGH) {
CGH.parallel_for<class ReqdWGSizePositiveB>(
range<3>(16, 16, 16), [=](item<3>) [[intel::reqd_work_group_size(4, 4, 4)]]{});
CGH.parallel_for<class ReqdWGSizeNegativeA>(
nd_range<3>(range<3>(16, 16, 16), range<3>(8, 8, 8)),
[=](nd_item<3>) [[intel::reqd_work_group_size(4, 4, 4)]]{

});
});
Q.wait_and_throw();

std::cerr << "Test case ReqdWGSizeNegativeA failed: no exception has been "
"thrown\n";
return 1; // We shouldn't be here, exception is expected
} catch (nd_range_error &E) {
std::cerr << "Test case ReqdWGSizePositiveB failed 1: unexpected exception: "
<< E.what() << std::endl;
return 1;
if (string_class(E.what()).find(
"Specified local size doesn't match the required work-group size "
"specified in the program source") == string_class::npos) {
std::cerr
<< "Test case ReqdWGSizeNegativeA failed: unexpected nd_range_error "
"exception: "
<< E.what() << std::endl;
return 1;
}
} catch (runtime_error &E) {
std::cerr
<< "Test case ReqdWGSizePositiveB failed 2: unexpected exception: "
<< E.what() << std::endl;
std::cerr << "Test case ReqdWGSizeNegativeA failed: unexpected "
"nd_range_error exception: "
<< E.what() << std::endl;
return 1;
} catch (...) {
std::cerr << "Test case ReqdWGSizePositiveB failed: something unexpected "
std::cerr << "Test case ReqdWGSizeNegativeA failed: something unexpected "
"has been caught"
<< std::endl;
return 1;
Expand Down