-
Notifications
You must be signed in to change notification settings - Fork 769
[OpenCL] Add support for cslice partitioning #7963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenCL] Add support for cslice partitioning #7963
Conversation
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Marking it as a draft to allow for initial review. |
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Do you have a test for this already? |
sycl/plugins/opencl/pi_opencl.cpp
Outdated
cl_device_id parentId = nullptr; | ||
clGetDeviceInfo(cast<cl_device_id>(device), CL_DEVICE_PARENT_DEVICE, | ||
sizeof(cl_device_id), &parentId, NULL); | ||
if (parentId == nullptr) | ||
return 0; | ||
cl_device_id parentParentId = nullptr; | ||
clGetDeviceInfo(parentId, CL_DEVICE_PARENT_DEVICE, sizeof(cl_device_id), | ||
&parentParentId, NULL); | ||
if (parentParentId == nullptr) | ||
return 1; | ||
cl_device_id parentParentParentId = nullptr; | ||
clGetDeviceInfo(parentParentId, CL_DEVICE_PARENT_DEVICE, sizeof(cl_device_id), | ||
&parentParentParentId, NULL); | ||
if (parentParentParentId == nullptr) | ||
return 2; | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd rather see a generic implementation (i.e., while
loop) with an assert that the result is <=2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the levels to use an enum. So, I am leaving code as is (no while loop). Hope that's ok. Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
if (device->subLevel == -1) | ||
device->subLevel = getSubLevel(device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes little sense to me. Why don't we report an error immediately for the invalid level? Also, I've realized that I'd like to see a named const/constexpr variable for "-1".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, -1 (or INVALID in the upcoming patch) means that we have not yet queried the backend for the device level info. We will make a call to 'getLevel' only once.
sycl/plugins/opencl/pi_opencl.cpp
Outdated
clGetDeviceInfo( | ||
cast<cl_device_id>(device), CL_DEVICE_PARTITION_MAX_SUB_DEVICES, | ||
sizeof(partitionMaxSubDevices), &partitionMaxSubDevices, nullptr); | ||
} else if (device->isSubDevice()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What else can it be if it's not a root device (false at line 393)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be a sub-device or sub-sub device created using calls to CreateSubDevice by user
sycl/plugins/opencl/pi_opencl.cpp
Outdated
cast<cl_device_id>(device), CL_DEVICE_PARTITION_MAX_SUB_DEVICES, | ||
sizeof(partitionMaxSubDevices), &partitionMaxSubDevices, nullptr); | ||
} else if (device->isSubDevice()) { | ||
// find out number of CCSes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it's already a sub-sub-device ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it will just return 0 in out_num_devices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer to be more explicit about this. Something like
switch (level) {
case 2: return no_further_partitioning
case 0: return delegate_to_opencl
case 1: { our logic }
default:
assert/PI_INVALID_ERROR/etc.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified in upcoming patch. Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
device->subLevel = getSubLevel(device); | ||
if (device->isRootDevice()) { | ||
clGetDeviceInfo( | ||
cast<cl_device_id>(device), CL_DEVICE_PARTITION_MAX_SUB_DEVICES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast<cl_device_id>(device)
- I'm confused here. It looks like the handle is from the OpenCL RT and is opaque for us, yet we do device->subLevel
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handle (device) is a 'pi_device' handle. This way of passing the pi_device into OpenCL APIs seems like a standard way. I am not too sure about the 'internals' of OpenCL runtime. @bashbaug, can you please comment? Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
clGetDeviceInfo( | ||
cast<cl_device_id>(device), CL_DEVICE_QUEUE_FAMILY_PROPERTIES_INTEL, | ||
3*sizeof(cl_queue_family_properties_intel), qfprops, &qsize); | ||
qsize = qsize/sizeof(cl_queue_family_properties_intel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clang-format
ed, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran clang-format before submission. Which line do you think may be misformatted? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's specific clang-format
configuration for the OpenCL plugin. The rest of the codebase would have spaces around /
and no spaces after (
and before )
in the for
loop below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a manual change for these. Not sure how I can catch them otherwise. Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
cast<cl_device_id>(device), CL_DEVICE_QUEUE_FAMILY_PROPERTIES_INTEL, | ||
3*sizeof(cl_queue_family_properties_intel), qfprops, &qsize); | ||
qsize = qsize/sizeof(cl_queue_family_properties_intel); | ||
for ( size_t q = 0; q < qsize; q++ ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran clang-format before submission. Which line do you think may be misformatted? Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can employ early return above to reduce this nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will take a look. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changing to switch/case reduced the nesting level. Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
return ReturnValue(pi_device_partition_property{0}); | ||
} | ||
return ReturnHelper(PI_DEVICE_PARTITION_BY_AFFINITY_DOMAIN); | ||
} else if (device->isSubDevice()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for else
as the above if
early-returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sycl/plugins/opencl/pi_opencl.cpp
Outdated
} | ||
for (uint32_t i = 0; i < *out_num_devices; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't it be a single loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be. Just an incremental add I think. I will fuse them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fused. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misclicked in the previous review accidentally selecting "approved". Hopefully this will reset it.
Got it. Thanks so much for the feedback Andrei. Will make changes soon. |
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
I will add a test very soon. Thanks |
sycl/plugins/opencl/pi_opencl.cpp
Outdated
pi_uint32 numSubDevices = 0; | ||
if (device->level == _pi_device::INVALID) // level not yet updated | ||
device->level = getLevel(device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this lazy? Can we update when device is created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually do update this when device is created. So, getLevel will mostly not be called here. I am just using this as a fail-safe. But, thinking of it, I am not able to formulate a scenario where device->level can be invalid once it has been created. I can take out this code if needed. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have an assert here then. The laziness complicates the code unnecessarily.
sycl/plugins/opencl/pi_opencl.cpp
Outdated
if (device->level == _pi_device::INVALID) // level not updated yet | ||
device->level = getLevel(device); | ||
// For root-device there is no partitioning to report. | ||
if (device->isRootDevice()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we should either drop lines 468/469 (getLevel
) or convert if
s into a switch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah..good catch. I will drop lines 468/469.
sycl/plugins/opencl/pi_opencl.cpp
Outdated
return ReturnHelper(PI_DEVICE_PARTITION_BY_AFFINITY_DOMAIN); | ||
} | ||
case _pi_device::SUBDEVICE: { | ||
// find out number of CCSes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// find out number of CCSes | |
// Find out number of CCSes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be done in next patch. Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
std::cout | ||
<< "This device does not support cl_intel_command_queue_families" | ||
<< std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like those debug prints need to be removed/guarded by debug macros.
sycl/plugins/opencl/pi_opencl.cpp
Outdated
qsize = qsize / sizeof(cl_queue_family_properties_intel); | ||
for (size_t q = 0; q < qsize; q++) { | ||
if (qfprops[q].capabilities == CL_QUEUE_DEFAULT_CAPABILITIES_INTEL && | ||
qfprops[q].count > numSubDevices) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this condition mean? Can we have numSubDevices
not equal to zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covering a scenario where multiple families have 'DEFAULT' capabilities and we are trying to get the handle to the family with maximum number of dub-devices.
sycl/plugins/opencl/pi_opencl.cpp
Outdated
} | ||
// Absorb the CL_DEVICE_NOT_FOUND and just return 0 in out_num_devices | ||
if (result == CL_DEVICE_NOT_FOUND) { | ||
std::cout << "Device not found\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More debug prints that need a cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. removed it. Thanks
sycl/plugins/opencl/pi_opencl.cpp
Outdated
@@ -381,6 +583,9 @@ pi_result piDevicesGet(pi_platform platform, pi_device_type device_type, | |||
cast<cl_uint>(num_entries), cast<cl_device_id *>(devices), | |||
cast<cl_uint *>(num_devices)); | |||
|
|||
for (pi_uint32 i = 0; i < num_entries; ++i) { | |||
devices[i]->level = _pi_device::ROOTDEVICE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this in place I'm really surprised we need lazy initialization for this ->level
in the code earlier in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. However, earlier in the code, we will actually not call the getLevel function as we are already setting the level here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have actually removed getLevel function. We can populate the level info at the time of device creation.
sycl/plugins/opencl/pi_opencl.hpp
Outdated
// A single-threaded app has an opportunity to enable this mode to avoid | ||
// overhead from mutex locking. Default value is 0 which means that single | ||
// thread mode is disabled. | ||
static const bool SingleThreadMode = [] { | ||
const char *Ret = std::getenv("SYCL_PI_OPENCL_SINGLE_THREAD_MODE"); | ||
const bool RetVal = Ret ? std::stoi(Ret) : 0; | ||
return RetVal; | ||
}(); | ||
|
||
// Class which acts like shared_mutex if SingleThreadMode variable is not set. | ||
// If SingleThreadMode variable is set then mutex operations are turned into | ||
// nop. | ||
class pi_shared_mutex_ocl { | ||
std::shared_mutex Mutex; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this brand new or copy-pasted from level zero plugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the copy-paste code and added the header file from unified runtime.
sycl/plugins/opencl/pi_opencl.hpp
Outdated
ROOTDEVICE = 0, | ||
SUBDEVICE = 1, | ||
SUBSUBDEVICE = 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do some magic/arithmetic inside getLevel
in the code at the top of the PR's diff and then just cast it back to the enum. Not insisting though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you are referring to nextLevel? I am not sure how getLevel can be improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both, actually. We can calculate the depth of the device hierarchy in a loop and then cast it to this enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would advise to keep the code as is for better readability. Hope that's ok.
sycl/plugins/opencl/pi_opencl.hpp
Outdated
_pi_device(pi_platform Plt) : Platform{Plt} { | ||
level = INVALID; | ||
family = index = 0; | ||
// NOTE: one must additionally call initialize() to complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see _pi_device::initialize
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah..the comment needs to be deleted.
I'm not sure how I feel about directly copy-pasting level zero plugin's infrastructure here. Adding @smaslov-intel for his thoughts. |
sycl/plugins/opencl/pi_utils.hpp
Outdated
@@ -0,0 +1,79 @@ | |||
//===--------- pi_utils.hpp - Plugin Utility Functions -------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pi_utils.hpp will be removed. This code is available from unified runtime. Thanks.
sycl/plugins/opencl/pi_opencl.hpp
Outdated
pi_uint32 index; // SYCL queue index inside a given family of queues | ||
|
||
bool isRootDevice(void) { return level == ROOTDEVICE; } | ||
bool isSubDevice(void) { return level == SUBDEVICE; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been more than a week since my review, so I've forgotten some amount of context, but looking into the conversation at #7963 (comment) I'm realizing that I don't particularly like this interface.
sub-sub-device->isSubDevice
is ambiguous in its meaning. It is a "sub-device" of a "sub-device", after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I agree. I was not able to think of any clearer/easier way to represent this. May be a comment can help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed isSubSubDevice to isCCS.
Just added for review. |
/verify with intel/llvm-test-suite#1530 |
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
@@ -1747,8 +2116,20 @@ pi_result piextKernelGetNativeHandle(pi_kernel kernel, | |||
// This API is called by Sycl RT to notify the end of the plugin lifetime. | |||
// TODO: add a global variable lifetime management code here (see | |||
// pi_level_zero.cpp for reference) Currently this is just a NOOP. | |||
// We clear all the 'map' variables here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the code to clean up all the maps that we are populating in the plugin. I also delete the pi_device handles in the cslice_devices list if they have not been deleted yet.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah..the code cleanup needs to be done in a different way. I will address this in the next PR. Thanks
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are additional enhancements we should make but I'm fine adding them in a subsequent PR. Thanks!
paramValueSize, paramValue, paramValueSizeRet); | ||
return static_cast<pi_result>(result); | ||
} else { | ||
// For root-device there is no partitioning to report. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much similarities between the plugins in this code actually, but if Ben is fine with postponing the cleanup, so am I.
@@ -815,11 +1087,16 @@ pi_result piContextCreate(const pi_context_properties *properties, | |||
size_t cb, void *user_data1), | |||
void *user_data, pi_context *retcontext) { | |||
pi_result ret = PI_ERROR_INVALID_OPERATION; | |||
std::vector<cl_device_id> cl_devices = getClDevices(num_devices, devices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to have duplicates here? Although I think the helper itself has to be changed, if it's not ok.
std::vector<pi_device> device_list_vec(num_devices); | ||
for (size_t i = 0; i < num_devices; ++i) | ||
device_list_vec[i] = devices[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think
template< class InputIt >
vector( InputIt first, InputIt last,
const Allocator& alloc = Allocator() );
ctor should work just fine here.
std::vector<pi_device> device_list_vec(num_devices); | ||
for (size_t i = 0; i < num_devices; ++i) | ||
device_list_vec[i] = devices[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Likewise.
if (context2devlist.find(context) != context2devlist.end()) { | ||
auto devlist = context2devlist[context]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
if (context2devlist.find(context) != context2devlist.end()) { | |
auto devlist = context2devlist[context]; | |
if (auto it = context2devlist.find(context); it != context2devlist.end()) { | |
auto devlist = *it; |
Hi @againull Can you please help to merge this? latest comments will be addressed in an upcoming PR. Thanks |
@asudarsa this patch fails post-commit on Windows and macOS. Can you please fix this ASAP? https://github.com/intel/llvm/actions/runs/4116506385/jobs/7106640407 @bader FYI, revert might be needed in case these failures came from unified runtime - to return post-commit to (partially) green state as soon as possible. |
This reverts commit dcbd6e5.
Hi Dmitry, Some of the fails seem unrelated to my change. The only issue that seems to be related is 'zer_api.h' not being found. I am not sure why that issue does not occur in Linux. Can you please verify, if possible, if the post-commit builds and tests pass without my change, before actually reverting it? Thanks |
@asudarsa, these post-commit steps were clean before your commit and stably fail after it. I also looked at the error message and it points to the code modified by this patch. |
@asudarsa, do you have any ETA for resolving post-commit test issues? |
This PR is also resulting in a failure in llvm-test-suite/SYCL/Plugin/interop-opencl.cpp in the ACC/FPGA case. Our GHA testing doesn't include an ACC case so this was missed. Running llvm-test-suite would have detected both that failure and the windows building one. |
Okay, I think we should revert the change and do more testing locally. |
Recently, an extension was added here: #7513
This extension allows partitioning a device by "cslice" (aka CCS-es).
With this change, on PVC sub-sub-devices now require info::partition_property::ext_intel_partition_by_cslice instead of info::partition_property::partition_by_affinity_domain that wasn't quite accurately describing the actual scheme.
Level Zero backend support was added here: #7626
In this change, we are adding support in OpenCL plugin.
This change makes use of the cl_intel_command_queue_families recently added.
Thanks
Signed-off-by: Arvind Sudarsanam arvind.sudarsanam@intel.com