-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Adds support for device UUID as a SYCL extension. #3696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
A new device descriptor will be added which will provide the device Universal Unique ID (UUID). | ||
|
||
This new device descriptor is only available for devices in the Level Zero platform, and the matching aspect is only true for those devices. The DPC++ default behavior is to expose GPU devices through the Level Zero platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted this to be Level-Zero only then we'd add this extension to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md. My understanding is that the intention is to have it supported at least with OpenCL too, so let's not have this description be too Level-Zero centric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will modify.
sycl/include/CL/sycl/detail/pi.h
Outdated
@@ -64,6 +65,8 @@ using pi_uint64 = uint64_t; | |||
using pi_bool = pi_uint32; | |||
using pi_bitfield = pi_uint64; | |||
using pi_native_handle = uintptr_t; | |||
using pi_byte_array = std::array<std::byte, 16>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that this is not a good idea to use STL in PI API. The goal was to keep it C-only such that there are no C++ compatibility issues. Let's instead have the first call to know the size, have users allocate a C-array, and have the second call to populate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address review comments
sycl/source/detail/device_impl.cpp
Outdated
} | ||
|
||
pi_uint8_ptr uuid = | ||
static_cast<pi_uint8_ptr>(malloc(return_size * sizeof(uint8_t))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end this memory is going to leak, isn't it?
Hence, with a long running application, RAM consumption is going to increase this tiny bit at a time.
I think, this should be mitigated somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for pointing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tagging @smaslov-intel @gmlueck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that we will change the DPC++ interface back to std::array
(see my other comment). In that case, this code here will be a memory leak, which is bad. If the DPC++ interface assumes that the UUID is always 16-bytes, then there is no point in trying to support an arbitrary size here. Therefore, you could just allocate a fixed 16-byte array to receive the results from the PI query. Something like:
assert(return_size <= 16);
std::byte buf[16];
/* read contents of "buf" from PI layer */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If UUID is always 16-bytes (which it is), then do we even need 2 calls here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since PI interface readily supports receiving the size then I'd still go with 2 calls. You can additionally assert here that the returned size is 16 since you'd need to truncate it to fit the array of 16-bytes returned in SYCL API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, could somebody please clarify the resolution here? Will return value of the SYCL device query be std::array<uint8_t, 16>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's std::array<std::byte, 16>
sycl/source/detail/device_info.hpp
Outdated
@@ -1045,45 +1045,53 @@ get_device_info_host<info::device::ext_intel_pci_address>() { | |||
PI_INVALID_DEVICE); | |||
} | |||
template <> | |||
inline cl_uint get_device_info_host<info::device::ext_intel_gpu_eu_count>() { | |||
inline pi_uint32 get_device_info_host<info::device::ext_intel_gpu_eu_count>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: It's better to move this kind of changes into a distinct patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
|
||
| Device Descriptors | Return Type | Description | | ||
| ------------------ | ----------- | ----------- | | ||
| info\:\:device\:\:ext\_intel\_device\_info\_uuid | uint8\_t \* | Returns the device UUID| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change the return type here? Returning std::array<std::byte, 16>
was much better, I think. With uint8_t *
, it's unclear whether the caller is supposed to deallocate the memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smaslov-intel pointed out that it's preferable to not use STLs to maintain C compatibility. Changing this does include the requirement for deallocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that @smaslov-intel was concerned about the PI interface being C compatible. This interface here is the SYCL / DPC++ interface, which is assumed to be C++. There are other existing device descriptors that return STL types (e.g. info::device::sub_group_sizes
returns an std::vector
), so I think it's OK if this descriptor returns an std::array
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmlueck is right, I only wanted PI interface to stay C-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, will fix it.
sycl/source/detail/device_impl.cpp
Outdated
@@ -284,6 +284,20 @@ bool device_impl::has(aspect Aspect) const { | |||
MDevice, PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE, | |||
sizeof(pi_device_type), &device_type, | |||
&return_size) == PI_SUCCESS; | |||
case aspect::ext_intel_device_info_uuid: { | |||
auto Result = getPlugin().call_nocheck<detail::PiApiKind::piDeviceGetInfo>( | |||
MDevice, PI_DEVICE_INFO_UUID, sizeof(pi_device_type), &device_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably I'm missing something, but won't this call write UUID values to the device_type
variable? If so, why?
Shouldn't 3rg argument be 0
and 4th nullptr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here trying to get the size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we should pass 0 and nullptr as 3rd and 4th arguments. Otherwise low level runtime will try to write N bytes to the memory pointed by &device_type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbegam Could you please address this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean sizeof(pi_device_type)
, &device_type
these two parameters? I mistakenly thought &device_type
and &return_size
. I think you are right. I am fixing this.
sycl/source/detail/device_impl.cpp
Outdated
} | ||
|
||
pi_uint8_ptr uuid = | ||
static_cast<pi_uint8_ptr>(malloc(return_size * sizeof(uint8_t))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, could somebody please clarify the resolution here? Will return value of the SYCL device query be std::array<uint8_t, 16>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. just few small comments.
ping @smaslov-intel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please add a test too.
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rehana begam <rehana.begam@intel.com>
bddc7bc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbegam, please, avoid forced pushed. It makes tracking conversations difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: rehana begam <rehana.begam@intel.com>
I still see a bunch of unaddressed comments. What are we going to do with them? |
@bader all those comments were discussed offline and resolved. I've marked them as so now. |
ping @gmlueck |
* upstream/sycl: (649 commits) [SYCL][Driver][NFC] Update integration footer test for 32-bit host (intel#4039) [SYCL][L0] Initialize descriptor .stype and .pNext (intel#4032) [SYCL] Add sycl::kernel::get_kernel_bundle method (intel#3855) [SYCL] Add support for device UUID as a SYCL extension. (intel#3696) [SYCL][Matrix] Add spec document for the matrix extension interface and its first implementation for AMX (intel#3551) Fix debug build mangler test after PR#3992 (8f38045). (intel#4033) [Driver][SYCL] Restrict user -include file in final integration footer step (intel#4036) [SYCL] [Tests] Do not copy device binary image mocks (intel#4023) [SYCL][Doc] Update docs to reflect new compiler features (intel#4030) [SYCL][CUDA] cl_khr_fp16 extension connected to cuda PI. (intel#4029) [SYCL][NFC] Refactor RT unit tests (intel#4021) [SYCL] Switch to using integration footer by default (intel#3777) [SYCL][CUDA] Add the Use Default Stream property (intel#4004) Uplift GPU RT version for Linux to 21.24.20098 (intel#4003) [SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) (intel#3950) [Driver][SYCL] Differentiate host dependency link from regular host link (intel#4002) [SYCL][ESIMD] Support device half type in intrinsics. (intel#4024) [SYCL] Allow fpga_reg only for PODs and Trivially-copyable structs (intel#3643) [SYCL][FPGA] Restore legacy debug info version for the hardware (intel#3991) [SYCL][PI][L0] Force reset of memcpy command-list. (intel#4001) ...
Hi, @rbegam I have a question. For the device UUID, is there any platform, whose backend is Thank you. |
This includes support for only level_zero. The
llvm/sycl/docs/extensions/IntelGPU/IntelGPUDeviceInfo.md doc has
been modified with a brief description of the extension. A new aspect
is added to indicate if the support is available.
Signed-off-by: rbegam rehana.begam@intel.com