Skip to content

[SYCL] Adds support for device UUID as a SYCL extension. #3696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 2, 2021
Merged

Conversation

rbegam
Copy link
Contributor

@rbegam rbegam commented May 5, 2021

This includes support for only level_zero. The
llvm/sycl/docs/extensions/IntelGPU/IntelGPUDeviceInfo.md doc has
been modified with a brief description of the extension. A new aspect
is added to indicate if the support is available.

Signed-off-by: rbegam rehana.begam@intel.com

@rbegam rbegam requested review from smaslov-intel and a team as code owners May 5, 2021 21:53

A new device descriptor will be added which will provide the device Universal Unique ID (UUID).

This new device descriptor is only available for devices in the Level Zero platform, and the matching aspect is only true for those devices. The DPC++ default behavior is to expose GPU devices through the Level Zero platform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted this to be Level-Zero only then we'd add this extension to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md. My understanding is that the intention is to have it supported at least with OpenCL too, so let's not have this description be too Level-Zero centric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will modify.

@@ -64,6 +65,8 @@ using pi_uint64 = uint64_t;
using pi_bool = pi_uint32;
using pi_bitfield = pi_uint64;
using pi_native_handle = uintptr_t;
using pi_byte_array = std::array<std::byte, 16>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that this is not a good idea to use STL in PI API. The goal was to keep it C-only such that there are no C++ compatibility issues. Let's instead have the first call to know the size, have users allocate a C-array, and have the second call to populate it.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address review comments

}

pi_uint8_ptr uuid =
static_cast<pi_uint8_ptr>(malloc(return_size * sizeof(uint8_t)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end this memory is going to leak, isn't it?
Hence, with a long running application, RAM consumption is going to increase this tiny bit at a time.
I think, this should be mitigated somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that we will change the DPC++ interface back to std::array (see my other comment). In that case, this code here will be a memory leak, which is bad. If the DPC++ interface assumes that the UUID is always 16-bytes, then there is no point in trying to support an arbitrary size here. Therefore, you could just allocate a fixed 16-byte array to receive the results from the PI query. Something like:

assert(return_size <= 16);
std::byte buf[16];
/* read contents of "buf" from PI layer */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If UUID is always 16-bytes (which it is), then do we even need 2 calls here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since PI interface readily supports receiving the size then I'd still go with 2 calls. You can additionally assert here that the returned size is 16 since you'd need to truncate it to fit the array of 16-bytes returned in SYCL API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could somebody please clarify the resolution here? Will return value of the SYCL device query be std::array<uint8_t, 16>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's std::array<std::byte, 16>

@@ -1045,45 +1045,53 @@ get_device_info_host<info::device::ext_intel_pci_address>() {
PI_INVALID_DEVICE);
}
template <>
inline cl_uint get_device_info_host<info::device::ext_intel_gpu_eu_count>() {
inline pi_uint32 get_device_info_host<info::device::ext_intel_gpu_eu_count>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: It's better to move this kind of changes into a distinct patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.


| Device Descriptors | Return Type | Description |
| ------------------ | ----------- | ----------- |
| info\:\:device\:\:ext\_intel\_device\_info\_uuid | uint8\_t \* | Returns the device UUID|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change the return type here? Returning std::array<std::byte, 16> was much better, I think. With uint8_t *, it's unclear whether the caller is supposed to deallocate the memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel pointed out that it's preferable to not use STLs to maintain C compatibility. Changing this does include the requirement for deallocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that @smaslov-intel was concerned about the PI interface being C compatible. This interface here is the SYCL / DPC++ interface, which is assumed to be C++. There are other existing device descriptors that return STL types (e.g. info::device::sub_group_sizes returns an std::vector), so I think it's OK if this descriptor returns an std::array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck is right, I only wanted PI interface to stay C-only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, will fix it.

@@ -284,6 +284,20 @@ bool device_impl::has(aspect Aspect) const {
MDevice, PI_DEVICE_INFO_GPU_EU_COUNT_PER_SUBSLICE,
sizeof(pi_device_type), &device_type,
&return_size) == PI_SUCCESS;
case aspect::ext_intel_device_info_uuid: {
auto Result = getPlugin().call_nocheck<detail::PiApiKind::piDeviceGetInfo>(
MDevice, PI_DEVICE_INFO_UUID, sizeof(pi_device_type), &device_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I'm missing something, but won't this call write UUID values to the device_type variable? If so, why?
Shouldn't 3rg argument be 0 and 4th nullptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here trying to get the size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we should pass 0 and nullptr as 3rd and 4th arguments. Otherwise low level runtime will try to write N bytes to the memory pointed by &device_type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbegam Could you please address this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean sizeof(pi_device_type), &device_type these two parameters? I mistakenly thought &device_type and &return_size. I think you are right. I am fixing this.

}

pi_uint8_ptr uuid =
static_cast<pi_uint8_ptr>(malloc(return_size * sizeof(uint8_t)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could somebody please clarify the resolution here? Will return value of the SYCL device query be std::array<uint8_t, 16>?

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. just few small comments.

@rbegam
Copy link
Contributor Author

rbegam commented May 25, 2021

ping @smaslov-intel

smaslov-intel
smaslov-intel previously approved these changes May 25, 2021
Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please add a test too.

rbegam added 11 commits June 28, 2021 12:56
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rehana begam <rehana.begam@intel.com>
bader
bader previously approved these changes Jun 29, 2021
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbegam, please, avoid forced pushed. It makes tracking conversations difficult.

romanovvlad
romanovvlad previously approved these changes Jun 30, 2021
Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: rehana begam <rehana.begam@intel.com>
@rbegam rbegam dismissed stale reviews from romanovvlad and bader via 22fa8f6 June 30, 2021 22:20
@bader
Copy link
Contributor

bader commented Jul 1, 2021

I still see a bunch of unaddressed comments. What are we going to do with them?

@rbegam
Copy link
Contributor Author

rbegam commented Jul 1, 2021

@bader all those comments were discussed offline and resolved. I've marked them as so now.

@rbegam
Copy link
Contributor Author

rbegam commented Jul 1, 2021

ping @gmlueck

@bader bader merged commit 25aee28 into intel:sycl Jul 2, 2021
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Jul 2, 2021
* upstream/sycl: (649 commits)
  [SYCL][Driver][NFC] Update integration footer test for 32-bit host (intel#4039)
  [SYCL][L0] Initialize descriptor .stype and .pNext (intel#4032)
  [SYCL] Add sycl::kernel::get_kernel_bundle method (intel#3855)
  [SYCL] Add support for device UUID as a SYCL extension. (intel#3696)
  [SYCL][Matrix] Add spec document for the matrix extension interface and its first implementation for AMX (intel#3551)
  Fix debug build mangler test after PR#3992 (8f38045). (intel#4033)
  [Driver][SYCL] Restrict user -include file in final integration footer step (intel#4036)
  [SYCL] [Tests] Do not copy device binary image mocks (intel#4023)
  [SYCL][Doc] Update docs to reflect new compiler features (intel#4030)
  [SYCL][CUDA] cl_khr_fp16 extension connected to cuda PI. (intel#4029)
  [SYCL][NFC] Refactor RT unit tests (intel#4021)
  [SYCL] Switch to using integration footer by default (intel#3777)
  [SYCL][CUDA] Add the Use Default Stream property (intel#4004)
  Uplift GPU RT version for Linux to 21.24.20098 (intel#4003)
  [SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) (intel#3950)
  [Driver][SYCL] Differentiate host dependency link from regular host link (intel#4002)
  [SYCL][ESIMD] Support device half type in intrinsics. (intel#4024)
  [SYCL] Allow fpga_reg only for PODs and Trivially-copyable structs (intel#3643)
  [SYCL][FPGA] Restore legacy debug info version for the hardware (intel#3991)
  [SYCL][PI][L0] Force reset of memcpy command-list. (intel#4001)
  ...
@zejun-chen
Copy link

zejun-chen commented Jul 5, 2024

Hi, @rbegam

I have a question. For the device UUID, is there any platform, whose backend is sycl::backend::ext_oneapi_level_zero, but doesn't support device UUID? Or can i say all sycl devices support the device UUID.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants