[SYCL] Redistribute USM aspects among CUDA devices #18782

frasercrmck · 2025-06-03T11:15:19Z

We were previously reporting all USM aspects as supported on all CUDA devices. This is incorrect behaviour as many devices do not support USM system allocations, nor atomic host/shared USM allocations.

Unfortunately it is very difficult to get a conclusive list of which devices support which features.

Links such as [1] suggest that pageable memory access (which the UR adapter uses to determine the runtime equivalents of these aspects) is limited to Grace Hopper devices or newer, or to Linux systems with HMM enabled. Whether HMM is enabled is not something we can currently determine at compile time for these aspects. This change is therefore conservative for older devices (SM6.X) with HMM enabled, where we will now report "false".

For atomic host/shared allocations, the documentation on the 'hostNativeAtomicSupported' property at [1] and [2] suggests that we need a hardware-coherent system, for which [3] suggests we again need at least a Grace Hopper device. However, note again that only "some" hardware-coherent systems support host native atomics, "including" NVLink-connected devices. This is therefore not an exhaustive list and we can't derive anything conclusive from it. This change might again be conservative for architectures older than Grace Hopper.

In short, this PR essentially just punts the problem slightly further down the road and prevents these three USM aspects from being reported as supported for SM89 devices and earlier.
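The rule stated in the summary can be sketched as a small C++ function. This is only an illustration of the described behaviour, not the actual DeviceConfigFile change from this PR: the names `UsmAspects` and `staticUsmAspects` and the `major * 10 + minor` encoding of the SM version are assumptions made for the example.

```cpp
// Hypothetical sketch of the conservative compile-time rule described above.
// Not the real DeviceConfigFile entry; all names here are illustrative.

// The three USM aspects affected by this PR.
struct UsmAspects {
  bool system_allocations;
  bool atomic_host_allocations;
  bool atomic_shared_allocations;
};

// smVersion encodes the compute capability as major * 10 + minor,
// e.g. 75 for SM 7.5 and 89 for SM 8.9 (an assumed encoding).
constexpr UsmAspects staticUsmAspects(int smVersion) {
  // SM89 and earlier: report "false" for all three aspects, even though
  // a host with HMM enabled might support system allocations at run time.
  const bool reported = smVersion > 89;
  return {reported, reported, reported};
}
```

With this sketch, `staticUsmAspects(75)` yields all three aspects as false, which matches the conservative result for an SM75 device regardless of whether the host has HMM enabled.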

We were previously reporting all USM aspects as supported on all CUDA devices. This is incorrect behaviour as many devices do not support USM system allocations, nor atomic host/shared USM allocations.

Unfortunately it is very difficult to get a conclusive list of which devices support which features.

Links such as [1] suggest that pageable memory access (which the UR adapter uses to determine the runtime equivalents of these aspects) requires at least a Grace Hopper device, and possibly only works on Linux systems with HMM enabled. This is not something we can currently determine at compile time for these aspects. Nevertheless, we can probably assume that devices below Grace Hopper cannot support USM system allocations.

For atomic host/shared allocations, the documentation on the 'hostNativeAtomicSupported' property at [1] and [2] suggests that we need a hardware-coherent system, for which [3] suggests we again need at least a Grace Hopper device. However, note again that only "some" hardware-coherent systems support host native atomics, "including" NVLink-connected devices. This is therefore not an exhaustive list and we can't derive anything conclusive from it.

In short, this PR essentially just punts the problem slightly further down the road and prevents these three USM aspects from being reported as supported for SM89 devices and earlier.

[1]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#system-requirements-for-unified-memory
[2]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#host-native-atomics
[3]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cpu-and-gpu-page-tables-hardware-coherency-vs-software-coherency

ldrumm

Good writeup. Too much of my knowledge of what is and isn't supported on NVIDIA has come as rumours from press-releases and blog posts rather than real technical docs

frasercrmck · 2025-06-03T14:33:16Z

> Good writeup. Too much of my knowledge of what is and isn't supported on NVIDIA has come as rumours from press-releases and blog posts rather than real technical docs

Unfortunately @GeorgeWeb's device is reporting sycl::aspect::usm_system_allocations: 1 on SM75 but with Linux HMM. So I've probably got something wrong and now we're too conservative.

frasercrmck · 2025-06-03T15:43:06Z

> > Good writeup. Too much of my knowledge of what is and isn't supported on NVIDIA has come as rumours from press-releases and blog posts rather than real technical docs
>
> Unfortunately @GeorgeWeb's device is reporting sycl::aspect::usm_system_allocations: 1 on SM75 but with Linux HMM. So I've probably got something wrong and now we're too conservative.

It looks as if that's because of HMM, which we can't determine in the compiler. I've updated the PR description to explain why we report 'false' for such situations. This might be something we can enhance in the future, but it's inherently limited as HMM is not a property of the CUDA device, but of the host system, operating system, etc.

GeorgeWeb · 2025-06-03T16:18:50Z

These docs https://docs.nvidia.com/cuda/archive/12.1.0/pascal-tuning-guide/index.html#unified-memory-improvements confirm that, on supported operating systems, system memory can be accessed from the GPU. However, as @frasercrmck already said, this relies on the Linux kernel driver, the CUDA driver's open kernel modules, etc. supporting HMM (Heterogeneous Memory Management) in CUDA.

This blog post https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/#enabling_and_detecting_hmm is an interesting read and shows how to use nvidia-smi to check whether your NVIDIA device supports memory addressing via HMM.

Just as an extra demonstration:
My GeForce GTX 1650 (SM 7.5 with HMM), with a suitable Linux kernel version and the CUDA driver with open kernel modules, is an example of system USM being supported on older cards too, as long as the requirements are met.

NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6

$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
7.5

$ nvidia-smi -q | grep Addressing
Addressing Mode : HMM

That being said, judging by Fraser's case, it cannot be 100% guaranteed that all NVIDIA GPUs prior to GH (or SM 9.0) will support the feature out of the box, so being conservative here for the compile-time/static value of the aspect is probably okay. Currently, we have no good way of determining this, as HMM is not a property of the CUDA device.

If the user's system does have the required setup for CUDA's USM system memory (and a device newer than SM 6.x) and they need to use the feature safely, they can simply re-query the value of that aspect via a runtime device query in SYCL.
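The `nvidia-smi -q | grep Addressing` check shown earlier in this comment can also be done programmatically on the host. The following is a minimal sketch under stated assumptions: it expects the text of `nvidia-smi -q` to be passed in, it only inspects the first `Addressing Mode` line (so a multi-GPU system would need a loop), and the function name `reportsHmmAddressing` is made up for this example.

```cpp
#include <string_view>

// Hypothetical helper: returns true if the first "Addressing Mode" line in
// the given `nvidia-smi -q` output has the value "HMM".
constexpr bool reportsHmmAddressing(std::string_view smiOutput) {
  constexpr std::string_view key = "Addressing Mode";
  const auto keyPos = smiOutput.find(key);
  if (keyPos == std::string_view::npos)
    return false; // no such line, e.g. an older driver

  // Keep only the remainder of the line that contains the key.
  std::string_view line = smiOutput.substr(keyPos + key.size());
  line = line.substr(0, line.find('\n'));

  // The value follows the colon separator.
  const auto colon = line.find(':');
  if (colon == std::string_view::npos)
    return false;
  const std::string_view value = line.substr(colon + 1);

  // Trim surrounding whitespace before comparing.
  const auto first = value.find_first_not_of(" \t\r");
  if (first == std::string_view::npos)
    return false;
  const auto last = value.find_last_not_of(" \t\r");
  return value.substr(first, last - first + 1) == "HMM";
}
```

A check like this only describes the host setup. Within a SYCL program, the runtime query mentioned above (for example `sycl::device::has(sycl::aspect::usm_system_allocations)`) remains the direct way to ask whether a given device supports the feature.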

GeorgeWeb

Left a larger general info comment separately.

LGTM. This more conservative approach seems best for now for the DeviceConfigFile.

frasercrmck · 2025-06-05T09:53:59Z

@intel/llvm-gatekeepers this is ready to merge, thanks

kbenzie · 2025-06-05T09:55:15Z

Approval is still required from @intel/dpcpp-tools-reviewers

frasercrmck · 2025-06-05T09:56:21Z

> Approval is still required from @intel/dpcpp-tools-reviewers

How did I miss that??

maarquitos14

LGTM.

frasercrmck requested a review from a team as a code owner June 3, 2025 11:15

frasercrmck requested a review from GeorgeWeb June 3, 2025 11:15

frasercrmck requested a review from ldrumm June 3, 2025 11:15

ldrumm approved these changes Jun 3, 2025

GeorgeWeb approved these changes Jun 3, 2025

maarquitos14 approved these changes Jun 5, 2025

ldrumm merged commit cfc803c into intel:sycl Jun 5, 2025
24 checks passed

frasercrmck deleted the sycl-cuda-usm-aspects branch June 5, 2025 13:56
