Describe the bug
When specifying multiple CUDA compute capabilities (e.g. -fsycl-targets=nvidia_gpu_sm_62,nvidia_gpu_sm_61), the runtime will only ever choose the first one and ignore the others.
I believe the relevant code is here, always picking the first llvm_nvptx64 it finds.
The correct behavior would be to choose the highest included compute capability that is not greater than that of the device.
So if I compile for sm_60, sm_61, and sm_62 in any order and I have a Pascal (sm_61) GPU, it should only try the sm_61 program.
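The expected selection rule can be sketched as follows. This is only a minimal illustration, not the runtime's actual code: the function name selectBestArch and the representation of compute capabilities as plain integers (60, 61, 62) are assumptions made for this example, and it ignores any additional compatibility constraints the CUDA driver may impose.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper, not part of the DPC++ runtime.
// compiledArchs: compute capabilities the binary was built for (e.g. 60, 61, 62).
// deviceArch:    compute capability of the device (e.g. 61 for sm_61).
// Returns the highest compiled capability that does not exceed the device's,
// or std::nullopt if no compiled target can run on the device.
std::optional<int> selectBestArch(const std::vector<int>& compiledArchs,
                                  int deviceArch) {
    std::optional<int> best;
    for (int arch : compiledArchs) {
        // Skip targets newer than the device; keep the highest remaining one.
        if (arch <= deviceArch && (!best || arch > *best)) {
            best = arch;
        }
    }
    return best;
}
```

With this rule, the order of the entries in -fsycl-targets would not matter: for an sm_61 device, both {60, 61, 62} and {62, 61, 60} select 61.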
To reproduce
#include <sycl/sycl.hpp>

int main() {
    sycl::queue queue{sycl::default_selector_v};
    queue.submit([&] (sycl::handler& cgh) {
        auto os = sycl::stream{128, 128, cgh};
        cgh.single_task<struct hello_world>([=]() {
#ifdef __SYCL_CUDA_ARCH__
            os << "CUDA_ARCH " << __SYCL_CUDA_ARCH__ << "\n";
#else
            os << "Not CUDA\n";
#endif
        });
    });
    return 0;
}
With a Pascal (sm_61) GPU:
- icpx -fsycl -fsycl-targets=nvidia_gpu_sm_60,nvidia_gpu_sm_61,nvidia_gpu_sm_62 main.cpp
  prints CUDA_ARCH 600 instead of the correct CUDA_ARCH 610.
- icpx -fsycl -fsycl-targets=nvidia_gpu_sm_62,nvidia_gpu_sm_61,nvidia_gpu_sm_60 main.cpp
  errors out instead of printing CUDA_ARCH 610.
Environment
- OS: Linux
- Target device and vendor: NVIDIA GeForce GTX 1070
- DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205)
- Dependencies version: NVIDIA-SMI 565.57.01, Driver Version: 565.57.01, CUDA Version: 12.7