Support multiple CUDA compute capabilities #16424

Open
@kiwixz

Describe the bug

When multiple CUDA compute capabilities are specified (e.g. -fsycl-targets=nvidia_gpu_sm_62,nvidia_gpu_sm_61), the runtime only ever chooses the first one and ignores the others.

I believe the relevant code is here, always picking the first llvm_nvptx64 it finds.

The correct behavior would be to choose the highest compute capability among the compiled targets that is not greater than the device's own.
So if I compile for sm_60, sm_61 and sm_62 in any order and I have a Pascal (sm_61) GPU, it should only try the sm_61 program.
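
To make the proposed rule concrete, here is a minimal sketch of the selection logic. It is not the DPC++ runtime's actual code: the helper name pick_best_arch and the encoding of capabilities as plain integers (sm_61 becomes 61) are assumptions made for illustration only, and it ignores any further compatibility constraints the CUDA driver may impose.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper illustrating the rule proposed in this issue.
// Capabilities are encoded as major * 10 + minor (assumption): sm_61 -> 61.
// Returns the highest compiled capability that does not exceed the
// device's capability, or std::nullopt if no compiled target fits.
std::optional<int> pick_best_arch(const std::vector<int>& compiled_archs,
                                  int device_arch) {
    std::optional<int> best;
    for (int arch : compiled_archs) {
        if (arch > device_arch) {
            continue;  // built for a newer device; skip it
        }
        if (!best || arch > *best) {
            best = arch;  // best candidate seen so far
        }
    }
    return best;
}
```

With this rule the result does not depend on the order of the targets: for a device of capability 61, both {60, 61, 62} and {62, 61, 60} select 61, and {62} alone selects nothing.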

To reproduce

#include <sycl/sycl.hpp>

int main() {
    sycl::queue queue{sycl::default_selector_v};
    queue.submit([&] (sycl::handler& cgh) {
        auto os = sycl::stream{128, 128, cgh};
        cgh.single_task<struct hello_world>([=]() {
#ifdef __SYCL_CUDA_ARCH__
            os << "CUDA_ARCH " << __SYCL_CUDA_ARCH__ << "\n";
#else
            os << "Not CUDA\n";
#endif
        });
    });

    return 0;
}

With a Pascal (sm_61) GPU:

  • icpx -fsycl -fsycl-targets=nvidia_gpu_sm_60,nvidia_gpu_sm_61,nvidia_gpu_sm_62 main.cpp prints CUDA_ARCH 600 instead of the correct CUDA_ARCH 610.
  • icpx -fsycl -fsycl-targets=nvidia_gpu_sm_62,nvidia_gpu_sm_61,nvidia_gpu_sm_60 main.cpp errors out instead of printing CUDA_ARCH 610.

Environment

  • OS: Linux
  • Target device and vendor: NVIDIA GeForce GTX 1070
  • DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205)
  • Dependencies version: NVIDIA-SMI 565.57.01, Driver Version: 565.57.01, CUDA Version: 12.7

Additional context

No response

Metadata

  • Labels: bug (Something isn't working), cuda (CUDA back-end)