Closed
Description
Guilty commit (reverting it fixes the problem):
commit 714642633b59bf3642438ec910e39493fefae750
Author: Ruyman <ruyman@codeplay.com>
Date: Wed May 27 10:48:28 2020 +0100
[SYCL][CUDA] Improvements to CUDA device selection (#1689)
* Prevents NVIDIA OpenCL platform to be selected by a SYCL application
* NVIDIA OpenCL is not reported as a valid GPU platform for LIT testing
* Introduces device selection logic to reject devices
* Changes name of NVIDIA CUDA Backend to differentiate from OpenCL
* Provides better error message when SPIRV is passed to CUDA backend
* Using backend types to check for CUDA backend instead of strings
Signed-off-by: Ruyman Reyes <ruyman@codeplay.com>
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
To reproduce, build the latest SYCL compiler with the CUDA plugin and run PiCudaTests from the build directory:
$ build/tools/sycl/unittests/pi/cuda/PiCudaTests
...
[----------] 5 tests from OnCudaPlatform/CudaInteropGetNativeTests
[ RUN ] OnCudaPlatform/CudaInteropGetNativeTests.getNativeDevice/0
[ OK ] OnCudaPlatform/CudaInteropGetNativeTests.getNativeDevice/0 (225 ms)
[ RUN ] OnCudaPlatform/CudaInteropGetNativeTests.getNativeContext/0
./PiCudaTests(+0x10576a)[0x55b91836876a]
./PiCudaTests(+0x103984)[0x55b918366984]
./PiCudaTests(+0x103ad3)[0x55b918366ad3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f6137ce2890]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x207fbf)[0x7f6136511fbf]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0xf41b1)[0x7f61363fe1b1]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(cuCtxSetCurrent+0x180)[0x7f6136563720]
/localdisk2/ws/againull/clean_sycl/llvm/build/lib/libpi_cuda.so(+0x71fed)[0x7f6134f1ffed]
/localdisk2/ws/againull/clean_sycl/llvm/build/lib/libpi_cuda.so(cuda_piQueueCreate+0x83)[0x7f6134f243c1]
./PiCudaTests(+0x99211)[0x55b9182fc211]
./PiCudaTests(+0xc6058)[0x55b918329058]
./PiCudaTests(+0xc6399)[0x55b918329399]
./PiCudaTests(+0x136b02)[0x55b918399b02]
./PiCudaTests(+0x136c45)[0x55b918399c45]
./PiCudaTests(+0x138268)[0x55b91839b268]
./PiCudaTests(+0x1383a5)[0x55b91839b3a5]
./PiCudaTests(+0x1c037)[0x55b91827f037]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f6135813b97]
./PiCudaTests(+0x1cb2a)[0x55b91827fb2a]
Segmentation fault (core dumped)
(gdb) bt
#0 0x00007ffff63f7fbf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007ffff62e41b1 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff6449720 in cuCtxSetCurrent () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff4e05fed in (anonymous namespace)::ScopedContext::ScopedContext (this=0x7fffffffe4d0, ctxt=0x55555c0c2b20) at /localdisk2/ws/againull/clean_sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:151
#4 0x00007ffff4e0a3c1 in cuda_piQueueCreate (context=0x55555c0c2b20, device=0x555555719ff0, properties=1, queue=0x7fffffffe570) at /localdisk2/ws/againull/clean_sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:1710
#5 0x00005555555ed211 in cl::sycl::queue::queue(cl::sycl::device const&, std::function<void (cl::sycl::exception_list)> const&, cl::sycl::property_list const&) ()
#6 0x000055555561a058 in CudaInteropGetNativeTests::CudaInteropGetNativeTests() ()
#7 0x000055555561a399 in testing::internal::ParameterizedTestFactory<CudaInteropGetNativeTests_getNativeContext_Test>::CreateTest() ()
#8 0x000055555568ab02 in testing::TestInfo::Run() ()
#9 0x000055555568ac45 in testing::TestCase::Run() ()
#10 0x000055555568c268 in testing::internal::UnitTestImpl::RunAllTests() ()
#11 0x000055555568c3a5 in testing::UnitTest::Run() ()
#12 0x0000555555570037 in main ()
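For readers unfamiliar with frame #3, the following is a minimal sketch of the general scoped-context (RAII) pattern that a name like `ScopedContext` suggests: remember the current context, switch to the requested one, and restore the previous one on scope exit. It is not the code in `pi_cuda.cpp`. `FakeContext`, `fakeGetCurrent`, `fakeSetCurrent`, and `ScopedContextSketch` are hypothetical stand-ins invented for illustration, so that the sketch compiles without the CUDA driver; in a real plugin the corresponding driver calls would presumably be `cuCtxGetCurrent` and `cuCtxSetCurrent`.

```cpp
#include <cassert>

// Hypothetical stand-in for a driver context handle (not a real CUcontext).
using FakeContext = int *;

// Hypothetical stand-in for the driver's per-thread "current context" slot.
static thread_local FakeContext g_current = nullptr;

// Hypothetical helpers that only mimic the shape of
// cuCtxGetCurrent / cuCtxSetCurrent; they do not call into CUDA.
FakeContext fakeGetCurrent() { return g_current; }
void fakeSetCurrent(FakeContext ctx) { g_current = ctx; }

// Sketch of a scoped guard: makes `desired` current for the lifetime of
// the object and restores whatever was current before on destruction.
class ScopedContextSketch {
public:
  explicit ScopedContextSketch(FakeContext desired)
      : previous_(fakeGetCurrent()), switched_(false) {
    // Only switch when the requested context is not already current.
    if (previous_ != desired) {
      fakeSetCurrent(desired);
      switched_ = true;
    }
  }

  ~ScopedContextSketch() {
    // Restore the earlier context only if the constructor changed it.
    if (switched_) {
      fakeSetCurrent(previous_);
    }
  }

  // A guard owns a restore obligation, so copying it is disabled.
  ScopedContextSketch(const ScopedContextSketch &) = delete;
  ScopedContextSketch &operator=(const ScopedContextSketch &) = delete;

private:
  FakeContext previous_;
  bool switched_;
};
```

With a guard of this shape, the set-current call happens inside the constructor, which matches where the backtrace places the crash: `cuCtxSetCurrent` is reached from the `ScopedContext` constructor with the `ctxt` value passed down from `cuda_piQueueCreate`.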
Platform info (where the test fails):
Platforms : 1
Platform [#1] :
Profile : FULL_PROFILE
Version : OpenCL 1.2 CUDA 10.1.236
Name : NVIDIA CUDA
Vendor : NVIDIA Corporation
Devices : 1
Device [#1] :
Type : GPU
Profile : FULL_PROFILE
Version : OpenCL 1.2 CUDA
Name : TITAN RTX
Vendor : NVIDIA Corporation
C version : OpenCL C 1.2
Driver version : 418.87.00
The test does not fail on the following platform:
Platform [#2] :
Profile : FULL_PROFILE
Version : OpenCL 1.2 CUDA 10.2.95
Name : NVIDIA CUDA
Vendor : NVIDIA Corporation
Devices : 1
Device [#1] :
Type : GPU
Profile : FULL_PROFILE
Version : OpenCL 1.2 CUDA
Name : GeForce GTX 1060 6GB
Vendor : NVIDIA Corporation
C version : OpenCL C 1.2
Driver version : 440.33.01