Description
I'm trying to understand whether the CUDA backend supports USM (Unified Shared Memory). I'm using the USM example at the following site:
https://github.com/intel/BaseKit-code-samples/tree/master/DPC%2B%2BCompiler/vector-add
The first issue is that this example uses unnamed lambdas, which aren't recognized. That's an easy enough fix, but then the USM version dies with a floating point exception when I run it. Should this work? Or does the CUDA backend not support USM yet? Is there a compiler switch I'm missing? The regular SYCL buffers version works fine.
cgpu07:vector-add$ clang++ --version
clang version 12.0.0 (https://github.com/intel/llvm b7ae462)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /global/project/projectdirs/mpccc/dwdoerf/cori-gpu/llvm-sycl/build-20200721/install/bin
cgpu07:vector-add$ make build_usm
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -O2 -g -std=c++17 -o vector-add-usm src/vector-add-usm.cpp
cgpu07:vector-add$ srun vector-add-usm
srun: error: cgpu07: task 0: Floating point exception
srun: Terminating job step 835352.13
cgpu07:vector-add$ make build_buffers
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -O2 -g -std=c++17 -o vector-add-buffers src/vector-add-buffers.cpp
cgpu07:vector-add$ srun vector-add-buffers
Running on device: Tesla V100-SXM2-16GB: Driver CUDA 10.2
Vector size: 10000
[0]: 0 + 0 = 0
[1]: 1 + 1 = 2
[2]: 2 + 2 = 4
...
[9999]: 9999 + 9999 = 19998
Vector add successfully completed on device.
cgpu07:vector-add$