Description
Running the example https://github.com/zjin-lcf/HeCBench/blob/master/aop-sycl/main.cpp, built with CUDA support, on a P100 GPU
fails with a "warp misaligned address" error. The error may be caused by the shared local memory "double4 lsums" in the kernel prepare_svd_kernel<256, PayoffPut>. The SYCL program runs successfully on an Intel GPU.
Have you encountered a "warp misaligned address" error when porting a CUDA program?
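To make the suspicion concrete, here is a minimal host-side sketch of what "misaligned" means for a 4-wide double vector. It is not taken from the benchmark and does not confirm the cause. `Double4`, `is_aligned`, and `align_up` are hypothetical names, and the 32-byte alignment is an assumption standing in for a vector type that is aligned to its full size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4-wide double vector that is aligned to its
// full size (32 bytes). Assumption: the real vector type behaves this way.
struct alignas(32) Double4 {
    double x, y, z, w;
};

static_assert(sizeof(Double4) == 32, "four doubles occupy 32 bytes");
static_assert(alignof(Double4) == 32, "alignment requested via alignas");

// Returns true if the address p is a multiple of `alignment` bytes.
// `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Rounds a byte offset up to the next multiple of `alignment`.
// `alignment` must be a power of two.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, an address 16 bytes past a `Double4` object passes a 16-byte alignment check but fails a 32-byte one, which is the kind of mismatch that could trigger a misaligned-address fault if a vector is placed at such an offset in local memory.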
Running the example built with HIP support produces a result that does not match the HIP/CUDA version:
To reproduce
make HIP=yes
./main
==============
Num Timesteps : 100
Num Paths : 32K
Num Runs : 1
T : 1.000000
S0 : 3.600000
K : 4.000000
r : 0.060000
sigma : 0.200000
Option Type : American Put
==============
GPU Longstaff-Schwartz: 0.39776070 (expected: 0.44783124)