[SYCL][CUDA][HIP] warp misaligned address on CUDA and results mismatch on HIP #5007

Running the example https://github.com/zjin-lcf/HeCBench/blob/master/aop-sycl/main.cpp, built with CUDA support, on a P100 GPU reports a "warp misaligned address" error.
The error may be caused by the shared local memory "double4 lsums" in the kernel prepare_svd_kernel<256, PayoffPut>. The same SYCL program runs successfully on an Intel GPU.
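
To illustrate what a misaligned address means for a 4-wide double vector, here is a minimal host-side sketch. It is not taken from the benchmark and does not confirm the cause: `Double4`, its 32-byte alignment, `is_aligned`, `align_up` and `double4_offset_after_scalar` are all hypothetical names and assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4-wide double vector. The 32-byte alignment
// is an assumption of this sketch, not something stated in the issue.
struct alignas(32) Double4 {
  double x, y, z, w;
};

// Returns true when `addr` is a multiple of `alignment` (a power of two).
inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
  return (addr & (alignment - 1)) == 0;
}

// Rounds `offset` up to the next multiple of `alignment` (a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offset at which a Double4 can start when it follows a single
// 8-byte scalar in a shared byte arena: 8 is padded up to 32.
inline std::size_t double4_offset_after_scalar() {
  std::size_t offset = sizeof(double);
  return align_up(offset, alignof(Double4));
}
```

Under these assumptions, a `Double4` placed at byte offset 8 of an arena would be misaligned, while padding the offset with `align_up` gives an aligned start at offset 32.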

Have you encountered a warp misaligned address error when porting a CUDA program?

Running the example built with HIP support produces a result that does not match the result of the HIP/CUDA version.

To reproduce:
```
make HIP=yes
./main

==============
Num Timesteps         : 100
Num Paths             : 32K
Num Runs              : 1
T                     : 1.000000
S0                    : 3.600000
K                     : 4.000000
r                     : 0.060000
sigma                 : 0.200000
Option Type           : American Put
==============
GPU Longstaff-Schwartz: 0.39776070   (expected: 0.44783124)
```
