[SYCL][CUDA] performance issue with a SYCL program #16696

Open
@jinz2014

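Hello,
There seems to be a large performance gap between the CUDA and SYCL versions of the scatter benchmark on an NVIDIA A100 GPU: in the timings below, the SYCL kernels are roughly 9x to 25x slower than the CUDA kernels.
I also tried migrating the CUDA code with SYCLomatic, but the translation was not successful.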

https://github.com/zjin-lcf/HeCBench/tree/master/src/scatter-cuda

CUDA (12.5)

./main 10000000 100
INT32 scatter (mul, div, sum, min, max)
Average execution time of kernel: 609.347046 (us)
Average execution time of kernel: 513.615234 (us)
Average execution time of kernel: 224.066589 (us)
Average execution time of kernel: 224.341263 (us)
Average execution time of kernel: 224.259125 (us)

https://github.com/zjin-lcf/HeCBench/tree/master/src/scatter-sycl

SYCL (icpx 2025.0.0)
./main 10000000 100
INT32 scatter (mul, div, sum, min, max)
Average execution time of kernel: 5594.654785 (us)
Average execution time of kernel: 5526.372559 (us)
Average execution time of kernel: 5501.559570 (us)
Average execution time of kernel: 5502.131348 (us)
Average execution time of kernel: 5501.163086 (us)
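
To make the gap concrete, here is a minimal C++ sketch that computes the per-operation slowdown from the two runs above. It assumes the five timing lines appear in the same order as the operations in the header (mul, div, sum, min, max); the names `kOps`, `kCudaUs`, `kSyclUs`, `slowdown`, and `print_slowdowns` are made up for this illustration and are not part of HeCBench.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// Operation names, assumed to match the order of the timing lines above.
constexpr std::array<const char*, 5> kOps = {"mul", "div", "sum", "min", "max"};

// Average kernel times in microseconds, copied from the CUDA 12.5 run.
constexpr std::array<double, 5> kCudaUs = {
    609.347046, 513.615234, 224.066589, 224.341263, 224.259125};

// Average kernel times in microseconds, copied from the icpx 2025.0.0 run.
constexpr std::array<double, 5> kSyclUs = {
    5594.654785, 5526.372559, 5501.559570, 5502.131348, 5501.163086};

// Ratio of SYCL time to CUDA time for operation i (values > 1 mean SYCL is slower).
constexpr double slowdown(std::size_t i) { return kSyclUs[i] / kCudaUs[i]; }

// Print one line per operation with its slowdown factor.
void print_slowdowns() {
  for (std::size_t i = 0; i < kOps.size(); ++i) {
    std::printf("%-3s  %.1fx\n", kOps[i], slowdown(i));
  }
}
```

Calling `print_slowdowns()` from a `main` shows that the SYCL times are nearly flat at about 5.5 ms across all five operations, while the CUDA times vary from about 0.22 ms to 0.61 ms. That works out to roughly 9x for mul, 11x for div, and about 24.5x for sum, min, and max.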

Labels: cuda (CUDA back-end), performance (Performance related issues)
