
Missing warning with clang-llvm compiler #8836

Open
@vidyalatha-badde

Description


Hi all,

I'm working on migrating the cuda-samples from the NVIDIA CUDA Toolkit and running the migrated code on an NVIDIA GPU (Tesla P100-PCIE-12GB) using the open-source DPC++ compiler.
I've successfully migrated the convolutionSeparable application and can run it on the NVIDIA GPU. Since there is a performance gap, I started experimenting with the code, and during those trials I observed that nvcc issues a warning when a variable is set but never used, while the open-source DPC++ compiler (invoked as clang++) issues no such warning.
Please find the migrated code attached below:
dpct_output.zip

Steps to reproduce:
Compilation:
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -I ../../../Common/ *.cpp
Execution:
./a.out

Basically, there are two parts in the convolutionSeparable.dp.cpp file: one loads the input data and the other performs the computation. To understand where the migrated code spends most of its time, I commented out the computation part (in both the rowkernelgpu and columnkernelgpu functions) and ran the code again. Here is the result:

[./a.out] - Starting...
Running on Tesla P100-PCIE-12GB
Image Width x Height = 3072 x 3072

Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...

convolutionSeparable, Throughput = 53907.5108 MPixels/sec, **Time = 0.00018 s**, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0

Reading back GPU results...
Checking the results...
Shutting down...

Repeating the same experiment with the native CUDA code, nvcc throws a warning as shown below:
warning #550-D: variable "s_Data" was set but never used

with output

[./convolutionSeparable] - Starting...
GPU Device 0: "Pascal" with compute capability 6.0

Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...

convolutionSeparable, Throughput = 127100.1251 MPixels/sec, **Time = 0.00007 s**, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0

Reading back GPU results...
Shutting down...

As you can see, the migrated code's time (0.00018 s) is nearly 2.5x the CUDA code's time (0.00007 s).

Could someone let me know why there is such a time gap? Is there any significant meaning behind the nvcc warning?
What exactly does it mean, and why does clang/LLVM not emit that warning?

Please let me know if you need any other information.

Thanks in advance
~Vidya.
