
The cuda object file compiled by clang cannot be recognized by nvprune. #75147

Open
zq1997 opened this issue Dec 12, 2023 · 1 comment

zq1997 commented Dec 12, 2023

As the title says, when a CUDA source file is compiled with Clang, nvprune cannot recognize the resulting object file and fails with: nvprune fatal : Unexpected fatbin data.

// foo.cu
#include <cstdio>

__global__ void foo() {
    printf("CUDA kernel runs successfully.\n");
}

int main() {
    foo<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

With nvcc (everything works):

nvcc -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -c foo.cu
cuobjdump foo.o
nvprune -arch sm_80 foo.o -o foo.stripped.o
cuobjdump foo.stripped.o

With Clang (nvprune fails):

clang --cuda-gpu-arch=sm_70 --cuda-gpu-arch=sm_80 -c foo.cu
cuobjdump foo.o  # also OK
nvprune -arch sm_80 foo.o -o foo.stripped.o  # nvprune fatal   : Unexpected fatbin data

Operating system: Linux (tried both CentOS and Ubuntu)
Software version: the CUDA and Clang versions don't appear to matter; the problem reproduces almost always.

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Dec 12, 2023
@EugeneZelenko EugeneZelenko added cuda and removed clang Clang issues not falling into any other category labels Dec 12, 2023
Artem-B (Member) commented Jan 10, 2024

It's hard to tell why nvprune is unhappy, as it's a black box to us. The fatbin itself is generated with NVIDIA's own tools, so nvprune is more likely complaining about how the fatbinary is embedded in (and located within) the object file than about the fatbinary's contents.

We may want to look at what NVCC does differently when it embeds the GPU binary in a host object and compare that with what Clang does. It's possible that things have changed on the NVCC side since we implemented it. How the embedding is done is not documented by NVIDIA, so we tend to find out about changes only when things break. :-/
