Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA: Incorrect linkage with -fgpu-rdc on kernels created from lambdas inside anonymous namespaces #54560

Closed
mkuron opened this issue Mar 25, 2022 · 26 comments

Comments

@mkuron
Copy link
Contributor

mkuron commented Mar 25, 2022

Summary

In the example below, the PTX assembly generated by Clang declares the kernel as .weak .entry _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fINS_17counting_iteratorIiNS_11use_defaultES8_S8_EEPiNS5_14no_stencil_tagEZN12_GLOBAL__N_15Thing4calcESA_iNSC_5StuffEEUliE_NS5_21always_true_predicateEEElEESH_lEEvT0_T1_. The same example compiled with NVCC generates .weak .entry _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fINS_17counting_iteratorIiNS_11use_defaultES8_S8_EEPiNS5_14no_stencil_tagEZN60_GLOBAL__N__36_tmpxft_00006b02_00000000_6_b_cpp1_ii_968400945Thing4calcESA_iNSC_5StuffEEUliE_NS5_21always_true_predicateEEElEESH_lEEvT0_T1_. In both cases, external weak linkage is used, which is not necessary since it is coming from inside a anonymous namespace. With NVCC, this is not a problem because the anonymous namespace is mangled to a unique name (60_GLOBAL__N__36_tmpxft_00006b02_00000000_6_b_cpp1_ii_96840094). With Clang however, the name is not unique (mangled to 12_GLOBAL__N_1). This is a problem when passing the resulting object files to nvlink, which will report nvlink fatal error: Internal error: duplicate parameter bank data not same size or nvlink error: Duplicate weak parameter bank for ... depending on the CUDA version. To me, it seems like internal linkage (.entry) instead of weak linkage (.weak .entry) should be used in this case.

Versions

I reproduced the bug with multiple Clang versions between 12.0.0 and 14.0.0. Before abd8cd9 by @yxsamliu, Clang would generate .visible .entry instead of .weak .entry, which isn't any better and actually causes the example below to fail earlier on Multiple definition of '_ZN6thrust8cuda_cub3cub11EmptyKernelIvEEvv'().

Potential workaround

It seems like the Clang option -funique-internal-linkage-names should be usable as a workaround that forces the symbol names to be unique, however across all Clang versions this just gives me various internal compiler errors. But that's a different issue.

Working example

This example uses Thrust, which makes the symbol names very lengthy, but I am pretty sure the exact same behavior can also be observed by replacing thrust::transform with a hand-written kernel. The lambda capture seems to be important as just putting a kernel into an anonymous namespace is not sufficient to trigger the problem.

a.cu:

#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>

namespace {
    struct Stuff {
        long c[2];
    };

    struct Thing {
        void calc(int *data, int n, Stuff s) {
            auto f = [s] __device__ (int i) -> int {
                return 2*i;
            };
            
            auto first = thrust::counting_iterator<int>(0);
            auto last = first + n;
            thrust::transform(thrust::device, first, last, data, f);
        }

    };
}

void runA(int * data, int n) {
    Thing t;
    Stuff s({0, 0});
    t.calc(data, n, s);
}

b.cu:

#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>

namespace {
    struct Stuff {
        long c[4];
    };

    struct Thing {
        void calc(int *data, int n, Stuff s) {
            auto f = [s] __device__ (int i) -> int {
                return 2*i;
            };
            
            auto first = thrust::counting_iterator<int>(0);
            auto last = first + n;
            thrust::transform(thrust::device, first, last, data, f);
        }

    };
}

void runB(int * data, int n) {
    Thing t;
    Stuff s({0, 0, 0, 0});
    t.calc(data, n, s);
}

Compile and link these as follows:

clang++ -O3 -std=c++14 --cuda-gpu-arch=sm_70 -fPIC -c a.cu -fgpu-rdc -o a.o
clang++ -O3 -std=c++14 --cuda-gpu-arch=sm_70 -fPIC -c b.cu -fgpu-rdc -o b.o
nvcc -O3 -Xcompiler -fPIC -gencode=arch=compute_70,code=[sm_70,compute_70] -dlink a.o b.o -o device.o
clang++ -std=c++14 -fPIC -O3 -shared -o ab.so a.o b.o device.o

The second-to-last command will fail with the nvlink error given in the summary.

For comparison, compile and link with NVCC:

nvcc -O3 -std=c++14 --expt-extended-lambda -gencode=arch=compute_70,code=[sm_70,compute_70] -Xcompiler -fPIC -c a.cu -dc -o a.o
nvcc -O3 -std=c++14 --expt-extended-lambda -gencode=arch=compute_70,code=[sm_70,compute_70] -Xcompiler -fPIC -c b.cu -dc -o b.o
nvcc -O3 -Xcompiler -fPIC -gencode=arch=compute_70,code=[sm_70,compute_70] -dlink a.o b.o -o device.o
nvcc -O3 -Xcompiler -fPIC -gencode=arch=compute_70,code=[sm_70,compute_70] -shared a.o b.o device.o -o ab.so

This will succeed.

LLVM IR

Unfortunately, Thrust is currently broken on clang trunk on godbolt.org, so I cannot post a link to the LLVM IR. Running clang locally, I can see that the IR uses define weak_odr void @_ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fINS_17counting_iteratorIiNS_11use_defaultES8_S8_EEPiNS5_14no_stencil_tagEZN12_GLOBAL__N_15Thing4calcESA_iNSC_5StuffEEUliE_NS5_21always_true_predicateEEElEESH_lEEvT0_T1_(%.... My guess is that it should probably use some variation of private or internal instead of weak_odr.

@Artem-B
Copy link
Member

Artem-B commented Mar 25, 2022

It appears that we somehow failed to make the normally invisible symbol unique. That is indeed a bug.

Minimal working example

Reproducer based on thrust is pretty hard to work with. :-(
If you could further reduce it to pure C++ code, that would help a lot.

@mkuron
Copy link
Contributor Author

mkuron commented Mar 25, 2022

It appears that we somehow failed to make the normally invisible symbol unique.

Do we actually have to make it unique? It should suffice to make it invisible.

Reproducer based on thrust is pretty hard to work with. :-(

Indeed. I'm still working on getting rid of Thrust here. Hopefully I can post a more minimal example by early next week.

@Artem-B
Copy link
Member

Artem-B commented Mar 25, 2022

It appears that we somehow failed to make the normally invisible symbol unique.

Do we actually have to make it unique? It should suffice to make it invisible.

Yes. In order to launch the kernel from the host side, CUDA runtime needs to know the name of the kernel in the ELF executable on the GPU side. If the symbol is hidden, the symbol will not be found and the kernel launch will fail.

This is where the way CUDA works conflicts with how C++ is expected to work. We try to work around it by making such symbols unique, which avoids conflicts during linking. But it also poses an issue for multiple template instantiations that should not have unique names because all instances are identical and the linker is supposed to pick only one of them. It's possible that that's the reason why we didn't make the function name unique and the problem is that NVIDIA's nvlink does not know how to deal with template instantiations (or, perhaps, we didn't generate the right PTX directives). Making all template instances unique would probably work, at the expense of bloating GPU executable size with multiple identical instances of the kernels.

We'll see what exactly is going on once we have a smaller reproducer.

@yxsamliu
Copy link
Collaborator

It seems in general cuda-clang works for -fgpu-rdc with identical template kernel instantiation in different TU's.

This could be a corner case where the ISA of template kernel instantiated in different TU's have some difference which triggered the nvlink error. If we can dump the .bc and nvptx file of the TU's and compare these two template kernel intantiation it may help.

@mkuron
Copy link
Contributor Author

mkuron commented Mar 28, 2022

This is where the way CUDA works conflicts with how C++ is expected to work.

Thanks for the detailed explanation. I'm mainly familiar with how these things work in C++, so CUDA continues to puzzle me.

We'll see what exactly is going on once we have a smaller reproducer.

Here's an example without Thrust:

template<typename F>
__global__ void generate(int * first, int * last, F f) {
    if(threadIdx.x < last - first) {
        first[threadIdx.x] = f(threadIdx.x);
    }
}

namespace {
    struct Stuff {
        long c[2];
    };

    struct Thing {
        void calc(int *data, int n, Stuff s) {
            auto f = [s] __device__ (int i) -> int {
                return 2*i;
            };

            generate<<<n/256+1, 256>>>(data, data + n, f);
        }

    };
}

void run(int * data, int n, Stuff s) {
    Thing t;
    t.calc(data, n, s);
}

As seen in https://godbolt.org/z/G1Tfcdn3W, NVCC with -dc generates the unique .weak .entry _Z8generateIZN45_GLOBAL__N__1c7773a2_10_example_cu_72a8c181_85Thing4calcEPiiNS0_5StuffEEUliE_EvS2_S2_T_. Clang with -fgpu-rdc generates the non-unique .weak .entry _Z8generateIZN12_GLOBAL__N_15Thing4calcEPiiNS0_5StuffEEUliE_EvS2_S2_T_, which is the same behavior as before with Thrust and which we have concluded above is a bug.
For comparison, if -dc/-fgpu-rdc are left out, .weak is replaced with .visible by both compilers, while the mangled names remain the same. I don't understand why -dc/-fgpu-rdc even affects this kernel which cannot be invoked from outside this translation unit and thus has no obvious need of being processed by nvlink.

The bug is not actually caused by templates. I can remove the template and directly put the kernel into the anonymous namespace and still get the nvlink error, though the PTX looks a bit different. As can be seen in https://godbolt.org/z/efGh6oa47, NVCC in this case prepends __nv_static to the mangled name and drops the .weak (thus making the kernel visible only to the current translation unit). Except for the __nv_static, this now looks exactly like when NVCC compiles a kernel without -dc, which seems reasonable to me since this kernel cannot be called from the outside -- and it seems like this kernel is now ignored by nvlink. I don't see an obvious reason why the templated case would require a .weak here, so maybe NVCC isn't 100% consistent either. Clang, on the other hand, still generates the same .weak and the same non-unique mangled name as before.

@mkuron
Copy link
Contributor Author

mkuron commented Mar 28, 2022

This could be a corner case where the ISA of template kernel instantiated in different TU's have some difference which triggered the nvlink error.

It's happening because the struct in both cases has the same name (Stuff, which ends up in the mangled name of the kernel), while the size of struct is different (16 byte vs. 32 byte). The size ends up in the function's parameter declaration, and that's probably what nvlink means by "parameter bank".

@Artem-B
Copy link
Member

Artem-B commented Mar 28, 2022

Clang with -fgpu-rdc generates the non-unique .weak .entry

AFAICT clang does the right thing here, but we may still need to 'fix' it.

Let's consider https://godbolt.org/z/efGh6oa47. If the code in the anonymous namespace were to live in a header file included from multiple TUs, would we expect the final GPU executable built with -fgpu-rdc to contain just one instance of f, or would we get a unique instance of f per each TU it was used in? For C++, we'd expect to have a single instance. Each object file would have it exposed as a weak symbol and we would expect users not to violate ODR, so every function with the same name would be expected to be indeed the same function, and linker would just pick one of the instances. NVCC, on the other hand, makes each kernel unique, so we'll have all instances linked in. This is a wrong thing to do, IMO, but probably works OK for kernels as their lifetime is limited by the kernel call itself, so they can't maintain state across calls.

nvlink fatal error: Internal error: duplicate parameter bank data not same size

I suspect this error may be a secondary failure. E.g. the root cause is that nvlink does not ignore the second instance of the weak symbol and fails further down the road when it needs to construct some per-kernel data which it expected to be unique. What's relevant here is that nvlink apparently fails to deal with weak kernels. I do not know whether it's by design or a bug.

Regardless of that, we'll probably need to do what NVCC does and make such kernel symbols unique, too.

I don't understand why -dc/-fgpu-rdc even affects this kernel which cannot be invoked from outside this translation unit and thus has no obvious need of being processed by nvlink.

You do want to launch that kernel from the host and the host needs a visible GPU-side symbol to refer to, even when that symbol is not needed for linking and would normally not even be visible if it were a C++ compilation. While the kernel can not be invoked from another GPU-side TU, it does effectively get invoked from the host side of the same TU and that's what breaks the C++ visibility model. On one hand, we do expect to have the same visibility within TU on both host and device sides (hence host-side kernel stub needs to see GPU-side kernel symbol), but we also need to keep GPU objects linkable and due to the way nvlink apparently works, we do need to make those symbols unique in order to avoid name conflicts.

@mkuron
Copy link
Contributor Author

mkuron commented Mar 29, 2022

If the code in the anonymous namespace were to live in a header file included from multiple TUs, would we expect the final GPU executable built with -fgpu-rdc to contain just one instance of f, or would we get a unique instance of f per each TU it was used in? For C++, we'd expect to have a single instance. [...] NVCC, on the other hand, makes each kernel unique, so we'll have all instances linked in. This is a wrong thing to do, IMO, but probably works OK for kernels as their lifetime is limited by the kernel call itself, so they can't maintain state across calls.

Ah, I forgot about that scenario. So that means that as long as CUDA has no concept of visibility, it is not possible to correctly represent this C++ as PTX. If we want to match NVCC and support my scenario, we need to make kernel names unique. We can keep the .weak for your scenario, but it won't do much because the kernels will have different names between files.

I suspect this error may be a secondary failure. E.g. the root cause is that nvlink does not ignore the second instance of the weak symbol and fails further down the road when it needs to construct some per-kernel data which it expected to be unique. What's relevant here is that nvlink apparently fails to deal with weak kernels. I do not know whether it's by design or a bug.

The weak kernels with the non-unique name are actually different in my example. nvlink however only complains about that because their signatures are also different:

.weak .entry _ZN12_GLOBAL__N_11fEPiiNS_5StuffE(
        .param .u64 _ZN12_GLOBAL__N_11fEPiiNS_5StuffE_param_0,
        .param .u32 _ZN12_GLOBAL__N_11fEPiiNS_5StuffE_param_1,
        .param .align 8 .b8 _ZN12_GLOBAL__N_11fEPiiNS_5StuffE_param_2[16]
)
.weak .entry _ZN12_GLOBAL__N_11fEPiiNS_5StuffE(
        .param .u64 _ZN12_GLOBAL__N_11fEPiiNS_5StuffE_param_0,
        .param .u32 _ZN12_GLOBAL__N_11fEPiiNS_5StuffE_param_1,
        .param .align 8 .b8 _ZN12_GLOBAL__N_11fEPiiNS_5StuffE_param_2[32]
)

If the signature is the same (e.g. if the Stuff struct is the same size and only the function f is different between the two files), nvlink does not complain, even though the generated PTX of the two files now contains two different weak implementations of the same kernel. The host linker now tells me that there are multiple definitions of __fatbinwrap__nv_cbd52c7f2ac2d9f4. I assume that the hexadecimal number is a hash of the kernel name, which would guarantee that a binary that violates the ODR like this can never be linked or launched.

Regardless of that, we'll probably need to do what NVCC does and make such kernel symbols unique, too.

Agreed. Is there anything I can do to help make it happen, or can you take care of it?

@yxsamliu
Copy link
Collaborator

In C++ program, for functions in an anonymous namespace, clang emits the function with internal linkage and a non-unique name. This indicates functions with the same name in the anonymous namespace in different TU's are not treated as one definition.

For CUDA/HIP, we have to make kernel symbols visible to runtime, therefore kernels must have non-internal linkage. To avoid conflict with kernels with the same name in other TU's, we have to make the kernel name unique. We also need to let the host compilation know the unique name used in the device compilation.

This is a similar situation we faced with making static device variables accessible to host compilation. We need to make their name unique and at the same time the unique name needs to be known at host compilation. We introduced CUID and externalized static device variables to solve the issue. It seems we could do that again with kernels in an anonymous namespace.

BTW, we need to externalize device variables in an anonymous namespace too.

@yxsamliu
Copy link
Collaborator

yxsamliu commented Apr 8, 2022

I have a patch for fixing this issue: https://reviews.llvm.org/D123353

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 8, 2022

@llvm/issue-subscribers-clang-codegen

@mkuron
Copy link
Contributor Author

mkuron commented Apr 20, 2022

Sorry for the late reply, I finally got around to testing the patch today. Unfortunately, it does not (fully) solve the issue. @yxsamliu, could you please reopen this ticket?

There are now two problems:

  1. The patch has no effect on the code samples I provided above. They still yield duplicate parameter bank data not same size errors because the name is not made unique. See godbolt for the generated PTX with the non-unique name.
  2. The patch does have the intended effect on a simpler code sample (below, without templates). However, the generated PTX is invalid: .weak .entry _ZN12_GLOBAL__N_11fEPiiNS_5StuffE.anon.97fc0acd7db7d399, which leads to ptxas ... fatal : Parsing error near '.anon': syntax error. As per 5db24d7, dots are not valid in PTX identifiers. See godbolt for the generated PTX containing .anon.
namespace {
    struct Stuff {
        long c[2];
    };

    __global__ void f(int * data, int n, Stuff s) {
        if(threadIdx.x < n) {
            data[threadIdx.x] = 2*threadIdx.x;
        }
    }

    void calc(int *data, int n, Stuff s) {
        f<<<n/256+1, 256>>>(data, n, s);
    }
}

void run(int * data, int n, Stuff s) {
    calc(data, n, s);
}

@yxsamliu
Copy link
Collaborator

The second issue is easy fix.

For the first issue, it seems a template instantiation with template argument type in anonymous name space will have internal linkage in C++, even though the template itself is not in anonymous name space. This is in contrary with ordinary template instantiation, which results in linkonce_odr linkage (https://godbolt.org/z/q765ehqxv). Therefore template instantiation with argument type in anonymous namespace does not follow ODR even if the type name is the same. Therefore we are justifiable to make them unique in CUDA/HIP. I will fix it.

@yxsamliu
Copy link
Collaborator

Fixed in https://reviews.llvm.org/D124189

@mkuron
Copy link
Contributor Author

mkuron commented Apr 22, 2022

@yxsamliu, thanks for taking care of this so quickly. I have confirmed that your latest patch resolves the issue with the above pieces of sample code and also with the larger codebase that initially led me to discover the issue.

I hope you can merge this bugfix to master soon. What would it take to get it backported to the 14.x branch as well?

yxsamliu added a commit to yxsamliu/llvm-project that referenced this issue Apr 22, 2022
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353
yxsamliu added a commit that referenced this issue Apr 22, 2022
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: #54560
@yxsamliu
Copy link
Collaborator

@tstellar tstellar added this to the LLVM 14.0.2 Release milestone Apr 22, 2022
@tstellar
Copy link
Collaborator

/cherry-pick 4ea1d43 04fb816

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 22, 2022

Failed to cherry-pick: 04fb816

https://github.com/llvm/llvm-project/actions/runs/2210241971

Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:

/branch <user>/<repo>/<branch>

@mkuron
Copy link
Contributor Author

mkuron commented Apr 26, 2022

/branch mkuron/llvm-project/issue54560

@mkuron
Copy link
Contributor Author

mkuron commented May 24, 2022

@tstellar, please use llvmbot#185 instead of llvmbot#187. On the former, I‘ve fixed the ABI incompatibility.

tstellar pushed a commit to llvmbot/llvm-project that referenced this issue May 24, 2022
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353

(cherry picked from commit 4ea1d43)
tstellar pushed a commit to llvmbot/llvm-project that referenced this issue May 24, 2022
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: llvm#54560
(cherry picked from commit 04fb816)
@tstellar
Copy link
Collaborator

tstellar commented Jun 3, 2022

Merged: 29f1039

searlmc1 pushed a commit to ROCm/llvm-project that referenced this issue Sep 30, 2022
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353

[CUDA][HIP] Externalize kernels with internal linkage

This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: llvm#54560

Fixes: SWDEV-335985
Change-Id: I97fae99bdc8b2b3eeb57e789aedcfe3bc8610706
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm/llvm-project#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: llvm/llvm-project#54560
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Apr 4, 2023
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353

(cherry picked from commit 4ea1d43)
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Apr 4, 2023
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: llvm#54560
(cherry picked from commit 04fb816)
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Sep 26, 2023
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353

(cherry picked from commit 4ea1d43)
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Sep 26, 2023
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: llvm#54560
(cherry picked from commit 04fb816)
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Oct 12, 2023
kernels in anonymous name space needs to have unique name
to avoid duplicate symbols.

Fixes: llvm#54560

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D123353

(cherry picked from commit 4ea1d43)
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Oct 12, 2023
This patch is a continuation of https://reviews.llvm.org/D123353.

Not only kernels in anonymous namespace, but also template
kernels with template arguments in anonymous namespace
need to be externalized.

To be more generic, this patch checks the linkage of a kernel
assuming the kernel does not have __global__ attribute. If
the linkage is internal then clang will externalize it.

This patch also fixes the postfix for externalized symbol
since nvptx does not allow '.' in symbol name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D124189

Fixes: llvm#54560
(cherry picked from commit 04fb816)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

6 participants