
Device function pointers #2450

Open

Description

Right, so, simply put: I want the following code to work:

using CUDA

f(x) = x+1

g(x) = x*2

function call_fxs!(fxs)
    x = 1.0
    for i = 1:length(fxs)
        x = fxs[i](x)
        @cuprintf("%g\n", x)
    end
end

@cuda threads = 1 call_fxs!((f, g))

This is what the code looks like in CUDA C:

#include <stdio.h>
typedef double (*func)(double x);

__device__ double func1(double x)
{
    return x + 1.0;
}

__device__ double func2(double x)
{
    return x * 2.0;
}

// Device-side globals holding the function addresses; the host reads
// them back with cudaMemcpyFromSymbol below.
__device__ func pfunc1 = func1;
__device__ func pfunc2 = func2;

__global__ void test_kernel(func* f, int n)
{
  double x = 1.0;

  for(int i=0;i<n;++i){
   x=f[i](x);
   printf("%g\n",x);
  }
}

int main(void)
{
  int N = 2;

  func* h_f;
  func* d_f;

  h_f = (func*)malloc(N*sizeof(func));

  cudaMalloc((void**)&d_f,N*sizeof(func));

  cudaMemcpyFromSymbol( &h_f[0], pfunc1, sizeof(func));
  cudaMemcpyFromSymbol( &h_f[1], pfunc2, sizeof(func));

  cudaMemcpy(d_f,h_f,N*sizeof(func),cudaMemcpyHostToDevice);

  test_kernel<<<1,1>>>(d_f,N);
  cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed

  cudaFree(d_f);
  free(h_f);

  return 0;
}

[jars@node0024 ~]$ nvcc check.cu 
[jars@node0024 ~]$ ./a.out 
2
4

I've been banging my head against this for a long time (starting a few months before this post: leios/Fable.jl#64 (comment)).

My current solution involves @generated loops on loops, which ends up generating functions that are quite large and take a significant amount of time to compile (sometimes up to 70 s for a kernel that runs in 0.0001 s). Mentioned here: https://discourse.julialang.org/t/is-there-any-good-way-to-call-functions-from-a-set-of-functions-in-a-cuda-kernel/102051/3?u=leios
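
For reference, here is a minimal sketch of the kind of compile-time unrolling that works today (apply_all is a hypothetical helper name, not an existing API):

using CUDA

f(x) = x + 1
g(x) = x * 2

# Peel one function off the tuple per recursion step so every call site
# is statically known -- no function pointers or dynamic dispatch needed.
apply_all(x, fxs::Tuple{}) = x
function apply_all(x, fxs::Tuple)
    x = first(fxs)(x)
    @cuprintf("%g\n", x)
    return apply_all(x, Base.tail(fxs))
end

function call_fxs!(fxs)
    apply_all(1.0, fxs)
    return nothing
end

@cuda threads = 1 call_fxs!((f, g))

Every call site here is statically known, so no function pointers are needed, but each distinct tuple of functions is a new specialization of call_fxs! and triggers a fresh compile, which is exactly the cost described above.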

Solutions that exist in other languages:

  1. GLSL / OpenCL: the user compiles shaders / kernels at runtime, so they can be spun up in the background relatively quickly. Somehow, this is much faster than doing essentially the same thing in Julia.
  2. CUDA: just use function pointers, as in the example above (though I did have to roll my own AST-level solution for certain workflows).

I have had this discussion over the years with @vchuravy, @jpsamaroo, and @maleadt, but never documented it because I'm apparently the only one actually hitting the issue.

To be honest, I think we are approaching something that might not be fundamentally possible with Julia, but I would like to be able to pass in arbitrary functions to a kernel without forcing recompilation of any kind.
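
To make that recompilation cost concrete, here is a minimal sketch (h1, h2, and kernel are hypothetical names; the comments describe the expected caching behavior, not measured timings):

using CUDA

kernel(f) = (f(1.0f0); nothing)  # trivial kernel that just calls its function argument

h1(x) = x + 1
h2(x) = x * 2

# Each distinct function is its own type, so each one forces a new kernel
# specialization: the first launch per function pays the full compile cost.
@time @cuda kernel(h1)   # compiles a specialization for typeof(h1)
@time @cuda kernel(h1)   # cached, fast
@time @cuda kernel(h2)   # compiles again for typeof(h2)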

I am not sure if it is best to put this here or in GPUCompiler.

Related discussions:

  1. https://forums.developer.nvidia.com/t/consistency-of-functions-pointer/29325/6
  2. KhronosGroup/Vulkan-Docs#2232 ([Roadmap Feedback] Function Pointers with some limitations)

Metadata


Labels: enhancement (New feature or request), upstream (Somebody else's problem.)
