Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc] GPU RPC interface: add return value to rpc_host_call #111288

Merged
merged 3 commits into from
Oct 6, 2024

Conversation

Hardcode84
Copy link
Contributor

No description provided.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me, long term I'm going to remove this function and header and allow the user to define their own, but short-term it should be usable.

We should probably use llu however.

@Hardcode84 Hardcode84 marked this pull request as ready for review October 6, 2024 17:06
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 6, 2024

@llvm/pr-subscribers-libc

Author: Ivan Butygin (Hardcode84)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/111288.diff

6 Files Affected:

  • (modified) libc/newhdrgen/yaml/gpu/rpc.yaml (+1-1)
  • (modified) libc/spec/gpu_ext.td (+1-1)
  • (modified) libc/src/gpu/rpc_host_call.cpp (+7-2)
  • (modified) libc/src/gpu/rpc_host_call.h (+1-1)
  • (modified) libc/utils/gpu/server/rpc_server.cpp (+7-2)
  • (modified) offload/test/libc/host_call.c (+15-5)
diff --git a/libc/newhdrgen/yaml/gpu/rpc.yaml b/libc/newhdrgen/yaml/gpu/rpc.yaml
index 61856bc0c7d692..da4f6afb7856d2 100644
--- a/libc/newhdrgen/yaml/gpu/rpc.yaml
+++ b/libc/newhdrgen/yaml/gpu/rpc.yaml
@@ -16,7 +16,7 @@ functions:
   - name: rpc_host_call
     standards:
       - GPUExtensions
-    return_type: void
+    return_type: unsigned long long
     arguments:
       - type: void *
       - type: void *
diff --git a/libc/spec/gpu_ext.td b/libc/spec/gpu_ext.td
index dce81ff7786203..d99531dc06bcd6 100644
--- a/libc/spec/gpu_ext.td
+++ b/libc/spec/gpu_ext.td
@@ -7,7 +7,7 @@ def GPUExtensions : StandardSpec<"GPUExtensions"> {
     [
         FunctionSpec<
             "rpc_host_call",
-            RetValSpec<VoidType>,
+            RetValSpec<UnsignedLongLongType>,
             [ArgSpec<VoidPtr>, ArgSpec<VoidPtr>, ArgSpec<SizeTType>]
         >,
     ]
diff --git a/libc/src/gpu/rpc_host_call.cpp b/libc/src/gpu/rpc_host_call.cpp
index ca2e331340a6cb..f21fadc319c615 100644
--- a/libc/src/gpu/rpc_host_call.cpp
+++ b/libc/src/gpu/rpc_host_call.cpp
@@ -17,14 +17,19 @@ namespace LIBC_NAMESPACE_DECL {
 
 // This calls the associated function pointer on the RPC server with the given
 // arguments. We expect that the pointer here is a valid pointer on the server.
-LLVM_LIBC_FUNCTION(void, rpc_host_call, (void *fn, void *data, size_t size)) {
+LLVM_LIBC_FUNCTION(unsigned long long, rpc_host_call,
+                   (void *fn, void *data, size_t size)) {
   rpc::Client::Port port = rpc::client.open<RPC_HOST_CALL>();
   port.send_n(data, size);
   port.send([=](rpc::Buffer *buffer) {
     buffer->data[0] = reinterpret_cast<uintptr_t>(fn);
   });
-  port.recv([](rpc::Buffer *) {});
+  unsigned long long ret;
+  port.recv([&](rpc::Buffer *buffer) {
+    ret = static_cast<unsigned long long>(buffer->data[0]);
+  });
   port.close();
+  return ret;
 }
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/gpu/rpc_host_call.h b/libc/src/gpu/rpc_host_call.h
index 7cfea757ccdfd1..861149dead561e 100644
--- a/libc/src/gpu/rpc_host_call.h
+++ b/libc/src/gpu/rpc_host_call.h
@@ -14,7 +14,7 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
-void rpc_host_call(void *fn, void *buffer, size_t size);
+unsigned long long rpc_host_call(void *fn, void *buffer, size_t size);
 
 } // namespace LIBC_NAMESPACE_DECL
 
diff --git a/libc/utils/gpu/server/rpc_server.cpp b/libc/utils/gpu/server/rpc_server.cpp
index 6951c5ae147df7..ca10e67509ae63 100644
--- a/libc/utils/gpu/server/rpc_server.cpp
+++ b/libc/utils/gpu/server/rpc_server.cpp
@@ -319,13 +319,18 @@ rpc_status_t handle_server_impl(
   }
   case RPC_HOST_CALL: {
     uint64_t sizes[lane_size] = {0};
+    unsigned long long results[lane_size] = {0};
     void *args[lane_size] = {nullptr};
     port->recv_n(args, sizes,
                  [&](uint64_t size) { return temp_storage.alloc(size); });
     port->recv([&](rpc::Buffer *buffer, uint32_t id) {
-      reinterpret_cast<void (*)(void *)>(buffer->data[0])(args[id]);
+      using func_ptr_t = unsigned long long (*)(void *);
+      auto func = reinterpret_cast<func_ptr_t>(buffer->data[0]);
+      results[id] = func(args[id]);
+    });
+    port->send([&](rpc::Buffer *buffer, uint32_t id) {
+      buffer->data[0] = static_cast<uint64_t>(results[id]);
     });
-    port->send([&](rpc::Buffer *, uint32_t id) {});
     break;
   }
   case RPC_FEOF: {
diff --git a/offload/test/libc/host_call.c b/offload/test/libc/host_call.c
index 11260cc285765d..61c4e14d5b3881 100644
--- a/offload/test/libc/host_call.c
+++ b/offload/test/libc/host_call.c
@@ -8,14 +8,14 @@
 
 #pragma omp begin declare variant match(device = {kind(gpu)})
 // Extension provided by the 'libc' project.
-void rpc_host_call(void *fn, void *args, size_t size);
+unsigned long long rpc_host_call(void *fn, void *args, size_t size);
 #pragma omp declare target to(rpc_host_call) device_type(nohost)
 #pragma omp end declare variant
 
 #pragma omp begin declare variant match(device = {kind(cpu)})
 // Dummy host implementation to make this work for all targets.
-void rpc_host_call(void *fn, void *args, size_t size) {
-  ((void (*)(void *))fn)(args);
+unsigned long long rpc_host_call(void *fn, void *args, size_t size) {
+  return ((unsigned long long (*)(void *))fn)(args);
 }
 #pragma omp end declare variant
 
@@ -25,17 +25,26 @@ typedef struct args_s {
 } args_t;
 
 // CHECK-DAG: Thread: 0, Block: 0
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 0
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 0, Block: 1
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 1
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 0, Block: 2
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 2
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 0, Block: 3
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 3
-void foo(void *data) {
+// CHECK-DAG: Result: 42
+long long foo(void *data) {
   assert(omp_is_initial_device() && "Not executing on host?");
   args_t *args = (args_t *)data;
   printf("Thread: %d, Block: %d\n", args->thread_id, args->block_id);
+  return 42;
 }
 
 void *fn_ptr = NULL;
@@ -49,6 +58,7 @@ int main() {
 #pragma omp parallel num_threads(2)
   {
     args_t args = {omp_get_thread_num(), omp_get_team_num()};
-    rpc_host_call(fn_ptr, &args, sizeof(args_t));
+    unsigned long long res = rpc_host_call(fn_ptr, &args, sizeof(args_t));
+    printf("Result: %d\n", (int)res);
   }
 }

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 6, 2024

@llvm/pr-subscribers-offload

Author: Ivan Butygin (Hardcode84)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/111288.diff

6 Files Affected:

  • (modified) libc/newhdrgen/yaml/gpu/rpc.yaml (+1-1)
  • (modified) libc/spec/gpu_ext.td (+1-1)
  • (modified) libc/src/gpu/rpc_host_call.cpp (+7-2)
  • (modified) libc/src/gpu/rpc_host_call.h (+1-1)
  • (modified) libc/utils/gpu/server/rpc_server.cpp (+7-2)
  • (modified) offload/test/libc/host_call.c (+15-5)
diff --git a/libc/newhdrgen/yaml/gpu/rpc.yaml b/libc/newhdrgen/yaml/gpu/rpc.yaml
index 61856bc0c7d692..da4f6afb7856d2 100644
--- a/libc/newhdrgen/yaml/gpu/rpc.yaml
+++ b/libc/newhdrgen/yaml/gpu/rpc.yaml
@@ -16,7 +16,7 @@ functions:
   - name: rpc_host_call
     standards:
       - GPUExtensions
-    return_type: void
+    return_type: unsigned long long
     arguments:
       - type: void *
       - type: void *
diff --git a/libc/spec/gpu_ext.td b/libc/spec/gpu_ext.td
index dce81ff7786203..d99531dc06bcd6 100644
--- a/libc/spec/gpu_ext.td
+++ b/libc/spec/gpu_ext.td
@@ -7,7 +7,7 @@ def GPUExtensions : StandardSpec<"GPUExtensions"> {
     [
         FunctionSpec<
             "rpc_host_call",
-            RetValSpec<VoidType>,
+            RetValSpec<UnsignedLongLongType>,
             [ArgSpec<VoidPtr>, ArgSpec<VoidPtr>, ArgSpec<SizeTType>]
         >,
     ]
diff --git a/libc/src/gpu/rpc_host_call.cpp b/libc/src/gpu/rpc_host_call.cpp
index ca2e331340a6cb..f21fadc319c615 100644
--- a/libc/src/gpu/rpc_host_call.cpp
+++ b/libc/src/gpu/rpc_host_call.cpp
@@ -17,14 +17,19 @@ namespace LIBC_NAMESPACE_DECL {
 
 // This calls the associated function pointer on the RPC server with the given
 // arguments. We expect that the pointer here is a valid pointer on the server.
-LLVM_LIBC_FUNCTION(void, rpc_host_call, (void *fn, void *data, size_t size)) {
+LLVM_LIBC_FUNCTION(unsigned long long, rpc_host_call,
+                   (void *fn, void *data, size_t size)) {
   rpc::Client::Port port = rpc::client.open<RPC_HOST_CALL>();
   port.send_n(data, size);
   port.send([=](rpc::Buffer *buffer) {
     buffer->data[0] = reinterpret_cast<uintptr_t>(fn);
   });
-  port.recv([](rpc::Buffer *) {});
+  unsigned long long ret;
+  port.recv([&](rpc::Buffer *buffer) {
+    ret = static_cast<unsigned long long>(buffer->data[0]);
+  });
   port.close();
+  return ret;
 }
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/gpu/rpc_host_call.h b/libc/src/gpu/rpc_host_call.h
index 7cfea757ccdfd1..861149dead561e 100644
--- a/libc/src/gpu/rpc_host_call.h
+++ b/libc/src/gpu/rpc_host_call.h
@@ -14,7 +14,7 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
-void rpc_host_call(void *fn, void *buffer, size_t size);
+unsigned long long rpc_host_call(void *fn, void *buffer, size_t size);
 
 } // namespace LIBC_NAMESPACE_DECL
 
diff --git a/libc/utils/gpu/server/rpc_server.cpp b/libc/utils/gpu/server/rpc_server.cpp
index 6951c5ae147df7..ca10e67509ae63 100644
--- a/libc/utils/gpu/server/rpc_server.cpp
+++ b/libc/utils/gpu/server/rpc_server.cpp
@@ -319,13 +319,18 @@ rpc_status_t handle_server_impl(
   }
   case RPC_HOST_CALL: {
     uint64_t sizes[lane_size] = {0};
+    unsigned long long results[lane_size] = {0};
     void *args[lane_size] = {nullptr};
     port->recv_n(args, sizes,
                  [&](uint64_t size) { return temp_storage.alloc(size); });
     port->recv([&](rpc::Buffer *buffer, uint32_t id) {
-      reinterpret_cast<void (*)(void *)>(buffer->data[0])(args[id]);
+      using func_ptr_t = unsigned long long (*)(void *);
+      auto func = reinterpret_cast<func_ptr_t>(buffer->data[0]);
+      results[id] = func(args[id]);
+    });
+    port->send([&](rpc::Buffer *buffer, uint32_t id) {
+      buffer->data[0] = static_cast<uint64_t>(results[id]);
     });
-    port->send([&](rpc::Buffer *, uint32_t id) {});
     break;
   }
   case RPC_FEOF: {
diff --git a/offload/test/libc/host_call.c b/offload/test/libc/host_call.c
index 11260cc285765d..61c4e14d5b3881 100644
--- a/offload/test/libc/host_call.c
+++ b/offload/test/libc/host_call.c
@@ -8,14 +8,14 @@
 
 #pragma omp begin declare variant match(device = {kind(gpu)})
 // Extension provided by the 'libc' project.
-void rpc_host_call(void *fn, void *args, size_t size);
+unsigned long long rpc_host_call(void *fn, void *args, size_t size);
 #pragma omp declare target to(rpc_host_call) device_type(nohost)
 #pragma omp end declare variant
 
 #pragma omp begin declare variant match(device = {kind(cpu)})
 // Dummy host implementation to make this work for all targets.
-void rpc_host_call(void *fn, void *args, size_t size) {
-  ((void (*)(void *))fn)(args);
+unsigned long long rpc_host_call(void *fn, void *args, size_t size) {
+  return ((unsigned long long (*)(void *))fn)(args);
 }
 #pragma omp end declare variant
 
@@ -25,17 +25,26 @@ typedef struct args_s {
 } args_t;
 
 // CHECK-DAG: Thread: 0, Block: 0
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 0
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 0, Block: 1
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 1
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 0, Block: 2
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 2
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 0, Block: 3
+// CHECK-DAG: Result: 42
 // CHECK-DAG: Thread: 1, Block: 3
-void foo(void *data) {
+// CHECK-DAG: Result: 42
+long long foo(void *data) {
   assert(omp_is_initial_device() && "Not executing on host?");
   args_t *args = (args_t *)data;
   printf("Thread: %d, Block: %d\n", args->thread_id, args->block_id);
+  return 42;
 }
 
 void *fn_ptr = NULL;
@@ -49,6 +58,7 @@ int main() {
 #pragma omp parallel num_threads(2)
   {
     args_t args = {omp_get_thread_num(), omp_get_team_num()};
-    rpc_host_call(fn_ptr, &args, sizeof(args_t));
+    unsigned long long res = rpc_host_call(fn_ptr, &args, sizeof(args_t));
+    printf("Result: %d\n", (int)res);
   }
 }

@Hardcode84 Hardcode84 merged commit 26ca8ef into llvm:main Oct 6, 2024
12 checks passed
@Hardcode84 Hardcode84 deleted the rpc_host_call_ret branch October 6, 2024 17:22
Kyvangka1610 added a commit to Kyvangka1610/llvm-project that referenced this pull request Oct 7, 2024
* commit 'FETCH_HEAD':
  [X86] getIntImmCostInst - pull out repeated Imm.getBitWidth() calls. NFC.
  [X86] Add test coverage for llvm#111323
  [Driver] Use empty multilib file in another test (llvm#111352)
  [clang][OpenMP][test] Use x86_64-linux-gnu triple for test referencing avx512f feature (llvm#111337)
  [doc] Fix Kaleidoscope tutorial chapter 3 code snippet and full listing discrepancies (llvm#111289)
  [Flang][OpenMP] Improve entry block argument creation and binding (llvm#110267)
  [x86] combineMul - handle 0/-1 KnownBits cases before MUL_IMM logic (REAPPLIED)
  [llvm-dis] Fix non-deterministic disassembly across multiple inputs (llvm#110988)
  [lldb][test] TestDataFormatterLibcxxOptionalSimulator.py: change order of ifdefs
  [lldb][test] Add libcxx-simulators test for std::optional (llvm#111133)
  [x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat. (REAPPLIED)
  Reland "[lldb][test] TestDataFormatterLibcxxStringSimulator.py: add new padding layout" (llvm#111123)
  Revert "[x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat."
  update_test_checks: fix a simple regression  (llvm#111347)
  [LegalizeVectorTypes] Always widen fabs (llvm#111298)
  [lsan] Make ReportUnsuspendedThreads return bool also for Fuchsia
  [mlir][vector] Add more tests for ConvertVectorToLLVM (6/n) (llvm#111121)
  [bazel] port 9144fed
  [SystemZ] Remove inlining threshold multiplier. (llvm#106058)
  [LegalizeVectorTypes] When widening don't check for libcalls if promoted (llvm#111297)
  [clang][Driver] Improve multilib custom error reporting (llvm#110804)
  [clang][Driver] Rename "FatalError" key to "Error" in multilib.yaml (llvm#110804)
  [LLVM][Maintainers] Update release managers (llvm#111164)
  [Clang][Driver] Add option to provide path for multilib's YAML config file (llvm#109640)
  [LoopVectorize] Remove redundant code in emitSCEVChecks (llvm#111132)
  [AMDGPU] Only emit SCOPE_SYS global_wb (llvm#110636)
  [ELF] Change Ctx::target to unique_ptr (llvm#111260)
  [ELF] Pass Ctx & to some free functions
  [RISCV] Only disassemble fcvtmod.w.d if the rounding mode is rtz. (llvm#111308)
  [Clang] Remove the special-casing for RequiresExprBodyDecl in BuildResolvedCallExpr() after fd87d76 (llvm#111277)
  [ELF] Pass Ctx & to InputFile
  [clang-format] Add AlignFunctionDeclarations to AlignConsecutiveDeclarations (llvm#108241)
  [AMDGPU] Support preloading hidden kernel arguments (llvm#98861)
  [ELF] Move static nextGroupId isInGroup to LinkerDriver
  [clangd] Add ArgumentLists config option under Completion (llvm#111322)
  [ELF] Pass Ctx & to SyntheticSections
  [ELF] Pass Ctx & to Symbols
  [ELF] Pass Ctx & to Symbols
  [ELF] getRelocTargetVA: pass Ctx and Relocation. NFC
  [clang-tidy] Avoid capturing a local variable in a static lambda in UseRangesCheck (llvm#111282)
  [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (llvm#106431)
  [clangd] Simplify ternary expressions with std::optional::value_or (NFC) (llvm#111309)
  [libc++][format][2/3] Optimizes c-string arguments. (llvm#101805)
  [RISCV] Combine RVBUnary and RVKUnary into classes that are more similar to ALU(W)_r(r/i). NFC (llvm#111279)
  [ELF] Pass Ctx & to InputFiles
  [libc] GPU RPC interface: add return value to `rpc_host_call` (llvm#111288)

Signed-off-by: kyvangka1610 <kyvangka2002@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants