Add GPU tensor support to decoupled API #154
Conversation
87fb69c to e677a70
        "GPU buffers size does not match the provided buffers: ") +
        std::to_string(gpu_tensors.size()) +
        " != " + std::to_string(*gpu_buffer_count));
    return;
Send error?
This is more like an assert and will never be triggered. I can add a comment for it.
src/python.cc
Outdated
// limitation in the legacy CUDA IPC API that doesn't allow getting the
// handle of an exported pointer. If the cuda handle exists, it indicates
// that the cuda shared memory was used and the input is in a single
// buffer. [FIXME] for the case where the input is in cuda shared memory
Newline for [FIXME]
src/python.cc
Outdated
@@ -1443,7 +1460,7 @@ ModelInstanceState::ResponseSendDecoupled(
   ResponseSendMessage* send_message_payload =
       reinterpret_cast<ResponseSendMessage*>(send_message.data_.get());
   std::unique_ptr<PbString> error_message;
-  ScopedDefer _([send_message_payload] {
+  ScopedDefer _([&send_message_payload] {
Why take a reference to a pointer?
No particular reason. I'll revert this change.
Using Go-style defer is cool to see 🙂
724b1ce to f349db5
      ++index;
    }

    // Additional round trip so that the stub can fill the GPU output buffers.
Why is there another round trip? The backend process signals back that there is a CUDA IPC handle and waits for the stub process to copy the data into the CUDA IPC handle?
Do I summarize the workflow correctly:
- stub requests an output buffer (and passes the buffer via shared memory at the same time if it is a CPU tensor)
- backend acquires the output buffer, copies if it is a CPU tensor, otherwise passes the CUDA IPC handle back to the stub
- (only for GPU tensors) stub then copies the GPU tensor into the CUDA IPC handle
Exactly. The third bullet point is what requires the round trip.