
Add GPU tensor support to decoupled API #154


Merged: 5 commits merged into main from imant-gpu-tensor on May 21, 2022
Conversation

@Tabrizian (Member) commented May 11, 2022

  • Add support for GPU outputs
  • Add support for GPU inputs
  • Add backend utility to collect input buffers into a single buffer (see the sketch after this list)
  • Add testing
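
A minimal sketch of what the buffer-collection utility in the third bullet might look like, assuming hypothetical names (BufferSegment, CollectInputBuffers); the PR's actual helper may differ in naming and error handling:

#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

// Hypothetical sketch: gather scattered GPU input buffers into one
// contiguous device allocation.
struct BufferSegment {
  const void* data;  // device pointer to one input chunk
  size_t byte_size;  // size of that chunk in bytes
};

cudaError_t
CollectInputBuffers(
    const std::vector<BufferSegment>& segments, void** collected,
    size_t* total_byte_size)
{
  *total_byte_size = 0;
  for (const auto& segment : segments) {
    *total_byte_size += segment.byte_size;
  }

  // Allocate one destination buffer large enough for all segments.
  cudaError_t err = cudaMalloc(collected, *total_byte_size);
  if (err != cudaSuccess) {
    return err;
  }

  // Copy each segment into the destination at a running offset.
  size_t offset = 0;
  for (const auto& segment : segments) {
    err = cudaMemcpy(
        static_cast<char*>(*collected) + offset, segment.data,
        segment.byte_size, cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) {
      return err;
    }
    offset += segment.byte_size;
  }
  return cudaSuccess;
}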

@Tabrizian Tabrizian marked this pull request as ready for review May 13, 2022 21:38
@Tabrizian Tabrizian requested review from tanmayv25 and krishung5 May 13, 2022 21:38
"GPU buffers size does not match the provided buffers: ") +
std::to_string(gpu_tensors.size()) +
" != " + std::to_string(*gpu_buffer_count));
return;
Contributor commented:
Send error?

Tabrizian (Member, Author) replied:
This is more like an assert and will never be triggered. I can add a comment for it.
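
For reference, the assert-style reading of that check could be sketched as below (hypothetical name VerifyGpuBufferCount; the PR itself keeps the explicit error path with an explanatory comment):

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: the two counts are an internal invariant of the protocol,
// so a debug-build assertion documents that the branch is unreachable.
void
VerifyGpuBufferCount(
    const std::vector<void*>& gpu_tensors, size_t gpu_buffer_count)
{
  assert(gpu_tensors.size() == gpu_buffer_count);
  (void)gpu_tensors;  // silence unused warnings in release builds
  (void)gpu_buffer_count;
}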

src/python.cc Outdated
// limitation in the legacy CUDA IPC API that doesn't allow getting the
// handle of an exported pointer. If the cuda handle exists, it indicates
// that the cuda shared memory was used and the input is in a single
// buffer. [FIXME] for the case where the input is in cuda shared memory
Contributor commented:
Newline for [FIXME]
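
For context on the legacy CUDA IPC API referenced in that comment, the export/import flow looks roughly like the sketch below (illustrative only; error handling omitted). A handle is obtained at export time with cudaIpcGetMemHandle; the limitation noted above is that there is no legacy call to recover the handle later from an already-exported pointer.

#include <cuda_runtime.h>

// Exporting process: create an IPC handle for a cudaMalloc allocation.
void*
AllocateAndExport(size_t byte_size, cudaIpcMemHandle_t* handle)
{
  void* device_ptr = nullptr;
  cudaMalloc(&device_ptr, byte_size);
  cudaIpcGetMemHandle(handle, device_ptr);
  return device_ptr;
}

// Importing process: map the exported allocation into this address space.
void*
ImportFromHandle(cudaIpcMemHandle_t handle)
{
  void* device_ptr = nullptr;
  cudaIpcOpenMemHandle(&device_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
  return device_ptr;
}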

src/python.cc Outdated
@@ -1443,7 +1460,7 @@ ModelInstanceState::ResponseSendDecoupled(
ResponseSendMessage* send_message_payload =
reinterpret_cast<ResponseSendMessage*>(send_message.data_.get());
std::unique_ptr<PbString> error_message;
- ScopedDefer _([send_message_payload] {
+ ScopedDefer _([&send_message_payload] {
Contributor commented:
Why take a reference to a pointer?

Tabrizian (Member, Author) replied:
No particular reason. I'll revert this change.

@Tabrizian Tabrizian requested a review from tanmayv25 May 18, 2022 19:16
@rmccorm4 (Contributor) left a comment:

Using Go-style defer is cool to see 🙂
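
For readers unfamiliar with the pattern, a Go-style defer helper can be written as a small RAII class; a minimal sketch (the repo's actual ScopedDefer may differ):

#include <functional>
#include <iostream>
#include <utility>

// Minimal Go-style defer: the callable runs when the object goes out of
// scope, mirroring Go's `defer` statement.
class ScopedDefer {
 public:
  explicit ScopedDefer(std::function<void()> task) : task_(std::move(task)) {}
  ~ScopedDefer() { task_(); }

  // Non-copyable so the deferred task runs exactly once.
  ScopedDefer(const ScopedDefer&) = delete;
  ScopedDefer& operator=(const ScopedDefer&) = delete;

 private:
  std::function<void()> task_;
};

int
main()
{
  ScopedDefer _([] { std::cout << "deferred: runs on scope exit\n"; });
  std::cout << "body: runs first\n";
  return 0;
}

On the capture question above: a lambda that captures send_message_payload by value copies the pointer itself, so capturing it by reference only matters if the pointer variable is reassigned before the deferred task runs, which is why capturing by value suffices here.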

++index;
}

// Additional round trip so that the stub can fill the GPU output buffers.
Contributor commented:
Why is there another round trip? Does the backend process signal back that there is a CUDA IPC handle and then wait for the stub process to copy the data into the buffer behind that handle?

Do I summarize the workflow correctly?

  1. The stub requests output buffers (passing the data via shared memory at the same time if it is a CPU tensor).
  2. The backend acquires the output buffers: it copies the data for CPU tensors; otherwise it passes the CUDA IPC handle back to the stub.
  3. (Only for GPU tensors) the stub then copies the GPU tensor into the buffer behind the CUDA IPC handle.

@Tabrizian (Member, Author) replied May 20, 2022:

Exactly. The third bullet point is what requires the round trip.
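
In isolation, that third step, the one forcing the extra round trip, could look like the following sketch on the stub side (hypothetical helper name FillBackendOutputBuffer; the message plumbing is omitted):

#include <cuda_runtime.h>

// Hypothetical sketch of step 3: the stub opens the CUDA IPC handle sent
// back by the backend and fills the backend-owned output buffer with the
// GPU tensor's data.
void
FillBackendOutputBuffer(
    cudaIpcMemHandle_t handle, const void* gpu_tensor_data, size_t byte_size)
{
  void* output_buffer = nullptr;
  // Map the backend-allocated output buffer into the stub process.
  cudaIpcOpenMemHandle(&output_buffer, handle, cudaIpcMemLazyEnablePeerAccess);
  // Device-to-device copy of the tensor into the backend's buffer.
  cudaMemcpy(
      output_buffer, gpu_tensor_data, byte_size, cudaMemcpyDeviceToDevice);
  // Unmap; the allocation itself remains owned by the backend process.
  cudaIpcCloseMemHandle(output_buffer);
}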

@Tabrizian Tabrizian merged commit 92245a7 into main May 21, 2022
@Tabrizian Tabrizian deleted the imant-gpu-tensor branch May 24, 2022 00:29