Add GPU tensor support to decoupled API #154
Conversation
87fb69c to e677a70
        "GPU buffers size does not match the provided buffers: ") +
        std::to_string(gpu_tensors.size()) +
        " != " + std::to_string(*gpu_buffer_count));
    return;
Send error?
This is more like an assert and will never be triggered. I can add a comment for it.
src/python.cc
Outdated
// limitation in the legacy CUDA IPC API that doesn't allow getting the
// handle of an exported pointer. If the cuda handle exists, it indicates
// that the cuda shared memory was used and the input is in a single
// buffer. [FIXME] for the case where the input is in cuda shared memory
Newline for [FIXME]
src/python.cc
Outdated
@@ -1443,7 +1460,7 @@ ModelInstanceState::ResponseSendDecoupled(
   ResponseSendMessage* send_message_payload =
       reinterpret_cast<ResponseSendMessage*>(send_message.data_.get());
   std::unique_ptr<PbString> error_message;
-  ScopedDefer _([send_message_payload] {
+  ScopedDefer _([&send_message_payload] {
Why take a reference to a pointer?
No particular reason. I'll revert this change.
Using Go-style defer is cool to see 🙂
724b1ce to f349db5
      ++index;
    }

    // Additional round trip so that the stub can fill the GPU output buffers.
Why is there another round trip? The backend process signals back that there is a CUDA IPC handle and waits for the stub process to copy the data into the CUDA IPC handle?
Do I summarize the workflow correctly:
- stub requests an output buffer (and passes the buffer via shared memory at the same time if it is a CPU tensor)
- backend acquires the output buffer, copies if it is a CPU tensor, otherwise passes the CUDA IPC handle back to the stub
- (only for GPU tensors) stub then copies the GPU tensor into the CUDA IPC handle
Exactly. The third bullet point is what requires the round trip.