[WIP]Implement DeviceContextManager and Ensure only one CUDA stream existing in new framework at now #4218
Conversation
  * @note CopyFrom supports CPU <-> GPU, GPU <-> GPU.
  */
 template <typename T>
-inline void CopyFrom(const Tensor& src, const platform::Place& dst_place);
+inline void CopyFrom(const Tensor& src, const platform::Place& dst_place,
+                     bool is_sync = false);
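For reference, a hypothetical call site for the proposed overload could look like the sketch below; the variable names and the float element type are illustrative, not taken from this PR.

// Hypothetical call site (illustrative only): copy `src` into `dst` on
// `dst_place` and block until the copy has finished by passing is_sync = true.
Tensor dst;
dst.CopyFrom<float>(src, dst_place, /*is_sync=*/true);
// The default is_sync = false would leave the copy asynchronous on the
// device's stream.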
I am not sure, but the easiest way might be for CopyFrom to take a const DeviceContext&?
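A sketch of what that alternative signature might look like, assuming the existing platform::DeviceContext type (this is not code from the PR):

// Hypothetical alternative: the caller supplies the DeviceContext, and the
// copy is issued on (and can be synchronized against) that context.
template <typename T>
inline void CopyFrom(const Tensor& src, const platform::Place& dst_place,
                     const platform::DeviceContext& ctx);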
It's hard for operator developers to pass a DeviceContext to the CopyFrom method. They would have to pass the right DeviceContext depending on the src place (CPU or GPU) and the dst place (CPU or GPU), as in the sketch below.
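To make that concern concrete, every call site would need roughly the following dispatch; is_gpu_place and the Get*DeviceContext helpers are assumptions for illustration, not existing framework code.

// Rough sketch of the per-call-site burden described above.
const platform::DeviceContext* ctx = nullptr;
if (platform::is_gpu_place(src.place()) || platform::is_gpu_place(dst_place)) {
  ctx = GetCUDADeviceContext();  // hypothetical helper
} else {
  ctx = GetCPUDeviceContext();   // hypothetical helper
}
dst.CopyFrom<float>(src, dst_place, *ctx);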
I do not think DeviceContextMgr is necessary. Maybe we could just pass a DeviceContext to Tensor::CopyFrom and make it async?
Since you haven't replied for a long time, we have closed this issue/PR.
Fix #3796