Utilize StreamSafeCUDAAllocator to support fast GC in new executor #37642
Conversation
Thanks for your contribution!
We discussed one H2D scenario offline; logic needs to be added to support it.
Great work~~
@@ -693,6 +693,29 @@ const std::vector<size_t>& Instruction::GCCheckVars() const {
  return gc_check_var_list_;
}

void Instruction::InferNeedStreamSyncVars() {
  auto ParseIndexFromMap = [](
      std::map<std::string, std::vector<int>>& index_map,
Suggested change:
-      std::map<std::string, std::vector<int>>& index_map,
+      const std::map<std::string, std::vector<int>>& index_map,
Thanks. This code has been deleted, please check the new commit.
The code has been modified; the new design is as follows:
LGTM
paddle/fluid/framework/tensor.cc (Outdated)
@@ -235,5 +237,13 @@ void Tensor::ResetHolderWithType(std::shared_ptr<memory::Allocation> holder,

void Tensor::set_type(const proto::VarType::Type& type) { type_ = type; }

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
In core data structures shared across devices, using macros to control the type or number of class/struct members is in principle forbidden. If it is truly necessary, it must be discussed before merging. For the one added in a previous commit, we are discussing how to remove it.
Macros used in the Tensor/LoDTensor declaration will severely block work progress on Pten; please modify.
Done, thanks.
paddle/fluid/framework/tensor.h (Outdated)
@@ -290,6 +290,10 @@ class Tensor {
  return *inplace_version_counter_;
}

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
Same as above, this needs to be modified.
The code has been modified; the macro guards have been removed from the Tensor class.
@@ -151,11 +151,12 @@ class AllocatorFacadePrivate {
}
If possible, please use unique_ptr<Allocation> instead of shared_ptr<Allocation> in this PR (#37978).
Sure. A follow-up PR will replace the shared_ptr with unique_ptr in the base_ptr_test.cu unit test.
paddle/fluid/framework/tensor.cc (Outdated)
@@ -235,5 +237,13 @@ void Tensor::ResetHolderWithType(std::shared_ptr<memory::Allocation> holder,

void Tensor::set_type(const proto::VarType::Type& type) { type_ = type; }

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
void Tensor::RecordStream(const gpuStream_t& stream) {
Adding backend/device-specific interfaces to backend-agnostic data structures such as Tensor is strongly discouraged, and will block work related to Pten and the Dygraph refactor.
Taking Tensor::RecordStream as an example: when adding this interface to Tensor, the principle is that the interface has to exist in all cases (regardless of the macro, the backend, or the way you compile), but you may use macros inside a source file to control how the interface is implemented for different backends.
The macros have been deleted, please check the new commit.
Macros used in the Tensor/LoDTensor declaration will severely block work progress on Pten; please modify.
LGTM
LGTM
PR types: New features
PR changes: Others
Describe:
We have implemented a StreamSafeCUDAAllocator in #37290; in this PR we do the following things to support fast GC in the new executor: