implement DeviceContext #2709
paddle/platform/device_context.h
Outdated
Eigen::DefaultDevice* eigen_device_{nullptr};
};

#ifndef PADDLE_ONLY_CPU
Adding PADDLE_ONLY_CPU here causes a problem: any code that calls DeviceGuard or CudaDeviceContext will also need to be guarded by PADDLE_ONLY_CPU.
Code that calls DeviceGuard or CudaDeviceContext must be built with WITH_GPU set to 1.
Yes, this raises the question of how to organize our CPU/GPU code clearly.
We can use a macro, or provide a fake stub header file.
I think there are a few things to consider when dealing with mixed GPU and CPU code.
- Unless it is necessary, avoid putting GPU and CPU code in the same file. That way, no extra macro is needed to separate the code. (I think context.h could contain only the CPU context, and cuda_context.h the GPU context.)
- Do not use PADDLE_ONLY_CPU; replace it with PADDLE_WITH_CUDA. CPU code should be the default, and CUDA code should be added under PADDLE_WITH_CUDA when it is needed.
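A minimal sketch of the layout suggested here, assuming the CPU context is always compiled and the CUDA context is only compiled when the build defines PADDLE_WITH_CUDA. The class bodies and the AvailableContexts helper are hypothetical stand-ins for illustration, not the code in this PR.

```cpp
#include <string>
#include <vector>

// Stand-in base class; the real DeviceContext in this PR has other members.
class DeviceContext {
 public:
  virtual ~DeviceContext() {}
  virtual std::string Name() const = 0;
};

// CPU code is the default and needs no macro.
class CPUDeviceContext : public DeviceContext {
 public:
  std::string Name() const override { return "CPU"; }
};

#ifdef PADDLE_WITH_CUDA
// Only compiled when the build defines PADDLE_WITH_CUDA.
// With the file split proposed in the comment, this class would live in
// its own header (for example cuda_context.h) instead of sharing a file.
class CUDADeviceContext : public DeviceContext {
 public:
  std::string Name() const override { return "CUDA"; }
};
#endif

// Hypothetical helper: lists the context kinds available in this build.
inline std::vector<std::string> AvailableContexts() {
  std::vector<std::string> names;
  names.push_back(CPUDeviceContext().Name());
#ifdef PADDLE_WITH_CUDA
  names.push_back(CUDADeviceContext().Name());
#endif
  return names;
}
```

Callers that only need the CPU path never see a macro; only the code that touches the CUDA context is wrapped.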
I think this suggestion is useful.
Let's merge this PR for now. I will consider the design of DeviceContext together with the Operator interface, and I will follow @hedaoyuan's advice later.
paddle/platform/device_context.h
Outdated
Eigen::DefaultDevice eigen_device() {
  if (!eigen_device_) {
    eigen_device_ = new Eigen::DefaultDevice();
  }
Where is Eigen::DefaultDevice used in our design? I found Eigen::DefaultDevice in the directory eigen/unsupported/Eigen/CXX11/src/Tensor, but I could not find its usage in Eigen's Tensor documentation.
Eigen::DefaultDevice is defined in unsupported/Eigen/CXX11/src/Tensor/TensorDeviceDefault.h.
For the usage of Eigen::DefaultDevice, please refer to https://github.com/QiJune/RefEigen/blob/master/main.cu
paddle/platform/device_context.h
Outdated
paddle::platform::throw_on_error(cudaStreamCreate(&stream_),
                                 "cudaStreamCreate failed");
eigen_stream_ = new Eigen::CudaStreamDevice(&stream_);
eigen_device_ = new Eigen::GpuDevice(eigen_stream_);
I'm not sure we will use the CUDA implementation in Eigen. If we decide not to use it, I think eigen_stream_ and eigen_device_ can be removed.
If we do not use the CUDA implementation in Eigen, we will have to write CUDA kernels for every operator, just like Caffe2.
TensorFlow uses the CUDA implementation in Eigen, and @hedaoyuan once mentioned that the efficiency of Eigen's expression templates on GPU is acceptable.
So we may want to discuss this offline.
paddle/platform/device_context.h
Outdated
#include "paddle/framework/enforce.h"
#include "paddle/platform/dynload/cublas.h"
#include "paddle/platform/dynload/cudnn.h"
#include "paddle/platform/dynload/curand.h"
The three lines above should also be placed between #ifndef PADDLE_ONLY_CPU and #endif.
Yes, logically the three header files above should be included inside the macro guard.
paddle/platform/device_context.h
Outdated

#ifndef PADDLE_ONLY_CPU
#include "paddle/platform/cuda.h"
#define EIGEN_USE_GPU
Where is EIGEN_USE_GPU used?
EIGEN_USE_GPU is used by the Eigen library. If we want to use Eigen's Tensor expressions on the GPU, we have to define this macro.
I like that idea of a super simple engine.
paddle/platform/device_context.h
Outdated
  virtual ~DeviceContext() {}
};

class CpuDeviceContext : public DeviceContext {
Cpu => CPU
Done
paddle/platform/device_context.h
Outdated
  GPUPlace previous_;
};

class CudaDeviceContext : public DeviceContext {
Cuda => CUDA
Done
baed7b1
to
39679d5
paddle/platform/device_context.h
Outdated
class CPUDeviceContext : public DeviceContext {};

#ifndef PADDLE_ONLY_CPU
class DeviceGuard {
Shouldn't this be GPUPlaceGuard rather than DeviceGuard, since it takes a GPUPlace as its argument?
It actually guards GPUPlace.device. Since we pass a GPUPlace, maybe GPUPlaceGuard is a clearer name.
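For readers unfamiliar with the pattern under discussion, here is a rough sketch of such a guard, assuming it saves the current device on construction and restores it on destruction. The fake_cuda functions are hypothetical stand-ins for cudaGetDevice and cudaSetDevice so that the sketch builds without CUDA, and this GPUPlace is a simplified stand-in for the real type.

```cpp
// Hypothetical stand-ins for cudaGetDevice/cudaSetDevice (no CUDA needed).
namespace fake_cuda {
inline int& CurrentDevice() {
  static int device = 0;
  return device;
}
inline int GetDevice() { return CurrentDevice(); }
inline void SetDevice(int id) { CurrentDevice() = id; }
}  // namespace fake_cuda

// Simplified stand-in for the GPUPlace type referenced in the review.
struct GPUPlace {
  explicit GPUPlace(int d = 0) : device(d) {}
  int device;
};

// RAII guard: switches to the requested device for the guard's lifetime
// and restores the previously active device when it goes out of scope.
class GPUPlaceGuard {
 public:
  explicit GPUPlaceGuard(GPUPlace new_place)
      : previous_(fake_cuda::GetDevice()) {
    if (previous_.device != new_place.device) {
      fake_cuda::SetDevice(new_place.device);
    }
  }
  ~GPUPlaceGuard() { fake_cuda::SetDevice(previous_.device); }

  // Copying a guard would restore the device twice, so forbid it.
  GPUPlaceGuard(const GPUPlaceGuard&) = delete;
  GPUPlaceGuard& operator=(const GPUPlaceGuard&) = delete;

 private:
  GPUPlace previous_;
};

// Returns the device that is active while a guard for `id` is alive.
inline int DeviceInsideGuard(int id) {
  GPUPlaceGuard guard{GPUPlace(id)};
  return fake_cuda::GetDevice();
}
```

Because the guard's constructor takes a GPUPlace and its only job is to switch GPUPlace.device, the name GPUPlaceGuard describes it more precisely than DeviceGuard.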
"cudaStreamSynchronize failed");
}

cudaStream_t stream() { return stream_; }
Lack of const on all methods?
But maybe it is not important, because every device context is passed to Op::Run as a mutable pointer.
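To illustrate what const accessors could look like, here is a small sketch under the assumption that plain getters become const and that lazily created handles are marked mutable. FakeCUDADeviceContext and its int-typed stream and handle are hypothetical stand-ins for cudaStream_t and cublasHandle_t, not the PR's class.

```cpp
// Hypothetical stand-in: ints replace cudaStream_t and cublasHandle_t.
class FakeCUDADeviceContext {
 public:
  explicit FakeCUDADeviceContext(int stream) : stream_(stream) {}

  // A plain getter only reads state, so it can be const.
  int stream() const { return stream_; }

  // A lazily initialized handle can still be exposed through a const
  // method if the cached member is declared mutable.
  int cublas_handle() const {
    if (blas_handle_ == 0) {
      blas_handle_ = 42;  // placeholder for a real cublasCreate call
    }
    return blas_handle_;
  }

 private:
  int stream_;
  mutable int blas_handle_{0};
};

// With const accessors, the context can be passed by const reference.
inline int StreamOf(const FakeCUDADeviceContext& ctx) { return ctx.stream(); }
```

As the comment notes, this matters less if Op::Run always receives a mutable pointer; the benefit only appears once some caller holds a const context.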
paddle/platform/device_context.h
Outdated
cublasHandle_t cublas_handle() {
  if (!blas_handle_) {
    DeviceGuard guard(gpu_place_);
    PADDLE_ENFORCE(paddle::platform::dynload::cublasCreate(&blas_handle_) ==
The namespace qualification is far too long.
Maybe we can add using namespace paddle::platform; in this class's private section, like
class GPUDeviceContext {
 private:
  using namespace paddle::platform;  // only use namespace in this class.
};
Or maybe an alias is better, like
using dynload = paddle::platform::dynload;
It seems that we cannot add a namespace alias or using namespace inside a class.
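That matches the language rules: a using-directive and a namespace alias are allowed at namespace scope and block scope, but not at class scope. Note also that a namespace alias is spelled namespace x = ...; rather than using x = ...; (the latter declares a type alias). A minimal sketch follows, in which cublasCreateStub is a hypothetical placeholder for the real dynload wrappers:

```cpp
namespace paddle {
namespace platform {
namespace dynload {
// Hypothetical placeholder for a dynload wrapper such as cublasCreate.
inline int cublasCreateStub() { return 0; }
}  // namespace dynload
}  // namespace platform
}  // namespace paddle

// Namespace scope: a namespace alias is allowed here. In a header this
// would leak into every includer, so a .cc file is a safer home for it.
namespace dynload = paddle::platform::dynload;

class GPUDeviceContext {
 public:
  // Neither of these compiles at class scope:
  //   using namespace paddle::platform;
  //   namespace dl = paddle::platform::dynload;

  int CreateHandle() {
    // Block scope: the alias is allowed and stays local to this function.
    namespace dl = paddle::platform::dynload;
    return dl::cublasCreateStub();
  }
};
```

A function-local alias keeps the call sites short without exposing the shorter name outside the method body.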
for (int i = 0; i < count; i++) {
  paddle::platform::CUDADeviceContext* device_context =
      new paddle::platform::CUDADeviceContext(i);
  __attribute__((unused)) Eigen::GpuDevice gpu_device =
Do not use the unused attribute, because it may fail on some compilers.
Maybe the return value does not need to be stored. What about
ASSERT_NE(nullptr, device_context->eigen_device());
Done
e3f0db4
to
c7bdbdb
LGTM. We can separate the CPU code and GPU code later.
#2607
#2648