Single release for PaddlePaddle CPU Image #1607

Closed
wants to merge 6 commits into from
41 changes: 41 additions & 0 deletions paddle/cuda/Readme.md
@@ -0,0 +1,41 @@
## Single release for PaddlePaddle CPU Image

### Background

Currently, PaddlePaddle supports AVX and SSE3 intrinsics (extensions to the x86 instruction set architecture). When PaddlePaddle is compiled from source, CMake detects which SIMD instruction sets the host supports and automatically enables the appropriate one. Developers and users can also set the CMake option `WITH_AVX=ON/OFF` manually before compiling. This works well for local usage.


### Problem Involved

Nonetheless, from a deployment perspective, there are some drawbacks:

1. The online runtime environment is complex. If an older node does not support AVX or another
required instruction set, PaddlePaddle crashes and reports `illegal instruction is used`. This problem
appears frequently in cluster environments such as Kubernetes. **It must be addressed before PaddlePaddle on Cloud.**

2. Each time a new version is ready to ship, we have to release several variants to users, for example `no-avx-cpu`, `avx-cpu`, `no-avx-gpu`, and `avx-gpu`. Users should not have to care about these details, yet they are forced to choose among them.


### How to Address it?

To address this issue, there are three primary components:

1. [Done] Runtime Check:

We can use CPUID information to check SIMD support at runtime. This functionality has already been merged into
the current develop branch. For full details, see [CpuId.cpp](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/CpuId.cpp) and [CpuId.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/CpuId.h).
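
As a rough illustration of what a runtime SIMD check looks like, here is a minimal sketch. It is not the code in `CpuId.cpp`; it relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` instead of issuing CPUID directly, and the names `SimdSupport` and `DetectSimd` are hypothetical.

```cpp
// Hypothetical result type; the real flags are defined in CpuId.h.
struct SimdSupport {
  bool sse3;
  bool avx;
};

// Ask the running CPU which SIMD extensions it supports.
// On non-x86 targets (or other compilers) this reports none.
inline SimdSupport DetectSimd() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // must run before __builtin_cpu_supports
  return SimdSupport{__builtin_cpu_supports("sse3") != 0,
                     __builtin_cpu_supports("avx") != 0};
#else
  return SimdSupport{false, false};
#endif
}
```

A caller would typically run `DetectSimd()` once at startup and cache the result, since the answer cannot change while the process is running.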


2. [Pending] Adjust `cuda` Directory.

Since the current `cuda` directory contains heterogeneous source code (CPU and GPU), we want to refactor it. For simplicity, code for different SIMD intrinsics will live in different directories. We need to modify the CMake files to support this solution.

Contributor

The code in the cuda directory does need to be reorganized, but "different simd intrinsics will be inside the different directories" would add several sse/avx directories, which does not seem ideal: each directory might contain only a few files. Also, I think keeping code with the same functionality together matters more than keeping code for the same instruction set together.

Contributor

Some open-source libraries wrap the SIMD intrinsics in a thin layer [fftw] and build all higher-level functionality on top of that wrapper. After all, most functionality implemented with intrinsics differs only in the instructions used, and the commonly used instructions are just load, store, add, and mul.
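
The wrapper-layer idea raised in the review comments can be sketched roughly as follows. This is only an illustration under assumptions: the names `ScalarOps`, `SseOps`, and `MulAdd` are hypothetical and do not come from PaddlePaddle or fftw. Each backend exposes the same four operations (load, store, add, mul), and the kernel is written once against that interface.

```cpp
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Scalar backend: a "vector" is a single float.
struct ScalarOps {
  using Vec = float;
  static constexpr std::size_t kWidth = 1;
  static Vec Load(const float* p) { return *p; }
  static void Store(float* p, Vec v) { *p = v; }
  static Vec Add(Vec a, Vec b) { return a + b; }
  static Vec Mul(Vec a, Vec b) { return a * b; }
};

#if defined(__SSE__)
// SSE backend: a vector holds four floats; unaligned loads and stores.
struct SseOps {
  using Vec = __m128;
  static constexpr std::size_t kWidth = 4;
  static Vec Load(const float* p) { return _mm_loadu_ps(p); }
  static void Store(float* p, Vec v) { _mm_storeu_ps(p, v); }
  static Vec Add(Vec a, Vec b) { return _mm_add_ps(a, b); }
  static Vec Mul(Vec a, Vec b) { return _mm_mul_ps(a, b); }
};
#endif

// out[i] = a[i] * b[i] + c[i], written once for every backend.
template <typename Ops>
void MulAdd(const float* a, const float* b, const float* c, float* out,
            std::size_t n) {
  std::size_t i = 0;
  for (; i + Ops::kWidth <= n; i += Ops::kWidth) {
    typename Ops::Vec prod = Ops::Mul(Ops::Load(a + i), Ops::Load(b + i));
    Ops::Store(out + i, Ops::Add(prod, Ops::Load(c + i)));
  }
  // Remaining elements that do not fill a whole vector.
  for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}
```

With this layout, code for the same functionality stays in one place, and supporting another instruction set would mean adding one more backend struct instead of another directory of kernels.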

3. [Pending] Modify CMake files.

Code for different SIMD intrinsics will live in different directories, so we need to modify the CMake files to support this layout. Each directory is compiled with its own options (`-mavx` or `-msse`) to generate the corresponding binaries. Then, at runtime, the SIMD flags `HAS_AVX` and `HAS_SSE` are used to automatically detect and select the supported branch (intrinsics) to execute.
Contributor

How this part of the logic should be done needs another look; for a simpler approach, see #1634 (comment).
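
The runtime selection described in step 3 can be sketched in a single file. Note the substitution: the proposal compiles each directory with its own flag (`-mavx` or `-msse`), whereas this sketch uses the GCC/Clang `target("avx")` function attribute so that it fits in one translation unit. The names `AddScalar`, `AddAvx`, `SelectAdd`, and `SKETCH_X86` are hypothetical, and the CPU check uses a compiler builtin in place of the `HAS_AVX` flag.

```cpp
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Portable fallback that is safe on any CPU.
inline void AddScalar(const float* a, const float* b, float* out,
                      std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if SKETCH_X86
// AVX variant. The target attribute stands in for building this function
// in a separate source file compiled with -mavx.
__attribute__((target("avx")))
inline void AddAvx(const float* a, const float* b, float* out,
                   std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m256 va = _mm256_loadu_ps(a + i);
    __m256 vb = _mm256_loadu_ps(b + i);
    _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
  }
  for (; i < n; ++i) out[i] = a[i] + b[i];  // tail elements
}
#endif

// Choose an implementation based on what the running CPU supports.
inline AddFn SelectAdd() {
#if SKETCH_X86
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) return AddAvx;
#endif
  return AddScalar;
}
```

Because the AVX code is reached only through the pointer returned by `SelectAdd()`, a node without AVX never executes an AVX instruction, which is what lets one binary serve both kinds of hosts.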



### Conclusion

This method could fix the release and deployment problems described above.
Contributor

So the purpose of this design is to address the runtime confusion caused by inconsistent environments between release and deployment. Single release for PaddlePaddle CPU Image is one solution; #1634 (comment) is another. The two approaches need to be compared further.
In addition, my suggestion is to consider whether a Single release for PaddlePaddle CPU Image is really necessary. If AVX2/AVX512 code is introduced later, can the current design support it (3. [Pending] Modify CMake files only mentions -mavx/-msse)?