Add Fluid Compiler design doc #7178
Conversation
## Native Code Generator

For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:
In most cases, a user may need a library such as a `.so` or `.a` to use in some other program, such as a face recognition app. So do we also need to be able to generate these libraries at the same time?
Good point!
I prefer that our transpiler generates source code only, not reusing some .a/.so files, so as to simplify the building process.
To be precise, if the transpiler generates source code only, the general workflow would be:

```
PaddlePaddle
operator/kernel
source code
     |
     v
ProgramDesc -> transpiler -> .cc file -> nvcc -> binary file
```

Otherwise, if we try to reuse the .a/.so files:

```
PaddlePaddle
operator/kernel -> nvcc(1) -> .a/.so
source code                      |
     |                           |
     v                           v
ProgramDesc -> transpiler -> .cc file -> nvcc(2) -> binary file
```

This is error-prone because there is a chance we are using different compilers for nvcc(1) and nvcc(2), or we build in different environments with mutually exclusive configurations.
It is true that the generated code might depend on third-party libraries, so our transpiler might also need to generate build commands, including dependencies.
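To make that last point concrete, here is a minimal, self-contained C++ sketch of the idea that the transpiler could emit the build command alongside the generated source. The function name, file names, and flags below are purely illustrative assumptions, not part of the actual design or of Paddle's code base:

```c++
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: given the transpiled .cc file and the operator/kernel
// sources copied into the generated project, emit a single nvcc invocation.
// Compiling everything with one compiler avoids the nvcc(1)/nvcc(2) mismatch
// discussed above.
std::string EmitBuildCommand(const std::string& generated_cc,
                             const std::vector<std::string>& op_sources) {
  std::string cmd = "nvcc -o program " + generated_cc;
  for (const std::string& src : op_sources) cmd += " " + src;
  return cmd;
}

int main() {
  // Example output: nvcc -o program main.cc read_op.cc read_op.cu
  std::cout << EmitBuildCommand("main.cc", {"read_op.cc", "read_op.cu"}) << "\n";
  return 0;
}
```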
```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operator::Read r(&t, ...);
```
Should we copy the `operator::Read` files (`read_op.cc`/`read_op.cu`/`read_op.h`) to the generated project, or just `read_op.a`/`.so`?
I think we need to copy the source files instead of reusing .a/.so files due to reasons in #7178 (comment)
protobuf message as an interpreter. This article describes the Fluid
compiler.


Do we need `Executor` in this graph? It seems unrelated to this PR.
the following

```protobuf
message ProgramDesc {
```
From my understanding, the ProgramDesc is an intermediate representation (IR). Currently we have Python as a frontend that generates the IR, and this PR discusses a C++ code backend.
I think having Python as a frontend is a huge pain. In my opinion, the benefits of Python in the machine learning field are:
- libraries such as numpy;
- Python's native control-flow primitives, such as `for` and `if`, to control the training process, which researchers are familiar with.

In our case we benefit from neither of these two points:
- the transpiled code cannot use numpy;
- the transpiled code cannot use Python's native control-flow primitives.

And we are still trapped in the Python grammar.
I think a better way is to invent our own language.
I agree and I believe that a new language is the future.
Thanks for the examples. I have a question concerning the combining of execution and compilation by the "native code generator" transpiler.
```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operator::Read r(&t, ...);
  r.Run();
```
This might be a very basic question, but from this example I see `r.Run()`. Does this bring execution into the picture along with the compilation? (Maybe we should instead have a `Run()` method that is called by the executor later? Or maybe I misunderstood.)
Oh, never mind, I mixed up the two things. I think after this code generation, the executor will indeed run these as usual.
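To clarify the point discussed in this thread, here is a self-contained sketch, with mock stand-in types and an assumed function name rather than Paddle's real classes, showing that in the compiled workflow the generated code both constructs the operator and calls `Run()` itself, and the generated `main` drives the program, so no separate `Executor` is involved at run time:

```c++
// Self-contained illustration only: Tensor and Read are mock stand-ins, and
// fluid_cuda_read is an assumed name. The namespace is written `operators`
// here because `operator` is a C++ keyword and cannot name a namespace.
#include <iostream>

namespace paddle {
struct Tensor {};
namespace operators {
struct Read {
  explicit Read(Tensor* out) : out_(out) {}
  void Run() { std::cout << "Read kernel executed\n"; }  // real kernel fills *out_
  Tensor* out_;
};
}  // namespace operators
}  // namespace paddle

// What the native code generator might emit for a read op.
paddle::Tensor fluid_cuda_read() {
  paddle::Tensor t;
  paddle::operators::Read r(&t);
  r.Run();  // execution happens here, inside the generated binary
  return t;
}

int main() {
  // The generated main calls the generated functions in program order;
  // there is no interpreter-style Executor at run time.
  paddle::Tensor data = fluid_cuda_read();
  (void)data;
  return 0;
}
```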
@@ -105,18 +105,10 @@ There are two ways to execute a Fluid program. When a program is executed, it c

There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.

- Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.
+ Fluid is moving towards the direction of a compiler, which is explain in [fluid_compiler.md](fluid_compiler.md).
explain -> explained
```
}
```

For computational operators that have multiple *kernels*, each for a
Should we also consider the possibility of having a default fallback device in case some operator cannot run on a device? For example, we might have some CPU-only operators; in that case, our transpiler should make sure that it generates CPU code for that op even though the rest of the native code might be CUDA code.
@abhinavarora
Yes, we provide a kernel selection and kernel fallback mechanism.

Paddle/paddle/framework/data_transform.h, lines 33 to 40 in eee6264:

```c++
void DataTransform(const OpKernelType& expected_kernel_type,
                   const OpKernelType& kernel_type_for_var,
                   const Tensor& input_tensor, Tensor* out);

void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
                            Variable& out_var);

}  // namespace framework
```

If the target machine does not have the target device, it will try to use a naive implementation kernel (say, the CPU kernel) instead of terminating.
In the Fluid overview design, there will be a runtime `.a` linked into the target program. In my view, the runtime library will solve the fallback problem, not the transpiler.
LGTM. We can merge this design doc if there are no new comments or updates.