Add Fluid Compiler design doc #7178


Merged: 2 commits into PaddlePaddle:develop on Jan 21, 2018

Conversation

@wangkuiyi (Collaborator):

No description provided.

## Native Code Generator

For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:
@jacquesqiao (Member), Jan 4, 2018:

In most cases, users may need a library, such as a .so or .a, to use in some other program, such as a face-recognition app. So do we also need to be able to generate these libraries at the same time?

@wangkuiyi (Collaborator, Author), Jan 4, 2018:

Good point!

I prefer that our transpiler generate source code only, rather than reuse .a/.so files, so as to simplify the build process.

To be precise, if the transpiler generates source code only, the general workflow would be:

```
               PaddlePaddle
               operator/kernel
               source code
                   |
                   v
ProgramDesc -> transpiler -> .cc file -> nvcc -> binary file
```

Otherwise, if we try to reuse the .a/.so files, the workflow would be:

```
               PaddlePaddle
               operator/kernel -> nvcc(1) -> .a/.so
               source code                     |
                   |                           |
                   v                           v
ProgramDesc -> transpiler -> .cc file -> nvcc(2) -> binary file
```

This approach is error-prone, because there is a chance that nvcc(1) and nvcc(2) are different compiler versions, or that the two builds happen in different environments with mutually exclusive configurations.

It is true that the generated code might depend on third-party libraries, so our transpiler might also need to generate build commands, including dependencies.
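
To make the source-only workflow concrete, here is a minimal sketch of what the transpiler-emitted `.cc` file might look like. All names follow the doc's own sketch (`paddle::Tensor`, `fluid_cuda_read`) and the header path is an assumption, not actual generated code:

```c++
// Hypothetical transpiler output (main.cc). Everything here is a sketch:
// the real generator would emit one helper per ProgramDesc operator and
// copy the needed PaddlePaddle operator/kernel sources alongside it.
#include "paddle/framework/tensor.h"  // assumed header for paddle::Tensor

// A generated helper, like the fluid_cuda_read example quoted below.
paddle::Tensor fluid_cuda_read();

int main() {
  // Running this compiled binary executes the program that the
  // ProgramDesc describes; no interpreter is involved at run time.
  paddle::Tensor data = fluid_cuda_read();
  // ... further generated operator calls would consume `data` ...
  return 0;
}
```

Compiling this file with nvcc, together with the copied operator sources, yields the binary in the first diagram.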

```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operator::Read r(&t, ...);
```
Collaborator:

Should we copy the operator::Read source files (read_op.cc/read_op.cu/read_op.h) into the generated project, or just read_op.a/.so?

@wangkuiyi (Collaborator, Author), Jan 4, 2018:

I think we need to copy the source files instead of reusing .a/.so files, for the reasons in #7178 (comment).

protobuf message as an interpreter. This article describes the Fluid
compiler.

![](fluid-compiler.png)
@helinwang (Contributor), Jan 4, 2018:

Do we need Executor in this graph? It seems unrelated to this PR.

the following

```protobuf
message ProgramDesc {
```
@helinwang (Contributor), Jan 4, 2018:

From my understanding, ProgramDesc is an intermediate representation (IR): currently we have Python as a frontend that generates the IR, and this PR discusses a C++ code backend.

I think having Python as a frontend is a huge pain. In my opinion, the benefits of Python in the machine learning field are:

  1. Libraries such as numpy.
  2. Python's native control-flow primitives, such as for and if, to control the training process; researchers are familiar with them.

In our case we benefit from neither of these two points:

  1. the transpiled code cannot use numpy.
  2. the transpiled code cannot use Python's native control-flow primitives.

And we are still trapped by Python's grammar.

I think a better way is to invent our own language.

@wangkuiyi (Collaborator, Author):

I agree and I believe that a new language is the future.

@sidgoyal78 (Contributor) left a comment:

Thanks for the examples. I have a question about combining execution and compilation in the "native code generator" transpiler.

```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operator::Read r(&t, ...);
  r.Run();
```
@sidgoyal78 (Contributor), Jan 4, 2018:

This might be a very basic question, but in this example I see r.Run(). Does this bring execution into the picture along with compilation? (Maybe we should instead have a Run() method that the executor calls later? Or maybe I misunderstood.)

@sidgoyal78 (Contributor):

Oh, never mind, I mixed up the two things. I think that after this code generation, the executor will indeed run these as usual, I guess.
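
To restate the resolution in code: the transpiler only emits the call to Run(); nothing executes until the generated binary itself runs. Below is a self-contained toy sketch of that shape; the Tensor and Read types are stand-ins, and note that the quoted snippet's paddle::operator namespace would need a different name in real C++, since `operator` is a reserved word:

```c++
#include <iostream>
#include <vector>

// Toy stand-ins for paddle::Tensor and the Read operator; illustrative only.
struct Tensor { std::vector<float> data; };

struct Read {
  explicit Read(Tensor* out) : out_(out) {}
  void Run() { out_->data = {1.f, 2.f, 3.f}; }  // executes at run time
  Tensor* out_;
};

// Shape of a generated helper: constructing the op and calling Run() are
// just statements in the generated source; they execute when main() runs,
// not when the transpiler emits them.
Tensor fluid_cuda_read() {
  Tensor t;
  Read r(&t);
  r.Run();
  return t;
}

int main() {
  Tensor t = fluid_cuda_read();
  std::cout << t.data.size() << " elements read\n";
}
```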

```diff
@@ -105,18 +105,10 @@ There are two ways to execute a Fluid program. When a program is executed, it c

 There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.

-Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.
+Fluid is moving towards the direction of a compiler, which is explain in [fluid_compiler.md](fluid_compiler.md).
```
Contributor:

explain -> explained
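
For context on the quoted paragraph, here is a minimal sketch of the interpreter path it describes, assuming the 2018-era API in paddle/framework/executor.h; the exact signatures are approximations, not verified against this revision:

```c++
#include "paddle/framework/executor.h"      // assumed header paths
#include "paddle/framework/program_desc.h"
#include "paddle/framework/scope.h"
#include "paddle/platform/place.h"

// Interpreter-style execution: the Executor walks the ProgramDesc and
// runs each operator in turn, analogous to an interpreter running a
// Python program.
void RunProgram(const paddle::framework::ProgramDesc& program) {
  paddle::platform::CPUPlace place;            // device to run on
  paddle::framework::Executor executor(place);
  paddle::framework::Scope scope;              // owns the program's variables
  executor.Run(program, &scope, /*block_id=*/0);
}
```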

```c++
}
```

For computational operators that have multiple *kernels*, each for a
@abhinavarora (Contributor):

Should we also consider the possibility of having a default fallback device in case some operator cannot run on a given device? For example, we might have some CPU-only operators; in that case, our transpiler should make sure that it generates CPU code for that op even though the rest of the native code might be CUDA code.

Contributor:

@abhinavarora
Yes, we provide a kernel selection and kernel fallback mechanism.

```c++
void DataTransform(const OpKernelType& expected_kernel_type,
                   const OpKernelType& kernel_type_for_var,
                   const Tensor& input_tensor, Tensor* out);

void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
                            Variable& out_var);
}  // namespace framework
```

If the target machine does not have the target device, the program will try to use a naive fallback kernel (say, the CPU kernel) instead of terminating.
In the Fluid overview design, there will be a runtime .a linked into the target program. In my view, it is the runtime library, not the transpiler, that should solve the fallback problem.
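
To illustrate the fallback idea in that reply, here is a self-contained toy sketch; the registry shape and names are hypothetical, not Paddle's actual kernel-selection code:

```c++
#include <map>
#include <stdexcept>

enum class DeviceType { kCPU, kCUDA };
using KernelFn = void (*)();  // stand-in for a real kernel signature

// Toy per-operator registry mapping device -> kernel. If the preferred
// device has no registered kernel, fall back to the CPU kernel rather
// than terminating, mirroring the mechanism described above.
KernelFn SelectKernel(const std::map<DeviceType, KernelFn>& kernels,
                      DeviceType preferred) {
  auto it = kernels.find(preferred);
  if (it != kernels.end()) return it->second;   // preferred device kernel
  it = kernels.find(DeviceType::kCPU);          // naive CPU fallback
  if (it != kernels.end()) return it->second;
  throw std::runtime_error("no kernel registered for this operator");
}
```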

@dzhwinter (Contributor) left a comment:

LGTM. We can merge this design doc if there are no new comments or updates.

@dzhwinter merged commit 2dc5c69 into PaddlePaddle:develop on Jan 21, 2018.