
Plan to develop the inference library of fluid #7145

Closed

Description

@Xreki

Based on my experience implementing a simple C++ inference example for fluid in #7097, I have gained a basic understanding of fluid's training and inference process.

To implement the inference library for fluid, we need to discuss the following things:

Framework Design

In fluid, the inference process is composed of 2 phases:

  • Creation phase. In fluid, two ProgramDescs are needed for inference: one defines the inference network, namely inference_program; the other defines how and where to load the parameters, namely load_program.
    • Basically, the protobuf string of the inference network, feed_var_names and fetch_var_names are needed to create the two ProgramDescs.
    • No Tensors or models are created in this phase; inference_program and load_program just hold the protobuf messages of the inference network and of the network initialization, respectively.
  • Execution phase. In fluid, users can switch among different execution environments flexibly. There are two steps:
    • Configuration step, allowing users to set different configurations, such as different devices (CPU or GPU) and different runtime settings (multi-threading or multi-device). In fact, this step initializes the execution environment for the inference network.

      • Once the configuration step is completed, users can run inference many times.
      • Users can initialize several different execution environments for the same inference network.
      • The loading of parameters is also done in this step, by running load_program.
    • Running step, allowing users to feed different data, do inference and fetch the predicted results.

      • Both synchronous and asynchronous execution should be supported.

      (Figure: inference workflow)

As a result, there should be at least three key concepts in fluid's C++ API:

  • InferenceDesc, to hold the handles of inference_program and load_program; it can be initialized from a file or from a buffer.

  • Tensor, an easy-to-use data structure for users. Fluid's Tensor and LoDTensor are too complicated for users, and we do not need delayed memory allocation in this structure. Alternatively, Tensor and LoDTensor could be used directly in user code.

  • Execution, to hold the configuration of the execution environment.

    [NEED MORE DETAIL]
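
To make the discussion concrete, here is a minimal header-style sketch of what these three concepts could look like. All method and member names (FromFile, FromBuffer, ExecutionConfig, Run, ...) are assumptions for illustration, not a proposed final interface.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Easy-to-use tensor: a plain shape plus a contiguous buffer, with no LoD
// and no delayed memory allocation. Only float data is shown to keep the
// sketch short.
struct Tensor {
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Holds inference_program and load_program (creation phase). No tensors or
// parameters are created here; it only wraps the protobuf messages.
class InferenceDesc {
 public:
  static InferenceDesc FromFile(const std::string& path);      // assumed name
  static InferenceDesc FromBuffer(const std::string& buffer);  // assumed name
};

// Runtime settings chosen in the configuration step.
struct ExecutionConfig {
  enum Device { kCPU, kGPU };
  Device device = kCPU;
  int device_id = 0;
  int num_threads = 1;
};

// One execution environment. Constructing it would run load_program once to
// load the parameters; Run() can then be called many times with different
// feeds, and several Execution objects can share one InferenceDesc.
class Execution {
 public:
  Execution(const InferenceDesc& desc, const ExecutionConfig& config);
  void Run(const std::vector<Tensor>& feeds, std::vector<Tensor>* fetches);
};
```

Under this sketch, constructing several Execution objects from one InferenceDesc covers the "several different execution environments" point above, and an asynchronous variant of Run could be added later for the asynchronous case.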

Refine the Storing Format of Inference Network

Currently, models trained by the v2 API cannot be read by the C-API directly. Extra steps are needed:

  • Remove the label, cost and evaluator layers from the network manually to get an inference network
  • Use dumpy_config.py to get the serialized config, or use merge_model.py to get a merged model (the serialized config and all parameter files are merged into a single file)

In fluid, we hope that the model stored during training can be used in C++ inference code directly. Currently, there are a couple of interfaces, fluid.io.save_inference_model and fluid.io.load_inference_model, which are specially designed for inference. However, the storing format needs to be refined.

  • fluid.io.save_inference_model uses pickle.dump to store the program_desc_str, feed_var_names and fetch_var_names information. It is difficult to support pickle in C++ code without third-party tools (like PicklingTools). It would be better to design another way to save the inference model, for example:
    • design a new protobuf data structure for inference
    • or make feed_var_names and fetch_var_names members of ProgramDesc
    • or store feed_var_names, fetch_var_names and program_desc_str sequentially and use some keywords to separate them (a rough sketch of such a single-file format follows this list)
  • fluid.io.save_inference_model saves the serialized protobuf message of the network and all parameter variables into separate files in the same directory. However, many users may want to save all the parameters into a single file and initialize the inference model from a buffer. We should make it possible to merge all the parameter files into one; this may require modifying some operators, such as load_op.
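
As a rough illustration of the last sub-option above, the sketch below assumes a hypothetical single-file layout in which feed_var_names, fetch_var_names and program_desc_str are written one after another as length-prefixed records (a small variation on keyword separators that stays safe for the binary program_desc_str); none of the file names or helpers are part of fluid.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Write one record: an 8-byte length followed by the raw bytes.
void WriteRecord(std::ofstream& out, const std::string& data) {
  uint64_t size = data.size();
  out.write(reinterpret_cast<const char*>(&size), sizeof(size));
  out.write(data.data(), static_cast<std::streamsize>(size));
}

// Read one record back, in the same order it was written.
std::string ReadRecord(std::ifstream& in) {
  uint64_t size = 0;
  in.read(reinterpret_cast<char*>(&size), sizeof(size));
  std::string data(size, '\0');
  in.read(&data[0], static_cast<std::streamsize>(size));
  return data;
}

int main() {
  // Hypothetical contents produced by the Python side.
  std::string feed_var_names = "image";        // several names could be joined by '\n'
  std::string fetch_var_names = "fc_2.tmp_2";  // made-up variable name
  std::string program_desc_str = "<serialized ProgramDesc bytes>";

  {
    std::ofstream out("inference.model", std::ios::binary);
    WriteRecord(out, feed_var_names);
    WriteRecord(out, fetch_var_names);
    WriteRecord(out, program_desc_str);
  }

  std::ifstream in("inference.model", std::ios::binary);
  std::string feeds = ReadRecord(in);
  std::string fetches = ReadRecord(in);
  std::string program = ReadRecord(in);
  return feeds == feed_var_names ? 0 : 1;
}
```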

Besides, the storing formats for training and for inference are different: users need to save two copies if they want to use the results both for fine-tuning and for inference. I am not sure whether it would be better to unify the storing formats.

Optimize the Inference Program

  • As shown in #7097, many Variables, such as velocity_*, learning_rate_* and *@GRAD, are created but never referenced by any op, so the inference program contains many unreferenced Variables. Prune is called to remove the unreachable operators in the inference program; it should remove the unreferenced Variables at the same time (a minimal sketch of the idea follows).
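
A minimal sketch of that idea, using simplified stand-in structs rather than fluid's real BlockDesc/OpDesc classes: after the unreachable operators are pruned, collect every variable name still referenced by some op and drop the rest.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Simplified stand-ins for fluid's OpDesc and BlockDesc (not the real classes).
struct OpDesc {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

struct BlockDesc {
  std::vector<OpDesc> ops;        // operators left after pruning
  std::vector<std::string> vars;  // names of all declared variables
};

// Keep only the variables referenced by at least one remaining operator.
void PruneUnreferencedVars(BlockDesc* block) {
  std::unordered_set<std::string> referenced;
  for (const auto& op : block->ops) {
    referenced.insert(op.inputs.begin(), op.inputs.end());
    referenced.insert(op.outputs.begin(), op.outputs.end());
  }
  std::vector<std::string> kept;
  for (const auto& name : block->vars) {
    if (referenced.count(name) > 0) kept.push_back(name);
  }
  block->vars.swap(kept);
}
```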

Compiling Aspects

  • All of fluid's core C++ code should be packaged into a single static or shared library, named something like libpaddle_fluid.a or libpaddle_fluid.so.
    • libpaddle_fluid.so should link all the dependent libraries but limit the exported symbol table at the same time, as the C-API does (a minimal sketch follows this list).
    • libpaddle_fluid.a should not contain any binaries of third-party libraries (gflags, glog, ...).
  • Maybe we need to support the compiler (gcc 4.8.2) that is commonly deployed on our development servers.
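
For the symbol-table point, one common technique (a sketch assuming GCC/Clang with -fvisibility=hidden added to the build flags; the macro and function names below are made up) is to hide everything by default and mark only the public inference API as visible, similar in effect to a linker export map:

```cpp
// Compile the library sources with -fvisibility=hidden so that only symbols
// explicitly marked below are exported from libpaddle_fluid.so.
#if defined(__GNUC__) || defined(__clang__)
#define PADDLE_FLUID_API __attribute__((visibility("default")))
#else
#define PADDLE_FLUID_API
#endif

// Not exported: internal helpers stay out of the shared library's dynamic
// symbol table, which keeps the exported symbol set small and reduces clashes
// with users' own copies of gflags, glog, etc.
static int InternalHelper() { return 0; }

// Exported: intended to be part of the public inference API.
PADDLE_FLUID_API int GetFluidInferenceVersion() { return InternalHelper(); }
```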
