diff --git a/docs/en/faq/develop_a_new_model.md b/docs/en/faq/develop_a_new_model.md
index f2e16bd263..b6277bff57 100644
--- a/docs/en/faq/develop_a_new_model.md
+++ b/docs/en/faq/develop_a_new_model.md
@@ -1,3 +1,281 @@
-# FastDeploy integrates new model process
+# How to Integrate a New Model into FastDeploy
-coming soon...
+How do you add a new model to FastDeploy, including its C++ and Python deployment? Here we take the ResNet50 model from torchvision v0.12.0 as an example and introduce external [Model Integration](#modelsupport) in FastDeploy. The whole process takes only three steps.
+
+| Step | Description | Files to create or modify |
+|:-----------:|:--------------------------------------------------------------------------------:|:-----------------------------------------:|
+| [1](#step2) | Add a model implementation to the corresponding task module in FastDeploy/vision | resnet.h, resnet.cc, vision.h |
+| [2](#step4) | Bind the Python interface via pybind | resnet_pybind.cc, classification_pybind.cc |
+| [3](#step5) | Call the interface from Python | resnet.py, \_\_init\_\_.py |
+
+After completing these three steps, the external model is integrated.
+
+If you want to contribute your code to FastDeploy, please also add test code, instructions (README), and code comments for the added model; see [Test](#test).
+
+## Model Integration
+
+### Prepare the model
+
+Before integrating an external model, convert the trained model (.pt, .pdparams, etc.) to a format that FastDeploy supports for deployment (.onnx, .pdmodel). Most open-source repositories provide conversion scripts for developers. Since torchvision does not provide one, developers can write the conversion script themselves. In this demo, we convert `torchvision.models.resnet50` to `resnet50.onnx` with the following code for your reference.
+
+```python
+import torch
+import torchvision.models as models
+model = models.resnet50(pretrained=True)
+batch_size = 1  # batch size
+input_shape = (3, 224, 224)  # input shape; change it to your own
+model.eval()
+x = torch.randn(batch_size, *input_shape)  # generate an input tensor
+export_onnx_file = "resnet50.onnx"  # ONNX file name
+torch.onnx.export(model,
+                  x,
+                  export_onnx_file,
+                  opset_version=12,
+                  input_names=["input"],    # input names
+                  output_names=["output"],  # output names
+                  dynamic_axes={"input": {0: "batch_size"},    # variable batch size
+                                "output": {0: "batch_size"}})
+```
+
+Running the above script generates a `resnet50.onnx` file.
+
+### C++
+
+* Create the `resnet.h` file
+  * Create the path
+    * FastDeploy/fastdeploy/vision/classification/contrib/resnet.h (FastDeploy/C++ code/vision/task name/external model name/model name.h)
+  * Create the content
+    * First, create the ResNet class in resnet.h and inherit it from the FastDeployModel parent class; then declare `Predict`, `Initialize`, `Preprocess`, `Postprocess`, the constructor, and the necessary variables. Please refer to [resnet.h](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-69128489e918f305c208476ba793d8167e77de2aa7cadf5dcbac30da448bd28e) for details.
+
+```C++
+class FASTDEPLOY_DECL ResNet : public FastDeployModel {
+ public:
+  ResNet(...);
+  virtual bool Predict(...);
+ private:
+  bool Initialize();
+  bool Preprocess(...);
+  bool Postprocess(...);
+};
+```
+
+* Create the `resnet.cc` file
+  * Create the path
+    * FastDeploy/fastdeploy/vision/classification/contrib/resnet.cc (FastDeploy/C++ code/vision/task name/external model name/model name.cc)
+  * Create the content
+    * Implement in `resnet.cc` the specific logic of the functions declared in `resnet.h`. For `Preprocess` and `Postprocess`, reproduce the pre- and post-processing logic of the official source repository. The responsibility of each ResNet function is outlined below; for more detailed code, please refer to [resnet.cc](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-d229d702de28345253a53f2a5839fd2c638f3d32fffa6a7d04d23db9da13a871).
+
+```C++
+ResNet::ResNet(...) {
+  // Constructor logic
+  // 1. Specify the backend 2. Set the RuntimeOption 3. Call the Initialize() function
+}
+bool ResNet::Initialize() {
+  // Initialization logic
+  // 1. Assign values to the global variables 2. Call the InitRuntime() function
+  return true;
+}
+bool ResNet::Preprocess(Mat* mat, FDTensor* output) {
+  // Preprocessing logic
+  // 1. Resize 2. BGR2RGB 3. Normalize 4. HWC2CHW 5. Save the result into an FDTensor
+  return true;
+}
+bool ResNet::Postprocess(FDTensor& infer_result, ClassifyResult* result, int topk) {
+  // Postprocessing logic
+  // 1. Softmax 2. Choose the topk labels 3. Save the result into a ClassifyResult
+  return true;
+}
+bool ResNet::Predict(cv::Mat* im, ClassifyResult* result, int topk) {
+  Preprocess(...)
+  Infer(...)
+  Postprocess(...)
+  return true;
+}
+```
+
+* Add the new model file to `vision.h`
+  * Modify location
+    * FastDeploy/fastdeploy/vision.h
+  * Modify content
+
+```C++
+#ifdef ENABLE_VISION
+#include "fastdeploy/vision/classification/contrib/resnet.h"
+#endif
+```
+
+### Pybind
+
+* Create the pybind file
+
+  * Create the path
+
+    * FastDeploy/fastdeploy/vision/classification/contrib/resnet_pybind.cc (FastDeploy/C++ code/vision model/task name/external model/model name_pybind.cc)
+
+  * Create the content
+
+    * Use pybind to bind the C++ functions and variables to Python. Please refer to [resnet_pybind.cc](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-270af0d65720310e2cfbd5373c391b2110d65c0f4efa547f7b7eeffcb958bdec) for more details.
+
+  ```C++
+  void BindResNet(pybind11::module& m) {
+    pybind11::class_<vision::classification::ResNet, FastDeployModel>(
+        m, "ResNet")
+        .def(pybind11::init<std::string, std::string, RuntimeOption, ModelFormat>())
+        .def("predict", ...)
+        .def_readwrite("size", &vision::classification::ResNet::size)
+        .def_readwrite("mean_vals", &vision::classification::ResNet::mean_vals)
+        .def_readwrite("std_vals", &vision::classification::ResNet::std_vals);
+  }
+  ```
+
+* Call the pybind function
+
+  * Modify path
+
+    * FastDeploy/fastdeploy/vision/classification/classification_pybind.cc (FastDeploy/C++ code/vision model/task name/task name_pybind.cc)
+
+  * Modify content
+
+  ```C++
+  void BindResNet(pybind11::module& m);
+  void BindClassification(pybind11::module& m) {
+    auto classification_module =
+        m.def_submodule("classification", "Image classification models.");
+    BindResNet(classification_module);
+  }
+  ```
+
+### Python
+
+* Create the `resnet.py` file
+  * Create the path
+    * FastDeploy/python/fastdeploy/vision/classification/contrib/resnet.py (FastDeploy/Python code/fastdeploy/vision model/task name/external model/model name.py)
+  * Create the content
+    * Create a ResNet class inherited from FastDeployModel, and implement `__init__`, the pybind-bound functions (such as `predict()`), and the functions that set and get the global variables bound via pybind. Please refer to [resnet.py](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-a4dc5ec2d450e91f1c03819bf314c238b37ac678df56d7dea3aab7feac10a157) for details.
+
+```python
+class ResNet(FastDeployModel):
+    def __init__(self, ...):
+        self._model = C.vision.classification.ResNet(...)
+    def predict(self, input_image, topk=1):
+        return self._model.predict(input_image, topk)
+    @property
+    def size(self):
+        return self._model.size
+    @size.setter
+    def size(self, wh):
+        ...
+```
+
+* Import the ResNet class
+  * Modify path
+    * FastDeploy/python/fastdeploy/vision/classification/\_\_init\_\_.py (FastDeploy/Python code/fastdeploy/vision model/task name/\_\_init\_\_.py)
+  * Modify content
+
+```Python
+from .contrib.resnet import ResNet
+```
+
+## Test
+
+### Compile
+
+* C++
+  * Path: FastDeploy/
+
+```
+mkdir build && cd build
+cmake .. -DENABLE_ORT_BACKEND=ON -DENABLE_VISION=ON -DCMAKE_INSTALL_PREFIX=${PWD}/fastdeploy-0.0.3 \
+         -DENABLE_PADDLE_BACKEND=ON -DENABLE_TRT_BACKEND=ON -DWITH_GPU=ON -DTRT_DIRECTORY=/PATH/TO/TensorRT/
+make -j8
+make install
+```
+
+Compiling produces build/fastdeploy-0.0.3/.
+
+* Python
+  * Path: FastDeploy/python/
+
+```
+export TRT_DIRECTORY=/PATH/TO/TensorRT/  # If TensorRT is used, set the TensorRT location and enable ENABLE_TRT_BACKEND
+export ENABLE_TRT_BACKEND=ON
+export WITH_GPU=ON
+export ENABLE_PADDLE_BACKEND=ON
+export ENABLE_OPENVINO_BACKEND=ON
+export ENABLE_VISION=ON
+export ENABLE_ORT_BACKEND=ON
+python setup.py build
+python setup.py bdist_wheel
+cd dist
+pip install fastdeploy_gpu_python-Version number-cpxx-cpxxm-system architecture.whl
+```
+
+### Compile the Test Code
+
+* Create the path: FastDeploy/examples/vision/classification/resnet/ (FastDeploy/examples/vision model/task name/model name/)
+* Create the directory structure
+
+```
+.
+├── cpp
+│   ├── CMakeLists.txt
+│   ├── infer.cc    // C++ test code
+│   └── README.md   // C++ README
+├── python
+│   ├── infer.py    // Python test code
+│   └── README.md   // Python README
+└── README.md       // ResNet model integration README
+```
+
+* C++
+  * Write the CMakeLists.txt, the C++ code, and README.md. Please refer to [cpp/](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-afcbe607b796509581f89e38b84190717f1eeda2df0419a2ac9034197ead5f96).
+  * Compile infer.cc
+    * Path: FastDeploy/examples/vision/classification/resnet/cpp/
+
+```
+mkdir build && cd build
+cmake .. -DFASTDEPLOY_INSTALL_DIR=/PATH/TO/FastDeploy/build/fastdeploy-0.0.3/
+make
+```
+
+* Python
+  * Please refer to [python/](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-5a0d6be8c603a8b81454ac14c17fb93555288d9adf92bbe40454449309700135) for the Python code and README.md; a minimal usage sketch follows below.
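+
+The following is a minimal sketch of what the Python test script (infer.py) for the newly integrated model could look like. The model path, image path, and `topk` value are placeholders, and it assumes the constructor only needs the ONNX file path, as in the resnet.py wrapper above.
+
+```python
+import cv2
+import fastdeploy as fd
+
+# Minimal sketch: load the exported ONNX model and run a single prediction.
+# "resnet50.onnx" and "test.jpg" are placeholder paths.
+model = fd.vision.classification.ResNet("resnet50.onnx")
+
+im = cv2.imread("test.jpg")
+result = model.predict(im, topk=5)  # returns a ClassifyResult
+print(result)
+```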
+
+### Annotate the Code
+
+To make the code easier to understand, developers can add comments to the newly added code.
+
+- C++ code
+  Developers need to add comments for the functions and variables in the resnet.h file. There are three commenting styles, as shown below; please refer to [resnet.h](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-69128489e918f305c208476ba793d8167e77de2aa7cadf5dcbac30da448bd28e) for more details.
+
+```C++
+/** \brief Predict for the input "im", the result will be saved in "result".
+*
+* \param[in] im Input image for inference.
+* \param[in] result The inference result.
+* \param[in] topk The number of returned values; e.g., if topk == 2, the result will include the 2 most likely class labels for the input image.
+*/
+virtual bool Predict(cv::Mat* im, ClassifyResult* result, int topk = 1);
+/// Tuple of (width, height)
+std::vector<int> size;
+/*! @brief Initialize the ResNet model: assign values to the global variables and call InitRuntime()
+*/
+bool Initialize();
+```
+
+- Python
+  The following example demonstrates how to annotate the functions and variables in the resnet.py file. For more details, please refer to [resnet.py](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-a4dc5ec2d450e91f1c03819bf314c238b37ac678df56d7dea3aab7feac10a157).
+
+```python
+    def predict(self, input_image, topk=1):
+        """Classify an input image
+        :param input_image: (numpy.ndarray) The input image data, a 3-D array with HWC layout in BGR format
+        :param topk: (int) Return the topk classification results ranked by confidence score, default 1
+        :return: ClassifyResult
+        """
+        return self._model.predict(input_image, topk)
+```
+
+Other files in the integration process can also be annotated to explain the implementation details.
diff --git a/examples/vision/detection/yolov5/quantize/README_EN.md b/examples/vision/detection/yolov5/quantize/README_EN.md
new file mode 100644
index 0000000000..c439470adc
--- /dev/null
+++ b/examples/vision/detection/yolov5/quantize/README_EN.md
@@ -0,0 +1,29 @@
+# YOLOv5 Quantized Model Deployment
+
+FastDeploy supports the deployment of quantized models and provides a one-click model quantization tool.
+Users can either use the one-click model quantization tool to quantize and deploy their own models, or directly download the quantized models provided by FastDeploy for deployment.
+
+## FastDeploy One-Click Model Quantization Tool
+
+FastDeploy provides a one-click quantization tool that allows users to quantize a model simply with a configuration file.
+For a detailed tutorial, please refer to: [One-Click Model Quantization Tool](../../../../../tools/quantization/)
+
+## Download the Quantized YOLOv5s Model
+
+Users can also directly download the quantized models in the table below for deployment.
+
+| Model | Inference Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP | Method |
+| ----------------------------------------------------------------------- | ----------------- | -------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------------------- |
+| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | TensorRT | GPU | 8.79 | 5.17 | 1.70 | 37.6 | 36.6 | Quantized distillation training |
+| [YOLOv5s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s_quant.tar) | Paddle Inference | CPU | 217.05 | 133.31 | 1.63 | 37.6 | 36.8 | Quantized distillation training |
+
+The table above shows the end-to-end inference performance of FastDeploy deployment before and after model quantization.
+
+- The test images are from COCO val2017.
+- The latency is the end-to-end inference latency on the corresponding runtime, in milliseconds.
+- The CPU is an Intel(R) Xeon(R) Gold 6271C, the GPU is a Tesla T4, the TensorRT version is 8.4.15, and the number of CPU threads is fixed at 1 for all tests.
+
+## More Detailed Tutorials
+
+- [Python Deployment](python)
+- [C++ Deployment](cpp)
diff --git a/examples/vision/detection/yolov5/serving/README_EN.md b/examples/vision/detection/yolov5/serving/README_EN.md
new file mode 100644
index 0000000000..db110efabd
--- /dev/null
+++ b/examples/vision/detection/yolov5/serving/README_EN.md
@@ -0,0 +1,57 @@
+# YOLOv5 Serving Deployment Demo
+
+## Launch Serving
+
+```bash
+# Download the yolov5 model file
+wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov5s.onnx
+
+# Save the model under models/infer/1 and rename it to model.onnx
+mv yolov5s.onnx models/infer/1/model.onnx
+
+# Pull the fastdeploy image
+docker pull paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10
+
+# Run the docker container. The container name is fd_serving, and the current directory is mounted as the container's /yolov5_serving directory
+nvidia-docker run -it --net=host --name fd_serving -v `pwd`/:/yolov5_serving paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10 bash
+
+# Start the service (if the CUDA_VISIBLE_DEVICES environment variable is not set, the server has scheduling privileges on all GPU cards)
+CUDA_VISIBLE_DEVICES=0 fastdeployserver --model-repository=models --backend-config=python,shm-default-byte-size=10485760
+```
+
+The serving is launched successfully if you see the following output:
+
+```
+......
+I0928 04:51:15.784517 206 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001 +I0928 04:51:15.785177 206 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000 +I0928 04:51:15.826578 206 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002 +``` + +## Client Requests + +Execute the following command in the physical machine to send a grpc request and output the result + +``` +#Download test images +wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg + +#Installing client-side dependencies +python3 -m pip install tritonclient\[all\] + +# Send requests +python3 yolov5_grpc_client.py +``` + +When the request is sent successfully, the results are returned in json format and printed out: + +``` +output_name: detction_result +{'boxes': [[268.48028564453125, 81.05305480957031, 298.69476318359375, 169.43902587890625], [104.73116302490234, 45.66197204589844, 127.58382415771484, 93.44938659667969], [378.9093933105469, 39.75013732910156, 395.6086120605469, 84.24342346191406], [158.552978515625, 80.36149597167969, 199.18576049804688, 168.18191528320312], [414.37530517578125, 90.94805908203125, 506.3218994140625, 280.40521240234375], [364.00341796875, 56.608917236328125, 381.97857666015625, 115.96823120117188], [351.7251281738281, 42.635345458984375, 366.9103088378906, 98.04837036132812], [505.8882751464844, 114.36674499511719, 593.1248779296875, 275.99530029296875], [327.7086181640625, 38.36369323730469, 346.84991455078125, 80.89302062988281], [583.493408203125, 114.53289794921875, 612.3546142578125, 175.87353515625], [186.4706573486328, 44.941375732421875, 199.6645050048828, 61.037628173828125], [169.6158905029297, 48.01460266113281, 178.1415557861328, 60.88859558105469], [25.81019401550293, 117.19969177246094, 59.88878631591797, 152.85012817382812], [352.1452941894531, 46.71272277832031, 381.9460754394531, 106.75212097167969], [1.875, 150.734375, 37.96875, 173.78125], [464.65728759765625, 15.901412963867188, 472.512939453125, 34.11640930175781], [64.625, 135.171875, 84.5, 154.40625], [57.8125, 151.234375, 103.0, 174.15625], [165.890625, 88.609375, 527.90625, 339.953125], [101.40625, 152.5625, 118.890625, 169.140625]], 'scores': [0.8965693116188049, 0.8695310950279236, 0.8684297800064087, 0.8429877758026123, 0.8358422517776489, 0.8151364326477051, 0.8089362382888794, 0.801361083984375, 0.7947245836257935, 0.7606497406959534, 0.6325908303260803, 0.6139386892318726, 0.5906146764755249, 0.505328893661499, 0.40457233786582947, 0.3460320234298706, 0.33283042907714844, 0.3325657248497009, 0.2594234347343445, 0.25389009714126587], 'label_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 24, 24, 33, 24], 'masks': [], 'contain_masks': False} +``` + +## Modify Configs + + + +The default is to run ONNXRuntime on GPU. If developers need to run it on CPU or other inference engines, please see the [Configs File](../../../../../serving/docs/zh_CN/model_configuration.md) to modify the configs in `models/runtime/config.pbtxt`. diff --git a/examples/vision/detection/yolov6/quantize/README_EN.md b/examples/vision/detection/yolov6/quantize/README_EN.md new file mode 100644 index 0000000000..5fd3082bcc --- /dev/null +++ b/examples/vision/detection/yolov6/quantize/README_EN.md @@ -0,0 +1,29 @@ +# YOLOv6 Quantized Model Deployment + +FastDeploy supports the deployment of quantized models and provides a one-click model quantization tool. 
+Users can either use the one-click model quantization tool to quantize and deploy their own models, or directly download the quantized models provided by FastDeploy for deployment.
+
+## FastDeploy One-Click Model Quantization Tool
+
+FastDeploy provides a one-click quantization tool that allows users to quantize a model simply with a configuration file.
+For a detailed tutorial, please refer to: [One-Click Model Quantization Tool](../../../../../tools/quantization/)
+
+## Download the Quantized YOLOv6s Model
+
+Users can also directly download the quantized models in the table below for deployment.
+
+| Model | Inference Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP | Method |
+| ----------------------------------------------------------------------- | ----------------- | -------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------------------- |
+| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | TensorRT | GPU | 8.60 | 5.16 | 1.67 | 42.5 | 40.6 | Quantized distillation training |
+| [YOLOv6s](https://bj.bcebos.com/paddlehub/fastdeploy/yolov6s_quant.tar) | Paddle Inference | CPU | 356.62 | 125.72 | 2.84 | 42.5 | 41.2 | Quantized distillation training |
+
+The table above shows the end-to-end inference performance of FastDeploy deployment before and after model quantization.
+
+- The test images are from COCO val2017.
+- The latency is the end-to-end inference latency on the corresponding runtime, in milliseconds.
+- The CPU is an Intel(R) Xeon(R) Gold 6271C, the GPU is a Tesla T4, the TensorRT version is 8.4.15, and the number of CPU threads is fixed at 1 for all tests.
+
+## More Detailed Tutorials
+
+- [Python Deployment](python)
+- [C++ Deployment](cpp)
diff --git a/examples/vision/detection/yolov7/quantize/README_EN.md b/examples/vision/detection/yolov7/quantize/README_EN.md
new file mode 100644
index 0000000000..039000d9e9
--- /dev/null
+++ b/examples/vision/detection/yolov7/quantize/README_EN.md
@@ -0,0 +1,29 @@
+# YOLOv7 Quantized Model Deployment
+
+FastDeploy supports the deployment of quantized models and provides a one-click model quantization tool.
+Users can either use the one-click model quantization tool to quantize and deploy their own models, or directly download the quantized models provided by FastDeploy for deployment.
+
+## FastDeploy One-Click Model Quantization Tool
+
+FastDeploy provides a one-click quantization tool that allows users to quantize a model simply with a configuration file.
+For a detailed tutorial, please refer to: [One-Click Model Quantization Tool](../../../../../tools/quantization/)
+
+## Download the Quantized YOLOv7 Model
+
+Users can also directly download the quantized models in the table below for deployment.
+
+| Model | Inference Backend | Hardware | FP32 Latency (ms) | INT8 Latency (ms) | Speedup | FP32 mAP | INT8 mAP | Method |
+| --------------------------------------------------------------------- | ----------------- | -------- | ----------------- | ----------------- | ------- | -------- | -------- | ------------------------------- |
+| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | TensorRT | GPU | 24.57 | 9.40 | 2.61 | 51.1 | 50.8 | Quantized distillation training |
+| [YOLOv7](https://bj.bcebos.com/paddlehub/fastdeploy/yolov7_quant.tar) | Paddle Inference | CPU | 1022.55 | 490.87 | 2.08 | 51.1 | 46.3 | Quantized distillation training |
+
+The table above shows the end-to-end inference performance of FastDeploy deployment before and after model quantization.
+
+- The test images are from COCO val2017.
+- The latency is the end-to-end inference latency on the corresponding runtime, in milliseconds.
+- The CPU is an Intel(R) Xeon(R) Gold 6271C, the GPU is a Tesla T4, the TensorRT version is 8.4.15, and the number of CPU threads is fixed at 1 for all tests.
+
+## More Detailed Tutorials
+
+- [Python Deployment](python)
+- [C++ Deployment](cpp)
diff --git a/serving/README_EN.md b/serving/README_EN.md
index b2ac1f61c7..b7bb028f5f 100644
--- a/serving/README_EN.md
+++ b/serving/README_EN.md
@@ -1 +1,43 @@
-English | [简体中文](README_CN.md)
+[简体中文](README_CN.md) | English
+
+# FastDeploy Serving Deployment
+
+## Introduction
+
+FastDeploy builds end-to-end serving deployment on top of [Triton Inference Server](https://github.com/triton-inference-server/server). The underlying backend uses the high-performance FastDeploy Runtime module and integrates the FastDeploy pre- and post-processing modules, so the whole pipeline is served end to end. It enables fast deployment with an easy-to-use workflow and excellent performance.
+
+## Prepare the Environment
+
+### Environment Requirements
+
+- Linux
+- If a GPU image is used, NVIDIA Driver >= 470 is required (for GPUs of the older Tesla architecture, such as T4, NVIDIA Driver 418.40+, 440.33+, 450.51+, or 460.27+ can be used)
+
+### Obtain the Image
+
+#### CPU Image
+
+The CPU image only supports serving Paddle/ONNX models on CPUs, and the supported inference backends include OpenVINO, Paddle Inference, and ONNX Runtime.
+
+```shell
+docker pull paddlepaddle/fastdeploy:0.3.0-cpu-only-21.10
+```
+
+#### GPU Image
+
+The GPU image supports serving Paddle/ONNX models on both GPU and CPU, and the supported inference backends include OpenVINO, TensorRT, Paddle Inference, and ONNX Runtime.
+
+```shell
+docker pull paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10
+```
+
+Users can also build the image themselves according to their own needs, referring to the following document:
+
+- [FastDeploy Serving Deployment Image Compilation](docs/zh_CN/compile.md)
+
+## Other Tutorials
+
+- [How to Prepare a Serving Model Repository](docs/zh_CN/model_repository.md)
+- [Serving Deployment Runtime Configuration](docs/zh_CN/model_configuration.md)
+- [Serving Deployment Demo](docs/zh_CN/demo.md)
+  - [YOLOv5 - Detection Task](../examples/vision/detection/yolov5/serving/README.md)
diff --git a/serving/docs/EN/compile-en.md b/serving/docs/EN/compile-en.md
new file mode 100644
index 0000000000..fd72d76734
--- /dev/null
+++ b/serving/docs/EN/compile-en.md
@@ -0,0 +1,42 @@
+# FastDeploy Serving Deployment Image Compilation
+
+This document describes how to create a FastDeploy serving image.
+
+## GPU Image
+
+The GPU images published by FastDeploy are based on version 21.10 of [Triton Inference Server](https://github.com/triton-inference-server/server). If developers need to use a different CUDA version, please refer to the [NVIDIA official website](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html) and modify the Dockerfile and the build scripts accordingly.
+
+```shell
+# Enter the serving directory and execute the script to compile FastDeploy and the serving backend
+cd serving
+bash scripts/build.sh
+
+# Return to the FastDeploy home directory and create the image
+cd ../
+docker build -t paddlepaddle/fastdeploy:0.3.0-gpu-cuda11.4-trt8.4-21.10 -f serving/Dockerfile .
+```
+
+## CPU Image
+
+```shell
+# Enter the serving directory and execute the script to compile FastDeploy and the serving backend
+cd serving
+bash scripts/build.sh OFF
+
+# Return to the FastDeploy home directory and create the image
+cd ../
+docker build -t paddlepaddle/fastdeploy:0.3.0-cpu-only-21.10 -f serving/Dockerfile_cpu .
+```
+
+## IPU Image
+
+```shell
+# Enter the serving directory and execute the script to compile FastDeploy and the serving backend
+cd serving
+bash scripts/build_fd_ipu.sh
+
+# Return to the FastDeploy home directory and create the image
+cd ../
+docker build -t paddlepaddle/fastdeploy:0.3.0-ipu-only-21.10 -f serving/Dockerfile_ipu .
+```
diff --git a/serving/docs/EN/model_repository-en.md b/serving/docs/EN/model_repository-en.md
new file mode 100644
index 0000000000..6d8251549a
--- /dev/null
+++ b/serving/docs/EN/model_repository-en.md
@@ -0,0 +1,84 @@
+# Model Repository
+
+FastDeploy starts the service by deploying one or more models specified in a model repository.
+While the service is running, the served models can be modified following [Model Management](https://github.com/triton-inference-server/server/blob/main/docs/model_management.md), and the service is loaded from the one or more model repositories specified at server startup.
+
+## Repository Architecture
+
+The model repository path is specified via the *--model-repository* option when FastDeploy is started, and multiple repositories can be loaded by specifying the *--model-repository* option multiple times. Example:
+
+```
+$ fastdeploy --model-repository=<model-repository-path>
+```
+
+The model repository must comply with the following layout:
+
+```
+  <model-repository-path>/
+    <model-name>/
+      [config.pbtxt]
+      [<output-labels-file> ...]
+      <version>/
+        <model-definition-file>
+      <version>/
+        <model-definition-file>
+      ...
+    <model-name>/
+      [config.pbtxt]
+      [<output-labels-file> ...]
+      <version>/
+        <model-definition-file>
+      <version>/
+        <model-definition-file>
+      ...
+    ...
+```
+
+Under the topmost `<model-repository-path>` directory there must be zero or more `<model-name>` subdirectories. Each `<model-name>` subdirectory contains the information for deploying that model: several numeric subdirectories indicating the model versions and a *config.pbtxt* file describing the model configuration.
+
+Paddle models are stored in the version subdirectory and must consist of `model.pdmodel` and `model.pdiparams` files.
+
+## Model Version
+
+Each model can have one or more versions available in the repository. A subdirectory whose name is a number indicates the version number. Subdirectories whose names are not numbers, or whose names start with *0*, will be ignored. A [version policy](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#version-policy) can be specified in the model configuration file to control which versions of the model in the model directory are launched by Triton.
+
+## Repository Demo
+
+A Paddle model to be deployed must be an inference model exported from Paddle 2.0 or higher, and the version directory must contain `model.pdmodel` and `model.pdiparams`.
+
+Example: a minimal model repository directory for deploying a Paddle model
+
+```
+  <model-repository-path>/
+    <model-name>/
+      config.pbtxt
+      1/
+        model.pdmodel
+        model.pdiparams
+
+  # Example:
+  models
+  └── ResNet50
+      ├── 1
+      │   ├── model.pdiparams
+      │   └── model.pdmodel
+      └── config.pbtxt
+```
+
+To deploy an ONNX model, a model file named `model.onnx` must be included in the version directory.
+
+Example: a minimal model repository directory for deploying an ONNX model
+
+```
+  <model-repository-path>/
+    <model-name>/
+      config.pbtxt
+      1/
+        model.onnx
+
+  # Example:
+  models
+  └── ResNet50
+      ├── 1
+      │   └── model.onnx
+      └── config.pbtxt
+```
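+
+Purely as an illustration, the Python snippet below creates the minimal ONNX repository layout shown above. The repository root `models`, the model name `ResNet50`, and the source file `resnet50.onnx` are assumed placeholders, and `config.pbtxt` still has to be filled in following the model configuration document.
+
+```python
+import os
+import shutil
+
+# Sketch: build models/ResNet50/1/model.onnx plus an empty config.pbtxt.
+version_dir = os.path.join("models", "ResNet50", "1")
+os.makedirs(version_dir, exist_ok=True)
+shutil.copy("resnet50.onnx", os.path.join(version_dir, "model.onnx"))
+
+# Placeholder config; write the real model configuration here.
+open(os.path.join("models", "ResNet50", "config.pbtxt"), "a").close()
+```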
diff --git a/tools/auto_compression/README_EN.md b/tools/auto_compression/README_EN.md
new file mode 100644
index 0000000000..e53a1a37a9
--- /dev/null
+++ b/tools/auto_compression/README_EN.md
@@ -0,0 +1,136 @@
+# FastDeploy One-Click Model Auto Compression
+
+Based on PaddleSlim's Auto Compression Toolkit (ACT), FastDeploy provides developers with a one-click model auto compression tool that supports post-training quantization and quantized distillation training.
+We take the YOLOv5 series as an example to demonstrate how to install and run FastDeploy's one-click model auto compression.
+
+## 1. Install
+
+### Environment Dependencies
+
+1. Install the develop version of PaddlePaddle from the official website:
+
+```
+https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html
+```
+
+2. Install PaddleSlim-develop:
+
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
+python setup.py install
+```
+
+### Install the FastDeploy Auto Compression Toolkit
+
+Run the following command in the current directory:
+
+```
+python setup.py install
+```
+
+## 2. How to Use
+
+### Demo of the One-Click Auto Compression Toolkit
+
+FastDeploy auto compression can combine multiple strategies. At present, offline (post-training) quantization and quantized distillation training are the main ones used. The following describes how to use each of these two strategies.
+
+#### Offline Quantization
+
+##### 1. Prepare the model and the calibration dataset
+
+Developers need to prepare the model to be quantized and the calibration dataset themselves.
+In this demo, developers can execute the following commands to download the yolov5s.onnx model to be quantized and the calibration dataset.
+
+```shell
+# Download yolov5s.onnx
+wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx
+
+# Download the dataset. This calibration dataset is the first 320 images from COCO val2017
+wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz
+tar -xvf COCO_val_320.tar.gz
+```
+
+##### 2. Run the fastdeploy_quant command to compress the model
+
+The following command quantizes the yolov5s model. To quantize other models, replace config_path with the corresponding model configuration file in the configs folder.
+
+```shell
+fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='PTQ' --save_dir='./yolov5s_ptq_model/'
+```
+
+[Note] PTQ is short for post-training quantization.
+
+##### 3. Parameters
+
+To complete the quantization, developers only need to provide a customized model config file, specify the quantization method, and specify the path for saving the quantized model.
+
+| Parameter | Description |
+| ------------- | ------------------------------------------------------------------------------------------------------------- |
+| --config_path | The quantization config file needed for one-click quantization. See [Configs](./configs/README.md) |
+| --method | Quantization method: PTQ for post-training quantization, QAT for quantized distillation training |
+| --save_dir | Output path of the quantized model, which can be deployed directly with FastDeploy |
+
+#### Quantized Distillation Training
+
+##### 1. Prepare the model to be quantized and the training dataset
+
+FastDeploy currently supports quantized distillation training only on unannotated images and does not support evaluating model accuracy during training.
+The training data should be images from the target inference scenario, covering the deployment scenarios as completely as possible; the more images, the larger the dataset. In this demo, we prepare the first 320 images from the COCO 2017 validation set for users.
+Note: if users want to obtain a more accurate quantized model through quantized distillation training, feel free to prepare more data and train for more rounds.
+
+```shell
+# Download yolov5s.onnx
+wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx
+
+# Download the dataset. This calibration dataset is the first 320 images from COCO val2017
+wget https://bj.bcebos.com/paddlehub/fastdeploy/COCO_val_320.tar.gz
+tar -xvf COCO_val_320.tar.gz
+```
+
+##### 2. Run the fastdeploy_quant command to compress the model
+
+The following command quantizes the yolov5s model. To quantize other models, replace config_path with the corresponding model configuration file in the configs folder.
+
+```shell
+# Specify a single GPU card before training; otherwise the training process may hang.
+export CUDA_VISIBLE_DEVICES=0
+fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='QAT' --save_dir='./yolov5s_qat_model/'
+```
+
+##### 3. Parameters
+
+To complete the quantization, developers only need to provide a customized model config file, specify the quantization method, and specify the path for saving the quantized model.
+
+| Parameter | Description |
+| ------------- | ------------------------------------------------------------------------------------------------------------- |
+| --config_path | The quantization config file needed for one-click quantization. See [Configs](./configs/README.md) |
+| --method | Quantization method: PTQ for post-training quantization, QAT for quantized distillation training |
+| --save_dir | Output path of the quantized model, which can be deployed directly with FastDeploy |
+
+## 3. Config File Examples for FastDeploy One-Click Model Auto Compression
+
+FastDeploy currently provides compression [config](./configs/) files for multiple models, together with the corresponding FP32 models. Users can download them directly and try them out.
+
+| Config file | FP32 model to be compressed | Notes |
+| -------------------- | ------------------------------------------------------------ |----------------------------------------- |
+| [mobilenetv1_ssld_quant](./configs/classification/mobilenetv1_ssld_quant.yaml) | [mobilenetv1_ssld](https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_ssld_infer.tgz) | |
+| [resnet50_vd_quant](./configs/classification/resnet50_vd_quant.yaml) | [resnet50_vd](https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz) | |
+| [yolov5s_quant](./configs/detection/yolov5s_quant.yaml) | [yolov5s](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx) | |
+| [yolov6s_quant](./configs/detection/yolov6s_quant.yaml) | [yolov6s](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx) | |
+| [yolov7_quant](./configs/detection/yolov7_quant.yaml) | [yolov7](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) | |
+| [ppyoloe_withNMS_quant](./configs/detection/ppyoloe_withNMS_quant.yaml) | [ppyoloe_l](https://bj.bcebos.com/v1/paddle-slim-models/act/ppyoloe_crn_l_300e_coco.tar) | Supports the s, m, l, and x models of the PPYOLOE series; export the model from PaddleDetection normally and do not remove the NMS |
+| [ppyoloe_plus_withNMS_quant](./configs/detection/ppyoloe_plus_withNMS_quant.yaml) | [ppyoloe_plus_s](https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_plus_crn_s_80e_coco.tar) | Supports the s, m, l, and x models of the PPYOLOE+ series; export the model from PaddleDetection normally and do not remove the NMS |
+| [pp_liteseg_quant](./configs/segmentation/pp_liteseg_quant.yaml) | [pp_liteseg](https://bj.bcebos.com/paddlehub/fastdeploy/PP_LiteSeg_T_STDC1_cityscapes_without_argmax_infer.tgz) | |
+
+## 4. Deploy Quantized Models with FastDeploy
+
+Once the quantized model is obtained, developers can deploy it with FastDeploy. Please refer to the following docs for more details; a short illustrative sketch follows the list.
+
+- [YOLOv5 Quantized Model Deployment](../../examples/vision/detection/yolov5/quantize/)
+- [YOLOv6 Quantized Model Deployment](../../examples/vision/detection/yolov6/quantize/)
+- [YOLOv7 Quantized Model Deployment](../../examples/vision/detection/yolov7/quantize/)
+- [PaddleClas Quantized Model Deployment](../../examples/vision/classification/paddleclas/quantize/)
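+
+As an illustration, the sketch below shows roughly how a quantized YOLOv5s model produced by this tool might be loaded through FastDeploy's Python API. The directory and file names, the GPU/TensorRT backend choice, and the test image are assumptions; follow the deployment docs above for the exact options.
+
+```python
+import cv2
+import fastdeploy as fd
+
+# Sketch only: the paths assume the layout written by --save_dir='./yolov5s_qat_model/'.
+option = fd.RuntimeOption()
+option.use_gpu()
+option.use_trt_backend()  # the quantized model can also run on CPU backends
+
+model = fd.vision.detection.YOLOv5(
+    "yolov5s_qat_model/model.pdmodel",
+    "yolov5s_qat_model/model.pdiparams",
+    runtime_option=option,
+    model_format=fd.ModelFormat.PADDLE)
+
+im = cv2.imread("000000014439.jpg")
+print(model.predict(im))
+```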
diff --git a/tools/auto_compression/configs/README_EN.md b/tools/auto_compression/configs/README_EN.md
new file mode 100644
index 0000000000..e2221a8473
--- /dev/null
+++ b/tools/auto_compression/configs/README_EN.md
@@ -0,0 +1,51 @@
+# FastDeploy Quantization Config Files
+
+A FastDeploy quantization configuration file contains the global configuration, the quantized distillation training configuration, the post-training quantization configuration, and the training configuration.
+Besides directly using the configuration files provided in this directory, users can modify them according to their needs.
+
+## Demo
+
+```
+# Global configuration
+Global:
+  model_dir: ./yolov5s.onnx # Path to the input model
+  format: 'onnx' # Input model format; use 'paddle' for Paddle models
+  model_filename: model.pdmodel # Model file name of the quantized model in Paddle format
+  params_filename: model.pdiparams # Parameter file name of the quantized model in Paddle format
+  image_path: ./COCO_val_320 # Dataset path for post-training quantization or quantized distillation
+  arch: YOLOv5 # Model architecture
+  input_list: ['x2paddle_images'] # Input names of the model to be quantized
+  preprocess: yolo_image_preprocess # Data preprocessing function used when quantizing the model; developers can modify it or write a new one in ./fdquant/dataset.py
+
+# Quantized distillation training configuration
+Distillation:
+  alpha: 1.0 # Distillation loss weight
+  loss: soft_label # Distillation loss algorithm
+
+Quantization:
+  onnx_format: true # Whether to use the ONNX standard quantization format; must be true to deploy with FastDeploy
+  use_pact: true # Whether to use the PACT method for training
+  activation_quantize_type: 'moving_average_abs_max' # Activation quantization method
+  quantize_op_types: # OPs to be quantized
+  - conv2d
+  - depthwise_conv2d
+
+# Post-training quantization configuration
+PTQ:
+  calibration_method: 'avg' # Activation calibration algorithm for post-training quantization. Options: avg, abs_max, hist, KL, mse, emd
+  skip_tensor_list: None # Developers can skip the quantization of certain conv layers
+
+# Training configuration
+TrainConfig:
+  train_iter: 3000
+  learning_rate: 0.00001
+  optimizer_builder:
+    optimizer:
+      type: SGD
+    weight_decay: 4.0e-05
+  target_metric: 0.365
+```
+
+## More Details
+
+The FastDeploy one-click quantization tool is powered by PaddleSlim. Please refer to the [Automated Compression Hyperparameter Tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/hyperparameter_tutorial.md) for more details.
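+
+For instance, a small script like the one below could be used to customize an existing config before running `fastdeploy_quant`. The file names, the keys being changed, and the use of PyYAML are illustrative assumptions rather than part of the tool itself.
+
+```python
+import yaml  # PyYAML
+
+# Sketch: point an existing config at custom calibration data and train longer.
+with open("detection/yolov5s_quant.yaml") as f:
+    cfg = yaml.safe_load(f)
+
+cfg["Global"]["image_path"] = "./my_calibration_images"  # custom calibration dataset
+cfg["TrainConfig"]["train_iter"] = 5000                  # more iterations for better accuracy
+
+with open("detection/yolov5s_quant_custom.yaml", "w") as f:
+    yaml.safe_dump(cfg, f, sort_keys=False)
+```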