🚀 Train Custom Data Tutorial 🚀

## Train Custom Data Tutorial (English version)

Note: The Chinese version of this tutorial is in the second reply below.

### 0. Examples

The PaddleDetection team provides various **scenario-specific detection models based on PP-YOLOE**, which you can use as references when adapting a config to your custom dataset. Please refer to [PP-YOLOE application](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe/application), [pphuman](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/pphuman), [ppvehicle](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppvehicle), [visdrone](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/visdrone) and [smalldet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/smalldet).

|Scenarios | Related Datasets | Links|
| :--------: | :---------: | :------: |
|Agriculture | [Embrapa WGISD](https://github.com/thsant/wgisd) | [application](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe/application) |
|Low light | [ExDark](https://github.com/cs-chan/Exclusively-Dark-Image-Dataset/tree/master/Dataset) | [application](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe/application) |
|Industry PCB Flaw | [PKU-Market-PCB](https://robotics.pkusz.edu.cn/resources/dataset/) | [application](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe/application) |
|Pedestrian |[CrowdHuman](http://www.crowdhuman.org/) | [pphuman](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/pphuman) |
|Vehicle | [BDD100K](https://www.bdd100k.com/), [UA-DETRAC](https://detrac-db.rit.albany.edu/) | [ppvehicle](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppvehicle) |
|VisDrone | [VisDrone-DET](https://github.com/VisDrone/VisDrone-Dataset) | [visdrone](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/visdrone) |
|Small Object |[DOTA](https://captain-whu.github.io/DOAI2019/dataset.html), [xView](http://xviewdataset.org) | [smalldet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/smalldet) |

PaddleDetection also provides **various YOLO models for the VOC dataset**, which you can likewise use as references when adapting a config to your custom dataset. Please refer to [voc](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/voc).


### 1. Custom data preparation

1. To annotate a custom dataset, please refer to [DetAnnoTools](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/data/DetAnnoTools_en.md).

2. To prepare a custom dataset for training, please refer to [PrepareDataSet](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/tutorials/data/PrepareDetDataSet_en.md).

**Note:**
- For the format of a COCO-style custom dataset, please refer to [format-data](https://cocodataset.org/#format-data) and [format-results](https://cocodataset.org/#format-results).
- Evaluation uses the COCO metric; please refer to [detection-eval](https://cocodataset.org/#detection-eval), and install [cocoapi](https://github.com/cocodataset/cocoapi) first.
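
As a rough illustration of the COCO-style layout mentioned above, here is a minimal Python sketch that builds a detection annotation dict and runs a few basic sanity checks on it. The helper names `make_coco_dataset` and `check_coco_dataset`, the file name `img_0001.jpg`, and the box values are made up for this example. The field names (`images`, `annotations`, `categories`, `bbox` as `[x, y, width, height]`) follow the COCO format page linked above.

```python
import json


def make_coco_dataset(image_records, category_names):
    """Build a minimal COCO-style detection dict.

    image_records: list of (file_name, width, height, boxes), where boxes is a
    list of (category_name, x, y, w, h) in pixels and (x, y) is the top-left corner.
    """
    # Starting category ids at 1 is a choice made for this sketch.
    categories = [{"id": i, "name": n} for i, n in enumerate(category_names, start=1)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    images, annotations = [], []
    for image_id, (file_name, width, height, boxes) in enumerate(image_records, start=1):
        images.append({"id": image_id, "file_name": file_name,
                       "width": width, "height": height})
        for cat_name, x, y, w, h in boxes:
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[cat_name],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


def check_coco_dataset(data):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    sizes = {img["id"]: (img["width"], img["height"]) for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in sizes:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        x, y, w, h = ann["bbox"]
        width, height = sizes[ann["image_id"]]
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > width or y + h > height:
            problems.append(f"annotation {ann['id']}: bbox is empty or outside the image")
    return problems


# Hypothetical example data: one image with two boxes.
dataset = make_coco_dataset(
    [("img_0001.jpg", 640, 480,
      [("person", 10, 20, 100, 200), ("vehicle", 300, 100, 200, 150)])],
    ["person", "vehicle"],
)
print(json.dumps(dataset["categories"]))
print(check_coco_dataset(dataset))
```

Running a check like this before training can catch mismatched ids or out-of-range boxes early; it is not a replacement for the official format description.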


### 2. **Run script**
```bash
model_type=ppyoloe # e.g. change to 'yolov7'
job_name=ppyoloe_plus_crn_s_80e_coco # e.g. change to 'yolov7_tiny_300e_coco'

config=configs/${model_type}/${job_name}.yml
log_dir=log_dir/${job_name}
# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
weights=output/${job_name}/model_final.pdparams

# 1.training (single GPU / multi GPU)
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp

# 2.eval
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise

# 3.infer
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5

# 4.export
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True

# 5.deploy infer
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU

# 6.deploy speed
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16

# 7.export onnx
paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx

# 8.onnx speed
/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16

```

**Note:**
- Save the commands above in a script file, such as `run.sh`, and run it with `sh run.sh`. You can also run the commands one at a time.
- To switch models, just modify the first two lines, for example:
  ```
  model_type=yolov7
  job_name=yolov7_tiny_300e_coco
  ```
- To print **FLOPs (G) and Params (M)**, first install [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) with `pip install paddleslim`, then set `print_flops: True` and `print_params: True` in [runtime.yml](https://github.com/PaddlePaddle/PaddleYOLO/blob/release/2.5/configs/runtime.yml). Make sure the input uses a **single scale**, such as 640x640.
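
To make the variable substitution in the script easier to see, the following Python sketch mirrors how `config`, `log_dir`, `weights`, and the export directory are derived from `model_type` and `job_name`. The helpers `build_job` and `eval_command` are hypothetical names used only for this illustration. The path patterns and the eval flags are copied from the script above.

```python
def build_job(model_type, job_name, from_model_zoo=False):
    """Mirror the variable definitions at the top of the run script."""
    if from_model_zoo:
        # Same URL pattern as the commented-out 'weights' line in the script.
        weights = f"https://bj.bcebos.com/v1/paddledet/models/{job_name}.pdparams"
    else:
        weights = f"output/{job_name}/model_final.pdparams"
    return {
        "config": f"configs/{model_type}/{job_name}.yml",
        "log_dir": f"log_dir/{job_name}",
        "weights": weights,
        "model_dir": f"output_inference/{job_name}",
    }


def eval_command(job):
    """Assemble the eval command from step 2 of the script as an argument list."""
    return ["python", "tools/eval.py", "-c", job["config"],
            "-o", f"weights={job['weights']}", "--classwise"]


# Switching models only changes the two inputs.
job = build_job("yolov7", "yolov7_tiny_300e_coco")
print(job["config"])
print(" ".join(eval_command(job)))
```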


### 3. Fine-tune for training

In addition to changing the dataset path, it is generally recommended to load **the COCO pre-trained weights of the corresponding model** for fine-tuning, which usually converges faster and reaches higher accuracy. For example:

```bash
# fine-tune with a single GPU:
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams

# fine-tune with multiple GPUs:
python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams
```

**Note:**
- Fine-tuning will report that the channels of the last layer of the head's classification branch do not match. This is expected, because the number of categories in a custom dataset usually differs from that of the COCO dataset.
- In general, fine-tuning needs fewer epochs and a smaller learning rate, for example 1/10 of the original. The highest accuracy may occur at an intermediate epoch rather than the last one.
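
The adjustment described in the last note can be sketched in Python as below. This is only an illustration under stated assumptions: the base values (80 epochs, learning rate 0.001) are placeholders to be replaced with the values from your own config, halving the epochs is an arbitrary choice since the note only says "fewer", and the key names `epoch` and `LearningRate.base_lr` as `-o` overrides are assumptions that should be checked against your config files.

```python
def finetune_overrides(base_epochs, base_lr, lr_scale=0.1, epoch_scale=0.5):
    """Derive fine-tuning settings from the original training schedule.

    lr_scale=0.1 follows the "1/10" suggestion in the note above.
    epoch_scale=0.5 is an arbitrary choice for this sketch.
    """
    epochs = max(1, round(base_epochs * epoch_scale))
    return {
        "epoch": epochs,                            # assumed config key
        "LearningRate.base_lr": base_lr * lr_scale,  # assumed config key
    }


def to_cli_options(overrides):
    """Format overrides as 'key=value' pairs, e.g. for appending after -o."""
    parts = []
    for key, value in overrides.items():
        # ':g' avoids floating-point noise such as 0.00010000000000000002.
        text = f"{value:g}" if isinstance(value, float) else str(value)
        parts.append(f"{key}={text}")
    return " ".join(parts)


# Placeholder base schedule; read the real values from your config.
options = to_cli_options(finetune_overrides(base_epochs=80, base_lr=0.001))
print(options)
```

Because the best accuracy may appear before the final epoch, training with `--eval` as in the commands above lets you compare intermediate results.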


### 4. Predict and export

When you predict with or export a model trained on a custom dataset, the 80 COCO categories are used by default if the TestDataset path is set incorrectly.

Besides setting the TestDataset path correctly, you can also add or edit a `label_list.txt` file (one category per line) and point `anno_path` at it. `anno_path` in TestDataset can also be set to an absolute path. For example:
```
TestDataset:
  !ImageFolder
    anno_path: label_list.txt # if dataset_dir is not set, anno_path is relative to the PaddleDetection root directory
    # dataset_dir: dataset/my_coco # if dataset_dir is set, anno_path resolves to dataset_dir/anno_path
```
Each line of `label_list.txt` records one category:
```
person
vehicle
```
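
If your training annotations are already in COCO format, a `label_list.txt` like the one above can be generated from them. The Python sketch below is one way to do it, assuming that class indices follow the order of the `categories` list in the annotation file; please verify this against your training setup. The helper name `write_label_list` and the file name `train.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_label_list(coco_json_path, label_list_path):
    """Write one category name per line, in the order of the 'categories' list."""
    data = json.loads(Path(coco_json_path).read_text(encoding="utf-8"))
    names = [category["name"] for category in data["categories"]]
    Path(label_list_path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    annotation_file = Path(tmp) / "train.json"  # hypothetical annotation file
    annotation_file.write_text(json.dumps({
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "vehicle"}],
    }), encoding="utf-8")

    label_file = Path(tmp) / "label_list.txt"
    print(write_label_list(annotation_file, label_file))
    print(label_file.read_text(encoding="utf-8"))
```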

