update tutorials #420
Merged (8 commits) on Jun 19, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -137,7 +137,7 @@ For more illustration and usage, please refer to the model training section in [
  - [Yaml Configuration](docs/en/tutorials/yaml_configuration.md)
  - [Text Detection]() (coming soon)
  - [Text Recognition](docs/en/tutorials/training_recognition_custom_dataset.md)
  - [Distributed Training](docs/cn/tutorials/distribute_train.md)
  - [Distributed Training](docs/en/tutorials/distribute_train.md)
  - [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/en/tutorials/advanced_train.md)
- Inference and Deployment
  - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
8 changes: 4 additions & 4 deletions README_CN.md
@@ -128,13 +128,13 @@ python tools/eval.py \

- Datasets
  - [Dataset Preparation](tools/dataset_converters/README_CN.md)
  - [Data Augmentation Strategies](docs/en/tutorials/transform_tutorial.md)
  - [Data Augmentation Strategies](docs/cn/tutorials/transform_tutorial.md)
- Model Training
  - [Yaml Configuration](docs/cn/tutorials/yaml_configuration.md)
  - [Text Detection]() (coming soon)
  - [Text Recognition](docs/cn/tutorials/training_recognition_custom_dataset.md)
  - [Distributed Training](docs/cn/tutorials/distribute_train.md)
  - [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/en/tutorials/advanced_train.md)
  - [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/cn/tutorials/advanced_train.md)
- Inference and Deployment
  - [OCR Inference with Python/C++ on Ascend 310](docs/cn/inference/inference_tutorial.md)
  - [Online OCR Inference with Python](tools/infer/text/README.md)
@@ -207,7 +207,7 @@ MindOCR provides [data format conversion tools](tools/dataset_converters) to sup
- [totaltext](docs/cn/datasets/totaltext.md)
- [mlt2017](docs/cn/datasets/mlt2017.md)
- [chinese_text_recognition](docs/cn/datasets/chinese_text_recognition.md)
3. Added a 断点重训 (resume training) feature, which can be used when training is unexpectedly interrupted. To use it, add a `resume` parameter under the `model` field of the configuration file; either pass a specific path, `resume: /path/to/train_resume.ckpt`, or set `resume: True` to load the train_resume.ckpt saved under ckpt_save_dir.
3. Added a 断点续训 (resume training) feature, which can be used when training is unexpectedly interrupted. To use it, add a `resume` parameter under the `model` field of the configuration file; either pass a specific path, `resume: /path/to/train_resume.ckpt`, or set `resume: True` to load the train_resume.ckpt saved under ckpt_save_dir.
4. Improved the post-processing part of the detection module: by default, the detected text polygons are rescaled back to the original image space; this can be enabled by adding "shape_list" to the `eval.dataset.output_columns` list.
5. Refactored online inference to support more models; see [README.md](tools/infer/text/README.md) for details.

@@ -232,7 +232,7 @@ MindOCR provides [data format conversion tools](tools/dataset_converters) to sup

- 2023/04/21
1. Added parameter grouping to support regularization during training. Usage: add a `grouping_strategy` parameter in the yaml config to select a predefined grouping strategy, or use the `no_weight_decay_params` parameter to select the layers to exclude from weight decay (e.g., bias, norm). Examples can be found in `configs/rec/crnn/crnn_icdar15.yaml`
2. Added 梯度积累 (gradient accumulation) to support training with large batch sizes. Usage: add `gradient_accumulation_steps` in the yaml config; the global batch size = batch_size * devices * gradient_accumulation_steps. Examples can be found in `configs/rec/crnn/crnn_icdar15.yaml`
2. Added 梯度累积 (gradient accumulation) to support training with large batch sizes. Usage: add `gradient_accumulation_steps` in the yaml config; the global batch size = batch_size * devices * gradient_accumulation_steps. Examples can be found in `configs/rec/crnn/crnn_icdar15.yaml`
3. Added gradient clipping to improve training stability. Enable it by setting `grad_clip` to True in the yaml config.

- 2023/03/23
57 changes: 57 additions & 0 deletions docs/cn/tutorials/advanced_train.md
@@ -0,0 +1,57 @@
# Advanced Training Strategies

### Strategies: Gradient Accumulation, Gradient Clipping, EMA

Training strategies can be configured in the model's YAML configuration file. After configuring them, run the `tools/train.py` script to start training.

[Reference YAML configuration example](../../../configs/rec/crnn/crnn_icdar15.yaml)

```yaml
train:
  gradient_accumulation_steps: 2
  clip_grad: True
  clip_norm: 5.0
  ema: True
  ema_decay: 0.9999
```

#### Gradient Accumulation

Gradient accumulation is an effective way to work around insufficient device memory: with the same memory budget, it allows **training with a larger global batch size**. Enable it by setting `train.gradient_accumulation_steps` to a value greater than 1 in the yaml config.
The equivalent global batch size is:


`global_batch_size = batch_size * num_devices * gradient_accumulation_steps`
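
As an illustration only (this is not MindOCR code, and the numbers are hypothetical), the sketch below works through the arithmetic and the accumulate-then-update loop that the setting implies:

```python
# Hypothetical numbers, for illustration only.
batch_size = 8                    # per-device mini-batch that fits in memory
num_devices = 4                   # number of devices used for training
gradient_accumulation_steps = 2   # value of train.gradient_accumulation_steps

global_batch_size = batch_size * num_devices * gradient_accumulation_steps
print(global_batch_size)  # 64

# Conceptually, gradients are summed over several micro-batches and the
# optimizer is applied once per accumulation window:
accumulated = 0.0
for step, micro_batch_grad in enumerate([0.1, 0.3, 0.2, 0.4], start=1):  # toy gradients
    accumulated += micro_batch_grad
    if step % gradient_accumulation_steps == 0:
        # optimizer_update(accumulated / gradient_accumulation_steps)  # framework-specific
        accumulated = 0.0
```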

#### Gradient Clipping

Gradient clipping is commonly used to mitigate gradient explosion/overflow so that training converges more stably. Enable it by setting `train.clip_grad` to `True` in the yaml config; the value of `train.clip_norm` controls the clipping norm.
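
For intuition only (a generic sketch, not the MindOCR implementation), norm-based clipping rescales the gradient whenever its L2 norm exceeds the threshold:

```python
import math

def clip_by_global_norm(grads, clip_norm=5.0):
    """Rescale a flat list of gradient values so their L2 norm is at most clip_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > clip_norm:
        scale = clip_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

print(clip_by_global_norm([3.0, 4.0, 12.0], clip_norm=5.0))  # norm 13 -> rescaled to norm 5
```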


#### EMA

Exponential Moving Average (EMA) is a model-ensembling technique that smooths the model weights. It helps the model converge stably during training and usually yields better final performance.
Enable it by setting `train.ema` to `True` in the yaml config; `train.ema_decay` controls the decay rate and is usually set to a value close to 1.
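
As a rough illustration (not the MindOCR implementation; the weights below are toy values), EMA maintains a shadow copy of the weights that is blended toward the current weights after every step:

```python
def ema_update(shadow, weights, decay=0.9999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]

shadow = [0.5, 1.0]  # initialized from the initial weights (toy values)
for step_weights in ([0.6, 1.1], [0.7, 1.2], [0.8, 1.3]):  # toy per-step weights
    shadow = ema_update(shadow, step_weights, decay=0.9)
print(shadow)  # the smoothed weights, typically used for evaluation/checkpointing
```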


### Resume Training

Resume training is typically used when training is unexpectedly interrupted; it lets training continue from the epoch where it stopped. Enable it by setting `model.resume` to `True` in the yaml config, for example:

```yaml
model:
  resume: True
```
> By default, training resumes from the `train_resume.ckpt` saved in the `train.ckpt_save_dir` directory.

If you want to resume from the checkpoint of a different epoch, specify the checkpoint path in `resume`, for example:

```yaml
model:
  resume: /some/path/to/train_resume.ckpt
```

### Training on the OpenI Cloud Platform

Please refer to [Quick Start: MindOCR Training on the Cloud](../../cn/tutorials/training_on_openi.md)
1 change: 0 additions & 1 deletion docs/cn/tutorials/distribute_train.md
@@ -1,4 +1,3 @@

# Distributed Parallel Training

This document provides a tutorial on distributed parallel training. On Ascend processors, single-node multi-device training can be run in two ways: by launching the script with OpenMPI, or by configuring a RANK_TABLE_FILE. On GPU processors, single-node multi-device training can be launched with OpenMPI.
2 changes: 1 addition & 1 deletion docs/cn/tutorials/training_on_openi.md
@@ -83,4 +83,4 @@

## Reference

[1] Modified from https://github.com/mindspore-lab/mindyolo/blob/master/tutorials/modelarts_CN.md
[1] Modified from https://github.com/mindspore-lab/mindyolo/blob/master/tutorials/cloud/openi_CN.md
227 changes: 227 additions & 0 deletions docs/cn/tutorials/transform_tutorial.md
@@ -0,0 +1,227 @@

# Transformation Tutorial

[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindocr/tutorials/transform_tutorial.ipynb) 

### Mechanism

1. Each transformation is a class with a callable function. An example is shown below:

```python
import numpy as np
from PIL import Image


class ToCHWImage(object):
    """ convert hwc image to chw image
    required keys: image
    modified keys: image
    """

    def __init__(self, **kwargs):
        pass

    def __call__(self, data: dict):
        img = data['image']
        if isinstance(img, Image.Image):
            img = np.array(img)
        data['image'] = img.transpose((2, 0, 1))
        return data
```

2. The input to a transformation is always a dict, which carries data information such as img_path, the raw label, etc.

3. The transformation API should make explicit which keys are required in the input and which keys it modifies and/or adds in the output data, as illustrated by the sketch below.
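
A minimal sketch (not MindOCR code; the class names are made up for illustration) of how such transformations are chained on a shared data dict:

```python
import numpy as np

class FakeDecode:
    """Toy stand-in for an image-decoding transform: adds the 'image' key (HWC)."""
    def __call__(self, data: dict):
        data['image'] = np.zeros((32, 100, 3), dtype=np.uint8)
        return data

class FakeToCHW:
    """Toy stand-in for ToCHWImage: modifies the 'image' key from HWC to CHW."""
    def __call__(self, data: dict):
        data['image'] = data['image'].transpose((2, 0, 1))
        return data

data = {'img_path': 'path/to/some_image.jpg'}  # hypothetical path
for transform in (FakeDecode(), FakeToCHW()):
    data = transform(data)
print(data['image'].shape)  # (3, 32, 100)
```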

The available transformations can be found in `mindocr/data/transforms/*_transform.py`


```python
# import and check available transforms

from mindocr.data.transforms import general_transforms, det_transforms, rec_transforms
```


```python
general_transforms.__all__
```




['DecodeImage', 'NormalizeImage', 'ToCHWImage', 'PackLoaderInputs']




```python
det_transforms.__all__
```




['DetLabelEncode',
'MakeBorderMap',
'MakeShrinkMap',
'EastRandomCropData',
'PSERandomCrop']



### Text Detection

### 1. Load the image and annotation

#### Preparation


```python
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
```

The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload



```python
import os

# load the label file which has the info of image path and annotation.
# This file is generated from the ic15 annotations using the converter script.
label_fp = '/Users/Samit/Data/datasets/ic15/det/train/train_icdar2015_label.txt'
root_dir = '/Users/Samit/Data/datasets/ic15/det/train'

data_lines = []
with open(label_fp, 'r') as f:
    for line in f:
        data_lines.append(line)

# just pick one image and its annotation
idx = 3
img_path, annot = data_lines[idx].strip().split('\t')

img_path = os.path.join(root_dir, img_path)
print('img_path', img_path)
print('raw annotation: ', annot)


```

img_path /Users/Samit/Data/datasets/ic15/det/train/ch4_training_images/img_612.jpg
raw annotation: [{"transcription": "where", "points": [[483, 197], [529, 174], [530, 197], [485, 221]]}, {"transcription": "people", "points": [[531, 168], [607, 136], [608, 166], [532, 198]]}, {"transcription": "meet", "points": [[613, 128], [691, 100], [691, 131], [613, 160]]}, {"transcription": "###", "points": [[695, 299], [888, 315], [931, 635], [737, 618]]}, {"transcription": "###", "points": [[709, 19], [876, 8], [880, 286], [713, 296]]}, {"transcription": "###", "points": [[530, 270], [660, 246], [661, 300], [532, 324]]}, {"transcription": "###", "points": [[113, 356], [181, 359], [180, 387], [112, 385]]}, {"transcription": "###", "points": [[281, 328], [369, 338], [366, 361], [279, 351]]}, {"transcription": "###", "points": [[66, 314], [183, 313], [183, 328], [68, 330]]}]


#### Decode the image - DecodeImage


```python
#img_path = '/Users/Samit/Data/datasets/ic15/det/train/ch4_training_images/img_1.jpg'
decode_image = general_transforms.DecodeImage(img_mode='RGB')

# TODO: check the input keys and output keys for the trans. func.

data = {'img_path': img_path}
data = decode_image(data)
img = data['image']

# visualize
from mindocr.utils.visualize import show_img, show_imgs
show_img(img)
```


![output_13_0](https://user-images.githubusercontent.com/20376974/228160967-262e9fe3-1118-49b2-b269-156e44761edf.png)



```python
import time

start = time.time()
att = 100
for i in range(att):
    img = decode_image(data)['image']
avg = (time.time() - start) / att

print('avg reading time: ', avg)
```

avg reading time: 0.004545390605926514


#### Detection label encoding - DetLabelEncode


```python
data['label'] = annot

decode_image = det_transforms.DetLabelEncode()
data = decode_image(data)

#print(data['polys'])
print(data['texts'])

# visualize
from mindocr.utils.visualize import draw_boxes

res = draw_boxes(data['image'], data['polys'])
show_img(res)

```

['where', 'people', 'meet', '###', '###', '###', '###', '###', '###']



![output_16_1](https://user-images.githubusercontent.com/20376974/228161131-c11209d1-f3f0-4a8c-a763-b72d729a4084.png)


### 2. Image and annotation processing/augmentation

#### RandomCrop - EastRandomCropData


```python
from mindocr.data.transforms.general_transforms import RandomCropWithBBox
import copy

#crop_data = det_transforms.EastRandomCropData(size=(640, 640))
crop_data = RandomCropWithBBox(crop_size=(640, 640))

show_img(data['image'])
for i in range(2):
    data_cache = copy.deepcopy(data)
    data_cropped = crop_data(data_cache)

    res_crop = draw_boxes(data_cropped['image'], data_cropped['polys'])
    show_img(res_crop)
```


![output_19_0](https://user-images.githubusercontent.com/20376974/228161220-c56ebd8d-37a0-48a8-9746-3c8da0eaddbb.png)



![output_19_1](https://user-images.githubusercontent.com/20376974/228161306-8359d0b5-f77d-4ec6-8192-fecdaa4c8a1e.png)



![output_19_2](https://user-images.githubusercontent.com/20376974/228161334-8232f0ac-7ca0-49d6-b15a-45b58cb80003.png)


#### ColorJitter


```python
random_color_adj = general_transforms.RandomColorAdjust(brightness=0.4, saturation=0.5)

data_cache = copy.deepcopy(data)
#data_cache['image'] = data_cache['image'][:,:, ::-1]
data_adj = random_color_adj(data_cache)
#print(data_adj)
show_img(data_adj['image'], is_bgr_img=True)
```


![output_21_0](https://user-images.githubusercontent.com/20376974/228161397-c64faae6-b4a2-41ff-9531-5bced781fd9d.png)