update tutorials #420
Merged (8 commits) on Jun 19, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -137,7 +137,7 @@ For more illustration and usage, please refer to the model training section in [
  - [Yaml Configuration](docs/en/tutorials/yaml_configuration.md)
  - [Text Detection]() (coming soon)
  - [Text Recognition](docs/en/tutorials/training_recognition_custom_dataset.md)
  - [Distributed Training](docs/cn/tutorials/distribute_train.md)
  - [Distributed Training](docs/en/tutorials/distribute_train.md)
  - [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/en/tutorials/advanced_train.md)
- Inference and Deployment
  - [Python/C++ Inference on Ascend 310](docs/en/inference/inference_tutorial.md)
8 changes: 4 additions & 4 deletions README_CN.md
@@ -128,13 +128,13 @@ python tools/eval.py \

- Datasets
  - [Dataset Preparation](tools/dataset_converters/README_CN.md)
  - [Data Augmentation Strategies](docs/en/tutorials/transform_tutorial.md)
  - [Data Augmentation Strategies](docs/cn/tutorials/transform_tutorial.md)
- Model Training
  - [Yaml Configuration](docs/cn/tutorials/yaml_configuration.md)
  - [Text Detection]() (coming soon)
  - [Text Recognition](docs/cn/tutorials/training_recognition_custom_dataset.md)
  - [Distributed Training](docs/cn/tutorials/distribute_train.md)
  - [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/en/tutorials/advanced_train.md)
  - [Advanced: Gradient Accumulation, EMA, Resume Training, etc.](docs/cn/tutorials/advanced_train.md)
- Inference and Deployment
  - [OCR Inference with Python/C++ on Ascend 310](docs/cn/inference/inference_tutorial.md)
  - [Online OCR Inference with Python](tools/infer/text/README.md)
@@ -207,7 +207,7 @@ MindOCR provides [data format conversion tools](tools/dataset_converters) to sup
- [totaltext](docs/cn/datasets/totaltext.md)
- [mlt2017](docs/cn/datasets/mlt2017.md)
- [chinese_text_recognition](docs/cn/datasets/chinese_text_recognition.md)
3. Added a 断点重训 (resume training) feature, which can be used when training is unexpectedly interrupted. To use it, add a `resume` parameter under the `model` field of the configuration file; either pass a specific path, `resume: /path/to/train_resume.ckpt`, or set `resume: True` to load the train_resume.ckpt saved under ckpt_save_dir.
3. Added a 断点续训 (resume training) feature, which can be used when training is unexpectedly interrupted. To use it, add a `resume` parameter under the `model` field of the configuration file; either pass a specific path, `resume: /path/to/train_resume.ckpt`, or set `resume: True` to load the train_resume.ckpt saved under ckpt_save_dir.
4. Improved the post-processing part of the detection module: by default, the detected text polygons are rescaled back to the original image space; this can be enabled by adding "shape_list" to the `eval.dataset.output_columns` list.
5. Refactored online inference to support more models; see [README.md](tools/infer/text/README.md) for details.

@@ -232,7 +232,7 @@ MindOCR provides [data format conversion tools](tools/dataset_converters) to sup

- 2023/04/21
1. Added parameter grouping to support regularization during training. Usage: add a `grouping_strategy` parameter in the yaml config to select a predefined grouping strategy, or use the `no_weight_decay_params` parameter to select the layers to exclude from weight decay (e.g., bias, norm). Examples can be found in `configs/rec/crnn/crnn_icdar15.yaml`
2. Added 梯度积累 (gradient accumulation) to support training with large batch sizes. Usage: add `gradient_accumulation_steps` in the yaml config; the global batch size = batch_size * devices * gradient_accumulation_steps. Examples can be found in `configs/rec/crnn/crnn_icdar15.yaml`
2. Added 梯度累积 (gradient accumulation) to support training with large batch sizes. Usage: add `gradient_accumulation_steps` in the yaml config; the global batch size = batch_size * devices * gradient_accumulation_steps. Examples can be found in `configs/rec/crnn/crnn_icdar15.yaml`
3. Added gradient clipping to improve training stability. Enable it by setting `grad_clip` to True in the yaml config.

- 2023/03/23
57 changes: 57 additions & 0 deletions docs/cn/tutorials/advanced_train.md
@@ -0,0 +1,57 @@
# Advanced Training Strategies

### Strategies: Gradient Accumulation, Gradient Clipping, EMA

Training strategies can be configured in the model's YAML configuration file. After configuring them, run the `tools/train.py` script to start training.

[Reference YAML configuration example](../../../configs/rec/crnn/crnn_icdar15.yaml)

```yaml
train:
  gradient_accumulation_steps: 2
  clip_grad: True
  clip_norm: 5.0
  ema: True
  ema_decay: 0.9999
```

#### Gradient Accumulation

Gradient accumulation is an effective way to work around insufficient device memory: with the same memory budget, it allows **training with a larger global batch size**. Enable it by setting `train.gradient_accumulation_steps` to a value greater than 1 in the yaml config.
The equivalent global batch size is:


`global_batch_size = batch_size * num_devices * gradient_accumulation_steps`
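
As an illustration only (this is not MindOCR code, and the numbers are hypothetical), the sketch below works through the arithmetic and the accumulate-then-update loop that the setting implies:

```python
# Hypothetical numbers, for illustration only.
batch_size = 8                    # per-device mini-batch that fits in memory
num_devices = 4                   # number of devices used for training
gradient_accumulation_steps = 2   # value of train.gradient_accumulation_steps

global_batch_size = batch_size * num_devices * gradient_accumulation_steps
print(global_batch_size)  # 64

# Conceptually, gradients are summed over several micro-batches and the
# optimizer is applied once per accumulation window:
accumulated = 0.0
for step, micro_batch_grad in enumerate([0.1, 0.3, 0.2, 0.4], start=1):  # toy gradients
    accumulated += micro_batch_grad
    if step % gradient_accumulation_steps == 0:
        # optimizer_update(accumulated / gradient_accumulation_steps)  # framework-specific
        accumulated = 0.0
```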

#### Gradient Clipping

Gradient clipping is commonly used to mitigate gradient explosion/overflow so that training converges more stably. Enable it by setting `train.clip_grad` to `True` in the yaml config; the value of `train.clip_norm` controls the clipping norm.
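
For intuition only (a generic sketch, not the MindOCR implementation), norm-based clipping rescales the gradient whenever its L2 norm exceeds the threshold:

```python
import math

def clip_by_global_norm(grads, clip_norm=5.0):
    """Rescale a flat list of gradient values so their L2 norm is at most clip_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > clip_norm:
        scale = clip_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

print(clip_by_global_norm([3.0, 4.0, 12.0], clip_norm=5.0))  # norm 13 -> rescaled to norm 5
```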


#### EMA

Exponential Moving Average (EMA) is a model-ensembling technique that smooths the model weights. It helps the model converge stably during training and usually yields better final performance.
Enable it by setting `train.ema` to `True` in the yaml config; `train.ema_decay` controls the decay rate and is usually set to a value close to 1.
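
As a rough illustration (not the MindOCR implementation; the weights below are toy values), EMA maintains a shadow copy of the weights that is blended toward the current weights after every step:

```python
def ema_update(shadow, weights, decay=0.9999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]

shadow = [0.5, 1.0]  # initialized from the initial weights (toy values)
for step_weights in ([0.6, 1.1], [0.7, 1.2], [0.8, 1.3]):  # toy per-step weights
    shadow = ema_update(shadow, step_weights, decay=0.9)
print(shadow)  # the smoothed weights, typically used for evaluation/checkpointing
```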


### Resume Training

Resume training is typically used when training is unexpectedly interrupted; it lets training continue from the epoch where it stopped. Enable it by setting `model.resume` to `True` in the yaml config, for example:

```yaml
model:
  resume: True
```
> By default, training resumes from the `train_resume.ckpt` saved in the `train.ckpt_save_dir` directory.

If you want to resume from the checkpoint of a different epoch, specify the checkpoint path in `resume`, for example:

```yaml
model:
  resume: /some/path/to/train_resume.ckpt
```

### Training on the OpenI Cloud Platform

Please refer to [Quick Start: MindOCR Training on the Cloud](../../cn/tutorials/training_on_openi.md)
1 change: 0 additions & 1 deletion docs/cn/tutorials/distribute_train.md
@@ -1,4 +1,3 @@

# Distributed Parallel Training

This document provides a tutorial on distributed parallel training. On Ascend processors, single-node multi-device training can be run in two ways: by launching the script with OpenMPI, or by configuring a RANK_TABLE_FILE. On GPU processors, single-node multi-device training can be launched with OpenMPI.
2 changes: 1 addition & 1 deletion docs/cn/tutorials/training_on_openi.md
@@ -83,4 +83,4 @@

## Reference

[1] Modified from https://github.com/mindspore-lab/mindyolo/blob/master/tutorials/modelarts_CN.md
[1] Modified from https://github.com/mindspore-lab/mindyolo/blob/master/tutorials/cloud/openi_CN.md
227 changes: 227 additions & 0 deletions docs/cn/tutorials/transform_tutorial.md
@@ -0,0 +1,227 @@

# Transformation Tutorial

[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindocr/tutorials/transform_tutorial.ipynb) 

### Mechanism

1. Each transformation is a class with a callable function. An example is shown below:

```python
import numpy as np
from PIL import Image


class ToCHWImage(object):
    """ convert hwc image to chw image
    required keys: image
    modified keys: image
    """

    def __init__(self, **kwargs):
        pass

    def __call__(self, data: dict):
        img = data['image']
        if isinstance(img, Image.Image):
            img = np.array(img)
        data['image'] = img.transpose((2, 0, 1))
        return data
```

2. The input to a transformation is always a dict, which carries data information such as img_path, the raw label, etc.

3. The transformation API should make explicit which keys are required in the input and which keys it modifies and/or adds in the output data, as illustrated by the sketch below.
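
A minimal sketch (not MindOCR code; the class names are made up for illustration) of how such transformations are chained on a shared data dict:

```python
import numpy as np

class FakeDecode:
    """Toy stand-in for an image-decoding transform: adds the 'image' key (HWC)."""
    def __call__(self, data: dict):
        data['image'] = np.zeros((32, 100, 3), dtype=np.uint8)
        return data

class FakeToCHW:
    """Toy stand-in for ToCHWImage: modifies the 'image' key from HWC to CHW."""
    def __call__(self, data: dict):
        data['image'] = data['image'].transpose((2, 0, 1))
        return data

data = {'img_path': 'path/to/some_image.jpg'}  # hypothetical path
for transform in (FakeDecode(), FakeToCHW()):
    data = transform(data)
print(data['image'].shape)  # (3, 32, 100)
```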

The available transformations can be found in `mindocr/data/transforms/*_transform.py`


```python
# import and check available transforms

from mindocr.data.transforms import general_transforms, det_transforms, rec_transforms
```


```python
general_transforms.__all__
```




['DecodeImage', 'NormalizeImage', 'ToCHWImage', 'PackLoaderInputs']




```python
det_transforms.__all__
```




['DetLabelEncode',
'MakeBorderMap',
'MakeShrinkMap',
'EastRandomCropData',
'PSERandomCrop']



### Text Detection

### 1. Load the image and annotation

#### Preparation


```python
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
```

The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload



```python
import os

# load the label file which has the info of image path and annotation.
# This file is generated from the ic15 annotations using the converter script.
label_fp = '/Users/Samit/Data/datasets/ic15/det/train/train_icdar2015_label.txt'
root_dir = '/Users/Samit/Data/datasets/ic15/det/train'

data_lines = []
with open(label_fp, 'r') as f:
    for line in f:
        data_lines.append(line)

# just pick one image and its annotation
idx = 3
img_path, annot = data_lines[idx].strip().split('\t')

img_path = os.path.join(root_dir, img_path)
print('img_path', img_path)
print('raw annotation: ', annot)


```

img_path /Users/Samit/Data/datasets/ic15/det/train/ch4_training_images/img_612.jpg
raw annotation: [{"transcription": "where", "points": [[483, 197], [529, 174], [530, 197], [485, 221]]}, {"transcription": "people", "points": [[531, 168], [607, 136], [608, 166], [532, 198]]}, {"transcription": "meet", "points": [[613, 128], [691, 100], [691, 131], [613, 160]]}, {"transcription": "###", "points": [[695, 299], [888, 315], [931, 635], [737, 618]]}, {"transcription": "###", "points": [[709, 19], [876, 8], [880, 286], [713, 296]]}, {"transcription": "###", "points": [[530, 270], [660, 246], [661, 300], [532, 324]]}, {"transcription": "###", "points": [[113, 356], [181, 359], [180, 387], [112, 385]]}, {"transcription": "###", "points": [[281, 328], [369, 338], [366, 361], [279, 351]]}, {"transcription": "###", "points": [[66, 314], [183, 313], [183, 328], [68, 330]]}]


#### Decode the image - DecodeImage


```python
#img_path = '/Users/Samit/Data/datasets/ic15/det/train/ch4_training_images/img_1.jpg'
decode_image = general_transforms.DecodeImage(img_mode='RGB')

# TODO: check the input keys and output keys for the trans. func.

data = {'img_path': img_path}
data = decode_image(data)
img = data['image']

# visualize
from mindocr.utils.visualize import show_img, show_imgs
show_img(img)
```


![output_13_0](https://user-images.githubusercontent.com/20376974/228160967-262e9fe3-1118-49b2-b269-156e44761edf.png)



```python
import time

start = time.time()
att = 100
for i in range(att):
    img = decode_image(data)['image']
avg = (time.time() - start) / att

print('avg reading time: ', avg)
```

avg reading time: 0.004545390605926514


#### Detection label encoding - DetLabelEncode


```python
data['label'] = annot

decode_image = det_transforms.DetLabelEncode()
data = decode_image(data)

#print(data['polys'])
print(data['texts'])

# visualize
from mindocr.utils.visualize import draw_boxes

res = draw_boxes(data['image'], data['polys'])
show_img(res)

```

['where', 'people', 'meet', '###', '###', '###', '###', '###', '###']



![output_16_1](https://user-images.githubusercontent.com/20376974/228161131-c11209d1-f3f0-4a8c-a763-b72d729a4084.png)


### 2. Image and annotation processing/augmentation

#### RandomCrop - EastRandomCropData


```python
from mindocr.data.transforms.general_transforms import RandomCropWithBBox
import copy

#crop_data = det_transforms.EastRandomCropData(size=(640, 640))
crop_data = RandomCropWithBBox(crop_size=(640, 640))

show_img(data['image'])
for i in range(2):
    data_cache = copy.deepcopy(data)
    data_cropped = crop_data(data_cache)

    res_crop = draw_boxes(data_cropped['image'], data_cropped['polys'])
    show_img(res_crop)
```


![output_19_0](https://user-images.githubusercontent.com/20376974/228161220-c56ebd8d-37a0-48a8-9746-3c8da0eaddbb.png)



![output_19_1](https://user-images.githubusercontent.com/20376974/228161306-8359d0b5-f77d-4ec6-8192-fecdaa4c8a1e.png)



![output_19_2](https://user-images.githubusercontent.com/20376974/228161334-8232f0ac-7ca0-49d6-b15a-45b58cb80003.png)


#### ColorJitter


```python
random_color_adj = general_transforms.RandomColorAdjust(brightness=0.4, saturation=0.5)

data_cache = copy.deepcopy(data)
#data_cache['image'] = data_cache['image'][:,:, ::-1]
data_adj = random_color_adj(data_cache)
#print(data_adj)
show_img(data_adj['image'], is_bgr_img=True)
```


![output_21_0](https://user-images.githubusercontent.com/20376974/228161397-c64faae6-b4a2-41ff-9531-5bced781fd9d.png)