mindspore-lab · SamitHuang · May 19, 2023 · May 11, 2023 · May 11, 2023 · May 12, 2023
diff --git a/configs/rec/crnn/README.md b/configs/rec/crnn/README.md
@@ -335,6 +335,7 @@ To transform the groud-truth text into label ids, we have to provide the charact
 There are some built-in dictionaries, which are placed in `mindocr/utils/dict/`, and you can choose the appropriate dictionary to use.
 
 - `en_dict.txt` is an English dictionary containing 96 characters, including numbers, common symbols, and uppercase and lowercase English letters.
+- `ch_dict.txt` is a Chinese dictionary containing 6623 characters, including commonly used simplified and traditional Chinese characters, numbers, common symbols, and uppercase and lowercase English letters.
 
 
 ### Customized Dictionary
@@ -350,6 +351,28 @@ To use a specific dictionary, set the parameter `character_dict_path` to the pat
 - Remember to check the value of `dataset->transform_pipeline->RecCTCLabelEncode->lower` in the configuration yaml. Set it to False if you prefer case-sensitive encoding.
 
 
+## 5. Multi-language Training
+
+Currently, this model supports multilingual recognition and provides pre-trained models for different languages. Details are as follows:
+
+### Introduction to Pre-trained Model Datasets
+Pre-trained models for different languages use different datasets for pre-training. For data sources, training methods, and evaluation methods, refer to the link in the **Data Description** column.
+
+| **Language** | **Data Description** |
+| :------: | :------: |
+| Chinese | [ch_dataset](../../../docs/en/datasets/chinese_text_recognition.md) |
+
+### Pretrained models
+Pre-trained models have been evaluated on the benchmark test set, with the following results:
+
+| **Model** | **Language** | **Backbone** | **Scene** | **Web** | **Document** | **Recipe** | **Download** | 
+| :-----: | :-----:  | :--------: | :--------: | :--------: | :--------: | :---------: | :-----------: |
+| CRNN    | Chinese | ResNet34_vd | 59.71% | 64.86% | 89.23% |  [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3-f27f763a.mindir) |
+
+### Training with Custom Datasets
+You can train models for different languages with your own custom datasets. Please refer to the tutorial [Training Recognition Network with Custom Datasets](../../../docs/en/tutorials/training_recognition_custom_dataset.md).
+
+
 ## References
 <!--- Guideline: Citation format GB/T 7714 is suggested. -->
 

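Note on the dictionary and `num_classes`: the new config in this PR sets `num_classes` to 6624 with the comment `num_chars_in_dict+1`, which agrees with the 6623 characters that `ch_dict.txt` is documented to contain. Below is a minimal sketch of deriving that value from a dictionary instead of hard-coding it. It assumes one character per line and that the extra class is the CTC blank; the helper names are hypothetical and not part of MindOCR.

```python
from pathlib import Path


def num_classes_from_lines(lines):
    """Return the number of non-empty dictionary entries plus one extra class."""
    # Strip only line terminators so that a space character entry would survive.
    chars = [line.rstrip("\r\n") for line in lines]
    chars = [c for c in chars if c != ""]
    return len(chars) + 1  # +1: assumed to be the CTC blank


def num_classes_from_dict(dict_path):
    """Same computation, reading the entries from a UTF-8 dictionary file."""
    with Path(dict_path).open(encoding="utf-8") as f:
        return num_classes_from_lines(f)


# Toy three-entry dictionary
print(num_classes_from_lines(["a\n", "b\n", "c\n"]))  # 4
```

With a 6623-entry dictionary the same computation gives 6624, the value used for `out_channels` in the head.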
diff --git a/configs/rec/crnn/README_CN.md b/configs/rec/crnn/README_CN.md
@@ -339,6 +339,7 @@ python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml
 Mindocr内置了一部分字典，均放在了 `mindocr/utils/dict/` 位置，可选择合适的字典使用。
 
 - `en_dict.txt` 是一个包含96个字符的英文字典，其中有数字，常用符号以及大小写的英文字母。
+- `ch_dict.txt` 是一个包含6623个字符的中文字典，其中有常用的繁简体中文，数字，常用符号以及大小写的英文字母。
 
 
 ### 自定义词典
@@ -354,6 +355,28 @@ Mindocr内置了一部分字典，均放在了 `mindocr/utils/dict/` 位置，
 - 请记住检查配置文件中的 `dataset->transform_pipeline->RecCTCLabelEncode->lower` 参数的值。如果词典中有大小写字母而且想区分大小写的话，请将其设置为 False。
 
 
+## 5. 多语言训练
+
+目前，该模型支持多语种识别和提供不同语种的预训练模型。详细内容如下：
+
+### 预训练模型数据集介绍
+不同语种的预训练模型采用不同数据集作为预训练，数据来源、训练方式和评估方式可参考 **数据说明**。
+
+| **语种** | **数据说明** |
+| :------: | :------: |
+| 中文 | [中文识别数据集](../../../docs/cn/datasets/chinese_text_recognition_CN.md) | 
+
+### 预训练模型
+预训练模型已在基准测试集上进行评估，结果如下：
+
+| **模型** | **语种** | **骨干网络** | **街景类** | **网页类** | **文档类** | **配置文件** | **模型权重下载** | 
+| :-----: | :-----:  | :--------: | :--------: | :--------: | :--------: | :---------: | :-----------: |
+| CRNN    | 中文 | ResNet34_vd | 59.71% | 64.86% | 89.23% |  [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3-f27f763a.mindir) |
+
+### 使用自定义数据集进行训练
+您可以使用自定义的数据集进行不同语种的模型训练。请参考教学 [使用自定义数据集训练识别网络](../../../docs/cn/tutorials/training_recognition_custom_dataset_CN.md)。
+
+
 ## 参考文献
 <!--- Guideline: Citation format GB/T 7714 is suggested. -->
 

diff --git a/configs/rec/crnn/crnn_resnet34.yaml b/configs/rec/crnn/crnn_resnet34.yaml
@@ -62,7 +62,6 @@ optimizer:
   momentum: 0.95
   weight_decay: 0.0001
   nesterov: False
-  #use_nesterov: True 
 
 loss_scaler:
   type: static

diff --git a/configs/rec/crnn/crnn_resnet34_ch.yaml b/configs/rec/crnn/crnn_resnet34_ch.yaml
@@ -0,0 +1,156 @@
+system:
+  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
+  distribute: True
+  amp_level: 'O3'
+  seed: 42
+  log_interval: 100
+  val_while_train: True
+  drop_overflow_update: False
+
+common:
+  character_dict_path: &character_dict_path  mindocr/utils/dict/ch_dict.txt
+  num_classes: &num_classes 6624 # num_chars_in_dict+1,  TODO: retrieve it from dict or check correctness
+  max_text_len: &max_text_len 24
+  infer_mode: &infer_mode False
+  use_space_char: &use_space_char False
+  batch_size: &batch_size 64
+
+model:
+  type: rec
+  transform: null
+  backbone:
+    name: rec_resnet34
+    pretrained: False
+  neck:
+    name: RNNEncoder
+    hidden_size: 256 
+  head:
+    name: CTCHead 
+    weight_init: crnn_customised
+    bias_init: crnn_customised
+    out_channels: *num_classes 
+
+postprocess:
+  name: RecCTCLabelDecode
+  character_dict_path: *character_dict_path
+  use_space_char: *use_space_char
+
+metric:
+  name: RecMetric
+  main_indicator: acc
+  character_dict_path: *character_dict_path
+  ignore_space: True
+  print_flag: False
+
+loss:
+  name: CTCLoss 
+  pred_seq_len: 25 # TODO: retrieve from the network output shape.
+  max_label_len: *max_text_len  # this value should be smaller than pred_seq_len
+  batch_size: *batch_size
+
+scheduler: 
+  scheduler: warmup_cosine_decay
+  min_lr: 0.0
+  lr: 0.0005
+  num_epochs: 60
+  warmup_epochs: 6
+  decay_epochs: 54
+
+optimizer:
+  opt: adamw
+  filter_bias_and_bn: True
+  momentum: 0.95
+  weight_decay: 0.0001
+  nesterov: False
+  #use_nesterov: True 
+
+loss_scaler:
+  type: static
+  loss_scale: 512
+
+train:
+  ema: True
+  ema_decay: 0.9999
+  ckpt_save_dir: './tmp_rec'
+  dataset_sink_mode: False
+  dataset:
+    type: LMDBDataset
+    dataset_root: path/to/chinese_text_recognition/ # Optional, if set, dataset_root will be used as a prefix for data_dir
+    data_dir: training/
+    # label_file: # not required when using LMDBDataset
+    sample_ratio: 1.0
+    shuffle: True
+    transform_pipeline:
+      - DecodeImage: 
+          img_mode: BGR
+          to_float32: False
+      - RecCTCLabelEncode:
+          max_text_len: *max_text_len 
+          character_dict_path: *character_dict_path
+          use_space_char: *use_space_char
+          lower: False
+      - RecResizeImg: # different from paddle (paddle converts the image from HWC to CHW and rescales it to [-1, 1] after resize)
+          image_shape: [32, 100] # H, W
+          infer_mode: *infer_mode
+          character_dict_path: *character_dict_path
+          padding: False # aspect ratio will be preserved if true.
+      - NormalizeImage:  # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
+          bgr_to_rgb: True
+          is_hwc: True
+          mean : [127.0, 127.0, 127.0] 
+          std : [127.0, 127.0, 127.0]
+      - ToCHWImage: 
+    #  the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
+    output_columns: ['image', 'text_seq'] #, 'length'] #'img_path'] 
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1] # input indices marked as label
+    #keys_for_loss: 4 # num labels for loss func 
+
+  loader:
+      shuffle: True # TODO: tbc
+      batch_size: *batch_size
+      drop_remainder: True
+      max_rowsize: 12
+      num_workers: 8
+
+eval:
+  ckpt_load_path: ./tmp_rec/best.ckpt
+  dataset_sink_mode: False
+  dataset:
+    type: LMDBDataset
+    dataset_root: path/to/chinese_text_recognition/
+    data_dir: validation/
+    # label_file: # not required when using LMDBDataset
+    sample_ratio: 1.0
+    shuffle: False 
+    transform_pipeline:
+      - DecodeImage: 
+          img_mode: BGR
+          to_float32: False
+      - RecCTCLabelEncode:
+          max_text_len: *max_text_len 
+          character_dict_path: *character_dict_path
+          use_space_char: *use_space_char
+          lower: False
+      - RecResizeImg: # different from paddle (paddle converts the image from HWC to CHW and rescales it to [-1, 1] after resize)
+          image_shape: [32, 100] # H, W
+          infer_mode: *infer_mode
+          character_dict_path: *character_dict_path
+          padding: False # aspect ratio will be preserved if true.
+      - NormalizeImage:  # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
+          bgr_to_rgb: True
+          is_hwc: True
+          mean : [127.0, 127.0, 127.0] 
+          std : [127.0, 127.0, 127.0]
+      - ToCHWImage: 
+    #  the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
+    output_columns: ['image', 'text_padded', 'text_length']  # TODO return text string padding w/ fixed length, and a scalar to indicate the length
+    net_input_column_index: [0] # input indices for network forward func in output_columns
+    label_column_index: [1, 2] # input indices marked as label
+
+  loader:
+      shuffle: False # TODO: tbc
+      batch_size: 64
+      drop_remainder: False
+      max_rowsize: 12
+      num_workers: 8
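Two values in this config can be checked by hand. The comment on `max_label_len` says it should be smaller than `pred_seq_len` (24 versus 25 here), and `NormalizeImage` with a mean and std of 127.0 maps 8-bit pixels to roughly [-1, 1]. The sketch below performs both checks, assuming the normalization is the usual `(x - mean) / std`; the function names are illustrative and not part of MindOCR.

```python
def check_ctc_lengths(pred_seq_len, max_label_len):
    """Raise if the label length limit is not smaller than the prediction length."""
    if max_label_len >= pred_seq_len:
        raise ValueError(
            f"max_label_len ({max_label_len}) must be smaller than "
            f"pred_seq_len ({pred_seq_len})"
        )
    return True


def normalize_pixel(value, mean=127.0, std=127.0):
    """Apply (value - mean) / std to a single pixel intensity."""
    return (value - mean) / std


check_ctc_lengths(pred_seq_len=25, max_label_len=24)  # passes for this config
print(normalize_pixel(0))    # -1.0
print(normalize_pixel(255))  # slightly above 1.0, since 128 / 127 > 1
```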
diff --git a/configs/rec/rare/README.md b/configs/rec/rare/README.md
@@ -334,7 +334,7 @@ Pre-trained models for different languages use different datasets for pre-traini
 ### Pretrained models
 Pre-trained models have been evaluated on the benchmark test set, with the following results:
 
-| **Model** | **Language** | **Backbone** | **Transform Module** | **Scenes** | **Web Page** | **Documents** | **Recipe** | **Download** | 
+| **Model** | **Language** | **Backbone** | **Transform Module** | **Scene** | **Web** | **Document** | **Recipe** | **Download** | 
 | :-----: | :-----:  | :--------: | :------------: | :--------: | :--------: | :--------: | :---------: | :-----------: |
 | RARE    | Chinese | ResNet34_vd | None | 55.39% | 61.90% | 97.05% |  [rare_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/rare/rare_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-780b6d20.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/rare/rare_resnet34_ch-780b6d20-017aec13.mindir) |
 

diff --git a/docs/cn/tutorials/training_recognition_custom_dataset_CN.md b/docs/cn/tutorials/training_recognition_custom_dataset_CN.md
@@ -51,9 +51,10 @@ word_1814.png	cathay
 
 ## 字典准备
 
-为训练中、英文等不同语种的识别网络，用户需配置对应的字典。只有存在于字典中的字符会被模型正确预测。MindOCR现提供中、英两种字典，其中
-- `英文字典`：包括大小写英文、数字、空格和标点符号。存放于`mindocr/utils/dict/en_dict.txt`
-- `中文字典`：包括常用中文字符、大小写英文、数字和标点符号。存放于`mindocr/utils/dict/ch_dict.txt`
+为训练中、英文等不同语种的识别网络，用户需配置对应的字典。只有存在于字典中的字符会被模型正确预测。MindOCR现提供默认、中和英三种字典，其中
+- `默认字典`：只包含小写英文和数字。如用户不配置字典，该字典会被默认使用。
+- `英文字典`：包括大小写英文、数字和标点符号，存放于`mindocr/utils/dict/en_dict.txt`。
+- `中文字典`：包括常用中文字符、大小写英文、数字和标点符号，存放于`mindocr/utils/dict/ch_dict.txt`。
 
 目前MindOCR暂未提供其他语种的字典配置。该功能将在新版本中推出。
 

diff --git a/docs/en/tutorials/training_recognition_custom_dataset.md b/docs/en/tutorials/training_recognition_custom_dataset.md
@@ -51,9 +51,10 @@ Similarly, please place all validation images in a single folder, and specify a
 
 ## Dictionary Preperation
 
-To train recognition networks for different languages, users need to configure corresponding dictionaries. Only characters that exist in the dictionary will be correctly predicted by the model. MindOCR currently provides two dictionaries for Chinese and English, respectively.
-- `English Dictionary`：includes uppercase and lowercase English letters, numbers, and punctuation marks. It is place at `mindocr/utils/dict/en_dict.txt`
-- `Chinese Dictionary`：includes commonly used Chinese characters, uppercase and lowercase English letters, numbers, and punctuation marks. It is placed at `mindocr/utils/dict/ch_dict.txt`
+To train recognition networks for different languages, users need to configure corresponding dictionaries. Only characters that exist in the dictionary will be correctly predicted by the model. MindOCR currently provides three dictionaries: Default, English, and Chinese.
+- `Default Dictionary`: includes lowercase English letters and numbers only. If users do not configure a dictionary, this one is used by default.
+- `English Dictionary`: includes uppercase and lowercase English letters, numbers, and punctuation marks. It is placed at `mindocr/utils/dict/en_dict.txt`.
+- `Chinese Dictionary`: includes commonly used Chinese characters, uppercase and lowercase English letters, numbers, and punctuation marks. It is placed at `mindocr/utils/dict/ch_dict.txt`.
 
 Currently, MindOCR does not provide a dictionary configuration for other languages. This feature will be released in a upcoming version.
 
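Because only characters present in the dictionary can be predicted, it may help to check the training labels against the chosen dictionary before training. The sketch below is one way to do that, assuming labels are plain strings and the dictionary is a collection of single characters. The helper `missing_chars` is hypothetical, and `default_like` is only an assumed stand-in for a lowercase-plus-digits dictionary.

```python
def missing_chars(labels, dictionary_chars):
    """Return the sorted characters that occur in labels but not in the dictionary."""
    known = set(dictionary_chars)
    missing = set()
    for text in labels:
        missing.update(ch for ch in text if ch not in known)
    return sorted(missing)


# Assumed contents of a lowercase-letters-and-digits dictionary
default_like = "0123456789abcdefghijklmnopqrstuvwxyz"

print(missing_chars(["cathay", "Hello", "mind-ocr"], default_like))  # ['-', 'H']
```

A non-empty result suggests either switching to a larger dictionary or lowercasing and filtering the labels.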
@@ -119,7 +120,7 @@ The user can add, delete, or modify characters within the dictionary as needed.
 
 ### Configure an Chinese Model
 
-Please select `configs/rec/crnn/crnn_resnet34_CN.yaml` as the initial configuration file and modify the `train.dataset` and `eval.dataset` fields in it.
+Please select `configs/rec/crnn/crnn_resnet34_ch.yaml` as the initial configuration file and modify the `train.dataset` and `eval.dataset` fields in it.
 
 ```yaml
 ...

diff --git a/mindocr/models/rec_crnn.py b/mindocr/models/rec_crnn.py
@@ -5,12 +5,12 @@
 from .backbones.mindcv_models.utils import load_pretrained
 
 
-__all__ = ['CRNN', 'crnn_resnet34', 'crnn_vgg7']
+__all__ = ['CRNN', 'crnn_resnet34', 'crnn_vgg7', 'crnn_resnet34_ch']
 
-def _cfg(url='', **kwargs):
+def _cfg(url='', input_size=(3, 32, 100), **kwargs):
     return {
         'url': url,
-        'input_size': (3, 32, 100),
+        'input_size': input_size,
         **kwargs
     }
 
@@ -20,10 +20,11 @@ def _cfg(url='', **kwargs):
         url='https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34-83f37f07.ckpt'),
     'crnn_vgg7': _cfg(
         url='https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_vgg7-ea7e996c.ckpt'),
+    'crnn_resnet34_ch': _cfg(
+        url='https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3.ckpt'),
     }
 
 
-
 class CRNN(BaseModel):
     def __init__(self, config):
         BaseModel.__init__(self, config)
@@ -83,3 +84,31 @@ def crnn_vgg7(pretrained=False, **kwargs):
         load_pretrained(model, default_cfg)
 
     return model
+
+
+@register_model
+def crnn_resnet34_ch(pretrained=False, **kwargs):
+    model_config = {
+        "backbone": {
+            'name': 'rec_resnet34',
+            'pretrained': False
+        },
+        "neck": {
+            "name": 'RNNEncoder',
+            "hidden_size": 256,
+        },
+        "head": {
+            "name": 'CTCHead',
+            "out_channels": 6624,
+            "weight_init": "crnn_customised",
+            "bias_init": "crnn_customised",
+        }
+    }
+    model = CRNN(model_config)
+
+    # load pretrained weights
+    if pretrained:
+        default_cfg = default_cfgs['crnn_resnet34_ch']
+        load_pretrained(model, default_cfg)
+
+    return model