Add CRNN Chinese Support #298
Merged
7 commits:

- 638b9f2 update config with chinese model (hqkate)
- c567689 update dict (hqkate)
- 3eb5593 update readme (hqkate)
- 548e21b add data description doc (hqkate)
- e8d8739 Add CRNN ckpt and benchmark result (zhtmike)
- c65e2b0 fix name (zhtmike)
- 1dd32d3 Update document (zhtmike)

@@ -339,6 +339,7 @@ python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml
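MindOCR provides several built-in dictionaries in `mindocr/utils/dict/`; choose the one that suits your task.

- `en_dict.txt` is an English dictionary of 96 characters, covering digits, common symbols, and upper- and lowercase English letters.
- `ch_dict.txt` is a Chinese dictionary of 6623 characters, covering commonly used Simplified and Traditional Chinese characters, digits, common symbols, and upper- and lowercase English letters.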
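
For reference, the Chinese config in this PR sets `num_classes: 6624`, one more than the 6623 characters in `ch_dict.txt`. A minimal sketch of that arithmetic follows, assuming the dictionary lists one character per line, the extra class is the CTC blank, and `use_space_char` adds one more entry. `count_num_classes` is a hypothetical helper, not a MindOCR API; the config's own TODO notes that this value is not yet derived from the dictionary automatically.

```python
import os
import tempfile


def count_num_classes(dict_path: str, use_space_char: bool = False) -> int:
    """Hypothetical helper: number of CTC output classes for a character dictionary.

    Assumes one character per line; a space entry is counted when
    use_space_char is True, and one extra class is reserved for the CTC blank.
    """
    with open(dict_path, encoding="utf-8") as f:
        # Strip only the line terminator and skip empty lines.
        chars = [line.rstrip("\r\n") for line in f if line.rstrip("\r\n")]
    num_chars = len(chars) + (1 if use_space_char else 0)
    return num_chars + 1  # +1 for the CTC blank


if __name__ == "__main__":
    # Tiny stand-in dictionary; by the same arithmetic, 6623 characters give 6624.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
        tmp.write("a\nb\n中\n")
    print(count_num_classes(tmp.name))                       # 4
    print(count_num_classes(tmp.name, use_space_char=True))  # 5
    os.remove(tmp.name)
```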

### Custom Dictionary

@@ -354,6 +355,28 @@ MindOCR provides several built-in dictionaries in `mindocr/utils/dict/`,
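- Remember to check the value of the `dataset->transform_pipeline->RecCTCLabelEncode->lower` parameter in the config file. If the dictionary contains both upper- and lowercase letters and you want the model to distinguish case, set it to False.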
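
To illustrate the effect of `lower`, here is a simplified sketch of mapping a label string to dictionary indices. It is not the source of `RecCTCLabelEncode`; the function name and the choices to drop unknown characters and reject over-long labels are assumptions. With a dictionary containing both cases, `lower=True` maps 'A' and 'a' to the same index, so the model can no longer tell them apart.

```python
from typing import Dict, List


def encode_label(text: str, char_to_idx: Dict[str, int], lower: bool = False,
                 max_text_len: int = 24) -> List[int]:
    """Hypothetical, simplified stand-in for CTC label encoding (not MindOCR's source)."""
    if lower:
        # Case is folded before lookup, so 'A' and 'a' share one index.
        text = text.lower()
    # Characters absent from the dictionary are dropped (an assumption of this sketch).
    indices = [char_to_idx[c] for c in text if c in char_to_idx]
    if len(indices) > max_text_len:
        # This sketch rejects over-long labels; the real pipeline's handling may differ.
        raise ValueError(f"label has {len(indices)} symbols, max_text_len is {max_text_len}")
    return indices


if __name__ == "__main__":
    dictionary = ["A", "B", "a", "b"]  # contains both cases
    char_to_idx = {c: i for i, c in enumerate(dictionary)}
    print(encode_label("Ab", char_to_idx, lower=False))  # [0, 3]: case preserved
    print(encode_label("Ab", char_to_idx, lower=True))   # [2, 3]: 'A' folded to 'a'
```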

## 5. Multi-language Training

Review comment: Is this section missing from the English README?
Reply: It's there: `## 5. Multi-language Training`

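Currently, the model supports multilingual recognition, and pretrained models are provided for different languages. Details are as follows.

### Pretrained Model Datasets
Each language's pretrained model is trained on a different dataset. For the data sources, training procedure, and evaluation method, see the **Data Description** column.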

| **Language** | **Data Description** |
| :------: | :------: |
| Chinese | [Chinese text recognition dataset](../../../docs/cn/datasets/chinese_text_recognition_CN.md) |

### Pretrained Models
The pretrained models have been evaluated on benchmark test sets, with the following results:

| **Model** | **Language** | **Backbone** | **Scene** | **Web** | **Document** | **Config** | **Download** |
| :-----: | :-----: | :--------: | :--------: | :--------: | :--------: | :---------: | :-----------: |
| CRNN | Chinese | ResNet34_vd | 59.71% | 64.86% | 89.23% | [crnn_resnet34_ch.yaml](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/crnn/crnn_resnet34_ch.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-a8d0f5d3-f27f763a.mindir) |
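
The table does not say how the percentages are computed; the config's metric section uses `RecMetric` with `main_indicator: acc` and `ignore_space: True`, so they are presumably sequence-level accuracy. The sketch below shows that metric under this assumption: a sample counts as correct only if the predicted string equals the label exactly once spaces are removed. `sequence_accuracy` is a hypothetical helper, and `RecMetric` may differ in details such as case or symbol normalization.

```python
from typing import Sequence


def sequence_accuracy(preds: Sequence[str], labels: Sequence[str],
                      ignore_space: bool = True) -> float:
    """Fraction of samples whose predicted string exactly matches its label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for pred, label in zip(preds, labels):
        if ignore_space:
            # Mirrors the assumed meaning of `ignore_space: True`.
            pred, label = pred.replace(" ", ""), label.replace(" ", "")
        correct += pred == label
    return correct / len(labels)


if __name__ == "__main__":
    preds = ["中文 识别", "mindocr", "CRNN"]
    labels = ["中文识别", "MindOCR", "CRNN"]
    # 2 of 3 match: the space is ignored, but case is not.
    print(sequence_accuracy(preds, labels))
```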

### Training with a Custom Dataset
You can train models for different languages on your own dataset. See the tutorial [Training a Recognition Network with a Custom Dataset](../../../docs/cn/tutorials/training_recognition_custom_dataset_CN.md).

## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->

@@ -0,0 +1,156 @@

```yaml
system:
  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: True
  amp_level: 'O3'
  seed: 42
  log_interval: 100
  val_while_train: True
  drop_overflow_update: False

common:
  character_dict_path: &character_dict_path mindocr/utils/dict/ch_dict.txt
  num_classes: &num_classes 6624 # num_chars_in_dict+1, TODO: retrieve it from dict or check correctness
  max_text_len: &max_text_len 24
  infer_mode: &infer_mode False
  use_space_char: &use_space_char False
  batch_size: &batch_size 64

model:
  type: rec
  transform: null
  backbone:
    name: rec_resnet34
    pretrained: False
  neck:
    name: RNNEncoder
    hidden_size: 256
  head:
    name: CTCHead
    weight_init: crnn_customised
    bias_init: crnn_customised
    out_channels: *num_classes

postprocess:
  name: RecCTCLabelDecode
  character_dict_path: *character_dict_path
  use_space_char: *use_space_char

metric:
  name: RecMetric
  main_indicator: acc
  character_dict_path: *character_dict_path
  ignore_space: True
  print_flag: False

loss:
  name: CTCLoss
  pred_seq_len: 25 # TODO: retrieve from the network output shape.
  max_label_len: *max_text_len # this value should be smaller than pred_seq_len
  batch_size: *batch_size

scheduler:
  scheduler: warmup_cosine_decay
  min_lr: 0.0
  lr: 0.0005
  num_epochs: 60
  warmup_epochs: 6
  decay_epochs: 54

optimizer:
  opt: adamw
  filter_bias_and_bn: True
  momentum: 0.95
  weight_decay: 0.0001
  nesterov: False
  #use_nesterov: True

loss_scaler:
  type: static
  loss_scale: 512

train:
  ema: True
  ema_decay: 0.9999
  ckpt_save_dir: './tmp_rec'
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: path/to/chinese_text_recognition/ # Optional, if set, dataset_root will be used as a prefix for data_dir
    data_dir: training/
    # label_file: # not required when using LMDBDataset
    sample_ratio: 1.0
    shuffle: True
    transform_pipeline:
      - DecodeImage:
          img_mode: BGR
          to_float32: False
      - RecCTCLabelEncode:
          max_text_len: *max_text_len
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          lower: False
      - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescales to [-1, 1] after resize)
          image_shape: [32, 100] # H, W
          infer_mode: *infer_mode
          character_dict_path: *character_dict_path
          padding: False # aspect ratio will be preserved if true.
      - NormalizeImage: # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
          bgr_to_rgb: True
          is_hwc: True
          mean : [127.0, 127.0, 127.0]
          std : [127.0, 127.0, 127.0]
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: ['image', 'text_seq'] #, 'length'] #'img_path']
    net_input_column_index: [0] # input indices for network forward func in output_columns
    label_column_index: [1] # input indices marked as label
    #keys_for_loss: 4 # num labels for loss func

  loader:
    shuffle: True # TODO: tbc
    batch_size: *batch_size
    drop_remainder: True
    max_rowsize: 12
    num_workers: 8

eval:
  ckpt_load_path: ./tmp_rec/best.ckpt
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: path/to/chinese_text_recognition/
    data_dir: validation/
    # label_file: # not required when using LMDBDataset
    sample_ratio: 1.0
    shuffle: False
    transform_pipeline:
      - DecodeImage:
          img_mode: BGR
          to_float32: False
      - RecCTCLabelEncode:
          max_text_len: *max_text_len
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          lower: False
      - RecResizeImg: # different from paddle (paddle converts image from HWC to CHW and rescales to [-1, 1] after resize)
          image_shape: [32, 100] # H, W
          infer_mode: *infer_mode
          character_dict_path: *character_dict_path
          padding: False # aspect ratio will be preserved if true.
      - NormalizeImage: # different from paddle (paddle wrongly normalizes BGR images with RGB mean/std from ImageNet for det, and simply rescales to [-1, 1] in rec)
          bgr_to_rgb: True
          is_hwc: True
          mean : [127.0, 127.0, 127.0]
          std : [127.0, 127.0, 127.0]
      - ToCHWImage:
    # the order of the dataloader list, matching the network input and the input labels for the loss function, and optional data for debug/visualize
    output_columns: ['image', 'text_padded', 'text_length'] # TODO return text string padding w/ fixed length, and a scalar to indicate the length
    net_input_column_index: [0] # input indices for network forward func in output_columns
    label_column_index: [1, 2] # input indices marked as label

  loader:
    shuffle: False # TODO: tbc
    batch_size: 64
    drop_remainder: False
    max_rowsize: 12
    num_workers: 8
```
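
The config trains `CTCHead` with `CTCLoss` and post-processes with `RecCTCLabelDecode`. The plain-Python sketch below illustrates two generic CTC facts rather than MindOCR's own code: greedy (best-path) decoding, which takes the argmax class per frame, merges consecutive repeats, and drops blanks; and the minimum number of frames a label needs, which is why `max_label_len` is kept below `pred_seq_len`. The blank index is a parameter because this page does not state where MindOCR places the blank; the function names are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], charset: Sequence[str],
                      blank_idx: int) -> str:
    """Greedy (best-path) CTC decoding.

    frame_scores: one row of class scores per time step (T x num_classes).
    charset: maps class index -> character; the blank_idx entry is never emitted.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars: List[str] = []
    prev = None
    for idx in best_path:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_idx:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


def ctc_min_frames(label: Sequence[int]) -> int:
    """Minimum time steps CTC needs for a label: one per symbol, plus one blank
    between each pair of identical adjacent symbols."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


if __name__ == "__main__":
    charset = ["<blank>", "中", "文"]  # toy example: blank at index 0 (an assumption)
    # Best path over 5 frames: 中 中 <blank> 文 文
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(scores, charset, blank_idx=0))  # 中文
    print(ctc_min_frames([1, 1, 2]))                        # 4
    print(ctc_min_frames([1] * 24) <= 25)                   # False
```

Under standard CTC, a 24-character label made of one repeated character needs 47 frames, so keeping `max_text_len: 24` below `pred_seq_len: 25` is necessary but not sufficient for every label to be alignable.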
Review comment: The link to the references section in the Chinese README is broken. Change `"#references"` to `"#参考文献"`.