add dbnet yaml for synthtext dataset and td500 dataset #257

Merged
merged 2 commits into from
May 10, 2023

Conversation

Songyuanwei
Collaborator

@Songyuanwei Songyuanwei commented May 5, 2023

Thank you for your contribution to the MindOCR repo.
Before submitting this PR, please make sure:

Motivation

add dbnet yaml for synthtext dataset and td500 dataset
(Write your motivation for proposed changes here.)

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

related issue #222
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)


| **Model** | **Context** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe** | **Download** |
|-------------------|----------------|--------------|----------------|------------|---------------|-------------|--------------|----------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DBNet (ours) | D910x1-MS2.0-G | ResNet-50 | SynthText | 82.47% | 87.75% | 85.03% | 24.4 s/epoch | 27.9 img/s | [yaml](db_r50_td500.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_td500-0d12b5e8.ckpt) |
Collaborator

@SamitHuang SamitHuang May 5, 2023

Why is the Throughput for TD500 so different from that on SynthText (27.9 vs 82.02 FPS), considering the architecture and data processing pipeline should be the same?

Collaborator Author

The batch_size is different. After adjusting num_workers, the current throughput for TD500 is 51.1 img/s.
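As a rough cross-check of the figures discussed in this thread, the sketch below multiplies the reported seconds per epoch by the reported throughput to estimate how many images each run processes per epoch. It assumes throughput is an average over a whole epoch, which the PR does not state; the function name `images_per_epoch` is hypothetical, and the numbers are the ones quoted in this PR's TD500 and SynthText result rows.

```python
def images_per_epoch(seconds_per_epoch: float, images_per_second: float) -> float:
    """Estimate images processed per epoch, assuming throughput is an epoch-wide average."""
    return seconds_per_epoch * images_per_second


# Figures quoted in this PR's result rows (assumed to be epoch-wide averages).
td500_images = images_per_epoch(24.4, 27.9)        # TD500 row
synthtext_images = images_per_epoch(10470, 82.02)  # SynthText row

print(f"TD500: ~{td500_images:.0f} images per epoch")
print(f"SynthText: ~{synthtext_images:.0f} images per epoch")
```

Under that assumption, the two rows describe datasets of very different sizes, so per-epoch time alone does not explain the throughput gap; batch size and the number of data-loading workers, as noted in the reply, are the settings to compare.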

Collaborator

The data preparation steps for SynthText and TD500 need to be added to the model README.

Collaborator Author

fixed

Collaborator

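The weight decay here is 5e-4, which does not match the 1e-4 used in the original paper. What is the reason? (For example, is the final loss lower after the adjustment?)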

Collaborator

@SamitHuang SamitHuang May 7, 2023

In addition, net_columns_to_net is an old parameter name; the config must be aligned with the latest parameters by setting net_input_column_index and label_column_index.
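To illustrate what index-based parameters like the two named above express, here is a minimal sketch of splitting a batch into network inputs and labels by column index. It is not MindOCR's implementation: the helper `split_columns`, the column names, and the index values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def split_columns(
    batch: Sequence[object],
    net_input_column_index: Sequence[int],
    label_column_index: Sequence[int],
) -> Tuple[List[object], List[object]]:
    """Pick network inputs and labels out of a batch by column position."""
    inputs = [batch[i] for i in net_input_column_index]
    labels = [batch[i] for i in label_column_index]
    return inputs, labels


# Hypothetical output columns of a detection data pipeline (names are assumptions).
columns = ["image", "binary_map", "mask", "thresh_map", "thresh_mask"]

# Hypothetical setting: column 0 feeds the network, the rest feed the loss.
inputs, labels = split_columns(columns, [0], [1, 2, 3, 4])
print(inputs)
print(labels)
```

Compared with a single count of leading columns, explicit index lists let inputs and labels sit anywhere in the column order, which is presumably why the newer names are preferred.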

Collaborator

@hadipash hadipash May 8, 2023

The original DBNet paper doesn't share details on pretraining the network with SynthText; it only mentions this:

> For all the models, we first pre-train them with the SynthText dataset for 100k iterations.

Nor does the original repo have a config for SynthText. The "Synthetic Data for Text Localisation in Natural Images" paper lists pretraining hyperparameters, including a weight decay of 5e-4 (actually written as 5-4, which I find odd); however, they used these hyperparameters to pretrain FCRN. So I guess it is up to us to find the best combination of hyperparameters for pretraining DBNet. We must be careful not to overfit the model, though, since there is no validation set for SynthText.


| **Model** | **Context** | **Backbone** | **Pretrained** | **Train T.** | **Throughput** | **Recipe** | **Download** |
|-------------------|----------------|--------------|----------------|------------|---------------|-------------|--------------|
| DBNet (ours) | D910x1-MS2.0-G | ResNet-50 | ImageNet | 10470 s/epoch | 82.02 img/s | [yaml](db_r50_synthtext.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_synthtext-40655acb.ckpt) |
Collaborator

@SamitHuang SamitHuang May 7, 2023

In the yaml, distribute is True, but the context here, D910x1-MS2.0-G, indicates a single card. Please double-check the standalone/distributed mode and the number of cards used.
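A check like the one requested above could be automated. The sketch below assumes the context string follows a `{device}x{cards}-{framework version}-{mode}` pattern, so that `D910x1` means one card; that reading is inferred from this comment, not confirmed by the PR. The helpers `device_count` and `is_consistent` are hypothetical, and treating a distributed run on one card as inconsistent is likewise an assumption.

```python
import re


def device_count(context: str) -> int:
    """Extract the card count from a context string such as 'D910x1-MS2.0-G'."""
    match = re.match(r"^[A-Za-z]+\d*x(\d+)-", context)
    if match is None:
        raise ValueError(f"unrecognized context string: {context!r}")
    return int(match.group(1))


def is_consistent(context: str, distribute: bool) -> bool:
    """Assume distributed training is expected exactly when more than one card is used."""
    return (device_count(context) > 1) == distribute


# The combination flagged in the review: single-card context with distribute set to True.
print(is_consistent("D910x1-MS2.0-G", distribute=True))
```

Running such a check over each results row and its recipe would flag mismatches before review.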

Collaborator Author

fixed

Collaborator

Please add the training loss result for SynthText.

Collaborator Author

fixed

@Songyuanwei Songyuanwei force-pushed the branch_1 branch 9 times, most recently from cfa13c5 to fdb3366 May 10, 2023 03:34
@SamitHuang SamitHuang merged commit 8038db3 into mindspore-lab:main May 10, 2023
@Songyuanwei Songyuanwei deleted the branch_1 branch May 10, 2023 06:20