update readme #790

Merged: 5 commits, Jan 15, 2025
55 changes: 18 additions & 37 deletions configs/cls/mobilenetv3/README.md
@@ -2,9 +2,9 @@
English | [中文](README_CN.md)

# MobileNetV3 for text direction classification

-## 1. Introduction
+## Introduction

-### 1.1 MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
+### MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)

MobileNetV3 [[1](#references)], published in 2019, combines the depthwise separable convolutions of V1, the inverted residuals and linear bottlenecks of V2, and the SE (Squeeze-and-Excitation) module, and uses NAS (Neural Architecture Search) to find the best network configuration and parameters. MobileNetV3 first performs a coarse structure search with MnasNet, then uses reinforcement learning to select the optimal configuration from a set of discrete choices, and finally fine-tunes the architecture with NetAdapt. Overall, MobileNetV3 is a lightweight network with good performance on classification, detection, and segmentation tasks.

@@ -16,7 +16,7 @@
</p>


-### 1.2 Text direction classifier
+### Text direction classifier

The text directions in some images are reversed, so the text cannot be recognized correctly. Therefore, we use a text direction classifier to classify and rectify the text direction. The MobileNetV3 paper releases two versions of MobileNetV3: *MobileNetV3-Large* and *MobileNetV3-Small*. Trading off efficiency against accuracy, we adopt *MobileNetV3-Small* as the text direction classifier.

@@ -32,32 +32,34 @@ Currently we support the 0 and 180 degree classification. You can update the par
</div>
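For context: supporting four directions instead of two comes down to a pair of config fields. A hedged sketch follows, with field names assumed from `cls_mv3.yaml` rather than confirmed by this diff (the `num_classes: *num_classes # 2 or 4` comment later in the file suggests these anchors exist):

```yaml
# Assumed anchors; the model and postprocessing sections reference them
# elsewhere in the config (e.g. num_classes: *num_classes).
common:
  num_classes: &num_classes 2            # 2 for 0/180 degrees; 4 to add 90/270
  label_list: &label_list ["0", "180"]   # e.g. ["0", "90", "180", "270"] for 4 classes
```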


-## 2. Results
+## Results

+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:---------------:|:------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

MobileNetV3 is pretrained on ImageNet. For the text direction classification task, we further train MobileNetV3 on the RCTW17, MTWI, and LSVT datasets.

+Experiments are tested on Ascend 910* with MindSpore 2.3.1 in graph mode.
<div align="center">

-| **Model** | **Context** | **Specification** | **Pretrained dataset** | **Training dataset** | **Accuracy** | **Train T.** | **Throughput** | **Recipe** | **Download** |
-|-------------------|----------------|--------------|----------------|------------|---------------|---------------|----------------|-----------------------------|------------------------------------------|
-| MobileNetV3 | D910x4-MS2.0-G | small | ImageNet | RCTW17, MTWI, LSVT | 94.59% | 154.2 s/epoch | 5923.5 img/s | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
+| **model name** | **cards** | **batch size** | **img/s** | **accuracy** | **config** | **weight** |
+|----------------|-----------|----------------|-----------|--------------|----------------------|------------|
+| MobileNetV3 | 4 | 256 | 5923.5 | 94.59% | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
</div>


-#### Notes
-- Context: Training context denoted as {device}x{pieces}-{MS version}{MS mode}, where MS (MindSpore) mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.



-## 3. Quick Start
+## Quick Start

-### 3.1 Installation
+### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.

-### 3.2 Dataset preparation
+### Dataset preparation

-Please download [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then process the images and labels into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md) (Coming soon...).
+Please download [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then process the images and labels into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).
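The conversion command itself is not shown in this diff; below is a rough sketch of a typical invocation, where the script path comes from the linked README and every flag value is an assumption to be checked against that README:

```shell
# Hypothetical example: convert RCTW17 annotations into the label format
# expected by MindOCR; verify the actual flags in dataset_converters/README.md.
python tools/dataset_converters/convert.py \
    --dataset_name rctw17 \
    --image_dir path/to/rctw17/images \
    --label_dir path/to/rctw17/labels \
    --output_path path/to/train_cls_gt.txt
```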

The prepared dataset file structure is suggested to be as follows.

@@ -75,7 +77,7 @@
> If you want to use your own dataset for training, please convert the images and labels to the desired format, referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).


-### 3.3 Update yaml config file
+### Update yaml config file

Update the dataset directories in the yaml config file. `dataset_root` will be concatenated with `data_dir` and `label_file` respectively to form the complete image directory and label file path; a sketch of these fields follows the yaml fragment below.

@@ -117,29 +119,8 @@
model:
num_classes: *num_classes # 2 or 4
```
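For reference, a minimal sketch of the concatenation described above; the directory and file names are illustrative assumptions, not values from this PR:

```yaml
# dataset_root is prefixed to data_dir and label_file, so with these
# (assumed) values images are read from dir/to/dataset/training/ and
# labels from dir/to/dataset/gt_training.txt.
train:
  dataset:
    dataset_root: dir/to/dataset
    data_dir: training/
    label_file: gt_training.txt
eval:
  dataset:
    dataset_root: dir/to/dataset
    data_dir: validation/
    label_file: gt_validation.txt
```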

-### 3.4 Training
-
-* Standalone training
-
-Please set `distribute` in yaml config file to be `False`.
-
-```shell
-python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-* Distributed training
-
-Please set `distribute` in yaml config file to be `True`.
-
-```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 4 python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_cls`.


-### 3.5 Evaluation
+### Evaluation

Please set the arg `ckpt_load_path` in the `eval` section of the yaml config file to the checkpoint path, set `distribute` to `False`, and then run:
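The command itself is collapsed in this view; as a sketch, assuming MindOCR's standard `tools/eval.py` entry point (the `-c` flag mirrors the training command above):

```shell
# Assumed evaluation entry point; ckpt_load_path must already be set in the yaml.
python tools/eval.py -c configs/cls/mobilenetv3/cls_mv3.yaml
```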

59 changes: 19 additions & 40 deletions configs/cls/mobilenetv3/README_CN.md
@@ -2,9 +2,9 @@

# MobileNetV3 for text direction classification

-## 1. Overview
+## Overview

-### 1.1 MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
+### MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)

MobileNetV3 [[1](#参考文献)], released in 2019, combines the depthwise separable convolutions of V1, the inverted residuals and linear bottlenecks of V2, and the SE (Squeeze-and-Excitation) module, and uses NAS (Neural Architecture Search) to search for the optimal network configuration and parameters. MobileNetV3 first performs a coarse-grained structure search with MnasNet, and then uses reinforcement learning to select the optimal configuration from a set of discrete choices. In addition, MobileNetV3 fine-tunes the architecture with NetAdapt. In short, MobileNetV3 is a lightweight network that performs well on classification, detection, and segmentation tasks.

@@ -16,7 +16,7 @@
<em>Figure 1. Overall architecture of MobileNetV3 [<a href="#参考文献">1</a>]</em>
</p>

-### 1.2 Text direction classifier
+### Text direction classifier

In some images the text direction is reversed or incorrect, so the text cannot be recognized correctly. Therefore, we use a text direction classifier to classify and rectify the text direction. The MobileNetV3 paper proposes two versions of MobileNetV3: *MobileNetV3-Large* and *MobileNetV3-Small*. To balance efficiency and classification accuracy, we adopt *MobileNetV3-Small* as the text direction classifier.

@@ -32,32 +32,34 @@
</div>


-## 2. Results
+## Results

+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:---------------:|:------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

MobileNetV3 is pretrained on ImageNet. In addition, we further train it on the RCTW17, MTWI, and LSVT datasets for the text direction classification task.

+Experiments are tested on Ascend 910* with MindSpore 2.3.1 in graph mode.
<div align="center">

-| **Model** | **Context** | **Specification** | **Pretrained dataset** | **Training dataset** | **Accuracy** | **Training time** | **Throughput** | **Config** | **Download** |
-|-------------|----------------|-------------------|------------------------|----------------------|--------------|-------------------|----------------|------------|--------------|
-| MobileNetV3 | D910x4-MS2.0-G | small | ImageNet | RCTW17, MTWI, LSVT | 94.59% | 154.2 s/epoch | 5923.5 img/s | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
+| **model name** | **cards** | **batch size per card** | **img/s** | **accuracy** | **config** | **weight** |
+|----------------|-----------|-------------------------|-----------|--------------|----------------------|------------|
+| MobileNetV3 | 4 | 256 | 5923.5 | 94.59% | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
</div>


-#### Notes:
-- Context: the training context is denoted as {device}x{number of devices}-{MS mode}, where the MS (MindSpore) mode can be G (graph mode) or F (pynative mode).

-## 3. Quick Start
+## Quick Start

-### 3.1 Installation
+### Installation

Please refer to the [installation guide](https://github.com/mindspore-lab/mindocr#installation) of MindOCR.

-### 3.2 Dataset preparation
+### Dataset preparation

-#### 3.2.1 ICDAR2015 dataset
+#### ICDAR2015 dataset

-Please download the [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then convert the datasets and annotations into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md) (coming soon).
+Please download the [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then convert the datasets and annotations into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md).

After data preparation, the directory structure of the data should be as follows:

@@ -75,7 +77,7 @@
> If you want to use your own dataset for training, please refer to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md) to convert the dataset and annotations into the desired format.


-### 3.3 Configuration
+### Configuration


Update the dataset paths in the config file. `dataset_root` is concatenated with `data_dir` and `label_file` respectively to form the complete dataset directory and label file path; a sketch of a related setting follows the yaml fragment below.
@@ -118,30 +120,7 @@
model:
num_classes: *num_classes # 2 or 4
```
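The removed training section below toggles standalone vs. distributed runs with a `distribute` flag; a hedged sketch of that setting, whose exact location in the config is an assumption:

```yaml
system:
  distribute: False   # set True for multi-card training launched via mpirun
```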


-### 3.4 Training
-
-* Standalone training
-
-Please make sure `distribute` in the yaml config file is set to `False`.
-
-```shell
-python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-* Distributed training
-
-Please make sure `distribute` in the yaml config file is set to `True`.
-
-```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 4 python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-The training results (including checkpoints, per-epoch performance, and curves) will be saved in the directory set by the `ckpt_save_dir` argument in the yaml config file, which defaults to `./tmp_cls`.

-### 3.5 Evaluation
+### Evaluation

For evaluation, set the `ckpt_load_path` argument in the yaml config file to the checkpoint file path, set `distribute` to `False`, and then run:
