update readme #790

Merged: 5 commits, Jan 15, 2025
55 changes: 18 additions & 37 deletions configs/cls/mobilenetv3/README.md
@@ -2,9 +2,9 @@
English | [中文](README_CN.md)

# MobileNetV3 for text direction classification

-## 1. Introduction
+## Introduction

-### 1.1 MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
+### MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)

MobileNetV3 [[1](#references)], published in 2019, combines the depthwise separable convolutions of V1, the inverted residuals and linear bottlenecks of V2, and the SE (Squeeze-and-Excitation) module, and uses NAS (Neural Architecture Search) to find the best network configuration and parameters. MobileNetV3 first performs a coarse structure search with MnasNet, then uses reinforcement learning to select the optimal configuration from a set of discrete choices, and finally fine-tunes the architecture with NetAdapt. Overall, MobileNetV3 is a lightweight network with good performance on classification, detection, and segmentation tasks.

@@ -16,7 +16,7 @@
</p>


-### 1.2 Text direction classifier
+### Text direction classifier

The text directions in some images are reversed, so the text cannot be recognized correctly. Therefore, we use a text direction classifier to classify and rectify the text direction. The MobileNetV3 paper releases two versions of MobileNetV3: *MobileNetV3-Large* and *MobileNetV3-Small*. Trading off efficiency against accuracy, we adopt *MobileNetV3-Small* as the text direction classifier.

@@ -32,32 +32,34 @@ Currently we support the 0 and 180 degree classification. You can update the par
</div>
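For context: supporting four directions instead of two comes down to a pair of config fields. A hedged sketch follows, with field names assumed from `cls_mv3.yaml` rather than confirmed by this diff (the `num_classes: *num_classes # 2 or 4` comment later in the file suggests these anchors exist):

```yaml
# Assumed anchors; the model and postprocessing sections reference them
# elsewhere in the config (e.g. num_classes: *num_classes).
common:
  num_classes: &num_classes 2            # 2 for 0/180 degrees; 4 to add 90/270
  label_list: &label_list ["0", "180"]   # e.g. ["0", "90", "180", "270"] for 4 classes
```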


-## 2. Results
+## Results

+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:---------------:|:------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

MobileNetV3 is pretrained on ImageNet. For the text direction classification task, we further train MobileNetV3 on the RCTW17, MTWI, and LSVT datasets.

+Experiments are tested on Ascend 910* with MindSpore 2.3.1 in graph mode.
<div align="center">

-| **Model** | **Context** | **Specification** | **Pretrained dataset** | **Training dataset** | **Accuracy** | **Train T.** | **Throughput** | **Recipe** | **Download** |
-|-------------------|----------------|--------------|----------------|------------|---------------|---------------|----------------|-----------------------------|------------------------------------------|
-| MobileNetV3 | D910x4-MS2.0-G | small | ImageNet | RCTW17, MTWI, LSVT | 94.59% | 154.2 s/epoch | 5923.5 img/s | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
+| **model name** | **cards** | **batch size** | **img/s** | **accuracy** | **config** | **weight** |
+|----------------|-----------|----------------|-----------|--------------|----------------------|------------|
+| MobileNetV3 | 4 | 256 | 5923.5 | 94.59% | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
</div>


-#### Notes
-- Context: Training context denoted as {device}x{pieces}-{MS version}{MS mode}, where MS (MindSpore) mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.



-## 3. Quick Start
+## Quick Start

-### 3.1 Installation
+### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.

-### 3.2 Dataset preparation
+### Dataset preparation

-Please download [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then process the images and labels into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md) (Coming soon...).
+Please download [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then process the images and labels into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).
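The conversion command itself is not shown in this diff; below is a rough sketch of a typical invocation, where the script path comes from the linked README and every flag value is an assumption to be checked against that README:

```shell
# Hypothetical example: convert RCTW17 annotations into the label format
# expected by MindOCR; verify the actual flags in dataset_converters/README.md.
python tools/dataset_converters/convert.py \
    --dataset_name rctw17 \
    --image_dir path/to/rctw17/images \
    --label_dir path/to/rctw17/labels \
    --output_path path/to/train_cls_gt.txt
```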

The prepared dataset file structure is suggested to be as follows.

@@ -75,7 +77,7 @@
> If you want to use your own dataset for training, please convert the images and labels to the desired format, referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README.md).


-### 3.3 Update yaml config file
+### Update yaml config file

Update the dataset directories in the yaml config file. `dataset_root` will be concatenated with `data_dir` and `label_file` respectively to form the complete image directory and label file path; a sketch of these fields follows the yaml fragment below.

@@ -117,29 +119,8 @@
model:
num_classes: *num_classes # 2 or 4
```
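For reference, a minimal sketch of the concatenation described above; the directory and file names are illustrative assumptions, not values from this PR:

```yaml
# dataset_root is prefixed to data_dir and label_file, so with these
# (assumed) values images are read from dir/to/dataset/training/ and
# labels from dir/to/dataset/gt_training.txt.
train:
  dataset:
    dataset_root: dir/to/dataset
    data_dir: training/
    label_file: gt_training.txt
eval:
  dataset:
    dataset_root: dir/to/dataset
    data_dir: validation/
    label_file: gt_validation.txt
```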

-### 3.4 Training
-
-* Standalone training
-
-Please set `distribute` in yaml config file to be `False`.
-
-```shell
-python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-* Distributed training
-
-Please set `distribute` in yaml config file to be `True`.
-
-```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 4 python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir` in yaml config file. The default directory is `./tmp_cls`.


-### 3.5 Evaluation
+### Evaluation

Please set the arg `ckpt_load_path` in the `eval` section of the yaml config file to the checkpoint path, set `distribute` to `False`, and then run:
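The command itself is collapsed in this view; as a sketch, assuming MindOCR's standard `tools/eval.py` entry point (the `-c` flag mirrors the training command above):

```shell
# Assumed evaluation entry point; ckpt_load_path must already be set in the yaml.
python tools/eval.py -c configs/cls/mobilenetv3/cls_mv3.yaml
```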

59 changes: 19 additions & 40 deletions configs/cls/mobilenetv3/README_CN.md
@@ -2,9 +2,9 @@

# MobileNetV3 for text direction classification

-## 1. Overview
+## Overview

-### 1.1 MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
+### MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)

MobileNetV3 [[1](#参考文献)], released in 2019, combines the depthwise separable convolutions of V1, the inverted residuals and linear bottlenecks of V2, and the SE (Squeeze-and-Excitation) module, and uses NAS (Neural Architecture Search) to search for the optimal network configuration and parameters. MobileNetV3 first performs a coarse-grained structure search with MnasNet, and then uses reinforcement learning to select the optimal configuration from a set of discrete choices. In addition, MobileNetV3 fine-tunes the architecture with NetAdapt. In short, MobileNetV3 is a lightweight network that performs well on classification, detection, and segmentation tasks.

@@ -16,7 +16,7 @@
<em>Figure 1. Overall architecture of MobileNetV3 [<a href="#参考文献">1</a>]</em>
</p>

-### 1.2 Text direction classifier
+### Text direction classifier

In some images the text direction is reversed or incorrect, so the text cannot be recognized correctly. Therefore, we use a text direction classifier to classify and rectify the text direction. The MobileNetV3 paper proposes two versions of MobileNetV3: *MobileNetV3-Large* and *MobileNetV3-Small*. To balance efficiency and classification accuracy, we adopt *MobileNetV3-Small* as the text direction classifier.

@@ -32,32 +32,34 @@
</div>


-## 2. Results
+## Results

+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:---------------:|:------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

MobileNetV3 is pretrained on ImageNet. In addition, we further train it on the RCTW17, MTWI, and LSVT datasets for the text direction classification task.

+Experiments are tested on Ascend 910* with MindSpore 2.3.1 in graph mode.
<div align="center">

-| **Model** | **Context** | **Specification** | **Pretrained dataset** | **Training dataset** | **Accuracy** | **Training time** | **Throughput** | **Config** | **Download** |
-|-------------|----------------|-------------------|------------------------|----------------------|--------------|-------------------|----------------|------------|--------------|
-| MobileNetV3 | D910x4-MS2.0-G | small | ImageNet | RCTW17, MTWI, LSVT | 94.59% | 154.2 s/epoch | 5923.5 img/s | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
+| **model name** | **cards** | **batch size per card** | **img/s** | **accuracy** | **config** | **weight** |
+|----------------|-----------|-------------------------|-----------|--------------|----------------------|------------|
+| MobileNetV3 | 4 | 256 | 5923.5 | 94.59% | [yaml](cls_mv3.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/cls/cls_mobilenetv3-92db9c58.ckpt) |
</div>


-#### Notes:
-- Context: the training context is denoted as {device}x{number of devices}-{MS mode}, where the MS (MindSpore) mode can be G (graph mode) or F (pynative mode).

-## 3. Quick Start
+## Quick Start

-### 3.1 Installation
+### Installation

Please refer to the [installation guide](https://github.com/mindspore-lab/mindocr#installation) of MindOCR.

-### 3.2 Dataset preparation
+### Dataset preparation

-#### 3.2.1 ICDAR2015 dataset
+#### ICDAR2015 dataset

-Please download the [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then convert the datasets and annotations into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md) (coming soon).
+Please download the [RCTW17](https://rctw.vlrlab.net/dataset), [MTWI](https://tianchi.aliyun.com/competition/entrance/231684/introduction), and [LSVT](https://rrc.cvc.uab.es/?ch=16&com=introduction) datasets, and then convert the datasets and annotations into the desired format referring to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md).

After data preparation, the directory structure of the data should be as follows:

@@ -75,7 +77,7 @@
> If you want to use your own dataset for training, please refer to [dataset_converters](https://github.com/mindspore-lab/mindocr/blob/main/tools/dataset_converters/README_CN.md) to convert the dataset and annotations into the desired format.


-### 3.3 Configuration
+### Configuration


Update the dataset paths in the config file. `dataset_root` is concatenated with `data_dir` and `label_file` respectively to form the complete dataset directory and label file path; a sketch of a related setting follows the yaml fragment below.
@@ -118,30 +120,7 @@
model:
num_classes: *num_classes # 2 or 4
```
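The removed training section below toggles standalone vs. distributed runs with a `distribute` flag; a hedged sketch of that setting, whose exact location in the config is an assumption:

```yaml
system:
  distribute: False   # set True for multi-card training launched via mpirun
```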


-### 3.4 Training
-
-* Standalone training
-
-Please make sure `distribute` in the yaml config file is set to `False`.
-
-```shell
-python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-* Distributed training
-
-Please make sure `distribute` in the yaml config file is set to `True`.
-
-```shell
-# n is the number of NPUs
-mpirun --allow-run-as-root -n 4 python tools/train.py -c configs/cls/mobilenetv3/cls_mv3.yaml
-```
-
-The training results (including checkpoints, per-epoch performance, and curves) will be saved in the directory set by the `ckpt_save_dir` argument in the yaml config file, which defaults to `./tmp_cls`.

-### 3.5 Evaluation
+### Evaluation

For evaluation, set the `ckpt_load_path` argument in the yaml config file to the checkpoint file path, set `distribute` to `False`, and then run:
