mindspore-lab · SamitHuang · May 5, 2023 · May 4, 2023 · hadipash · May 5, 2023
diff --git a/README.md b/README.md
@@ -107,7 +107,7 @@ Coming soon
 
 #### 2.3 Inference with native MindSpore
 
-Coming soon
+MindOCR provides a detection and recognition prediction pipeline that runs on MindOCR-trained ckpt files. For usage, see [here](docs/en/predict_ckpt.md).
 
 ## Model List
 
@@ -159,7 +159,7 @@ After downloading these datasets in the `DATASETS_DIR` folder, you can run `bash
 
 ### Change Log
 - 2023/04/21
-1. Add parameter grouping to support flexible regularization in training. Usage: add `grouping_strategy` argment in yaml config to select a predefined grouping strategy, or use `no_weight_decay_params` argument to pick layers to exclude from weight decay (e.g., bias, norm). Example can be referred in `configs/rec/crnn/crnn_icdar15.yaml` 
+1. Add parameter grouping to support flexible regularization in training. Usage: add `grouping_strategy` argument in yaml config to select a predefined grouping strategy, or use `no_weight_decay_params` argument to pick layers to exclude from weight decay (e.g., bias, norm). Example can be referred in `configs/rec/crnn/crnn_icdar15.yaml` 
 2. Add gradient accumulation to support large batch size training. Usage: add `gradient_accumulation_steps` in yaml config, the global batch size = batch_size * devices * gradient_accumulation_steps. Example can be referred in `configs/rec/crnn/crnn_icdar15.yaml`
 3. Add gradient clip to support training stabilization. Enable it by setting `grad_clip` as True in yaml config.
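
As a quick check of the global batch size formula quoted in item 2 of the change log, here is a minimal sketch. The function name and the numbers are hypothetical and chosen only for illustration; they are not taken from `configs/rec/crnn/crnn_icdar15.yaml`.

```python
def global_batch_size(batch_size: int, devices: int, gradient_accumulation_steps: int) -> int:
    """Effective batch size per optimizer update, following the change-log formula."""
    return batch_size * devices * gradient_accumulation_steps


# Hypothetical values: 64 samples per device, 8 devices, 2 accumulation steps
print(global_batch_size(batch_size=64, devices=8, gradient_accumulation_steps=2))  # 1024
```

Doubling `gradient_accumulation_steps` has the same effect on the global batch size as doubling the number of devices, which is why it helps when device memory or device count is limited.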
 

diff --git a/README_CN.md b/README_CN.md
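@@ -98,7 +98,7 @@ MindOCR integrates the MX inference engine and supports text detection and recognition tasks; please refer to [mx_i
 
 #### 2.3 Inference with native MindSpore
 
-MindOCR supports text detection + text recognition pipeline inference using ckpt files trained with MindSpore. Please refer to [here](docs/cn/predict_ckpt_cn.md).
+MindOCR supports text detection + text recognition pipeline inference using ckpt files trained with MindOCR. Please refer to [here](docs/cn/predict_ckpt_cn.md).
 
 ## Model List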
 

diff --git a/docs/cn/predict_ckpt_cn.md b/docs/cn/predict_ckpt_cn.md
@@ -1,6 +1,6 @@
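 # MindOCR Pipeline Inference
 
-This document describes how to run text detection + text recognition pipeline inference using ckpt files trained with MindSpore.
+This document describes how to run text detection + text recognition pipeline inference using ckpt files trained with MindOCR.
 
 ## 1. Supported pipeline model combinations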
 
@@ -14,15 +14,17 @@
 
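 ### 2.1 Environment setup
 
-| Environment/Device | Version |
+| Environment | Version |
 |-----------|-------|
 | MindSpore | >=1.9 |
 | Python    | >=3.7 |
 
 
 ### 2.2 Argument configuration
 
-Argument configuration consists of two parts: (1) the model yaml config file and (2) the args of the inference script `tools/predict/text/predict_system.py`.
+Argument configuration consists of two parts:
+- (1) the model yaml config file
+- (2) the args of the inference script `tools/predict/text/predict_system.py`.
 
 **Note: if an args value is passed in (2), it overrides the corresponding value in the (1) yaml config file; otherwise, the default value in the yaml config file is used. You can also update the values in the yaml config file manually.**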
 
@@ -31,6 +33,7 @@
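    The detection model and the recognition model each have their own yaml config file. Pay particular attention to the `predict` section in **both** files; the key arguments are listed below.
 
    ```yaml
+   # yaml config file of the detection model or the recognition model
    ...
    predict:
      ckpt_load_path: tmp_det/best.ckpt              <--- args.det_ckpt_path overrides it in the det yaml, args.rec_ckpt_path overrides it in the rec yaml; or update the value manually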
@@ -50,7 +53,7 @@
        ...
    ```
 
-#### (2) args list
+#### (2) args list of the inference script `predict_system.py`
 
    | Argument          | Explanation                            | Default                                    |
    |--------------------------------------|-----------------------------------------| -------- |
@@ -76,7 +79,7 @@
 
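 ### 2.4 Accuracy evaluation
 
-   After inference finishes, the image names, text detection boxes (points) and recognized texts (trancription) will be saved to args.result_save_path. An example of the result file format is shown below:
+   After inference finishes, the image names, text detection boxes (`points`) and recognized texts (`transcription`) will be saved to args.result_save_path. An example of the result file format is shown below: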
    ```text
    img_1.jpg	[{"transcription": "hello", "points": [600, 150, 715, 157, 714, 177, 599, 170]}, {"transcription": "world", "points": [622, 126, 695, 129, 694, 154, 621, 151]}, ...]
    img_2.jpg	[{"transcription": "apple", "points": [553, 338, 706, 318, 709, 342, 556, 362]}, ...]

diff --git a/docs/en/predict_ckpt.md b/docs/en/predict_ckpt.md
@@ -0,0 +1,91 @@
+# MindOCR detection and recognition prediction pipeline
+
+This document describes how to run the detection and recognition prediction pipeline using MindOCR-trained ckpt files.
+
+## 1. Supported pipeline models
+
+| Text detection + text recognition pipeline | Datasets                                                          | Inference acc |
+|--------------------------------------------|-------------------------------------------------------------------|---------------|
+| DBNet+CRNN                                 | [ICDAR15](https://rrc.cvc.uab.es/?ch=4&com=downloads)<sup>*</sup> | 55.99%        |
+
+> *We use the test set of ICDAR15 Task 4.1.
+
+## 2. Quick start
+
+### 2.1 Dependency
+
+| Environment | Version |
+|-------------|---------|
+| MindSpore   | >=1.9   |
+| Python      | >=3.7   |
+
+
+### 2.2 Argument configuration
+
+Argument configuration consists of two parts:
+- (1) yaml config file
+- (2) args in `tools/predict/text/predict_system.py`
+
+**Note that if you set an arg in (2), its value overrides the counterpart in the (1) yaml config file.
+Otherwise, the value in the (1) yaml config file is used by default. You can also update the values in the yaml config file directly.**
+
+#### (1) yaml config file
+
+   The detection model and the recognition model each have their own yaml config file. Please pay attention to the `predict` section in both the detection and recognition config files. The important args are listed below.
+
+   ```yaml
+   # yaml config file for detection model or recognition model
+   ...
+   predict:
+     ckpt_load_path: tmp_det/best.ckpt              <--- args.det_ckpt_path (if set) overwrites it in det yaml, args.rec_ckpt_path (if set) overwrites it in rec yaml; or update it here directly
+     dataset_sink_mode: False
+     dataset:
+       type: PredictDataset
+       dataset_root: path/to/dataset_root           <--- args.raw_data_dir (if set) overwrites it in det yaml, args.crop_save_dir (if set) overwrites it in rec yaml; or update it here directly
+       data_dir: ic15/det/test/ch4_test_images      <--- args.raw_data_dir (if set) overwrites it in det yaml, args.crop_save_dir (if set) overwrites it in rec yaml; or update it here directly
+       sample_ratio: 1.0
+       transform_pipeline:
+         ...
+       output_columns: [ 'img_path', 'image', 'raw_img_shape' ]
+       num_columns_to_net: 1
+     loader:
+       shuffle: False
+       batch_size: 1
+       ...
+   ```
+
+#### (2) args list in prediction script `predict_system.py`
+   | Argument          | Explanation                                                                                                        | Default                                    |
+   |-------------------|-----------------------------------------| -------- |
+   | raw_data_dir      | Directory of raw data to be predicted                                                                              | -                                       |
+   | det_ckpt_path     | Path of detection model ckpt file                                                                                  | -                                       |
+   | rec_ckpt_path     | Path of recognition model ckpt file                                                                                | -                                       |
+   | det_config_path   | Path of detection model yaml config file                                                                           | 'configs/det/dbnet/db_r50_icdar15.yaml' |
+   | rec_config_path   | Path of recognition model yaml config file                                                                         | 'configs/rec/crnn/crnn_resnet34.yaml'   |
+   | crop_save_dir     | Directory for saving the cropped images after detection, i.e., **directory of input images for recognition model** | 'predict_result/crop'                   |
+   | result_save_path  | Path for saving the pipeline prediction results                                                                    | 'predict_result/ckpt_pred_result.txt'   |
+
+
+### 2.3 Prediction
+
+   Run the following command to start the detection and recognition prediction pipeline. **Note that the args values below will overwrite their counterparts in yaml config file.**
+
+   ```bash
+   python tools/predict/text/predict_system.py \
+                --raw_data_dir path/to/raw_data \
+                --det_ckpt_path path/to/detection_ckpt \
+                --rec_ckpt_path path/to/recognition_ckpt
+   ```
+
+### 2.4 Evaluation of prediction results
+   After the prediction finishes, the results including image names, bounding boxes (`points`) and recognized texts (`transcription`) will be saved in `args.result_save_path`. The format of prediction results is shown below.
+   ```text
+   img_1.jpg	[{"transcription": "hello", "points": [600, 150, 715, 157, 714, 177, 599, 170]}, {"transcription": "world", "points": [622, 126, 695, 129, 694, 154, 621, 151]}, ...]
+   img_2.jpg	[{"transcription": "apple", "points": [553, 338, 706, 318, 709, 342, 556, 362]}, ...]
+   ...
+   ```
+   Prepare the **ground truth** file (in the same format as above) and the **prediction results** file, then run the following command to evaluate the prediction results.
+   ```bash
+   cd deploy/eval_utils
+   python eval_pipeline.py --gt_path path/to/gt.txt --pred_path path/to/ckpt_pred_result.txt
+   ```
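
To make the result format in section 2.4 concrete, the following is a minimal sketch of reading one line of the prediction results file. It assumes, based on the example above, that the image name and the JSON list are separated by a single tab, and that `points` is a flat list of four (x, y) corner pairs. The helper `parse_result_line` is hypothetical and is not part of MindOCR.

```python
import json


def parse_result_line(line: str):
    """Split one result line into (image name, list of text instances)."""
    # Assumption: the image name and the JSON list are separated by one tab.
    img_name, annotations = line.rstrip("\n").split("\t", 1)
    return img_name, json.loads(annotations)


sample = 'img_2.jpg\t[{"transcription": "apple", "points": [553, 338, 706, 318, 709, 342, 556, 362]}]'
name, items = parse_result_line(sample)
for item in items:
    pts = item["points"]
    # Assumption: points are stored as x1, y1, x2, y2, ... for the four box corners.
    corners = list(zip(pts[0::2], pts[1::2]))
    print(name, item["transcription"], corners)
```

A ground truth file in the same format can be read with the same helper before comparing it against the predictions.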
diff --git a/tools/predict/text/utils_predict.py b/tools/predict/text/utils_predict.py
@@ -19,18 +19,18 @@ def check_args(args):
         print("WARNING: The 'det_ckpt_path' is empty. The detection predictor will load the ckpt from 'ckpt_load_path' in det yaml file. ")
     else:
         if not os.path.isfile(args.det_ckpt_path):
-            raise ValueError("The ckpt file of detection model is not existed. Please check the arg 'det_ckpt_path'.")
+            raise ValueError("The ckpt file of detection model does not exist. Please check the arg 'det_ckpt_path'.")
 
     if not args.rec_ckpt_path:
         print("WARNING: The 'rec_ckpt_path' is empty. The recognition predictor will load the ckpt from 'ckpt_load_path' in rec yaml file. ")
     else:
         if not os.path.isfile(args.rec_ckpt_path):
-            raise ValueError("The ckpt file of recognition model is not existed. Please check the arg 'rec_ckpt_path'.")
+            raise ValueError("The ckpt file of recognition model does not exist. Please check the arg 'rec_ckpt_path'.")
 
     if not os.path.isfile(args.det_config_path):
-        raise ValueError("The detection model yaml config file is not existed. Please check the arg 'det_config_path'.")
+        raise ValueError("The detection model yaml config file does not exist. Please check the arg 'det_config_path'.")
     if not os.path.isfile(args.rec_config_path):
-        raise ValueError("The recognition model yaml config file is not existed. Please check the arg 'rec_config_path'.")
+        raise ValueError("The recognition model yaml config file does not exist. Please check the arg 'rec_config_path'.")
 
     if args.crop_save_dir:
         args.crop_save_dir = os.path.realpath(args.crop_save_dir)
@@ -61,13 +61,13 @@ def update_config(args, cfg, model_type):
             cfg.predict.dataset.dataset_root = args.raw_data_dir
             cfg.predict.dataset.data_dir = '.'
         if args.det_ckpt_path:
-            cfg.ckpt_load_path = args.det_ckpt_path
+            cfg.predict.ckpt_load_path = args.det_ckpt_path
     elif model_type == 'rec':
         if args.crop_save_dir:
             cfg.predict.dataset.dataset_root = args.crop_save_dir
             cfg.predict.dataset.data_dir = '.'
         if args.rec_ckpt_path:
-            cfg.ckpt_load_path = args.rec_ckpt_path
+            cfg.predict.ckpt_load_path = args.rec_ckpt_path
     else:
         raise ValueError("Invalid value of 'model_type'. It must be 'det' or 'rec'.")
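
The `update_config` change above matters because `ckpt_load_path` lives under the `predict` section of the yaml file. The sketch below illustrates the difference. It uses `types.SimpleNamespace` as a stand-in for the parsed config object (the real config type is not shown in this diff), and `update_ckpt_path` is a hypothetical helper that mirrors only the fixed assignment.

```python
from types import SimpleNamespace


def update_ckpt_path(cfg, ckpt_path):
    """Mirror of the fixed assignment: write into the nested `predict` section."""
    if ckpt_path:
        cfg.predict.ckpt_load_path = ckpt_path
    return cfg


# Stand-in for the parsed yaml config (assumption: attribute-style access)
cfg = SimpleNamespace(predict=SimpleNamespace(ckpt_load_path="tmp_det/best.ckpt"))

# Old assignment: sets a new top-level attribute and leaves the predict section untouched
cfg.ckpt_load_path = "path/to/detection_ckpt"
print(cfg.predict.ckpt_load_path)  # tmp_det/best.ckpt

# Fixed assignment: the value under `predict` is overridden
update_ckpt_path(cfg, "path/to/detection_ckpt")
print(cfg.predict.ckpt_load_path)  # path/to/detection_ckpt
```

With the old assignment, any code that reads the checkpoint path from the `predict` section would still see the yaml default, so the command-line value would be silently ignored.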