update doc

Liyulingyue · Aug 23, 2022 · 9c424ff
1 parent d5d78b4

Showing 13 changed files with 192 additions and 124 deletions.
diff --git a/__init__.py b/__init__.py
@@ -16,5 +16,6 @@
 __version__ = paddleocr.VERSION
 __all__ = [
     'PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result',
-    'save_structure_res', 'download_with_progressbar'
+    'save_structure_res', 'download_with_progressbar', 'sorted_layout_boxes',
+    'convert_info_docx'
 ]
diff --git a/deploy/hubserving/readme.md b/deploy/hubserving/readme.md
@@ -20,16 +20,16 @@ PaddleOCR提供2种服务部署方式：
 
 # Service deployment based on PaddleHub Serving
 
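-The hubserving deployment directory contains seven service packages: text detection, text direction classification, text recognition, the three-stage pipeline of text detection + text direction classification + text recognition, table recognition, PP-Structure, and layout analysis. Install and start whichever package you need. The directory structure is as follows:
+The hubserving deployment directory contains seven service packages: text detection, text direction classification, text recognition, the three-stage pipeline of text detection + text direction classification + text recognition, layout analysis, table recognition, and PP-Structure. Install and start whichever package you need. The directory structure is as follows: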
 ```
 deploy/hubserving/
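   └─  ocr_cls     text direction classification service package
   └─  ocr_det     text detection service package
   └─  ocr_rec     text recognition service package
   └─  ocr_system  text detection + text direction classification + text recognition pipeline service package
+  └─  structure_layout  layout analysis service package
   └─  structure_table  table recognition service package
   └─  structure_system  PP-Structure service package
-  └─  structure_layout  layout analysis service package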
 ```
 
 Each service package contains 3 files. Taking the 2-stage pipeline service package as an example, the directory is as follows:
@@ -42,9 +42,9 @@ deploy/hubserving/ocr_system/
 ```
 ## 1. Recent updates
 
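+* 2022.08.23 Added the layout analysis service.
 * 2022.05.05 Added the PP-OCRv3 detection and recognition models.
 * 2022.03.30 Added the PP-Structure and table recognition services.
-* 2022.08.23 Added the layout analysis service.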
 
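 ## 2. Quick start
 The steps below use the 2-stage detection + recognition pipeline service as an example. If you need only the detection service or only the recognition service, substitute the corresponding file paths.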

diff --git a/deploy/hubserving/readme_en.md b/deploy/hubserving/readme_en.md
@@ -20,16 +20,16 @@ PaddleOCR provides 2 service deployment methods:
 
 # Service deployment based on PaddleHub Serving  
 
-The hubserving service deployment directory includes seven service packages: text detection, text angle class, text recognition, text detection+text angle class+text recognition three-stage series connection, table recognition, PP-Structure and layout analysis. Please select the corresponding service package to install and start service according to your needs. The directory is as follows:  
+The hubserving service deployment directory includes seven service packages: text detection, text angle class, text recognition, text detection+text angle class+text recognition three-stage series connection, layout analysis, table recognition and PP-Structure. Please select the corresponding service package to install and start service according to your needs. The directory is as follows:  
 ```
 deploy/hubserving/
   └─  ocr_det     text detection module service package
   └─  ocr_cls     text angle class module service package
   └─  ocr_rec     text recognition module service package
   └─  ocr_system  text detection+text angle class+text recognition three-stage series connection service package
+  └─  structure_layout  layout analysis service package
   └─  structure_table  table recognition service package
   └─  structure_system  PP-Structure service package
-  └─  structure_layout  layout analysis service package
 ```
 
 Each service pack contains 3 files. Take the 2-stage series connection service package as an example, the directory is as follows:  

diff --git a/paddleocr.py b/paddleocr.py
@@ -562,7 +562,7 @@ def __init__(self, **kwargs):
             params.table_model_dir,
             os.path.join(BASE_DIR, 'whl', 'table'), table_model_config['url'])
         layout_model_config = get_model_config(
-            'STRUCTURE', params.structure_version, 'layout', 'ch')
+            'STRUCTURE', params.structure_version, 'layout', lang)
         params.layout_model_dir, layout_url = confirm_model_dir_url(
             params.layout_model_dir,
             os.path.join(BASE_DIR, 'whl', 'layout'), layout_model_config['url'])
@@ -584,7 +584,7 @@ def __init__(self, **kwargs):
         logger.debug(params)
         super().__init__(params)
 
-    def __call__(self, img, return_ocr_result_in_table=False):
+    def __call__(self, img, return_ocr_result_in_table=False, img_idx=0):
         if isinstance(img, str):
             # download net image
             if img.startswith('http'):
@@ -602,7 +602,8 @@ def __call__(self, img, return_ocr_result_in_table=False):
         if isinstance(img, np.ndarray) and len(img.shape) == 2:
             img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
 
-        res, _ = super().__call__(img, return_ocr_result_in_table)
+        res, _ = super().__call__(
+            img, return_ocr_result_in_table, img_idx=img_idx)
         return res
 
 
@@ -637,25 +638,54 @@ def main():
                 for line in result:
                     logger.info(line)
         elif args.type == 'structure':
-            result = engine(img_path)
-            save_structure_res(result, args.output, img_name)
+            img, flag_gif, flag_pdf = check_and_read(img_path)
+            if not flag_gif and not flag_pdf:
+                img = cv2.imread(img_path)
 
-            if args.recovery:
-                try:
-                    from ppstructure.recovery.recovery_to_doc import sorted_layout_boxes, convert_info_docx
-                    img = cv2.imread(img_path)
+            if not flag_pdf:
+                if img is None:
+                    logger.error("error in loading image:{}".format(img_path))
+                    continue
+                img_paths = [[img_path, img]]
+            else:
+                img_paths = []
+                for index, pdf_img in enumerate(img):
+                    os.makedirs(
+                        os.path.join(args.output, img_name), exist_ok=True)
+                    pdf_img_path = os.path.join(args.output, img_name, img_name
+                                                + '_' + str(index) + '.jpg')
+                    cv2.imwrite(pdf_img_path, pdf_img)
+                    img_paths.append([pdf_img_path, pdf_img])
+
+            all_res = []
+            for index, (new_img_path, img) in enumerate(img_paths):
+                logger.info('processing {}/{} page:'.format(index + 1,
+                                                            len(img_paths)))
+                new_img_name = os.path.basename(new_img_path).split('.')[0]
+                result = engine(new_img_path, img_idx=index)
+                save_structure_res(result, args.output, img_name, index)
+
+                if args.recovery and result != []:
+                    from copy import deepcopy
+                    from ppstructure.recovery.recovery_to_doc import sorted_layout_boxes
                     h, w, _ = img.shape
-                    res = sorted_layout_boxes(result, w)
-                    convert_info_docx(img, res, args.output, img_name,
+                    result_cp = deepcopy(result)
+                    result_sorted = sorted_layout_boxes(result_cp, w)
+                    all_res += result_sorted
+
+                for item in result:
+                    item.pop('img')
+                    item.pop('res')
+                    logger.info(item)
+                logger.info('result save to {}'.format(args.output))
+
+            if args.recovery and all_res != []:
+                try:
+                    from ppstructure.recovery.recovery_to_doc import convert_info_docx
+                    convert_info_docx(img, all_res, args.output, img_name,
                                       args.save_pdf)
                 except Exception as ex:
                     logger.error(
                         "error in layout recovery image:{}, err msg: {}".format(
                             img_name, ex))
                     continue
-
-            for item in result:
-                item.pop('img')
-                item.pop('res')
-                logger.info(item)
-            logger.info('result save to {}'.format(args.output))
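
A minimal sketch of the per-page flow in the `structure` branch above, using only the Python standard library. The names `page_image_paths`, `collect_sorted_results`, `analyze`, and `sort_boxes` are hypothetical stand-ins for the page-naming logic, the loop body, the engine call, and `sorted_layout_boxes`; none of them are part of PaddleOCR. The sketch illustrates one likely reason the loop deep-copies `result` before sorting: the later `item.pop('img')` and `item.pop('res')` calls mutate the original dicts, so entries collected for recovery would lose those keys if the sort returned the same dict objects.

```python
import os
from copy import deepcopy


def page_image_paths(output_dir, img_name, num_pages):
    """Build one JPEG path per PDF page: <output_dir>/<img_name>/<img_name>_<index>.jpg."""
    page_dir = os.path.join(output_dir, img_name)
    return [
        os.path.join(page_dir, "{}_{}.jpg".format(img_name, index))
        for index in range(num_pages)
    ]


def collect_sorted_results(pages, analyze, sort_boxes):
    """Analyze each (path, width) page and accumulate sorted copies of the results.

    `analyze` and `sort_boxes` are hypothetical callables supplied by the caller.
    """
    all_res = []
    for index, (path, width) in enumerate(pages):
        result = analyze(path, img_idx=index)
        if result:
            # Sort a deep copy so the cleanup below cannot strip keys
            # from the entries kept for document recovery.
            all_res += sort_boxes(deepcopy(result), width)
        for item in result:
            # Drop bulky fields before logging, as the diff does
            # (the default of None is an addition for safety in this sketch).
            item.pop("img", None)
            item.pop("res", None)
    return all_res
```

With stub callables, the accumulated list keeps `img` and `res` for every page even though the per-page results have been stripped for logging.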
diff --git a/ppstructure/docs/quickstart.md b/ppstructure/docs/quickstart.md
@@ -102,6 +102,8 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true
 # English test image
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
+# PDF test file
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
 ```
 
 <a name="22"></a>

diff --git a/ppstructure/docs/quickstart_en.md b/ppstructure/docs/quickstart_en.md
@@ -85,6 +85,8 @@ Please refer to: [Key Information Extraction](../kie/README.md) .
 paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --recovery=true
 # English pic
 paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
+# PDF file
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
 ```
 
 <a name="22"></a>

diff --git a/ppstructure/docs/recovery/UnrealText.pdf b/ppstructure/docs/recovery/UnrealText.pdf
diff --git a/ppstructure/docs/recovery/recovery_ch.jpg b/ppstructure/docs/recovery/recovery_ch.jpg
diff --git a/ppstructure/layout/README_ch.md b/ppstructure/layout/README_ch.md
@@ -160,11 +160,13 @@ json文件包含所有图像的标注，数据以字典嵌套的方式存放，
 ```
 mkdir pretrained_model
 cd pretrained_model
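-# Download the PubLayNet pretrained model
-wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_layout.pdparams
+# Download the PubLayNet pretrained model (to try model evaluation, prediction, and dynamic-to-static export directly)
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams
+# Download the PubLayNet inference model (to try model inference directly)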
+wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
 ```
 
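-Download more [layout analysis models](../docs/models_list.md) (models pretrained on the Chinese CDLA dataset and on the table dataset)
+If the test images are in Chinese, you can download the model pretrained on the Chinese CDLA dataset, which recognizes 10 types of document regions: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, and Equation. Download the training and inference models of `picodet_lcnet_x1_0_fgd_layout_cdla` from [layout analysis models](../docs/models_list.md). If you only need to detect table regions in an image, you can download the model pretrained on the table dataset: get the training and inference models of `picodet_lcnet_x1_0_fgd_layout_table` from [layout analysis models](../docs/models_list.md).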
 
 ### 4.1. Start training
 
@@ -216,14 +218,14 @@ TestDataset:
 # Single-GPU training
 export CUDA_VISIBLE_DEVICES=0
 python3 tools/train.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--eval
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --eval
 
 # Multi-GPU training; specify the GPU IDs with the --gpus argument
 export CUDA_VISIBLE_DEVICES=0,1,2,3
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--eval
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --eval
 ```
 
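 **Note:** If GPU memory runs out during training, reduce batch_size in TrainReader and reduce base_lr in LearningRate by the same ratio. The released configs were all trained on 8 GPUs; if you change the number of GPUs to 1, base_lr must be reduced by a factor of 8.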
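
The scaling rule in the note above can be sketched as a small helper. `scale_base_lr` is a hypothetical function, not part of PaddleDetection, and the value 0.4 is only an illustrative number, not taken from the released config. It assumes the learning rate scales linearly with the total batch size (number of GPUs times per-GPU batch size), which is what the note describes.

```python
def scale_base_lr(base_lr, ref_gpus, gpus, ref_batch_size=1, batch_size=1):
    """Scale base_lr in proportion to the total batch size.

    ref_gpus / ref_batch_size describe the setup the config was tuned for;
    gpus / batch_size describe the setup you actually train with.
    """
    if min(ref_gpus, gpus, ref_batch_size, batch_size) <= 0:
        raise ValueError("GPU counts and batch sizes must be positive")
    return base_lr * (gpus * batch_size) / (ref_gpus * ref_batch_size)


# Going from 8 GPUs to 1 GPU with the same per-GPU batch size
# divides base_lr by 8 (0.4 is an illustrative value only).
print(scale_base_lr(0.4, ref_gpus=8, gpus=1))
```

The result would then be written into the `base_lr` field of the LearningRate section by hand, alongside the reduced `batch_size` in TrainReader.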
@@ -252,9 +254,9 @@ PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for D
 # Single-GPU training
 export CUDA_VISIBLE_DEVICES=0
 python3 tools/train.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-	--eval
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
+    --eval
 ```
 
 - `-c`: specifies the model config file.
@@ -269,8 +271,8 @@ python3 tools/train.py \
 ```bash
 # GPU evaluation; weights is the checkpoint to evaluate
 python3 tools/eval.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	-o weights=./output/picodet_lcnet_x1_0_layout/best_model
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    -o weights=./output/picodet_lcnet_x1_0_layout/best_model
 ```
 
 The following output is printed, including mAP, AP0.5, and other metrics.
@@ -292,13 +294,13 @@ python3 tools/eval.py \
 [08/15 07:07:09] ppdet.engine INFO: Best test bbox ap is 0.935.
 ```
 
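-Evaluate with the FGD-distilled model:
+To evaluate **the provided pretrained model** or **a model trained with FGD distillation**, change the `weights` model path and run the following command: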
 
 ```
 python3 tools/eval.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-	-o weights=output/picodet_lcnet_x2_5_layout/best_model
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
+    -o weights=output/picodet_lcnet_x2_5_layout/best_model
 ```
 
 - `-c`: specifies the model config file.
@@ -325,18 +327,16 @@ python3 tools/infer.py \
 - `--output_dir`: specifies where the visualization results are saved.
 - `--draw_threshold`: specifies the NMS threshold for drawing result boxes.
 
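-The predicted image is shown below; it is saved under the `output_dir` path.
-
-Test with the FGD-distilled model:
+To run prediction with **the provided pretrained model** or **a model trained with FGD distillation**, change the `weights` model path and run the following command: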
 
 ```
 python3 tools/infer.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-	-o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \
-	--infer_img='docs/images/layout.jpg' \
-	--output_dir=output_dir/ \
-	--draw_threshold=0.5
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
+    -o weights='output/picodet_lcnet_x2_5_layout/best_model.pdparams' \
+    --infer_img='docs/images/layout.jpg' \
+    --output_dir=output_dir/ \
+    --draw_threshold=0.5
 ```
 
 
@@ -351,9 +351,9 @@ inference 模型（`paddle.jit.save`保存的模型） 一般是模型训练，
 
 ```bash
 python3 tools/export_model.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	-o weights=output/picodet_lcnet_x1_0_layout/best_model \
-	--output_dir=output_inference/
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    -o weights=output/picodet_lcnet_x1_0_layout/best_model \
+    --output_dir=output_inference/
 ```
 
 * If you do not need to export post-processing, specify `-o export.benchmark=True` (if `-o` already appears in the command, drop the `-o` here)
@@ -368,27 +368,27 @@ output_inference/picodet_lcnet_x1_0_layout/
     └── model.pdmodel           # model structure file of the inference model
 ```
 
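-The steps to convert the FGD-distilled model to an inference model are as follows:
+To convert **the provided pretrained model** or **a model trained with FGD distillation** to an inference model, change the `weights` model path and follow these steps: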
 
 ```bash
 python3 tools/export_model.py \
-	-c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
-	--slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
-	-o weights=./output/picodet_lcnet_x2_5_layout/best_model \
-	--output_dir=output_inference/
+    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
+    --slim_config configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x2_5_layout.yml \
+    -o weights=./output/picodet_lcnet_x2_5_layout/best_model \
+    --output_dir=output_inference/
 ```
 
 
 
 ### 6.2 Model inference
 
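-To run inference for the layout recovery task, execute the following command:
+To run inference with **the provided inference model** or **a model trained with FGD distillation**, change the `model_dir` inference model path and run the following command: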
 
 ```bash
 python3 deploy/python/infer.py \
-	--model_dir=output_inference/picodet_lcnet_x1_0_layout/ \
-	--image_file=docs/images/layout.jpg \
-	--device=CPU
+    --model_dir=output_inference/picodet_lcnet_x1_0_layout/ \
+    --image_file=docs/images/layout.jpg \
+    --device=CPU
 ```
 
 - --device: specifies the device, GPU or CPU