Remove old DB, and add PPOCRv3DB text detection model. #158

Closed
wants to merge 2 commits into from
3 changes: 2 additions & 1 deletion models/text_detection_db/LICENSE
@@ -1,3 +1,4 @@
Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved

Apache License
Version 2.0, January 2004
@@ -187,7 +188,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
12 changes: 7 additions & 5 deletions models/text_detection_db/README.md
@@ -1,10 +1,12 @@
# DB
# PP-OCRv3 Text Detection

Real-time Scene Text Detection with Differentiable Binarization
### NOTE: the PP-OCRv3 Text Detection can be supported `opencv >= 4.8.0`.
PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

Note:

- Models source: [here](https://drive.google.com/drive/folders/1qzNCHfUJOS0NEUOIKn69eCtxdlNPpWbq).
- Original Paddle Models source of English: [here](https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar).
- Original Paddle Models source of Chinese: [here](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar).
- `IC15` in the filename means the model is trained on [IC15 dataset](https://rrc.cvc.uab.es/?ch=4&com=introduction), which can detect English text instances only.
- `TD500` in the filename means the model is trained on [TD500 dataset](http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500)), which can detect both English & Chinese instances.
- Visit https://docs.opencv.org/master/d4/d43/tutorial_dnn_text_spotting.html for more information.
@@ -35,6 +37,6 @@ All files in this directory are licensed under [Apache 2.0 License](./LICENSE).

## Reference

- https://arxiv.org/abs/1911.08947
- https://github.com/MhLiao/DB
- https://arxiv.org/abs/2206.03001
- https://github.com/PaddlePaddle/PaddleOCR
- https://docs.opencv.org/master/d4/d43/tutorial_dnn_text_spotting.html
12 changes: 9 additions & 3 deletions models/text_detection_db/db.py
@@ -7,7 +7,7 @@
import numpy as np
import cv2 as cv

class DB:
class PPOCRv3DB:
def __init__(self, modelPath, inputSize=[736, 736], binaryThreshold=0.3, polygonThreshold=0.5, maxCandidates=200, unclipRatio=2.0, backendId=0, targetId=0):
self._modelPath = modelPath
self._model = cv.dnn_TextDetectionModel_DB(
@@ -32,7 +32,10 @@ def __init__(self, modelPath, inputSize=[736, 736], binaryThreshold=0.3, polygon
self._model.setUnclipRatio(self._unclipRatio)
self._model.setMaxCandidates(self._maxCandidates)

self._model.setInputParams(1.0/255.0, self._inputSize, (122.67891434, 116.66876762, 104.00698793))
self._model.setInputSize(self._inputSize)
self._model.setInputMean((123.675, 116.28, 103.53))
self._model.setInputScale(1.0/255.0/np.array([0.229, 0.224, 0.225]))
self._model.setInputSwapRB(True)

@property
def name(self):
@@ -46,7 +49,10 @@ def setBackendAndTarget(self, backendId, targetId):

def setInputSize(self, input_size):
self._inputSize = tuple(input_size)
self._model.setInputParams(1.0/255.0, self._inputSize, (122.67891434, 116.66876762, 104.00698793))
self._model.setInputSize(self._inputSize)
self._model.setInputMean((123.675, 116.28, 103.53))
self._model.setInputScale(1.0/255.0/np.array([0.229, 0.224, 0.225]))
self._model.setInputSwapRB(True)

def infer(self, image):
assert image.shape[0] == self._inputSize[1], '{} (height of input image) != {} (preset height)'.format(image.shape[0], self._inputSize[1])
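
Note on the preprocessing change in this file: the new mean `(123.675, 116.28, 103.53)` and per-channel scale `1.0/255.0/[0.229, 0.224, 0.225]` appear to be the common ImageNet normalization written for pixels in the 0..255 range. The sketch below checks that equivalence in plain Python. It assumes the mean is subtracted before the scale is applied and that channels are already in RGB order (which is what `setInputSwapRB(True)` suggests); the function names are hypothetical and are not part of the PR.

```python
# Constants copied from the diff above.
MEAN_255 = (123.675, 116.28, 103.53)
STD = (0.229, 0.224, 0.225)
SCALE = tuple(1.0 / 255.0 / s for s in STD)


def normalize_like_diff(pixel_rgb):
    """Apply (value - mean) * scale per channel to one RGB pixel in 0..255.

    Assumption: mean subtraction happens before scaling.
    """
    return tuple((v - m) * k for v, m, k in zip(pixel_rgb, MEAN_255, SCALE))


def normalize_unit_range(pixel_rgb):
    """Same normalization in the usual form: scale to 0..1, subtract mean, divide by std."""
    return tuple(
        (v / 255.0 - m / 255.0) / s
        for v, m, s in zip(pixel_rgb, MEAN_255, STD)
    )


if __name__ == "__main__":
    pixel = (12, 200, 77)
    print(normalize_like_diff(pixel))
    print(normalize_unit_range(pixel))
```

Both functions should agree up to floating-point rounding, since `MEAN_255 / 255` is approximately `(0.485, 0.456, 0.406)`.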
10 changes: 5 additions & 5 deletions models/text_detection_db/demo.py
@@ -9,7 +9,7 @@
import numpy as np
import cv2 as cv

from db import DB
from db import PPOCRv3DB
Member (review comment): Need to update benchmark-related code and stats. Ping @WanliZhong to help.


# Check OpenCV version
assert cv.__version__ >= "4.7.0", \
@@ -24,11 +24,11 @@
[cv.dnn.DNN_BACKEND_CANN, cv.dnn.DNN_TARGET_NPU]
]

parser = argparse.ArgumentParser(description='Real-time Scene Text Detection with Differentiable Binarization (https://arxiv.org/abs/1911.08947).')
parser = argparse.ArgumentParser(description='PP-OCRv3 Text Detection (https://arxiv.org/abs/2206.03001).')
parser.add_argument('--input', '-i', type=str,
help='Usage: Set path to the input image. Omit for using default camera.')
parser.add_argument('--model', '-m', type=str, default='text_detection_DB_TD500_resnet18_2021sep.onnx',
help='Usage: Set model path, defaults to text_detection_DB_TD500_resnet18_2021sep.onnx.')
parser.add_argument('--model', '-m', type=str, default='./text_detection_en_ppocrv3_2023may.onnx',
help='Usage: Set model path, defaults to text_detection_en_ppocrv3_2023may.onnx.')
parser.add_argument('--backend_target', '-bt', type=int, default=0,
help='''Choose one of the backend-target pair to run this demo:
{:d}: (default) OpenCV implementation + CPU,
@@ -71,7 +71,7 @@ def visualize(image, results, box_color=(0, 255, 0), text_color=(0, 0, 255), isC
target_id = backend_target_pairs[args.backend_target][1]

# Instantiate DB
model = DB(modelPath=args.model,
model = PPOCRv3DB(modelPath=args.model,
inputSize=[args.width, args.height],
binaryThreshold=args.binary_threshold,
polygonThreshold=args.polygon_threshold,
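
Note on the version check in `demo.py`: the visible assert compares `cv.__version__` with `"4.7.0"` as strings, while the README note in this PR asks for `opencv >= 4.8.0`. String comparison is lexicographic, so a version such as `"4.10.0"` would compare as smaller than `"4.8.0"`. A minimal sketch of a numeric comparison follows; the helper names are hypothetical, and the `4.8.0` minimum is taken from the README note rather than from the demo code.

```python
import re


def version_tuple(version):
    """Turn '4.8.0' or '4.8.0-dev' into (4, 8, 0) for numeric comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '4.8' so tuples always have three fields.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True when `version` is numerically at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)
```

In the demo this could be used as `assert meets_minimum(cv.__version__, "4.8.0")`, with the same error message as the existing assert.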

This file was deleted.

This file was deleted.

Member (review comment): Please remove this and push with LFS.

Binary file not shown.
Member (review comment): Please remove this and push with LFS.

Binary file not shown.