Rotate page #488

Merged: 66 commits, merged on Dec 6, 2021

Commits
1af0fc1
feat: integrate image rotation before using predictor
Rob192 Jul 19, 2021
811fd6a
Merge branch 'main' into rotate_page
Rob192 Jul 19, 2021
6f5c183
Merge branch 'main' into rotate_page
Rob192 Sep 21, 2021
c2b18d7
feat: add rotate_document functionality
Rob192 Sep 22, 2021
312179f
fix: remove min_angle from rotate_page
Rob192 Sep 22, 2021
b6bc74f
merge
Rob192 Sep 30, 2021
6fe1bc6
fix: correct models.predictor.tensorflow
Rob192 Sep 30, 2021
a6f2ff1
fix: minor corrections
Rob192 Oct 1, 2021
ccda4d0
feat: Rotate back images and boxes after straightening
Rob192 Oct 4, 2021
7d4ed75
fix: correct typo
Rob192 Oct 4, 2021
a303b23
fix: merge two functions rotate_image
Rob192 Oct 5, 2021
7a78263
fix: do not rotate back pages but only boxes
Rob192 Oct 5, 2021
eb341ac
fix: typos
Rob192 Oct 6, 2021
eeff2d6
fix: add more testing for remap_boxes in cases of boxes with an angle…
Rob192 Oct 6, 2021
16f3489
fix: remove the cropping after rotation of the image
Rob192 Oct 14, 2021
eb063f1
Merge branch 'main' of https://github.com/mindee/doctr
Rob192 Oct 25, 2021
b9ec27e
Merge branch 'main' into rotate_page
Rob192 Oct 25, 2021
f7fcf90
fix: correct model/_utils.py
Rob192 Oct 25, 2021
a04ab4f
Merge branch 'main' of https://github.com/mindee/doctr
Rob192 Oct 28, 2021
8ec9eab
Merge branch 'main' into rotate_page
Rob192 Oct 28, 2021
cf9ab0d
fix: do not use resolve_lines and resolve_boxes as it does not work w…
Rob192 Oct 28, 2021
8a31014
fix: remove expand in geometry.rotate_boxes
Rob192 Oct 29, 2021
e457cf0
fix: reformat code
Rob192 Oct 29, 2021
9975c82
fix: reformat expand from function signature
Rob192 Oct 29, 2021
32d53e4
fix: rename keep_original_size to preserve_aspect_ratio
Rob192 Oct 29, 2021
6821442
fix: vectorize box transformation
Rob192 Oct 29, 2021
988e2d0
Merge branch 'main' into rotate_page
Rob192 Nov 16, 2021
573f13f
fix: minor modifications + remove test_bbox_to_rbbox
Rob192 Nov 16, 2021
290e8ed
fix: add the straighten_pages to the latest codebase
Rob192 Nov 19, 2021
1775ebf
feat: add the straighten_pages to the pytorch predictor
Rob192 Nov 19, 2021
816168c
Merge branch 'main' into rotate_page
Rob192 Nov 19, 2021
f199044
feat: add testing for the straighten_pages parameter
Rob192 Nov 19, 2021
887ed25
fix: in case no angle is found in estimate_orientation return 0
Rob192 Nov 19, 2021
789a9c2
Merge branch 'main' into rotate_page
Rob192 Nov 24, 2021
239c508
fix: make sure boxes are outputted from _process_predictions
Rob192 Nov 24, 2021
52461dc
fix: update docstrings in OCRPredictor
Rob192 Nov 24, 2021
b6f8cca
fix: create a copy of boxes inside rotate_boxes
Rob192 Nov 24, 2021
a9f3d6e
fix: update docstring for rotate_image
Rob192 Nov 24, 2021
ea69de6
fix: add comments inside remap_boxes
Rob192 Nov 24, 2021
1a72a8c
fix: change testing in test_estimate_orientation
Rob192 Nov 24, 2021
d658be4
fix: change testing in test_estimate_orientation
Rob192 Nov 24, 2021
4711797
Merge branch 'main' into rotate_page
Rob192 Nov 25, 2021
e5ed562
Merge branch 'main' into rotate_page
Rob192 Nov 26, 2021
9a6c658
fix: delete imports not used
Rob192 Nov 29, 2021
a9cbe04
fix: styling
Rob192 Nov 29, 2021
ebdc320
fix: change assertion in test_utils_geometry.py
Rob192 Nov 30, 2021
d16fba3
fix: keep check with if expand in rotate_image
Rob192 Nov 30, 2021
e344e5c
fix: change rotate_boxes signature
Rob192 Nov 30, 2021
2e11dc8
fix: use loc_preds instead of boxes
Rob192 Nov 30, 2021
bb6bc79
fix: wrong test in remap boxes
Rob192 Nov 30, 2021
258b18b
Merge branch 'main' into rotate_page
Rob192 Nov 30, 2021
b22309d
add unit tests for pytorch
Rob192 Dec 1, 2021
495ad8c
add unit tests for remap_boxes and estimate_orientation
Rob192 Dec 1, 2021
98b44f6
fix: styling
Rob192 Dec 1, 2021
28f1fd8
fix: isort
Rob192 Dec 1, 2021
1d61de7
fix: remove unnecessary fixture
Rob192 Dec 1, 2021
faac0bd
fix: add testing for pytorch predictor
Rob192 Dec 2, 2021
938c9f2
fix: styling
Rob192 Dec 2, 2021
30a70f2
fix: correct testing for ocrpredictor with pytorch
Rob192 Dec 2, 2021
a7c0d55
fix: correct imports for testing
Rob192 Dec 2, 2021
ce23100
fix: isort
Rob192 Dec 2, 2021
8525b14
fix: make sure that expand in rotate_image is keeping the same image …
Rob192 Dec 3, 2021
7bcf639
fix: styling
Rob192 Dec 3, 2021
2205737
fix: use absolute centers for rotate_boxes
Rob192 Dec 5, 2021
484451b
fix: calculation of image_center and documentation
Rob192 Dec 5, 2021
66fad67
fix: remove default value for orig_shape in rotate_boxes
Rob192 Dec 5, 2021
6 changes: 5 additions & 1 deletion doctr/models/_utils.py
@@ -150,7 +150,11 @@ def estimate_orientation(img: np.ndarray, n_ct: int = 50, ratio_threshold_for_li
angles.append(angle)
elif w / h < 1 / ratio_threshold_for_lines: # if lines are vertical, subtract 90 degrees
angles.append(angle - 90)
return -median_low(angles)

if len(angles) == 0:
return 0 # in case no angle is found
else:
return -median_low(angles)


def get_bitmap_angle(bitmap: np.ndarray, n_ct: int = 20, std_max: float = 3.) -> float:
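Before this change, `estimate_orientation` ended with `return -median_low(angles)`. Assuming `median_low` is the standard library's `statistics.median_low`, that call raises `StatisticsError` when no contour passes the line-shape filter, which is presumably what happens on the blank page used in the new `estimate_orientation(mock_image * 0) == 0` test. A minimal sketch of the guarded tail; `page_angle_from_lines` is a hypothetical helper name, not part of doctr:

```python
from statistics import StatisticsError, median_low
from typing import List, Union


def page_angle_from_lines(angles: List[float]) -> Union[int, float]:
    """Hypothetical helper reproducing the new tail of estimate_orientation.

    `angles` stands for the per-contour line angles (in degrees) gathered
    earlier in that function.
    """
    if len(angles) == 0:
        return 0  # no line-like contour found: treat the page as straight
    # median_low returns one of the observed angles rather than an average
    return -median_low(angles)


# The old one-liner failed on pages without line-like contours:
try:
    median_low([])
except StatisticsError:
    print("median_low([]) raises StatisticsError")

print(page_angle_from_lines([]))                  # 0
print(page_angle_from_lines([28.0, 30.0, 31.5]))  # -30.0
```

Using the low median (rather than `statistics.median`) keeps the result equal to an angle that was actually measured, even for an even number of contours.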
22 changes: 22 additions & 0 deletions doctr/models/predictor/pytorch.py
@@ -10,9 +10,11 @@
from torch import nn

from doctr.io.elements import Document
from doctr.models._utils import estimate_orientation
from doctr.models.builder import DocumentBuilder
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.utils.geometry import rotate_boxes, rotate_image

from .base import _OCRPredictor

@@ -29,6 +31,9 @@ class OCRPredictor(nn.Module, _OCRPredictor):
without rotated textual elements.
export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
(potentially rotated) as straight bounding boxes.
straighten_pages: if True, estimates the general orientation of the page from the median line orientation.
Then, rotates the page before passing it to the deep learning modules. The final predictions are remapped
accordingly. Doing so can improve performance on documents whose pages are uniformly rotated.

"""

@@ -38,13 +43,15 @@ def __init__(
reco_predictor: RecognitionPredictor,
assume_straight_pages: bool = True,
export_as_straight_boxes: bool = False,
straighten_pages: bool = False,
) -> None:

super().__init__()
self.det_predictor = det_predictor.eval() # type: ignore[attr-defined]
self.reco_predictor = reco_predictor.eval() # type: ignore[attr-defined]
self.doc_builder = DocumentBuilder(export_as_straight_boxes=export_as_straight_boxes)
self.assume_straight_pages = assume_straight_pages
self.straighten_pages = straighten_pages

@torch.no_grad()
def forward(
@@ -57,6 +64,13 @@ def forward(
if any(page.ndim != 3 for page in pages):
raise ValueError("incorrect input shape: all pages are expected to be multi-channel 2D images.")

origin_page_shapes = [page.shape[:2] if isinstance(page, np.ndarray) else page.shape[-2:] for page in pages]

# Detect document rotation and rotate pages
if self.straighten_pages:
origin_page_orientations = [estimate_orientation(page) for page in pages]
pages = [rotate_image(page, -angle, expand=True) for page, angle in zip(pages, origin_page_orientations)]

# Localize text elements
loc_preds = self.det_predictor(pages, **kwargs)
# Check whether crop mode should be switched to channels first
@@ -70,6 +84,14 @@ def forward(

boxes, text_preds = self._process_predictions(loc_preds, word_preds)

# Rotate boxes back and remap them to the original page size
if self.straighten_pages:
boxes = [rotate_boxes(page_boxes,
angle,
orig_shape=page.shape[:2] if isinstance(page, np.ndarray) else page.shape[-2:],
target_shape=mask) for
page_boxes, page, angle, mask in zip(boxes, pages, origin_page_orientations, origin_page_shapes)]

out = self.doc_builder(
boxes,
text_preds,
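In both predictors, straightening is a forward transform (pad the page symmetrically, then rotate it by `-angle`), and `rotate_boxes(..., target_shape=...)` is meant to be its inverse (rotate by `+angle` around the center of the expanded page, then strip the padding). The sketch below checks that round trip on absolute pixel points. It does not call doctr, the helper names are hypothetical, and it assumes the padding added by `rotate_image(expand=True)` is centered, as its symmetric `np.pad` calls suggest.

```python
from typing import Tuple

import numpy as np


def rotate_points(points: np.ndarray, angle: float, center: np.ndarray) -> np.ndarray:
    """Rotate absolute (x, y) points around `center` by `angle` degrees.

    Uses the row-vector product `points @ R`, as rotate_boxes does in this PR.
    """
    rad = np.deg2rad(angle)
    rot = np.array([[np.cos(rad), -np.sin(rad)],
                    [np.sin(rad), np.cos(rad)]])
    return center + (points - center) @ rot


def _offset_and_center(orig_hw: Tuple[int, int], exp_hw: Tuple[int, int]):
    orig_xy = np.array(orig_hw[::-1], dtype=float)  # (width, height)
    exp_xy = np.array(exp_hw[::-1], dtype=float)
    # Half of the padding on each axis, and the center of the expanded page
    return (exp_xy - orig_xy) / 2, exp_xy / 2


def to_straightened(points, angle, orig_hw, exp_hw):
    """Forward: pad the page (centered) to exp_hw, then rotate by -angle."""
    offset, center = _offset_and_center(orig_hw, exp_hw)
    return rotate_points(points + offset, -angle, center)


def back_to_original(points, angle, orig_hw, exp_hw):
    """Inverse: rotate by +angle, then drop the centered padding."""
    offset, center = _offset_and_center(orig_hw, exp_hw)
    return rotate_points(points, angle, center) - offset


orig_hw, exp_hw = (100, 200), (180, 240)  # (height, width) before/after expansion
pts = np.array([[10.0, 20.0], [150.0, 90.0]])
straight = to_straightened(pts, 12.0, orig_hw, exp_hw)
print(np.allclose(back_to_original(straight, 12.0, orig_hw, exp_hw), pts))  # True
```

The round trip only holds if the shape passed as `orig_shape` is the expanded (straightened) page and `target_shape` is the shape recorded before rotation, which is why `origin_page_shapes` is captured before `pages` is overwritten.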
21 changes: 20 additions & 1 deletion doctr/models/predictor/tensorflow.py
@@ -9,9 +9,11 @@
import tensorflow as tf

from doctr.io.elements import Document
from doctr.models._utils import estimate_orientation
from doctr.models.builder import DocumentBuilder
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.utils.geometry import rotate_boxes, rotate_image
from doctr.utils.repr import NestedObject

from .base import _OCRPredictor
@@ -29,6 +31,9 @@ class OCRPredictor(NestedObject, _OCRPredictor):
without rotated textual elements.
export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
(potentially rotated) as straight bounding boxes.
straighten_pages: if True, estimates the general orientation of the page from the median line orientation.
Then, rotates the page before passing it to the deep learning modules. The final predictions are remapped
accordingly. Doing so can improve performance on documents whose pages are uniformly rotated.
"""
_children_names = ['det_predictor', 'reco_predictor']

@@ -38,13 +43,15 @@ def __init__(
reco_predictor: RecognitionPredictor,
assume_straight_pages: bool = True,
export_as_straight_boxes: bool = False,
straighten_pages: bool = False,
) -> None:

super().__init__()
self.det_predictor = det_predictor
self.reco_predictor = reco_predictor
self.doc_builder = DocumentBuilder(export_as_straight_boxes=export_as_straight_boxes)
self.assume_straight_pages = assume_straight_pages
self.straighten_pages = straighten_pages

def __call__(
self,
@@ -56,6 +63,13 @@ def __call__(
if any(page.ndim != 3 for page in pages):
raise ValueError("incorrect input shape: all pages are expected to be multi-channel 2D images.")

origin_page_shapes = [page.shape[:2] for page in pages]

# Detect document rotation and rotate pages
if self.straighten_pages:
origin_page_orientations = [estimate_orientation(page) for page in pages]
pages = [rotate_image(page, -angle, expand=True) for page, angle in zip(pages, origin_page_orientations)]

# Localize text elements
loc_preds = self.det_predictor(pages, **kwargs)
# Crop images
@@ -67,5 +81,10 @@ def __call__(

boxes, text_preds = self._process_predictions(loc_preds, word_preds)

out = self.doc_builder(boxes, text_preds, [page.shape[:2] for page in pages]) # type: ignore[misc]
# Rotate boxes back and remap them to the original page size
if self.straighten_pages:
boxes = [rotate_boxes(page_boxes, angle, orig_shape=page.shape[:2], target_shape=mask) for
page_boxes, page, angle, mask in zip(boxes, pages, origin_page_orientations, origin_page_shapes)]

out = self.doc_builder(boxes, text_preds, origin_page_shapes) # type: ignore[misc]
return out
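With straightening enabled, `pages` is rebound to the straightened, expanded images, so reusing `[page.shape[:2] for page in pages]` (the `doc_builder` call removed above) would describe the padded canvas rather than the input page. The new code therefore records `origin_page_shapes` before rotating and hands those to `doc_builder`. The sketch shows how much a page can grow; the formula (the axis-aligned bounding box of a rotated rectangle) is an assumption about what `compute_expanded_shape` returns, since its body is not part of this diff, and `expanded_shape` is a hypothetical name.

```python
import math
from typing import Tuple


def expanded_shape(height: int, width: int, angle: float) -> Tuple[float, float]:
    """(height, width) of the axis-aligned box enclosing a height x width image
    rotated by `angle` degrees. Hypothetical stand-in for compute_expanded_shape."""
    rad = math.radians(angle)
    cos_a, sin_a = abs(math.cos(rad)), abs(math.sin(rad))
    return height * cos_a + width * sin_a, height * sin_a + width * cos_a


print(expanded_shape(100, 200, 0))  # (100.0, 200.0)
h, w = expanded_shape(100, 200, 30)
print(round(h, 1), round(w, 1))     # 186.6 223.2
```

A 100 x 200 page tilted by 30 degrees needs a canvas of roughly 187 x 224 pixels, so relative coordinates computed on that canvas would be noticeably off if they were reported against the wrong page size.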
2 changes: 1 addition & 1 deletion doctr/utils/common_types.py
@@ -6,7 +6,7 @@
from pathlib import Path
from typing import List, Tuple, Union

__all__ = ['Point2D', 'BoundingBox', 'RotatedBbox', 'Polygon4P', 'Polygon']
__all__ = ['Point2D', 'BoundingBox', 'RotatedBbox', 'Polygon4P', 'Polygon', 'Bbox']


Point2D = Tuple[float, float]
99 changes: 76 additions & 23 deletions doctr/utils/geometry.py
@@ -3,8 +3,8 @@
# This program is licensed under the Apache License version 2.
# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.

import math
from typing import List, Tuple, Union
from math import ceil
from typing import List, Optional, Tuple, Union

import cv2
import numpy as np
@@ -145,54 +145,105 @@ def rotate_abs_boxes(boxes: np.ndarray, angle: float, img_shape: Tuple[int, int]
return rotated_boxes


def remap_boxes(loc_preds: np.ndarray, orig_shape: Tuple[int, int], dest_shape: Tuple[int, int]) -> np.ndarray:
""" Remaps a batch of rotated locpred (x, y, w, h, alpha, c) expressed for an origin_shape to a destination_shape.
This does not change the absolute size of the boxes, but allows computing the new relative RotatedBbox
coordinates after the image has been padded or cropped around its center.

Args:
loc_preds: (N, 6) array of RELATIVE locpred (x, y, w, h, alpha, c)
orig_shape: shape of the origin image
dest_shape: shape of the destination image

Returns:
A batch of rotated loc_preds (N, 6): (x, y, w, h, alpha, c) expressed in the destination reference frame

"""

if len(dest_shape) != 2:
raise ValueError(f"Mask length should be 2, was found at: {len(dest_shape)}")
if len(orig_shape) != 2:
raise ValueError(f"Image_shape length should be 2, was found at: {len(orig_shape)}")
orig_height, orig_width = orig_shape
dest_height, dest_width = dest_shape
mboxes = loc_preds.copy()
# remaps position of the box center for the destination image shape
mboxes[:, 0] = ((loc_preds[:, 0] * orig_width) + (dest_width - orig_width) / 2) / dest_width
mboxes[:, 1] = ((loc_preds[:, 1] * orig_height) + (dest_height - orig_height) / 2) / dest_height
# remaps box dimension for the destination image shape
mboxes[:, 2] = loc_preds[:, 2] * orig_width / dest_width
mboxes[:, 3] = loc_preds[:, 3] * orig_height / dest_height
return mboxes
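`remap_boxes` only changes the canvas a relative box is expressed in. Assuming the source image sits centered inside the destination (as after the symmetric padding in `rotate_image`), box centers are rescaled and shifted by half the size difference, while widths and heights are only rescaled; `alpha` and `c` pass through. The sketch restates that arithmetic as one scale-and-shift, with an example that can be checked by hand. `remap_relative_boxes` is a hypothetical name, not part of doctr.

```python
from typing import Tuple

import numpy as np


def remap_relative_boxes(loc_preds: np.ndarray, orig_shape: Tuple[int, int],
                         dest_shape: Tuple[int, int]) -> np.ndarray:
    """Same arithmetic as remap_boxes, written as one affine map (scale + shift)."""
    orig_h, orig_w = orig_shape
    dest_h, dest_w = dest_shape
    scale = np.array([orig_w / dest_w, orig_h / dest_h])
    shift = np.array([(dest_w - orig_w) / (2 * dest_w), (dest_h - orig_h) / (2 * dest_h)])
    out = loc_preds.copy()
    out[:, :2] = loc_preds[:, :2] * scale + shift  # centers: rescale, then re-center
    out[:, 2:4] = loc_preds[:, 2:4] * scale        # sizes: rescale only
    return out  # alpha and confidence are left untouched


# A 30 x 10 px box centered at pixel (100, 75) on a 200 x 300 (h x w) expanded page;
# the original 100 x 200 page sat in the middle, with 50 px of padding on every side.
box = np.array([[100 / 300, 75 / 200, 30 / 300, 10 / 200, 12.0, 0.9]])
print(remap_relative_boxes(box, orig_shape=(200, 300), dest_shape=(100, 200)))
# -> approximately [[0.25, 0.25, 0.15, 0.1, 12.0, 0.9]]
```

By hand: removing 50 px of padding moves the center to pixel (50, 25) on a 200 x 100 page, i.e. (0.25, 0.25), and the 30 x 10 px box becomes 0.15 x 0.1 in relative units.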


def rotate_boxes(
boxes: np.ndarray,
angle: float = 0.,
min_angle: float = 1.
loc_preds: np.ndarray,
angle: float,
orig_shape: Tuple[int, int],
min_angle: float = 1.,
target_shape: Optional[Tuple[int, int]] = None,
) -> np.ndarray:
"""Rotate a batch of straight bounding boxes (xmin, ymin, xmax, ymax) of an angle,
if angle > min_angle, around the center of the page.
"""Rotate a batch of straight bounding boxes (xmin, ymin, xmax, ymax, c) or rotated bounding boxes
(x, y, w, h, alpha, c) of an angle, if angle > min_angle, around the center of the page.
If target_shape is specified, the boxes are remapped to the target shape after the rotation. This
is done to remove the padding that is created by rotate_image(expand=True)

Args:
boxes: (N, 4) array of RELATIVE boxes
loc_preds: (N, 5) or (N, 6) array of RELATIVE boxes
angle: angle between -90 and +90 degrees
orig_shape: shape of the origin image
min_angle: minimum angle to rotate boxes
target_shape: shape of the target image

Returns:
A batch of rotated boxes (N, 6): (x, y, w, h, alpha, c)
"""

# Change format of the boxes to rotated boxes
_boxes = loc_preds.copy()
if _boxes.shape[1] == 5:
_boxes = np.column_stack(((_boxes[:, 0] + _boxes[:, 2]) / 2,
(_boxes[:, 1] + _boxes[:, 3]) / 2,
_boxes[:, 2] - _boxes[:, 0],
_boxes[:, 3] - _boxes[:, 1],
np.zeros(_boxes.shape[0]),
_boxes[:, 4]))
# If small angle, return boxes (no rotation)
if abs(angle) < min_angle or abs(angle) > 90 - min_angle:
return boxes
return _boxes
# Compute rotation matrix
angle_rad = angle * np.pi / 180. # compute radian angle for np functions
rotation_mat = np.array([
[np.cos(angle_rad), -np.sin(angle_rad)],
[np.sin(angle_rad), np.cos(angle_rad)]
], dtype=boxes.dtype)
# Compute unrotated boxes
x_unrotated, y_unrotated = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
width, height = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
# Rotate centers
centers = np.stack((x_unrotated, y_unrotated), axis=-1)
rotated_centers = .5 + np.matmul(centers - .5, np.transpose(rotation_mat))
x_center, y_center = rotated_centers[:, 0], rotated_centers[:, 1]
], dtype=_boxes.dtype)
# Rotate absolute centers
centers = np.stack((_boxes[:, 0] * orig_shape[1], _boxes[:, 1] * orig_shape[0]), axis=-1)
image_center = (orig_shape[1] / 2, orig_shape[0] / 2)
rotated_centers = image_center + np.matmul(centers - image_center, rotation_mat)
x_center, y_center = rotated_centers[:, 0] / orig_shape[1], rotated_centers[:, 1] / orig_shape[0]
# Compute rotated boxes
rotated_boxes = np.stack((x_center, y_center, width, height, angle * np.ones_like(boxes[:, 0])), axis=1)
rotated_boxes = np.stack((x_center, y_center, _boxes[:, 2], _boxes[:, 3], angle * np.ones_like(_boxes[:, 0]),
_boxes[:, 5]), axis=1)
# Apply a mask if requested
if target_shape is not None:
rotated_boxes = remap_boxes(rotated_boxes, orig_shape=orig_shape, dest_shape=target_shape)
return rotated_boxes
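`rotate_boxes` first converts straight boxes `(xmin, ymin, xmax, ymax, c)` to the rotated format `(x, y, w, h, alpha, c)`, then rotates the box centers around the page center in absolute pixels before converting back to relative coordinates. Rotating relative coordinates directly would be wrong on non-square pages, because one relative unit spans a different number of pixels along x and y; the commit "fix: use absolute centers for rotate_boxes" addresses this. A numpy-only sketch of those two steps, leaving out the `min_angle` shortcut and the final `remap_boxes` call; the function names are hypothetical.

```python
import numpy as np


def straight_to_rotated(boxes: np.ndarray) -> np.ndarray:
    """(N, 5) relative (xmin, ymin, xmax, ymax, c) -> (N, 6) (x, y, w, h, alpha=0, c)."""
    return np.column_stack((
        (boxes[:, 0] + boxes[:, 2]) / 2,
        (boxes[:, 1] + boxes[:, 3]) / 2,
        boxes[:, 2] - boxes[:, 0],
        boxes[:, 3] - boxes[:, 1],
        np.zeros(boxes.shape[0]),
        boxes[:, 4],
    ))


def rotate_centers(rboxes: np.ndarray, angle: float, orig_shape) -> np.ndarray:
    """Rotate (N, 6) relative boxes around the page center, working in pixels."""
    height, width = orig_shape
    rad = np.deg2rad(angle)
    rot = np.array([[np.cos(rad), -np.sin(rad)],
                    [np.sin(rad), np.cos(rad)]])
    size = np.array([width, height], dtype=float)
    centers = rboxes[:, :2] * size          # relative -> absolute (x, y)
    page_center = size / 2
    rotated = page_center + (centers - page_center) @ rot
    out = rboxes.copy()
    out[:, :2] = rotated / size             # absolute -> relative
    out[:, 4] = angle                       # every box inherits the page angle
    return out


# 100 x 200 (h x w) page: a box 50 px right of the page center ends up 50 px above it.
boxes = np.array([[0.7, 0.4, 0.8, 0.6, 0.9]])
print(rotate_centers(straight_to_rotated(boxes), 90.0, (100, 200)).round(3))
```

Widths and heights stay in the straight reference frame; only the center moves and the angle is attached, which matches how the diff builds `rotated_boxes`.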


def rotate_image(
image: np.ndarray,
angle: float,
expand=False,
expand: bool = False,
preserve_origin_shape: bool = False,
) -> np.ndarray:
"""Rotate an image counterclockwise by an given angle.

Args:
image: numpy tensor to rotate
angle: rotation angle in degrees, between -90 and +90
expand: whether the image should be padded before the rotation
preserve_origin_shape: if expand is set to True, resizes the final output to the original image size

Returns:
Rotated array, padded by 0 by default.
@@ -201,14 +252,15 @@ def rotate_image(
# Compute the expanded padding
if expand:
exp_shape = compute_expanded_shape(image.shape[:-1], angle)
h_pad, w_pad = int(math.ceil(exp_shape[0] - image.shape[0])), int(math.ceil(exp_shape[1] - image.shape[1]))
h_pad, w_pad = int(max(0, ceil(exp_shape[0] - image.shape[0]))), int(
max(0, ceil(exp_shape[1] - image.shape[1])))
exp_img = np.pad(image, ((h_pad // 2, h_pad - h_pad // 2), (w_pad // 2, w_pad - w_pad // 2), (0, 0)))
else:
exp_img = image

height, width = exp_img.shape[:2]
rot_mat = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1.0)
rot_img = cv2.warpAffine(exp_img.astype(np.float32), rot_mat, (width, height))
rot_img = cv2.warpAffine(exp_img, rot_mat, (width, height))
if expand:
# Pad to get the same aspect ratio
if (image.shape[0] / image.shape[1]) != (rot_img.shape[0] / rot_img.shape[1]):
@@ -219,7 +271,8 @@ def rotate_image(
else:
h_pad, w_pad = int(rot_img.shape[1] * image.shape[0] / image.shape[1] - rot_img.shape[0]), 0
rot_img = np.pad(rot_img, ((h_pad // 2, h_pad - h_pad // 2), (w_pad // 2, w_pad - w_pad // 2), (0, 0)))
# rescale
rot_img = cv2.resize(rot_img, image.shape[:-1][::-1], interpolation=cv2.INTER_LINEAR)
if preserve_origin_shape:
# rescale
rot_img = cv2.resize(rot_img, image.shape[:-1][::-1], interpolation=cv2.INTER_LINEAR)

return rot_img
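With `expand=True`, `rotate_image` pads the input so the rotated corners are not clipped. The PR wraps each pad in `max(0, ...)`, so an expanded shape that is even marginally smaller than the image no longer yields a negative pad width (which `np.pad` rejects), and each pad is split between the two sides so the page stays centered. A numpy-only sketch of just that step, taking the expanded shape as an input; `pad_for_expand` is a hypothetical name.

```python
from math import ceil
from typing import Tuple

import numpy as np


def pad_for_expand(image: np.ndarray, exp_shape: Tuple[float, float]) -> np.ndarray:
    """Zero-pad an (H, W, C) image up to `exp_shape`, keeping it centered."""
    # Round up so the rotated content always fits, and never pad by a negative amount
    h_pad = int(max(0, ceil(exp_shape[0] - image.shape[0])))
    w_pad = int(max(0, ceil(exp_shape[1] - image.shape[1])))
    # Split each pad in two; an odd extra pixel goes to the bottom/right side
    return np.pad(image, ((h_pad // 2, h_pad - h_pad // 2),
                          (w_pad // 2, w_pad - w_pad // 2),
                          (0, 0)))


img = np.ones((100, 200, 3), dtype=np.uint8)
print(pad_for_expand(img, (186.6, 223.2)).shape)  # (187, 224, 3)
print(pad_for_expand(img, (98.2, 200.0)).shape)   # (100, 200, 3): no negative padding
```

Keeping the page centered is what lets `remap_boxes` later remove the padding with a simple half-difference offset.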
2 changes: 1 addition & 1 deletion doctr/utils/visualization.py
@@ -100,7 +100,7 @@ def polygon_patch(
# Switch to absolute coords
x, w = x * width, w * width
y, h = y * height, h * height
points = cv2.boxPoints(((x, y), (w, h), a))
points = cv2.boxPoints(((x, y), (w, h), -a))

return patches.Polygon(
points,
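`polygon_patch` now passes `-a` to `cv2.boxPoints`. Assuming doctr's `alpha` is counterclockwise on screen (as in `rotate_image`, whose docstring says counterclockwise) and that OpenCV's rotated-rectangle angle tilts the width axis toward +y (clockwise on screen, since the image y axis points down), the sign flip reconciles the two conventions. The sketch computes the corners directly with numpy under that assumption; `rotated_rect_corners` is a hypothetical name, and its corner order need not match `cv2.boxPoints`.

```python
import numpy as np


def rotated_rect_corners(x: float, y: float, w: float, h: float, angle: float) -> np.ndarray:
    """Corners of a w x h box centered at (x, y) in image coordinates (y axis down),
    rotated counterclockwise as seen on screen by `angle` degrees."""
    rad = np.deg2rad(angle)
    # Counterclockwise on screen tilts the width axis upward, i.e. toward -y
    u = np.array([np.cos(rad), -np.sin(rad)]) * w / 2  # half-width vector
    v = np.array([np.sin(rad), np.cos(rad)]) * h / 2   # half-height vector, perpendicular to u
    c = np.array([x, y])
    return np.stack([c - u + v, c - u - v, c + u - v, c + u + v])


# A 4 x 2 box rotated by 90 degrees becomes 2 px wide and 4 px tall
print(rotated_rect_corners(0.0, 0.0, 4.0, 2.0, 90.0).round(6))
```

Under the stated assumption about OpenCV, these corners coincide (as a set) with `cv2.boxPoints(((x, y), (w, h), -angle))`, which is the call the diff switches to.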
7 changes: 7 additions & 0 deletions tests/common/test_models.py
@@ -7,6 +7,7 @@

from doctr.io import DocumentFile, reader
from doctr.models._utils import estimate_orientation, extract_crops, extract_rcrops, get_bitmap_angle
from doctr.utils import geometry


def test_extract_crops(mock_pdf): # noqa: F811
@@ -91,5 +92,11 @@ def test_get_bitmap_angle(mock_bitmap):


def test_estimate_orientation(mock_image):
assert estimate_orientation(mock_image * 0) == 0

angle = estimate_orientation(mock_image)
assert abs(angle - 30.) < 1.

rotated = geometry.rotate_image(mock_image, -angle)
angle_rotated = estimate_orientation(rotated)
assert abs(angle_rotated) < 1.