rotation angle for non native pages are coming wrong #4308

trinanjan12 · 2025-02-18T11:39:54Z

trinanjan12
Feb 18, 2025

Description of the bug

I have a use case where I upload a PDF to a pipeline. If it is a native PDF, the rotation angle is detected correctly by PyMuPDF. But for non-native PDFs containing a rotated image, page.rotation is returned as 0. Is this expected behaviour? If page.rotation were reported correctly, I could probably call set_rotation with 0, but for a rotated non-native PDF I only ever get page.rotation as 0.

Accident Fund.pdf

cc @JorjMcKie

How to reproduce the bug

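import fitz  # PyMuPDF
doc = fitz.open("Accident Fund.pdf")  # assumed: the attached file, saved locally
print(doc[0].rotation)  # reported as 0 for this file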

PyMuPDF version

1.25.1

Operating system

Linux

Python version

3.11

Answered by JorjMcKie


JorjMcKie · 2025-02-18T11:49:23Z

JorjMcKie
Feb 18, 2025
Maintainer

This is not an issue but a typical "Discussions" item ... transferring.

0 replies

JorjMcKie · 2025-02-18T12:09:33Z

JorjMcKie
Feb 18, 2025
Maintainer

PDFs are not differentiated into "native" and "non-native".
Your example is simply a normal PDF showing a full-page image, obviously created by a scanner. The person operating the scanner didn't bother about how the original was placed on the scanner's glass and / or about telling the scanner how to interpret the page orientation.
That's what you have.

You could look at what the page knows about the image(s) it displays. Depending on the information delivered to it, the page might be aware of some transformation that has taken place to create its display, like so:

page.get_images()
[(4, 0, 1704, 2200, 1, 'DeviceGray', '', 'Im1', 'CCITTFaxDecode')]

page.get_image_info()
[{'number': 0, 'bbox': (0.0, -6.103515625e-05, 613.4400024414062, 792.0), 'transform': (613.4400024414062, 0.0, -0.0, 792.0000610351562, 0.0, -6.103515625e-05), 'width': 1704, 'height': 2200, 'colorspace': 1, 'cs-name': 'DeviceGray', 'xres': 96, 'yres': 96, 'bpc': 1, 'size': 35461}]

Here, the transformation matrix 'transform': (613.4400024414062, 0.0, -0.0, 792.0000610351562, 0.0, -6.103515625e-05) only contains the two scaling factors in its subitems 0 and 3. Any rotation would have shown up in positions 1 and 2, but both are 0.

So the page is not aware of any rotation for that image.
Now you are dependent on some OCR software which might or might not be clever enough to detect the text's orientation in the image.
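The matrix check described above can be sketched in a few lines. This is only an illustration: `image_rotation_degrees` is a hypothetical helper, not part of PyMuPDF, and it assumes the 6-item tuple layout shown under the 'transform' key. The second matrix is made up to show what a quarter turn would look like.

```python
import math


def image_rotation_degrees(transform):
    """Return the rotation angle (in degrees) encoded in a 6-item matrix.

    `transform` is a tuple (a, b, c, d, e, f) like the one reported under
    the 'transform' key of page.get_image_info(). Hypothetical helper.
    """
    a, b, c, d, e, f = transform
    # A pure scaling matrix has b == c == 0, which yields an angle of 0.
    # The sign of a non-zero angle depends on the coordinate convention.
    return math.degrees(math.atan2(b, a))


# Matrix quoted in the answer above: scaling only, no rotation.
scanned = (613.4400024414062, 0.0, -0.0, 792.0000610351562, 0.0, -6.103515625e-05)
print(image_rotation_degrees(scanned))

# Made-up matrix for an image placed with a quarter turn (illustration only).
quarter_turn = (0.0, 792.0, -613.44, 0.0, 613.44, 0.0)
print(image_rotation_degrees(quarter_turn))
```

A result of 0 only means the page applies no rotation when placing the image. It says nothing about how the content inside the scanned image is oriented.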

0 replies

JorjMcKie · 2025-02-18T12:13:45Z

JorjMcKie
Feb 18, 2025
Maintainer

On your side remark:
No, you cannot simply set the page rotation to 0 and expect everything else to subsequently work as normal.
You must use page.remove_rotation() instead.

As I wrote before: this example already has rotation 0. The problem has been caused by the person operating the scanner. Admittedly, not all scanners support advanced options though.

1 reply

trinanjan12 Feb 18, 2025
Author

Yeah @JorjMcKie, you are right about the PDFs being scanned. My use case requires me to process scanned PDFs as well, so I will probably look for an OCR-based solution.
