Issue with coordinates of text and table when cropbox is smaller than mediabox and rotated #4394

Open
@AllisonHu64

Description of the bug

Based on my research, the MediaBox defines the size of the PDF page, and the CropBox defines the rectangle of the page that PDF viewers display. A Pixmap renders the intersection of the CropBox and the clip. From my understanding and tests, the coordinates of text returned by functions such as Page.get_text() are relative to the CropBox and unrotated. (PyMuPDF should document these coordinate conventions better.) I recently discovered that calling Page.find_tables() on a rotated page whose CropBox is smaller than its MediaBox changes the CropBox value, which in turn changes the coordinates returned by subsequent Page.get_text() calls.

Can someone please take a look? Thank you so much!

How to reproduce the bug

Please download the attached test PDF.
Install the following PyMuPDF version in a clean virtual environment:
python -m venv venv
source venv/bin/activate
pip install pymupdf=="1.25.3"

Run the following code:
```python
import fitz

# Read the PDF file
pdf_document = fitz.open("test.pdf")
page = pdf_document.load_page(0)
print("Before: ", page.cropbox, page.cropbox_position, page.rotation_matrix)
print("Before: ", page.search_for("第七章"))

print("\n")
# Find the tables
tables = page.find_tables()
print("After: ", page.cropbox, page.cropbox_position, page.rotation_matrix)
print("After: ", page.search_for("第七章"))
```
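The same before/after comparison can be wrapped in a small helper so it is easy to rerun against other PyMuPDF versions. This is only a sketch: `snapshot_geometry` and `geometry_changes` are hypothetical names, not part of PyMuPDF, and the code assumes only that the page object exposes the `cropbox`, `cropbox_position` and `rotation_matrix` attributes used in the script above, each convertible with `tuple()`.

```python
from typing import Any, Callable, Dict, Tuple

# Page attributes that this report observes changing after find_tables().
GEOMETRY_ATTRS = ("cropbox", "cropbox_position", "rotation_matrix")


def snapshot_geometry(page: Any) -> Dict[str, Tuple[float, ...]]:
    """Copy the geometry attributes into plain tuples.

    Plain tuples cannot be affected by later changes to the page object.
    """
    return {name: tuple(getattr(page, name)) for name in GEOMETRY_ATTRS}


def geometry_changes(
    page: Any, action: Callable[[Any], Any]
) -> Dict[str, Tuple[tuple, tuple]]:
    """Run action(page) and return {attribute: (before, after)} for each change."""
    before = snapshot_geometry(page)
    action(page)
    after = snapshot_geometry(page)
    return {
        name: (before[name], after[name])
        for name in GEOMETRY_ATTRS
        if before[name] != after[name]
    }
```

With the attached file, `geometry_changes(page, lambda p: p.find_tables())` should report all three attributes on 1.25.3, going by the output in this report. An empty dict would mean the call left the page geometry untouched.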

The result will look something like this. The page's CropBox and its related properties change after running page.find_tables():
```
Before: Rect(36.0, 36.0, 559.0, 805.9000244140625) Point(36.0, 36.0) Matrix(0.0, 1.0, -1.0, 0.0, 769.9000244140625, 0.0)
Before: [Rect(194.8800048828125, 38.02104568481445, 237.0097198486328, 52.66114807128906)]

After: Rect(0.0, 0.0, 595.2999877929688, 841.9000244140625) Point(0.0, 0.0) Matrix(0.0, 1.0, -1.0, 0.0, 841.9000244140625, 0.0)
After: [Rect(230.8800048828125, 74.02104187011719, 273.00970458984375, 88.66114807128906)]
```
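One way to read these numbers: the "After" search hit looks like the "Before" hit translated by the original CropBox origin (36, 36), and the translation term of each rotation matrix matches the height of the box then in effect. The sketch below checks that arithmetic in plain Python using only the values printed above. The helper `shift` is a hypothetical name, and this is an interpretation of the output, not a statement about PyMuPDF internals.

```python
# Values copied from the output in this report.
cropbox_before = (36.0, 36.0, 559.0, 805.9000244140625)
cropbox_origin_before = (36.0, 36.0)
matrix_f_before = 769.9000244140625  # 5th entry of the "Before" rotation matrix
hit_before = (194.8800048828125, 38.02104568481445,
              237.0097198486328, 52.66114807128906)
hit_after = (230.8800048828125, 74.02104187011719,
             273.00970458984375, 88.66114807128906)


def shift(rect, dx, dy):
    """Translate an (x0, y0, x1, y1) rectangle by (dx, dy)."""
    x0, y0, x1, y1 = rect
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


# Hypothesis: after find_tables(), coordinates are reported relative to the
# MediaBox origin instead of the CropBox origin.
expected_after = shift(hit_before, *cropbox_origin_before)
max_error = max(abs(a - b) for a, b in zip(expected_after, hit_after))
print(f"max deviation from a pure (36, 36) shift: {max_error:.6f}")

# The matrix translation term equals the CropBox height before the call.
cropbox_height_before = cropbox_before[3] - cropbox_before[1]
print("matrix term matches CropBox height:",
      abs(cropbox_height_before - matrix_f_before) < 1e-6)
```

If the deviation is small, as it appears to be here, the text itself has not moved. Only the reference origin changed from the CropBox to the MediaBox, which is consistent with the CropBox being reset to the full page.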

test.pdf

PyMuPDF version

1.25.3

Operating system

MacOS

Python version

3.12
