Adjustments for upstream upgrades #253
A fix has already been identified and will lead to a new PyMuPDF version, hopefully even today.
Thanks Train! (I hope you don't mind me calling you "Train"; I've seen you use it a few times, so please let me know.) It is possible that we will have fixed some of the issues you are seeing in an imminent release of PyMuPDF. Adding Harald & Julian here, as they should be aware of the behaviour change for
@jamie-lemon "Train" sounds friendly to me, as it's a name I went by in college. 😁 Regarding this PR, once PyMuPDF 1.23.16 is released, I just need to remove the version restriction in requirements.txt; the rest should be OK.
Just didn't understand one of the comments.
pdf2docx/common/Element.py
factor (float, optional): Threshold of overlap ratio, the larger it is, the higher
probability the two bbox-es are aligned.
text_direction (bool, optional): Consider text direction or not.
True by default, from left to right if False.
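To make the two readings of text_direction concrete, here is a minimal sketch of an overlap-ratio check between two boxes. It is hypothetical: Box, same_line and their fields are illustrative names, not pdf2docx's Element API. It assumes the meaning the author gives later in this thread: True uses the element's own direction, while False ignores it and assumes horizontal, left-to-right text.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a layout element: a bbox plus a text-direction flag."""
    x0: float
    y0: float
    x1: float
    y1: float
    vertical_text: bool = False


def same_line(a: Box, b: Box, factor: float = 0.0, text_direction: bool = True) -> bool:
    """Hypothetical check: do a and b overlap enough to sit on the same text line?

    text_direction=True  -> use a's own direction (vertical or horizontal).
    text_direction=False -> ignore it and assume horizontal, left-to-right text.
    factor               -> required overlap as a fraction of the longer extent.
    """
    if text_direction and a.vertical_text:
        # Vertical text: lines run top to bottom, so compare x-extents.
        a0, a1, b0, b1 = a.x0, a.x1, b.x0, b.x1
    else:
        # Horizontal (left-to-right) text: compare y-extents.
        a0, a1, b0, b1 = a.y0, a.y1, b.y0, b.y1
    overlap = min(a1, b1) - max(a0, b0)
    if overlap <= 0:
        return False
    return overlap >= factor * max(a1 - a0, b1 - b0)


# Two boxes whose y-extents overlap by half their height.
left = Box(0, 0, 10, 10)
right = Box(20, 5, 30, 15)
print(same_line(left, right, factor=0.5))  # True
print(same_line(left, right, factor=0.6))  # False

# A vertical-text box: the flag decides which axis is compared.
col_a = Box(0, 0, 10, 10, vertical_text=True)
col_b = Box(5, 20, 15, 30)
print(same_line(col_a, col_b))                        # True  (x-extents overlap)
print(same_line(col_a, col_b, text_direction=False))  # False (y-extents do not)
```

With the default text_direction=True, a vertical-text box is compared along the x-axis; passing False falls back to the left-to-right assumption and compares y-extents instead, which is exactly the distinction the docstring wording struggles to express.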
I don't really understand what "True by default, from left to right if False" means.
So if text_direction is True, then we do consider text direction, okay; but if it is False, then we don't consider text directions. However, when I read this it seems like False means we consider a left-to-right text direction, as it says "from left to right if False". I'm confused!
I don't see any contradiction in your reading: don't consider text directions -> ignore the actual text directions -> use the default text direction -> the most common case, horizontal, i.e., from left to right.
I'd appreciate it if you could help with more precise wording.
As I understand it, not considering text directions means using the default text direction, which is from left to right.
I suppose the bit I don't understand, then, is: if True, what is the text direction? Basically:
False = from left to right
True = ?
Also, if we are False, then we do consider the text direction, don't we (left to right)? Which is why "text_direction (bool, optional): Consider text direction or not." confuses me!
Okay, yes, please just delete it to avoid the confusion.
pdf2docx/common/Element.py
factor (float, optional): threshold of overlap ratio, the larger it is, the higher
probability the two bbox-es are aligned.
text_direction (bool, optional): consider text direction or not.
True by default, from left to right if False.
See previous comment.
FYI, PyMuPDF 1.23.16 has just been released and fixes the Pixmap issue pymupdf/PyMuPDF#3058. The fitz.Rect problem was introduced in 1.23.9 and fixed in 1.23.13 and later. Hope that helps.
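The version facts above can be encoded as a small check, for instance before deciding whether the requirements.txt restriction is still needed. This is a sketch: parse_version and upstream_status are hypothetical helpers, only the version numbers come from this thread, and version strings are assumed to start with plain X.Y.Z numbers.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.23.16' into (1, 23, 16), keeping the first three numbers only."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def upstream_status(version_text):
    """Hypothetical helper: report the thread's known PyMuPDF issues for a version."""
    v = parse_version(version_text)
    return {
        # fitz.Rect problem: introduced in 1.23.9, fixed in 1.23.13 and later.
        "rect_problem": (1, 23, 9) <= v < (1, 23, 13),
        # Pixmap/CMYK issue (pymupdf/PyMuPDF#3058): fix released in 1.23.16.
        "pixmap_fix_included": v >= (1, 23, 16),
    }


if __name__ == "__main__":
    try:
        installed = metadata.version("PyMuPDF")
    except metadata.PackageNotFoundError:
        installed = None
    print(installed, upstream_status(installed) if installed else "not installed")
```

Comparing integer tuples avoids the string-comparison trap where "1.23.9" sorts after "1.23.13".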
Some adjustments to handle the side effects of upgrading PyMuPDF from 1.23.8 to 1.23.14+, e.g.,
- fitz.Rect() -> adjusted from the pdf2docx side. All text lost: "Ignore Line xx due to overlap." #250
- fitz.Pixmap() -> downgrade pymupdf for now. Pixmap created from CMYK JPEG delivers RGB format pymupdf/PyMuPDF#3058

And some other tweaks, e.g., removing the GitHub Action for publishing docs to GitHub Pages.