mupdf 1.19.0 builds #1313
Comments
👍😎 |
Sounds great, thanks! |
I'm interested in testing as well; running |
@caerulescens - Python version? |
I can test this as well with respect to #1311. Python 3.8 Linux as well. Thanks! |
@JorjMcKie Python 3.8; thank you. I'm set up to build any version from source. |
Guys, I am trying to compile on Linux.
Python 3.8.12 (default, Sep 10 2021, 00:16:05)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.27.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import fitz
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-5ebe19763ce1> in <module>
----> 1 import fitz
/usr/local/lib/python3.8/dist-packages/fitz/__init__.py in <module>
8 # ------------------------------------------------------------------------
9 import sys
---> 10 from fitz.fitz import *
11
12 # define the supported colorspaces for convenience
/usr/local/lib/python3.8/dist-packages/fitz/fitz.py in <module>
13 # Import the low-level C/C++ module
14 if __package__ or "." in __name__:
---> 15 from . import _fitz
16 else:
17 import _fitz
ImportError: /usr/local/lib/python3.8/dist-packages/fitz/_fitz.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
In [2]:
Does this ring any bells? |
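For readers puzzling over the symbol in the traceback above: it follows the Itanium C++ ABI name-mangling scheme, where `_Z` starts a mangled name, `TV` marks a virtual table, and `N ... E` wraps a nested name made of length-prefixed identifiers. The sketch below is only an illustration of that decoding for this one symbol shape; the function name `demangle_vtable` is hypothetical, and a real tool such as `c++filt` handles the full grammar.

```python
def demangle_vtable(symbol: str) -> str:
    """Decode symbols shaped like _ZTVN<len><name>...E (vtable of a nested name).

    Illustrative only: this covers a tiny subset of the Itanium C++ ABI
    mangling grammar, just enough for the symbol in the traceback.
    """
    prefix = "_ZTVN"
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        raise ValueError("not a '_ZTVN...E' symbol")
    body = symbol[len(prefix):-1]

    parts = []
    i = 0
    while i < len(body):
        # Read the decimal length prefix of the next identifier.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix")
        length = int(body[i:j])
        name = body[j:j + length]
        if len(name) != length:
            raise ValueError("identifier is shorter than its length prefix")
        parts.append(name)
        i = j + length

    return "vtable for " + "::".join(parts)


print(demangle_vtable("_ZTVN10__cxxabiv117__class_type_infoE"))
# prints: vtable for __cxxabiv1::__class_type_info
```

The decoded name belongs to the C++ runtime support library, which on GNU toolchains is typically provided by libstdc++. That is consistent with the suggestions in the following comments about which compiler or linker driver is used for the extension.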
I think this might be the solution: https://stackoverflow.com/questions/47890484/undefined-symbol-during-dlopen |
This seems to allow you to tell it which compiler to use: https://shwina.github.io/custom-compiler-linker-extensions/ Alternatively, setting the "CC" environment variable might also work: https://stackoverflow.com/questions/16737260/how-to-tell-distutils-to-use-gcc I am not sure it is as simple as just using g++ instead of gcc, but it's worth a shot I guess. |
Thanks @MerlijnWajer - it was simple enough: just inserted the argument |
For those who want to do preliminary Linux Python3.8 tests of pymupdf 1.19.0: There are a lot of changes to "geometry" objects Rect / IRect. Also a new feature "journalling" of PDF changes. I started documenting them, which is still incomplete. |
If you want to execute tests using |
The build works well for me, I did not have to change any code. I tested https://github.com/internetarchive/archive-pdf-tools which makes heavy use of PyMuPDF (I think), and the results for 1.18 and 1.19 are the same:
With the added bonus that the new MuPDF ships (finally) with proper JBIG2 support, which significantly increases compression ratios:
https://archive.org/~merlijn/pymupdf1.19/ Many thanks! |
Thank you very much for your effort and feedback! |
@JorjMcKie I'm just now getting around to testing some specific errors; the link to the wheels build is broken I think. How should I go about getting the release? |
Try this one, it's newer anyway: https://github.com/JorjMcKie/py-mupdf/actions/runs/1339456713 |
Thanks! Will do. |
All of the below are fixed with upgrading to v1.19.0:
In addition, an image redaction annotation glitch that had been noticed in the past has been fixed: for a document containing an image mask, redacting the image mask inverted its colors, swapping white and black. I reproduced the bug and verified the fix with a document that I cannot post here. See info and screenshots below,
|
@JorjMcKie When do you expect v1.19.0 to be released to PyPI? |
@caerulescens - That looks great! Thank you very much for that thorough analysis!
I am hoping to do this over this weekend. |
That's great; thank you! |
The new version 1.19.0 has just been uploaded to PyPI for Windows and Linux on desktop systems. Linux ARM and Mac OSX wheels are currently being generated and should be available too in the next few minutes. |
Since you've been testing: Do you have examples for the new OCR feature? (I'm the Fedora packager for mupdf and need to make sure mupdf has what PyMuPDF needs.) |
Hey, thanks for your work and quick updates ! |
@saetlan - can you let me have the doc example please? It looks more like the words are now shifted to the left by 213.8 ... |
@mjg - I will shortly publish a sub-version: it turns out that the OCR resolution for document pages, 72 dpi, is just too unsatisfactory. What already works really well with OCR is making an OCR-ed PDF page from images (pixmaps). The API, however, will remain unchanged and goes like this:
    # take a pixmap of some arbitrary image
    # may include a document page
    # then
    pix.pdfocr_save("ocr-ed.pdf", language="eng", compress=True)
The result is a 1-page PDF showing the image and having an OCR text layer. The quality is comparable to ocrmypdf and depends on the image itself. Version 1.19.1 will improve the following:
    page = doc.load_page(i)  # some page of a document to be OCR-ed
    tp = page.get_textpage_ocr(language="eng", dpi=72)  # adjust resolution as desired
    # now all text extraction and text search methods will work by reusing that textpage:
    rectangles = page.search_for("needle", textpage=tp)
    text = page.get_text("text", textpage=tp)
    # etc.
The time consumer is the textpage: this is where the OCR happens. A page full of text at 300 dpi may need 2 seconds to execute, which again compares well with other approaches / packages. OCR only works if an environment variable has been set that names the ... Setting this must happen outside / before Python scripts can run. |
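To put the 72 dpi versus 300 dpi remark into numbers: PDF page dimensions are measured in points of 1/72 inch, so rendering at 72 dpi gives one pixel per point, and higher resolutions scale both dimensions by dpi / 72. The sketch below is plain arithmetic with no PyMuPDF calls; the helper name `pixel_size` and the US Letter page size (612 x 792 points) are assumptions chosen for illustration.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def pixel_size(width_pt: float, height_pt: float, dpi: int) -> tuple:
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    # Multiply before dividing so exact cases stay exact in floating point.
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Assumed example page: US Letter, 8.5 x 11 inches = 612 x 792 points.
letter = (612, 792)
for dpi in (72, 150, 300):
    w, h = pixel_size(*letter, dpi)
    print(f"{dpi:>3} dpi: {w} x {h} pixels ({w * h / 1e6:.1f} megapixels)")
```

At 300 dpi the page has roughly 17 times as many pixels as at 72 dpi. That gives the OCR engine far more detail per glyph, and it also means more data to process for each page.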
Never mind - found the error. |
No particular rush, but it would be great to get (test) builds of MuPDF 1.19.0. I was going to ask for a build of the latest RC (to search for potential issues), but I see they just released 1.19.0.
I'm sure this is already in planning, but please let me know if you need any testers. I'd be happy to test.
Thanks!