Releases: pymupdf/PyMuPDF
First version to support MuPDF v1.19.*
Introduces major new features like PDF journalling and OCR support by directly invoking Tesseract-OCR.
In addition, it is possible to detect whether objects are covered (hidden) by other objects.
As part of the new version, the following issues have been resolved:
#1313, #1311, #1290, #1286, #1287, #1284.
Hotfix
Implement various fixes
Performance improvement for drawings extraction
Improved test scripts: `show_pdf_page` and `insert_image` are now tested with rotated insertions.
Layout Preserving Text Extraction
Support of Small Capitals, assigning subset font name tags
Apart from some minor fixes, this release introduces support for small caps in `TextWriter`-based text output.
In addition, method `Document.subset_fonts()` now prefixes subsetted font names with the 6 upper case letter prefix as prescribed by the PDF standard.
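The naming rule can be illustrated in plain Python. This is a minimal sketch of the tag format only (six uppercase letters, a plus sign, then the font name). The helper names `has_subset_tag` and `add_subset_tag` are hypothetical, and picking the letters at random is an assumption, since these notes do not say how `Document.subset_fonts()` chooses them.

```python
import random
import re
import string
from typing import Optional

# A subset tag is six uppercase ASCII letters followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def has_subset_tag(font_name: str) -> bool:
    """Return True if the font name already starts with a subset tag."""
    return SUBSET_TAG.match(font_name) is not None


def add_subset_tag(font_name: str, tag: Optional[str] = None) -> str:
    """Prefix font_name with a subset tag (hypothetical helper).

    If no tag is given, six random uppercase letters are used; this
    choice is an assumption made for illustration only.
    """
    if has_subset_tag(font_name):
        return font_name  # already tagged, leave unchanged
    if tag is None:
        tag = "".join(random.choices(string.ascii_uppercase, k=6))
    if not re.fullmatch(r"[A-Z]{6}", tag):
        raise ValueError("tag must be exactly six uppercase letters")
    return f"{tag}+{font_name}"
```

A check such as `has_subset_tag(name)` is a simple way to tell whether a font name found in a PDF refers to a subsetted font.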
Fixes and minor improvements
The following have been fixed:
- #1043
- #1053
- undocumented occasional error calculating enveloping rectangle for paths in `Page.get_drawings()`
- undocumented occasional loop in `TextWriter.fill_textbox()`
- added method `Font.char_lengths()`, which returns a tuple of all character widths for a given string; an improved version of `Font.text_length()`
- greatly improved performance of `Font.text_length()`
- added various ways to delete multiple PDF pages, among them slices and the Python `del` statement
- changed method `Document.del_toc_item()`: the item's title text will no longer be removed; instead, the item is shown grayed-out to indicate its deletion
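How a `del` statement with a slice can translate into a set of page numbers is sketched below in plain Python. This is not PyMuPDF's implementation: `PageList` and `pages_to_delete` are hypothetical names, and the class merely stands in for a document so the mechanism can be run without any PDF file.

```python
def pages_to_delete(page_count, key):
    """Resolve an int or a slice into a list of 0-based page numbers."""
    if isinstance(key, slice):
        # range() applies start/stop/step exactly as list slicing does.
        return list(range(page_count)[key])
    if isinstance(key, int):
        if key < 0:
            key += page_count  # negative numbers count from the end
        if not 0 <= key < page_count:
            raise IndexError("page number out of range")
        return [key]
    raise TypeError("page key must be an int or a slice")


class PageList:
    """Hypothetical stand-in for a document holding numbered pages."""

    def __init__(self, page_count):
        self.pages = list(range(page_count))

    def __delitem__(self, key):
        # Delete from the back so earlier deletions do not shift
        # the positions of pages still waiting to be removed.
        numbers = pages_to_delete(len(self.pages), key)
        for pno in sorted(numbers, reverse=True):
            del self.pages[pno]


doc = PageList(10)
del doc[2:5]   # removes the pages at positions 2, 3 and 4
del doc[-1]    # removes the last remaining page
```

Deleting in descending order is the key design choice here: removing a page renumbers everything after it, so working from the end keeps the remaining targets valid.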
Rewritten method `Page.insert_image`
Method `Page.insert_image` has been rewritten for improved performance in standard cases. It also introduces an option to re-use images that already exist in the file, which provides another performance boost.
Other changes:
New Image Transformation Matrix Available
Meta information for images embedded in document pages has been enriched by the so-called transformation matrix. It can be used to find out what "happened" to the image rectangle to make it fit in its bbox on the page, such as scaling and rotation.
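As an illustration of what can be read out of such a matrix, here is a minimal sketch in plain Python. It assumes the matrix is available as six numbers `(a, b, c, d, e, f)` in the usual PDF convention and that it combines only scaling, rotation and translation (no shear or mirroring). The function name `describe_transform` is hypothetical and not part of PyMuPDF.

```python
import math


def describe_transform(matrix):
    """Split a 6-number matrix into scale, rotation and translation.

    Assumes x' = a*x + c*y + e and y' = b*x + d*y + f, and that the
    matrix contains no shear or mirroring.
    """
    a, b, c, d, e, f = matrix
    scale_x = math.hypot(a, b)   # length of the transformed x unit vector
    scale_y = math.hypot(c, d)   # length of the transformed y unit vector
    rotation_deg = math.degrees(math.atan2(b, a))
    return {
        "scale_x": scale_x,
        "scale_y": scale_y,
        "rotation_deg": rotation_deg,
        "translation": (e, f),
    }


# Example: a unit square stretched to 200 x 100 and turned a quarter circle.
info = describe_transform((0, 200, -100, 0, 50, 60))
```

Whether a positive angle reads as clockwise or counter-clockwise depends on the orientation of the page coordinate system, so the sign should be checked against a known image before relying on it.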
Other changes are mostly minor bug fixes:
#990
#972
A new `Page` method, `get_image_info()`, is also available. It extracts image meta information from the page's `TextPage`, much like the corresponding `Page.get_text("dict")`, but without extracting any text or the image binary data themselves.