
Releases: pymupdf/PyMuPDF

First version to support MuPDF v1.19.*

17 Oct 10:46

Introduces major new features such as PDF journalling and OCR support that invokes Tesseract-OCR directly.
In addition, it is now possible to detect whether objects are covered (hidden) by other objects.

As part of the new version, the following issues have been resolved:
#1313, #1311, #1290, #1286, #1287, #1284.

Hotfix

16 Sep 22:04

Fixes #1266

Implement various fixes

16 Sep 16:07

Performance improvement for drawings extraction

24 Aug 10:30
Improved test scripts

`show_pdf_page` and `insert_image` are now tested with rotated insertions.

Layout Preserving Text Extraction

08 Aug 06:31

The fitz module now supports text extraction from the command line via a new subcommand, "gettext". It offers several extraction modes, one of which preserves the original page layout.

Also fixed #1187, #1184, #1154, #1152 and #1146.
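
As a rough illustration of the "gettext" subcommand, the sketch below builds a command line for it. The helper name `gettext_command` is hypothetical, and the flag names (`-mode`, `-output`) and mode values (`simple`, `blocks`, `layout`) are taken from memory of the PyMuPDF command-line documentation, so treat them as assumptions and confirm with `python -m fitz gettext -h`.

```python
import sys

# Mode names as recalled from the PyMuPDF CLI help (assumption, please verify).
KNOWN_MODES = ("simple", "blocks", "layout")


def gettext_command(pdf_path, mode="layout", output=None):
    """Build the argument list for `python -m fitz gettext` (hypothetical helper)."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [sys.executable, "-m", "fitz", "gettext", "-mode", mode]
    if output is not None:
        # Optional target file for the extracted text.
        cmd += ["-output", output]
    cmd.append(pdf_path)
    return cmd


cmd = gettext_command("input.pdf")
print(" ".join(cmd[1:]))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where PyMuPDF is installed and `input.pdf` exists.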

Support of Small Capitals, assigning subset font name tags

10 Jul 22:47

Apart from some minor fixes, this release introduces support for small caps in TextWriter-based text output.

In addition, method Document.subset_fonts() now prefixes the names of subsetted fonts with the tag prescribed by the PDF standard: six uppercase letters followed by a plus sign.
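
To make the subset tag concrete: the PDF standard prescribes exactly six uppercase letters followed by "+" in front of the font name. The following minimal sketch recognizes such a prefix in a font name string. The helper `split_subset_tag` and the sample font names are hypothetical and not part of PyMuPDF.

```python
import re

# Six uppercase ASCII letters, a plus sign, then the base font name.
_SUBSET_NAME = re.compile(r"([A-Z]{6})\+(.+)")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None if the name has no subset prefix."""
    match = _SUBSET_NAME.fullmatch(font_name)
    if match is None:
        return None, font_name
    return match.group(1), match.group(2)


print(split_subset_tag("ABCDEF+NotoSans"))
print(split_subset_tag("Helvetica"))
```

A check like this can help tell subsetted fonts from fully embedded ones when inspecting the font names of a processed file.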

List of fixed issues:
#1088, #1081, #1078, #1085.

Fixes and minor improvements

02 Jun 11:01

The following have been fixed or changed:

  • #1043
  • #1053
  • undocumented occasional error calculating the enveloping rectangle for paths in Page.get_drawings()
  • undocumented occasional loop in TextWriter.fill_textbox()
  • added method Font.char_lengths(), which returns a tuple of all character widths for a given string; it is an improved version of Font.text_length()
  • greatly improved performance of Font.text_length()
  • added various ways to delete multiple PDF pages, among them are slices and the Python del statement
  • changed method Document.del_toc_item(): the item's title text will no longer be removed - instead the item is shown grayed-out to indicate its deletion.
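
For the slice-based page deletion mentioned in the list above, it can help to see which page numbers a given slice selects. The sketch below assumes that in-range slices with a positive step follow ordinary Python sequence semantics; the release note does not spell this out, and the helper `pages_selected` is hypothetical. It does not call PyMuPDF at all.

```python
def pages_selected(page_count, selector):
    """List the 0-based page numbers an index or slice would address.

    Assumes plain Python sequence semantics (an assumption about the
    library's behavior, not something stated in the release note).
    """
    numbers = range(page_count)
    if isinstance(selector, slice):
        return list(numbers[selector])
    return [numbers[selector]]


# Every second page from page 1 up to (not including) page 10 of a 12-page file.
print(pages_selected(12, slice(1, 10, 2)))
# The last page.
print(pages_selected(12, -1))
```

Under that assumption, a statement of the form `del doc[1:10:2]` on a 12-page document would address the pages printed by the first call.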

Rewritten method `Page.insert_image`

05 May 12:43

Method Page.insert_image has been rewritten for improved performance in standard cases. It also introduces an option to re-use images that already exist in the file, which provides a further performance boost.
Other changes:

  • implemented or fixed #1042, #1041, #1037
  • minor improvements in PDF EmbeddedFiles handling for better support of building PDF collections apps.

New Image Transformation Matrix Available

10 Apr 12:24

Meta information for images embedded in document pages has been enriched by the so-called transformation matrix. It can be used to find out what "happened" to the image rectangle to make it fit its bbox on the page, such as scaling and rotation.

Other changes are mostly minor bug fixes:
#990
#972

A new Page method get_image_info() is also available, which extracts image meta information from the page's TextPage - much like the corresponding Page.get_text("dict"), but without extracting any text or the image binary data themselves.
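
As an illustration of how scaling and rotation can be read off a transformation matrix, here is a minimal sketch in plain Python. It assumes the matrix is given as six numbers (a, b, c, d, e, f) in the usual PDF order and contains no shear or mirroring; the function name `describe_transform` is hypothetical, and the sign of the angle depends on whether the y axis points up or down.

```python
import math


def describe_transform(a, b, c, d, e, f):
    """Decompose an affine matrix into scale, rotation and translation.

    Assumes a pure scale-rotate-translate matrix (no shear, no mirroring).
    """
    scale_x = math.hypot(a, b)  # length of the transformed x unit vector
    scale_y = math.hypot(c, d)  # length of the transformed y unit vector
    rotation = math.degrees(math.atan2(b, a))  # angle of the x unit vector
    return {
        "scale_x": scale_x,
        "scale_y": scale_y,
        "rotation_deg": rotation,
        "translation": (e, f),
    }


# A matrix that scales by 2 and 3, rotates by a quarter turn, then shifts.
info = describe_transform(0, 2, -3, 0, 10, 20)
print(info)
```

For a matrix with shear or mirroring, these values are only approximate descriptions, and a full decomposition would be needed.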

Minor bug fixes, improved Quad recovering for text extractions

22 Mar 12:29

Fixed: #941 #929 #927

  • included PDF trailer access in Document.xref_get_key()
  • added a number of functions for recovering text quads in "dict" / "rawdict" text extractions