Releases · pymupdf/PyMuPDF

17 Oct 10:46

1.19.0

884f381

Introduces major new features such as PDF journalling and OCR support, the latter by directly invoking Tesseract-OCR.
In addition, it is now possible to detect whether objects are covered (hidden) by other objects.

As part of the new version, the following issues have been resolved:
#1313, #1311, #1290, #1286, #1287, #1284.

16 Sep 22:04

JorjMcKie

1.18.19

d206ef7

Hotfix

Fixes #1266

16 Sep 16:07

JorjMcKie

1.18.18

b32abc7

Implement various fixes

This version fixes #1257, #1252, #1244, #1241, #1234, #1236, #1227.

24 Aug 10:30

JorjMcKie

1.18.17

0c839a2

Performance improvement for drawings extraction

Improved test scripts

`show_pdf_page` and `insert_image` are now tested with rotated insertions.

08 Aug 06:31

JorjMcKie

1.18.16

be074d5

Layout Preserving Text Extraction

The fitz module now supports text extraction from the command line via a new subcommand, "gettext". Among its output modes is one that preserves the original page layout.

Also fixed #1187, #1184, #1154, #1152 and #1146.

10 Jul 22:47

JorjMcKie

1.18.15

d7f55b3

Support of Small Capitals, assigning subset font name tags

Apart from some minor fixes, this release introduces support for small caps in TextWriter-based text output.

In addition, method Document.subset_fonts() now prefixes the names of subsetted fonts with the tag of six uppercase letters prescribed by the PDF standard.
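To illustrate the naming convention, the PDF standard tags a subset font by putting six uppercase letters and a plus sign in front of the font name. The sketch below shows that format in plain Python. The helper names `add_subset_tag` and `is_subset_name` are hypothetical, and the random choice of letters is an assumption; it is not a description of how Document.subset_fonts() picks its tags.

```python
import random
import re
import string

# Six uppercase letters, a plus sign, then the original font name.
SUBSET_NAME = re.compile(r"^[A-Z]{6}\+.+$")


def add_subset_tag(fontname, rng=random):
    """Return fontname prefixed with a six-letter subset tag (hypothetical helper)."""
    tag = "".join(rng.choice(string.ascii_uppercase) for _ in range(6))
    return f"{tag}+{fontname}"


def is_subset_name(fontname):
    """Check whether a font name carries a subset tag in the standard format."""
    return SUBSET_NAME.match(fontname) is not None


tagged = add_subset_tag("Helvetica")
```

A name produced this way always ends in `+Helvetica`, while the untagged name `Helvetica` does not pass `is_subset_name`.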

List of fixed issues:
#1088, #1081, #1078, #1085.

02 Jun 11:01

JorjMcKie

1.18.14

c9a17d7

Fixes and minor improvements

The following have been fixed or changed:

#1043
#1053
undocumented occasional error calculating the enveloping rectangle for paths in Page.get_drawings()
undocumented occasional loop in TextWriter.fill_textbox()
added method Font.char_lengths(), which returns a tuple of all character widths for a given string (an improved version of Font.text_length())
greatly improved performance of Font.text_length()
added various ways to delete multiple PDF pages, among them slices and the Python del statement
changed method Document.del_toc_item(): the item's title text will no longer be removed - instead the item is shown grayed-out to indicate its deletion.

05 May 12:43

JorjMcKie

1.18.13

ce06352

Rewritten method `Page.insert_image`

Method Page.insert_image has been rewritten for improved performance in standard cases. It also introduces an option to reuse images that already exist in the file, which provides a further performance boost.
Other changes:

implemented or fixed #1042, #1041, #1037
minor improvements in PDF EmbeddedFiles handling, for better support of apps that build PDF collections.

10 Apr 12:24

JorjMcKie

1.18.11

8537b18

New Image Transformation Matrix Available

Meta information for images embedded in document pages has been enriched with the so-called transformation matrix. It can be used to find out what "happened" to the image rectangle to make it fit into its bbox on the page, such as scaling and rotation.
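
As a rough illustration of reading scaling and rotation out of such a matrix, the sketch below decomposes the six values (a, b, c, d, e, f) of a 2D affine matrix in plain Python. The function name `describe_transform` and the sample values are hypothetical. It assumes the matrix contains no shear, and the sign of the reported angle depends on the coordinate system in use.

```python
import math


def describe_transform(a, b, c, d, e, f):
    """Split an affine matrix into scale factors, rotation angle and offset.

    Assumes a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f)
    and that the matrix contains no shear.
    """
    scale_x = math.hypot(a, b)                 # length of the transformed x unit vector
    scale_y = math.hypot(c, d)                 # length of the transformed y unit vector
    rotation = math.degrees(math.atan2(b, a))  # angle of the transformed x unit vector
    return scale_x, scale_y, rotation, (e, f)


# Hypothetical matrix: a unit square scaled to 200 x 100, rotated by 90 degrees
# and moved to offset (50, 60).
scale_x, scale_y, rotation, offset = describe_transform(0, 200, -100, 0, 50, 60)
```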

Other changes are mostly minor bug fixes:
#990
#972

A new Page method get_image_info() is also available, which extracts image meta information from the page's TextPage - much like the corresponding Page.get_text("dict"), but without extracting any text or the image binary data themselves.

22 Mar 12:29

JorjMcKie

1.18.10

f9740b5

Minor bug fixes, improved Quad recovering for text extractions

Fixed: #941 #929 #927

included PDF trailer access in Document.xref_get_key()
added a number of functions for recovering text quads in "dict" / "rawdict" text extractions

Release 1.19.0 is the first version to support MuPDF v1.19.*.
