diff --git a/PKG-INFO b/PKG-INFO index 36a1c97f5..a7ec5cfc6 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,6 +1,6 @@ Metadata-Version: 1.1 Name: PyMuPDF -Version: 1.18.2 +Version: 1.18.3 Author: Jorj McKie Author-email: jorj.x.mckie@outlook.de Maintainer: Jorj McKie @@ -9,7 +9,7 @@ Home-page: https://github.com/pymupdf/PyMuPDF Download-url: https://github.com/pymupdf/PyMuPDF Summary: PyMuPDF is a Python binding for the PDF rendering library MuPDF Description: - Release date: October 7, 2020 + Release date: November 9, 2020 Authors ======= @@ -20,7 +20,7 @@ Description: Introduction ============ - This is **version 1.18.2 of PyMuPDF**, a Python binding for `MuPDF `_ - "a lightweight PDF and XPS viewer". + This is **version 1.18.3 of PyMuPDF**, a Python binding for `MuPDF `_ - "a lightweight PDF and XPS viewer". MuPDF can access files in PDF, XPS, OpenXPS, epub, comic and fiction book formats, and it is known for both, its top performance and high rendering quality. diff --git a/README.md b/README.md index 0b43cf2cf..58a8fbbb1 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ -# PyMuPDF 1.18.2 +# PyMuPDF 1.18.3 ![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg) -Release date: October 27, 2020 +Release date: November 9, 2020 **Travis-CI:** [![Build Status](https://travis-ci.org/JorjMcKie/py-mupdf.svg?branch=master)](https://travis-ci.org/JorjMcKie/py-mupdf) -On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016, [![](https://pepy.tech/badge/pymupdf)](https://pepy.tech/project/pymupdf) + +On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![](https://pepy.tech/badge/pymupdf)](https://pepy.tech/project/pymupdf) # Authors * [Jorj X. 
McKie](mailto:jorj.x.mckie@outlook.de) @@ -14,7 +15,7 @@ On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016, [![](https:// # Introduction -This is **version 1.18.2 of PyMuPDF**, a Python binding with support for [MuPDF 1.18.*](http://mupdf.com/) - "a lightweight PDF, XPS, and E-book viewer". +This is **version 1.18.3 of PyMuPDF**, a Python binding with support for [MuPDF 1.18.*](http://mupdf.com/) - "a lightweight PDF, XPS, and E-book viewer". MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality. @@ -30,7 +31,7 @@ For all supported document types (i.e. **_including images_**) you can * extract text and images * convert to other formats: PDF, (X)HTML, XML, JSON, text -> To some degree, PyMuPDF can therefore be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats, including SVG, and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Scalable Vector Graphics (SVG)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obselete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well. +> To some degree, PyMuPDF can therefore be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obselete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well. For **PDF documents,** there exists a plethorea of additional features: they can be created, joined or split up. 
Pages can be inserted, deleted, re-arranged or modified in many ways (including annotations and form fields). diff --git a/docs/annot.rst b/docs/annot.rst index b9aa281fa..ccc19831a 100644 --- a/docs/annot.rst +++ b/docs/annot.rst @@ -15,24 +15,26 @@ There is a parent-child relationship between an annotation and its page. If the **Attribute** **Short Description** =============================== ============================================================== :meth:`Annot.blendMode` return the annotation's blend mode -:meth:`Annot.setBlendMode` set the annotation's blend mode :meth:`Annot.delete_responses` delete all responding annotions :meth:`Annot.fileGet` return attached file content -:meth:`Annot.soundGet` return the sound of an audio annotation :meth:`Annot.fileInfo` return attached file information :meth:`Annot.fileUpd` set attached file new content +:meth:`Annot.getOC` return xref of an optional content group :meth:`Annot.getPixmap` image of the annotation as a pixmap :meth:`Annot.getText` extract annotation text :meth:`Annot.getTextbox` extract annotation text +:meth:`Annot.setBlendMode` set the annotation's blend mode :meth:`Annot.setBorder` change the border :meth:`Annot.setColors` change the colors :meth:`Annot.setFlags` change the flags :meth:`Annot.setInfo` change various properties :meth:`Annot.setLineEnds` set line ending styles :meth:`Annot.setName` change the "Name" field (e.g. icon name) +:meth:`Annot.setOC` set visibility via an optional content group (OCG) :meth:`Annot.setOpacity` change transparency :meth:`Annot.setRect` change the rectangle :meth:`Annot.setRotation` change rotation +:meth:`Annot.soundGet` return the sound of an audio annotation :meth:`Annot.update` apply accumulated annot changes :attr:`Annot.border` border details :attr:`Annot.colors` border / background and fill colors @@ -141,6 +143,18 @@ There is a parent-child relationship between an annotation and its page. If the :arg int start: The symbol number for the first point. 
:arg int end: The symbol number for the last point. + .. method:: setOC(xref) + + Set the annotation's visibility using optional content groups. This visibility can be controlled by user interfaces provided by supporting PDF viewers and is independent of other parameters like :attr:`Annot.flags`. + + :arg int xref: :data:`xref` of an optional content group (OCG). If zero, any previous entry will be removed. An exception occurs if the xref does not point to a valid PDF object. + + .. method:: getOC() + + Return the :data:`xref` of an optional content group, or zero if there is none. + + :returns: zero or the xref of an OCG (or OCMD). + .. method:: setOpacity(value) Set the annotation's transparency. Opacity can also be set in :meth:`Annot.update`. diff --git a/docs/app2.rst b/docs/app2.rst index fb03bb962..bffc704a7 100644 --- a/docs/app2.rst +++ b/docs/app2.rst @@ -268,6 +268,7 @@ preserve ligatures 1 1 1 1 1 1 1 1 preserve whitespace 1 1 1 1 1 1 1 1 preserve images n/a 1 1 n/a 1 1 n/a 0 inhibit spaces 0 0 0 0 0 0 0 0 +dehyphenate 0 0 0 0 0 0 0 0 =================== ==== ==== ===== === ==== ======= ===== ====== * **"json"** is handled exactly like **"dict"** and is hence left out. diff --git a/docs/changes.rst b/docs/changes.rst index 22dc659b7..2771ae9cf 100644 --- a/docs/changes.rst +++ b/docs/changes.rst @@ -1,9 +1,22 @@ Change Logs =============== +Changes in Version 1.18.3 +--------------------------- +As a major new feature, this version introduces support for PDF's **Optional Content** concept. + +* **Fixed** issue `#714 `_. +* **Fixed** issue `#711 `_. + +* **Fixed** issue `#707 `_: if a PDF user password is supplied but no owner password is supplied nor present, then the user password is also used as the owner password. +* **Fixed** ``expand`` and ``deflate`` parameters of methods :meth:`Document.save` and :meth:`Document.write`. Individual image and font compression should now finally work. Addresses issue `#713 `_.
+* **Added** support for PDF optional content. This includes several new :ref:`Document` methods for inquiring and setting optional content status and adding optional content configurations and groups. In addition, images, form XObjects and annotations now can be bound to optional content specifications. **Resolved** issue `#709 `_. + + + Changes in Version 1.18.2 --------------------------- -This version contains some interesting improvements for text searching: any number of search hits is now returned thanks to the removal of the **hit_max** parameter. The new **clip** parameter in addition allows to restrict the search area. Searching now detects hyphenations at line breaks and accordingly finds hyphenated words. +This version contains some interesting improvements for text searching: any number of search hits is now returned and the **hit_max** parameter was removed. The new **clip** parameter in addition allows to restrict the search area. Searching now detects hyphenations at line breaks and accordingly finds hyphenated words. * **Fixed** issue `#575 `_: if using ``quads=False`` in text searching, then overlapping rectangles on the same line are joined. Previously, parts of the search string, which belonged to different "marked content" items, each generated their own rectangle -- just as if occurring on separate lines. * **Added** :attr:`Document.isRepaired`, which is true if the PDF was repaired on open. diff --git a/docs/conf.py b/docs/conf.py index 96d3c13e0..5aa8003e5 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -40,7 +40,7 @@ # built documents. # # The full version, including alpha/beta/rc tags. -release = "1.18.2" +release = "1.18.3" # The short X.Y version version = release diff --git a/docs/device.rst b/docs/device.rst index 7c4ed3b0f..ee668a9f1 100644 --- a/docs/device.rst +++ b/docs/device.rst @@ -28,6 +28,3 @@ The different format handlers (pdf, xps, etc.) interpret pages to a "device".
De :type textpage: :ref:`TextPage` :arg int flags: control the way how text is parsed into the text page. Currently 3 options can be coded into this parameter, see :ref:`TextPreserve`. To set these options use something like *flags=0 | TEXT_PRESERVE_LIGATURES | ...*. - -.. note:: In higher level code (:meth:`Page.getText`, :meth:`Document.getPageText`), the following decisions for creating text devices have been implemented: (1) *TEXT_PRESERVE_LIGATURES* and *TEXT_PRESERVE_WHITESPACES* are always set, (2) *TEXT_PRESERVE_IMAGES* is set for JSON and HTML, otherwise off. - diff --git a/docs/document.rst b/docs/document.rst index c3b150519..e0c971a69 100644 --- a/docs/document.rst +++ b/docs/document.rst @@ -23,6 +23,8 @@ For details on **embedded files** refer to Appendix 3. ======================================= ========================================================== **Method / Attribute** **Short Description** ======================================= ========================================================== +:meth:`Document.addLayerConfig` PDF only: make new optional content configuration +:meth:`Document.addOCG` PDF only: add new optional content group :meth:`Document.authenticate` gain access to an encrypted document :meth:`Document.can_save_incrementally` check if incremental save is possible :meth:`Document.chapterPageCount` number of pages in chapter @@ -41,17 +43,22 @@ For details on **embedded files** refer to Appendix 3. 
:meth:`Document.embeddedFileUpd` PDF only: change an embedded file :meth:`Document.findBookmark` retrieve page location after layouting :meth:`Document.fullcopyPage` PDF only: duplicate a page +:meth:`Document.getOCGs` PDF only: info on all optional content groups +:meth:`Document.getOCStates` PDF only: lists of OCGs in ON, OFF, RBGroups +:meth:`Document.setOCStates` PDF only: mass changing OCG states :meth:`Document.getPageFontList` PDF only: make a list of fonts on a page :meth:`Document.getPageImageList` PDF only: make a list of images on a page :meth:`Document.getPagePixmap` create a pixmap of a page by page number :meth:`Document.getPageText` extract the text of a page by page number :meth:`Document.getPageXObjectList` PDF only: make a list of XObjects on a page :meth:`Document.getSigFlags` PDF only: determine signature state -:meth:`Document.getToC` create a table of contents :meth:`Document.getTOC` alias of previous +:meth:`Document.getToC` create a table of contents :meth:`Document.getXmlMetadata` PDF only: read the XML metadata :meth:`Document.insertPage` PDF only: insert a new page :meth:`Document.insertPDF` PDF only: insert pages from another PDF +:meth:`Document.layerConfigs` PDF only: list of optional content configurations +:meth:`Document.layerUIConfigs` PDF only: list of optional content intents :meth:`Document.layout` re-paginate the document (if supported) :meth:`Document.loadPage` read a page :meth:`Document.makeBookmark` create a page pointer in reflowable documents @@ -73,10 +80,11 @@ For details on **embedded files** refer to Appendix 3. 
:meth:`Document.scrub` PDF only: remove sensitive data :meth:`Document.searchPageFor` search for a string on a page :meth:`Document.select` PDF only: select a subset of pages +:meth:`Document.setLayerConfig` PDF only: activate optional content layer :meth:`Document.setMetadata` PDF only: set the metadata :meth:`Document.setTOC_item` PDF only: change a single TOC item -:meth:`Document.setToC` PDF only: set the table of contents (TOC) :meth:`Document.setTOC` PDF only: alias of previous +:meth:`Document.setToC` PDF only: set the table of contents (TOC) :meth:`Document.setXmlMetadata` PDF only: create or update document XML metadata :meth:`Document.updateObject` PDF only: replace object source :meth:`Document.updateStream` PDF only: replace stream source @@ -145,7 +153,7 @@ For details on **embedded files** refer to Appendix 3. :arg float fontsize: the default fontsize for reflowable document types. This parameter is ignored if none of the parameters *rect* or *width* and *height* are specified. Will be used to calculate the page layout. - Overview of possible forms (using the *open* synonym of *Document*):: + Overview of possible forms (*open* is a synonym of *Document*):: >>> # from a file >>> doc = fitz.open("some.pdf") @@ -174,6 +182,166 @@ For details on **embedded files** refer to Appendix 3. True >>> + .. method:: layerConfigs() + + *(New in v1.18.3)* + + Show optional layer configurations. There always is a standard one, which is not included in the response. + + >>> for item in doc.layerConfigs(): print(item) + {'number': 0, 'name': 'my-config', 'creator': ''} + >>> # use 'number' as config identifier in addOCG + + .. method:: addLayerConfig(name, creator=None, on=None) + + *(New in v1.18.3)* + + Add an optional content configuration. Layers serve as a collection of ON / OFF states for optional content groups. They allow fast visibility switches between different views on the same document. + + :arg str name: arbitrary name.
+ :arg str creator: creating software. + :arg sequ on: a sequence of OCG :data:`xref` numbers which should be set to ON (visible). All other OCGs will be set to OFF. + + + .. method:: setLayerConfig(number, as_default=False) + + *(New in v1.18.3)* + + Switch to a document view as defined by the optional layer's configuration number. This is temporary, except if established as default. + + :arg int number: config number as returned by :meth:`Document.layerConfigs`. + :arg bool as_default: make this the default configuration. + + Activates the ON / OFF states of OCGs as defined in this layer. If *as_default=True*, then additionally all layers, including the standard one, are merged and the result is written back to the standard layer, and **all optional layers are deleted**. + + + .. method:: addOCG(name, config=-1, on=True, intent="View", usage="Artwork") + + *(New in v1.18.3)* + + Add an optional content group. An OCG is the most important unit of information to determine object visibility. For a PDF, in order to be regarded as having optional content, at least one OCG must exist. + + :arg str name: arbitrary name. Will show up in supporting PDF viewers. + :arg int config: layer configuration number. Default -1 is the standard configuration. + :arg bool on: standard visibility status for objects pointing to this OCG. + :arg str,list intent: a string or list of strings declaring the visibility intents. There are two PDF standard values to choose from: "View" and "Design". Default is "View". Correct **spelling is important**. + :arg str usage: another influencer for OCG visibility. This will become part of the OCG's ``/Usage`` key. There are two PDF standard values to choose from: "Artwork" and "Technical". Default is "Artwork". Please only change when required. + + :returns: :data:`xref` of the created OCG. Use as entry for ``oc`` parameter in supporting objects. + + .. note:: Multiple OCGs with identical parameters may be created. This will not cause problems. 
Garbage option 3 of :meth:`Document.save` will get rid of any duplicates. + + .. method:: getOCStates() + + *(New in v1.18.3)* + + List of optional content groups by status. This is a dictionary with lists of cross reference numbers for OCGs that are ON, OFF or in some radio button group (``/RBGroups``). + + >>> pprint(doc.getOCStates()) + {'off': [8, 9, 10], 'on': [5, 6, 7], 'rbgroups': [[7, 10]]} + >>> + + .. method:: setOCStates(config, on=None, off=None, basestate=None, rbgroups=None) + + *(New in v1.18.3)* + + Mass changes of optional content groups. **Permanently** sets the status of OCGs. + + :arg int config: desired configuration layer, choose -1 for the default one. + :arg list on: list of :data:`xref` of OCGs to set ON. Replaces previous values. An empty list will cause no OCG to be set to ON anymore. Should be specified if ``basestate="ON"`` is used. + :arg list off: list of :data:`xref` of OCGs to set OFF. Replaces previous values. An empty list will cause no OCG to be set to OFF anymore. Should be specified if ``basestate="OFF"`` is used. + :arg str basestate: desired state of OCGs that are not mentioned in *on* resp. *off*. Possible values are "ON", "OFF" or "Unchanged". Upper / lower case possible. + :arg list rbgroups: a list of lists. Replaces previous values. Each sublist should contain two or more OCG xrefs. OCGs in the same sublist are handled like grouped radio buttons: setting one to ON automatically sets all other group members to OFF. + + >>> doc.setOCStates(-1, basestate="OFF") + >>> pprint(doc.getOCStates()) + {'basestate': 'OFF', 'off': [8, 9, 10], 'on': [5, 6, 7], 'rbgroups': [[7, 10]]} + + + .. method:: getOCGs() + + *(New in v1.18.3)* + + Details of all optional content groups.
This is a dictionary of dictionaries like this (key is the OCG's :data:`xref`): + + >>> pprint(doc.getOCGs()) + {13: {'on': True, + 'intent': ['View', 'Design'], + 'name': 'Circle', + 'usage': 'Artwork'}, + 14: {'on': True, + 'intent': ['View', 'Design'], + 'name': 'Square', + 'usage': 'Artwork'}, + 15: {'on': False, 'intent': ['View'], 'name': 'Square', 'usage': 'Artwork'}} + >>> + + .. method:: layerUIConfigs() + + *(New in v1.18.3)* + + Show the visibility status of optional content that is modifiable by the user interface of supporting PDF viewers. Example: + + >>> pprint(doc.layerUIConfigs()) + ({'depth': 0, + 'locked': False, + 'number': 0, + 'on': True, + 'text': 'Circle', + 'type': 'checkbox'}, + {'depth': 0, + 'locked': False, + 'number': 1, + 'on': False, + 'text': 'Square', + 'type': 'checkbox'}) + >>> # refers to OCGs named "Circle" (ON), resp. "Square" (OFF) + + .. note:: + + * Only reports items contained in the currently selected layer configuration. + + * The meaning of the dictionary keys is as follows: + - *depth:* item's nesting level in the ``/Order`` array + - *locked:* whether changing the item's state is prohibited + - *number:* running sequence number + - *on:* item state + - *text:* text string or name field of the originating OCG + - *type:* one of "label" (set by a text string), "checkbox" (set by a single OCG) or "radiobox" (set by a set of connected OCGs) + + .. method:: setLayerUIConfig(number, action=0) + + *(New in v1.18.3)* + + Modify the OC visibility status of content groups. This is analogous to what supporting PDF viewers would offer. + + .. note:: + Visibility is **not** a property of an OCG -- and the current visibility is not even necessarily present as information in the PDF document. Using this method, the user can **temporarily** modify visibility just as if doing so via a supporting PDF consumer software. + + To make permanent changes, follow the recommendation mentioned in :meth:`Document.addLayerConfig`.
+ :arg int number: number as returned by :meth:`Document.layerUIConfigs`. + :arg int action: 0 = set on (default), 1 = toggle on/off, 2 = set off. + + Example: + + >>> # let's make above "Square" visible: + >>> doc.setLayerUIConfig(1, action=0) + >>> pprint(doc.layerUIConfigs()) + ({'depth': 0, + 'locked': False, + 'number': 0, + 'on': True, + 'text': 'Circle', + 'type': 'checkbox'}, + {'depth': 0, + 'locked': False, + 'number': 1, + 'on': True, # <=== + 'text': 'Square', + 'type': 'checkbox'}) + >>> + .. method:: authenticate(password) Decrypts the document with the string *password*. If successful, document data can be accessed. For PDF documents, the "owner" and the "user" have different priviledges, and hence different passwords may exist for these authorization levels. The method will automatically establish the appropriate access rights for the provided password. @@ -626,7 +794,9 @@ For details on **embedded files** refer to Appendix 3. :arg bool xml_metadata: Remove XML metadata. - .. method:: save(outfile, garbage=0, clean=False, deflate=False, incremental=False, ascii=False, expand=0, linear=False, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None) + .. method:: save(outfile, garbage=0, clean=False, deflate=False, deflate_images=False, deflate_fonts=False, incremental=False, ascii=False, expand=0, linear=False, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None) + + *(Changed in v1.18.3)* PDF only: Saves the document in its **current state**. @@ -635,14 +805,16 @@ For details on **embedded files** refer to Appendix 3. :arg int garbage: Do garbage collection. Positive values exclude "incremental". * 0 = none - * 1 = remove unused objects - * 2 = in addition to 1, compact the :data:`xref` table - * 3 = in addition to 2, merge duplicate objects - * 4 = in addition to 3, check :data:`stream` objects for duplication.
This may be slow because such data are typically large and in addition may require (de-) compression before such comparisons can be made. + * 1 = remove unused (unreferenced) objects. + * 2 = in addition to 1, compact the :data:`xref` table. + * 3 = in addition to 2, merge duplicate objects. + * 4 = in addition to 3, check :data:`stream` objects for duplication. This may be slow because such data are typically large. :arg bool clean: Clean and sanitize content streams [#f1]_. Corresponds to "mutool clean -sc". :arg bool deflate: Deflate (compress) uncompressed streams. + :arg bool deflate_images: *(new in v1.18.3)* Deflate (compress) uncompressed image streams [#f4]_. + :arg bool deflate_fonts: *(new in v1.18.3)* Deflate (compress) uncompressed fontfile streams [#f4]_. :arg bool incremental: Only save changed objects. Excludes "garbage" and "linear". Cannot be used for files that are decrypted or repaired and also in some other cases. To be sure, check :meth:`Document.can_save_incrementally`. If this is false, saving to a new file is required. @@ -663,7 +835,7 @@ For details on **embedded files** refer to Appendix 3. :arg int encryption: *(new in version 1.16.0)* set the desired encryption method. See :ref:`EncryptionMethods` for possible values. - :arg str owner_pw: *(new in version 1.16.0)* set the document's owner password. + :arg str owner_pw: *(new in version 1.16.0)* set the document's owner password. *(Changed in v1.18.3)* If not provided, the user password is taken if provided. :arg str user_pw: *(new in version 1.16.0)* set the document's user password. @@ -672,9 +844,11 @@ For details on **embedded files** refer to Appendix 3. PDF only: saves the document incrementally. This is a convenience abbreviation for *doc.save(doc.name, incremental=True, encryption=PDF_ENCRYPT_KEEP)*. - .. method:: write(garbage=0, clean=False, deflate=False, ascii=False, expand=0, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None) + .. 
method:: write(garbage=0, clean=False, deflate=False, deflate_images=False, deflate_fonts=False, ascii=False, expand=0, pretty=False, encryption=PDF_ENCRYPT_NONE, permissions=-1, owner_pw=None, user_pw=None) + + *(Changed in v1.18.3)* - PDF only: Writes the **current content of the document** to a bytes object instead of to a file. Obviously, you should be wary about memory requirements. The meanings of the parameters exactly equal those in :meth:`save`. Chater :ref:`FAQ` contains an example for using this method as a pre-processor to `pdfrw `_. + PDF only: Writes the **current content of the document** to a bytes object instead of to a file. Obviously, you should be wary about memory requirements. The meanings of the parameters exactly equal those in :meth:`save`. Chapter :ref:`FAQ` contains an example for using this method as a pre-processor to `pdfrw `_. *(Changed in version 1.16.0)* for extended encryption support. @@ -1258,3 +1432,5 @@ Other Examples .. [#f2] However, you **can** use :meth:`Document.getToC` and :meth:`Page.getLinks` (which are available for all document types) and copy this information over to the output PDF. See demo `pdf-converter.py `_. .. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods / attributes :meth:`Document.nextLocation`, :meth:`Document.previousLocation` and :attr:`Document.lastLocation` for maintaining a high level of coding efficiency. + +.. [#f4] These parameters cause separate handling of stream categories: use it together with ``expand`` to restrict decompression to streams other than images / fontfiles. 
diff --git a/docs/faq.rst b/docs/faq.rst index 437a566b9..e93f083f4 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -1790,9 +1790,7 @@ How to Deal with PDF Encryption Starting with version 1.16.0, PDF decryption and encryption (using passwords) are fully supported. You can do the following: * Check whether a document is password protected / (still) encrypted (:attr:`Document.needsPass`, :attr:`Document.isEncrypted`). - * Gain access authorization to a document (:meth:`Document.authenticate`). - * Set encryption details for PDF files using :meth:`Document.save` or :meth:`Document.write` and - decrypt or encrypt the content @@ -1939,13 +1937,13 @@ Since v1.16.0, there is the property :attr:`Page._isWrapped`, which lets you che If it is *False* or if you want to be on the safe side, pick one of the following: -1. The easiest way: in your script, do a :meth:`Page._cleanContents` before you do your first item insertion. +1. The easiest way: in your script, do a :meth:`Page.cleanContents` before you do your first item insertion. 2. Pre-process your PDF with the MuPDF command line utility *mutool clean -c ...* and work with its output file instead. 3. Directly wrap the page's :data:`contents` with the stacking commands before you do your first item insertion. **Solutions 1. and 2.** use the same technical basis and **do a lot more** than what is required in this context: they also clean up other inconsistencies or redundancies that may exist, multiple */Contents* objects will be concatenated into one, and much more. -.. note:: For **incremental saves,** solution 1. has an unpleasant implication: it will bloat the update delta, because it changes so many things and, in addition, stores the **cleaned contents uncompressed**. So, if you use :meth:`Page._cleanContents` you should consider **saving to a new file** with (at least) *garbage=3* and *deflate=True*. +.. note:: For **incremental saves,** solution 1. 
has an unpleasant implication: it will bloat the update delta, because it changes so many things and, in addition, stores the **cleaned contents uncompressed**. So, if you use :meth:`Page.cleanContents` you should consider **saving to a new file** with (at least) *garbage=3* and *deflate=True*. **Solution 3.** is completely under your control and only does the minimum corrective action. There exists a handy low-level utility function which you can use for this. Suggested procedure: @@ -2070,7 +2068,7 @@ Here are two ways of combining multiple contents of a page:: >>> # method 1: use the clean function >>> for i in range(len(doc)): - doc[i]._cleanContents() # cleans and combines multiple Contents + doc[i].cleanContents() # cleans and combines multiple Contents page = doc[i] # re-read the page (has only 1 contents now) cont = page._getContents()[0] # do something with the cleaned, combined contents @@ -2082,7 +2080,7 @@ Here are two ways of combining multiple contents of a page:: cont += doc.xrefStream(xref) # do something with the combined contents -The clean function :meth:`Page._cleanContents` does a lot more than just gluing :data:`contents` objects: it also corrects and optimizes the PDF operator syntax of the page and removes any inconsistencies. +The clean function :meth:`Page.cleanContents` does a lot more than just gluing :data:`contents` objects: it also corrects and optimizes the PDF operator syntax of the page and removes any inconsistencies. ---------------------------------- diff --git a/docs/page.rst b/docs/page.rst index d45d9ac73..fad4a5b06 100644 --- a/docs/page.rst +++ b/docs/page.rst @@ -772,8 +772,9 @@ In a nutshell, this is what you can do with PyMuPDF: pair: rotate; insertImage pair: stream; insertImage pair: mask; insertImage + pair: oc; insertImage - .. method:: insertImage(rect, filename=None, pixmap=None, stream=None, mask=None, rotate=0, keep_proportion=True, overlay=True) + .. 
method:: insertImage(rect, filename=None, pixmap=None, stream=None, mask=None, rotate=0, oc=0, keep_proportion=True, overlay=True) PDF only: Put an image inside the given rectangle. The image can be taken from a pixmap, a file or a memory area - of these parameters **exactly one** must be specified. @@ -798,6 +799,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg int rotate: *(new in version v1.14.11)* rotate the image. Must be an integer multiple of 90 degrees. If you need a rotation by an arbitrary angle, consider converting the image to a PDF (:meth:`Document.convertToPDF`) first and then use :meth:`Page.showPDFpage` instead. + :arg int oc: *(new in v1.18.3)* (:data:`xref`) make image visibility dependent on this OCG (optional content group). Please be aware that this property is stored with the generated PDF image definition. If you insert the same image anywhere else, but **with a different 'oc' value**, a full additional image copy will be stored. :arg bool keep_proportion: *(new in version v1.14.11)* maintain the aspect ratio of the image. For a description of *overlay* see :ref:`CommonParms`. @@ -898,9 +900,9 @@ In a nutshell, this is what you can do with PyMuPDF: Return the draw commands of the page. These are instructions which draw lines, rectangles or curves, including properties like colors, transparency, line width and dashing, etc. - :returns: a list of dictionaries. Each dictionary item contains one or more single draw commands which belong together: their lines are connected and they have the same properties (colors, dashing, etc.). This is called a **"path"** in the PDF specification, but this method works the same for all document types. + :returns: a list of dictionaries. Each dictionary item contains one or more single draw commands which belong together: their lines are connected and they have the same properties (colors, dashing, etc.).
This is called a **"path"** in the PDF specification, but the method works the same for **all document types**. - The dictionary been designed to be compatible with the methods and terminology of class :ref:`Shape`. There are the following keys: + The path dictionary has been designed to be compatible with the methods and terminology of class :ref:`Shape`. There are the following keys: ============== ========================================================================= Key Value @@ -926,7 +928,12 @@ In a nutshell, this is what you can do with PyMuPDF: Using class :ref:`Shape`, you should be able to recreate the original drawings on a separate (PDF) page with high fidelity. A coding draft can be found in section "Extractings Drawings" of chapter :ref:`FAQ`. - Please note that this is a brandnew function, so there exists a higher probability for bugs. + The following limitations exist by design: + + * The visual appearance of a page may have been designed in a very complex way. For example in PDF, layers (Optional Content Groups) can control the visibility of any item (drawings and other objects) depending on arbitrary conditions: a watermark may be suppressed if the page is shown by a viewer, but is visible if printed on paper. + * Only drawings are extracted, other page content is ignored. The method therefore does not detect whether a drawing is covered, hidden or overlaid in the original document, e.g. by some text or an image. + + Effects like these are ignored by the method -- it will return all paths unconditionally. .. method:: getFontList(full=False) @@ -1061,7 +1068,7 @@ In a nutshell, this is what you can do with PyMuPDF: pair: overlay; showPDFpage pair: rotate; showPDFpage - .. method:: showPDFpage(rect, docsrc, pno=0, keep_proportion=True, overlay=True, rotate=0, clip=None) + ..
method:: showPDFpage(rect, docsrc, pno=0, keep_proportion=True, overlay=True, oc=0, rotate=0, clip=None) PDF only: Display a page of another PDF as a **vector image** (otherwise similar to :meth:`Page.insertImage`). This is a multi-purpose method. For example, you can use it to @@ -1086,6 +1093,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg bool overlay: put image in foreground (default) or background. + :arg int oc: *(new in v1.18.3)* (:data:`xref`) make visibility dependent on this OCG (optional content group). :arg float rotate: *(new in version 1.14.10)* show the source rectangle rotated by some angle. *Changed in version 1.14.11:* Any angle is now supported. :arg rect_like clip: choose which part of the source page to show. Default is the full page, else must be finite and its intersection with the source page must not be empty. @@ -1131,22 +1139,20 @@ In a nutshell, this is what you can do with PyMuPDF: Search for *needle* on a page. Wrapper for :meth:`TextPage.search`. :arg str needle: Text to search for. Upper / lower case is ignored. The string may contain spaces. - :arg rect_like clip: *(New in v1.18.2)* only search in this area. - :arg bool quads: Return :ref:`Quad` instead of :ref:`Rect` objects. + :arg rect_like clip: *(New in v1.18.2)* only search within this area. + :arg bool quads: Return object type :ref:`Quad` instead of :ref:`Rect`. :arg int flags: Control the data extracted by the underlying :ref:`TextPage`. By default ligatures are expanded, white space is replaced with spaces and hyphenation is detected. :rtype: list :returns: - A list of :ref:`Rect` or :ref:`Quad` objects each of which -- **normally!** -- surrounds one occurrence of *needle*. **However:** if parts of *needle* occur on more than one line, then a separate item is geberated for each part of the string per line. So, if ``needle = "search string"``, two rectangles may be generated. 
+ A list of :ref:`Rect` or :ref:`Quad` objects each of which -- **normally!** -- surrounds one occurrence of *needle*. **However:** if parts of *needle* occur on more than one line, then a separate item is generated for each part of the string per line. So, if ``needle = "search string"``, two rectangles may be generated. **Changes in v1.18.2:** * There no longer is a limit on the list length (removal of the ``hit_max`` parameter). - * If a word is **hyphenated** at a line break, it will still be found. E.g. the word "method" will be found even if hyphenated as "meth-" / "od" by a line break, and two rectangles will be returned: one surrounding "meth" (without the hyphen) and another one surrounding "od". - - + * If a word is **hyphenated** at a line break, it will still be found. E.g. the word "method" will be found even if hyphenated as "meth-od" by a line break, and two rectangles will be returned: one surrounding "meth" (without the hyphen) and another one surrounding "od". .. note:: The method supports multi-line text marker annotations: you can use the full returned list as **one** parameter for creating the annotation. diff --git a/docs/textpage.rst b/docs/textpage.rst index cbd680226..78dcf464c 100644 --- a/docs/textpage.rst +++ b/docs/textpage.rst @@ -111,13 +111,14 @@ For a description of what this class is all about, see Appendix 2. :arg str needle: the string to search for. Upper and lower cases will all match. :arg bool quads: return quadrilaterals instead of rectangles. :rtype: list - :returns: a list of :ref:`Rect` or :ref:`Quad` objects, each surrounding a found *needle* occurrence. The search string may contain spaces, it may therefore happen, that its parts are located on different lines. In this case, more than one rectangle (resp. quadrilateral) are returned. **(Changed in v1.18.2)** The method **now supports hyphenation**, so it will find "method" even if it was hyphenated in two parts "meth-" and "od" across two lines. 
The rectangles will exclude the hyphen in this case. + :returns: a list of :ref:`Rect` or :ref:`Quad` objects, each surrounding a found *needle* occurrence. The search string may contain spaces, it may therefore happen, that its parts are located on different lines. In this case, more than one rectangle (resp. quadrilateral) are returned. **(Changed in v1.18.2)** The method **now supports dehyphenation**, so it will find "method" even if it was hyphenated in two parts "meth-" and "od" across two lines. The two returned rectangles will **exclude the hyphen** in this case. .. note:: **Overview of changes in v1.18.2:** 1. The ``hit_max`` parameter has been removed: all hits are always returned. 2. The ``rect`` parameter of the :ref:`TextPage` is now respected: only text inside this area is examined. Only characters with fully contained bboxes are considered. 3. Words hyphenated at the end of a line are now found. + 4. Overlapping **rectangles** in the same line are now automatically joined. We assume that such separations are an artifact created by multiple marked content groups containing parts of the same search needle. Example Quad versus Rect: when searching for needle "pymupdf", then the corresponding entry will either be the blue rectangle, or, if *quads* was specified, *Quad(ul, ur, ll, lr)*. 
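Note on item 4 of the v1.18.2 overview above (overlapping rectangles in the same line are joined): the following is a small standalone sketch of that idea, not PyMuPDF's actual implementation. The plain tuple layout `(x0, y0, x1, y1)`, the helper name `join_overlapping`, and the rule used to decide that two rectangles sit on the same line (their vertical ranges overlap) are all assumptions made for illustration.

```python
def join_overlapping(rects):
    """Merge consecutive rectangles that overlap and share a text line.

    Hypothetical helper, not part of PyMuPDF. Each rectangle is a tuple
    (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
    """
    joined = []
    for x0, y0, x1, y1 in rects:  # keep the incoming (reading) order
        if joined:
            px0, py0, px1, py1 = joined[-1]
            same_line = y0 < py1 and py0 < y1  # vertical ranges overlap
            overlap = x0 <= px1 and px0 <= x1  # horizontal ranges touch or overlap
            if same_line and overlap:
                # replace the previous hit by the union of both rectangles
                joined[-1] = (min(px0, x0), min(py0, y0),
                              max(px1, x1), max(py1, y1))
                continue
        joined.append((x0, y0, x1, y1))
    return joined


hits = [
    (10, 100, 40, 112),  # first part of the needle
    (38, 100, 70, 112),  # overlaps the previous rectangle on the same line
    (10, 120, 30, 132),  # next line: stays a separate item
]
print(join_overlapping(hits))
```

With these sample values the first two rectangles collapse into one, while the rectangle on the following line is kept as its own item. This matches the documented behavior that a needle spread over several lines still yields one item per line.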
diff --git a/docs/tutorial.rst b/docs/tutorial.rst index f61141280..a4699aab4 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -318,19 +318,19 @@ You can write changes back to the **original PDF** by specifying option *increme garbage=1 g garbage collect unused objects garbage=2 gg in addition to 1, compact :data:`xref` tables garbage=3 ggg in addition to 2, merge duplicate objects -garbage=4 gggg in addition to 3, skip duplicate streams -clean=1 cs clean and sanitize content streams -deflate=1 z deflate uncompressed streams -ascii=1 a convert binary data to ASCII format -linear=1 l create a linearized version -expand=1 i decompress images -expand=2 f decompress fonts -expand=255 d decompress all +garbage=4 gggg in addition to 3, merge duplicate stream content +clean=True cs clean and sanitize content streams +deflate=True z deflate uncompressed streams +deflate_images=True i deflate image streams +deflate_fonts=True f deflate fontfile streams +ascii=True a convert binary data to ASCII format +linear=True l create a linearized version +expand=True d decompress all streams =================== =========== ================================================== .. note:: For an explanation of terms like *object, stream, xref* consult the :ref:`Glossary` chapter. -For example, *mutool clean -ggggz file.pdf* yields excellent compression results. It corresponds to *doc.save(filename, garbage=4, deflate=1)*. +For example, *mutool clean -ggggz file.pdf* yields excellent compression results. It corresponds to *doc.save(filename, garbage=4, deflate=True)*. Closing ========= diff --git a/docs/vars.rst b/docs/vars.rst index 8b1ed8928..36554a374 100644 --- a/docs/vars.rst +++ b/docs/vars.rst @@ -170,6 +170,14 @@ Options controlling the amount of data a text device parses into a :ref:`TextPag 8 -- If set, we will not try to add missing space characters where there are large gaps between characters. +.. 
py:data:: TEXT_DEHYPHENATE + + 16 -- Ignore hyphens at line ends and join with next line. Used mainly with search function + +.. py:data:: TEXT_PRESERVE_SPANS + + 32 -- Generate a new line for every span. Not used in PyMuPDF. + .. _linkDest Kinds: @@ -295,12 +303,14 @@ These identifiers also cover **links** and **widgets**: the PDF specification te PDF_ANNOT_FILE_ATTACHMENT 17 PDF_ANNOT_SOUND 18 PDF_ANNOT_MOVIE 19 - PDF_ANNOT_WIDGET 20 # <=== Widget object in PyMuPDF - PDF_ANNOT_SCREEN 21 - PDF_ANNOT_PRINTER_MARK 22 - PDF_ANNOT_TRAP_NET 23 - PDF_ANNOT_WATERMARK 24 - PDF_ANNOT_3D 25 + PDF_ANNOT_RICH_MEDIA 20 + PDF_ANNOT_WIDGET 21 # <=== Widget object in PyMuPDF + PDF_ANNOT_SCREEN 22 + PDF_ANNOT_PRINTER_MARK 23 + PDF_ANNOT_TRAP_NET 24 + PDF_ANNOT_WATERMARK 25 + PDF_ANNOT_3D 26 + PDF_ANNOT_PROJECTION 27 PDF_ANNOT_UNKNOWN -1 .. _AnnotationFlags: diff --git a/docs/version.rst b/docs/version.rst index 6eeefc16f..7973c9340 100644 --- a/docs/version.rst +++ b/docs/version.rst @@ -1,6 +1,6 @@ Covered Version -------------------- -This documentation covers PyMuPDF v1.18.2 features as of **2020-10-23 09:17:55**. +This documentation covers PyMuPDF v1.18.3 features as of **2020-11-09 07:36:17**. .. note:: The major and minor versions of **PyMuPDF** and **MuPDF** will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF. \ No newline at end of file diff --git a/docs/widget.rst b/docs/widget.rst index 48dffe58a..e2e00ab4b 100644 --- a/docs/widget.rst +++ b/docs/widget.rst @@ -4,11 +4,11 @@ Widget ================ -This class represents a PDF Form field, also called "widget". Fields are a special case of annotations, which allow users with limited permissions to enter information in a PDF. This is primarily used for filling out forms. +This class represents a PDF Form field, also called a "widget". Throughout this documentation, we are using these terms synonymously. 
Fields technically are a special case of PDF annotations, which allow users with limited permissions to enter information in a PDF. This is primarily used for filling out forms. Like annotations, widgets live on PDF pages. Similar to annotations, the first widget on a page is accessible via :attr:`Page.firstWidget` and subsequent widgets can be accessed via the :attr:`Widget.next` property. -*(Changed in version 1.16.0)* MuPDF no longer treats widgets as a subset of general annotations. Consequently, :attr:`Page.firstAnnot` and :meth:`Annot.next` will deliver non-widget annotations exclusively, and be *None* if only form fields exist on a page. Vice versa, :attr:`Page.firstWidget` and :meth:`Widget.next` will only show widgets. This design decision is purely internal to MuPDF; technically, links, annotations and fields have a lot in common and also continue to share the better part of their code within (Py-) MuPDF. +*(Changed in version 1.16.0)* MuPDF no longer treats widgets as a subset of general annotations. Consequently, :attr:`Page.firstAnnot` and :meth:`Annot.next` will deliver **non-widget annotations exclusively**, and be *None* if only form fields exist on a page. Vice versa, :attr:`Page.firstWidget` and :meth:`Widget.next` will only show widgets. This design decision is purely internal to MuPDF; technically, links, annotations and fields have a lot in common and also continue to share the better part of their code within (Py-) MuPDF. **Class API** @@ -25,11 +25,11 @@ Like annotations, widgets live on PDF pages. Similar to annotations, the first w .. attribute:: next - Point to the next form field on the page. + Point to the next form field on the page. The last widget returns *None*. .. attribute:: border_color - A list of up to 4 floats defining the field's border. Default value is *None* which causes border style and border width to be ignored. + A list of up to 4 floats defining the field's border color. 
Default value is *None* which causes border style and border width to be ignored. .. attribute:: border_style @@ -45,7 +45,7 @@ Like annotations, widgets live on PDF pages. Similar to annotations, the first w .. attribute:: choice_values - Python sequence of strings defining the valid choices of list boxes and combo boxes. For these widgets, this property is mandatory and must contain at least two items. Ignored for other types. + Python sequence of strings defining the valid choices of list boxes and combo boxes. For these widget types, this property is mandatory and must contain at least two items. Ignored for other types. .. attribute:: field_name @@ -61,7 +61,7 @@ Like annotations, widgets live on PDF pages. Similar to annotations, the first w .. attribute:: field_flags - An integer defining a large amount of proprties of a field. Handle this attribute with care. + An integer defining a large amount of properties of a field. Be careful when changing this attribute as this may change the field type. .. attribute:: field_type @@ -81,7 +81,7 @@ Like annotations, widgets live on PDF pages. Similar to annotations, the first w .. attribute:: is_signed - A bool indicating the status of a signature field, else *None*. + A bool indicating the signing status of a signature field, else *None*. .. attribute:: rect @@ -163,6 +163,18 @@ ZaDb ZapfDingbats You are generally free to use any font for every widget. However, we recommend using *ZaDb* ("ZapfDingbats") and fontsize 0 for check boxes: typical viewers will put a correctly sized tickmark in the field's rectangle, when it is clicked. +Supported Widget Types +----------------------- +PyMuPDF supports the creation and update of many, but not all widget types. 
+ +* text (``PDF_WIDGET_TYPE_TEXT``) +* push button (``PDF_WIDGET_TYPE_BUTTON``) +* check box (``PDF_WIDGET_TYPE_CHECKBOX``) +* combo box (``PDF_WIDGET_TYPE_COMBOBOX``) +* list box (``PDF_WIDGET_TYPE_LISTBOX``) +* radio button (``PDF_WIDGET_TYPE_RADIOBUTTON``): PyMuPDF does not currently support groups of (interconnected) buttons, where setting one automatically unsets the other buttons in the group. The widget object also does not reflect the presence of a button group. Setting or unsetting happens via values ``True`` and ``False`` and will always work without affecting other radio buttons. +* signature (``PDF_WIDGET_TYPE_SIGNATURE``) **read only**. + .. rubric:: Footnotes .. [#f1] If you intend to re-access a new or updated field (e.g. for making a pixmap), make sure to reload the page first. Either close and re-open the document, or load another page first, or simply do ``page = doc.reload_page(page)``. diff --git a/fitz/fitz.i b/fitz/fitz.i index 3d34a9f26..aa8c87815 100644 --- a/fitz/fitz.i +++ b/fitz/fitz.i @@ -73,7 +73,7 @@ CheckParent(self)%} #define EMPTY_STRING PyUnicode_FromString("") #define EXISTS(x) (x != NULL && PyObject_IsTrue(x)==1) -#define THROWMSG(msg) fz_throw(gctx, FZ_ERROR_GENERIC, msg) +#define THROWMSG(ctx, msg) fz_throw(ctx, FZ_ERROR_GENERIC, msg) #define ASSERT_PDF(cond) if (cond == NULL) fz_throw(gctx, FZ_ERROR_GENERIC, "not a PDF") #define INRANGE(v, low, high) ((low) <= v && v <= (high)) #define MAX(a, b) ((a) < (b)) ? (b) : (a) @@ -222,6 +222,7 @@ import io import math import os import weakref +import hashlib from binascii import hexlify fitz_py2 = str is bytes # if true, this is Python 2 @@ -273,15 +274,14 @@ struct Document Notes: Basic usages: - open() - creates new empty PDF document + open() - new PDF document open(filename) - string or pathlib.Path, must have supported file extension. open(type, buffer) - type: valid extension, buffer: bytes object. open(stream=buffer, filetype=type) - keyword version of previous. 
open(filename, fileype=type) - filename with unrecognized extension. - - rect, width, height, fontsize may be used to re-layout reflowable documents - on open (e.g. EPUB). Ignored if not applicable. + rect, width, height, fontsize: layout reflowable document + on open (e.g. EPUB). Ignored if n/a. """ if not filename or type(filename) is str: @@ -320,6 +320,7 @@ struct Document self.FontInfos = [] self.Graftmaps = {} self.ShownPages = {} + self.InsertedImages = {} self._page_refs = weakref.WeakValueDictionary()%} %pythonappend Document %{ @@ -367,7 +368,7 @@ struct Document handler = fz_recognize_document(gctx, filetype); if (handler && handler->open) doc = handler->open(gctx, filename); - else THROWMSG("unrecognized file type"); + else THROWMSG(gctx, "unrecognized file type"); } } else { pdf_document *pdf = pdf_create_document(gctx); @@ -401,6 +402,7 @@ struct Document self.Graftmaps[k] = None self.Graftmaps = {} self.ShownPages = {} + self.InsertedImages = {} %} %pythonappend close %{self.thisown = False%} @@ -451,21 +453,21 @@ struct Document fz_try(gctx) { if (PySequence_Check(page_id)) { val = PySequence_GetItem(page_id, 0); - if (!val) THROWMSG("bad page page id"); + if (!val) THROWMSG(gctx, "bad page page id"); int chapter = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); val = PySequence_GetItem(page_id, 1); - if (!val) THROWMSG("bad page page id"); + if (!val) THROWMSG(gctx, "bad page page id"); pno = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); page = fz_load_chapter_page(gctx, doc, chapter, pno); } else { pno = (int) PyLong_AsLong(page_id); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); page = fz_load_page(gctx, doc, pno); } } @@ -640,10 +642,10 @@ struct Document pdf_obj *filespec = pdf_dict_getl(gctx, 
entry, PDF_NAME(EF), PDF_NAME(F), NULL); - if (!filespec) THROWMSG("bad PDF: /EF object not found"); + if (!filespec) THROWMSG(gctx, "bad PDF: /EF object not found"); res = JM_BufferFromBytes(gctx, buffer); - if (EXISTS(buffer) && !res) THROWMSG("bad type: 'buffer'"); + if (EXISTS(buffer) && !res) THROWMSG(gctx, "bad type: 'buffer'"); if (res) { JM_update_stream(gctx, pdf, filespec, res, 1); @@ -717,7 +719,7 @@ struct Document fz_try(gctx) { ASSERT_PDF(pdf); data = JM_BufferFromBytes(gctx, buffer); - if (!data) THROWMSG("bad type: 'buffer'"); + if (!data) THROWMSG(gctx, "bad type: 'buffer'"); size = fz_buffer_storage(gctx, data, &buffdata); names = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), @@ -876,7 +878,7 @@ struct Document fz_try(gctx) { int fp = from_page, tp = to_page, srcCount = fz_count_pages(gctx, fz_doc); if (pdf_specifics(gctx, fz_doc)) - THROWMSG("bad document type"); + THROWMSG(gctx, "bad document type"); if (fp < 0) fp = 0; if (fp > srcCount - 1) fp = srcCount - 1; if (tp < 0) tp = srcCount - 1; @@ -944,7 +946,7 @@ struct Document fz_try(gctx) { int chapters = fz_count_chapters(gctx, (fz_document *) $self); if (chapter < 0 || chapter >= chapters) - THROWMSG("bad chapter number"); + THROWMSG(gctx, "bad chapter number"); pages = fz_count_chapter_pages(gctx, (fz_document *) $self, chapter); } fz_catch(gctx) { @@ -973,16 +975,16 @@ struct Document int pno; fz_try(gctx) { val = PySequence_GetItem(page_id, 0); - if (!val) THROWMSG("bad page id"); + if (!val) THROWMSG(gctx, "bad page id"); int chapter = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); val = PySequence_GetItem(page_id, 1); - if (!val) THROWMSG("bad page id"); + if (!val) THROWMSG(gctx, "bad page id"); pno = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); loc = fz_make_location(chapter, pno); prev_loc 
= fz_previous_page(gctx, this_doc, loc); @@ -1016,16 +1018,16 @@ struct Document int pno; fz_try(gctx) { val = PySequence_GetItem(page_id, 0); - if (!val) THROWMSG("bad page id"); + if (!val) THROWMSG(gctx, "bad page id"); int chapter = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); val = PySequence_GetItem(page_id, 1); - if (!val) THROWMSG("bad page id"); + if (!val) THROWMSG(gctx, "bad page id"); pno = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); loc = fz_make_location(chapter, pno); next_loc = fz_next_page(gctx, this_doc, loc); @@ -1048,7 +1050,7 @@ struct Document while (pno < 0) pno += pageCount; fz_try(gctx) { if (pno >= pageCount) - THROWMSG("bad page number(s)"); + THROWMSG(gctx, "bad page number(s)"); loc = fz_location_from_page_number(gctx, this_doc, pno); } fz_catch(gctx) { @@ -1077,16 +1079,16 @@ struct Document int pno; fz_try(gctx) { val = PySequence_GetItem(page_id, 0); - if (!val) THROWMSG("bad page id"); + if (!val) THROWMSG(gctx, "bad page id"); int chapter = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); val = PySequence_GetItem(page_id, 1); - if (!val) THROWMSG("bad page id"); + if (!val) THROWMSG(gctx, "bad page id"); pno = (int) PyLong_AsLong(val); Py_DECREF(val); - if (PyErr_Occurred()) THROWMSG("bad page id"); + if (PyErr_Occurred()) THROWMSG(gctx, "bad page id"); loc = fz_make_location(chapter, pno); page_n = fz_page_number_from_location(gctx, this_doc, loc); @@ -1202,7 +1204,7 @@ struct Document h = r.y1 - r.y0; } if (w <= 0.0f || h <= 0.0f) - THROWMSG("invalid page size"); + THROWMSG(gctx, "invalid page size"); fz_layout_document(gctx, doc, w, h, fontsize); } fz_catch(gctx) { @@ -1220,11 +1222,11 @@ struct Document fz_bookmark mark; fz_try(gctx) 
{ if (JM_INT_ITEM(loc, 0, &location.chapter) == 1) - THROWMSG("Bad location"); + THROWMSG(gctx, "Bad location"); if (JM_INT_ITEM(loc, 1, &location.page) == 1) - THROWMSG("Bad location"); + THROWMSG(gctx, "Bad location"); mark = fz_make_bookmark(gctx, doc, location); - if (!mark) THROWMSG("Bad location"); + if (!mark) THROWMSG(gctx, "Bad location"); } fz_catch(gctx) { return NULL; @@ -1266,7 +1268,7 @@ struct Document fz_try(gctx) { ASSERT_PDF(pdf); if (!INRANGE(xref, 1, pdf_xref_len(gctx, pdf)-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); pdf_delete_object(gctx, pdf, xref); } fz_catch(gctx) { @@ -1328,7 +1330,7 @@ struct Document return idlist; } - CLOSECHECK0(isPDF, """Check if a PDF document.""") + CLOSECHECK0(isPDF, """Check for PDF.""") %pythoncode%{@property%} PyObject *isPDF() { @@ -1394,9 +1396,9 @@ struct Document return Py_BuildValue("i", fz_authenticate_password(gctx, (fz_document *) $self, (const char *) password)); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // save PDF file - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(save, !result) %pythonprepend save %{ """Save PDF to filename.""" @@ -1417,14 +1419,19 @@ struct Document raise ValueError("incremental needs original file") %} - PyObject *save(char *filename, int garbage=0, int clean=0, int deflate=0, int incremental=0, int ascii=0, int expand=0, int linear=0, int pretty=0, int encryption=1, int permissions=-1, char *owner_pw=NULL, char *user_pw=NULL) + PyObject * + save(char *filename, int garbage=0, int clean=0, + int deflate=0, int deflate_images=0, int deflate_fonts=0, + int incremental=0, int ascii=0, int expand=0, int linear=0, + int pretty=0, int encryption=1, int permissions=-1, + char *owner_pw=NULL, char *user_pw=NULL) { pdf_write_options opts = 
pdf_default_write_options; opts.do_incremental = incremental; opts.do_ascii = ascii; opts.do_compress = deflate; - opts.do_compress_images = deflate; - opts.do_compress_fonts = deflate; + opts.do_compress_images = deflate_images; + opts.do_compress_fonts = deflate_fonts; opts.do_decompress = expand; opts.do_garbage = garbage; opts.do_pretty = pretty; @@ -1435,6 +1442,8 @@ struct Document opts.permissions = permissions; if (owner_pw) { memcpy(&opts.opwd_utf8, owner_pw, strlen(owner_pw)+1); + } else if (user_pw) { + memcpy(&opts.opwd_utf8, user_pw, strlen(user_pw)+1); } if (user_pw) { memcpy(&opts.upwd_utf8, user_pw, strlen(user_pw)+1); @@ -1454,9 +1463,9 @@ struct Document return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // write document to memory - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(write, !result) %pythonprepend write %{ """Write the PDF to a bytes object.""" @@ -1465,12 +1474,11 @@ struct Document if self.pageCount < 1: raise ValueError("cannot write with zero pages")%} - PyObject *write(int garbage=0, int clean=0, int deflate=0, - int ascii=0, int expand=0, int pretty=0, - int encryption=1, - int permissions=-1, - char *owner_pw=NULL, - char *user_pw=NULL) + PyObject * + write(int garbage=0, int clean=0, + int deflate=0, int deflate_images=0, int deflate_fonts=0, + int ascii=0, int expand=0, int pretty=0, int encryption=1, + int permissions=-1, char *owner_pw=NULL, char *user_pw=NULL) { PyObject *r = NULL; fz_output *out = NULL; @@ -1479,8 +1487,8 @@ struct Document opts.do_incremental = 0; opts.do_ascii = ascii; opts.do_compress = deflate; - opts.do_compress_images = deflate; - opts.do_compress_fonts = deflate; + opts.do_compress_images = deflate_images; + opts.do_compress_fonts = deflate_fonts; opts.do_decompress = expand; 
opts.do_garbage = garbage; opts.do_linear = 0; @@ -1491,6 +1499,8 @@ struct Document opts.permissions = permissions; if (owner_pw) { memcpy(&opts.opwd_utf8, owner_pw, strlen(owner_pw)+1); + } else if (user_pw) { + memcpy(&opts.opwd_utf8, user_pw, strlen(user_pw)+1); } if (user_pw) { memcpy(&opts.upwd_utf8, user_pw, strlen(user_pw)+1); @@ -1503,7 +1513,7 @@ struct Document fz_try(gctx) { ASSERT_PDF(pdf); if (pdf_count_pages(gctx, pdf) < 1) - THROWMSG("cannot save with zero pages"); + THROWMSG(gctx, "cannot save with zero pages"); JM_embedded_clean(gctx, pdf); JM_ensure_identity(gctx, pdf); res = fz_new_buffer(gctx, 8192); @@ -1609,7 +1619,7 @@ struct Document sa = MIN(sa, outCount); // but that is also the limit fz_try(gctx) { - if (!pdfout || !pdfsrc) THROWMSG("source or target not a PDF"); + if (!pdfout || !pdfsrc) THROWMSG(gctx, "source or target not a PDF"); JM_merge_range(gctx, pdfout, pdfsrc, fp, tp, sa, rotate, links, annots, show_progress, (pdf_graft_map *) _gmap); } fz_catch(gctx) { @@ -1619,9 +1629,9 @@ struct Document return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Create and insert a new page (PDF) - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_newPage, !result) CLOSECHECK(_newPage, """Make a new PDF page.""") %pythonappend _newPage %{self._reset_page_refs()%} @@ -1635,7 +1645,7 @@ struct Document fz_buffer *contents = NULL; fz_try(gctx) { ASSERT_PDF(pdf); - if (pno < -1) THROWMSG("bad page number(s)"); + if (pno < -1) THROWMSG(gctx, "bad page number(s)"); // create /Resources and /Contents objects resources = pdf_add_object_drop(gctx, pdf, pdf_new_dict(gctx, pdf, 1)); page_obj = pdf_add_page(gctx, pdf, mediabox, 0, resources, contents); @@ -1652,10 +1662,10 @@ struct Document return_none; } - 
//--------------------------------------------------------------------- + //------------------------------------------------------------------ // Create sub-document to keep only selected pages. // Parameter is a Python sequence of the wanted page numbers. - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(select, !result) %pythonprepend select %{"""Build sub-pdf with page numbers in the list.""" if self.isClosed or self.isEncrypted: @@ -1692,9 +1702,9 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // remove one page - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_deletePage, !result) PyObject *_deletePage(int pno) { @@ -1713,9 +1723,9 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return_none; } - //******************************************************************** + //------------------------------------------------------------------ // get document permissions - //******************************************************************** + //------------------------------------------------------------------ %pythoncode%{@property%} %pythonprepend permissions %{ """Document permissions.""" @@ -1825,7 +1835,7 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not fz_var(pageref); pdf_document *pdf = pdf_specifics(gctx, this_doc); fz_try(gctx) { - if (n >= pageCount) THROWMSG("bad page number(s)"); + if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); pageref = pdf_lookup_page_obj(gctx, pdf, n); } @@ -1851,7 +1861,7 @@ if len(pyliste) == 0 or min(pyliste) not in 
range(len(self)) or max(pyliste) not fz_var(pageref); pdf_document *pdf = pdf_specifics(gctx, this_doc); fz_try(gctx) { - if (n >= pageCount) THROWMSG("bad page number(s)"); + if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); pageref = pdf_lookup_page_obj(gctx, pdf, n); } @@ -1878,7 +1888,7 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not int pageCount = fz_count_pages(gctx, doc); int n = pno; // pno < 0 is allowed while (n < 0) n += pageCount; // make it non-negative - if (n >= pageCount) THROWMSG("bad page number(s)"); + if (n >= pageCount) THROWMSG(gctx, "bad page number(s)"); ASSERT_PDF(pdf); pageref = pdf_lookup_page_obj(gctx, pdf, n); rsrc = pdf_dict_get_inheritable(gctx, @@ -1983,13 +1993,13 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not fz_try(gctx) { ASSERT_PDF(pdf); if (!INRANGE(xref, 1, pdf_xref_len(gctx, pdf)-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); obj = pdf_new_indirect(gctx, pdf, xref, 0); pdf_obj *subtype = pdf_dict_get(gctx, obj, PDF_NAME(Subtype)); if (!pdf_name_eq(gctx, subtype, PDF_NAME(Image))) - THROWMSG("not an image"); + THROWMSG(gctx, "not an image"); pdf_obj *o = pdf_dict_get(gctx, obj, PDF_NAME(SMask)); if (o) smask = pdf_to_num(gctx, o); @@ -2084,10 +2094,10 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Delete all bookmarks (table of contents) - // returns the list of deleted (now available) xref numbers - //--------------------------------------------------------------------- + // returns list of deleted (now available) xref numbers + //------------------------------------------------------------------ CLOSECHECK(_delToC, """Delete the TOC.""") %pythonappend _delToC %{self.initData()%} PyObject *_delToC() @@ -2124,10 +2134,10 @@ if 
len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return xrefs; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Return outline xref by index. // Performs a linear search - //--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(outlineXref, """Get outline xref by index.""") int outlineXref(int index) { @@ -2145,9 +2155,9 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return JM_outline_xref(gctx, first, index); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Check: is xref a stream object? - //--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(isStream, """Check if xref is a stream object.""") PyObject *isStream(int xref=0) { @@ -2156,9 +2166,9 @@ if len(pyliste) == 0 or min(pyliste) not in range(len(self)) or max(pyliste) not return JM_BOOL(pdf_obj_num_is_stream(gctx, pdf, xref)); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Return or set NeedAppearances - //--------------------------------------------------------------------- + //------------------------------------------------------------------ %pythonprepend need_appearances %{"""Get/set the NeedAppearances value.""" if self.isClosed: @@ -2198,9 +2208,9 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Return the /SigFlags value - //--------------------------------------------------------------------- + 
//------------------------------------------------------------------ CLOSECHECK0(getSigFlags, """Get the /SigFlags value.""") int getSigFlags() { @@ -2224,9 +2234,9 @@ if not self.isFormPDF: return sigflag; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Check: is this an AcroForm with at least one field? - //--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(isFormPDF, """Check if PDF Form document.""") %pythoncode%{@property%} PyObject *isFormPDF() @@ -2255,9 +2265,9 @@ if not self.isFormPDF: } } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Return the list of field font resource names - //--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(FormFonts, """Get list of field font resource names.""") %pythoncode%{@property%} PyObject *FormFonts() @@ -2266,6 +2276,7 @@ if not self.isFormPDF: if (!pdf) return_none; // not a PDF pdf_obj *fonts = NULL; PyObject *liste = PyList_New(0); + fz_var(liste); fz_try(gctx) { fonts = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root), PDF_NAME(AcroForm), PDF_NAME(DR), PDF_NAME(Font), NULL); if (fonts && pdf_is_dict(gctx, fonts)) // fonts exist @@ -2278,13 +2289,16 @@ if not self.isFormPDF: } } } - fz_catch(gctx) return_none; // any problem yields None + fz_catch(gctx) { + Py_DECREF(liste); + return_none; // any problem yields None + } return liste; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Add a field font - //--------------------------------------------------------------------- + 
//------------------------------------------------------------------ FITZEXCEPTION(_addFormFont, !result) CLOSECHECK(_addFormFont, """Add new form font.""") PyObject *_addFormFont(char *name, char *font) @@ -2296,7 +2310,7 @@ if not self.isFormPDF: fonts = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root), PDF_NAME(AcroForm), PDF_NAME(DR), PDF_NAME(Font), NULL); if (!fonts || !pdf_is_dict(gctx, fonts)) - THROWMSG("PDF has no form fonts yet"); + THROWMSG(gctx, "PDF has no form fonts yet"); pdf_obj *k = pdf_new_name(gctx, (const char *) name); pdf_obj *v = JM_pdf_obj_from_str(gctx, pdf, font); pdf_dict_put(gctx, fonts, k, v); @@ -2305,9 +2319,9 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get Xref Number of Outline Root, create it if missing - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_getOLRootNumber, !result) CLOSECHECK(_getOLRootNumber, """Get xref of Outline Root, create it if missing.""") PyObject *_getOLRootNumber() @@ -2337,9 +2351,9 @@ if not self.isFormPDF: return Py_BuildValue("i", pdf_to_num(gctx, olroot)); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get a new Xref number - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_getNewXref, !result) CLOSECHECK(_getNewXref, """Make new xref.""") PyObject *_getNewXref() @@ -2355,9 +2369,9 @@ if not self.isFormPDF: return Py_BuildValue("i", pdf_create_object(gctx, pdf)); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get Length of Xref - 
//--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(_getXrefLength, """Get length of xref table.""") PyObject *_getXrefLength() { @@ -2367,9 +2381,9 @@ if not self.isFormPDF: return Py_BuildValue("i", xreflen); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get XML Metadata - //--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(getXmlMetadata, """Get document XML metadata.""") PyObject *getXmlMetadata() { @@ -2398,9 +2412,9 @@ if not self.isFormPDF: return rc; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get XML Metadata xref - //--------------------------------------------------------------------- + //------------------------------------------------------------------ CLOSECHECK0(_getXmlMetadataXref, """Get xref of document XML metadata.""") PyObject *_getXmlMetadataXref() { @@ -2409,7 +2423,7 @@ if not self.isFormPDF: pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); ASSERT_PDF(pdf); pdf_obj *root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root)); - if (!root) THROWMSG("PDF has no root"); + if (!root) THROWMSG(gctx, "PDF has no root"); pdf_obj *xml = pdf_dict_get(gctx, root, PDF_NAME(Metadata)); if (xml) xref = pdf_to_num(gctx, xml); } @@ -2417,9 +2431,9 @@ if not self.isFormPDF: return Py_BuildValue("i", xref); } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Delete XML Metadata - //--------------------------------------------------------------------- + //------------------------------------------------------------------ 
FITZEXCEPTION(_delXmlMetadata, !result) CLOSECHECK(_delXmlMetadata, """Delete XML metadata.""") PyObject *_delXmlMetadata() @@ -2437,11 +2451,11 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Set XML-based Metadata - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(setXmlMetadata, !result) - CLOSECHECK(setXmlMetadata, """Put XML metadata.""") + CLOSECHECK(setXmlMetadata, """Store XML metadata.""") PyObject *setXmlMetadata(char *metadata) { pdf_document *pdf = pdf_specifics(gctx, (fz_document *) $self); @@ -2449,7 +2463,7 @@ if not self.isFormPDF: fz_try(gctx) { ASSERT_PDF(pdf); pdf_obj *root = pdf_dict_get(gctx, pdf_trailer(gctx, pdf), PDF_NAME(Root)); - if (!root) THROWMSG("PDF has no root"); + if (!root) THROWMSG(gctx, "PDF has no root"); res = fz_new_buffer_from_copied_data(gctx, (const unsigned char *) metadata, strlen(metadata)); pdf_obj *xml = pdf_dict_get(gctx, root, PDF_NAME(Metadata)); if (xml) { @@ -2471,9 +2485,9 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get Object String of xref - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_getXrefString, !result) CLOSECHECK0(_getXrefString, """Get xref object source as a string.""") PyObject *_getXrefString(int xref, int compressed=0, int ascii=0) @@ -2487,7 +2501,7 @@ if not self.isFormPDF: ASSERT_PDF(pdf); int xreflen = pdf_xref_len(gctx, pdf); if (!INRANGE(xref, 1, xreflen-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); obj = pdf_load_object(gctx, pdf, xref); res = JM_object_to_buffer(gctx, 
pdf_resolve_indirect(gctx, obj), compressed, ascii); text = JM_EscapeStrFromBuffer(gctx, res); @@ -2500,9 +2514,9 @@ if not self.isFormPDF: return text; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get String of PDF trailer - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_getTrailerString, !result) CLOSECHECK0(_getTrailerString, """Get PDF trailer as a string.""") PyObject *_getTrailerString(int compressed=0, int ascii=0) @@ -2524,10 +2538,10 @@ if not self.isFormPDF: return text; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get compressed stream of an object by xref // return_none if not stream - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_getXrefStreamRaw, !result) CLOSECHECK(_getXrefStreamRaw, """Get xref stream without decompression.""") PyObject *_getXrefStreamRaw(int xref) @@ -2542,7 +2556,7 @@ if not self.isFormPDF: ASSERT_PDF(pdf); int xreflen = pdf_xref_len(gctx, pdf); if (!INRANGE(xref, 1, xreflen-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); obj = pdf_new_indirect(gctx, pdf, xref, 0); if (pdf_is_stream(gctx, obj)) { @@ -2562,10 +2576,10 @@ if not self.isFormPDF: return r; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Get decompressed stream of an object by xref // return_none if not stream - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_getXrefStream, !result) CLOSECHECK(_getXrefStream, """Get 
decompressed xref stream.""") PyObject *_getXrefStream(int xref) @@ -2580,7 +2594,7 @@ if not self.isFormPDF: ASSERT_PDF(pdf); int xreflen = pdf_xref_len(gctx, pdf); if (!INRANGE(xref, 1, xreflen-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); obj = pdf_new_indirect(gctx, pdf, xref, 0); if (pdf_is_stream(gctx, obj)) { @@ -2600,9 +2614,9 @@ if not self.isFormPDF: return r; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Update an Xref number with a new object given as a string - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_updateObject, !result) CLOSECHECK(_updateObject, """Replace object definition source.""") PyObject *_updateObject(int xref, char *text, struct Page *page = NULL) @@ -2613,7 +2627,7 @@ if not self.isFormPDF: ASSERT_PDF(pdf); int xreflen = pdf_xref_len(gctx, pdf); if (!INRANGE(xref, 1, xreflen-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); // create new object with passed-in string new_obj = JM_pdf_obj_from_str(gctx, pdf, text); pdf_update_object(gctx, pdf, xref, new_obj); @@ -2628,9 +2642,9 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Update a stream identified by its xref - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_updateStream, !result) CLOSECHECK(_updateStream, """Replace xref stream part.""") PyObject *_updateStream(int xref = 0, PyObject *stream = NULL, int new = 0) @@ -2644,13 +2658,13 @@ if not self.isFormPDF: ASSERT_PDF(pdf); int xreflen = pdf_xref_len(gctx, pdf); if (!INRANGE(xref, 1, xreflen-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); // 
get the object obj = pdf_new_indirect(gctx, pdf, xref, 0); if (!new && !pdf_is_stream(gctx, obj)) - THROWMSG("xref not a stream object"); + THROWMSG(gctx, "no stream object at xref"); res = JM_BufferFromBytes(gctx, stream); - if (!res) THROWMSG("bad type: 'stream'"); + if (!res) THROWMSG(gctx, "bad type: 'stream'"); JM_update_stream(gctx, pdf, obj, res, 1); } @@ -2664,9 +2678,9 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // Add or update metadata based on provided raw string - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_setMetadata, !result) CLOSECHECK(_setMetadata, """Set old style metadata.""") PyObject *_setMetadata(char *text) @@ -2699,9 +2713,9 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // create / refresh the page map - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_make_page_map, !result) CLOSECHECK0(_make_page_map, """Make an array page number -> page object.""") PyObject *_make_page_map() @@ -2719,9 +2733,9 @@ if not self.isFormPDF: } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // full (deep) copy of one page - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(fullcopyPage, !result) CLOSECHECK0(fullcopyPage, """Make full page duplication.""") %pythonappend fullcopyPage %{self._reset_page_refs()%} @@ -2735,7 +2749,7 @@ if not self.isFormPDF: 
ASSERT_PDF(pdf); if (!INRANGE(pno, 0, pageCount - 1) || !INRANGE(to, -1, pageCount - 1)) - THROWMSG("bad page number(s)"); + THROWMSG(gctx, "bad page number(s)"); pdf_obj *page1 = pdf_resolve_indirect(gctx, pdf_lookup_page_obj(gctx, pdf, pno)); @@ -2797,9 +2811,9 @@ if not self.isFormPDF: } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ // move or copy one page - //--------------------------------------------------------------------- + //------------------------------------------------------------------ FITZEXCEPTION(_move_copy_page, !result) CLOSECHECK0(_move_copy_page, """Move or copy a PDF page reference.""") %pythonappend _move_copy_page %{self._reset_page_refs()%} @@ -2934,9 +2948,406 @@ if not self.isFormPDF: return_none; } - //--------------------------------------------------------------------- + //------------------------------------------------------------------ + // PDF Optional Content functions + //------------------------------------------------------------------ + FITZEXCEPTION(layerConfigs, !result) + CLOSECHECK0(layerConfigs, """Show optional content configurations.""") + PyObject *layerConfigs() + { + PyObject *rc = NULL; + pdf_layer_config info = {NULL, NULL}; + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + int i, n = pdf_count_layer_configs(gctx, pdf); + if (n == 1) { + pdf_obj *obj = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), + PDF_NAME(Root), PDF_NAME(OCProperties), PDF_NAME(Configs), NULL); + if (!pdf_is_array(gctx, obj)) n = 0; + } + rc = PyTuple_New(n); + for (i = 0; i < n; i++) { + pdf_layer_config_info(gctx, pdf, i, &info); + PyObject *item = Py_BuildValue("{s:i,s:s,s:s}", + "number", i, "name", info.name, "creator", info.creator); + PyTuple_SET_ITEM(rc, i, item); + info.name = NULL; + info.creator = NULL; + } + } + fz_catch(gctx) { + Py_CLEAR(rc); + return NULL; + } + return rc; + } + + 
+ FITZEXCEPTION(setLayerConfig, !result) + CLOSECHECK0(setLayerConfig, """Activate a optional content configuration.""") + PyObject *setLayerConfig(int config, int as_default=0) + { + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + pdf_obj *cfgs = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), + PDF_NAME(Root), PDF_NAME(OCProperties), PDF_NAME(Configs), NULL); + if (!pdf_is_array(gctx, cfgs) || !pdf_array_len(gctx, cfgs)) { + if (config < 1) goto finished; + THROWMSG(gctx, "bad config number"); + } + if (config < 0) goto finished; + pdf_select_layer_config(gctx, pdf, config); + if (as_default) { + pdf_set_layer_config_as_default(gctx, pdf); + pdf_read_ocg(gctx, pdf); + } + finished:; + } + fz_catch(gctx) { + return NULL; + } + Py_RETURN_NONE; + } + + + FITZEXCEPTION(getOCStates, !result) + CLOSECHECK0(getOCStates, """Content of ON, OFF, RBGroups of a configuration.""") + PyObject *getOCStates(int config=-1) + { + PyObject *rc; + pdf_obj *obj = NULL; + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + pdf_obj *ocp = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), + PDF_NAME(Root), PDF_NAME(OCProperties), NULL); + if (!ocp) { + rc = Py_BuildValue("s", NULL); + goto finished; + } + if (config == -1) { + obj = pdf_dict_get(gctx, ocp, PDF_NAME(D)); + } else { + obj = pdf_array_get(gctx, pdf_dict_get(gctx, ocp, PDF_NAME(Configs)), config); + } + if (!obj) THROWMSG(gctx, "bad config number"); + rc = JM_get_ocg_arrays(gctx, obj); + finished:; + } + fz_catch(gctx) { + Py_CLEAR(rc); + return NULL; + } + return rc; + } + + + FITZEXCEPTION(setOCStates, !result) + %pythonprepend setOCStates %{"""Set ON, OFF, RBGroups of a configuration.""" +if self.isClosed: + raise ValueError("document closed") +if on is not None and type(on) not in (list, tuple): + raise ValueError("bad type: 'on'") +if off is not None and type(off) not in (list, tuple): + raise ValueError("bad type: 'off'") +if 
rbgroups is not None and type(rbgroups) not in (list, tuple): + raise ValueError("bad type: 'rbgroups'") +if basestate is not None: + basestate = basestate.upper() + if basestate == "UNCHANGED": + basestate = "Unchanged" + if basestate not in ("ON", "OFF", "Unchanged"): + raise ValueError("bad value: 'basestate'") +%} + PyObject * + setOCStates(int config, const char *basestate=NULL, PyObject *on=NULL, + PyObject *off=NULL, PyObject *rbgroups=NULL) + { + pdf_obj *obj = NULL; + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + pdf_obj *ocp = pdf_dict_getl(gctx, pdf_trailer(gctx, pdf), + PDF_NAME(Root), PDF_NAME(OCProperties), NULL); + if (!ocp) { + goto finished; + } + if (config == -1) { + obj = pdf_dict_get(gctx, ocp, PDF_NAME(D)); + } else { + obj = pdf_array_get(gctx, pdf_dict_get(gctx, ocp, PDF_NAME(Configs)), config); + } + if (!obj) THROWMSG(gctx, "bad config number"); + JM_set_ocg_arrays(gctx, obj, basestate, on, off, rbgroups); + pdf_read_ocg(gctx, pdf); + finished:; + } + fz_catch(gctx) { + return NULL; + } + Py_RETURN_NONE; + } + + + FITZEXCEPTION(addLayerConfig, !result) + CLOSECHECK0(addLayerConfig, """Add new optional content configuration.""") + PyObject *addLayerConfig(char *name, char *creator=NULL, PyObject *on=NULL) + { + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + JM_add_layer_config(gctx, pdf, name, creator, on); + pdf_read_ocg(gctx, pdf); + } + fz_catch(gctx) { + return NULL; + } + Py_RETURN_NONE; + } + + + FITZEXCEPTION(layerUIConfigs, !result) + CLOSECHECK0(layerUIConfigs, """Show OC visibility status modifyable by user.""") + PyObject *layerUIConfigs() + { + typedef struct + { + const char *text; + int depth; + pdf_layer_config_ui_type type; + int selected; + int locked; + } pdf_layer_config_ui; + PyObject *rc = NULL; + + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + 
pdf_layer_config_ui info; + int i, n = pdf_count_layer_config_ui(gctx, pdf); + rc = PyTuple_New(n); + char *type = NULL; + for (i = 0; i < n; i++) { + pdf_layer_config_ui_info(gctx, pdf, i, (void *) &info); + switch (info.type) + { + case (1): type = "checkbox"; break; + case (2): type = "radiobox"; break; + default: type = "label"; break; + } + PyObject *item = Py_BuildValue("{s:i,s:s,s:i,s:s,s:O,s:O}", + "number", i, + "text", info.text, + "depth", info.depth, + "type", type, + "on", JM_BOOL(info.selected), + "locked", JM_BOOL(info.locked)); + PyTuple_SET_ITEM(rc, i, item); + } + } + fz_catch(gctx) { + Py_CLEAR(rc); + return NULL; + } + return rc; + } + + + FITZEXCEPTION(setLayerUIConfig, !result) + CLOSECHECK0(setLayerUIConfig, """Set / unset OC intent configuration.""") + PyObject *setLayerUIConfig(int number, int action=0) + { + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + switch (action) + { + case (1): + pdf_toggle_layer_config_ui(gctx, pdf, number); + break; + case (2): + pdf_deselect_layer_config_ui(gctx, pdf, number); + break; + default: + pdf_select_layer_config_ui(gctx, pdf, number); + break; + } + } + fz_catch(gctx) { + return NULL; + } + Py_RETURN_NONE; + } + + + FITZEXCEPTION(getOCGs, !result) + CLOSECHECK0(getOCGs, """Show existing optional content groups.""") + PyObject * + getOCGs() + { + PyObject *rc = NULL; + pdf_obj *ci = pdf_new_name(gctx, "CreatorInfo"); + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + pdf_obj *ocgs = pdf_dict_getl(gctx, + pdf_dict_get(gctx, + pdf_trailer(gctx, pdf), PDF_NAME(Root)), + PDF_NAME(OCProperties), PDF_NAME(OCGs), NULL); + rc = PyDict_New(); + if (!pdf_is_array(gctx, ocgs)) goto fertig; + int i, n = pdf_array_len(gctx, ocgs); + for (i = 0; i < n; i++) { + pdf_obj *ocg = pdf_array_get(gctx, ocgs, i); + int xref = pdf_to_num(gctx, ocg); + const char *name = pdf_to_text_string(gctx, pdf_dict_get(gctx, ocg, 
PDF_NAME(Name))); + pdf_obj *obj = pdf_dict_getl(gctx, ocg, PDF_NAME(Usage), ci, PDF_NAME(Subtype), NULL); + const char *usage = NULL; + if (obj) usage = pdf_to_name(gctx, obj); + PyObject *intents = PyList_New(0); + pdf_obj *intent = pdf_dict_get(gctx, ocg, PDF_NAME(Intent)); + if (intent) { + if (pdf_is_name(gctx, intent)) { + LIST_APPEND_DROP(intents, Py_BuildValue("s", pdf_to_name(gctx, intent))); + } else if (pdf_is_array(gctx, intent)) { + int j, m = pdf_array_len(gctx, intent); + for (j = 0; j < m; j++) { + pdf_obj *o = pdf_array_get(gctx, intent, j); + if (pdf_is_name(gctx, o)) + LIST_APPEND_DROP(intents, Py_BuildValue("s", pdf_to_name(gctx, o))); + } + } + } + pdf_ocg_descriptor *desc = pdf->ocg; + int hidden = pdf_is_hidden_ocg(gctx, desc, NULL, usage, ocg); + PyObject *item = Py_BuildValue("{s:s,s:O,s:O,s:s}", + "name", name, + "intent", intents, + "on", JM_BOOL(!hidden), + "usage", usage); + Py_DECREF(intents); + PyObject *temp = Py_BuildValue("i", xref); + DICT_SETITEM_DROP(rc, temp, item); + Py_DECREF(temp); + } + fertig:; + } + fz_always(gctx) { + pdf_drop_obj(gctx, ci); + } + fz_catch(gctx) { + Py_CLEAR(rc); + return NULL; + } + return rc; + } + + FITZEXCEPTION(addOCG, !result) + CLOSECHECK0(addOCG, """Add new optional content group.""") + PyObject * + addOCG(char *name, int config=-1, int on=1, PyObject *intent=NULL, const char *usage=NULL) + { + PyObject *xref = NULL; + pdf_obj *obj = NULL, *cfg = NULL; + pdf_obj *indocg = NULL; + fz_try(gctx) { + pdf_document *pdf = pdf_specifics(gctx, (fz_document *) self); + ASSERT_PDF(pdf); + + // ------------------------------ + // make the OCG + // ------------------------------ + pdf_obj *ocg = pdf_add_new_dict(gctx, pdf, 3); + pdf_dict_put(gctx, ocg, PDF_NAME(Type), PDF_NAME(OCG)); + pdf_dict_put_text_string(gctx, ocg, PDF_NAME(Name), name); + pdf_obj *intents = pdf_dict_put_array(gctx, ocg, PDF_NAME(Intent), 2); + if (!EXISTS(intent)) { + pdf_array_push(gctx, intents, PDF_NAME(View)); + } else if 
(!PyUnicode_Check(intent)) { + int i, n = PySequence_Size(intent); + for (i = 0; i < n; i++) { + PyObject *item = PySequence_ITEM(intent, i); + char *c = JM_Python_str_AsChar(item); + if (c) { + pdf_array_push(gctx, intents, pdf_new_name(gctx, c)); + JM_Python_str_DelForPy3(c); + } + Py_DECREF(item); + } + } else { + char *c = JM_Python_str_AsChar(intent); + if (c) { + pdf_array_push(gctx, intents, pdf_new_name(gctx, c)); + JM_Python_str_DelForPy3(c); + } + } + pdf_obj *use_for = pdf_dict_put_dict(gctx, ocg, PDF_NAME(Usage), 3); + pdf_obj *ci_name = pdf_new_name(gctx, "CreatorInfo"); + pdf_obj *cre_info = pdf_dict_put_dict(gctx, use_for, ci_name, 2); + pdf_dict_put_text_string(gctx, cre_info, PDF_NAME(Creator), "PyMuPDF"); + if (usage) { + pdf_dict_put_name(gctx, cre_info, PDF_NAME(Subtype), usage); + } else { + pdf_dict_put_name(gctx, cre_info, PDF_NAME(Subtype), "Artwork"); + } + indocg = pdf_add_object(gctx, pdf, ocg); + + // ------------------------------ + // Insert OCG in the right config + // ------------------------------ + pdf_obj *ocp = JM_ensure_ocproperties(gctx, pdf); + obj = pdf_dict_get(gctx, ocp, PDF_NAME(OCGs)); + pdf_array_push(gctx, obj, indocg); + + if (config > -1) { + obj = pdf_dict_get(gctx, ocp, PDF_NAME(Configs)); + if (!pdf_is_array(gctx, obj)) { + THROWMSG(gctx, "bad config number"); + } + cfg = pdf_array_get(gctx, obj, config); + if (!cfg) { + THROWMSG(gctx, "bad config number"); + } + } else { + cfg = pdf_dict_get(gctx, ocp, PDF_NAME(D)); + } + + obj = pdf_dict_get(gctx, cfg, PDF_NAME(Order)); + if (!obj) { + obj = pdf_dict_put_array(gctx, cfg, PDF_NAME(Order), 1); + } + pdf_array_push(gctx, obj, indocg); + if (on) { + obj = pdf_dict_get(gctx, cfg, PDF_NAME(ON)); + if (!obj) { + obj = pdf_dict_put_array(gctx, cfg, PDF_NAME(ON), 1); + } + } else { + obj = pdf_dict_get(gctx, cfg, PDF_NAME(OFF)); + if (!obj) { + obj = pdf_dict_put_array(gctx, cfg, PDF_NAME(OFF), 1); + } + } + pdf_array_push(gctx, obj, indocg); + pdf_read_ocg(gctx, pdf); + 
xref = Py_BuildValue("i", pdf_to_num(gctx, indocg)); + } + fz_always(gctx) { + pdf_drop_obj(gctx, indocg); + } + fz_catch(gctx) { + Py_CLEAR(xref); + return NULL; + } + return xref; + } + + + //------------------------------------------------------------------ // Initialize document: set outline and metadata properties - //--------------------------------------------------------------------- + //------------------------------------------------------------------ %pythoncode %{ def initData(self): if self.isEncrypted: @@ -3277,6 +3688,7 @@ if not self.isFormPDF: self.Graftmaps = {} self.ShownPages = {} + self.InsertedImages = {} self.stream = None self._reset_page_refs = DUMMY self.__swig_destroy__ = DUMMY @@ -3707,7 +4119,7 @@ struct Page { fz_var(annot); fz_try(gctx) { ASSERT_PDF(page); - if (!PySequence_Check(list)) THROWMSG("arg must be a sequence"); + if (!PySequence_Check(list)) THROWMSG(gctx, "arg must be a sequence"); pdf_page_transform(gctx, page, NULL, &ctm); inv_ctm = fz_invert_matrix(ctm); annot = pdf_create_annot(gctx, page, PDF_ANNOT_INK); @@ -3722,7 +4134,7 @@ struct Page { for (i = 0; i < n1; i++) { p = PySequence_ITEM(sublist, i); if (!PySequence_Check(p) || PySequence_Size(p) != 2) - THROWMSG("3rd level entries must be pairs of floats"); + THROWMSG(gctx, "3rd level entries must be pairs of floats"); point = fz_transform_point(JM_point_from_py(p), inv_ctm); Py_CLEAR(p); pdf_array_push_real(gctx, stroke, point.x); @@ -3772,7 +4184,7 @@ struct Page { ASSERT_PDF(page); fz_rect r = JM_rect_from_py(rect); if (fz_is_infinite_rect(r) || fz_is_empty_rect(r)) - THROWMSG("rect must be finite and not empty"); + THROWMSG(gctx, "rect must be finite and not empty"); if (INRANGE(stamp, 0, n-1)) name = stamp_id[stamp]; annot = pdf_create_annot(gctx, page, PDF_ANNOT_STAMP); @@ -3814,7 +4226,7 @@ struct Page { fz_try(gctx) { ASSERT_PDF(page); filebuf = JM_BufferFromBytes(gctx, buffer); - if (!filebuf) THROWMSG("bad type: 'buffer'"); + if (!filebuf) THROWMSG(gctx, "bad 
type: 'buffer'"); annot = pdf_create_annot(gctx, page, PDF_ANNOT_FILE_ATTACHMENT); r = pdf_annot_rect(gctx, annot); r = fz_make_rect(p.x, p.y, p.x + r.x1 - r.x0, p.y + r.y1 - r.y0); @@ -3908,7 +4320,7 @@ struct Page { fz_try(gctx) { fz_rect r = JM_rect_from_py(rect); if (fz_is_infinite_rect(r) || fz_is_empty_rect(r)) - THROWMSG("rect must be finite and not empty"); + THROWMSG(gctx, "rect must be finite and not empty"); annot = pdf_create_annot(gctx, page, annot_type); pdf_set_annot_rect(gctx, annot, r); JM_add_annot_id(gctx, annot, "fitzannot"); @@ -3933,13 +4345,13 @@ struct Page { pdf_annot *annot = NULL; fz_try(gctx) { Py_ssize_t i, n = PySequence_Size(points); - if (n < 2) THROWMSG("bad list of points"); + if (n < 2) THROWMSG(gctx, "bad list of points"); annot = pdf_create_annot(gctx, page, annot_type); for (i = 0; i < n; i++) { PyObject *p = PySequence_ITEM(points, i); if (PySequence_Size(p) != 2) { Py_DECREF(p); - THROWMSG("bad list of points"); + THROWMSG(gctx, "bad list of points"); } fz_point point = JM_point_from_py(p); Py_DECREF(p); @@ -3981,7 +4393,7 @@ struct Page { pdf_annot *annot = NULL; fz_try(gctx) { if (fz_is_infinite_rect(r) || fz_is_empty_rect(r)) - THROWMSG("rect must be finite and not empty"); + THROWMSG(gctx, "rect must be finite and not empty"); annot = pdf_create_annot(gctx, page, PDF_ANNOT_FREE_TEXT); pdf_set_annot_contents(gctx, annot, text); pdf_set_annot_rect(gctx, annot, r); @@ -4365,7 +4777,7 @@ struct Page { fz_var(annot); fz_try(gctx) { annot = JM_create_widget(gctx, pdf, page, field_type, field_name); - if (!annot) THROWMSG("could not create widget"); + if (!annot) THROWMSG(gctx, "could not create widget"); JM_add_annot_id(gctx, annot, "fitzwidget"); } fz_catch(gctx) { @@ -4639,7 +5051,7 @@ struct Page { fz_rect mediabox = JM_rect_from_py(rect); if (fz_is_empty_rect(mediabox) || fz_is_infinite_rect(mediabox)) { - THROWMSG("rect must be finite and not empty"); + THROWMSG(gctx, "rect must be finite and not empty"); } 
pdf_dict_put_rect(gctx, page->obj, PDF_NAME(MediaBox), mediabox); pdf_dict_put_rect(gctx, page->obj, PDF_NAME(CropBox), mediabox); @@ -4940,7 +5352,7 @@ except: txtpy = PySequence_ITEM(linklist, (Py_ssize_t) i); text = JM_Python_str_AsChar(txtpy); Py_CLEAR(txtpy); - if (!text) THROWMSG("bad linklist item"); + if (!text) THROWMSG(gctx, "bad linklist item"); annot = pdf_add_object_drop(gctx, page->doc, JM_pdf_obj_from_str(gctx, page->doc, text)); JM_Python_str_DelForPy3(text); @@ -5030,7 +5442,7 @@ except: // Show a PDF page //--------------------------------------------------------------------- FITZEXCEPTION(_showPDFpage, !result) - PyObject *_showPDFpage(struct Page *fz_srcpage, int overlay=1, PyObject *matrix=NULL, int xref=0, PyObject *clip = NULL, struct Graftmap *graftmap = NULL, char *_imgname = NULL) + PyObject *_showPDFpage(struct Page *fz_srcpage, int overlay=1, PyObject *matrix=NULL, int xref=0, int oc=0, PyObject *clip = NULL, struct Graftmap *graftmap = NULL, char *_imgname = NULL) { pdf_obj *xobj1, *xobj2, *resources; fz_buffer *res=NULL, *nres=NULL; @@ -5063,7 +5475,9 @@ except: fz_append_string(gctx, res, "/fullpage Do"); xobj2 = pdf_new_xobject(gctx, pdfout, cropbox, mat, subres, res); - + if (oc > 0) { + JM_add_oc_object(gctx, pdfout, pdf_resolve_indirect(gctx, xobj2), oc); + } pdf_drop_obj(gctx, subres); fz_drop_buffer(gctx, res); @@ -5075,8 +5489,7 @@ except: resources = pdf_dict_get_inheritable(gctx, tpageref, PDF_NAME(Resources)); subres = pdf_dict_get(gctx, resources, PDF_NAME(XObject)); if (!subres) { - subres = pdf_new_dict(gctx, pdfout, 10); - pdf_dict_putl(gctx, tpageref, subres, PDF_NAME(Resources), PDF_NAME(XObject), NULL); + subres = pdf_dict_put_dict(gctx, resources, PDF_NAME(XObject), 5); } pdf_dict_puts(gctx, subres, _imgname, xobj2); @@ -5102,7 +5515,7 @@ except: // insert an image //--------------------------------------------------------------------- FITZEXCEPTION(_insertImage, !result) - PyObject *_insertImage(const char 
*filename=NULL, struct Pixmap *pixmap=NULL, PyObject *stream=NULL, PyObject *imask=NULL, int overlay=1, PyObject *matrix=NULL, + PyObject *_insertImage(const char *filename=NULL, struct Pixmap *pixmap=NULL, PyObject *stream=NULL, PyObject *imask=NULL, int overlay=1, int oc=0, int xref = 0, PyObject *matrix=NULL, const char *_imgname=NULL, PyObject *_imgpointer=NULL) { pdf_page *page = pdf_page_from_fz_page(gctx, (fz_page *) $self); @@ -5115,10 +5528,14 @@ except: fz_buffer *nres = NULL, *imgbuf = NULL, *maskbuf = NULL; fz_matrix mat = JM_matrix_from_py(matrix); // pre-calculated fz_compressed_buffer *cbuf1 = NULL; + int img_xref = 0; const char *template = "\nq\n%g %g %g %g %g %g cm\n/%s Do\nQ\n"; fz_image *zimg = NULL, *image = NULL; fz_try(gctx) { + if (xref > 0) { + goto image_exists; + } //------------------------------------------------------------- // create the image //------------------------------------------------------------- @@ -5137,7 +5554,7 @@ except: fz_image_resolution(image, &xres, &yres); if (EXISTS(imask)) { cbuf1 = fz_compressed_image_buffer(gctx, image); - if (!cbuf1) THROWMSG("cannot mask uncompressed image"); + if (!cbuf1) THROWMSG(gctx, "cannot mask uncompressed image"); maskbuf = JM_BufferFromBytes(gctx, imask); mask = fz_new_image_from_buffer(gctx, maskbuf); zimg = fz_new_image_from_compressed_buffer(gctx, w, h, @@ -5159,6 +5576,9 @@ except: fz_drop_image(gctx, image); image = zimg; zimg = NULL; + } else { + fz_drop_pixmap(gctx, pix); + pix = NULL; } } } else { // pixmap specified @@ -5177,19 +5597,25 @@ except: //------------------------------------------------------------- // image created - now put it in the PDF //------------------------------------------------------------- + image_exists:; pdf = page->doc; // owning PDF // get /Resources, /XObject resources = pdf_dict_get_inheritable(gctx, page->obj, PDF_NAME(Resources)); xobject = pdf_dict_get(gctx, resources, PDF_NAME(XObject)); if (!xobject) { // has no XObject yet, create one - 
xobject = pdf_new_dict(gctx, pdf, 10); - pdf_dict_putl_drop(gctx, page->obj, xobject, PDF_NAME(Resources), PDF_NAME(XObject), NULL); + xobject = pdf_dict_put_dict(gctx, resources, PDF_NAME(XObject), 5); } - - ref = pdf_add_image(gctx, pdf, image); - pdf_dict_puts(gctx, xobject, _imgname, ref); // update XObject - + if (xref > 0) { + ref = pdf_new_indirect(gctx, page->doc, xref, 0); + img_xref = xref; + } else { + ref = pdf_add_image(gctx, pdf, image); + img_xref = pdf_to_num(gctx, ref); + } + if (oc) JM_add_oc_object(gctx, pdf, ref, oc); + + pdf_dict_puts_drop(gctx, xobject, _imgname, ref); // update XObject // make contents stream that invokes the image nres = fz_new_buffer(gctx, 50); fz_append_printf(gctx, nres, template, @@ -5213,7 +5639,7 @@ except: return NULL; } pdf->dirty = 1; - return_none; + return Py_BuildValue("i", img_xref); } //--------------------------------------------------------------------- @@ -5363,7 +5789,7 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, font = fz_new_font_from_file(gctx, NULL, fontfile, idx, 0); } else { res = JM_BufferFromBytes(gctx, fontbuffer); - if (!res) THROWMSG("need one of fontfile, fontbuffer"); + if (!res) THROWMSG(gctx, "need one of fontfile, fontbuffer"); font = fz_new_font_from_buffer(gctx, NULL, res, idx, 0); } @@ -5504,11 +5930,11 @@ def insertFont(self, fontname="helv", fontfile=None, fontbuffer=None, ASSERT_PDF(page); if (!INRANGE(xref, 1, pdf_xref_len(gctx, page->doc) - 1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); contents = pdf_new_indirect(gctx, page->doc, xref, 0); if (!pdf_is_stream(gctx, contents)) - THROWMSG("xref is not a stream"); + THROWMSG(gctx, "no stream at xref"); pdf_dict_put_drop(gctx, page->obj, PDF_NAME(Contents), contents); } @@ -5711,7 +6137,7 @@ Pixmap(PDFdoc, xref) - from an image at xref in a PDF document. 
fz_pixmap *pm = NULL; fz_try(gctx) { if (!fz_pixmap_colorspace(gctx, (fz_pixmap *) spix)) - THROWMSG("cannot copy pixmap with NULL colorspace"); + THROWMSG(gctx, "cannot copy pixmap with NULL colorspace"); pm = fz_convert_pixmap(gctx, (fz_pixmap *) spix, (fz_colorspace *) cs, NULL, NULL, fz_default_color_params, 1); } fz_catch(gctx) { @@ -5753,10 +6179,10 @@ Pixmap(PDFdoc, xref) - from an image at xref in a PDF document. fz_separations *seps = NULL; fz_try(gctx) { if (!INRANGE(alpha, 0, 1)) - THROWMSG("bad alpha value"); + THROWMSG(gctx, "bad alpha value"); fz_colorspace *cs = fz_pixmap_colorspace(gctx, src_pix); if (!cs && !alpha) - THROWMSG("cannot drop alpha for 'NULL' colorspace"); + THROWMSG(gctx, "cannot drop alpha for 'NULL' colorspace"); n = fz_pixmap_colorants(gctx, src_pix); w = fz_pixmap_width(gctx, src_pix); h = fz_pixmap_height(gctx, src_pix); @@ -5803,9 +6229,9 @@ Pixmap(PDFdoc, xref) - from an image at xref in a PDF document. size_t size = 0; unsigned char *c = NULL; res = JM_BufferFromBytes(gctx, samples); - if (!res) THROWMSG("bad samples data"); + if (!res) THROWMSG(gctx, "bad samples data"); size = fz_buffer_storage(gctx, res, &c); - if (stride * h != size) THROWMSG("bad samples length"); + if (stride * h != size) THROWMSG(gctx, "bad samples length"); pm = fz_new_pixmap(gctx, (fz_colorspace *) cs, w, h, seps, alpha); memcpy(pm->samples, c, size); } @@ -5852,7 +6278,7 @@ Pixmap(PDFdoc, xref) - from an image at xref in a PDF document. fz_pixmap *pm = NULL; fz_try(gctx) { res = JM_BufferFromBytes(gctx, imagedata); - if (!res) THROWMSG("bad image data"); + if (!res) THROWMSG(gctx, "bad image data"); img = fz_new_image_from_buffer(gctx, res); pm = fz_get_pixmap_from_image(gctx, img, NULL, NULL, NULL, NULL); int xres, yres; @@ -5885,11 +6311,11 @@ Pixmap(PDFdoc, xref) - from an image at xref in a PDF document. 
ASSERT_PDF(pdf); int xreflen = pdf_xref_len(gctx, pdf); if (!INRANGE(xref, 1, xreflen-1)) - THROWMSG("bad xref"); + THROWMSG(gctx, "bad xref"); ref = pdf_new_indirect(gctx, pdf, xref, 0); type = pdf_dict_get(gctx, ref, PDF_NAME(Subtype)); if (!pdf_name_eq(gctx, type, PDF_NAME(Image))) - THROWMSG("not an image"); + THROWMSG(gctx, "not an image"); img = pdf_load_image(gctx, pdf, ref); pix = fz_get_pixmap_from_image(gctx, img, NULL, NULL, NULL, NULL); } @@ -5987,9 +6413,9 @@ if not self.colorspace or self.colorspace.n > 3: fz_try(gctx) { fz_pixmap *pm = (fz_pixmap *) $self, *src_pix = (fz_pixmap *) src; if (!fz_pixmap_colorspace(gctx, src_pix)) - THROWMSG("cannot copy pixmap with NULL colorspace"); + THROWMSG(gctx, "cannot copy pixmap with NULL colorspace"); if (pm->alpha != src_pix->alpha) - THROWMSG("source and target alpha must be equal"); + THROWMSG(gctx, "source and target alpha must be equal"); fz_copy_pixmap_rect(gctx, pm, src_pix, JM_irect_from_py(bbox), NULL); } fz_catch(gctx) { @@ -6010,7 +6436,7 @@ If omitted, set alphas to 255."""%} fz_buffer *res = NULL; fz_pixmap *pix = (fz_pixmap *) $self; fz_try(gctx) { - if (pix->alpha == 0) THROWMSG("pixmap has no alpha"); + if (pix->alpha == 0) THROWMSG(gctx, "pixmap has no alpha"); size_t n = fz_pixmap_colorants(gctx, pix); size_t w = fz_pixmap_width(gctx, pix); size_t h = fz_pixmap_height(gctx, pix); @@ -6022,9 +6448,9 @@ If omitted, set alphas to 255."""%} if (res) { data_len = fz_buffer_storage(gctx, res, &data); if (data && data_len < w * h) - THROWMSG("not enough alpha values"); + THROWMSG(gctx, "not enough alpha values"); } - else THROWMSG("bad type: 'alphavalues'"); + else THROWMSG(gctx, "bad type: 'alphavalues'"); } size_t i = 0, k = 0, j = 0; while (i < balen) { @@ -6273,7 +6699,7 @@ Last item is the alpha if Pixmap.alpha is true."""%} fz_try(gctx) { fz_pixmap *pm = (fz_pixmap *) $self; if (!INRANGE(x, 0, pm->w - 1) || !INRANGE(y, 0, pm->h - 1)) - THROWMSG("coordinates outside image"); + THROWMSG(gctx, 
"outside image"); int n = pm->n; int stride = fz_pixmap_stride(gctx, pm); int j, i = stride * y + n * x; @@ -6299,17 +6725,17 @@ Last item is the alpha if Pixmap.alpha is true."""%} fz_try(gctx) { fz_pixmap *pm = (fz_pixmap *) $self; if (!INRANGE(x, 0, pm->w - 1) || !INRANGE(y, 0, pm->h - 1)) - THROWMSG("outside image"); + THROWMSG(gctx, "outside image"); int n = pm->n; if (!PySequence_Check(color) || PySequence_Size(color) != n) - THROWMSG("bad color arg"); + THROWMSG(gctx, "bad color arg"); int i, j; unsigned char c[5]; for (j = 0; j < n; j++) { if (JM_INT_ITEM(color, j, &i) == 1) - THROWMSG("bad color sequence"); + THROWMSG(gctx, "bad color sequence"); if (!INRANGE(i, 0, 255)) - THROWMSG("bad color sequence"); + THROWMSG(gctx, "bad color sequence"); c[j] = (unsigned char) i; } int stride = fz_pixmap_stride(gctx, pm); @@ -6364,14 +6790,14 @@ Use pillowWrite to reflect this in output image."""%} fz_pixmap *pm = (fz_pixmap *) $self; Py_ssize_t j, n = (Py_ssize_t) pm->n; if (!PySequence_Check(color) || PySequence_Size(color) != n) - THROWMSG("bad color arg"); + THROWMSG(gctx, "bad color arg"); unsigned char c[5]; int i; for (j = 0; j < n; j++) { if (JM_INT_ITEM(color, j, &i) == 1) - THROWMSG("bad color component"); + THROWMSG(gctx, "bad color component"); if (!INRANGE(i, 0, 255)) - THROWMSG("bad color component"); + THROWMSG(gctx, "bad color component"); c[j] = (unsigned char) i; } i = JM_fill_pixmap_rect_with_color(gctx, pm, c, JM_irect_from_py(bbox)); @@ -6880,7 +7306,7 @@ struct Annot fz_try(gctx) { pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP), PDF_NAME(N), NULL); - if (!ap) THROWMSG("annot has no appearance stream"); + if (!ap) THROWMSG(gctx, "annot has no appearance stream"); fz_matrix mat = JM_matrix_from_py(matrix); pdf_dict_put_matrix(gctx, ap, PDF_NAME(Matrix), mat); } @@ -6911,7 +7337,7 @@ struct Annot fz_try(gctx) { pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP), PDF_NAME(N), NULL); - if (!ap) THROWMSG("annot has no appearance 
stream"); + if (!ap) THROWMSG(gctx, "annot has no appearance stream"); fz_rect rect = JM_rect_from_py(bbox); pdf_dict_put_rect(gctx, ap, PDF_NAME(BBox), rect); } @@ -6988,6 +7414,53 @@ struct Annot } + //--------------------------------------------------------------------- + // annotation set optional content + //--------------------------------------------------------------------- + FITZEXCEPTION(getOC, !result) + PARENTCHECK(getOC, """Get annotation optional content reference.""") + PyObject *getOC() + { + PyObject *oc = NULL; + fz_try(gctx) { + pdf_annot *annot = (pdf_annot *) $self; + pdf_obj *obj = pdf_dict_get(gctx, annot->obj, PDF_NAME(OC)); + if (!obj) { + oc = Py_BuildValue("i", 0); + } else { + int n = pdf_to_num(gctx, obj); + oc = Py_BuildValue("i", n); + } + } + fz_catch(gctx) { + return NULL; + } + return oc;; + } + + + //--------------------------------------------------------------------- + // annotation set optional content + //--------------------------------------------------------------------- + FITZEXCEPTION(setOC, !result) + PARENTCHECK(setOC, """Set annotation optional content reference.""") + PyObject *setOC(int oc=0) + { + fz_try(gctx) { + pdf_annot *annot = (pdf_annot *) $self; + if (!oc) { + pdf_dict_del(gctx, annot->obj, PDF_NAME(OC)); + } else { + JM_add_oc_object(gctx, pdf_get_bound_document(gctx, annot->obj), annot->obj, oc); + } + } + fz_catch(gctx) { + return NULL; + } + return_none; + } + + %pythoncode%{@property%} %pythonprepend language %{"""Annotation language."""%} PyObject *language() @@ -7059,11 +7532,11 @@ struct Annot pdf_annot *annot = (pdf_annot *) $self; pdf_obj *apobj = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP), PDF_NAME(N), NULL); - if (!apobj) THROWMSG("annot has no /AP/N object"); + if (!apobj) THROWMSG(gctx, "annot has no /AP/N object"); if (!pdf_is_stream(gctx, apobj)) - THROWMSG("/AP/N object is no stream"); + THROWMSG(gctx, "/AP/N object is no stream"); res = JM_BufferFromBytes(gctx, ap); - if (!res) 
THROWMSG("invalid /AP stream argument"); + if (!res) THROWMSG(gctx, "invalid /AP stream argument"); JM_update_stream(gctx, annot->page->doc, apobj, res, 1); if (rect) { fz_rect bbox = pdf_dict_get_rect(gctx, annot->obj, PDF_NAME(Rect)); @@ -7366,7 +7839,7 @@ struct Annot pdf_obj *ap = pdf_dict_getl(gctx, annot->obj, PDF_NAME(AP), PDF_NAME(N), NULL); if (!ap) // should never happen - THROWMSG("annot has no /AP object"); + THROWMSG(gctx, "annot has no /AP object"); pdf_obj *resources = pdf_dict_get(gctx, ap, PDF_NAME(Resources)); if (!resources) { // no Resources yet: make one @@ -7837,10 +8310,10 @@ struct Annot fz_try(gctx) { int type = (int) pdf_annot_type(gctx, annot); if (type != PDF_ANNOT_FILE_ATTACHMENT) - THROWMSG("bad annot type"); + THROWMSG(gctx, "bad annot type"); stream = pdf_dict_getl(gctx, annot->obj, PDF_NAME(FS), PDF_NAME(EF), PDF_NAME(F), NULL); - if (!stream) THROWMSG("bad PDF: file entry not found"); + if (!stream) THROWMSG(gctx, "bad PDF: file entry not found"); } fz_catch(gctx) { return NULL; @@ -7888,10 +8361,10 @@ struct Annot fz_try(gctx) { int type = (int) pdf_annot_type(gctx, annot); if (type != PDF_ANNOT_FILE_ATTACHMENT) - THROWMSG("bad annot type"); + THROWMSG(gctx, "bad annot type"); stream = pdf_dict_getl(gctx, annot->obj, PDF_NAME(FS), PDF_NAME(EF), PDF_NAME(F), NULL); - if (!stream) THROWMSG("bad PDF: file entry not found"); + if (!stream) THROWMSG(gctx, "bad PDF: file entry not found"); buf = pdf_load_stream(gctx, stream); res = JM_BinFromBuffer(gctx, buf); } @@ -7921,9 +8394,9 @@ struct Annot int type = (int) pdf_annot_type(gctx, annot); pdf_obj *sound = pdf_dict_get(gctx, annot->obj, PDF_NAME(Sound)); if (type != PDF_ANNOT_SOUND || !sound) - THROWMSG("bad annot type"); + THROWMSG(gctx, "bad annot type"); if (pdf_dict_get(gctx, sound, PDF_NAME(F))) { - THROWMSG("unsupported sound stream"); + THROWMSG(gctx, "unsupported sound stream"); } res = PyDict_New(); obj = pdf_dict_get(gctx, sound, PDF_NAME(R)); @@ -7985,17 +8458,17 @@ struct 
Annot pdf = annot->page->doc; // the owning PDF int type = (int) pdf_annot_type(gctx, annot); if (type != PDF_ANNOT_FILE_ATTACHMENT) - THROWMSG("bad annot type"); + THROWMSG(gctx, "bad annot type"); stream = pdf_dict_getl(gctx, annot->obj, PDF_NAME(FS), PDF_NAME(EF), PDF_NAME(F), NULL); // the object for file content - if (!stream) THROWMSG("bad PDF: no /EF object"); + if (!stream) THROWMSG(gctx, "bad PDF: no /EF object"); fs = pdf_dict_get(gctx, annot->obj, PDF_NAME(FS)); // file content given res = JM_BufferFromBytes(gctx, buffer); - if (buffer && !res) THROWMSG("bad type: 'buffer'"); + if (buffer && !res) THROWMSG(gctx, "bad type: 'buffer'"); if (res) { JM_update_stream(gctx, pdf, stream, res, 1); // adjust /DL and /Size parameters @@ -9553,7 +10026,7 @@ struct Tools if (FZ_ENABLE_ICC) fz_enable_icc(gctx); else - THROWMSG("MuPDF generated without ICC suppot."); + THROWMSG(gctx, "MuPDF generated without ICC suppot."); } else if (FZ_ENABLE_ICC) { fz_disable_icc(gctx); } diff --git a/fitz/helper-fields.i b/fitz/helper-fields.i index 3a68ec40f..f0c567f55 100644 --- a/fitz/helper-fields.i +++ b/fitz/helper-fields.i @@ -745,9 +745,6 @@ void JM_set_widget_properties(fz_context *ctx, pdf_annot *annot, PyObject *Widge Py_CLEAR(value); // field value ------------------------------------------------------------ - // MuPDF function "pdf_set_field_value" always sets strings. For button - // fields this may lead to an unrecognized state for some PDF viewers. 
- //------------------------------------------------------------------------- value = GETATTR("field_value"); char *text = NULL; switch(field_type) @@ -755,8 +752,10 @@ void JM_set_widget_properties(fz_context *ctx, pdf_annot *annot, PyObject *Widge case PDF_WIDGET_TYPE_CHECKBOX: case PDF_WIDGET_TYPE_RADIOBUTTON: if (PyObject_RichCompareBool(value, Py_True, Py_EQ)) { - result = pdf_set_field_value(ctx, pdf, annot->obj, "Yes", 1); - pdf_dict_put_name(ctx, annot->obj, PDF_NAME(V), "Yes"); + pdf_obj *onstate = pdf_button_field_on_state(ctx, annot->obj); + const char *on = pdf_to_name(ctx, onstate); + result = pdf_set_field_value(ctx, pdf, annot->obj, on, 1); + pdf_dict_put_name(ctx, annot->obj, PDF_NAME(V), on); } else { result = pdf_set_field_value(ctx, pdf, annot->obj, "Off", 1); pdf_dict_put(ctx, annot->obj, PDF_NAME(V), PDF_NAME(Off)); diff --git a/fitz/helper-pdfinfo.i b/fitz/helper-pdfinfo.i index 1f3c2d658..c04b06b1d 100644 --- a/fitz/helper-pdfinfo.i +++ b/fitz/helper-pdfinfo.i @@ -17,6 +17,248 @@ void JM_ensure_identity(fz_context *ctx, pdf_document *pdf) } +//---------------------------------------------------------------------------- +// Ensure OCProperties, return /OCProperties key +//---------------------------------------------------------------------------- +pdf_obj * +JM_ensure_ocproperties(fz_context *ctx, pdf_document *pdf) +{ + pdf_obj *D, *ocp; + fz_try(ctx) { + ocp = pdf_dict_get(ctx, pdf_dict_get(gctx, pdf_trailer(ctx, pdf), PDF_NAME(Root)), PDF_NAME(OCProperties)); + if (ocp) goto finished; + pdf_obj *root = pdf_dict_get(ctx, pdf_trailer(ctx, pdf), PDF_NAME(Root)); + ocp = pdf_dict_put_dict(ctx, root, PDF_NAME(OCProperties), 2); + pdf_dict_put_array(ctx, ocp, PDF_NAME(OCGs), 0); + D = pdf_dict_put_dict(ctx, ocp, PDF_NAME(D), 5); + pdf_dict_put_array(ctx, D, PDF_NAME(ON), 0); + pdf_dict_put_array(ctx, D, PDF_NAME(OFF), 0); + pdf_dict_put_array(ctx, D, PDF_NAME(Order), 0); + pdf_dict_put_array(ctx, D, PDF_NAME(RBGroups), 0); + finished:; + } + 
fz_catch(ctx) { + fz_rethrow(ctx); + } + return ocp; +} + + +//---------------------------------------------------------------------------- +// Add OC configuration to the PDF catalog +//---------------------------------------------------------------------------- +void +JM_add_layer_config(fz_context *ctx, pdf_document *pdf, char *name, char *creator, PyObject *ON) +{ + pdf_obj *D, *ocp, *configs; + fz_try(ctx) { + ocp = JM_ensure_ocproperties(ctx, pdf); + configs = pdf_dict_get(ctx, ocp, PDF_NAME(Configs)); + if (!pdf_is_array(ctx, configs)) { + configs = pdf_dict_put_array(ctx,ocp, PDF_NAME(Configs), 1); + } + D = pdf_new_dict(ctx, pdf, 5); + pdf_dict_put_text_string(ctx, D, PDF_NAME(Name), name); + if (creator) { + pdf_dict_put_text_string(ctx, D, PDF_NAME(Creator), creator); + } + pdf_dict_put(ctx, D, PDF_NAME(BaseState), PDF_NAME(OFF)); + pdf_obj *onarray = pdf_dict_put_array(ctx, D, PDF_NAME(ON), 5); + if (!EXISTS(ON) || !PySequence_Check(ON) || !PySequence_Size(ON)) { + ; + } else { + pdf_obj *ocgs = pdf_dict_get(ctx, ocp, PDF_NAME(OCGs)); + int i, n = PySequence_Size(ON); + for (i = 0; i < n; i++) { + int xref = 0; + if (JM_INT_ITEM(ON, (Py_ssize_t) i, &xref) == 1) continue; + pdf_obj *ind = pdf_new_indirect(ctx, pdf, xref, 0); + if (pdf_array_contains(ctx, ocgs, ind)) { + pdf_array_push_drop(ctx, onarray, ind); + } else { + pdf_drop_obj(ctx, ind); + } + } + } + pdf_array_push_drop(ctx, configs, D); + } + fz_catch(ctx) { + fz_rethrow(ctx); + } +} + + +//---------------------------------------------------------------------------- +// Get OCG arrays from OC configuration +// Returns dict {"basestate":name, "on":list, "off":list, "rbg":list} +//---------------------------------------------------------------------------- +static PyObject * +JM_get_ocg_arrays_imp(fz_context *ctx, pdf_obj *arr) +{ + int i, n; + PyObject *list = PyList_New(0), *item = NULL; + pdf_obj *obj = NULL; + if (pdf_is_array(ctx, arr)) { + n = pdf_array_len(ctx, arr); + for (i = 0; i < n; 
i++) { + obj = pdf_array_get(ctx, arr, i); + item = Py_BuildValue("i", pdf_to_num(ctx, obj)); + if (!PySequence_Contains(list, item)) { + LIST_APPEND_DROP(list, item); + } else { + Py_DECREF(item); + } + } + } + return list; +} + +PyObject * +JM_get_ocg_arrays(fz_context *ctx, pdf_obj *conf) +{ + PyObject *rc = PyDict_New(), *list = NULL, *list1 = NULL; + int i, n; + pdf_obj *arr = NULL, *obj = NULL; + fz_try(ctx) { + arr = pdf_dict_get(ctx, conf, PDF_NAME(ON)); + list = JM_get_ocg_arrays_imp(ctx, arr); + if (PySequence_Size(list)) { + PyDict_SetItemString(rc, "on", list); + } + Py_DECREF(list); + arr = pdf_dict_get(ctx, conf, PDF_NAME(OFF)); + list = JM_get_ocg_arrays_imp(ctx, arr); + if (PySequence_Size(list)) { + PyDict_SetItemString(rc, "off", list); + } + Py_DECREF(list); + list = PyList_New(0); + arr = pdf_dict_get(ctx, conf, PDF_NAME(RBGroups)); + if (pdf_is_array(ctx, arr)) { + n = pdf_array_len(ctx, arr); + for (i = 0; i < n; i++) { + obj = pdf_array_get(ctx, arr, i); + list1 = JM_get_ocg_arrays_imp(ctx, obj); + LIST_APPEND_DROP(list, list1); + } + } + if (PySequence_Size(list)) { + PyDict_SetItemString(rc, "rbgroups", list); + } + Py_DECREF(list); + obj = pdf_dict_get(ctx, conf, PDF_NAME(BaseState)); + + if (obj) { + PyObject *state = NULL; + state = Py_BuildValue("s", pdf_to_name(ctx, obj)); + PyDict_SetItemString(rc, "basestate", state); + Py_DECREF(state); + } + } + fz_always(ctx) { + } + fz_catch(ctx) { + Py_CLEAR(rc); + fz_rethrow(ctx); + } + return rc; +} + + +//---------------------------------------------------------------------------- +// Set OCG arrays from dict of Python lists +// Works with dict like {"basestate":name, "on":list, "off":list, "rbg":list} +//---------------------------------------------------------------------------- +static void +JM_set_ocg_arrays_imp(fz_context *ctx, pdf_obj *arr, PyObject *list) +{ + int i, n = PySequence_Size(list); + pdf_obj *obj = NULL; + pdf_document *pdf = pdf_get_bound_document(ctx, arr); + for (i = 0; 
i < n; i++) { + int xref = 0; + if (JM_INT_ITEM(list, i, &xref) == 1) continue; + obj = pdf_new_indirect(ctx, pdf, xref, 0); + pdf_array_push_drop(ctx, arr, obj); + } + return; +} + +static void +JM_set_ocg_arrays(fz_context *ctx, pdf_obj *conf, const char *basestate, + PyObject *on, PyObject *off, PyObject *rbgroups) +{ + int i, n; + pdf_obj *arr = NULL, *obj = NULL, *indobj = NULL; + fz_try(ctx) { + if (basestate) { + pdf_dict_put_name(ctx, conf, PDF_NAME(BaseState), basestate); + } + + if (on != Py_None) { + pdf_dict_del(ctx, conf, PDF_NAME(ON)); + if (PySequence_Size(on)) { + arr = pdf_dict_put_array(ctx, conf, PDF_NAME(ON), 1); + JM_set_ocg_arrays_imp(ctx, arr, on); + } + } + + if (off != Py_None) { + pdf_dict_del(ctx, conf, PDF_NAME(OFF)); + if (PySequence_Size(off)) { + arr = pdf_dict_put_array(ctx, conf, PDF_NAME(OFF), 1); + JM_set_ocg_arrays_imp(ctx, arr, off); + } + } + + if (rbgroups != Py_None) { + pdf_dict_del(ctx, conf, PDF_NAME(RBGroups)); + if (PySequence_Size(rbgroups)) { + arr = pdf_dict_put_array(ctx, conf, PDF_NAME(RBGroups), 1); + n = PySequence_Size(rbgroups); + for (i = 0; i < n; i++) { + PyObject *item0 = PySequence_ITEM(rbgroups, i); + obj = pdf_array_push_array(ctx, arr, 1); + JM_set_ocg_arrays_imp(ctx, obj, item0); + Py_DECREF(item0); + } + } + } + } + fz_catch(ctx) { + fz_rethrow(ctx); + } + return; +} + + +//---------------------------------------------------------------------------- +// Add OC object reference to a dictionary +//---------------------------------------------------------------------------- +void +JM_add_oc_object(fz_context *ctx, pdf_document *pdf, pdf_obj *ref, int xref) +{ + pdf_obj *indobj = NULL; + fz_try(ctx) { + indobj = pdf_new_indirect(ctx, pdf, xref, 0); + if (!pdf_is_dict(ctx, indobj)) THROWMSG(ctx, "bad 'oc' reference"); + pdf_obj *type = pdf_dict_get(ctx, indobj, PDF_NAME(Type)); + if (pdf_objcmp(ctx, type, PDF_NAME(OCG)) == 0 || + pdf_objcmp(ctx, type, PDF_NAME(OCMD)) == 0) { + pdf_dict_put(ctx, ref, 
PDF_NAME(OC), indobj); + } else { + THROWMSG(ctx, "bad 'oc' type"); + } + } + fz_always(ctx) { + pdf_drop_obj(ctx, indobj); + } + fz_catch(ctx) { + fz_rethrow(ctx); + } +} + + //----------------------------------------------------------------------------- // Store info of a font in Python list //----------------------------------------------------------------------------- diff --git a/fitz/helper-select.i b/fitz/helper-select.i index d403a306b..083491da9 100644 --- a/fitz/helper-select.i +++ b/fitz/helper-select.i @@ -224,7 +224,7 @@ void retainpages(fz_context *ctx, globals *glo, PyObject *liste) for (page = 0; page < argc; page++) { i = (int) PyInt_AsLong(PySequence_ITEM(liste, page)); if (i < 0 || i >= pagecount) - THROWMSG("invalid page number(s)"); + THROWMSG(ctx, "invalid page number(s)"); retainpage(ctx, doc, pages, kids, i); } } diff --git a/fitz/helper-stext.i b/fitz/helper-stext.i index abdc7ce02..0d0d71b17 100644 --- a/fitz/helper-stext.i +++ b/fitz/helper-stext.i @@ -309,8 +309,7 @@ JM_print_stext_page_as_text(fz_context *ctx, fz_output *out, fz_stext_page *page fz_stext_line *line; fz_stext_char *ch; fz_rect rect = page->mediabox; - char utf[10]; - int i, n, last_char = 0; + int last_char = 0; for (block = page->first_block; block; block = block->next) { @@ -325,11 +324,9 @@ JM_print_stext_page_as_text(fz_context *ctx, fz_output *out, fz_stext_page *page { if (!fz_contains_rect(rect, JM_char_bbox(ch))) continue; last_char = ch->c; - n = fz_runetochar(utf, ch->c); - for (i = 0; i < n; i++) - fz_write_byte(ctx, out, utf[i]); + fz_write_rune(ctx, out, ch->c); } - if (last_char != 10) fz_write_string(ctx, out, "\n"); + if (last_char != 10 && last_char) fz_write_string(ctx, out, "\n"); } } } @@ -804,7 +801,7 @@ fz_font *JM_get_font(fz_context *ctx, goto fertig; fertig:; - if (!font) THROWMSG("could not find a matching font"); + if (!font) THROWMSG(ctx, "could not create font"); } fz_always(ctx) { fz_drop_buffer(ctx, res); diff --git a/fitz/utils.py 
b/fitz/utils.py index 0a25d2f20..386adf376 100644 --- a/fitz/utils.py +++ b/fitz/utils.py @@ -69,6 +69,7 @@ def showPDFpage( overlay=True, keep_proportion=True, rotate=0, + oc=0, reuse_xref=0, clip=None, ): @@ -178,6 +179,7 @@ def calc_matrix(sr, tr, keep=True, rotate=0): overlay=overlay, matrix=matrix, xref=xref, + oc=oc, clip=src_rect, graftmap=gmap, _imgname=_imgname, @@ -195,6 +197,7 @@ def insertImage( stream=None, mask=None, rotate=0, + oc=0, keep_proportion=True, overlay=True, ): @@ -209,10 +212,16 @@ def insertImage( stream: (bytes) an image in memory mask: (bytes) enforce this image mask rotate: (int) degrees (int multiple of 90) + oc: (int) xref of an optional content object keep_proportion: (bool) whether to maintain aspect ratio overlay: (bool) put in foreground """ + def calc_hash(stream): + m = hashlib.sha1() + m.update(stream) + return m.digest() + def calc_matrix(fw, fh, tr, rotate=0): """Calculate transformation matrix for image insertion. @@ -309,6 +318,7 @@ def calc_matrix(fw, fh, tr, rotate=0): if pixmap: # this is the easy case w = pixmap.width h = pixmap.height + digest = calc_hash(pixmap.samples) elif stream: # use tool to access the information # we also pass through the generated fz_image address @@ -316,11 +326,13 @@ def calc_matrix(fw, fh, tr, rotate=0): stream = stream.getvalue() img_prof = TOOLS.image_profile(stream, keep_image=True) w, h = img_prof["width"], img_prof["height"] + digest = calc_hash(stream) stream = None # make sure this arg is NOT used _imgpointer = img_prof["image"] # pointer to fz_image else: # worst case: must read the file stream = open(filename, "rb").read() + digest = calc_hash(stream) img_prof = TOOLS.image_profile(stream, keep_image=True) w, h = img_prof["width"], img_prof["height"] stream = None # make sure this arg is NOT used @@ -346,16 +358,22 @@ def calc_matrix(fw, fh, tr, rotate=0): i += 1 _imgname = n + str(i) # try new name - page._insertImage( + xref = doc.InsertedImages.get(digest, 0) # reuse any 
previously inserted image + + xref = page._insertImage( filename=filename, # image in file pixmap=pixmap, # image in pixmap stream=stream, # image in memory imask=mask, matrix=matrix, # generated matrix overlay=overlay, + oc=oc, # optional content object + xref=xref, _imgname=_imgname, # generated PDF resource name _imgpointer=_imgpointer, # address of fz_image ) + if xref > 0: + doc.InsertedImages[digest] = xref def searchFor(page, text, hit_max=16, quads=False, clip=None, flags=None): diff --git a/fitz/version.i b/fitz/version.i index 4d081c6c0..fb5e38ccc 100644 --- a/fitz/version.i +++ b/fitz/version.i @@ -1,6 +1,6 @@ %pythoncode %{ VersionFitz = "1.18.0" -VersionBind = "1.18.2" -VersionDate = "2020-10-23 09:17:55" -version = (VersionBind, VersionFitz, "20201023091755") +VersionBind = "1.18.3" +VersionDate = "2020-11-09 07:36:17" +version = (VersionBind, VersionFitz, "20201109073617") %} \ No newline at end of file diff --git a/setup.py b/setup.py index 0ccec0698..2180aca45 100644 --- a/setup.py +++ b/setup.py @@ -130,7 +130,7 @@ def load_libraries(): setup( name="PyMuPDF", - version="1.18.0", + version="1.18.3", description="Python bindings for the PDF rendering library MuPDF", long_description=long_desc, classifiers=classifier,