diff --git a/docs/annot.rst b/docs/annot.rst index 48ba7cb4d..319a143be 100644 --- a/docs/annot.rst +++ b/docs/annot.rst @@ -236,7 +236,7 @@ There is a parent-child relationship between an annotation and its page. If the :arg str name: the new name. - .. caution:: If you set the name of a 'Stamp' annotation, then this will **not change** the rectangle, nor will the text be layouted in any way. If you choose a standard text from :ref:`StampIcons` (the **exact** name piece after `"STAMP_"`), you should receive the original layout. An **arbitrary text** will not be changed to upper case, but be written in font "Times-Bold" as is, horizontally centered in **one line** and be shortened to fit. To get your text fully displayed, its length using fontsize 20 must not exceed 190 pixels. So please make sure that the following inequality is true: `fitz.get_text_length(text, fontname="tibo", fontsize=20) <= 190`. + .. caution:: If you set the name of a 'Stamp' annotation, then this will **not change** the rectangle, nor will the text be laid out in any way. If you choose a standard text from :ref:`StampIcons` (the **exact** name piece after `"STAMP_"`), you should receive the original layout. An **arbitrary text** will not be changed to upper case, but be written in font "Times-Bold" as is, horizontally centered in **one line** and be shortened to fit. To get your text fully displayed, its length using :data:`fontsize` 20 must not exceed 190 points. So please make sure that the following inequality is true: `fitz.get_text_length(text, fontname="tibo", fontsize=20) <= 190`. .. method:: set_rect(rect) @@ -328,7 +328,7 @@ There is a parent-child relationship between an annotation and its page. If the :arg float opacity: *(new in v1.16.14)* **valid for all annotation types:** change or set the annotation's transparency. Valid values are *0 <= opacity < 1*. :arg str blend_mode: *(new in v1.16.14)* **valid for all annotation types:** change or set the annotation's blend mode.
For valid values see :ref:`BlendModes`. - :arg float fontsize: change font size of the text. 'FreeText' annotations only. + :arg float fontsize: change the :data:`fontsize` of the text. 'FreeText' annotations only. :arg sequence,float text_color: change the text color. 'FreeText' annotations only. :arg sequence,float border_color: change the border color. 'FreeText' annotations only. :arg sequence,float fill_color: the fill color. diff --git a/docs/document.rst b/docs/document.rst index 1f8801e49..ac0fd6881 100644 --- a/docs/document.rst +++ b/docs/document.rst @@ -188,7 +188,7 @@ For details on **embedded files** refer to Appendix 3. :arg float height: may used together with *width* as an alternative to *rect* to specify layout information. - :arg float fontsize: the default fontsize for reflowable document types. This parameter is ignored if none of the parameters *rect* or *width* and *height* are specified. Will be used to calculate the page layout. + :arg float fontsize: the default :data:`fontsize` for reflowable document types. This parameter is ignored if none of the parameters *rect* or *width* and *height* are specified. Will be used to calculate the page layout. :raises TypeError: if the *type* of any parameter does not conform. :raises FileNotFoundError: if the file / path cannot be found. Re-implemented as subclass of `RuntimeError`. diff --git a/docs/font.rst b/docs/font.rst index 355db87d7..1c1fd7aea 100644 --- a/docs/font.rst +++ b/docs/font.rst @@ -213,7 +213,7 @@ A Font object also contains useful general information, like the font bbox, the .. method:: glyph_bbox(chr, language=None, script=0) - The glyph rectangle relative to fontsize 1. + The glyph rectangle relative to :data:`fontsize` 1. :arg int chr: *ord()* of the character. @@ -241,7 +241,7 @@ A Font object also contains useful general information, like the font bbox, the :arg str text: a text string, UTF-8 encoded. - :arg float fontsize: the fontsize.
+ :arg float fontsize: the :data:`fontsize`. :rtype: float @@ -265,7 +265,7 @@ A Font object also contains useful general information, like the font bbox, the :arg str text: a text string, UTF-8 encoded. - :arg float fontsize: the fontsize. + :arg float fontsize: the :data:`fontsize`. :rtype: tuple diff --git a/docs/functions.rst b/docs/functions.rst index a7bbc156e..42796b8e2 100644 --- a/docs/functions.rst +++ b/docs/functions.rst @@ -34,7 +34,7 @@ Yet others are handy, general-purpose utilities. :meth:`EMPTY_RECT` return the (standard) empty / invalid rectangle :meth:`get_pdf_now` return the current timestamp in PDF format :meth:`get_pdf_str` return PDF-compatible string -:meth:`get_text_length` return string length for a given font & fontsize +:meth:`get_text_length` return string length for a given font & :data:`fontsize` :meth:`glyph_name_to_unicode` return unicode from a glyph name :meth:`image_profile` return a dictionary of basic image properties :meth:`INFINITE_IRECT` return the (only existing) infinite rectangle @@ -361,11 +361,11 @@ Yet others are handy, general-purpose utilities. * New in version 1.14.7 - Calculate the length of text on output with a given **builtin** font, fontsize and encoding. + Calculate the length of text on output with a given **builtin** font, :data:`fontsize` and encoding. :arg str text: the text string. :arg str fontname: the fontname. Must be one of either the :ref:`Base-14-Fonts` or the CJK fonts, identified by their "reserved" fontnames (see table in :meth:`Page.insert_font`). - :arg float fontsize: the fontsize. + :arg float fontsize: the :data:`fontsize`. :arg int encoding: the encoding to use. Besides 0 = Latin, 1 = Greek and 2 = Cyrillic (Russian) are available. Relevant for Base-14 fonts "Helvetica", "Courier" and "Times" and their variants only. Make sure to use the same value as in the corresponding text insertion. :rtype: float :returns: the length in points the string will have (e.g. 
when used in :meth:`Page.insert_text`). @@ -568,7 +568,7 @@ Yet others are handy, general-purpose utilities. - 1: Stroked text -- equivalent to `1 Tr`, only the character borders are shown. - 3: Ignored text -- equivalent to `3 Tr` (hidden text). - 3. Line width in this context is important only for processing `span["type"] != 0`: it determines the thickness of the character's border line. This value may not be provided at all with the text data. In this case, a value of 5% of the fontsize (`span["size"] * 0,05`) is generated. Often, an "artificial" bold text in PDF is created by `2 Tr`. There is no equivalent span type for this case. Instead, respective text is represented by two consecutive spans -- which are identical in every aspect, except for their types, which are 0, resp 1. It is your responsibility to handle this type of situation - in :meth:`Page.get_text`, MuPDF is doing this for you. + 3. Line width in this context is important only for processing `span["type"] != 0`: it determines the thickness of the character's border line. This value may not be provided at all with the text data. In this case, a value of 5% of the :data:`fontsize` (`span["size"] * 0.05`) is generated. Often, an "artificial" bold text in PDF is created by `2 Tr`. There is no equivalent span type for this case. Instead, respective text is represented by two consecutive spans -- which are identical in every aspect, except for their types, which are 0, resp 1. It is your responsibility to handle this type of situation - in :meth:`Page.get_text`, MuPDF is doing this for you. 4. For data compactness, the character's unicode is provided here. Use built-in function `chr()` for the character itself. 5. The alpha / opacity value of the span's text, `0 <= opacity <= 1`, 0 is invisible text, 1 (100%) is intransparent. Depending on `span["type"]`, interpret this value as *fill* opacity or, resp. *stroke* opacity. 6.
*(Changed in v1.19.0)* This value is equal or close to `char["bbox"]` of "rawdict". In particular, the bbox **height** value is always computed as if **"small glyph heights"** had been requested. @@ -703,7 +703,7 @@ Yet others are handy, general-purpose utilities. :arg int limit: limits the number of returned entries. The default of 256 is enforced for all fonts that only support 1-byte characters, so-called "simple fonts" (checked by this method). All :ref:`Base-14-Fonts` are simple fonts. :rtype: list - :returns: a list of *limit* tuples. Each character *c* has an entry *(g, w)* in this list with an index of *ord(c)*. Entry *g* (integer) of the tuple is the glyph id of the character, and float *w* is its normalized width. The actual width for some fontsize can be calculated as *w * fontsize*. For simple fonts, the *g* entry can always be safely ignored. In all other cases *g* is the basis for graphically representing *c*. + :returns: a list of *limit* tuples. Each character *c* has an entry *(g, w)* in this list with an index of *ord(c)*. Entry *g* (integer) of the tuple is the glyph id of the character, and float *w* is its normalized width. The actual width for some :data:`fontsize` can be calculated as *w * fontsize*. For simple fonts, the *g* entry can always be safely ignored. In all other cases *g* is the basis for graphically representing *c*. This function calculates the pixel width of a string called *text*:: diff --git a/docs/glossary.rst b/docs/glossary.rst index 46d8b6030..c1d1c2c97 100644 --- a/docs/glossary.rst +++ b/docs/glossary.rst @@ -138,6 +138,11 @@ Glossary Abbreviation for cross-reference number: this is an integer unique identification for objects in a PDF. There exists a cross-reference table (which may physically consist of several separate segments) in each PDF, which stores the relative position of each object for quick lookup. 
The cross-reference table is one entry longer than the number of existing object: item zero is reserved and must not be used in any way. Many PyMuPDF classes have an *xref* attribute (which is zero for non-PDFs), and one can find out the total number of objects in a PDF via :meth:`Document.xref_length` *- 1*. + +.. data:: fontsize + + Font size is measured in points, where 1 inch = 72 points. + .. data:: resolution Images and :ref:`Pixmap` objects may contain resolution information provided as "dots per inch", dpi, in each direction (horizontal and vertical). When MuPDF reads an image from a file or from a PDF object, it will parse this information and put it in :attr:`Pixmap.xres`, :attr:`Pixmap.yres`, respectively. If it finds no meaningful information in the input (like non-positive values or values exceeding 4800), it will use "sane" defaults instead. The usual default value is 96, but it may also be 72 in some cases (e.g. for JPX images). diff --git a/docs/module.rst b/docs/module.rst index b14b60dd5..a4b227319 100644 --- a/docs/module.rst +++ b/docs/module.rst @@ -429,7 +429,7 @@ Extract text from arbitrary :ref:`supported documents` to After each page of the output file, a formfeed character, `hex(12)` is written -- even if the input page has no text at all. This behavior can be controlled via options. -.. note:: For "layout" mode, **only horizontal, left-to-right, top-to bottom** text is supported, other text is ignored. In this mode, text is also ignored, if its fontsize is too small. +.. note:: For "layout" mode, **only horizontal, left-to-right, top-to-bottom** text is supported, other text is ignored. In this mode, text is also ignored if its :data:`fontsize` is too small. "Simple" and "blocks" mode in contrast output **all text** for any text size or orientation.
@@ -475,7 +475,7 @@ Command:: * **noformfeed:** (bool) instead of `hex(12)` (formfeed), write linebreaks `\n` at end of output pages. * **skip-empty:** (bool) skip pages with no text. * **grid:** lines with a vertical coordinate difference of no more than this value (in points) will be merged into the same output line. Only relevant for "layout" mode. **Use with care:** 3 or the default 2 should be adequate in most cases. If **too large**, lines that are *intended* to be different in the original may be merged and will result in garbled and / or incomplete output. If **too low**, artifact separate output lines may be generated for some spans in the input line, just because they are coded in a different font with slightly deviating properties. -* **fontsize:** include text with fontsize larger than this value only (default 3). Only relevant for "layout" option. +* **fontsize:** include text with :data:`fontsize` larger than this value only (default 3). Only relevant for "layout" option. .. highlight:: python diff --git a/docs/page.rst b/docs/page.rst index 350023eb2..97999d878 100644 --- a/docs/page.rst +++ b/docs/page.rst @@ -190,7 +190,7 @@ In a nutshell, this is what you can do with PyMuPDF: :arg rect_like rect: the rectangle into which the text should be inserted. Text is automatically wrapped to a new line at box width. Lines not fitting into the box will be invisible. :arg str text: the text.
*(New in v1.17.0)* May contain any mixture of Latin, Greek, Cyrillic, Chinese, Japanese and Korean characters. The respective required font is automatically determined. - :arg float fontsize: the font size. Default is 12. + :arg float fontsize: the :data:`fontsize`. Default is 12. :arg str fontname: the font name. Default is "Helv". Accepted alternatives are "Cour", "TiRo", "ZaDb" and "Symb". The name may be abbreviated to the first two characters, like "Co" for "Cour". Lower case is also accepted. *(Changed in v1.16.0)* Bold or italic variants of the fonts are **no longer accepted**. A user-contributed script provides a circumvention for this restriction -- see section *Using Buttons and JavaScript* in chapter :ref:`FAQ`. *(New in v1.17.0)* The actual font to use is now determined on a by-character level, and all required fonts (or sub-fonts) are automatically included. Therefore, you should rarely ever need to care about this parameter and let it default (except you insist on a serifed font for your non-CJK text parts). :arg sequence,float text_color: *(new in v1.16.0)* the text color. Default is black. @@ -279,7 +279,7 @@ In a nutshell, this is what you can do with PyMuPDF: ) page.add_redact_annot(..., fontname="newname") - :arg float fontsize: *(New in v1.16.12)* the fontsize to use for the replacing text. If the text is too large to fit, several insertion attempts will be made, gradually reducing the fontsize to no less than 4. If then the text will still not fit, no text insertion will take place at all. + :arg float fontsize: *(New in v1.16.12)* the :data:`fontsize` to use for the replacing text. If the text is too large to fit, several insertion attempts will be made, gradually reducing the :data:`fontsize` to no less than 4. If the text still does not fit, no text insertion will take place at all. :arg int align: *(New in v1.16.12)* the horizontal alignment for the replacing text. See :meth:`insert_textbox` for available values.
The vertical alignment is (approximately) centered if a PDF built-in font is used (CJK or :ref:`Base-14-Fonts`). diff --git a/docs/recipes-text.rst b/docs/recipes-text.rst index f86a95ad2..e4d433c18 100644 --- a/docs/recipes-text.rst +++ b/docs/recipes-text.rst @@ -322,7 +322,7 @@ Output some text lines on a page:: doc.save("text.pdf") -With this method, only the **number of lines** will be controlled to not go beyond page height. Surplus lines will not be written and the number of actual lines will be returned. The calculation uses a line height calculated from the fontsize and 36 points (0.5 inches) as bottom margin. +With this method, only the **number of lines** will be controlled to not go beyond page height. Surplus lines will not be written and the number of actual lines will be returned. The calculation uses a line height calculated from the :data:`fontsize` and 36 points (0.5 inches) as bottom margin. Line **width is ignored**. The surplus part of a line will simply be invisible. diff --git a/docs/shape.rst b/docs/shape.rst index 1e895e489..23cbc572e 100644 --- a/docs/shape.rst +++ b/docs/shape.rst @@ -544,7 +544,7 @@ Common Parameters **fontsize** (*float*) - Font size of text. + Font size of text, see :data:`fontsize`. ---- diff --git a/docs/textpage.rst b/docs/textpage.rst index 8b8ac3a2f..4195f6e9e 100644 --- a/docs/textpage.rst +++ b/docs/textpage.rst @@ -279,7 +279,7 @@ chars (only for :meth:`extractRAWDICT`) *list* of character dictionari *(New in version 1.16.0):* *"color"* is the text color encoded in sRGB (int) format, e.g. 0xFF0000 for red. There are functions for converting this integer back to formats (r, g, b) (PDF with float values from 0 to 1) :meth:`sRGB_to_pdf`, or (R, G, B), :meth:`sRGB_to_rgb` (with integer values from 0 to 255). -*(New in v1.18.5):* *"ascender"* and *"descender"* are font properties, provided relative to fontsize 1. Note that descender is a negative value.
The following picture shows the relationship to other values and properties. +*(New in v1.18.5):* *"ascender"* and *"descender"* are font properties, provided relative to :data:`fontsize` 1. Note that descender is a negative value. The following picture shows the relationship to other values and properties. .. image:: images/img-asc-desc.* :scale: 60 @@ -294,7 +294,7 @@ These numbers may be used to compute the minimum height of a character (or span) >>> r.y0 = r.y1 - span["size"] >>> # r now is a rectangle of height 'fontsize' -.. caution:: The above calculation may deliver a **larger** height! This may e.g. happen for OCRed documents, where the risk of all sorts of text artifacts is high. MuPDF tries to come up with a reasonable bbox height, independently from the fontsize found in the PDF. So please ensure that the height of `span["bbox"]` is **larger** than `span["size"]`. +.. caution:: The above calculation may deliver a **larger** height! This may e.g. happen for OCRed documents, where the risk of all sorts of text artifacts is high. MuPDF tries to come up with a reasonable bbox height, independently from the :data:`fontsize` found in the PDF. So please ensure that the height of `span["bbox"]` is **larger** than `span["size"]`. .. note:: You may request PyMuPDF to do all of the above automatically by executing `fitz.TOOLS.set_small_glyph_heights(True)`. This sets a global parameter so that all subsequent text searches and text extractions are based on reduced glyph heights, where meaningful. diff --git a/docs/textwriter.rst b/docs/textwriter.rst index 6cc326c01..7438de957 100644 --- a/docs/textwriter.rst +++ b/docs/textwriter.rst @@ -15,7 +15,7 @@ During **preparation**, a text writer stores any number of text pieces ("spans") A text writer is an elegant alternative to methods :meth:`Page.insert_text` and friends: * **Improved text positioning:** Choose any point where insertion of text should start. 
Storing text returns the "cursor position" after the *last character* of the span. -* **Free font choice:** Each text span has its own font and fontsize. This lets you easily switch when composing a larger text. +* **Free font choice:** Each text span has its own font and :data:`fontsize`. This lets you easily switch when composing a larger text. * **Automatic fallback fonts:** If a character is not supported by the chosen font, alternative fonts are automatically searched. This significantly reduces the risk of seeing unprintable symbols in the output ("TOFUs" -- looking like a small rectangle). PyMuPDF now also comes with the **universal font "Droid Sans Fallback Regular"**, which supports **all Latin** characters (including Cyrillic and Greek), and **all CJK** characters (Chinese, Japanese, Korean). * **Cyrillic and Greek Support:** The :ref:`Base-14-fonts` have integrated support of Cyrillic and Greek characters **without specifying encoding.** Your text may be a mixture of Latin, Greek and Cyrillic. * **Transparency support:** Parameter *opacity* is supported. This offers a handy way to create watermark-style text. @@ -71,7 +71,7 @@ Using this object entails three steps: :arg point_like pos: start position of the text, the bottom left point of the first character. :arg str text: a string of arbitrary length. It will be written starting at position "pos". :arg font: a :ref:`Font`. If omitted, `fitz.Font("helv")` will be used. - :arg float fontsize: the fontsize, a positive number, default 11. + :arg float fontsize: the :data:`fontsize`, a positive number, default 11. :arg str language: the language to use, e.g. "en" for English. Meaningful values should be compliant with the ISO 639 standards 1, 2, 3 or 5. Reserved for future use: currently has no effect as far as we know. :arg bool right_to_left: *(New in v1.18.9)* whether the text should be written from right to left. Applicable for languages like Arabian or Hebrew. Default is *False*. 
If *True*, any Latin parts within the text will automatically converted. There are no other consequences, i.e. :attr:`TextWriter.last_point` will still be the rightmost character, and there neither is any alignment taking place. Hence you may want to use :meth:`TextWriter.fill_textbox` instead. :arg bool small_caps: *(New in v1.18.15)* look for the character's Small Capital version in the font. If present, take that value instead. Otherwise the original character (this font or the fallback font) will be taken. The fallback font will never return small caps. For example, this snippet:: @@ -102,7 +102,7 @@ Using this object entails three steps: :arg point_like pos: start position of the text, the bottom left point of the first character. :arg str text: a string. It will be written starting at position "pos". :arg font: a :ref:`Font`. If omitted, `fitz.Font("helv")` will be used. - :arg float fontsize: the fontsize, a positive float, default 11. + :arg float fontsize: the :data:`fontsize`, a positive float, default 11. :arg str language: the language to use, e.g. "en" for English. Meaningful values should be compliant with the ISO 639 standards 1, 2, 3 or 5. Reserved for future use: currently has no effect as far as we know. :arg bool small_caps: *(New in v1.18.15)* see :meth:`append`. @@ -120,7 +120,7 @@ Using this object entails three steps: :arg str,sequ text: the text. Can be specified as a (UTF-8) string or a list / tuple of strings. A string will first be converted to a list using *splitlines()*. Every list item will begin on a new line (forced line breaks). :arg point_like pos: *(new in v1.17.3)* start storing at this point. Default is a point near rectangle top-left. :arg font: the :ref:`Font`, default `fitz.Font("helv")`. - :arg float fontsize: the fontsize. + :arg float fontsize: the :data:`fontsize`. :arg int align: text alignment. Use one of TEXT_ALIGN_LEFT, TEXT_ALIGN_CENTER, TEXT_ALIGN_RIGHT or TEXT_ALIGN_JUSTIFY. 
:arg bool right_to_left: *(New in v1.18.9)* whether the text should be written from right to left. Applicable for languages like Arabian or Hebrew. Default is *False*. If *True*, any Latin parts are automatically reverted. You must still set the alignment (if you want right alignment), it does not happen automatically -- the other alignment options remain available as well. :arg bool warn: on text overflow do nothing, warn, or raise an exception. Overflow text will never be written. **Changed in v1.18.9:** @@ -187,7 +187,7 @@ Using this object entails three steps: 1. Opacity and color apply to **all the text** in this object. 2. If you need different colors / transparency, you must create a separate TextWriter. Whenever you determine the color should change, simply append the text to the respective TextWriter using the previously returned :attr:`last_point` as position for the new text span. 3. Appending items or text boxes can occur in arbitrary order: only the position parameter controls where text appears. - 4. Font and fontsize can freely vary within the same TextWriter. This can be used to let text with different properties appear on the same displayed line: just specify *pos* accordingly, and e.g. set it to :attr:`last_point` of the previously added item. + 4. Font and :data:`fontsize` can freely vary within the same TextWriter. This can be used to let text with different properties appear on the same displayed line: just specify *pos* accordingly, and e.g. set it to :attr:`last_point` of the previously added item. 5. You can use the *pos* argument of :meth:`TextWriter.fill_textbox` to set the position of the first text character. This allows filling the same textbox with contents from different :ref:`TextWriter` objects, thus allowing for multiple colors, opacities, etc. 6. MuPDF does not support all fonts with this feature, e.g. no Type3 fonts. Starting with v1.18.0 this can be checked via the font attribute :attr:`Font.is_writable`. 
This attribute is also checked when using :ref:`TextWriter` methods. diff --git a/docs/widget.rst b/docs/widget.rst index d08ba0068..4e2fd2706 100644 --- a/docs/widget.rst +++ b/docs/widget.rst @@ -133,7 +133,7 @@ Like annotations, widgets live on PDF pages. Similar to annotations, the first w .. attribute:: text_fontsize - A float defining the text fontsize. Default value is zero, which causes PDF viewer software to dynamically choose a size suitable for the annotation's rectangle and text amount. + A float defining the text :data:`fontsize`. Default value is zero, which causes PDF viewer software to dynamically choose a size suitable for the annotation's rectangle and text amount. .. attribute:: text_maxlen @@ -234,7 +234,7 @@ TiRo Times-Roman ZaDb ZapfDingbats ============= ======================= -You are generally free to use any font for every widget. However, we recommend using *ZaDb* ("ZapfDingbats") and fontsize 0 for check boxes: typical viewers will put a correctly sized tickmark in the field's rectangle, when it is clicked. +You are generally free to use any font for every widget. However, we recommend using *ZaDb* ("ZapfDingbats") and :data:`fontsize` 0 for check boxes: typical viewers will put a correctly sized tickmark in the field's rectangle, when it is clicked. Supported Widget Types -----------------------