Word delimiter support, fixes #2637, #2556, #2553, #2522 #2661
For text extraction with get_text("words") or extractWORDS, words are defined as strings not containing white space. This change allows specifying up to 64 additional characters that also function as delimiters. This makes it possible, for instance, to separate words from punctuation or to decompose an e-mail address into its components.

Other changes:
Fixing #2522: correcting the typo
Remove some unnecessary setting of flags when creating annotations.
Fixing #2553:
Adjust plain text extraction to use the same approach as the other variants. This entails using Unicode escape strings on output instead of the output of fz_chartorune. As a further consequence, standard text output is now directed to a fz_buffer instead of a fz_output.
Fixing #2556: Add checks for the existence of path dictionaries at every possible place. Includes an additional test function.
Add functions JM_ignore_rect / JM_ignore_irect which return a bool. The functions return True if the rectangle should be ignored. This is the case for infinite and empty rectangles, but also for any rectangle that has a common edge with the infinite rectangle.
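The rule for ignoring rectangles can be pictured with a small Python sketch. This is not the code of JM_ignore_rect. The name `ignore_rect`, the emptiness test, and the two bounds of the infinite rectangle are assumptions made for illustration only.

```python
# Assumed bounds of the "infinite" rectangle (illustrative values only).
INF_MIN = -2**31
INF_MAX = 2**31 - 128


def ignore_rect(x0, y0, x1, y1):
    """Hypothetical helper: return True if the rectangle should be ignored."""
    # Empty rectangle: no positive width or height (assumed definition).
    if x0 >= x1 or y0 >= y1:
        return True
    # Any edge shared with the infinite rectangle; this test also covers
    # the infinite rectangle itself.
    return x0 == INF_MIN or y0 == INF_MIN or x1 == INF_MAX or y1 == INF_MAX
```

With these assumptions, `ignore_rect(0, 0, 100, 100)` is False, while an empty rectangle such as `(10, 10, 10, 20)` or one whose left edge sits at `INF_MIN` is ignored.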
Support variable setting of character border widths for insert_text() / insert_textbox(). This is a factor to be multiplied with the font size. Default is 0.05 (read: 5% of the fontsize). This value is relevant for text rendering modes 1 and 2 only.
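As a quick illustration of the arithmetic, the sketch below computes the resulting stroke width from the factor and the font size. The function name `border_line_width` is hypothetical and is not part of the library.

```python
def border_line_width(fontsize, border_factor=0.05):
    """Hypothetical helper: stroke width = factor * font size.

    Only meaningful for text rendering modes 1 and 2, as stated above.
    """
    return border_factor * fontsize
```

For a font size of 12 and the default factor, this gives a border of about 0.6; doubling the factor to 0.1 at font size 20 gives about 2.0.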
Fixing #2637:
In Page.insert_textbox, when the last word of a line won't fit in the line buffer, we did not increase the line position. This is now handled correctly.
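Returning to the main change of this PR, the effect of additional word delimiters can be illustrated with a pure-Python sketch. It only mimics the splitting of strings and is not the extraction code itself. The function name `split_words` and its `delimiters` argument are assumptions for illustration.

```python
import re

MAX_DELIMITERS = 64  # limit stated in the description above


def split_words(text, delimiters=""):
    """Hypothetical helper: split on white space plus extra delimiter characters."""
    if len(delimiters) > MAX_DELIMITERS:
        raise ValueError("at most 64 additional delimiter characters are allowed")
    # Character class: white space plus each (escaped) delimiter character.
    pattern = "[\\s" + re.escape(delimiters) + "]+"
    # Drop empty strings produced by leading or trailing delimiters.
    return [word for word in re.split(pattern, text) if word]
```

Without extra delimiters, `jane.doe@example.com` stays one word. Passing `"@."` splits it into `jane`, `doe`, `example`, and `com`, and passing `",!"` separates the words in `Hello, world!` from their punctuation.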