Releases: kermitt2/pdfalto
Version 0.4
New in version 0.4 (apart from various bug fixes):
- support for the xpdf language support packages for language-specific fonts (Arabic, Simplified Chinese, Japanese, etc.); they are pre-installed locally and portable
- refined line number detection, and fixed a bug that could result in randomly missing numbers in the ALTO output
- update to xpdf-4.03
- fix issue with character spacing due to an invalid rotation condition
- update dependencies and dependency install script
Version 0.3
New in version 0.3:
- line number detection: line numbers (typically added for review in manuscripts/preprints) are now specifically identified and no longer mixed with the rest of the text content; they are grouped in a separate block or, optionally, omitted from the ALTO file (noLineNumbers option)
- removal of the -blocks option: block information is always returned to ensure ALTO validation (<TextBlock> element)
- bug fixes on reading order
- fix XMax and YMax values that could be incorrectly set to 0 in the coordinates of blocks having only one line
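
Since block information is always present in the output, downstream code can rely on <TextBlock> elements when reading an ALTO file. The following is a minimal Python sketch of that idea, not part of pdfalto itself. The embedded ALTO fragment is hand-written for illustration (IDs, coordinates and text are made up, and it is not actual pdfalto output), and the helper names `local_name` and `text_blocks` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written ALTO fragment for illustration only. It is NOT actual pdfalto
# output: the IDs, coordinates and text are invented for this sketch.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="Page1" PHYSICAL_IMG_NR="1" WIDTH="595" HEIGHT="842">
      <PrintSpace>
        <TextBlock ID="b1" HPOS="72" VPOS="90" WIDTH="450" HEIGHT="12">
          <TextLine HPOS="72" VPOS="90" WIDTH="450" HEIGHT="12">
            <String CONTENT="Hello" HPOS="72" VPOS="90" WIDTH="30" HEIGHT="12"/>
            <SP/>
            <String CONTENT="world" HPOS="106" VPOS="90" WIDTH="32" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
        <TextBlock ID="b2" HPOS="72" VPOS="110" WIDTH="450" HEIGHT="12">
          <TextLine HPOS="72" VPOS="110" WIDTH="450" HEIGHT="12">
            <String CONTENT="Second" HPOS="72" VPOS="110" WIDTH="40" HEIGHT="12"/>
            <SP/>
            <String CONTENT="block" HPOS="116" VPOS="110" WIDTH="30" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works with any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def text_blocks(alto_xml):
    """Return a list of (block ID, text) pairs, one per TextBlock element."""
    root = ET.fromstring(alto_xml)
    blocks = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextBlock":
            continue
        # Collect the CONTENT attribute of every String inside this block.
        words = [
            s.get("CONTENT", "")
            for s in elem.iter()
            if local_name(s.tag) == "String"
        ]
        blocks.append((elem.get("ID"), " ".join(words)))
    return blocks


if __name__ == "__main__":
    for block_id, text in text_blocks(SAMPLE_ALTO):
        print(block_id, text)
```

Matching on local element names rather than a fixed namespace is a deliberate choice here, so the sketch does not depend on which ALTO schema version a given file declares.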
Version 0.2
New in version 0.2:
- support Unicode composition of characters
- generalize reading order to all blocks (it was previously limited to the blocks of the first page)
- use subscript/superscript text font style attribute
- use SVG as the format for vector images
- propagate unresolved character Unicode values (free Unicode range for embedded fonts) as encoded special characters in ALTO (so-called "placeholder" approach)
- generate metadata information in a separate XML file (as the ALTO schema does not support that)
- use the latest version of xpdf, version 4.00
- add cmake
- ALTO output replaces the custom Xerox XML format
Note: this release was used for Grobid release 0.5.6
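
As an illustration of the "Unicode composition of characters" item in the 0.2 list, the sketch below shows the general idea using Python's standard library. It assumes that composition here means merging a base character and a following combining mark into a single precomposed character (Unicode NFC normalization). This is only an illustration of the concept: how pdfalto actually performs composition is not described in these notes, and the helper name `compose` is hypothetical.

```python
import unicodedata


def compose(text):
    """Merge base characters and combining marks into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
    decomposed = "e\u0301"
    composed = compose(decomposed)
    # After composition the pair becomes the single code point U+00E9.
    print(len(decomposed), len(composed))
```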