Description
Is your feature request related to a problem? Please describe.
Before version 1.17.6, when you extracted the page text dict you would get font names like
MJGHPI+TimesNewRomanPSMT
and
YCCJKF+TimesNewRomanPSMT
Now those same text spans both return
TimesNewRomanPSMT
Note the lack of a leading prefix.
This is a breaking change for our current implementation, and we cannot upgrade beyond 1.17.5
until we can specify that full font names should be used.
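To make the difference concrete, here is a minimal sketch of how the old and new names relate. It assumes the prefix is the usual PDF subset tag (six uppercase letters followed by "+"). The helper name split_subset_prefix is made up for this illustration and is not part of the library.

```python
import re

# Assumption: the leading prefix is a PDF subset tag, i.e. six uppercase
# letters followed by "+", as in "MJGHPI+TimesNewRomanPSMT".
_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_prefix(font_name):
    """Return (prefix, base_name); prefix is None when no subset tag is present.

    Hypothetical helper for illustration only.
    """
    match = _SUBSET_TAG.match(font_name)
    if match is None:
        return None, font_name
    return match.group(1), match.group(2)


# The two names reported before 1.17.6 are distinct...
old_names = ["MJGHPI+TimesNewRomanPSMT", "YCCJKF+TimesNewRomanPSMT"]

# ...but collapse to a single name once the prefix is dropped, so the two
# embedded subsets can no longer be told apart by name alone.
base_names = {split_subset_prefix(name)[1] for name in old_names}
```

Going from the full name to the short name is easy, as above. The reverse is not possible from the span alone, which is why we are stuck.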
This is the line that was changed
Describe the solution you'd like
It would be nice to be able to do one of the following:
- Add a flag/option to get_text("dict") that specifies that font names should be returned as is
- Include the font's xref in the final span so that the font could be looked up in the page's font list
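As a rough sketch of how the second option could be used on our side: the "xref" key on the span is hypothetical (it is exactly what this request asks for), and the font list entries are assumed to be tuples shaped like (xref, ext, type, basefont, name, encoding). The sample values and the helper name full_font_name are made up for illustration.

```python
# Assumed shape of a page font list entry:
# (xref, ext, type, basefont, name, encoding). Values are invented.
page_fonts = [
    (12, "ttf", "TrueType", "MJGHPI+TimesNewRomanPSMT", "F1", "WinAnsiEncoding"),
    (15, "ttf", "TrueType", "YCCJKF+TimesNewRomanPSMT", "F2", "WinAnsiEncoding"),
]

# Hypothetical span: the "xref" key does not exist today; it is the
# addition proposed in option 2.
span = {"text": "Hello", "font": "TimesNewRomanPSMT", "xref": 15}


def full_font_name(span, page_fonts):
    """Resolve a span's full font name through its (hypothetical) xref.

    Falls back to the span's own font name when no xref matches.
    """
    basefont_by_xref = {entry[0]: entry[3] for entry in page_fonts}
    return basefont_by_xref.get(span.get("xref"), span["font"])
```

With something like this, callers who need the full name can recover it, and everyone else keeps the shorter name by default.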
Describe alternatives you've considered
I explored option 2 above, but I don't currently think that's possible.
Additional context
If you think this is in line with the project's vision, I will happily implement it.