Re-support font name prefixes in span font names  #896

Closed
@darraghmckay

Is your feature request related to a problem? Please describe.

Before version 1.17.6, extracting the page text dict returned font names like
MJGHPI+TimesNewRomanPSMT
and
YCCJKF+TimesNewRomanPSMT

Now those same text spans both return
TimesNewRomanPSMT

Note that the leading prefix is missing.

This is a breaking change for our current implementation, and we cannot upgrade beyond 1.17.5 until we can specify that full font names should be used.
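
To show why this matters for us, here is a minimal sketch in plain Python (not PyMuPDF code; `split_subset_tag` is a hypothetical helper written only for this illustration). It relies on the PDF convention that a subset tag is six uppercase letters followed by `+`. Once the tag is dropped, two distinct embedded subsets can no longer be told apart by name:

```python
import re

# A PDF subset tag is six uppercase letters followed by "+".
_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None when there is no subset prefix."""
    match = _SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


# Names as reported up to 1.17.5 (taken from the examples above).
old_names = ["MJGHPI+TimesNewRomanPSMT", "YCCJKF+TimesNewRomanPSMT"]
# Names as reported now, i.e. with the prefix removed.
new_names = [split_subset_tag(name)[1] for name in old_names]

distinct_before = len(set(old_names))  # two different subset fonts
distinct_after = len(set(new_names))   # both collapse to one name
```

Anything that keys on the span's font name to distinguish the two subsets stops working once only the base name is returned.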

This is the line that was changed:

https://github.com/pymupdf/PyMuPDF/compare/60e0c1fd5abadf61905253ea2fa19f62cb28e66e..10341cea796e8cbde86959a590d87b2596c27085#diff-04606915a2aa7f21b7798f15aba6f7b29a8900c7ac7403b13f2237f8214749ecR184

Describe the solution you'd like
It would be nice to be able to do one of the following:

  1. Add a flag/option to get_text("dict") that specifies that font names should be returned as-is
  2. Include the font's xref in the final span so that it can be looked up in the page's font list
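
As a rough sketch of what option 2 could look like from the caller's side, using mock data only: the `font_xref` span key is hypothetical and does not exist in PyMuPDF today, and the font tuples merely imitate the `(xref, ext, type, basefont, name, encoding)` layout of `page.get_fonts()` with made-up values. It also assumes the font list still carries the prefixed name.

```python
# Mock font list; values are invented for illustration.
page_fonts = [
    (12, "ttf", "TrueType", "MJGHPI+TimesNewRomanPSMT", "F1", "WinAnsiEncoding"),
    (15, "ttf", "TrueType", "YCCJKF+TimesNewRomanPSMT", "F2", "WinAnsiEncoding"),
]

# Mock spans; "font_xref" is the hypothetical new key proposed in option 2.
spans = [
    {"text": "Hello", "font": "TimesNewRomanPSMT", "font_xref": 12},
    {"text": "World", "font": "TimesNewRomanPSMT", "font_xref": 15},
]


def full_font_name(span, fonts):
    """Look up the span's font by xref; fall back to the span's own name."""
    basefont_by_xref = {font[0]: font[3] for font in fonts}
    return basefont_by_xref.get(span["font_xref"], span["font"])


resolved = [full_font_name(span, page_fonts) for span in spans]
```

With something like this, the span could keep returning the short name while callers who need the full name resolve it themselves.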

Describe alternatives you've considered
I explored option 2 above, but I don't currently think that's possible.

Additional context
If you think this is in line with the project's vision, I will happily implement it.
