
Wrong characters displayed after font subsetting (w/ native method) #4457

@Jeremy-Hibiki


Description of the bug


We are working on a project that translates academic papers in place, and some of the outputs look wrong.
Loading a paper, subsetting its fonts, and saving it is enough to reproduce the problem: text set in the sans font becomes unreadable.

The problem only occurs with the native subsetting method; subsetting with fontTools produces normal output.
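
For comparison, this is roughly how the fontTools path can be selected. It is a minimal sketch, assuming that the `fallback` argument of `subset_fonts` switches to the older fontTools-based implementation and that fontTools is installed. The helper name `subset_with_fonttools` and the file paths are only illustrative.

```python
def subset_with_fonttools(src_path: str, dst_path: str) -> None:
    """Subset fonts via the fontTools-based path instead of the native one."""
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    doc = fitz.open(src_path)
    try:
        # Assumption: fallback=True selects the older fontTools-based
        # subsetter rather than the native MuPDF one.
        doc.subset_fonts(fallback=True)
        doc.ez_save(dst_path)
    finally:
        doc.close()
```

Calling `subset_with_fonttools("original.pdf", "after_subset_fonttools.pdf")` should give a file that can be compared side by side with the output of the native method.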

How to reproduce the bug

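import fitz  # PyMuPDF
import httpx

# Download one of the affected papers and keep a copy of the original.
fstream = httpx.get("https://arxiv.org/pdf/2504.13181v1").read()
with open("original.pdf", "wb") as f:
    f.write(fstream)

# Open the PDF from memory and save it once before subsetting.
doc = fitz.Document(stream=fstream)
doc.ez_save("before_subset.pdf")

# Subset with the default (native) method, then save again for comparison.
doc.subset_fonts()
doc.ez_save("after_subset.pdf")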

https://arxiv.org/pdf/2504.13180
https://arxiv.org/pdf/2504.13181

Both affected papers are from Meta and use the same template.
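
To narrow down which pages are affected, the two saved files can be rendered and compared page by page. This is a rough sketch, not a definitive check: it assumes that subsetting should leave the rendered pixels unchanged, so any page whose pixel data differs is worth inspecting by eye. The helper names `render_pages` and `changed_pages` are made up for this example.

```python
def changed_pages(before: list[bytes], after: list[bytes]) -> list[int]:
    """Return 0-based indices of pages whose rendered pixel data differs."""
    if len(before) != len(after):
        raise ValueError("page counts differ")
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


def render_pages(path: str, dpi: int = 72) -> list[bytes]:
    """Render every page of a PDF and return the raw pixel bytes per page."""
    import fitz  # PyMuPDF; imported lazily so the comparison helper stays pure

    with fitz.open(path) as doc:
        return [page.get_pixmap(dpi=dpi).samples for page in doc]
```

With the files from the script above, `changed_pages(render_pages("before_subset.pdf"), render_pages("after_subset.pdf"))` lists the pages to look at first.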

PyMuPDF version

1.25.5

Operating system

Linux

Python version

3.11
