"(none)" written to metadata when it should be omitted #724

Closed
@MerlijnWajer

Describe the bug (mandatory)

When calling Document.setMetadata, keys that are not in the dictionary are written to the PDF document with the literal string value none, for example /Author (none). I think they should just be omitted.

To Reproduce (mandatory)

>>> import fitz
>>> doc = fitz.open()
>>> doc.insertPage(-1)
0
>>> doc.PDFTrailer()
'<<\n  /Size 3\n  /Root 1 0 R\n>>'
>>> doc.save('test.pdf')
>>> doc.PDFTrailer()
'<<\n  /Size 3\n  /Root 1 0 R\n  /ID [ <DD34F5DD4BD314AE963D741F5238F847> <AC1F320E75267644CDBC99CE6D094BFA> ]\n>>'
>>> doc.setMetadata({'title': 'Test'})
>>> doc.save('test.pdf')
>>> doc.PDFTrailer()
'<<\n  /Size 3\n  /Root 1 0 R\n  /ID [ <DD34F5DD4BD314AE963D741F5238F847> <90970EF47F0CD252B8152FCABA5F3E03> ]\n  /Info 5 0 R\n>>'
>>> doc.xrefObject(5)
'<<\n  /Author (none)\n  /CreationDate (none)\n  /Creator (none)\n  /Keywords (none)\n  /ModDate (none)\n  /Producer (none)\n  /Subject (none)\n  /Title (Test)\n>>'

Workaround

I worked around this problem for now with this method (because I had to roll out my code to production yesterday):

import re

#  pymupdf inserts stuff like '/Author (none)' when the author is not provided.
#  This is wrong. We'll file a bug, but let's first fix it here.
def fixup_pymupdf_metadata(doc):
    # Access to the Info xref is not in the API, so let's dig for it.
    trailer_lines = doc.PDFTrailer().split('\n')
    for line in trailer_lines:
        if '  /Info ' in line:
            # The line looks like '  /Info 5 0 R'; the first number is the xref.
            s = line.replace('  /Info ', '')
            info_xref = int(s[:s.find(' ')])

            s = doc.xrefObject(info_xref)

            new_s = ''

            # Keep every line except the '/Key (none)' placeholder entries.
            for infoline in s.split('\n'):
                if re.match(r'^.*/[A-Za-z]+ \(none\)$', infoline):
                    continue

                new_s += infoline + '\n'

            doc.updateObject(info_xref, new_s)

            break
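
The string handling in the workaround above can also be tried without a PDF at hand. The following is only a sketch: `find_info_xref` and `strip_none_entries` are hypothetical helper names (not part of PyMuPDF), and the inputs are assumed to look like the `PDFTrailer()` and `xrefObject()` output shown in the reproduction. Note that it would also drop a field whose real value is the word "none".

```python
import re

# Matches the /Info entry of a trailer dictionary, e.g. "/Info 5 0 R".
INFO_REF = re.compile(r"/Info\s+(\d+)\s+\d+\s+R")
# Matches a dictionary line whose value is the placeholder string "(none)".
NONE_ENTRY = re.compile(r"^\s*/[A-Za-z]+\s*\(none\)\s*$")


def find_info_xref(trailer):
    """Return the xref number of the Info dictionary, or None if absent."""
    match = INFO_REF.search(trailer)
    return int(match.group(1)) if match else None


def strip_none_entries(obj_source):
    """Drop every '/Key (none)' line from a PDF dictionary source string."""
    kept = [line for line in obj_source.split("\n") if not NONE_ENTRY.match(line)]
    return "\n".join(kept)


# Sample strings shaped like the output in the reproduction steps.
trailer = "<<\n  /Size 3\n  /Root 1 0 R\n  /Info 5 0 R\n>>"
info = "<<\n  /Author (none)\n  /Title (Test)\n>>"

print(find_info_xref(trailer))
print(strip_none_entries(info))
```

With a real document, the cleaned string would then be passed to `doc.updateObject(info_xref, ...)`, as in the workaround.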
