Describe the bug (mandatory)
When calling Document.setMetadata, some keys that are not in the dictionary are written to the PDF document with the literal string value none. I think they should just be omitted.
To Reproduce (mandatory)
>>> import fitz
>>> doc = fitz.open()
>>> doc.insertPage(-1)
0
>>> doc.PDFTrailer()
'<<\n /Size 3\n /Root 1 0 R\n>>'
>>> doc.save('test.pdf')
>>> doc.PDFTrailer()
'<<\n /Size 3\n /Root 1 0 R\n /ID [ <DD34F5DD4BD314AE963D741F5238F847> <AC1F320E75267644CDBC99CE6D094BFA> ]\n>>'
>>> doc.setMetadata({'title': 'Test'})
>>> doc.save('test.pdf')
>>> doc.PDFTrailer()
'<<\n /Size 3\n /Root 1 0 R\n /ID [ <DD34F5DD4BD314AE963D741F5238F847> <90970EF47F0CD252B8152FCABA5F3E03> ]\n /Info 5 0 R\n>>'
>>> doc.xrefObject(5)
'<<\n /Author (none)\n /CreationDate (none)\n /Creator (none)\n /Keywords (none)\n /ModDate (none)\n /Producer (none)\n /Subject (none)\n /Title (Test)\n>>'
Workaround
I worked around this problem for now with this method (because I had to roll out my code to production yesterday):
import re

# pymupdf inserts stuff like '/Author (none)' when the author is not provided.
# This is wrong. We'll file a bug, but let's first fix it here.
def fixup_pymupdf_metadata(doc):
    # Access to the Info xref is not in the API, so let's dig for it.
    trailer_lines = doc.PDFTrailer().split('\n')
    for line in trailer_lines:
        if ' /Info ' in line:
            # The line looks like ' /Info 5 0 R'; the first token is the xref number.
            s = line.replace(' /Info ', '')
            info_xref = s[:s.find(' ')]
            info_xref = int(info_xref)
            s = doc.xrefObject(info_xref)
            new_s = ''
            for infoline in s.split('\n'):
                # Skip entries whose value is the literal string (none).
                if re.match(r'^.*/[A-Za-z]+ \(none\)$', infoline):
                    continue
                new_s += infoline + '\n'
            doc.updateObject(info_xref, new_s)
            break
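
The string handling in this workaround can be separated from the PyMuPDF calls, which makes it checkable without a PDF at hand. The following is only a sketch under that assumption: the helper names find_info_xref and strip_none_entries are hypothetical and not part of PyMuPDF, and the sample strings are shortened versions of the output shown in the reproduction above.

```python
import re

# Hypothetical helpers for illustration; they are not part of the PyMuPDF API.
_INFO_REF = re.compile(r"/Info\s+(\d+)\s+\d+\s+R")
_NONE_ENTRY = re.compile(r"^\s*/[A-Za-z]+\s*\(none\)\s*$")


def find_info_xref(trailer):
    """Return the xref number of the /Info entry in a trailer string, or None."""
    match = _INFO_REF.search(trailer)
    return int(match.group(1)) if match else None


def strip_none_entries(obj_source):
    """Drop dictionary lines whose value is the literal string (none)."""
    lines = obj_source.split("\n")
    kept = [line for line in lines if not _NONE_ENTRY.match(line)]
    return "\n".join(kept)


# Sample strings modelled on the doc.PDFTrailer() and doc.xrefObject(5) output above.
trailer = "<<\n /Size 3\n /Root 1 0 R\n /Info 5 0 R\n>>"
info = "<<\n /Author (none)\n /Creator (none)\n /Title (Test)\n>>"

print(find_info_xref(trailer))
print(strip_none_entries(info))
```

In the workaround, the result of strip_none_entries(doc.xrefObject(xref)) would be passed to doc.updateObject(xref, ...). One caveat: a document whose author is genuinely the string "none" would lose that entry too, so this is a stopgap rather than a fix.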