"(none)" written to metadata when it should be omitted #724

Closed
MerlijnWajer opened this issue Nov 18, 2020 · 2 comments

MerlijnWajer commented Nov 18, 2020

Describe the bug (mandatory)

When calling Document.setMetadata, keys that are not in the dictionary are written to the PDF's Info dictionary with the literal string value "none" (e.g. /Author (none)). I think they should just be omitted.
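For comparison, here is a minimal sketch of the behaviour I would expect: only the keys actually supplied end up in the Info dictionary. The names PDF_INFO_KEYS, pdf_literal and build_info_dict are hypothetical, this is not PyMuPDF's implementation, and it assumes plain ASCII values (non-ASCII text would need PDFDocEncoding or UTF-16BE in a real PDF writer). The key list mirrors the entries visible in the Info object in the reproduction below.

```python
# Hypothetical sketch, NOT PyMuPDF's code: build the source of a PDF /Info
# dictionary that contains only the metadata keys the caller supplied.
# Assumes ASCII values; real writers must handle other encodings.

# Mapping from setMetadata-style keys to PDF Info entry names
# (the eight entries seen in the Info object in this issue).
PDF_INFO_KEYS = {
    "title": "Title",
    "author": "Author",
    "subject": "Subject",
    "keywords": "Keywords",
    "creator": "Creator",
    "producer": "Producer",
    "creationDate": "CreationDate",
    "modDate": "ModDate",
}


def pdf_literal(value: str) -> str:
    """Escape backslashes and parentheses and wrap as a PDF literal string."""
    escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_info_dict(metadata: dict) -> str:
    """Return '<< ... >>' with one entry per supplied, non-empty key."""
    entries = []
    # Sort by PDF entry name so the output order matches the issue's example.
    for key, pdf_name in sorted(PDF_INFO_KEYS.items(), key=lambda kv: kv[1]):
        value = metadata.get(key)
        if not value:
            # Absent or empty: omit the entry instead of writing "(none)".
            continue
        entries.append(f"  /{pdf_name} {pdf_literal(value)}")
    body = "".join(f"{entry}\n" for entry in entries)
    return "<<\n" + body + ">>"


print(build_info_dict({"title": "Test"}))
```

With only a title supplied, the result contains a single /Title entry and no placeholder values.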

To Reproduce (mandatory)

>>> import fitz
>>> doc = fitz.open()
>>> doc.insertPage(-1)
0
>>> doc.PDFTrailer()
'<<\n  /Size 3\n  /Root 1 0 R\n>>'
>>> doc.save('test.pdf')
>>> doc.PDFTrailer()
'<<\n  /Size 3\n  /Root 1 0 R\n  /ID [ <DD34F5DD4BD314AE963D741F5238F847> <AC1F320E75267644CDBC99CE6D094BFA> ]\n>>'
>>> doc.setMetadata({'title': 'Test'})
>>> doc.save('test.pdf')
>>> doc.PDFTrailer()
'<<\n  /Size 3\n  /Root 1 0 R\n  /ID [ <DD34F5DD4BD314AE963D741F5238F847> <90970EF47F0CD252B8152FCABA5F3E03> ]\n  /Info 5 0 R\n>>'
>>> doc.xrefObject(5)
'<<\n  /Author (none)\n  /CreationDate (none)\n  /Creator (none)\n  /Keywords (none)\n  /ModDate (none)\n  /Producer (none)\n  /Subject (none)\n  /Title (Test)\n>>'
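The trailer printed above points at the Info dictionary through an indirect reference (/Info 5 0 R). As a hedged sketch, assuming the trailer text is formatted like the PDFTrailer() output shown here, the object number can be pulled out with a regular expression instead of splitting on exact whitespace. INFO_REF and info_xref_from_trailer are hypothetical helper names, not part of PyMuPDF.

```python
import re
from typing import Optional

# Matches an indirect reference such as "/Info 5 0 R" and captures
# the object number and the generation number.
INFO_REF = re.compile(r"/Info\s+(\d+)\s+(\d+)\s+R\b")


def info_xref_from_trailer(trailer: str) -> Optional[int]:
    """Return the object number of /Info, or None if the trailer has none."""
    match = INFO_REF.search(trailer)
    return int(match.group(1)) if match else None


# Trailer text copied from the reproduction above.
trailer = (
    "<<\n  /Size 3\n  /Root 1 0 R\n"
    "  /ID [ <DD34F5DD4BD314AE963D741F5238F847> <90970EF47F0CD252B8152FCABA5F3E03> ]\n"
    "  /Info 5 0 R\n>>"
)
print(info_xref_from_trailer(trailer))
```

Unlike a check for the exact substring '  /Info ', the regex also tolerates trailers written on a single line or with different indentation.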

Workaround

I worked around this problem for now with this method (because I had to roll out my code to production yesterday):

import re

#  PyMuPDF inserts entries like '/Author (none)' when the author is not provided.
#  This is wrong. We'll file a bug, but let's first fix it here.
def fixup_pymupdf_metadata(doc):
    # Access to the Info xref is not in the API, so let's dig for it.
    trailer_lines = doc.PDFTrailer().split('\n')
    for line in trailer_lines:
        if '  /Info ' in line:
            # The line looks like '  /Info 5 0 R'; keep the object number.
            s = line.replace('  /Info ', '')
            info_xref = int(s[:s.find(' ')])

            s = doc.xrefObject(info_xref)

            new_s = ''

            for infoline in s.split('\n'):
                # Skip placeholder entries such as '  /Author (none)'.
                if re.match(r'^.*/[A-Za-z]+ \(none\)$', infoline):
                    continue

                new_s += infoline + '\n'

            doc.updateObject(info_xref, new_s)

            break
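The line filtering inside the workaround can be separated from PyMuPDF so it is testable on plain strings. This is a minimal sketch assuming the Info object source looks like the xrefObject(5) output above; strip_none_entries is a hypothetical helper, and its result would be passed to doc.updateObject exactly as in the workaround.

```python
import re

# Matches one entry of the form "/Key (none)" on its own line, with optional
# surrounding whitespace, as produced by the setMetadata call in this issue.
NONE_ENTRY = re.compile(r"^\s*/[A-Za-z]+ \(none\)\s*$")


def strip_none_entries(info_source: str) -> str:
    """Drop every '/Key (none)' line from the source of an Info dictionary."""
    kept = [line for line in info_source.split("\n") if not NONE_ENTRY.match(line)]
    return "\n".join(kept)


info = "<<\n  /Author (none)\n  /CreationDate (none)\n  /Title (Test)\n>>"
print(strip_none_entries(info))
```

One caveat: a field whose real value is the string "none" (e.g. a title of "none") is indistinguishable from the placeholder and would be removed as well.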
JorjMcKie (Collaborator) commented:

Thanks for submitting this.
It will be fixed in the next version.

JorjMcKie (Collaborator) commented:

New version available on PyPI:
