Fixed catalog annotation #1

Merged
merged 3 commits into from
Jan 15, 2021


@haiyangToAI haiyangToAI commented Jan 12, 2021

  • issues caused by PDF 1.1 and PDF 1.2
  • For Named Destination, a destination may be referred to indirectly by means of a name object (PDF 1.1) or a byte string (PDF 1.2)
  • this PR will start supporting named destination extract for both PDF 1.1 and PDF 1.2
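A minimal sketch of what handling both forms can look like, assuming the usual PDF layout: a `Dests` dictionary in the catalog keyed by name objects (PDF 1.1), and a `Names` → `Dests` name tree keyed by byte strings (PDF 1.2). Plain dicts and lists stand in for parsed PDF objects, and every function name here (`resolve`, `key_to_str`, `walk_name_tree`, `collect_named_destinations`) is hypothetical, not taken from this PR.

```python
from typing import Any, Dict, Union


def resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object exposes resolve()."""
    return obj.resolve() if hasattr(obj, "resolve") else obj


def key_to_str(key: Union[str, bytes]) -> str:
    """Normalize a destination key; PDF 1.2 keys arrive as byte strings."""
    if isinstance(key, bytes):
        return key.decode("utf-8", errors="replace")
    return str(key)


def walk_name_tree(node: Any, out: Dict[str, Any]) -> None:
    """Collect key/value pairs from a name tree node and its kids."""
    node = resolve(node)
    names = resolve(node.get("Names", []))
    # Leaf entries are a flat array: [key1, value1, key2, value2, ...]
    for i in range(0, len(names) - 1, 2):
        out[key_to_str(names[i])] = resolve(names[i + 1])
    for kid in resolve(node.get("Kids", [])):
        walk_name_tree(kid, out)


def collect_named_destinations(catalog: Dict[str, Any]) -> Dict[str, Any]:
    destinations: Dict[str, Any] = {}
    # PDF 1.1 style: a Dests dictionary directly in the catalog.
    for key, value in resolve(catalog.get("Dests", {})).items():
        destinations[key_to_str(key)] = resolve(value)
    # PDF 1.2 style: a Dests name tree under the catalog's Names entry.
    names = resolve(catalog.get("Names", {}))
    if "Dests" in names:
        walk_name_tree(names["Dests"], destinations)
    return destinations


catalog = {
    "Dests": {"chapter1": [0, "Fit"]},
    "Names": {"Dests": {"Kids": [{"Names": [b"chapter2", [1, "Fit"]]}]}},
}
print(collect_named_destinations(catalog))
```

Decoding with `errors="replace"` is a design choice of this sketch so that a malformed key does not abort the whole extraction; the PR itself may handle that case differently.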

@haiyangToAI haiyangToAI requested a review from ubmarco January 12, 2021 08:22
if isinstance(pdf_catalog['Names'], PDFObjRef) and 'Dests' in pdf_catalog['Names'].resolve():
    name_tree = pdf_catalog['Names'].resolve()['Dests'].resolve()
elif isinstance(pdf_catalog['Names'], dict) and 'Dests' in pdf_catalog['Names']:
    name_tree = pdf_catalog['Names']['Dests'].resolve()
# check if name tree not empty
if not name_tree:
    LOG.info('Catalog extraction: name destination exists but is empty...')

Member

Is this good or bad from a user's perspective?

Please remove the three dots. They suggest either that the program is still processing something, or that something is fishy in the PDF and the tool is not saying what.

Contributor Author

Done, removed.

for index_name in range(0, len(item_dest['Names']), 2):
    named_destination[name_obj_list[index_dest]['Names'][index_name].decode('utf-8')] = name_obj_list[
        index_dest
    ]['Names'][index_name + 1]

for key_object in named_destination:

Member

I did not load the code in my IDE, but it looks like named_destination is only defined if name_tree is defined. Could named_destination be referenced before its definition?

Contributor Author

Yes, you are right. It is referenced before definition. Done.
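
One common way to avoid this kind of reference-before-definition is to bind the dictionary before any branch that fills it, so the later loop always has something to iterate over. The sketch below illustrates the pattern only; the function name `destinations_from_name_tree` and the plain-dict input are assumptions, and the fix actually merged in this PR may look different.

```python
from typing import Any, Dict, Optional


def destinations_from_name_tree(name_tree: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Bind the result up front so it exists even when name_tree is missing or empty.
    named_destination: Dict[str, Any] = {}
    if name_tree:
        names = name_tree.get("Names", [])
        # Entries alternate: byte-string key, then destination value.
        for i in range(0, len(names) - 1, 2):
            named_destination[names[i].decode("utf-8")] = names[i + 1]
    # Safe in every case: an empty dict simply yields no iterations.
    for key_object in named_destination:
        print(key_object, named_destination[key_object])
    return named_destination
```

Calling `destinations_from_name_tree(None)` returns an empty dictionary instead of raising `UnboundLocalError`.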

@ubmarco ubmarco merged commit d676ee1 into master Jan 15, 2021
@ubmarco ubmarco deleted the fix_catalog_anno branch January 15, 2021 19:43
ubmarco added a commit that referenced this pull request Dec 14, 2021
ubmarco added a commit that referenced this pull request Dec 14, 2021