fix/Handle empty lists and malformed PDF dictionary values #3426

Open
wants to merge 2 commits into
base: master

Conversation

mrscottyrose

Fix: Handle empty lists and malformed PDF dictionary values

Description

This PR addresses issue #12 by improving error handling for empty lists and malformed PDF dictionary values in the PDF parser. The changes prevent the "List index out of range" error that occurred when parsing certain PDF files.

Changes

  • Added handling for empty lists in PDF dictionary parsing
  • Improved error handling for malformed list values
  • Added warning logging for unexpected PDF dictionary values
  • Made dictionary handling more robust by skipping problematic values instead of crashing
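
The skip-and-warn behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `iter_valid_entries`, the logger name, and the assumption that parsed dictionary values arrive as plain Python lists are all hypothetical.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("pdf_dict_sketch")


def iter_valid_entries(raw_dict):
    """Yield (key, value) pairs, skipping empty or malformed values.

    Assumes (for illustration) that the parser wraps each value in a list,
    so the naive approach would be `value[0]`, which raises IndexError
    ("list index out of range") when the list is empty.
    """
    for key, value in raw_dict.items():
        if isinstance(value, (list, tuple)):
            if len(value) == 0:
                # Empty list: warn and skip instead of indexing into it.
                log.warning("Skipping key %r: empty list value", key)
                continue
            value = value[0]
        if value is None:
            # Treat a missing value as malformed; warn and skip.
            log.warning("Skipping key %r: missing value", key)
            continue
        yield key, value


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    sample = {"Type": ["Catalog"], "Pages": [], "Broken": None, "Size": 7}
    print(list(iter_valid_entries(sample)))
```

The point of the sketch is the design choice: a problematic entry produces a warning and is dropped, so one bad value no longer aborts parsing of the whole dictionary.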

Testing

The changes have been tested with the problematic PDF file from the original issue (deniable removal 2 with incremental update.pdf) and other edge cases involving empty lists and malformed dictionaries.

Related Issue

Fixes #12

Additional Notes

While this PR improves the robustness of PDF parsing, further improvements to the _emit_dict function might be needed for complete handling of all edge cases.

@mrscottyrose mrscottyrose requested a review from ESultanik as a code owner April 28, 2025 06:52
@CLAassistant

CLAassistant commented Apr 28, 2025

CLA assistant check
All committers have signed the CLA.


Successfully merging this pull request may close these issues.

'List index out of range' on one of Ange's POC files
3 participants