bugfix relating to pdfminer when pdftotext not available, and some related tidying #27

Merged
martinburchell merged 1 commit into master from fix_pdfminer_str_decode_bug
Feb 19, 2025

Conversation

@RudolfCardinal
Owner

For text extraction, pdftotext is preferred, but as a fallback we were using pdfminer, and there was a bug relating to legacy Python 2 code. That is now fixed. For this optional component, I have also shifted to pdfminer.six (which is maintained) and removed some redundant/commented-out code.
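
For readers unfamiliar with the pattern, here is a minimal sketch of a "pdftotext first, pdfminer.six as fallback" extractor. It is not the code from this PR: the function names `ensure_text` and `pdf_to_text` are hypothetical, and it assumes (going only by the branch name) that the bug was of the common Python 2 to 3 kind, where `.decode()` is called on a value that is already `str`. The PR text itself does not spell out the details.

```python
import shutil
import subprocess
from typing import Union


def ensure_text(data: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Return str for either bytes or str input (hypothetical helper).

    Python 2-era code often called .decode() unconditionally; in Python 3
    that raises AttributeError when the value is already a str.
    """
    if isinstance(data, bytes):
        return data.decode(encoding, errors="replace")
    return data


def pdf_to_text(filename: str) -> str:
    """Extract text from a PDF, preferring the pdftotext command-line tool."""
    pdftotext = shutil.which("pdftotext")
    if pdftotext:
        # "-" as the output file makes pdftotext write to stdout.
        proc = subprocess.run(
            [pdftotext, "-enc", "UTF-8", filename, "-"],
            capture_output=True,
            check=True,
        )
        return ensure_text(proc.stdout)

    # Fallback: pdfminer.six, imported lazily because it is optional.
    try:
        from pdfminer.high_level import extract_text
    except ImportError as exc:
        raise RuntimeError(
            "Neither pdftotext nor pdfminer.six is available"
        ) from exc
    return ensure_text(extract_text(filename))
```

The lazy import keeps pdfminer.six an optional dependency: it is only needed on systems where pdftotext is not on the `PATH`.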

Collaborator

@martinburchell left a comment

Looks good to me.

@martinburchell merged commit 5dd68a1 into master on Feb 19, 2025
5 checks passed
@martinburchell deleted the fix_pdfminer_str_decode_bug branch on February 19, 2025 06:31
