AttributeError: 'unicode' object has no attribute 'seek' #14

Closed
tringger opened this issue Apr 22, 2014 · 5 comments
@tringger

Hi Virantha,

I really enjoy what you've built. Thank you very much. I'm reporting an issue I've run into when copying more than one PDF to the watch folder while using a config.yaml file. Here is the complete output from my command line, including the command. I have ImageMagick installed: even though the message says it can't execute "identify", I can run "identify" from the command line without any issues. If I don't use a watch folder, I can run pypdfocr on a single file without any problem.

C:\Users\tringger006\Documents\ARCA\2014\OCR>pypdfocr -w C:\Users\tringger006\Documents\ARCA\2014\OCR -f -c config.yaml -d

Filing of PDFs is enabled

  • 2 target filing folders
  • 2 keywords

Starting to watch for new pdfs in C:\Users\tringger006\Documents\ARCA\2014\OCR
Starting conversion of C:\Users\tringger006\Documents\ARCA\2014\OCR\Document1.pdf

WARNING: Could not execute identify to calculate DPI (try installing imagemagick?), so defaulting to 300dpi
Completed conversion successfully to C:\Users\tringger006\Documents\ARCA\2014\OCR\Document1_ocr.pdf
Traceback (most recent call last):
  File "C:\Python27\Scripts\pypdfocr-script.py", line 9, in <module>
    load_entry_point('pypdfocr==0.7.4', 'console_scripts', 'pypdfocr')()
  File "c:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 411, in main
    script.go(sys.argv[1:])
  File "c:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 396, in go
    filing = self.file_converted_file(ocr_pdffilename, pdf_filename)
  File "c:\Python27\lib\site-packages\pypdfocr\pypdfocr.py", line 331, in file_converted_file
    filed_path = self.pdf_filer.move_to_matching_folder(ocr_pdffilename)
  File "c:\Python27\lib\site-packages\pypdfocr\pypdfocr_pdffiler.py", line 65, in move_to_matching_folder
    for page_text in self.iter_pdf_page_text(filename):
  File "c:\Python27\lib\site-packages\pypdfocr\pypdfocr_pdffiler.py", line 42, in iter_pdf_page_text
    reader = PdfFileReader(filename)
  File "c:\Python27\lib\site-packages\PyPDF2\pdf.py", line 800, in __init__
    self.read(stream)
  File "c:\Python27\lib\site-packages\PyPDF2\pdf.py", line 1242, in read
    stream.seek(-1, 2)
AttributeError: 'unicode' object has no attribute 'seek'
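
For readers hitting the same traceback: the last frame shows `stream.seek(-1, 2)` being called on a `unicode` object, which suggests the filename string reached `PdfFileReader` without being opened as a file first. One possible workaround, offered as a sketch and not as the fix pypdfocr adopted, is to open the path in binary mode and pass the resulting file object. The helper name `as_seekable_stream` is hypothetical and not part of pypdfocr or PyPDF2.

```python
import io


def as_seekable_stream(path_or_stream):
    """Return an object that supports .seek().

    Hypothetical helper: if the argument already behaves like a file,
    return it unchanged; otherwise assume it is a filesystem path
    (str or unicode) and open it in binary mode.
    """
    if hasattr(path_or_stream, "seek"):
        return path_or_stream
    # Assumption: anything without .seek() is a path to a PDF on disk.
    return io.open(path_or_stream, "rb")


# Possible use at the call site shown in the traceback (not executed here,
# because it needs PyPDF2 installed):
#
#     stream = as_seekable_stream(filename)
#     reader = PdfFileReader(stream)
```

The caller remains responsible for closing the stream once the reader is no longer needed.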

Here are the contents of my config.yaml file, stored in the watch directory:

target_folder: "docs/filed"
default_folder: "docs/filed/manual_sort"
original_move_folder: "docs/originals"

folders:
    default_fees:
        - appraisal
    other:
        - text
Thanks so much, again.
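
On the `identify` warning in the output above: a command that works in an interactive shell is not always resolvable from a Python process, for example when the two see different `PATH` values. The sketch below is one way to check what Python itself can find. It assumes Python 3.3 or later for `shutil.which` (the report uses Python 2.7, where this function is not available), and the helper names `locate_tool` and `can_run` are hypothetical.

```python
import shutil
import subprocess


def locate_tool(name):
    """Return the full path Python resolves for `name`, or None if not found."""
    return shutil.which(name)


def can_run(name, args=("-version",)):
    """Return True if `name` is found on PATH and exits successfully."""
    path = locate_tool(name)
    if path is None:
        return False
    try:
        subprocess.run(
            [path] + list(args),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return False
    return True


if __name__ == "__main__":
    print("identify resolves to:", locate_tool("identify"))
    print("identify runs:", can_run("identify"))
```

If `locate_tool("identify")` returns `None` here while the shell finds the command, the difference in `PATH` between the two environments is a reasonable place to look next.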

@virantha
Owner

Thanks for reporting this! Let me take a look at it...

@virantha virantha self-assigned this Apr 23, 2014
@nextechinc

I'm having the exact same problem.

@virantha
Owner

I am unable to reproduce this under mac/linux. Could you send me the pdf you're working with?

@virantha
Owner

Also, please make sure you're running version 0.7.5

@virantha
Owner

virantha commented Sep 2, 2014

Closing issue as can't reproduce.

@virantha virantha closed this as completed Sep 2, 2014