Loading PDFs gives error "cannot open broken document" #1067
Closed
Description
Search before asking
- I have searched the EvaDB issues and found no similar bug report.
Bug
09-07-2023 13:15:53 ERROR [plan_executor:plan_executor.py:execute_plan:0182] cannot open broken document
Traceback (most recent call last):
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py", line 178, in execute_plan
yield from output
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py", line 33, in exec
for batch in child_executor.exec(**kwargs):
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py", line 40, in exec
for batch in child_executor.exec(**kwargs):
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py", line 36, in read
for batch in reader.read():
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/readers/abstract_reader.py", line 54, in read
for data in self._read():
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/readers/pdf_reader.py", line 35, in _read
doc = fitz.open(self.file_url)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/fitz/fitz.py", line 4041, in __init__
_fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fitz.fitz.FileDataError: cannot open broken document
FileDataError Traceback (most recent call last)
~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py in ?(self, do_not_raise_exceptions, do_not_print_exceptions)
181 if do_not_print_exceptions is False:
182 logger.exception(str(e))
--> 183 raise ExecutorError(e)
~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py in ?(self, *args, **kwargs)
31 def exec(self, *args, **kwargs) -> Iterator[Batch]:
32 child_executor = self.children[0]
---> 33 for batch in child_executor.exec(**kwargs):
34 batch = apply_project(batch, self.target_list, self.catalog())
~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py in ?(self, *args, **kwargs)
38 def exec(self, *args, **kwargs) -> Iterator[Batch]:
39 child_executor = self.children[0]
---> 40 for batch in child_executor.exec(**kwargs):
41 # apply alias to the batch
~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py in ?(self, table)
34 # setting batch_mem_size = 1, we need fix it
35 reader = PDFReader(str(image_file), batch_mem_size=1)
---> 36 for batch in reader.read():
37 batch.frames[table.columns[0].name] = row_id
...
181 if do_not_print_exceptions is False:
182 logger.exception(str(e))
--> 183 raise ExecutorError(e)
ExecutorError: cannot open broken document
Environment
- EvaDB 0.3.3
- Ubuntu
- Python 3.11
Document: version 0.2.4, downloaded from https://readthedocs.org/projects/evadb/downloads/
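Since the error is raised by `fitz.open` rather than by EvaDB's own code, one thing worth ruling out is that the downloaded file is not actually a PDF (for example, an HTML page or a zip archive saved under a `.pdf` name). The sketch below is only a rough check, not part of EvaDB: the helper name `describe_file` is hypothetical, and it assumes the file type can be guessed from the first bytes. It uses the standard library only.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"          # marker that a PDF file starts with
ZIP_MAGIC = b"PK\x03\x04"     # local file header of a zip archive


def describe_file(path):
    """Guess the file type from its leading bytes (hypothetical helper)."""
    with Path(path).open("rb") as handle:
        head = handle.read(1024)

    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGIC):
        return "zip archive"
    # Strip leading whitespace before looking for an HTML opening.
    if head.lstrip().lower().startswith((b"<!doctype", b"<html")):
        return "html page"
    if not head:
        return "empty file"
    # Could still be a damaged PDF or one with leading junk bytes.
    return "unknown"
```

Calling `describe_file("evadb.pdf")` on the downloaded file (the file name here is a placeholder) before running the load query would show whether the download itself is the problem. A result of `pdf` would point back at the file contents or at PyMuPDF instead.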
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!