Loading PDFs gives error "cannot open broken document" #1067

Closed
@aryan-rajoria

Description

Search before asking

  • I have searched the EvaDB issues and found no similar bug report.

Bug

09-07-2023 13:15:53 ERROR [plan_executor:plan_executor.py:execute_plan:0182] cannot open broken document
Traceback (most recent call last):
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py", line 178, in execute_plan
    yield from output
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py", line 33, in exec
    for batch in child_executor.exec(**kwargs):
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py", line 40, in exec
    for batch in child_executor.exec(**kwargs):
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py", line 36, in read
    for batch in reader.read():
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/readers/abstract_reader.py", line 54, in read
    for data in self._read():
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/readers/pdf_reader.py", line 35, in _read
    doc = fitz.open(self.file_url)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/great/projects/pyenvs/slacks/lib/python3.11/site-packages/fitz/fitz.py", line 4041, in __init__
    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fitz.fitz.FileDataError: cannot open broken document
FileDataError                             Traceback (most recent call last)
[~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py](https://file+.vscode-resource.vscode-cdn.net/home/great/projects/intern/evadb-slack-bot/~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/plan_executor.py) in ?(self, do_not_raise_exceptions, do_not_print_exceptions)
    181                 if do_not_print_exceptions is False:
    182                     logger.exception(str(e))
--> 183                 raise ExecutorError(e)

[~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py](https://file+.vscode-resource.vscode-cdn.net/home/great/projects/intern/evadb-slack-bot/~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/project_executor.py) in ?(self, *args, **kwargs)
     31     def exec(self, *args, **kwargs) -> Iterator[Batch]:
     32         child_executor = self.children[0]
---> 33         for batch in child_executor.exec(**kwargs):
     34             batch = apply_project(batch, self.target_list, self.catalog())

[~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py](https://file+.vscode-resource.vscode-cdn.net/home/great/projects/intern/evadb-slack-bot/~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/executor/seq_scan_executor.py) in ?(self, *args, **kwargs)
     38     def exec(self, *args, **kwargs) -> Iterator[Batch]:
     39         child_executor = self.children[0]
---> 40         for batch in child_executor.exec(**kwargs):
     41             # apply alias to the batch

[~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py](https://file+.vscode-resource.vscode-cdn.net/home/great/projects/intern/evadb-slack-bot/~/projects/pyenvs/slacks/lib/python3.11/site-packages/evadb/storage/pdf_storage_engine.py) in ?(self, table)
     34                 # setting batch_mem_size = 1, we need fix it
     35                 reader = PDFReader(str(image_file), batch_mem_size=1)
---> 36                 for batch in reader.read():
     37                     batch.frames[table.columns[0].name] = row_id
...
    181 if do_not_print_exceptions is False:
    182     logger.exception(str(e))
--> 183 raise ExecutorError(e)

ExecutorError: cannot open broken document

Environment

  • EvaDB 0.3.3
  • Ubuntu
  • Python 3.11

Document: the version 0.2.4 file downloaded from https://readthedocs.org/projects/evadb/downloads/
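
The traceback shows that `fitz.open` rejects the file before EvaDB reads any page. One quick way to narrow this down is to check whether the downloaded file is a PDF at all. The sketch below is a heuristic that uses only the standard library: it looks for the `%PDF-` signature near the start of the file. The helper name `looks_like_pdf` and the 1024-byte probe window are assumptions made for illustration and are not part of EvaDB or PyMuPDF. Whether the file from this report would fail the check is unknown.

```python
PDF_MAGIC = b"%PDF-"  # signature that normally opens a PDF file


def looks_like_pdf(path, probe_bytes=1024):
    """Heuristic check: is the PDF signature present near the start of the file?

    Hypothetical helper, not an EvaDB or PyMuPDF API. It only inspects the
    first `probe_bytes` bytes, so a file that passes can still be damaged
    further in and fail to open.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    return PDF_MAGIC in head
```

If `looks_like_pdf("evadb.pdf")` returns False (the file name here is a placeholder), the download was probably saved as something other than a PDF, for example an HTML page. If it returns True, the file may still be truncated or otherwise corrupted, and the failure has to be investigated inside the PDF itself.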

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Labels

  • Bug 🐞: EVA is not working as expected
  • Crash 💥: EVA is crashing