
open stream can raise a FzErrorFormat error instead of FileDataError #3905

Closed
@cbm755

Description of the bug

If I pass a .csv file to pymupdf.open, I get a FileDataError, as documented:

If you attempt to open an unsupported file then PyMuPDF will throw a file data error.

But if I instead pass the bytes of the same file via stream=, I get an FzErrorFormat, which the docs did not lead me to expect.

How to reproduce the bug

import pymupdf

with open('myfile.csv', 'rb') as f:
    file_bytes = f.read()

It probably doesn't matter what's in the CSV, but here's mine:

>> file_bytes
b'A,B,C,D\r\n1,2,1,2\r\n2,2,1,2\r\n'

Now we try to open this:

>> pymupdf.open(stream=file_bytes)
---------------------------------------------------------------------------
FzErrorFormat                             Traceback (most recent call last)
<ipython-input-21-668e9798a921> in ?()
----> 1 pymupdf.open(stream=file_bytes)

~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2884                     self.page_count2 = extra.page_count_pdf
   2885                 else:
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(magic, stream)
  44292 
  44293         NOTE: The caller retains ownership of 'stream' - the document will take its
  44294         own reference if required.
  44295     """
> 44296     return _mupdf.fz_open_document_with_stream(magic, stream)

FzErrorFormat: code=7: no objects found
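The "no objects found" message appears to come from MuPDF's PDF repair logic, which suggests that, with no filename or filetype to go on, the bytes were handed to the PDF parser. Assuming that is what happens, a caller could sniff the header before opening. The helper below is hypothetical (not part of PyMuPDF) and relies on the common reader convention of accepting a %PDF- header anywhere in the first 1024 bytes.

```python
def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Hypothetical pre-check: True if a '%PDF-' header appears near the start.

    Many PDF readers tolerate some junk before the header, so this searches
    the first `window` bytes instead of requiring the header at offset 0.
    """
    return b"%PDF-" in data[:window]


csv_bytes = b'A,B,C,D\r\n1,2,1,2\r\n2,2,1,2\r\n'
print(looks_like_pdf(csv_bytes))          # False
print(looks_like_pdf(b"%PDF-1.7\n"))      # True
```

This only recognises PDF; other formats MuPDF handles (XPS, EPUB, images) have their own signatures, so a check like this is a heuristic and not a substitute for the two open paths raising the same exception.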

Contrast this with what happens when I open the file directly:

>> pymupdf.open("myfile.csv")
---------------------------------------------------------------------------
FzErrorUnsupported                        Traceback (most recent call last)
~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(filename)
  44271         filename: a path to a file as it would be given to open(2).
  44272     """
> 44273     return _mupdf.fz_open_document(filename)

FzErrorUnsupported: code=6: cannot find document handler for file: myfile.csv

The above exception was the direct cause of the following exception:

FileDataError                             Traceback (most recent call last)
<ipython-input-22-b19d9e4e2772> in ?()
----> 1 pymupdf.open("myfile.csv")

~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2884                     self.page_count2 = extra.page_count_pdf
   2885                 else:
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

FileDataError: Failed to open file 'myfile.csv'.

(It still fails internally with FzErrorUnsupported, but that is ultimately re-raised as FileDataError, as documented.)
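The filename traceback shows Python exception chaining: the low-level error is re-raised with `raise ... from exc`, which is what prints "The above exception was the direct cause of the following exception". A minimal sketch of applying the same normalization to any open path follows; LowLevelError, OpenFailed, open_document and fake_open_stream are hypothetical stand-ins, not PyMuPDF's classes or its actual implementation.

```python
class LowLevelError(Exception):
    """Hypothetical stand-in for the FzError* exceptions from the MuPDF bindings."""


class OpenFailed(Exception):
    """Hypothetical stand-in for the documented FileDataError."""


def open_document(opener, description):
    """Call `opener()` and convert any low-level failure into OpenFailed.

    Chaining with `from exc` stores the original error as __cause__, so the
    traceback still shows the underlying reason.
    """
    try:
        return opener()
    except LowLevelError as exc:
        raise OpenFailed(f"Failed to open {description}.") from exc


def fake_open_stream():
    # Simulates the stream path failing with a format error.
    raise LowLevelError("code=7: no objects found")


try:
    open_document(fake_open_stream, "stream")
except OpenFailed as err:
    print(err)                  # Failed to open stream.
    print(repr(err.__cause__))  # LowLevelError('code=7: no objects found')
```

With this shape, callers only need to catch one documented exception type, and the original MuPDF message stays available through `__cause__`.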

PyMuPDF version

1.24.10

Operating system

Linux

Python version

3.12
