Description of the bug
If I feed a .csv file to pymupdf.open, I get a FileDataError, as documented:
If you attempt to open an unsupported file then PyMuPDF will throw a file data error.
But if I instead pass the bytes of the same file via the stream= argument, I get an FzErrorFormat, which I was not expecting from the docs.
How to reproduce the bug
import pymupdf

# Read the raw bytes of the CSV file
with open('myfile.csv', 'rb') as f:
    file_bytes = f.read()
It probably doesn't matter what's in the CSV, but here's mine:
>>> file_bytes
b'A,B,C,D\r\n1,2,1,2\r\n2,2,1,2\r\n'
Now we try to open this:
>>> pymupdf.open(stream=file_bytes)
---------------------------------------------------------------------------
FzErrorFormat Traceback (most recent call last)
<ipython-input-21-668e9798a921> in ?()
----> 1 pymupdf.open(stream=file_bytes)
~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
2884 self.page_count2 = extra.page_count_pdf
2885 else:
2886 self.page_count2 = extra.page_count_fz
2887 finally:
-> 2888 JM_mupdf_show_errors = JM_mupdf_show_errors_old
~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(magic, stream)
44292
44293 NOTE: The caller retains ownership of 'stream' - the document will take its
44294 own reference if required.
44295 """
> 44296 return _mupdf.fz_open_document_with_stream(magic, stream)
FzErrorFormat: code=7: no objects found
Contrast this with what happens when I open the file directly:
pymupdf.open("myfile.csv")
---------------------------------------------------------------------------
FzErrorUnsupported Traceback (most recent call last)
~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
2886 self.page_count2 = extra.page_count_fz
2887 finally:
-> 2888 JM_mupdf_show_errors = JM_mupdf_show_errors_old
~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(filename)
44271 filename: a path to a file as it would be given to open(2).
44272 """
> 44273 return _mupdf.fz_open_document(filename)
FzErrorUnsupported: code=6: cannot find document handler for file: myfile.csv
The above exception was the direct cause of the following exception:
FileDataError Traceback (most recent call last)
<ipython-input-22-b19d9e4e2772> in ?()
----> 1 pymupdf.open("myfile.csv")
~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
2884 self.page_count2 = extra.page_count_pdf
2885 else:
2886 self.page_count2 = extra.page_count_fz
2887 finally:
-> 2888 JM_mupdf_show_errors = JM_mupdf_show_errors_old
FileDataError: Failed to open file 'myfile.csv'.
(We can see it still fails with FzErrorUnsupported internally, but this is ultimately re-raised as FileDataError, as documented.)
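For anyone reading along: the "direct cause" line in the filename traceback is what Python prints when one exception is raised from another. Here is a minimal sketch of that pattern. The classes are stand-ins I defined for illustration (not PyMuPDF's real classes), and low_level_open and open_document are hypothetical names, so this is not a claim about how PyMuPDF is implemented internally. It only shows the wrapping behaviour I expected to see on the stream= path as well.

```python
class FzErrorUnsupported(Exception):
    """Stand-in for the low-level error; not the real MuPDF class."""


class FileDataError(Exception):
    """Stand-in for the documented high-level error; not the real class."""


def low_level_open(filename: str):
    # Hypothetical low-level call that always rejects the file.
    raise FzErrorUnsupported(
        f"cannot find document handler for file: {filename}"
    )


def open_document(filename: str):
    # Hypothetical wrapper that translates the low-level error
    # into the documented one.
    try:
        return low_level_open(filename)
    except FzErrorUnsupported as exc:
        # "from exc" sets __cause__, which is what makes Python print
        # "The above exception was the direct cause of the following
        # exception" in the traceback.
        raise FileDataError(f"Failed to open file {filename!r}.") from exc


try:
    open_document("myfile.csv")
except FileDataError as err:
    # The caller only has to handle the documented error type;
    # the low-level error is still reachable through __cause__.
    print(type(err).__name__, "caused by", type(err.__cause__).__name__)
```

In the stream= traceback there is no such second exception, so the low-level FzErrorFormat reaches the caller directly.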
PyMuPDF version
1.24.10
Operating system
Linux
Python version
3.12