Closed
Description
Environment
Delta-rs version: 0.7.0 (latest main)
Binding: Python
Environment:
- Cloud provider: N/A (mounted server disk)
- OS: *NIX
- Other:
Bug
What happened:
I'm trying to read a Delta table that is located on a local server (a mounted disk).
If we run the following code:
from deltalake import DeltaTable
path = 'file:///mnt/path/to/table/'
DeltaTable(path).to_pyarrow_table()
We will get the following error:
Traceback
Traceback (most recent call last):
File "/path/to/main.py", line 2, in <module>
dt.to_pyarrow_table()
File "/path/to/deltalake/table.py", line 401, in to_pyarrow_table
return self.to_pyarrow_dataset(
File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
File "/path/to/deltalake/fs.py", line 22, in open_input_file
return pa.PythonFile(DeltaFileSystemHandler.open_input_file(self, path))
deltalake.PyDeltaTableError: Object at location /mnt/network_expansion/path/to/table/part-00000-aaaaaa-1111-2222-bbbbb-333333333.c000.snappy.parquet not found: No such file or directory (os error 2)
The *.parquet file is present.
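One way to confirm that claim from the same Python process that raises the error is a small existence check. This is only a sketch: `check_path` is a hypothetical helper (not part of deltalake), and a temporary file stands in for the real part file, whose full path is not reproduced here.

```python
import os
import tempfile


def check_path(path: str) -> dict:
    """Report how the current process sees `path` (hypothetical helper)."""
    return {
        "exists": os.path.exists(path),
        "is_file": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
    }


# A temporary file stands in for the reported *.parquet part file.
with tempfile.NamedTemporaryFile(suffix=".parquet") as tmp:
    present = check_path(tmp.name)

# A path that does not exist, for comparison.
missing = check_path(
    os.path.join(tempfile.gettempdir(), "no-such-file.snappy.parquet")
)

print(present)
print(missing)
```

If the real path passes all three checks here while the reader still reports "not found", the path string being passed to the filesystem layer is probably not the same string as the one on disk.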
What you expected to happen:
A successful read.
How to reproduce it:
More details:
Worth noting:
- If we run similar code, but written in Rust, it will work just fine.
- The full network path contains characters that get URL-encoded.
- If we keep the URL scheme (file://) in the path, as suggested in this issue, the code gets further (it finds and reads all *.parquet files), but then fails with a SIGABRT.
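The URL-encoding point above can be illustrated with the standard library alone. The sketch below is not deltalake code, and the path is hypothetical (the space stands in for whichever characters get encoded in the real path). It only shows how a percent-encoded path differs from the on-disk path unless it is decoded again. Whether this is the actual cause of the error is an assumption.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical table location; the space stands in for any character
# that gets percent-encoded.
raw_path = "/mnt/network expansion/path/to/table"

# Building a file:// URL percent-encodes the path component.
url = "file://" + quote(raw_path)

# urlsplit() returns the path still encoded; it does not decode it.
encoded_path = urlsplit(url).path

# Decoding restores the path as it exists on disk.
decoded_path = unquote(encoded_path)

print(url)
print(encoded_path == raw_path)
print(decoded_path == raw_path)
```

If one layer passes the encoded form to a local filesystem call while another layer expects the decoded form, the lookup fails even though the file exists.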
From the first point, my guess is that the broken logic is somewhere in the deltalake Python wrapper, because (looking at the traceback):
File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
is a call from PyArrow to a filesystem handler, which in this case is DeltaFileSystemHandler.