
Fail to read delta table on mounted disk #1189

Closed
@kuksag

Description

Environment

Delta-rs version:
0.7.0 (latest main)

Binding:
Python

Environment:

  • Cloud provider: N/A (mounted server disk)
  • OS: *NIX
  • Other:

Bug

What happened:
I'm trying to read a Delta table located on a local server.

If we run the following code:

from deltalake import DeltaTable
path = 'file:///mnt/path/to/table/'
DeltaTable(path).to_pyarrow_table()

We will get the following error:

Traceback (most recent call last):
  File "/path/to/main.py", line 2, in <module>
    dt.to_pyarrow_table()
  File "/path/to/deltalake/table.py", line 401, in to_pyarrow_table
    return self.to_pyarrow_dataset(
  File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
  File "/path/to/deltalake/fs.py", line 22, in open_input_file
    return pa.PythonFile(DeltaFileSystemHandler.open_input_file(self, path))
deltalake.PyDeltaTableError: Object at location /mnt/network_expansion/path/to/table/part-00000-aaaaaa-1111-2222-bbbbb-333333333.c000.snappy.parquet not found: No such file or directory (os error 2)

The *.parquet file is present.

What you expected to happen:

A successful read.

How to reproduce it:

More details:

Worth noting:

  1. The equivalent code written in Rust works fine.
  2. The full network path contains characters that get URL-encoded.
  3. If we keep the scheme (file://) in the URL path, as suggested in this issue, the code gets further (it finds and reads all *.parquet files) but then fails with a SIGABRT.
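Point (2) can matter because a file:// URL and the on-disk path spell special characters differently. Below is a minimal, POSIX-only sketch using just the standard library; the directory name and the helpers to_file_uri / uri_to_local_path are hypothetical and are not deltalake APIs. It shows that if a percent-encoded path reaches the OS without being decoded, the lookup fails (which would surface as a "not found" error like the one above), while the decoded path resolves. The report does not establish that this is what deltalake actually does.

```python
import os
import tempfile
from urllib.parse import quote, unquote, urlparse


def to_file_uri(path: str) -> str:
    """Hypothetical helper: build a file:// URI, percent-encoding spaces, '#', etc."""
    return "file://" + quote(path)


def uri_to_local_path(uri: str) -> str:
    """Hypothetical helper: turn a file:// URI back into a local path by decoding escapes."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URI: {uri!r}")
    return unquote(parsed.path)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical table directory whose name needs URL-encoding.
    table_dir = os.path.join(root, "network expansion#1")
    os.mkdir(table_dir)

    uri = to_file_uri(table_dir)
    still_encoded = urlparse(uri).path  # percent-escapes left in place
    decoded = uri_to_local_path(uri)

    print(uri)
    print(os.path.exists(still_encoded))  # False: "%20" is not a space on disk
    print(os.path.exists(decoded))        # True
```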

From (1), my guess is that the broken logic is somewhere in the deltalake Python wrapper. Looking at the traceback, this frame is PyArrow calling back into a Python FileSystemHandler, which here is DeltaFileSystemHandler:
File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file


Metadata

Assignees

Labels

binding/python (Issues for the Python package), bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests