Describe the bug
One can register a table with the file:// scheme, which in turn allows the listing table to list files and discover partitions.
Unfortunately, LocalStore returns a FileMetaStream in which the SizedFile paths have the scheme prefix stripped. That alone would be fine, except that `datafusion::datasource::listing::helpers::parse_partitions_for_path` calls strip_prefix on the file path with the original path used to register the table, which still contains the scheme, so the prefix never matches.
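A minimal std-only sketch of the mismatch, assuming partition parsing works roughly as described above: strip the table path from the listed file path, then read the `key=value` directory names that follow. `parse_partitions` is a hypothetical, simplified stand-in for `parse_partitions_for_path`, not DataFusion's actual code, and `data.parquet` is a made-up file name.

```rust
/// Hypothetical, simplified stand-in for DataFusion's
/// `parse_partitions_for_path` (NOT the real implementation):
/// strip the table path, then read one `key=value` segment per partition column.
fn parse_partitions<'a>(
    table_path: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // If the file path does not start with the table path, no partition
    // values can be extracted for this file.
    let rest = file_path.strip_prefix(table_path)?;
    let mut values = Vec::with_capacity(partition_cols.len());
    // Walk the directory segments below the table root, one per partition column.
    let mut segments = rest.trim_start_matches('/').split('/');
    for col in partition_cols {
        let segment = segments.next()?;
        let (key, value) = segment.split_once('=')?;
        if key != *col {
            return None;
        }
        values.push(value);
    }
    Some(values)
}

fn main() {
    // Path as returned by the local store: the `file://` scheme is already gone.
    let listed = "/tmp/listing_table/part1=value1/data.parquet";

    // Table registered with the scheme: the prefix never matches.
    assert_eq!(parse_partitions("file:///tmp/listing_table", listed, &["part1"]), None);

    // Table path without the scheme: the partition value is found.
    assert_eq!(
        parse_partitions("/tmp/listing_table", listed, &["part1"]),
        Some(vec!["value1"])
    );
}
```

Because the comparison fails for every listed file, no file yields partition values, which would be consistent with the count of 0 reported under Expected behavior.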
There are two ways to fix this: either strip the scheme from the registered table path as well (probably best left to the ObjectStore implementation), or extend FileMeta to carry a URI instead of a bare path.
To Reproduce
Steps to reproduce the behavior:
The directories /tmp/listing_table/part1=value1/ and /tmp/listing_table/part1=value2/ should each contain one parquet file.
Expected behavior
The above should count the rows in the files correctly; with the current behavior it returns 0.
Additional context
I'm trying to be consistent in my project, so I use schemes for both local and remote files. Tracking this down required a lot of debugging.
We ran into the same issue in delta-rs. I think the ideal solution would be to normalize the table path within the ObjectStore implementation when it is created.
The problem with using a URI in FileMeta is that not all object stores' list calls return full URIs, so we would have to perform a lot of heap string allocations to construct the URIs, which could be expensive when dealing with millions of files.
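A minimal sketch of the normalize-once approach suggested above, assuming only the local `file://` scheme needs handling: the scheme is stripped a single time when the table location is created, and each listed file is then compared as a borrowed `&str`, so no per-file heap allocation is needed. `TablePath`, `new` and `relative` are hypothetical names, not part of the DataFusion or delta-rs APIs.

```rust
/// Hypothetical table location, normalized once at construction time.
struct TablePath {
    // Scheme-less root, e.g. "/tmp/listing_table" (no trailing slash).
    path: String,
}

impl TablePath {
    /// Strips a leading `file://` scheme, if present. Only the local scheme
    /// is handled here; a remote store would apply its own rule.
    fn new(location: &str) -> Self {
        let path = location.strip_prefix("file://").unwrap_or(location);
        TablePath { path: path.trim_end_matches('/').to_string() }
    }

    /// Returns the part of a listed file path below the table root, or `None`
    /// if the file is not under this table. Borrows from `file_path`; no allocation.
    fn relative<'a>(&self, file_path: &'a str) -> Option<&'a str> {
        let rest = file_path.strip_prefix(self.path.as_str())?;
        // Reject sibling directories such as "/tmp/listing_table2".
        if !rest.is_empty() && !rest.starts_with('/') {
            return None;
        }
        Some(rest.trim_start_matches('/'))
    }
}

fn main() {
    let with_scheme = TablePath::new("file:///tmp/listing_table");
    let without_scheme = TablePath::new("/tmp/listing_table/");
    let listed = "/tmp/listing_table/part1=value2/data.parquet";

    // Both spellings of the table location now resolve the same listed file.
    assert_eq!(with_scheme.relative(listed), Some("part1=value2/data.parquet"));
    assert_eq!(without_scheme.relative(listed), Some("part1=value2/data.parquet"));
    assert_eq!(with_scheme.relative("/tmp/listing_table2/x.parquet"), None);
}
```

The design choice here is to pay the string cost once per table rather than once per listed file, which is the trade-off the comment raises against putting URIs in FileMeta.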