[Data] read_parquet doesn't work with multiple input directories #46049
Open
Description
What happened + What you expected to happen
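Passing a list of multiple directories to `ray.data.read_parquet` raises a `FileNotFoundError` (see the traceback under the reproduction script). I expected the directories to be read into a single dataset.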
Versions / Dependencies
Reproduction script
import ray
ray.data.read_parquet(["s3://anonymous@air-example-data-2/10G-image-data-synthetic-raw-parquet"] * 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/read_api.py", line 772, in read_parquet
    datasource = ParquetDatasource(
                 ^^^^^^^^^^^^^^^^^^
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/datasource/parquet_datasource.py", line 238, in __init__
    _handle_read_os_error(e, paths)
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/datasource/file_meta_provider.py", line 250, in _handle_read_os_error
    raise error
  File "/Users/balaji/Documents/GitHub/ray/python/ray/data/datasource/parquet_datasource.py", line 225, in __init__
    pq_ds = pq.ParquetDataset(
            ^^^^^^^^^^^^^^^^^^
  File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 1354, in __init__
    self._dataset = ds.dataset(path_or_paths, filesystem=filesystem,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pyarrow/dataset.py", line 785, in dataset
    return _filesystem_dataset(source, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/balaji/anaconda3/envs/ray/lib/python3.11/site-packages/pyarrow/dataset.py", line 475, in _filesystem_dataset
    return factory.finish(schema)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 3025, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
FileNotFoundError: [Errno 2] Error creating dataset. Could not read schema from 'air-example-data-2/10G-image-data-synthetic-raw-parquet'. Is this a 'parquet' file?: Path does not exist 'air-example-data-2/10G-image-data-synthetic-raw-parquet'. Detail: [errno 2] No such file or directory
Issue Severity
Medium: It is a significant difficulty but I can work around it.
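One possible workaround, sketched here and not verified against this bug: read each directory with its own `read_parquet` call and combine the results with `Dataset.union`. This assumes that reading a single directory still succeeds, which the report above does not state. The helper name `read_parquet_dirs` and its `read_one` parameter are hypothetical and exist only for illustration.

```python
def read_parquet_dirs(paths, read_one=None):
    """Read each directory separately, then union the resulting datasets.

    Hypothetical helper, not part of Ray. Assumes a single-directory
    read_parquet call works even though the multi-directory call fails.
    """
    if not paths:
        raise ValueError("paths must contain at least one directory")
    if read_one is None:
        # Imported lazily so the helper can be exercised without Ray installed.
        import ray
        read_one = ray.data.read_parquet
    # One read call per directory instead of one call with a list of paths.
    datasets = [read_one(path) for path in paths]
    first, *rest = datasets
    # Dataset.union accepts any number of other datasets.
    return first.union(*rest) if rest else first


# Usage with the path from the reproduction script (requires Ray and S3 access):
# path = "s3://anonymous@air-example-data-2/10G-image-data-synthetic-raw-parquet"
# ds = read_parquet_dirs([path] * 2)
```

The `read_one` parameter lets the combining logic be checked with a stand-in reader; in normal use it can be left at its default.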