Closed
Description
Environment
Delta-rs version: 0.8.1
Binding: Python
Environment:
- OS: Ubuntu 22.04.2 LTS
- Python: 3.10.6
Bug
What happened:
The Parquet files were not found.
What you expected to happen:
I expected to_pandas to load the Parquet files.
How to reproduce it:
from deltalake import DeltaTable, write_deltalake
from pandas import DataFrame

df = DataFrame(
    [
        ["Pierre", "Python", 24, "R&D"],  # special character: &
        ["David", "Python", 33, "R&D"],
        ["Cyril", "Typescript", 26, "R&D"],
        ["Marie", "Excel", 36, "Commerce"],
    ],
    columns=["prenom", "skill", "age", "department"],
)

# Partitioning on a column whose values contain "&" triggers the problem
write_deltalake("./test/tables/garbage.delta", df, partition_by=["department"])

dt = DeltaTable("./test/tables/garbage.delta")
dt.to_pandas()  # raises PyDeltaTableError: Object at location ... not found
More details:
Traceback (most recent call last):
File "/home/zar3bski/Documents/Code/octaave/deltastic/test/minimally_reproductible.py", line 18, in <module>
dt.to_pandas()
File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/table.py", line 418, in to_pandas
return self.to_pyarrow_table(
File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/table.py", line 400, in to_pyarrow_table
return self.to_pyarrow_dataset(
File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/fs.py", line 22, in open_input_file
return pa.PythonFile(DeltaFileSystemHandler.open_input_file(self, path))
deltalake.PyDeltaTableError: Object at location /home/zar3bski/Documents/Code/octaave/deltastic/test/tables/garbage.delta/department=R&D/0-0294291a-0d31-410b-8b04-115377a6f9a2-0.parquet not found: No such file or directory (os error 2)
terminate called recursively
terminate called without an active exception
[1] 189090 IOT instruction (core dumped) poetry run python test/minimally_reproductible.py
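When I look in my project files, I find the file at test/tables/garbage.delta/department=R%2526D/0-0294291a-0d31-410b-8b04-115377a6f9a2-0.parquet, while the reader looks for it under department=R&D.
There seems to be a problem with the URL encoding of &: %2526 is what you get when & is percent-encoded to %26 and the % is then encoded again to %25. On a local filesystem, & should not end up as %2526 in the directory name.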
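For illustration, here is a minimal sketch of how a directory name like R%2526D can arise from R&D. It uses only Python's standard urllib.parse module and shows percent-encoding applied twice. It is an assumption about the mechanism, not a claim about where delta-rs performs the encoding; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Partition value from the reproduction above
partition_value = "R&D"

# First pass: "&" is percent-encoded as "%26"
encoded_once = quote(partition_value, safe="")

# Second pass: the "%" produced by the first pass is itself encoded as "%25"
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)   # R%26D
print(encoded_twice)  # R%2526D

# Decoding twice is needed to get back to the original value
decoded = unquote(unquote(encoded_twice))
print(decoded)        # R&D
```

The second printed value matches the directory found on disk (department=R%2526D), which is consistent with the value being encoded twice on write but not decoded the same number of times on read.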