Special characters in partition path not handled locally  #1299

Closed
@zar3bski

Environment

Delta-rs version: 0.8.1

Binding: Python

Environment:

  • OS: Ubuntu 22.04.2 LTS
  • Python: 3.10.6

Bug

What happened:
The Parquet file was not found.

What you expected to happen:

I expected to_pandas to load the parquet file

How to reproduce it:

from deltalake import DeltaTable, write_deltalake
from pandas import DataFrame

# Sample rows; three of them use the partition value "R&D".
df = DataFrame(
    [
        ["Pierre", "Python", 24, "R&D"], # special character: &
        ["David", "Python", 33, "R&D"],
        ["Cyril", "Typescript", 26, "R&D"],
        ["Marie", "Excel", 36, "Commerce"],
    ],
    columns=["prenom", "skill", "age", "department"],
)

# Write a table partitioned by department, then reopen it from the same local path.
write_deltalake("./test/tables/garbage.delta", df, partition_by=["department"])
dt = DeltaTable("./test/tables/garbage.delta")

# Fails with PyDeltaTableError: the file under department=R&D is not found.
dt.to_pandas()
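To compare the path in the error with what the writer actually put on disk, the partition directories can be listed with the standard library alone. This is a minimal sketch: `partition_dirs` is a hypothetical helper, not part of deltalake, and it assumes Hive-style `column=value` directory names directly under the table root.

```python
from pathlib import Path


def partition_dirs(table_root):
    """Return the sorted names of the column=value directories under table_root.

    Hypothetical helper for inspection only; it is not a deltalake API.
    """
    root = Path(table_root)
    if not root.is_dir():
        return []
    # Keep only directories shaped like "column=value"; this skips _delta_log.
    return sorted(p.name for p in root.iterdir() if p.is_dir() and "=" in p.name)
```

Calling `partition_dirs("./test/tables/garbage.delta")` after the write step shows the directory names exactly as they were created, which can then be checked against the path in the traceback.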

More details:

Traceback (most recent call last):
  File "/home/zar3bski/Documents/Code/octaave/deltastic/test/minimally_reproductible.py", line 18, in <module>
    dt.to_pandas()
  File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/table.py", line 418, in to_pandas
    return self.to_pyarrow_table(
  File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/table.py", line 400, in to_pyarrow_table
    return self.to_pyarrow_dataset(
  File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1551, in pyarrow._fs._cb_open_input_file
  File "/home/zar3bski/.cache/pypoetry/virtualenvs/deltastic-GlE5VuQW-py3.10/lib/python3.10/site-packages/deltalake/fs.py", line 22, in open_input_file
    return pa.PythonFile(DeltaFileSystemHandler.open_input_file(self, path))
deltalake.PyDeltaTableError: Object at location /home/zar3bski/Documents/Code/octaave/deltastic/test/tables/garbage.delta/department=R&D/0-0294291a-0d31-410b-8b04-115377a6f9a2-0.parquet not found: No such file or directory (os error 2)
terminate called recursively
terminate called without an active exception
[1]    189090 IOT instruction (core dumped)  poetry run python test/minimally_reproductible.py

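When I look in my project files, I find the file at test/tables/garbage.delta/department=R%2526D/0-0294291a-0d31-410b-8b04-115377a6f9a2-0.parquet. There seems to be a problem with the URL encoding of &: %2526 looks like & percent-encoded twice (& becomes %26, then its % becomes %25), and the value should not be written this way in a local context, since the reader then looks for department=R&D and does not find it.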
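The `%2526` in the directory name matches what two rounds of percent-encoding produce for `&`. The sketch below uses only Python's `urllib.parse` to illustrate that; it does not reproduce what deltalake does internally, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

value = "R&D"

# First pass: "&" is percent-encoded as "%26".
once = quote(value, safe="")

# Second pass: the "%" from the first pass is itself encoded as "%25".
twice = quote(once, safe="")

print(once)   # R%26D
print(twice)  # R%2526D

# Two decoding passes are needed to recover the original value.
print(unquote(unquote(twice)))  # R&D
```

The twice-encoded string is the form that appears in the directory name on disk, while the path in the error message contains the raw value `R&D`.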

Labels: bug (Something isn't working)
