Pandas can't handle zipfile.Path objects (ValueError: Invalid file path or buffer object type: <class 'zipfile.Path'>) #49906

Open
@buhtz

This is reproducible in the current latest pandas release, 1.5.2.

In Python, the zipfile.Path class is intended to behave similarly (but not identically!) to pathlib.Path. The latter is accepted by pandas, but the former is not.

Steps to reproduce:

  1. Create a zip file named foo.zip containing a single CSV file named bar.csv.
  2. Create a path object pointing directly to that CSV file inside the zip file: zp = zipfile.Path('foo.zip', 'bar.csv')
  3. Pass that path object (zp) to pandas.read_csv() as the file path.
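The steps above can be sketched as a small script. This is only a minimal reproduction attempt: the helper names make_zip and try_read are hypothetical, the CSV content is made up, and the archive is written to a temporary directory instead of the current one.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd


def make_zip(directory):
    """Step 1: create foo.zip containing a single member, bar.csv."""
    zip_path = Path(directory) / "foo.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("bar.csv", "a,b\n1,2\n3,4\n")  # made-up sample data
    return zip_path


def try_read(zip_path):
    """Steps 2 and 3: build the zipfile.Path and hand it to read_csv.

    Returns the error message, or None if pandas accepted the object.
    """
    zp = zipfile.Path(str(zip_path), "bar.csv")
    try:
        pd.read_csv(zp)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_read(make_zip(tmp)))
```

On the pandas version reported here, the printed message should be the ValueError quoted in the issue title; a version that accepts zipfile.Path would print None instead.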

This happens because of the following part of the pandas code:

pandas/pandas/io/common.py

Lines 446 to 452 in 3b09765

# is_file_like requires (read | write) & __iter__ but __iter__ is only
# needed for read_csv(engine=python)
if not (
    hasattr(filepath_or_buffer, "read") or hasattr(filepath_or_buffer, "write")
):
    msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
    raise ValueError(msg)

As a result, pandas raises "ValueError: Invalid file path or buffer object type: <class 'zipfile.Path'>".
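To illustrate why the check rejects the object, here is a standard-library-only sketch. The function passes_buffer_check is a hypothetical stand-in that mirrors just the quoted condition, not the actual pandas function. It shows that a zipfile.Path has neither a read nor a write attribute, while the handle returned by its open() method does.

```python
import io
import zipfile


def passes_buffer_check(obj):
    """Mirror of the quoted condition: the object needs read or write."""
    return hasattr(obj, "read") or hasattr(obj, "write")


def demo():
    # Build a small archive in memory so no file on disk is needed.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bar.csv", "a,b\n1,2\n")
    buf.seek(0)

    with zipfile.ZipFile(buf) as zf:
        zp = zipfile.Path(zf, "bar.csv")
        path_ok = passes_buffer_check(zp)  # the path object itself
        with zp.open() as handle:
            handle_ok = passes_buffer_check(handle)  # the opened member
    return path_ok, handle_ok


if __name__ == "__main__":
    print(demo())
```

So the path object fails the hasattr test, whereas an already opened member would pass it.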

EDIT:
I'm aware that pandas.read_csv() offers the compression argument and can read compressed CSV files on its own, but this doesn't help in my case. I'm using pandas as a backend for a higher-level API that reads data files; pandas is just one part of it. One shortcoming of pandas here is that it cannot deal with ZIP files containing multiple CSV files.
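Until zipfile.Path is accepted directly, one possible workaround for the multi-file case is to open each member myself and pass the resulting file handle to read_csv(). This is only a sketch: read_all_csvs is a hypothetical helper, it only looks at top-level members whose names end in .csv, and it relies on the text decoding that zipfile.Path.open() applies by default.

```python
import zipfile

import pandas as pd


def read_all_csvs(zip_path):
    """Return {member name: DataFrame} for every top-level .csv member."""
    frames = {}
    with zipfile.ZipFile(zip_path) as zf:
        root = zipfile.Path(zf)
        for member in root.iterdir():
            if member.is_file() and member.name.endswith(".csv"):
                # The opened handle has a read() method, so pandas accepts it.
                with member.open("r") as handle:
                    frames[member.name] = pd.read_csv(handle)
    return frames
```

Calling read_all_csvs("foo.zip") on an archive with several CSV files would give one DataFrame per member, keyed by file name.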

Both pathlib.Path and zipfile.Path are part of the Python standard library, and IMHO pandas should be able to deal with both.

Labels: Enhancement, IO Data