read_parquet S3 dir support #26388

Closed
@Seanspt

Description

When using Spark to process data and save it to S3, the output files look like:

s3://bucket/path/_SUCCESS
s3://bucket/path/part-00000-uuid.snappy.parquet
s3://bucket/path/part-00002-uuid.snappy.parquet
s3://bucket/path/part-00001-uuid.snappy.parquet
s3://bucket/path/part-00003-uuid.snappy.parquet

It would be nice to be able to load all of these files in a single line:

df = pd.read_parquet("s3://bucket/path")
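
Until read_parquet accepts a directory path directly, one possible workaround is to list the part files yourself and concatenate them. This is a minimal sketch, assuming s3fs is installed and AWS credentials are configured; the bucket/path names are placeholders from the example above.

import pandas as pd
import s3fs

fs = s3fs.S3FileSystem()

# Glob only the parquet part files; this skips markers such as _SUCCESS.
part_files = fs.glob("bucket/path/part-*.parquet")

# Read each part file and concatenate into a single DataFrame.
df = pd.concat(
    (pd.read_parquet(fs.open(f, "rb")) for f in part_files),
    ignore_index=True,
)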
