[Python] Scan_parquet() should be able to accept directory as input #14342

Closed
lmocsi opened this issue Feb 7, 2024 · 2 comments
Assignees
Labels
A-io-parquet (Area: reading/writing Parquet files)
A-io-partitioning (Area: reading/writing (Hive) partitioned files)
accepted (Ready for implementation)
enhancement (New feature or an improvement of an existing feature)
P-goal (Priority: aligns with long-term Polars goals)

Comments

lmocsi commented Feb 7, 2024

Description

As of Polars 0.20.7, pl.scan_parquet() only accepts paths to files.
It should also accept a path to a directory, the way pl.scan_pyarrow_dataset() effectively does through a PyArrow dataset built from a directory.
The implementation seems quite simple: append '/**/*.parquet' to the given input directory, and the result is a glob pattern that can be fed to the existing pl.scan_parquet() function.
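
A minimal sketch of the proposed workaround, using only the standard library. The helper name directory_to_parquet_glob is hypothetical and not part of Polars; it only illustrates turning a directory into the recursive glob pattern described above, and it leaves explicit files and existing glob patterns untouched.

```python
import os


def directory_to_parquet_glob(source: str) -> str:
    """Hypothetical helper: map a directory path to a recursive Parquet glob."""
    # Existing glob patterns and plain file paths are passed through unchanged.
    if any(ch in source for ch in "*?[") or not os.path.isdir(source):
        return source
    # Strip trailing separators so the result is not "dir//**/*.parquet".
    return source.rstrip("/\\") + "/**/*.parquet"
```

The returned string could then be passed on, for example pl.scan_parquet(directory_to_parquet_glob("data/")), assuming polars is imported as pl and "data/" is an existing local directory.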

lmocsi added the enhancement label Feb 7, 2024
stinodego added the A-io-parquet, accepted, and A-io-partitioning labels May 6, 2024
Smotrov commented May 8, 2024

BTW, /**/*.parquet would not work if the Parquet files have no extension. Such files can be generated by AWS Athena, for example.

For files with no extension, /**/* would not work either, because it seems to take the 0-byte folder object into consideration and complains that there is No Body.
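
One possible way around both problems, sketched for a local directory tree only: select files by content rather than by name. Parquet files start and end with the 4-byte magic "PAR1", so empty placeholder entries and unrelated files can be skipped without relying on an extension. The helper name find_parquet_files is hypothetical, the sketch assumes files without encrypted footers, and the same idea would need to be adapted to an object-store listing for S3.

```python
import os
from typing import List

PARQUET_MAGIC = b"PAR1"  # Unencrypted Parquet files start and end with these bytes.


def find_parquet_files(root: str) -> List[str]:
    """Hypothetical helper: collect Parquet files under root, ignoring extensions."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip empty placeholders: too small to hold both magic markers.
            if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
                continue
            with open(path, "rb") as fh:
                head = fh.read(len(PARQUET_MAGIC))
                fh.seek(-len(PARQUET_MAGIC), os.SEEK_END)
                tail = fh.read(len(PARQUET_MAGIC))
            if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
                found.append(path)
    return sorted(found)
```

In Polars versions whose scan_parquet accepts a list of paths, the result could be passed directly, for example pl.scan_parquet(find_parquet_files("data/")).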

c-peters added the P-goal label Jun 13, 2024
@c-peters c-peters assigned c-peters and nameexhaustion and unassigned c-peters Jun 13, 2024
@nameexhaustion
Collaborator

closed by #17017
