Read parquet files from AWS S3 #652
I'm having a hard time implementing the integration test using MinIO. It seems that Polars (or Reqwest) doesn't accept connections that aren't using TLS: only HTTPS endpoints are accepted. My first attempt was to generate localhost certificates with mkcert. Then I tested using an ngrok tunnel (which uses TLS) pointing to my MinIO instance on localhost, and it works 🎉 Unfortunately this is not a viable solution, so I'm considering running a MinIO instance on a private server, or using the AWS S3 service, with instructions on how to reproduce the environment. If any of you have ideas, please let me know!
Did you try calling this in the object_store configuration? https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3Builder.html#method.with_allow_http The docs also mention LocalStack testing, so it may be worth looking at object_store's own tests in case they use something similar?
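To make concrete what `with_allow_http` changes, here is a std-only Rust sketch of the kind of check it relaxes: plain `http://` endpoints, such as a local MinIO or LocalStack instance, are rejected unless explicitly allowed. `S3Config` and `validate` are hypothetical names; this models the behavior described above, not object_store's actual internals.

```rust
/// Hypothetical std-only model of the endpoint check that object_store's
/// `AmazonS3Builder::with_allow_http` relaxes. Not object_store's real code.
struct S3Config {
    endpoint: String,
    allow_http: bool,
}

impl S3Config {
    /// Start with plain HTTP disallowed, mirroring the HTTPS-only behavior.
    fn new(endpoint: &str) -> Self {
        S3Config { endpoint: endpoint.to_string(), allow_http: false }
    }

    /// Builder-style toggle, named after the real object_store option.
    fn with_allow_http(mut self, allow: bool) -> Self {
        self.allow_http = allow;
        self
    }

    /// Accept HTTPS always; accept plain HTTP only when explicitly allowed.
    fn validate(&self) -> Result<(), String> {
        if self.endpoint.starts_with("https://") {
            Ok(())
        } else if self.endpoint.starts_with("http://") {
            if self.allow_http {
                Ok(())
            } else {
                Err(format!("{} uses plain HTTP; enable allow_http", self.endpoint))
            }
        } else {
            Err(format!("unsupported endpoint scheme: {}", self.endpoint))
        }
    }
}
```

With this shape, `S3Config::new("http://localhost:9000")` fails validation until `.with_allow_http(true)` is applied, which matches the symptom of only HTTPS endpoints (or a TLS tunnel) working.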
@josevalim thank you! I hadn't tried this config. I can't access the internal builder that this method belongs to, so I'm trying the approach of using the
The idea is to have a uniform way to pass the URI down to Polars.
Co-authored-by: Wojtek Mach <wojtekmach@users.noreply.github.com>
Since Polars only supports the `s3://` scheme, we shouldn't support HTTP URLs yet.
This way, the backend can build the URL automatically based on the region.
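A minimal Rust sketch of that rule, assuming URIs of the form `s3://bucket/key`: anything else, including `http://` and `https://` URLs, is rejected up front. `S3Uri` and `parse_s3_uri` are hypothetical helpers, not Explorer's or Polars' API.

```rust
/// Hypothetical parsed form of an `s3://bucket/key` URI.
#[derive(Debug, PartialEq)]
struct S3Uri {
    bucket: String,
    key: String,
}

/// Parse `s3://bucket/key`, rejecting every other scheme (including http/https).
fn parse_s3_uri(uri: &str) -> Result<S3Uri, String> {
    let rest = uri
        .strip_prefix("s3://")
        .ok_or_else(|| format!("only s3:// URIs are supported, got: {}", uri))?;
    // Split at the first '/': before it is the bucket, after it is the object key.
    let (bucket, key) = rest
        .split_once('/')
        .ok_or_else(|| format!("missing object key in: {}", uri))?;
    if bucket.is_empty() || key.is_empty() {
        return Err(format!("empty bucket or key in: {}", uri));
    }
    Ok(S3Uri { bucket: bucket.to_string(), key: key.to_string() })
}
```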
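Assuming AWS's virtual-hosted-style addressing (`https://{bucket}.s3.{region}.amazonaws.com/{key}`), deriving the URL from the region could look like the sketch below. The optional `endpoint` override, which switches to path-style URLs, is an assumption about how an S3-compatible service such as LocalStack might be targeted; `object_url` is a hypothetical helper, and keys are not percent-encoded here.

```rust
/// Hypothetical helper: build an object URL from bucket, key, and region.
/// With `endpoint` set, use a path-style URL on that host instead.
fn object_url(bucket: &str, key: &str, region: &str, endpoint: Option<&str>) -> String {
    // Drop a leading '/' so we never emit a double slash before the key.
    let key = key.trim_start_matches('/');
    match endpoint {
        // Path-style: {endpoint}/{bucket}/{key}, common for local S3-compatible servers.
        Some(ep) => format!("{}/{}/{}", ep.trim_end_matches('/'), bucket, key),
        // Virtual-hosted style on AWS, derived from the region.
        None => format!("https://{}.s3.{}.amazonaws.com/{}", bucket, region, key),
    }
}
```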
This starts a LocalStack server using Podman or Docker, and then tests that the file is downloaded correctly.
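One way a test helper might choose between Podman and Docker, sketched in Rust under the assumption that a runtime counts as available when `<runtime> --version` exits successfully. `command_available` and `pick_runtime` are hypothetical, and the preference order (Podman first) is also an assumption.

```rust
use std::process::{Command, Stdio};

/// Return true if `program --version` can be spawned and exits successfully.
fn command_available(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// Pick the first available container runtime, preferring podman over docker.
/// The availability check is injected so the choice can be tested without either tool.
fn pick_runtime(is_available: impl Fn(&str) -> bool) -> Option<&'static str> {
    ["podman", "docker"].iter().copied().find(|&rt| is_available(rt))
}
```

`pick_runtime(command_available)` would return the runtime to launch LocalStack with; taking the check as a parameter keeps the selection logic testable on machines that have neither tool installed.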
This implementation uses the lazy backend but collects the DataFrame immediately.
@josevalim @wojtekmach @Qqwy @jonatanklosko I think it's ready for another pass, if you can review again :) PS: sorry for the number of failed builds 😓
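The shape of that design as a std-only Rust model: a lazy frame only records how to load data, and the eager function is simply the lazy scan followed by an immediate `collect`. `LazyFrame`, `scan`, and `read_eager` are hypothetical stand-ins for Polars' lazy API, not its real types.

```rust
/// Hypothetical stand-in for a lazy frame: it only records how to load data.
struct LazyFrame<F: Fn() -> Vec<i64>> {
    load: F,
}

impl<F: Fn() -> Vec<i64>> LazyFrame<F> {
    /// Run the deferred work and materialize the rows.
    fn collect(self) -> Vec<i64> {
        (self.load)()
    }
}

/// Lazy entry point: building the plan performs no loading.
fn scan<F: Fn() -> Vec<i64>>(load: F) -> LazyFrame<F> {
    LazyFrame { load }
}

/// Eager entry point: the same lazy machinery, collected right away.
fn read_eager<F: Fn() -> Vec<i64>>(load: F) -> Vec<i64> {
    scan(load).collect()
}
```

Reusing the lazy path for the eager function means there is a single code path for reading, at the cost of not exposing laziness to the caller.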
Amazing work, just some minor final nits. :) Also please add to the README how to run the cloud tests. :)
Co-authored-by: José Valim <jose.valim@dashbit.co>
This feature enables reading Parquet files directly from services like AWS S3.
It's also compatible with other services that implement the S3 API (like MinIO, LocalStack, DigitalOcean Spaces, etc).
It uses Polars' `scan_parquet` feature, which is lazy and can also read multiple files at once.
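Reading several files at once is often expressed as a glob such as `s3://bucket/data/*.parquet`. The sketch below shows, in simplified form, how one pattern expands to several object keys. `glob_match` and `expand` are hypothetical; `*` here also matches `/` (unlike many glob implementations), and this is not how Polars resolves patterns internally.

```rust
/// Match `key` against `pattern`, where `*` matches any run of characters
/// (including '/') and every other character must match literally.
fn glob_match(pattern: &str, key: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let k: Vec<char> = key.chars().collect();
    let (mut pi, mut ki) = (0, 0);
    // Position of the last '*' seen, and the key index it is currently matched up to.
    let mut star: Option<(usize, usize)> = None;
    while ki < k.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ki));
            pi += 1;
        } else if pi < p.len() && p[pi] == k[ki] {
            pi += 1;
            ki += 1;
        } else if let Some((sp, sk)) = star {
            // Backtrack: let the last '*' swallow one more character.
            pi = sp + 1;
            ki = sk + 1;
            star = Some((sp, sk + 1));
        } else {
            return false;
        }
    }
    // Trailing '*'s can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Select the object keys that a single pattern expands to.
fn expand<'a>(pattern: &str, keys: &[&'a str]) -> Vec<&'a str> {
    keys.iter().copied().filter(|&k| glob_match(pattern, k)).collect()
}
```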