Ability to pick S3 download multipart threshold, and other settings as well. #968

Open
amircohere opened this issue Nov 22, 2023 · 0 comments
Labels: feature-request, high-level-library, p3


Describe the feature

In Python with Boto3, I can do the following:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=4 * 1024 * 1024 * 1024,  # 4 GiB: smaller objects are fetched with a single GET
    max_concurrency=1,  # one download thread
    multipart_chunksize=32 * 1024 * 1024,  # 32 MiB per ranged GET above the threshold
)

s3.download_file(
    Bucket="commoncrawl",
    Key="path_to_file.txt",
    Filename="local.txt",
    Config=config,
)
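The sketch below makes the effect of these settings concrete by counting the S3 GET requests one download would issue. It assumes the rule boto3's transfer manager applies to downloads: an object smaller than `multipart_threshold` is fetched with one GET, and anything at or above it is split into ranged GETs of `multipart_chunksize` bytes. It ignores the HeadObject call `download_file` makes first to learn the object size, and `count_get_requests` is a hypothetical helper, not a boto3 function.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def count_get_requests(size: int, multipart_threshold: int, multipart_chunksize: int) -> int:
    """Estimate the GET requests one download issues, under the assumed rule above."""
    if size < multipart_threshold:
        # Below the threshold: one plain (non-ranged) GetObject call.
        return 1
    # At or above the threshold: one ranged GET per chunk; integer ceiling
    # division counts a short final chunk as a full request.
    return (size + multipart_chunksize - 1) // multipart_chunksize


size = 3 * GiB  # illustrative object size, not a real Common Crawl file
print(count_get_requests(size, 8 * MiB, 8 * MiB))   # boto3's 8 MiB defaults -> 384
print(count_get_requests(size, 4 * GiB, 32 * MiB))  # settings from this issue -> 1
```

With boto3's defaults a 3 GiB object costs 384 requests instead of one, which is why a per-request rate limit makes the threshold so important.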

I'd like a way to do this from the AWS SDK for Rust, or at least to have the SDK honor the locally configured settings, for example:

> aws configure set s3.multipart_threshold 4GB
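Honoring the local configuration would mean reading the nested `s3` settings from the AWS config file (by default `~/.aws/config`) and converting size strings such as `4GB` to bytes. Below is a minimal sketch, assuming the AWS CLI layout (an `s3 =` key whose indented lines hold the sub-settings) and 1024-based size suffixes as the CLI uses; `read_s3_settings` and `parse_size` are hypothetical helpers, not part of boto3 or the Rust SDK.

```python
import configparser

# Assumed 1024-based suffixes, mirroring how the AWS CLI reads sizes like "4GB".
_SUFFIXES = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_size(text: str) -> int:
    """Convert "4GB", "32MB" or a plain byte count to an int (hypothetical helper)."""
    value = text.strip().lower()
    for suffix, factor in _SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)].strip()) * factor
    return int(value)


def read_s3_settings(config_text: str, profile: str = "default") -> dict:
    """Return the nested `s3` settings of a profile as a dict of raw strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Named profiles live in "[profile NAME]" sections; the default one in "[default]".
    section = profile if profile == "default" else f"profile {profile}"
    if not parser.has_option(section, "s3"):
        return {}
    settings = {}
    # configparser folds the indented sub-settings into one multi-line value.
    for line in parser.get(section, "s3").splitlines():
        if "=" in line:
            key, _, val = line.partition("=")
            settings[key.strip()] = val.strip()
    return settings


example = """\
[default]
s3 =
  multipart_threshold = 4GB
  multipart_chunksize = 32MB
"""
print(parse_size(read_s3_settings(example)["multipart_threshold"]))
```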

Use Case

Common Crawl's bucket is always rate limited on requests, not on bandwidth, so avoiding multipart downloads (which issue many ranged GET requests per object) is the only way to download from it reliably. The difference in download time is 10 seconds versus 10 minutes.
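As a stopgap, the transfer manager can be bypassed entirely by issuing exactly one `GetObject` request and streaming the body to disk. This sketch assumes a boto3-style client whose `get_object(Bucket=..., Key=...)` returns a dict with a readable, closable `"Body"` stream (as boto3's `StreamingBody` is); `download_single_get` is a hypothetical helper name. The same single-request approach should carry over to any SDK that exposes a plain GetObject call.

```python
import shutil
from typing import Any


def download_single_get(client: Any, bucket: str, key: str, filename: str,
                        buffer_size: int = 8 * 1024 * 1024) -> int:
    """Download an object with one GetObject request; return the bytes written."""
    response = client.get_object(Bucket=bucket, Key=key)
    body = response["Body"]
    try:
        with open(filename, "wb") as out:
            # copyfileobj calls body.read(buffer_size) repeatedly, so the object
            # is streamed to disk instead of being held in memory all at once.
            shutil.copyfileobj(body, out, buffer_size)
            return out.tell()
    finally:
        body.close()


# Usage with a real client (requires boto3 and credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   download_single_get(s3, "commoncrawl", "path_to_file.txt", "local.txt")
```

The trade-off is that a single stream gives up parallelism and per-part retries, which is acceptable when, as here, bandwidth is plentiful and requests are the scarce resource.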

Proposed Solution

No response

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

A note for the community

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue, please leave a comment
@amircohere added the feature-request and needs-triage labels on Nov 22, 2023
@rcoh added the high-level-library label and removed the needs-triage label on Nov 24, 2023
@jmklix added the p3 label on Jan 5, 2024