
HADOOP-14661. Add S3 requester pays bucket support to S3A #3962


Merged: 18 commits into apache:trunk, Mar 23, 2022

Conversation

@dannycjones (Contributor) commented Feb 7, 2022

Description of PR

This change introduces support for "requester pays" S3 buckets (HADOOP-14661).

There has already been some discussion, and patches were made available (e.g. #250), although the code base has changed a bit since then.

In summary:

  • Add information to the main S3A documentation
  • Add an example of the error, and how to fix it via configuration, to the troubleshooting doc
  • Add an option (fs.s3a.requester-pays.enabled) to enable requester pays when creating the S3A filesystem; the option is passed through S3ClientFactory when creating the new S3Client
  • Add the required header inside S3ClientFactory when the option is enabled
  • Remove unused requester pays configuration options from RequestFactoryImpl
  • Add a new integration test which talks to the publicly available usgs-landsat bucket (the new version, not the old one), which is configured to require requester pays ('usgs-landsat' on the Registry of Open Data on AWS)
  • Add information on how to configure the new tests in testing.md
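As a sketch of what enabling the feature looks like, the option (using the fs.s3a.requester.pays.enabled name from the merged commit message) would be set in core-site.xml roughly as below. The per-bucket form is an assumption that the standard S3A fs.s3a.bucket.NAME.* override pattern applies to this option; usgs-landsat is just the example bucket from the tests.

```xml
<!-- Sketch: enable requester pays globally for S3A.
     Option name taken from the merged commit message. -->
<property>
  <name>fs.s3a.requester.pays.enabled</name>
  <value>true</value>
</property>

<!-- Assumed per-bucket override, scoping the same option to one bucket. -->
<property>
  <name>fs.s3a.bucket.usgs-landsat.requester.pays.enabled</name>
  <value>true</value>
</property>
```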

How was this patch tested?

I created an EC2 instance and an S3 Bucket in eu-west-1 and ran the tests against the global S3 endpoint.

mvn -Dparallel-tests -DtestsThreadCount=16 clean verify

Note: the requester pays specific tests introduced do not use the developer-supplied bucket.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@monthonk (Contributor): Looks good to me.

@monthonk (Contributor) left a review: Approved

@dannycjones (Contributor, Author)

@mukund-thakur would you be able to review this PR?

@mukund-thakur (Contributor): Sure, will check early next week.

@mukund-thakur (Contributor): CC @steveloughran @mehakmeet

@mukund-thakur (Contributor) left a review:

Overall looks good. Some minor comments.

@steveloughran (Contributor) left a review:

thanks for this, sorry I've been unreliable at reviewing.

if the SDK does now attach the requester pays header everywhere, then we don't need the stuff in the request factory; that was added because the discussion implied SDK coverage was incomplete.

Test-wise, I made some comments. I think it would also be good to try some more API calls to make sure the requests are tagged, listing in particular.

@dannycjones (Contributor, Author) commented Mar 10, 2022

Thank you both for the comments, I have just pushed a few changes.

I also removed the unused configuration from RequestFactoryImpl. I am assuming that since it's under the .impl package, it's fair game to remove without deprecation.

@steveloughran (Contributor)

I am assuming since its under .impl package, it's fair game to remove without deprecation.

yes. though we do regularly find out where that's not always true (s3 client factory, store context). how do we find out? change things and see what breaks, then add comments in the javadocs to warn others

@steveloughran (Contributor) left a review:
i like the tests, much cleaner now. made some minor suggestions there.

if you can do those and cover how to change/disable the tests in testing.md, this is ready to go in.

@steveloughran (Contributor) left a review:
+1
this is good to go. one little detail: the trailing whitespace at the ends of these markdown lines.

hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:1638:S3A supports buckets with 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:1639:[Requester Pays](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RequesterPaysBuckets.html) 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:1640:enabled. When a bucket is configured with requester pays, the requester must cover 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:1643:For requests to be successful, the S3 client must acknowledge that they will pay 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md:598:By default, the requester pays tests will look for a bucket that exists on Amazon S3 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md:602:The test only requires an object of at least a few bytes in order 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md:612:If the endpoint does not support requester pays, you can also disable the tests by configuring 

can you fix these as the github merge button doesn't do it for us?
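The fix being asked for is just stripping the trailing whitespace from the markdown lines listed above. A sketch of one way to do it, shown against a scratch file standing in for the real index.md and testing.md targets (GNU sed syntax assumed):

```shell
# Create a stand-in file with the kind of trailing spaces flagged above.
printf 'S3A supports buckets with \nRequester Pays enabled.\n' > /tmp/rp-docs-snippet.md

# Strip trailing spaces and tabs from the end of every line.
sed -i 's/[[:space:]]*$//' /tmp/rp-docs-snippet.md

cat /tmp/rp-docs-snippet.md
```

Running the same sed command over the two real markdown files would clean up every line the review quoted.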

@dannycjones (Contributor, Author)

@steveloughran I caught those after I pushed most of the other fixes from the last feedback and pushed a second commit ae8e812.

There are two Yetus comments; the latest one seems happy. Are we good to merge?

@steveloughran (Contributor) left a review:
+1, I am happy

@steveloughran steveloughran merged commit 9edfe30 into apache:trunk Mar 23, 2022
@steveloughran (Contributor)

merged to trunk; will cherry-pick to 3.3 locally, retest, and if all is good merge there too

asfgit pushed a commit that referenced this pull request Mar 23, 2022
Adds the option fs.s3a.requester.pays.enabled, which, if set to true, allows
the client to access S3 buckets where the requester is billed for the IO.

Contributed by Daniel Carl Jones

Change-Id: I51f64d0f9b3be3c4ec493bcf91927fca3b20407a
@dannycjones dannycjones deleted the HADOOP-14661 branch March 23, 2022 21:07
@dannycjones (Contributor, Author)

Thanks Steve, Mukund, and Monthon for the reviews!

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
Adds the option fs.s3a.requester.pays.enabled, which, if set to true, allows
the client to access S3 buckets where the requester is billed for the IO.

Contributed by Daniel Carl Jones