Skip to content

HADOOP-18168. Fix S3A ITestMarkerTool dep. on purged public bucket, introduce PublicDatasetTestUtils #4140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 3, 2022

Conversation

dannycjones
Copy link
Contributor

@dannycjones dannycjones commented Apr 5, 2022

Description of PR

Addressing HADOOP-18168, this change replaces the use of landsat-pds with its successor usgs-landsat, another public dataset. landsat-pds recently was cleaned up excluding some specific files and is no longer suitable for this test case. usgs-landsat is a requester pays bucket, so costs will not be passed on to the owner of the bucket - instead, they will be paid by the test runner.

We introduce a new class PublicDatasetTestUtils which provides a simple interface for getting file systems for a specific purpose. This change allows the bucket used to be replaced or for the tests to be skipped - for example, for S3-compatible stores or non-aws partition S3 endpoints.

How was this patch tested?

It will be tested from a eu-west-1 EC2 instance against a eu-west-1 bucket.

@dannycjones
Copy link
Contributor Author

Integration tests passing against eu-west-1 bucket and EC2 instance.

@dannycjones dannycjones marked this pull request as ready for review April 5, 2022 10:29
Copy link
Contributor

@ahmarsuhail ahmarsuhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dannycjones dannycjones changed the title HADOOP-18168. Fix ITestMarkerTool dependency on purged public bucket HADOOP-18168. Fix s3a ITestMarkerTool dependency on purged public bucket Apr 5, 2022
@dannycjones dannycjones changed the title HADOOP-18168. Fix s3a ITestMarkerTool dependency on purged public bucket HADOOP-18168. Fix S3A ITestMarkerTool dependency on purged public bucket Apr 5, 2022
Copy link
Contributor

@monthonk monthonk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dannycjones dannycjones marked this pull request as draft April 5, 2022 18:03
@dannycjones
Copy link
Contributor Author

Integration tests not currently passing with this commit, diving deeper here.

@dannycjones
Copy link
Contributor Author

dannycjones commented Apr 27, 2022

Tests passing against eu-west-1 bucket after merge.

@dannycjones dannycjones marked this pull request as ready for review April 27, 2022 14:39
@apache apache deleted a comment from hadoop-yetus Apr 28, 2022
Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, only some minor tweaks.

it is going to need a mention in the testing doc though as now there's a new option which needs to be set when testing against non-aws stores

@dannycjones
Copy link
Contributor Author

Tested latest patch against eu-west-1, all OK

@dannycjones
Copy link
Contributor Author

Thanks @steveloughran, I've added a note to testing.md.

I am hoping that if we get all public datasets into PublicDatasetTestUtils, we can slim down the notes in testing.md and instead point the tester to review PublicDatasetTestUtils to understand which configurations to update.

For now, I have not tried to move the CSV Scale tests but I'd like to see them moved in a future patch.

@dannycjones dannycjones changed the title HADOOP-18168. Fix S3A ITestMarkerTool dependency on purged public bucket HADOOP-18168. Fix S3A ITestMarkerTool dep. on purged public bucket, introduce PublicDatasetTestUtils Apr 29, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this; will cherrypick to branch 3.3 too

@steveloughran steveloughran merged commit 6ab7b72 into apache:trunk May 3, 2022
steveloughran added a commit that referenced this pull request May 3, 2022
asfgit pushed a commit that referenced this pull request May 3, 2022
asfgit pushed a commit that referenced this pull request May 3, 2022
…4140)

This moves off use of the purged s3a://landsat-pds bucket, so fixing tests
which had started failing.
* Adds a new class, PublicDatasetTestUtils to manage the use of public datasets.
* The new test bucket s3a://usgs-landsat/ is requester pays, so depends upon
  HADOOP-14661.

Consult the updated test documentation when running against other S3 stores.

Contributed by Daniel Carl Jones

Change-Id: Ie8585e4d9b67667f8cb80b2970225d79a4f8d257
@steveloughran
Copy link
Contributor

merged to trunk, albeit with the commit message partially cut. reverted that and recommitted with the full text. test with branch-3.3 in progress

@apache apache deleted a comment from hadoop-yetus May 3, 2022
@apache apache deleted a comment from hadoop-yetus May 3, 2022
asfgit pushed a commit that referenced this pull request May 3, 2022
…4140)

This moves off use of the purged s3a://landsat-pds bucket, so fixing tests
which had started failing.
* Adds a new class, PublicDatasetTestUtils to manage the use of public datasets.
* The new test bucket s3a://usgs-landsat/ is requester pays, so depends upon
  HADOOP-14661.

Consult the updated test documentation when running against other S3 stores.

Contributed by Daniel Carl Jones

Change-Id: Ie8585e4d9b67667f8cb80b2970225d79a4f8d257
@dannycjones dannycjones deleted the HADOOP-18168 branch May 3, 2022 15:01
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
This moves off use of the purged s3a://landsat-pds bucket, so fixing tests
which had started failing.
* Adds a new class, PublicDatasetTestUtils to manage the use of public datasets.
* The new test bucket s3a://usgs-landsat/ is requester pays, so depends upon
  HADOOP-14661.

Consult the updated  test documentation when running against other S3 stores.

Contributed by Daniel Carl Jones
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…pache#4140)

This moves off use of the purged s3a://landsat-pds bucket, so fixing tests
which had started failing.
* Adds a new class, PublicDatasetTestUtils to manage the use of public datasets.
* The new test bucket s3a://usgs-landsat/ is requester pays, so depends upon
  HADOOP-14661.

Consult the updated test documentation when running against other S3 stores.

Contributed by Daniel Carl Jones

Change-Id: Ie8585e4d9b67667f8cb80b2970225d79a4f8d257
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants