HADOOP-16465 listLocatedStatus() optimisation #1943

Conversation

mukund-thakur
Contributor

@mukund-thakur mukund-thakur commented Apr 7, 2020

Optimize S3AFileSystem.listLocatedStatus() to perform list
operations directly and then fall back to HEAD checks for files.
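
The list-first idea can be illustrated with a minimal sketch. It assumes a toy in-memory store with request counters; `ListFirstSketch` and its methods are hypothetical names for illustration and are not the S3AFileSystem code in this patch. The path is first treated as a directory prefix (one LIST), and a HEAD-style lookup is issued only when the listing finds nothing.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical toy object store; it only mimics LIST/HEAD request accounting. */
class ListFirstSketch {
    // Object keys mapped to sizes; a key ending in "/" acts as a directory marker.
    private final TreeMap<String, Long> objects = new TreeMap<>();
    int listRequests = 0;
    int headRequests = 0;

    void put(String key, long size) {
        objects.put(key, size);
    }

    /** List first; fall back to a HEAD-style check only if the listing is empty. */
    List<String> listLocatedStatus(String path) {
        String prefix = path + "/";
        listRequests++; // one simulated LIST request
        SortedMap<String, Long> under =
            objects.subMap(prefix, prefix + Character.MAX_VALUE);
        if (!under.isEmpty()) {
            // Something exists under the prefix, so the path is a directory.
            Set<String> children = new LinkedHashSet<>();
            for (String key : under.keySet()) {
                String rest = key.substring(prefix.length());
                if (rest.isEmpty()) {
                    continue; // the directory marker itself is not a child
                }
                int slash = rest.indexOf('/');
                children.add(prefix + (slash < 0 ? rest : rest.substring(0, slash)));
            }
            return new ArrayList<>(children);
        }
        headRequests++; // fallback: one simulated HEAD request
        if (objects.containsKey(path)) {
            return Collections.singletonList(path); // the path is a single file
        }
        throw new UncheckedIOException(new FileNotFoundException(path));
    }
}
```

In this sketch a directory listing costs one LIST and no HEAD, while listing a plain file costs one LIST plus one HEAD.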

Ran the tests against an ap-south-1 bucket with the command:
mvn clean verify -Ds3guard -Ddynamo -Dparallel-tests -DtestsThreadCount=8

There are two failures. I am looking at them.

@mukund-thakur
Contributor Author

Two of my new tests fail on the command line but pass in the IDE.

[ERROR] ITestS3AFileOperationCost.testCostOfListLocatedStatusOnEmptyDir:141->verifyOperationCount:190->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Count of object_list_requests starting=0 current=1 diff=1: object_list_requests expected:<0> but was:<1>
[ERROR] ITestS3AFileOperationCost.testCostOfListLocatedStatusOnNonEmptyDir:159->verifyOperationCount:190->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88 Count of object_list_requests starting=2 current=3 diff=1: object_list_requests expected:<0> but was:<1>
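
The "starting / current / diff" figures in these messages come from comparing a counter against a snapshot taken before the operation. A minimal sketch of that pattern follows; `MetricDiffSketch` is a hypothetical helper written for illustration and is not the helper class the Hadoop test suite actually uses.

```java
import java.util.Map;

/** Hypothetical helper: tracks how far a named counter has moved since a snapshot. */
class MetricDiffSketch {
    private final Map<String, Long> counters; // live counters, updated elsewhere
    private final String name;
    private long start;

    MetricDiffSketch(Map<String, Long> counters, String name) {
        this.counters = counters;
        this.name = name;
        reset();
    }

    /** Re-snapshot the counter so later diffs are relative to now. */
    void reset() {
        start = current();
    }

    long current() {
        return counters.getOrDefault(name, 0L);
    }

    long diff() {
        return current() - start;
    }

    /** Fail with a message that reports the snapshot, the current value and the diff. */
    void assertDiffEquals(long expected) {
        if (diff() != expected) {
            throw new AssertionError("Count of " + name
                + " starting=" + start
                + " current=" + current()
                + " diff=" + diff()
                + ": expected " + expected);
        }
    }
}
```

Taking the snapshot immediately before the call under test keeps setup work from being counted against the operation being measured.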

@mukund-thakur
Contributor Author

@steveloughran @bgaborg @mehakmeet: not sure why GitHub is not letting me add anybody as a reviewer.

@steveloughran steveloughran self-requested a review April 8, 2020 14:20
@steveloughran
Contributor

  1. Are those failures happening when guarded or unguarded?
  2. Check what options you are passing down to the test runner in the IDE; it may be guarded or unguarded differently from the command line.

In HADOOP-13208 #1861 I'm parameterising these tests so that they always run guarded and unguarded, and with directory markers kept vs deleted. That makes for more complex assertions, so I'm also improving how we assert metric diffs and report their failures. It'll make the suite very different, but it's the only way to get consistent estimates of the costs of the different codepaths.

@mukund-thakur
Contributor Author

The failures are happening for the guarded bucket.
This is a parameterised test which runs for both raw and guarded FS. If the guard settings are not enabled properly, the tests are skipped rather than failing. So I am not sure what I am missing here :(

@mukund-thakur
Contributor Author

Found the issue. It is described here:
https://issues.apache.org/jira/browse/HADOOP-16979

@mukund-thakur mukund-thakur force-pushed the HADOOP-16465-listlocatedstatus-optimisation branch from 9f53733 to 0d84322 Compare April 13, 2020 11:56
@steveloughran steveloughran changed the title Hadoop 16465 listlocatedstatus optimisation HADOOP-16465 listLocatedStatus() optimisation Apr 14, 2020
@steveloughran
Contributor

The checkstyle warning is unfixable and is only a line-length problem:

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:4319:  private RemoteIterator<S3ALocatedFileStatus> getLocatedFileStatusIteratorForDir(: Line is longer than 80 characters (found 82). [LineLength]

@steveloughran
Contributor

LGTM. +1 from me, merging after a local test run

@steveloughran steveloughran merged commit 7b2d84d into apache:trunk Apr 14, 2020
@mukund-thakur
Contributor Author

Thanks :)

mukund-thakur added a commit to mukund-thakur/hadoop that referenced this pull request Apr 15, 2020
Contributed by Mukund Thakur

Optimize S3AFileSystem.listLocatedStatus() to perform list
operations directly and then fallback to head checks for files
asfgit pushed a commit that referenced this pull request Apr 15, 2020
Contributed by Mukund Thakur

Optimize S3AFileSystem.listLocatedStatus() to perform list
operations directly and then fallback to head checks for files

Change-Id: Ia2c0fa6fcc5967c49b914b92f41135d07dab0464
zhangxiping1 pushed a commit to zhangxiping1/hadoop that referenced this pull request Dec 13, 2022
Contributed by Mukund Thakur

Optimize S3AFileSystem.listLocatedStatus() to perform list
operations directly and then fallback to head checks for files

Change-Id: Ia2c0fa6fcc5967c49b914b92f41135d07dab0464
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
Contributed by Mukund Thakur

Optimize S3AFileSystem.listLocatedStatus() to perform list
operations directly and then fallback to head checks for files

Change-Id: Ib3d37e360bf673eabb93d4c55539cea9d4627acc