-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-17022 Tune S3AFileSystem.listFiles() api. #2038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17022 Tune S3AFileSystem.listFiles() api. #2038
Conversation
Perform list operations directly assuming the path to be a directory and then fallback to head checks for file. This optimization will reduce the number of remote calls to S3 when the input is a directory. This will also reduce the number of metastore calls in case of guarded store.
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
Outdated
Show resolved
Hide resolved
I have to setup ITestAssumeRole tests as well. |
// fallback to file existence check as the path | ||
// can be a file or empty directory. | ||
if (!listFilesAssumingDir.hasNext()) { | ||
final S3AFileStatus fileStatus = (S3AFileStatus) getFileStatus(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
innerGetFileStatus with probes for file and dir marker -HEAD_OR_DIR_MARKER
; we don't need to issue a second LIST call, do we? This will save a call on listing a missing path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't do this as we will need a list for an directory which is empty. Else we will get FNFE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, but an empty dir is that HEAD marker. So no list, right?
Don't worry about it -my dir marker tuning changes things anyway, replacing the HEAD + / with a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an empty directory within a base directory. For example directory structure in test ITestS3AContractGetFileStatus>AbstractContractGetFileStatusTest.testListFilesEmptyDirectoryRecursive
There won't be any files thus listing will be empty.
} | ||
// If we have reached here, it means either there are files | ||
// in this directory or it is empty. | ||
return listFilesAssumingDir; | ||
} catch (AmazonClientException e) { | ||
// TODO S3Guard: retry on file not found exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to consider this and see if the TODO is already done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any retries in listFiles().
Even the listLocatedStatus() and listStatus() are called with Invoker.Once(). That means just the calls with exception translated without any retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets cut that TODO line
} catch (AmazonClientException e) { | ||
// TODO S3Guard: retry on file not found exception | ||
throw translateException("listFiles", path, e); | ||
} | ||
} | ||
|
||
/** | ||
* List files under a path assuming the path to be a directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if we can/should move this into the Listing class.
in #2046 I'm looking at doing that for openfile, just to pull something complex out of the main fs. I know it hurts maintenance a bit (all subsequent patches will need this patch before cherry picking), but your work is big enough that will happen anyway.
imagine we made the Listing
class a subclass of org.apache.hadoop.fs.s3a.impl.AbstractStoreOperation
and added a callback interface purely to those s3a FS calls it needs - just like we do with rename with org.apache.hadoop.fs.s3a.impl.OperationCallbacks
. we'd then move all the work over there and isolate it better.
of course, one thing this will make harder is for my directory marker changes, but that shouldn't be a blocker.
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
Show resolved
Hide resolved
Setup Assumed role tests. All good there.
|
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Outdated
Show resolved
Hide resolved
The reason for failure of seem similar to the other test failure of ITestS3GuardListConsistency.testConsistentRenameAfterDelete. |
Mukund: can you paste in the full stack traces? that way I can use the IDE to see more about the problems. thanks |
Full stack traces for test failures: `java.lang.AssertionError: Recently renamed dir should not be visible: [s3a://mthakur-data/test/a/b/dir3-DELAY_LISTING_ME, s3a://mthakur-data/test/a]
`java.lang.AssertionError: Expected a java.io.FileNotFoundException to be thrown, but got the result: : "MagicCommitter{}"
|
Observed no change from previous implemetation.
🎊 +1 overall
This message was automatically generated. |
Latest commit fixes the above test failures.
|
💔 -1 overall
This message was automatically generated. |
yes, I think we will be needed the unguarded operation to work too. FWIW I do regularly test on both options. Try a -Dscale run too for even more coverage |
Fixed the raw test failures. Also ran the scale tests. All went fine apart from some timeout failures. |
All tests running fine other than getting timeout error in |
good to hear you are happy. What do others say? |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code is looking good, I think the next step for me is to pull this into my IDE and actually do a local build and test. Can you move to innerGetFileStatus and I'll check out and play with the updated PR tomorrow. This is probably good time to do a rebase too....this PR has been up for a few weeks.
// If file status was already passed, reuse it. | ||
final S3AFileStatus fileStatus = status != null | ||
? status | ||
: (S3AFileStatus) getFileStatus(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's go to innerGetFileStatus here so the invocation count is consistent and we can declare we don't want to fall back to the list operation, just StatusProbeEnum.HEAD_OR_DIR_MARKER
. Saves a LIST call when listing a nonexistent path. Also lets us avoid that cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work as explained in #2038 (comment)
} | ||
// If we have reached here, it means either there are files | ||
// in this directory or it is empty. | ||
return listFilesAssumingDir; | ||
} catch (AmazonClientException e) { | ||
// TODO S3Guard: retry on file not found exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets cut that TODO line
*/ | ||
private RemoteIterator<S3ALocatedFileStatus> getListFilesAssumingDir( | ||
Path path, | ||
boolean recursive, Listing.FileStatusAcceptor acceptor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add on a new line
if (recursive) { | ||
final PathMetadata pm = metadataStore.get(path, true); | ||
if (pm != null) { | ||
if (pm.isDeleted()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a copy/paste of the other use of this? if so, what about we pull lines 4250-4255 into a new static method in org.apache.hadoop.fs.s3a.s3guard.S3Guard and use in both places. could even have a test, maybe?
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
Show resolved
Hide resolved
resetMetricDiffs(); | ||
intercept(FileNotFoundException.class, | ||
() -> fs.listFiles(dir, true)); | ||
verifyOperationCount(2, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a move to innerGetFileStatus and only do the file/marker probes, we could get cost to (2, 1)
: (S3AFileStatus) getFileStatus(path); | ||
if (fileStatus.isFile()) { | ||
// if a status was given and it is a file. | ||
if (status != null && status.isFile()) { | ||
// simple case: File | ||
LOG.debug("Path is a file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are keeping this, include the path f in the log message, just to make more sense of it in a busy log
merged to trunk. thanks |
Perform list operations directly assuming the path to be a directory
and then fallback to head checks for file. This optimization will
reduce the number of remote calls to S3 when the input is a directory.
This will also reduce the number of metastore calls in case of guarded
store.
Ran all tests using below command in ap-south-1 bucket.
mvn clean verify -Ds3guard -Ddynamo -Dauth
Only two failures:
[ERROR] Failures:
[ERROR] ITestS3GuardListConsistency.testConsistentRenameAfterDelete:286 Recently renamed dir should not be visible: [s3a://mthakur-data/test/a]
[ERROR] ITestMagicCommitProtocol>AbstractITCommitProtocol.testCommitterWithDuplicatedCommit:806->AbstractITCommitProtocol.expectFNFEonTaskCommit:876 Expected a java.io.FileNotFoundException to be thrown, but got the result: : "MagicCommitter{}"