-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD. #1601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD. #1601
Conversation
16d7149
to
1dc8eb6
Compare
1dc8eb6
to
938bc6e
Compare
… does a HEAD. 1. Make sure the probe set is passed down. 2. check for Head probe wraps only HEAD request. Change-Id: I7664a80da53f85c42c319cd6b56b3d6366e09fc6
* move assertion in ITestS3GuardTTL up to test setup * better resilience in ITestAuthoritativePath Change-Id: I88fed4779051de661ff0a7dd33a21d4f5004b096
…arded. This guarantees the unguarded cost is always measured. Change-Id: I380d6babbd6bf944bf9b69a81865755f5b99f8b6
Not really an expert on the S3AFs core functionality. That said, the current code does seem quite broken - if HEAD is not included in the probe set.
This may need a minor change. I believe the leading key.endsWith("/") is to avoid duplicating a HEAD request - if the key already contains a "/". However, if the set of probes does not contain "Head", and it does contain "DirMarker", but the key ends in a "/" - there won't be any HEAD requests. Is that expected? Other than this, looks good to me. General question, unrelated to the patch, is a single LIST not sufficient to cover both the with and without "/" scenario? |
938bc6e
to
80950e0
Compare
Sid, thanks for the comments, will review/update the patch Interesting point about the double list. This code path is how its always been, presumably descended from the s3n code. LIST is slower, costs more and much more prone to eventual consistency, which are all good arguments for HEAD first. I actually plan to tune some of the calls which always seem to get used on directory walks (listStatus, listFiles, listLocatedStatus) to do the subtree list first, and only go for the HEAD calls if they don't find any children. This is to reduce the cost of treewalks where the bias is towards populated directories |
… marker probe Address Sid's feedback about handling of keys ending in / by always using the dir marker HEAD for them. Added tests to verify that, and another to verify that the create file operation only does one HEAD when overwrite == true Change-Id: I6d7f39e41738adcbe2ab5609a146310d7e36b557
} else { | ||
LOG.debug("Found exact file: normal file"); | ||
if (!key.isEmpty()) { | ||
if (probes.contains(StatusProbeEnum.Head) && !key.endsWith("/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is ready to review.
With this change - if someone asks for a "HEAD" only, on a URL which happens to have a trailing "/" nothing will be looked up.
IF HAED + DIRMARKER - then yes, at least 1 request will go out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. That's exactly my thought.
note: none of this API is public, its for avoiding problems on ops where we don't want to look for a file but do for a dir marker. And AFAIK, you can't go from a Path to a / as we strip that off.
How about
- I do a review of all places where we don't ask for the HEAD? So far I think I'm only doing it in create
- I clarify in javadocs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is handled in https://issues.apache.org/jira/browse/HADOOP-15430 right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- and list the issues related to it. Change-Id: Ia6e47fe1f60c7e17e97941d8ddcc03d524e27578
Now. can you create a Path with a trailing / ? I was about to say no, but remembered https://issues.apache.org/jira/browse/HADOOP-15430 .. one of the constructors of Path does let you get away with it, which is something which breaks S3Guard already |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1;
tested against ireland, no errors.
thx |
Change-Id: I7664a80da53f85c42c319cd6b56b3d6366e09fc6