Skip to content

HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD. #1601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

steveloughran
Copy link
Contributor

  1. Make sure the probe set is passed down.
  2. check for Head probe wraps only HEAD request.

Change-Id: I7664a80da53f85c42c319cd6b56b3d6366e09fc6

@steveloughran steveloughran force-pushed the s3/HADOOP-16635-s3a-head+unguarded-tests branch from 16d7149 to 1dc8eb6 Compare October 5, 2019 12:26
@apache apache deleted a comment from hadoop-yetus Oct 7, 2019
@steveloughran steveloughran force-pushed the s3/HADOOP-16635-s3a-head+unguarded-tests branch from 1dc8eb6 to 938bc6e Compare October 7, 2019 17:02
… does a HEAD.

1. Make sure the probe set is passed down.
2. check for Head probe wraps only HEAD request.

Change-Id: I7664a80da53f85c42c319cd6b56b3d6366e09fc6
* move assertion in ITestS3GuardTTL up to test setup
* better resilience in ITestAuthoritativePath

Change-Id: I88fed4779051de661ff0a7dd33a21d4f5004b096
…arded.

This guarantees the unguarded cost is always measured.

Change-Id: I380d6babbd6bf944bf9b69a81865755f5b99f8b6
@sidseth
Copy link
Contributor

sidseth commented Oct 9, 2019

Not really an expert on the S3AFs core functionality. That said, the current code does seem quite broken - if HEAD is not included in the probe set.

if (!key.endsWith("/") && probes.contains(StatusProbeEnum.DirMarker)) {

This may need a minor change. I believe the leading key.endsWith("/") is to avoid duplicating a HEAD request - if the key already contains a "/". However, if the set of probes does not contain "Head", and it does contain "DirMarker", but the key ends in a "/" - there won't be any HEAD requests. Is that expected?

Other than this, looks good to me.

General question, unrelated to the patch, is a single LIST not sufficient to cover both the with and without "/" scenario?

@steveloughran steveloughran force-pushed the s3/HADOOP-16635-s3a-head+unguarded-tests branch from 938bc6e to 80950e0 Compare October 9, 2019 13:16
@apache apache deleted a comment from hadoop-yetus Oct 10, 2019
@apache apache deleted a comment from hadoop-yetus Oct 10, 2019
@apache apache deleted a comment from hadoop-yetus Oct 10, 2019
@steveloughran
Copy link
Contributor Author

Sid, thanks for the comments, will review/update the patch

Interesting point about the double list. This code path is how its always been, presumably descended from the s3n code. LIST is slower, costs more and much more prone to eventual consistency, which are all good arguments for HEAD first.

I actually plan to tune some of the calls which always seem to get used on directory walks (listStatus, listFiles, listLocatedStatus) to do the subtree list first, and only go for the HEAD calls if they don't find any children. This is to reduce the cost of treewalks where the bias is towards populated directories

… marker probe

Address Sid's feedback about handling of keys ending in / by always
using the dir marker HEAD for them. Added tests to verify that,
and another to verify that the create file operation only does one HEAD
when overwrite == true

Change-Id: I6d7f39e41738adcbe2ab5609a146310d7e36b557
} else {
LOG.debug("Found exact file: normal file");
if (!key.isEmpty()) {
if (probes.contains(StatusProbeEnum.Head) && !key.endsWith("/")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is ready to review.
With this change - if someone asks for a "HEAD" only, on a URL which happens to have a trailing "/" nothing will be looked up.
IF HAED + DIRMARKER - then yes, at least 1 request will go out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. That's exactly my thought.
note: none of this API is public, its for avoiding problems on ops where we don't want to look for a file but do for a dir marker. And AFAIK, you can't go from a Path to a / as we strip that off.

How about

  1. I do a review of all places where we don't ask for the HEAD? So far I think I'm only doing it in create
  2. I clarify in javadocs

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- and list the issues related to it.

Change-Id: Ia6e47fe1f60c7e17e97941d8ddcc03d524e27578
@steveloughran
Copy link
Contributor Author

steveloughran commented Oct 11, 2019

  1. updated the docs. The only place we don't do Head and dir marker is in create()
  2. also added a test to verify that the empty set of probes skips all http requests

Now. can you create a Path with a trailing / ? I was about to say no, but remembered https://issues.apache.org/jira/browse/HADOOP-15430 .. one of the constructors of Path does let you get away with it, which is something which breaks S3Guard already

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 82 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 mvninstall 1280 trunk passed
+1 compile 32 trunk passed
+1 checkstyle 24 trunk passed
+1 mvnsite 38 trunk passed
+1 shadedclient 883 branch has no errors when building and testing our client artifacts.
+1 javadoc 26 trunk passed
0 spotbugs 63 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 61 trunk passed
_ Patch Compile Tests _
+1 mvninstall 32 the patch passed
+1 compile 28 the patch passed
+1 javac 28 the patch passed
+1 checkstyle 19 the patch passed
+1 mvnsite 34 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 876 patch has no errors when building and testing our client artifacts.
+1 javadoc 22 the patch passed
+1 findbugs 64 the patch passed
_ Other Tests _
+1 unit 69 hadoop-aws in the patch passed.
+1 asflicense 29 The patch does not generate ASF License warnings.
3698
Subsystem Report/Notes
Docker Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1601/6/artifact/out/Dockerfile
GITHUB PR #1601
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 22c4d6fb1156 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / ec86f42
Default Java 1.8.0_222
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1601/6/testReport/
Max. process+thread count 424 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1601/6/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Oct 12, 2019
Copy link

@bgaborg bgaborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1;
tested against ireland, no errors.

@steveloughran
Copy link
Contributor Author

thx
-merged

@steveloughran steveloughran deleted the s3/HADOOP-16635-s3a-head+unguarded-tests branch October 15, 2021 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants