
HADOOP-16380: test to show that it is the root directory where the "tombstone problem" can be replicated #1079


Closed

Conversation

steveloughran
Contributor

You don't get it on deeper paths because the full prefix is returned "/test/dir/deleted", which doesn't match the short name of a tombstone.
this helps explain why we only see it on the root dir tests.

Change-Id: I6e00b39684b7f3ff76eb84ee877128fa01c0f2bc

This is not the fix

This test replicates the problem as seen in the IDE when debugging root dir test failures
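The root-only symptom can be illustrated with a standalone sketch. This is a hypothetical filter, not the actual S3Guard code — the class and method names are invented — but it shows the mechanism: a filter that drops listing entries by comparing them to a tombstone's short name matches the bare prefix `deleted/` returned when listing the root, yet never matches the full prefix `test/dir/deleted/` returned for deeper paths.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not the S3Guard implementation: a tombstone
// filter keyed on the deleted path's final component ("short name").
public class TombstoneShortNameSketch {

    // A tombstone was recorded for a directory whose short name is "deleted".
    static final Set<String> TOMBSTONE_SHORT_NAMES = Set.of("deleted");

    // Drop any common prefix whose text (minus the trailing slash) equals a
    // tombstone's short name.
    static List<String> filter(List<String> commonPrefixes) {
        return commonPrefixes.stream()
            .filter(p -> !TOMBSTONE_SHORT_NAMES.contains(p.replaceAll("/$", "")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A LIST of the root returns the bare prefix "deleted/": filtered out.
        System.out.println(filter(List.of("deleted/")));          // prints []
        // A LIST of /test/dir returns the full prefix "test/dir/deleted/",
        // which never equals the short name, so the tombstone has no effect.
        System.out.println(filter(List.of("test/dir/deleted/"))); // prints [test/dir/deleted/]
    }
}
```

Under this (assumed) comparison scheme, only root listings can collide with tombstone short names — matching the observation that the failure only shows up in the root directory tests.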

Gabor Bota and others added 2 commits July 11, 2019 10:13
…ombstone problem" can be replicated

@steveloughran steveloughran added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Jul 11, 2019
@steveloughran steveloughran requested a review from bgaborg July 11, 2019 14:49
@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:---------:|:-------:|:--------|
| 0 | reexec | 165 | Docker mode activated. |
| | _ Prechecks _ | | |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 2 new or modified test files. |
| | _ trunk Compile Tests _ | | |
| +1 | mvninstall | 1399 | trunk passed |
| +1 | compile | 36 | trunk passed |
| +1 | checkstyle | 22 | trunk passed |
| +1 | mvnsite | 41 | trunk passed |
| +1 | shadedclient | 807 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 28 | trunk passed |
| 0 | spotbugs | 69 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 66 | trunk passed |
| | _ Patch Compile Tests _ | | |
| +1 | mvninstall | 33 | the patch passed |
| +1 | compile | 30 | the patch passed |
| +1 | javac | 30 | the patch passed |
| +1 | checkstyle | 18 | the patch passed |
| +1 | mvnsite | 35 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 843 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 23 | the patch passed |
| +1 | findbugs | 65 | the patch passed |
| | _ Other Tests _ | | |
| +1 | unit | 282 | hadoop-aws in the patch passed. |
| +1 | asflicense | 28 | The patch does not generate ASF License warnings. |
| | | 4032 | |

| Subsystem | Report/Notes |
|:---------:|:-------------|
| Docker | Client=18.09.7 Server=18.09.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1079/1/artifact/out/Dockerfile |
| GITHUB PR | #1079 |
| JIRA Issue | HADOOP-16380 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ebf201665f4f 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9cec023 |
| Default Java | 1.8.0_212 |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1079/1/testReport/ |
| Max. process+thread count | 318 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1079/1/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.


@bgaborg bgaborg left a comment


Thanks for creating a test that reproduces the issue. I've started to work on this.

@bgaborg

bgaborg commented Jul 15, 2019

The tests will pass if I change `return new S3AFileStatus(Tristate.TRUE, path, username);` to `return new S3AFileStatus(Tristate.FALSE, path, username);` in `org.apache.hadoop.fs.s3a.S3AFileSystem#s3GetFileStatus` (`org/apache/hadoop/fs/s3a/S3AFileSystem.java:2760`). I'll update my PR with a test against this.
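For context, the `Tristate` value here is the empty-directory flag carried by `S3AFileStatus`. A standalone sketch of the semantics (the real enum lives in `org.apache.hadoop.fs.s3a`; this copy and the `mustListBeforeDelete` helper are just illustrative):

```java
// Standalone sketch of the empty-directory tristate semantics; the real enum
// is org.apache.hadoop.fs.s3a.Tristate and the hypothetical helper below
// stands in for any caller that trusts the flag.
public class TristateSketch {
    enum Tristate { TRUE, FALSE, UNKNOWN }

    // TRUE tells callers the directory is *known* empty, so they may skip a
    // LIST; UNKNOWN forces them to check; FALSE means known non-empty.
    static boolean mustListBeforeDelete(Tristate isEmptyDir) {
        return isEmptyDir != Tristate.TRUE;
    }

    public static void main(String[] args) {
        // The bug scenario: a child still exists under a tombstone, but the
        // status says TRUE, so the caller never lists and treats the
        // directory as empty.
        System.out.println(mustListBeforeDelete(Tristate.TRUE));    // prints false
        System.out.println(mustListBeforeDelete(Tristate.UNKNOWN)); // prints true
    }
}
```

Which is why switching the returned value from `TRUE` to `FALSE` (or `UNKNOWN`) in `s3GetFileStatus` makes the tests pass: callers are forced to actually look instead of trusting a wrongly-computed "empty".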

@steveloughran
Contributor Author

OK. Is there any other issue that this problem creates?
I'm worried about whether, if there's a file /test/dir1/file.avro under, say, /test, and there's a tombstone for /test, the file will get found in a listing, and whether that behaviour differs between the root path and elsewhere. I know a listFiles() is likely to find it; I'd expect a treewalk with listStatus() to miss it, just because the topmost directory listing won't find the /test dir to scan.
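The listFiles()-versus-treewalk distinction can be sketched on a toy in-memory namespace. Everything here is hypothetical (the class, the flat key list, the tombstone set); it only shows why a tombstoned intermediate directory hides deeper files from a listStatus() recursion but not from a flat scan:

```java
import java.util.*;
import java.util.stream.*;

// Toy in-memory namespace, purely illustrative. Keys are full paths; a
// tombstone on /test hides that entry from directory listings.
public class TreewalkSketch {
    static final List<String> KEYS = List.of("/test/dir1/file.avro");
    static final Set<String> TOMBSTONES = Set.of("/test");

    // Flat listFiles(): scans all keys directly, so it still finds the file.
    static List<String> listFilesFlat(String root) {
        return KEYS.stream()
            .filter(k -> k.startsWith(root))
            .collect(Collectors.toList());
    }

    // listStatus() of one directory: immediate children, minus tombstones.
    static List<String> listStatus(String dir) {
        String prefix = dir.equals("/") ? "/" : dir + "/";
        return KEYS.stream()
            .filter(k -> k.startsWith(prefix))
            .map(k -> {                         // trim to the immediate child
                int slash = k.indexOf('/', prefix.length());
                return slash < 0 ? k : k.substring(0, slash);
            })
            .distinct()
            .filter(child -> !TOMBSTONES.contains(child))
            .collect(Collectors.toList());
    }

    // Treewalk via listStatus: the tombstoned /test is never entered, so
    // /test/dir1/file.avro is missed entirely.
    static List<String> treewalk(String dir) {
        List<String> found = new ArrayList<>();
        for (String child : listStatus(dir)) {
            if (KEYS.contains(child)) {
                found.add(child);               // a file
            } else {
                found.addAll(treewalk(child));  // a directory: recurse
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(listFilesFlat("/")); // prints [/test/dir1/file.avro]
        System.out.println(treewalk("/"));      // prints []
    }
}
```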

@bgaborg

bgaborg commented Jul 15, 2019

I have to create a test to be sure. The implementation we created is a tad too complex to give a simple yes or no to your question :(

@bgaborg

bgaborg commented Jul 15, 2019

> OK. Is there any other issue which this problem creates.
> I'm worried about whether if there's a file /test/dir1/file.avro under, say, /test, and there's a tombstone for /test, will the file get found in a listing, and is that behaviour different in the root path from elsewhere?. I know a listFiles() is likely to find it, I'd expect a treewalk with listStatus() will miss it just because the topmost directory listing won't find the /test dir to scan

If there's a tombstone for /test it won't be visible, but we should expect that behaviour.
This is why we have tombstone expiry, so that eventually we don't have this problem: if /test/ has been created by an OOB operation (I guess that's why we have a tombstone and still have the directory structure behind it), then metadata and tombstone expiry will eventually resolve it.

The problem starts to surface if we have a /test1/ tombstoned, and we also have /test9, or even /test2/dir1/file.avro. I will write a test for that.
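The sibling concern can be pinned down with a one-line predicate. This is illustrative, not the S3Guard code (both helpers are invented names): a raw `startsWith` check on the tombstoned path would wrongly hide lexicographic siblings such as `/test19`, whereas requiring a trailing `/` only hides true descendants:

```java
// Illustrative only: a tombstone on /test1 must hide its descendants,
// but not lexicographic siblings such as /test19 or /test9.
public class SiblingPrefixSketch {
    // Naive check: raw prefix match on the tombstoned path.
    static boolean naiveHidden(String key, String tombstone) {
        return key.startsWith(tombstone);          // wrong: also hides /test19
    }

    // Correct check: the key is the tombstoned path itself or a child of it.
    static boolean childHidden(String key, String tombstone) {
        return key.equals(tombstone) || key.startsWith(tombstone + "/");
    }

    public static void main(String[] args) {
        System.out.println(naiveHidden("/test19", "/test1"));              // prints true (bug)
        System.out.println(childHidden("/test19", "/test1"));              // prints false
        System.out.println(childHidden("/test1/dir/file.avro", "/test1")); // prints true
        System.out.println(childHidden("/test2/dir1/file.avro", "/test1"));// prints false
    }
}
```

A test along these lines would distinguish "hides its own subtree" (expected) from "hides unrelated siblings" (a bug).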

@bgaborg

bgaborg commented Jul 15, 2019

Also note that

```java
// PUT child to store
Path child = new Path(firstPath, "child");
StoreContext ctx = fs.createStoreContext();
String childKey = ctx.pathToKey(child);
String rootKey = ctx.pathToKey(root);
AmazonS3 s3 = fs.getAmazonS3ClientForTesting("LIST");

createEmptyObject(fs, childKey);
```

is an out of band operation by definition in your test.

The test will fail even without it - commenting out

```java
//createEmptyObject(fs, childKey);
```

and the check for it:

```java
// the listing has the firstDir path as a prefix, because of the child
//Assertions.assertThat(listing.getCommonPrefixes())
//    .describedAs("The prefixes of a LIST of %s", root)
//    .contains(firstDir + "/");
```

Why do you add a child object to the store?

@steveloughran
Contributor Author

I'm explicitly adding a child as that was the situation I was seeing; if you can replicate the problem in a different way, then that'd be another piece of information.

The key point: a file under a tombstone was enough to convince the client that the directory was empty
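That key point — one file under a tombstoned prefix making the whole directory look empty — can be condensed into a toy end-to-end check (illustrative names and data, not the S3A code):

```java
import java.util.List;
import java.util.Set;

// Illustrative only: the store holds one real object under the tombstoned
// prefix, yet the merged (listing minus tombstones) view of the root is empty.
public class FileUnderTombstoneSketch {
    // A delimiter LIST of the root sees only the prefix of the deleted dir,
    // because the file deleted/child.avro lives underneath it.
    static final List<String> ROOT_PREFIXES = List.of("deleted/");
    static final Set<String> TOMBSTONES = Set.of("deleted");

    static boolean rootLooksEmpty() {
        return ROOT_PREFIXES.stream()
            .allMatch(p -> TOMBSTONES.contains(p.replaceAll("/$", "")));
    }

    public static void main(String[] args) {
        // A real object exists in the store, but after tombstone filtering
        // the client has nothing left and concludes the root is empty.
        System.out.println(rootLooksEmpty()); // prints true
    }
}
```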

@steveloughran
Contributor Author

Superseded by #1123.

@steveloughran steveloughran deleted the s3/HADOOP-16380-tombstone-test branch July 30, 2019 15:12
shanthoosh added a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
* Initial version of supporting large job models in standalone.

* Move the chunking logic into zookeeper metadata store implementation.

* Add the checksum to end of the value in ZkMetadataStore.

* Code cleanup.

* Remove unused variable and imports.

* Code cleanup.

* Address review comments.

* Address review comments.

* Address the nitpick comment.