-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-16357 TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists #992
HADOOP-16357 TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists #992
Conversation
55d0ee9
to
af06a4a
Compare
Test results (rebased patch): MR tests fail, but not ITestDirectoryCommitProtocol, which is complaining that the dest dir exists in job setup
|
|
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
checkstyle is confused. This is an enum. There are no values. If I added a default I'd just get told off "all options are already covered" |
...c/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestStagingDirectoryOutputCommitter.java
Outdated
Show resolved
Hide resolved
reviewing my own patch (and looking @ failure reports of unrelated runs), I plan to change the probes for _SUCCESS of the previous stage to be different: fail with a message "previous stage did not generate output"; always work out and report which stage in the list has the first missing entry, therefore is the first failure to worry about |
|
|
Thanks for working on this @steveloughran; this will stabilize the integration tests.
Maybe I do something wrong, forgot to set something? |
@bgaborg yes, seeing that too. |
63c1c24
to
e27aa70
Compare
Gabor, fixed that; also rebased to trunk.
|
🎊 +1 overall
This message was automatically generated. |
Tested: S3A ireland All good except the root dir tests, which Gabor and I are worrying about on the basis that it may be a sign of a regression |
💔 -1 overall
This message was automatically generated. |
This changes the default conflict resolution mode (in code, core-default and docs) to append, to be consistent with FileOutputCommitter. Also checks for all conflict modes that the destination doesn't exist; if not calls mkdirs. If it does exist, verify that it is a directory. Committer test doesn't set partition policy or conflict mode, so verifies the defaults work (the test was changed before the source tests, to verify this was true) Most the change is actually to the mocking tests, because of the switch from FS.exists(path) to getFileStatus() + mkdirs. this is why I hate mocking so much Change-Id: I6fe9a12eea702cf75b35b8c7fb0db3f185d3f40b
…citly go back to append; new defaults aren't being picked up Change-Id: I7837cd100c77eeaafafbb422074b0944282fde82
… clearing bucket options. Somehow locally the commit confict mode fs.s3a.committer.staging.conflict-mode was retaining as fail, even with the core-default.xml set. Extra tests to log the origin of the option on an assert failure implied somehow this was coming via test/resources/core-site.xml and any transitive XMLIncludes from there. resetting the per-bucket options was enough to make this "go away"; the test is retained in case someone else's setting is similar. Change-Id: I3a882919a78438b22fa90e9bceb516a4de640082 Note: if the core-site actually sets it for all buckets, rather than per-bucket, then the new test will fail. Why not clear it? Because we want to verify that the default is now append.
Change-Id: I8da28bce5c5715501fb539e5bafdc247a9c81b62
fix up a test failure Change-Id: I3af7235173b4e35ac51ba1258c4693ad4064b2b0
+ log the terasort CSV file to $TMP rather than the remote FS, where it was being deleted in teardown Change-Id: I5e2e68f7e3d356d8f3dbde784a5fb0796cc888e6
Change-Id: I6224eb5b11de937e6f21d834e5a28a101f1eef89
e4afee1
to
899c36b
Compare
rebase to trunk and push up again to kick off a retest |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
checkstyle is mistaken; this patch is ready to go in. |
I'm gonna give a +0.5 for now. I reviewed the code and it looks good to me, but I don't think I fully understand the practical consequences of changing the default resolution mode just yet. |
we're changing the default mode to == that of FileOutputCommitter, namely "it's ok if the dest exists" Spark will refuse to execute if the dest exists, so to you use any of the conflict options you actually need to permit overwrite -so you won't change behaviour without noticing What this does do is make the committer more consistent with the existing one |
Ok, sounds reasonable enough. +1 from me. |
thanks, committed to trunk. |
This patch
AbstractCommitTerasortIT.
to not use the simple parser, so fails if the file is present.Testing: S3A Ireland w/ S3Guard. the Staging test failed, which makes me think that the settings weren't getting through