HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename #843
Conversation
AmazonS3 client) {
  Preconditions.checkArgument(isNotEmpty(s3Attributes.getBucket()),
      "No Bucket");
  Preconditions.checkArgument(isNotEmpty(s3Attributes.getKey()), "No Key");
  Preconditions.checkArgument(contentLength >= 0, "Negative content length");
  long l = s3Attributes.getLen();
I'll rename this to "length"; too easy to mix l and 1. I'd originally used 'len' but the IDE told me off.
Tasks.foreach(commits)
    .suppressExceptions()
    .run(commit -> getCommitOperations().abortSingleCommit(commit));
try(CommitOperations.CommitContext commitContext
add a space
private final Callable<T> call;

CallableSupplier(final Callable<T> call) {
should be public. This class is to let me pass in a callable which raises an IOE as an arg to one of the java 8 operations which only allow RuntimeExceptions to be raised
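For illustration, here is a minimal, self-contained sketch of the pattern being described, assuming a shape like the following (this is not the actual Hadoop CallableSupplier source): wrap a Callable that throws IOException into a java.util.function.Supplier so it can be handed to APIs such as CompletableFuture.supplyAsync(), which only permit unchecked exceptions.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.function.Supplier;

// Sketch only: tunnel a checked IOException through a Supplier so that
// java.util.concurrent APIs (which only allow RuntimeExceptions) can run it.
public final class CallableSupplierSketch<T> implements Supplier<T> {

  private final Callable<T> call;

  public CallableSupplierSketch(final Callable<T> call) {
    this.call = call;
  }

  @Override
  public T get() {
    try {
      return call.call();
    } catch (IOException e) {
      // callers unwrap this when the future completes
      throw new UncheckedIOException(e);
    } catch (RuntimeException e) {
      throw e;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}

// Example usage (hypothetical call site):
// CompletableFuture<Long> f = CompletableFuture.supplyAsync(
//     new CallableSupplierSketch<>(() -> fs.getFileStatus(path).getLen()));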
@@ -46,7 +46,7 @@ public static void setupClusters() throws IOException {

   @AfterClass
   public static void teardownClusters() throws IOException {
-    clusterBinding.terminate();
+    terminateCluster(clusterBinding);
see YARN-9568
import static org.junit.Assert.assertEquals;

/**
 * Unit test suite covering translation of AWS SDK exceptions to S3A exceptions,
Integration test suite
 * <li>repeated sorts do not change the order</li>
 * </ol>
 */
final class PathOrderComparators {
These are used to take the list of entries to add to a store and sort it so that topmost entries come first. This is to ensure that in the event of a partial failure during a put(list) operation, parent entries will have been added before children. That is: even if the DDB tables aren't complete w.r.t changes in the store (bad!) they do at least meet the requirement "no orphan entries"
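To illustrate the ordering being described, here is a standalone sketch (not the PathOrderComparators implementation): sort so that ancestors always precede descendants, for example by number of path components, so a partial put(list) never writes a child whose parent entry is missing.

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: shorter paths (fewer components) first; siblings in lexical order.
public final class TopmostFirstSketch {

  static final Comparator<String> TOPMOST_FIRST =
      Comparator.comparingInt((String p) -> p.split("/").length)
          .thenComparing(Comparator.<String>naturalOrder());

  public static void main(String[] args) {
    List<String> entries = Arrays.asList(
        "/a/b/file1", "/a", "/a/b", "/a/c/file2", "/a/c");
    entries.sort(TOPMOST_FIRST);
    // prints parents before children: [/a, /a/b, /a/c, /a/b/file1, /a/c/file2]
    System.out.println(entries);
  }
}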
💔 -1 overall
Force-pushed from ea61a61 to 24dbe82
Not sure what's up there. Usually it's stack traces in docs, but this is a lot.
The 600 tabs are from the file.
💔 -1 overall
Unit test failures in hadoop-common are unrelated; they look like directory setup issues on the test VM.
💔 -1 overall
🎊 +1 overall
Still looking through this patch. Like what I'm seeing so far. Couple of random comments. I'll try to resume tomorrow or Friday
 * (There's always a block for that, it just makes more sense to
 * perform the bulk delete after another block of copies have completed).
 */
public static final int RENAME_PARALLEL_LIMIT = 10;
Yep, we can always make this tunable later if we care.
There should be some coordination with the Hive community on this, where there's another layer of parallelization. I think they'd currently have to be doing individual file renames, and if so I think they can just keep going, this change can go out, and after that they might want to remove their parallelization layer. And they don't have to interact. But I hope I'm right about that and this isn't going to cause some huge spike in number of concurrent copies or something.
// the limit of active copies has been reached;
// wait for completion or errors to surface.
LOG.debug("Waiting for active copies to complete");
completeActiveCopies.apply("batch threshold reached");
Should be a significant improvement in runtime here. We can pipeline this in the future to eke out more performance. As we've been discussing on refactoring: we might want to break up this rename code later on as well. Maybe just helper functions to decompose it into smaller blocks?
Yes, a refactoring is much needed here. This part is getting huge: it's harder to review and, later, to debug.
Gabor: this is my attempt to clean it up. The rename logic has all been pulled out of S3AFS into the trackers, and I'm defining a couple of lambda expressions in the method to avoid copy-and-paste duplication and effort.
The only way to do anything else would be to pull the entire code "somewhere", but because of all the calls into S3AFS, that will only make sense as part of a relayering of ops: we'd have this work at the "object store + S3Guard" layer, rather than through the FS APIs. That's a major change and not going to happen in a patch which is big enough as it is.
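As an illustration of the lambda-in-method pattern being described, here is a minimal, self-contained sketch (not the actual S3AFileSystem code; the draining of copies is simplified to CompletableFuture.allOf, and the small FunctionRaisingIOE interface stands in for Hadoop's helper type):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch only: one lambda, defined inside the method, drains the active copy
// futures so the same logic can be reused at the batch threshold and at the end.
public class RenameSketch {

  @FunctionalInterface
  interface FunctionRaisingIOE<T> {
    void apply(T arg) throws IOException;
  }

  public void rename(List<CompletableFuture<Void>> submittedCopies) throws IOException {
    List<CompletableFuture<Void>> activeCopies = new ArrayList<>();

    // defined once, invoked from several points in the method body
    FunctionRaisingIOE<String> completeActiveCopies = reason -> {
      System.out.printf("Waiting for %d active copies: %s%n", activeCopies.size(), reason);
      CompletableFuture.allOf(activeCopies.toArray(new CompletableFuture[0])).join();
      activeCopies.clear();
    };

    for (CompletableFuture<Void> copy : submittedCopies) {
      activeCopies.add(copy);
      if (activeCopies.size() >= 10) {          // RENAME_PARALLEL_LIMIT
        completeActiveCopies.apply("batch threshold reached");
      }
    }
    completeActiveCopies.apply("final batch");
  }
}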
List<Path> undeletedObjects = new ArrayList<>();
try {
  // remove the keys
  // this does will update the metastore on a failure, but on
nit: /does will/does/
another nit: the asymmetry here in metadatastore updating (only on failure) feels a bit odd
I know, it grates on me too. But the renameTracker is invoked on success, so it is all contained in this operation.
 */
@VisibleForTesting
@Retries.RetryMixed
void removeKeys(
Is there a test that exercises the partial failure logic here? (I may answer this as I get further in the review)
yes: ITestS3AFailureHandling testMultiObjectDeleteNoPermissions()
    "S3Guard metadata store records read"),
S3GUARD_METADATASTORE_RECORD_WRITES(
    "s3guard_metadatastore_record_writes",
    "S3Guard metadata store records written"),
Seems like this could be interesting for comparing different rename trackers? If so I'm curious what sort of stats / deltas you get with the new logic.
Good point. With the ancestor tracking the new renamer shouldn't do any more than before. And I can verify that the commit operations don't do duplicate puts of all the parents, the way they do currently
Thanks all for the reviews; I'll go through them. FWIW, the latest patch adds checks on dir updates for the case where a batch update overwrites a file updated in the ongoing operation with a dir entry. And it's failing in a prune test (one which, IMO, shouldn't be doing a full store-wide prune).
Force-pushed from e7453a4 to bca1021
💔 -1 overall
Force-pushed from bca1021 to 8c65dad
💔 -1 overall
This also fixed up ITestDynamoDBMetadataStore to do a better job of destroying test tables, as some were leaking. Change-Id: I7e70e743628498c3cbbe21c6cda796b08fd39d8d
…thout fixing it, do at least avoid teardown errors Change-Id: I869d943d9c1a110e27710ffb8e638b918db44574
Change-Id: I3a5337d36086a109687de16a8395ff8d7aa1b644
*Tests are failing; this is an interim patch*

This makes S3Guard.addAncestors aware of the state of the bulk operation by pushing down the whole ancestor search-and-add problem into the store itself. Because each store can have its own model of bulk operation state, it can use this (on a bulk operation, obviously) to skip any queries of a remote DB to see if a path exists. And when there's a fallback to a get(path), and that finds a path entry, that's added to the state; no need to check again.

* we could go better and add all parents of a found path to the bulk state; that way all ancestor creation in a put(path) will know to skip the parent entries - which may imply changing the contents of the DDBMetastore's AncestorState

I've had to push the addAncestors() call into the metastores themselves to get this to work, as only DDB maintains a state map. Alternatively, we expose some contains(path), add(path) entry to the base class (better: make that an interface; have the impl do a Set<Path>) from the outset. Although the DDB AncestorState stores the state as a map of Path -> DDBPathMetadata, it's only existence which is checked for.

Change-Id: I64bf0bc372654af0471e13c324a7af0454615960
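For illustration only, a sketch of the contains(path)/add(path) alternative floated in the commit message above: a small per-operation interface whose implementation simply remembers which paths have already been found or written, so repeated ancestor probes in the same bulk operation can skip the remote GET. The names here are hypothetical, not the actual Hadoop classes.

import org.apache.hadoop.fs.Path;
import java.util.HashSet;
import java.util.Set;

// Sketch only: the minimal ancestor-state contract discussed above.
interface AncestorStateSketch {
  boolean contains(Path path);
  void add(Path path);
}

// A Set<Path>-backed implementation, as suggested ("have the impl do a Set<Path>").
final class SetBackedAncestorState implements AncestorStateSketch {
  private final Set<Path> seen = new HashSet<>();

  @Override
  public boolean contains(Path path) {
    return seen.contains(path);
  }

  @Override
  public void add(Path path) {
    seen.add(path);
  }
}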
Change-Id: I9bceb5c5f30a555a63119ea6fdba24b023fc5458
* AncestorState explicitly exports its put/contains operations and makes inner field private.
* the AncestorState set isn't updated before entries are put() (some duplicate entries are still being logged)
* More log statements as needed to debug what is happening
* Improved javadocs on new RenameTracker classes.
* TestCommitOperations was failing a lot; eventually concluded that the DDB table was inconsistent. If you have a tombstone marker you can't delete its children, but if you do a GET on a child it will surface. This has not been fixed: tests simply clear up better. An FSCK is needed here.

TODO
* Tests to make sure that all directories are deleted in a rename; empty and implied.
* Review of ProgressiveRenameTracker's end-of-rename operation to see if it is adding too many delete calls
* See if I can think of more tests

Change-Id: Ic062406f19bc7d4bf1ab794bd26e85691027235c
Change-Id: I47aef155a0f60da10ec34d30e6e9bc4de88b7fcd
* partial failure of delete when handling fake directories
* Add a TODO comment where I need to do something
* Fix a typo on a method name in TestPartialDeleteFailures

Change-Id: Ic231d9dceb37ce4a726620e8d3343bf353fdf0c2
…e committing a job, so we can verify that job commits remove them Change-Id: I09805332a25bc8eb4158f9f162f43b97ff3a3c57
…ions is tracked ITestDynamoDBMetadataStore picks up the scale timeout Change-Id: I15428862ad5b8bf9e2713f9fd03f1645b5b4ef10
* getting the direct-to-DDB tests to scale better, especially ITestDynamoDBMetadataStore, mainly by breaking up the testBatchWrite test into five separate ones, with individual cleanup, and switching the cleanup from forgetMetadata to prune()...the latter does batch deletes in parallel and is a lot faster. As it was, on smaller stores, the throttling in teardown was the common source of failure. ThrottleTracker is pulled from the scale test for reuse in the non-scale option.
* Metastore.put(DirListingMetadata) now takes a BulkOperationState and passes it down. This is not actively in use nor exported all the way up, though everything is ready in the store. I'm just not convinced it is needed yet.

I've managed to change things enough for the ITestDynamoDBMetadataStore test to fail on the maven test runs, but not in the IDE. Also got a failure in ITestMagicCommitMRJob where there wasn't a cleanup of the underlying attempt dir.

Change-Id: I80169b12baacbecfb63970c0cf7ea061fe3ab117
…and generated inconsistency in the same operation. The DDB metastore now raises a PathIOE if the operation state lists a file at a path and another operation scans for a parent directory there. This is only going to happen in one of two cases: a deliberate test, or someone having created files under files. Change-Id: Ie22cfa6dafa3b9cbc4533f82b115af432a1c4dfb
…rState failure on a file found is now optionally downgradeable to a warn. Still unsure what to do there, as it's a symptom of a problem: there's a file in the store which is a parent of another file. For prune, we can downgrade this to a warn and then rely on FSCK to fix it. Change-Id: I2f2cf9b85bbd3acd8a606e669c1b7e27a9db16f4
Force-pushed from 8c65dad to 38d1f75
💔 -1 overall
* Add HADOOP-16364: translate timeout on destroy() into a new exception: TableDeleteTimeoutException.
* ITestDynamoDBMetadataStore performance: coalescing three tests into one, and inserting a new check for a version marker missing a version field. The tests which create and destroy tables are a key part of the bottleneck here, which is where the coalescing makes a big difference. FWIW, I think we could actually use the shared table for the version checking and just restore it after, to save 60s or so.
* Trying to recreate the failure I saw where prune failed because of an inconsistent ancestor write. I can't seem to replicate the failure (and I deleted the table with the problem). There's now scope in the metastore to decide what to do if there is an inconsistency (throw vs warn), and I'm wondering if we need to use the operation state to choose how to react. That is: on a duplicate, inconsistent prune, do "something" else. But what?

Note: if inconsistencies in the store can be amplified by prune, this is not a new problem. It's by tracking the ancestor state that it is being detected.

Change-Id: I2e83d986a1f072d98251443e0f9c65c7036ba8ce
@@ -1326,8 +1330,25 @@

<property>
  <name>fs.s3a.max.total.tasks</name>
  <value>5</value>
What do you think about sticking HADOOP-15729 in here? Essentially just providing 0 as the number of core threads to the executor, otherwise a lot of these threads never get cleaned up when idle. This manifests as a massive number of idle threads in long-running services like Impala and Hive Metastore when you've accessed many different buckets. I believe this patch would make the problem worse, so it's a good time to toss in the trivial fix and test it all together.
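For reference, a minimal sketch of the kind of fix being suggested (assumed shape, not the actual HADOOP-15729 patch): give the executor zero core threads, or allow core threads to time out, so idle threads are reclaimed rather than lingering for the life of a long-running service.

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: a bounded pool whose threads all go away once idle.
public class IdleThreadSketch {
  public static ThreadPoolExecutor newBoundedPool(int maxThreads, long keepAliveSecs) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        0,                       // core threads: none kept alive when idle
        maxThreads,
        keepAliveSecs, TimeUnit.SECONDS,
        new SynchronousQueue<>());
    // equivalent alternative when core threads > 0:
    // pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}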
 * After the operation has completed, it <i>MUST</i> be closed so
 * as to guarantee that all state is released.
 */
public class BulkOperationState implements Closeable {
I don't see when this ever gets associated with any actual state. What's the point of it?
 */
@RetryTranslated
void put(PathMetadata meta,
    @Nullable BulkOperationState BulkOperationState) throws IOException;
A variable with the same name as its type - is this more Java 8 craziness I haven't seen before? :)
checkArgument(!pathsToDelete.contains(sourcePath),
    "File being renamed is already processed %s", destPath);
// create the file metadata and update the local structures.
S3Guard.addMoveFile(
Hopefully by the time I'm done reviewing I'll understand the architecture here enough to answer this myself, but we've previously talked about an alternate rename strategy where we create everything, and then we delete everything, so there's always 1 copy of the complete data. How well does this change fit in with that idea? In the event that there's a failure we delete the incomplete copy on retry. If that's the source, we're now done. If that's the destination, we now start over. Nice and recoverable, and until it succeeds you haven't disturbed anything.
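A rough sketch of the strategy being proposed in this comment, purely illustrative: the ObjectStore interface and its method names are hypothetical, and this is not what the patch implements.

import java.io.IOException;
import java.util.List;

// Sketch only: copy everything first, then delete the source, so a complete
// copy of the data exists at every point in the rename.
public class CopyThenDeleteRenameSketch {

  interface ObjectStore {
    List<String> list(String prefix) throws IOException;
    void copy(String sourceKey, String destKey) throws IOException;
    void delete(List<String> keys) throws IOException;
  }

  static void rename(ObjectStore store, String src, String dest) throws IOException {
    List<String> sourceKeys = store.list(src);
    // Phase 1: copy everything. A failure here leaves the source intact;
    // a retry only needs to delete the partial destination and start over.
    for (String key : sourceKeys) {
      store.copy(key, dest + key.substring(src.length()));
    }
    // Phase 2: only once every copy has succeeded, delete the source.
    // A failure here leaves a complete destination; retry just finishes the delete.
    store.delete(sourceKeys);
  }
}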
// check if UNGUARDED_FLAG is passed and use NullMetadataStore in
// config to avoid side effects like creating the table if not exists
Configuration conf0 = getConf();
Rename this to something like unguardedConf please
Change-Id: I1185b21cb14d9c2238a27dc00f5eac930959f1a4
I've just pushed up a new version which addresses the checkstyle complaints and the various nits reported. Tested: S3 Ireland. All good except for the new test where I'm trying to create an inconsistency in the table to cause problems in prune. I think I'm going to put that to one side right now and worry about it in some tests for fsck in a subsequent patch. After this iteration I'm going to have to move to a new PR; Yetus is unhappy again.
Force-pushed from 47e9d94 to 115fb77
💔 -1 overall
💔 -1 overall
Closed to create a squashed PR and keep Yetus happy; rebased to trunk first.
The new PR is #951. Please review that one, as it is ~ready to go in and in need of criticism.
This is a major patch which has grown to become a "fix up the final issues with Auth mode" patch, built according to my proposed refactoring S3A design.

Multi-object delete: o.a.h.fs.s3a.impl.MultiObjectDeleteSupport handles partial delete failures by parsing the response and updating the metastore with those deleted entries.
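A hedged sketch of the idea behind that partial-failure handling (the method and type names here are illustrative, not the real class's API): given the keys we asked S3 to delete and the keys it reported as failed, work out which keys actually went away, so only those entries are removed from the S3Guard metastore.

import java.util.ArrayList;
import java.util.List;

// Sketch only: split a multi-object delete outcome into the keys that were
// really deleted, so the metastore is updated for those and only those.
public class PartialDeleteSketch {

  static List<String> successfullyDeleted(List<String> requestedKeys,
                                          List<String> failedKeys) {
    List<String> deleted = new ArrayList<>(requestedKeys);
    deleted.removeAll(failedKeys);   // anything not reported as failed was deleted
    return deleted;
  }

  // Usage: pass successfullyDeleted(...) to the metastore so its entries (and
  // only its entries) are removed, leaving the undeleted paths still listed.
}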
Rename (i.e. copy + delete), plus scale and perf of commits:

* Each metastore has the notion of a BulkOperationState; this can be used by a store to track entries which have been added/found during a set of operations (sequential or in parallel).
* A RenameTracker implements the algorithm for updating the store on rename. The DelayedUpdateTracker implements the classic "do it at the end" algorithm, while the ProgressiveRenameTracker does it incrementally (a rough sketch of these callbacks follows this list).
* The metastores have an addAncestors() call which can update the bulk state based on examining the store and the current state of the BulkOperationState. This has been pushed down from S3Guard.addAncestors() to allow them to do their own thing with the bulk state.
* S3AFilesystem.innerRename now copies in parallel; hard coded at 10 for now (having it be a factor of the delete page size makes sense). Each parallel copy operation notifies the current RenameTracker of its completion, so letting them choose what to do (DelayedUpdate: saves the change; ProgressiveRenameTracker: updates the store immediately).
* The RenameTracker also has the homework of updating the store state in the case of a partial rename failure. DelayedUpdateTracker: update the metastore; ProgressiveUpdate: no-op.
* Local and DDB metastores now only use the ProgressiveRenameTracker; this was done after I'd moved the classic design into its own tracker (i.e. out of S3AFileSystem). Oh, and there's a NullRenameTracker for the null metastore.
* The BulkOperationState ends up being passed all the way to the committers; to avoid them explicitly having to understand this, the CommitOperations class now has an inner (non-static) CommitContext class which contains this and exports the commit/abort/revert operations for the committers to call. Internally it then invokes WriteOperationHelper calls with the context, which then gets all the way through to S3AFileSystem.finishedWrite() for the updates there to decide what entries to put into the store.
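Illustrative sketch of the tracker callbacks described in the list above (simplified; the real RenameTracker API is richer). A progressive tracker would update the metastore inside fileCopied(); the delayed variant below just records the moves and applies them in one batch at completeRename().

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.Path;

// Sketch only: the callback surface a rename tracker needs.
abstract class RenameTrackerSketch {
  abstract void fileCopied(Path source, Path dest) throws IOException;
  abstract void completeRename() throws IOException;
  abstract void renameFailed(Exception cause) throws IOException;
}

// "Do it at the end" variant, in the spirit of the classic algorithm.
class DelayedUpdateSketch extends RenameTrackerSketch {
  private final List<Path[]> moves = new ArrayList<>();

  @Override
  void fileCopied(Path source, Path dest) {
    moves.add(new Path[] {source, dest});   // remember, don't touch the store yet
  }

  @Override
  void completeRename() throws IOException {
    // apply all recorded moves to the metastore in one batch (omitted)
  }

  @Override
  void renameFailed(Exception cause) throws IOException {
    // record into the metastore whatever was already copied (omitted)
  }
}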
With this design, note that for both the parallelized commit and copy ops, we can also shuffle the source data so that we aren't doing it all on the same lowest-level directory (what I'd expect today). That should reduce load on the S3 store. With a move to parallel copy operations that will make a difference: the list comes in sequentially, so the initial set of 10 renames will be adjacent. Actually, I could be even cleverer there and sort by size. Why so? It ensures that a large file doesn't become the straggler in the copy.
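The "sort by size" idea above amounts to something like the following (illustrative only, not part of the patch): order the copy list largest-first before submitting, so one big object is not left as the straggler at the end of the parallel rename.

import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

// Sketch only: largest files first in the copy queue.
final class LargestFirst {
  static void sort(List<FileStatus> toCopy) {
    toCopy.sort(Comparator.comparingLong(FileStatus::getLen).reversed());
  }
}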