
HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename #843


Conversation

steveloughran
Contributor

This is a major patch which has grown into a "fix up the final issues with auth mode" patch, built according to my proposed S3A refactoring design.

  • new stuff goes into an o.a.h.fs.s3a.impl class to make clear it's for implementation

Multi-object delete:

  • o.a.h.fs.s3a.impl.MultiObjectDeleteSupport handles partial delete failures by parsing the response and updating the metastore with those deleted entries

Rename (i.e. copy + delete), plus scale and performance of commits:

Each metastore has the notion of

  • BulkOperationState; this can be used by a store to track entries which have been added/found during a set of operations (sequential or in parallel)
  • and a RenameTracker which implements the algorithm for updating the store on rename. The DelayedUpdateTracker implements the classic "do it at the end" algorithm, while the ProgressiveRenameTracker does it incrementally.
  • and their own addAncestors() call which can update the bulk state based on examining the store and the current state of the BulkOperationState. This has been pushed down from S3Guard.addAncestors() to allow them to do their own thing with the bulk state.
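The split between the two tracker strategies can be sketched as follows. This is an illustrative toy, not the Hadoop API: the `MetaStore`, method signatures, and the in-memory map are all stand-ins; the real trackers work against `MetadataStore` implementations with far richer state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a toy "metastore" plus the two rename-tracking
// strategies described above. Names follow the PR description; the real
// Hadoop classes have different signatures.
public class RenameTrackerSketch {

  /** Toy metastore: path -> entry. */
  static class MetaStore {
    final Map<String, String> entries = new HashMap<>();
    void put(String path, String entry) { entries.put(path, entry); }
  }

  /** Base tracker: notified as each parallel copy completes. */
  static abstract class RenameTracker {
    final MetaStore store;
    RenameTracker(MetaStore store) { this.store = store; }
    abstract void fileCopied(String destPath, String entry);
    abstract void completeRename();
  }

  /** Classic "do it at the end": buffer updates, apply on completion. */
  static class DelayedUpdateTracker extends RenameTracker {
    private final List<String[]> pending = new ArrayList<>();
    DelayedUpdateTracker(MetaStore s) { super(s); }
    void fileCopied(String destPath, String entry) {
      pending.add(new String[] {destPath, entry});
    }
    void completeRename() {
      for (String[] p : pending) store.put(p[0], p[1]);
    }
  }

  /** Incremental: update the store as each copy finishes. */
  static class ProgressiveRenameTracker extends RenameTracker {
    ProgressiveRenameTracker(MetaStore s) { super(s); }
    void fileCopied(String destPath, String entry) { store.put(destPath, entry); }
    void completeRename() { /* nothing buffered; already up to date */ }
  }

  static MetaStore runRename(RenameTracker tracker) {
    tracker.fileCopied("/dest/a", "A");
    tracker.fileCopied("/dest/b", "B");
    tracker.completeRename();
    return tracker.store;
  }

  public static void main(String[] args) {
    Map<String, String> delayed =
        runRename(new DelayedUpdateTracker(new MetaStore())).entries;
    Map<String, String> progressive =
        runRename(new ProgressiveRenameTracker(new MetaStore())).entries;
    // Both strategies converge on the same final state on success; they differ
    // in what the store shows mid-rename and after a partial failure.
    System.out.println(delayed.equals(progressive));
  }
}
```

The interesting difference is not the success path (identical end state) but the failure path: mid-rename, the progressive tracker's store already reflects completed copies, while the delayed tracker's store shows nothing until the end.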

S3AFilesystem.innerRename now copies in parallel; the limit is hard-coded at 10 for now (making it a factor of the delete page size makes sense). Each parallel copy operation notifies the current RenameTracker of its completion, letting the tracker choose what to do (DelayedUpdateTracker: saves the change; ProgressiveRenameTracker: updates the store immediately).
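The "copy in parallel, block at a limit" pattern could look roughly like this. A hypothetical sketch: `copyAll`, the simulated copy body, and the tracker notification are stand-ins; only the `RENAME_PARALLEL_LIMIT` constant name comes from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of bounded parallel copying: submit copies until the active-copy
// limit is reached, then wait for that batch before submitting more.
public class ParallelCopySketch {
  static final int RENAME_PARALLEL_LIMIT = 10;

  public static int copyAll(List<String> keys) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(RENAME_PARALLEL_LIMIT);
    List<Future<?>> active = new ArrayList<>();
    AtomicInteger copied = new AtomicInteger();
    try {
      for (String key : keys) {
        active.add(pool.submit(() -> {
          // simulate the copy; on completion, notify the rename tracker
          copied.incrementAndGet();
        }));
        if (active.size() >= RENAME_PARALLEL_LIMIT) {
          // limit of active copies reached: wait for the batch to finish
          for (Future<?> f : active) f.get();
          active.clear();
        }
      }
      for (Future<?> f : active) f.get(); // drain the tail
    } finally {
      pool.shutdown();
    }
    return copied.get();
  }

  public static void main(String[] args) throws Exception {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 25; i++) keys.add("src/file-" + i);
    System.out.println(copyAll(keys)); // 25
  }
}
```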

The RenameTracker also has the homework of updating the store state in the case of a partial rename failure. DelayedUpdateTracker: update the metastore; ProgressiveRenameTracker: no-op.

Local and DDB metastores now only use the ProgressiveRenameTracker; this was done after I'd moved the classic design into its own tracker (i.e. out of S3AFileSystem). Oh, and there's a NullRenameTracker for the null metastore.

The BulkOperationState ends up being passed all the way to the committers; to avoid them explicitly having to understand this, the CommitOperations class now has an inner (non-static) CommitContext class which contains this and exports the commit/abort/revert operations for the committers to call. Internally it then invokes WriteOperationHelper calls with the context, which then gets all the way through to S3AFileSystem.finishedWrite() for the updates there to decide what entries to put into the store.

With this design

  • Renames are progressive and parallelized, so they are faster as well as keeping the metastore consistent.
  • Bulk delete failures are handled
  • Commit operations don't generate excessive load on the store: if you are committing 500 files, then each parent dir will be checked ~once. (I say ~once as the get() is done out of a sync block, so multiple parallel entries may duplicate the work, but I'd rather that than lock during a remote RPC call).
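The "each parent dir checked ~once" behaviour amounts to memoizing parent checks in shared state probed outside any lock. A simplified sketch (all names hypothetical; the real code uses the BulkOperationState, and, as noted above, tolerates occasional duplicate checks rather than locking across a remote call):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of memoized parent-directory checks: a shared concurrent set of
// already-checked parents, so committing many files under one directory
// pays for roughly one remote check rather than one per file.
public class ParentCheckSketch {
  private final Set<String> checkedParents = ConcurrentHashMap.newKeySet();
  final AtomicInteger remoteChecks = new AtomicInteger();

  void ensureParentChecked(String parentDir) {
    if (checkedParents.add(parentDir)) {
      // only the first caller for this directory pays the remote get()
      remoteChecks.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    ParentCheckSketch s = new ParentCheckSketch();
    for (int i = 0; i < 500; i++) {
      s.ensureParentChecked("/out/job-1"); // 500 files, one shared parent
    }
    System.out.println(s.remoteChecks.get()); // 1
  }
}
```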

Note that for both the parallelized commit and copy ops, we can also shuffle the source data so that we aren't always working on the same lowest-level directory (which is what I'd expect today). That should reduce load on the S3 store. With the move to parallel copy operations this will make a difference: the list comes in sequentially, so the initial set of 10 renames will be adjacent. I could be even cleverer there and sort by size. Why? It ensures that a large file doesn't become the straggler in the copy.
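The sort-by-size idea above is just "largest first" scheduling. A hedged sketch (`FileEntry` and `scheduleOrder` are illustrative stand-ins for the real listing records):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of straggler avoidance: schedule the biggest copies first so the
// long copy overlaps with the many small ones instead of running last.
public class StragglerSortSketch {
  static class FileEntry {
    final String key;
    final long size;
    FileEntry(String key, long size) { this.key = key; this.size = size; }
  }

  static List<FileEntry> scheduleOrder(List<FileEntry> listing) {
    List<FileEntry> sorted = new ArrayList<>(listing);
    // largest first: the big file's copy starts immediately, not at the end
    sorted.sort(Comparator.comparingLong((FileEntry fe) -> fe.size).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    List<FileEntry> listing = Arrays.asList(
        new FileEntry("a", 5), new FileEntry("big", 5000), new FileEntry("b", 7));
    System.out.println(scheduleOrder(listing).get(0).key); // big
  }
}
```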

AmazonS3 client) {
Preconditions.checkArgument(isNotEmpty(s3Attributes.getBucket()),
"No Bucket");
Preconditions.checkArgument(isNotEmpty(s3Attributes.getKey()), "No Key");
Preconditions.checkArgument(contentLength >= 0, "Negative content length");
long l = s3Attributes.getLen();
Contributor Author

I'll rename this to "length"; too easy to mix l and 1. I'd originally used 'len' but the IDE told me off.

Tasks.foreach(commits)
.suppressExceptions()
.run(commit -> getCommitOperations().abortSingleCommit(commit));
try(CommitOperations.CommitContext commitContext
Contributor Author

add a space


private final Callable<T> call;

CallableSupplier(final Callable<T> call) {
Contributor Author

Should be public. This class lets me pass in a callable which raises an IOException as an argument to one of the Java 8 operations which only allow RuntimeExceptions to be raised.
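The adapter described here can be sketched as a `Supplier` that rethrows checked exceptions unchecked; the signatures below are illustrative, not the actual class in the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Sketch of the CallableSupplier idea: adapt a Callable that may throw an
// IOException into a Supplier usable with CompletableFuture.supplyAsync,
// which only permits unchecked exceptions.
public class CallableSupplier<T> implements Supplier<T> {
  private final Callable<T> call;

  public CallableSupplier(Callable<T> call) { this.call = call; }

  @Override
  public T get() {
    try {
      return call.call();
    } catch (IOException e) {
      // wrap so the caller can unwrap the IOException on the far side
      throw new UncheckedIOException(e);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    CompletableFuture<String> f =
        CompletableFuture.supplyAsync(new CallableSupplier<>(() -> "copied"));
    System.out.println(f.join()); // copied
  }
}
```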

@@ -46,7 +46,7 @@ public static void setupClusters() throws IOException {

@AfterClass
public static void teardownClusters() throws IOException {
clusterBinding.terminate();
terminateCluster(clusterBinding);
Contributor Author

see YARN-9568

import static org.junit.Assert.assertEquals;

/**
* Unit test suite covering translation of AWS SDK exceptions to S3A exceptions,
Contributor Author

Integration test suite

* <li>repeated sorts do not change the order</li>
* </ol>
*/
final class PathOrderComparators {
Contributor Author

These are used to take the list of entries to add to a store and sort it so that topmost entries come first. This ensures that, in the event of a partial failure during a put(list) operation, parent entries will have been added before children. That is: even if the DDB tables aren't complete w.r.t. changes in the store (bad!), they at least meet the requirement "no orphan entries".
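A "topmost first" ordering is just depth-then-lexicographic comparison. A minimal sketch over plain path strings (the real PathOrderComparators works on Hadoop Path/metadata types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of parent-before-child ordering: sort by path depth first, then
// lexicographically, so every parent precedes its children. The comparator
// is a total order, so repeated sorts do not change the result.
public class TopmostFirstSketch {
  static final Comparator<String> TOPMOST_FIRST =
      Comparator.comparingInt((String p) -> p.split("/").length)
                .thenComparing(Comparator.naturalOrder());

  public static void main(String[] args) {
    List<String> entries = new ArrayList<>(Arrays.asList(
        "/a/b/c", "/a", "/a/b", "/z"));
    entries.sort(TOPMOST_FIRST);
    // parents precede children, so a partial put(list) leaves no orphans
    System.out.println(entries); // [/a, /z, /a/b, /a/b/c]
  }
}
```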

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 32 Docker mode activated.
_ Prechecks _
+1 dupname 2 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 27 new or modified test files.
_ trunk Compile Tests _
0 mvndep 68 Maven dependency ordering for branch
-1 mvninstall 1073 root in trunk failed.
+1 compile 1118 trunk passed
+1 checkstyle 137 trunk passed
+1 mvnsite 116 trunk passed
+1 shadedclient 916 branch has no errors when building and testing our client artifacts.
+1 javadoc 105 trunk passed
0 spotbugs 69 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 190 trunk passed
_ Patch Compile Tests _
0 mvndep 22 Maven dependency ordering for patch
+1 mvninstall 81 the patch passed
+1 compile 1035 the patch passed
+1 javac 1035 the patch passed
-0 checkstyle 131 root: The patch generated 15 new + 95 unchanged - 3 fixed = 110 total (was 98)
+1 mvnsite 123 the patch passed
-1 whitespace 0 The patch has 51 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-1 whitespace 1 The patch 600 line(s) with tabs.
+1 xml 3 The patch has no ill-formed XML file.
+1 shadedclient 649 patch has no errors when building and testing our client artifacts.
-1 javadoc 28 hadoop-tools_hadoop-aws generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
+1 findbugs 205 the patch passed
_ Other Tests _
+1 unit 535 hadoop-common in the patch passed.
+1 unit 291 hadoop-aws in the patch passed.
-1 asflicense 45 The patch generated 1 ASF License warnings.
9589
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux 796966f8c9ef 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 67f9a7b
Default Java 1.8.0_212
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/branch-mvninstall-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/diff-checkstyle-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/whitespace-eol.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/whitespace-tabs.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/testReport/
asflicense https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 1654 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran force-pushed the s3/HADOOP-15183-s3guard-rename-failures branch from ea61a61 to 24dbe82 Compare May 28, 2019 15:00
@steveloughran
Contributor Author

The patch 600 line(s) with tabs.

Not sure what's up there. Usually it's stack traces in docs, but this is a lot.

@steveloughran
Contributor Author

the 600 tabs are from the file ./hadoop-tools/hadoop-rumen/hs_err_pid15861.log; not relevant

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 33 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 27 new or modified test files.
_ trunk Compile Tests _
0 mvndep 31 Maven dependency ordering for branch
+1 mvninstall 1158 trunk passed
+1 compile 1214 trunk passed
+1 checkstyle 188 trunk passed
+1 mvnsite 127 trunk passed
+1 shadedclient 1017 branch has no errors when building and testing our client artifacts.
+1 javadoc 89 trunk passed
0 spotbugs 65 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 190 trunk passed
-0 patch 97 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 27 Maven dependency ordering for patch
+1 mvninstall 83 the patch passed
+1 compile 1088 the patch passed
+1 javac 1088 the patch passed
-0 checkstyle 137 root: The patch generated 14 new + 95 unchanged - 3 fixed = 109 total (was 98)
+1 mvnsite 120 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 3 The patch has no ill-formed XML file.
+1 shadedclient 620 patch has no errors when building and testing our client artifacts.
-1 javadoc 27 hadoop-tools_hadoop-aws generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1)
+1 findbugs 191 the patch passed
_ Other Tests _
-1 unit 537 hadoop-common in the patch failed.
+1 unit 285 hadoop-aws in the patch passed.
-1 asflicense 43 The patch generated 17 ASF License warnings.
7224
Reason Tests
Failed junit tests hadoop.util.TestDiskCheckerWithDiskIo
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux d160008701a7 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 0c73dba
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/artifact/out/diff-checkstyle-root.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/testReport/
asflicense https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 1421 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/2/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor Author

Unit test failures in hadoop-common are unrelated; looks like a directory setup issue on the test VM.

java.nio.file.NoSuchFileException: /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-843/src/hadoop-common-project/hadoop-common/target/test/data/4/test3854891488020485558
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
	at java.nio.file.Files.createDirectory(Files.java:674)
	at java.nio.file.TempFileHelper.create(TempFileHelper.java:136)
	at java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:173)
	at java.nio.file.Files.createTempDirectory(Files.java:950)
	at org.apache.hadoop.util.TestDiskCheckerWithDiskIo.createTempDir(TestDiskCheckerWithDiskIo.java:169)
	at org.apache.hadoop.util.TestDiskCheckerWithDiskIo.checkDirs(TestDiskCheckerWithDiskIo.java:152)
	at org.apache.hadoop.util.TestDiskCheckerWithDiskIo.testDiskIoIgnoresTransientWriteErrors(TestDiskCheckerWithDiskIo.java:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 45 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 28 new or modified test files.
_ trunk Compile Tests _
0 mvndep 72 Maven dependency ordering for branch
+1 mvninstall 1151 trunk passed
+1 compile 1062 trunk passed
+1 checkstyle 146 trunk passed
+1 mvnsite 120 trunk passed
+1 shadedclient 1032 branch has no errors when building and testing our client artifacts.
+1 javadoc 93 trunk passed
0 spotbugs 64 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 184 trunk passed
-0 patch 97 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 21 Maven dependency ordering for patch
+1 mvninstall 76 the patch passed
+1 compile 950 the patch passed
+1 javac 950 the patch passed
-0 checkstyle 147 root: The patch generated 15 new + 98 unchanged - 4 fixed = 113 total (was 102)
+1 mvnsite 118 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 3 The patch has no ill-formed XML file.
+1 shadedclient 736 patch has no errors when building and testing our client artifacts.
-1 javadoc 32 hadoop-tools_hadoop-aws generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1)
+1 findbugs 200 the patch passed
_ Other Tests _
+1 unit 523 hadoop-common in the patch passed.
+1 unit 285 hadoop-aws in the patch passed.
+1 asflicense 45 The patch does not generate ASF License warnings.
7115
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/3/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux bbc4f008d07b 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 544876f
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/3/artifact/out/diff-checkstyle-root.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/3/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/3/testReport/
Max. process+thread count 1346 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/3/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
0 reexec 49 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 1 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 28 new or modified test files.
_ trunk Compile Tests _
0 mvndep 77 Maven dependency ordering for branch
+1 mvninstall 1150 trunk passed
+1 compile 1151 trunk passed
+1 checkstyle 144 trunk passed
+1 mvnsite 122 trunk passed
+1 shadedclient 1033 branch has no errors when building and testing our client artifacts.
+1 javadoc 97 trunk passed
0 spotbugs 66 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 200 trunk passed
-0 patch 98 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 23 Maven dependency ordering for patch
+1 mvninstall 84 the patch passed
+1 compile 1407 the patch passed
+1 javac 1407 the patch passed
-0 checkstyle 175 root: The patch generated 12 new + 98 unchanged - 4 fixed = 110 total (was 102)
+1 mvnsite 141 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 5 The patch has no ill-formed XML file.
+1 shadedclient 762 patch has no errors when building and testing our client artifacts.
+1 javadoc 102 the patch passed
+1 findbugs 244 the patch passed
_ Other Tests _
+1 unit 604 hadoop-common in the patch passed.
+1 unit 295 hadoop-aws in the patch passed.
+1 asflicense 58 The patch does not generate ASF License warnings.
7933
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/4/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux 174a51b985e1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 751f0df
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/4/artifact/out/diff-checkstyle-root.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/4/testReport/
Max. process+thread count 1387 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/4/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@ajfabbri ajfabbri (Contributor) left a comment

Still looking through this patch. Like what I'm seeing so far. Couple of random comments. I'll try to resume tomorrow or Friday

* (There's always a block for that, it just makes more sense to
* perform the bulk delete after another block of copies have completed).
*/
public static final int RENAME_PARALLEL_LIMIT = 10;
Contributor

Yep, we can always make this tunable later if we care.

Contributor

There should be some coordination with the Hive community on this, where there's another layer of parallelization. I think they'd currently have to be doing individual file renames, and if so I think they can just keep going, this change can go out, and after that they might want to remove their parallelization layer. And they don't have to interact. But I hope I'm right about that and this isn't going to cause some huge spike in number of concurrent copies or something.

// the limit of active copies has been reached;
// wait for completion or errors to surface.
LOG.debug("Waiting for active copies to complete");
completeActiveCopies.apply("batch threshold reached");
Contributor

Should be a significant improvement in runtime here. We can pipeline this in the future to eke out more performance. As we've been discussing on refactoring here: Might want to break up this rename code later on as well. Maybe just helper functions to decompose into smaller blocks?


Yes, a refactoring is much needed here; this part is getting huge, which makes it harder to review and, later, to debug.

Contributor Author

Gabor: This is my attempt to clean it up. The rename logic has all been pulled out of S3AFS into the trackers and I'm defining a couple of lambda expressions in the method to avoid copy and paste duplication and effort.

The only way to do anything else would be to pull the entire code "somewhere", but because of all the calls into S3AFS, that will only make sense as part of a relayering of ops: we'd have this work at the "object store + S3Guard" layer, rather than through the FS APIs. That's a major change and not going to happen in a patch which is big enough as it is.

List<Path> undeletedObjects = new ArrayList<>();
try {
// remove the keys
// this does will update the metastore on a failure, but on
Contributor

nit: /does will/does/

Contributor

another nit: the asymmetry here in metadatastore updating (only on failure) feels a bit odd

Contributor Author

I know, it grates on me too. But the renameTracker is invoked on success, so it is all contained in this operation.

*/
@VisibleForTesting
@Retries.RetryMixed
void removeKeys(
Contributor

Is there a test that exercises the partial failure logic here? (I may answer this as I get further in the review)

Contributor Author

yes: ITestS3AFailureHandling testMultiObjectDeleteNoPermissions()

"S3Guard metadata store records read"),
S3GUARD_METADATASTORE_RECORD_WRITES(
"s3guard_metadatastore_record_writes",
"S3Guard metadata store records written"),
Contributor

Seems like this could be interesting for comparing different rename trackers? If so I'm curious what sort of stats / deltas you get with the new logic.

Contributor Author

Good point. With the ancestor tracking the new renamer shouldn't do any more than before. And I can verify that the commit operations don't do duplicate puts of all the parents, the way they do currently

@steveloughran
Contributor Author

Thanks all for the reviews; will go through them. FWIW, the latest patch checks dir updates for the case where a batch update overwrites a file updated in the ongoing operation with a dir entry. And it's failing in a prune test (one which shouldn't, IMO, be doing a full store-wide prune):


[ERROR] testPruneCommandConf(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 15.828 s  <<< ERROR!
org.apache.hadoop.fs.PathIOException: `s3a://hwdev-steve-ireland-new/fork-0007/test/ITestMagicCommitProtocol-testParallelJobsToAdjacentPaths': Duplicate and inconsistent entry in update operation
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore$AncestorState.findDirectory(DynamoDBMetadataStore.java:2184)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.completeAncestry(DynamoDBMetadataStore.java:820)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerPut(DynamoDBMetadataStore.java:1144)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.removeAuthoritativeDirFlag(DynamoDBMetadataStore.java:1418)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:1366)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:1325)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommand(AbstractS3GuardToolTestBase.java:295)
	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommandConf(AbstractS3GuardToolTestBase.java:320)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran steveloughran force-pushed the s3/HADOOP-15183-s3guard-rename-failures branch from e7453a4 to bca1021 Compare June 4, 2019 17:42
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 31 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 30 new or modified test files.
_ trunk Compile Tests _
0 mvndep 75 Maven dependency ordering for branch
+1 mvninstall 1383 trunk passed
+1 compile 1261 trunk passed
+1 checkstyle 165 trunk passed
+1 mvnsite 152 trunk passed
+1 shadedclient 1162 branch has no errors when building and testing our client artifacts.
+1 javadoc 110 trunk passed
0 spotbugs 74 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 217 trunk passed
-0 patch 109 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 25 Maven dependency ordering for patch
+1 mvninstall 94 the patch passed
+1 compile 1248 the patch passed
+1 javac 1248 the patch passed
-0 checkstyle 164 root: The patch generated 27 new + 97 unchanged - 5 fixed = 124 total (was 102)
+1 mvnsite 134 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 3 The patch has no ill-formed XML file.
+1 shadedclient 786 patch has no errors when building and testing our client artifacts.
-1 javadoc 31 hadoop-tools_hadoop-aws generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
+1 findbugs 250 the patch passed
_ Other Tests _
+1 unit 615 hadoop-common in the patch passed.
-1 unit 69 hadoop-aws in the patch failed.
-1 asflicense 51 The patch generated 7 ASF License warnings.
8075
Reason Tests
Failed junit tests hadoop.fs.s3a.TestS3AGetFileStatus
hadoop.fs.s3a.TestStreamChangeTracker
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux 0cea5ead54a0 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 97607f3
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/artifact/out/diff-checkstyle-root.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/testReport/
asflicense https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 1460 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/5/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran force-pushed the s3/HADOOP-15183-s3guard-rename-failures branch from bca1021 to 8c65dad Compare June 5, 2019 13:11
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 31 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 32 new or modified test files.
_ trunk Compile Tests _
0 mvndep 21 Maven dependency ordering for branch
+1 mvninstall 1027 trunk passed
+1 compile 1081 trunk passed
+1 checkstyle 127 trunk passed
+1 mvnsite 115 trunk passed
+1 shadedclient 915 branch has no errors when building and testing our client artifacts.
+1 javadoc 82 trunk passed
0 spotbugs 63 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 181 trunk passed
-0 patch 93 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 23 Maven dependency ordering for patch
+1 mvninstall 78 the patch passed
+1 compile 1014 the patch passed
+1 javac 1014 the patch passed
-0 checkstyle 130 root: The patch generated 28 new + 102 unchanged - 5 fixed = 130 total (was 107)
+1 mvnsite 108 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 2 The patch has no ill-formed XML file.
+1 shadedclient 620 patch has no errors when building and testing our client artifacts.
-1 javadoc 28 hadoop-tools_hadoop-aws generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
+1 findbugs 191 the patch passed
_ Other Tests _
+1 unit 537 hadoop-common in the patch passed.
+1 unit 295 hadoop-aws in the patch passed.
+1 asflicense 54 The patch does not generate ASF License warnings.
6748
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/6/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux 11e0754b52cd 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 309501c
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/6/artifact/out/diff-checkstyle-root.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/6/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/6/testReport/
Max. process+thread count 1600 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/6/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

steveloughran and others added 9 commits June 11, 2019 14:28
This also fixed up ITestDynamoDBMetadataStore to destroy its test tables more reliably, as some were leaking.

Change-Id: I7e70e743628498c3cbbe21c6cda796b08fd39d8d
…thout fixing it, do at least avoid teardown errors

Change-Id: I869d943d9c1a110e27710ffb8e638b918db44574
Change-Id: I3a5337d36086a109687de16a8395ff8d7aa1b644
*Tests are failing; this is an interim patch*

This makes S3Guard.addAncestors aware of the state of the bulk operation by pushing the whole ancestor search-and-add problem down into the store itself.

Because each store can have its own model of bulk operation state, it can use that state during a bulk operation to skip queries of a remote DB to check whether a path exists. And when there is a fallback to a get(path) which finds a path entry, that entry is added to the state; there is no need to check again.

- we could go further and add all parents of a found path to the bulk state; that way all ancestor creation in a put(path) will know to skip the parent entries
- which may imply changing the contents of the DDBMetastore's AncestorState

I've had to push the addAncestors() call into the metastores themselves to get this to work, as only DDB maintains a state map. Alternatively, we could expose contains(path) and add(path) entries on the base class from the outset (better: make that an interface and have the implementation use a Set<Path>). Although the DDB AncestorState stores the state as a map of Path -> DDBPathMetadata, only existence is ever checked.
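The skip-the-remote-check idea can be sketched as a tiny stand-alone class. This is illustrative only: the class name AncestorCacheSketch and the use of plain String paths are assumptions, not the actual Hadoop AncestorState API, though the put/contains/addAncestors shape mirrors the operations described above.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative sketch only: a per-operation cache of paths already
 * confirmed to exist, so a bulk operation can skip repeated remote
 * existence checks. Not the real Hadoop AncestorState class.
 */
public class AncestorCacheSketch {
    private final Set<String> known = new HashSet<>();

    /** @return true if this path was already recorded for the operation. */
    public boolean contains(String path) {
        return known.contains(path);
    }

    /** Record a path (e.g. one found via a remote GET) so later checks skip the store. */
    public void put(String path) {
        known.add(path);
    }

    /** Record every ancestor of a path by walking up towards the root. */
    public void addAncestors(String path) {
        String p = path;
        while (true) {
            int slash = p.lastIndexOf('/');
            if (slash <= 0) {
                break;  // reached the root
            }
            p = p.substring(0, slash);
            known.add(p);
        }
    }

    public static void main(String[] args) {
        AncestorCacheSketch state = new AncestorCacheSketch();
        state.addAncestors("/bucket/dir/sub/file");
        System.out.println(state.contains("/bucket/dir"));   // true: remote check skipped
        System.out.println(state.contains("/bucket/other")); // false: must query the store
    }
}
```

The interesting property is that the cache only ever grows within one bulk operation and is discarded with it, so stale entries cannot outlive the operation that observed them.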

Change-Id: I64bf0bc372654af0471e13c324a7af0454615960
Change-Id: I9bceb5c5f30a555a63119ea6fdba24b023fc5458
* AncestorState explicitly exports its put/contains operations and makes the inner field private.
* the AncestorState set isn't yet updated before entries are put() (some duplicate entries are still being logged)
* More log statements as needed to debug what is happening.
* Improved javadocs on the new RenameTracker classes.
* TestCommitOperations was failing a lot; I eventually concluded that the DDB table was inconsistent. If there is a tombstone marker you can't delete its children, but a GET on a child will still surface it. This has not been fixed: the tests simply clean up better. An FSCK is needed here.

TODO
* Tests to make sure that all directories are deleted in a rename; empty and implied.
* Review of ProgressiveRenameTracker's end-of-rename operation to see if it is adding too many delete calls
* See if I can think of more tests

Change-Id: Ic062406f19bc7d4bf1ab794bd26e85691027235c
Change-Id: I47aef155a0f60da10ec34d30e6e9bc4de88b7fcd
* partial failure of delete when handling fake directories
* Add a TODO comment where I need to do something
* Fix a typo on a method name in TestPartialDeleteFailures

Change-Id: Ic231d9dceb37ce4a726620e8d3343bf353fdf0c2
…e committing a job, so we can verify that job commits remove them

Change-Id: I09805332a25bc8eb4158f9f162f43b97ff3a3c57
…ions is tracked

ITestDynamoDBMetadataStore picks up the scale timeout

Change-Id: I15428862ad5b8bf9e2713f9fd03f1645b5b4ef10
* getting the direct-to-DDB tests to scale better, especially ITestDynamoDBMetadataStore: mainly by breaking up the testBatchWrite test into five separate tests with individual cleanup, and by switching the cleanup from forgetMetadata to prune(); the latter does batch deletes in parallel and is a lot faster. As it was, on smaller stores, throttling during teardown was the most common source of failure. ThrottleTracker is pulled out of the scale test for reuse in the non-scale one.
* Metastore.put(DirListingMetadata) now takes a BulkOperationState and passes it down. This is not actively in use nor exported all the way up, though everything is ready in the store. I'm just not convinced it is needed yet.

I've managed to change things enough for the ITestDynamoDBMetadataStore test to fail on the maven test runs, but not in the IDE. I also got a failure in ITestMagicCommitMRJob where the underlying attempt dir wasn't cleaned up.

Change-Id: I80169b12baacbecfb63970c0cf7ea061fe3ab117
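The cleanup speedup above — replacing one-at-a-time forgetMetadata() calls with a pruning pass that issues batch deletes in parallel — can be sketched generically. The batch size of 25 matches DynamoDB's BatchWriteItem limit per request; everything else here (class name, the stand-in for the delete call) is hypothetical, not the actual metastore code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Illustrative only: delete entries in parallel batches, the way a
 * prune() pass is faster than serial one-at-a-time delete calls.
 */
public class BatchPruneSketch {
    // DynamoDB's BatchWriteItem accepts at most 25 items per request.
    static final int BATCH_SIZE = 25;

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add("entry-" + i);
        }

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += BATCH_SIZE) {
            List<String> batch =
                keys.subList(start, Math.min(start + BATCH_SIZE, keys.size()));
            // each batch delete is submitted concurrently rather than serially;
            // batch.size() stands in for a real batch-delete call's result
            results.add(pool.submit(() -> batch.size()));
        }
        int deleted = 0;
        for (Future<Integer> f : results) {
            deleted += f.get();  // also propagates any per-batch failure
        }
        pool.shutdown();
        System.out.println("deleted " + deleted + " entries");
    }
}
```

With parallel batches, the per-request throttling budget is spread across workers, which is why teardown stops being the dominant source of throttled failures on small tables.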
…and generated inconsistency in the same operation.

The DDB metastore now raises a PathIOException if the operation state lists a file at a path and another operation scans for a parent directory there.
This will only happen in one of two cases: a deliberate test, or someone having created files under files.

Change-Id: Ie22cfa6dafa3b9cbc4533f82b115af432a1c4dfb
…rState failure on a file found is now optionally downgradeable to a warn.

Still unsure what to do there, as it's a symptom of a deeper problem: there's a file in the store which is a parent of another file. For prune, we can downgrade this
to a warning and then rely on FSCK to fix it.

Change-Id: I2f2cf9b85bbd3acd8a606e669c1b7e27a9db16f4
@steveloughran steveloughran force-pushed the s3/HADOOP-15183-s3guard-rename-failures branch from 8c65dad to 38d1f75 Compare June 11, 2019 16:16
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 33 Docker mode activated.
_ Prechecks _
+1 dupname 3 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 32 new or modified test files.
_ trunk Compile Tests _
0 mvndep 22 Maven dependency ordering for branch
+1 mvninstall 1025 trunk passed
+1 compile 1014 trunk passed
+1 checkstyle 136 trunk passed
+1 mvnsite 131 trunk passed
+1 shadedclient 992 branch has no errors when building and testing our client artifacts.
+1 javadoc 107 trunk passed
0 spotbugs 69 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 188 trunk passed
-0 patch 109 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 24 Maven dependency ordering for patch
-1 mvninstall 30 hadoop-aws in the patch failed.
-1 compile 927 root in the patch failed.
-1 javac 927 root in the patch failed.
-0 checkstyle 143 root: The patch generated 28 new + 100 unchanged - 2 fixed = 128 total (was 102)
-1 mvnsite 46 hadoop-aws in the patch failed.
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 3 The patch has no ill-formed XML file.
+1 shadedclient 686 patch has no errors when building and testing our client artifacts.
-1 javadoc 33 hadoop-tools_hadoop-aws generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
-1 findbugs 46 hadoop-aws in the patch failed.
_ Other Tests _
+1 unit 531 hadoop-common in the patch passed.
-1 unit 46 hadoop-aws in the patch failed.
+1 asflicense 51 The patch does not generate ASF License warnings.
6563
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux 221b675e67c7 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / b057479
Default Java 1.8.0_212
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/diff-checkstyle-root.txt
mvnsite https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/patch-findbugs-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/testReport/
Max. process+thread count 1361 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/7/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

* Add HADOOP-16364: translate timeout on destroy() into new exception: TableDeleteTimeoutException.
* ITestDynamoDBMetadataStore performance: coalescing three tests into one, and inserting a new check for a version marker missing a version field. The tests
which create and destroy tables are a key part of the bottleneck here, which is where the coalescing makes a big difference. FWIW,
I think we could actually use the shared table for the version checking and just restore it afterwards, to save 60s or so.
* Trying to recreate the failure I saw where Prune failed because of an inconsistent ancestor write.

I can't seem to replicate the failure (and I deleted the table with the problem). There's now scope in the metastore to decide what to do if there is an inconsistency (throw vs warn), and I'm wondering if we need to use the operation state to choose how to react. That is: on a duplicate, inconsistent prune, do "something" else. But what?

Note: if inconsistencies in the store can be amplified by prune, this is not a new problem. It's by tracking the ancestor state that it is being detected.

Change-Id: I2e83d986a1f072d98251443e0f9c65c7036ba8ce
@@ -1326,8 +1330,25 @@

<property>
<name>fs.s3a.max.total.tasks</name>
<value>5</value>
Contributor

What do you think about sticking HADOOP-15729 in here? Essentially it just provides 0 as the number of core threads to the executor; otherwise a lot of these threads never get cleaned up when idle. This manifests as a massive number of idle threads in long-running services like Impala and Hive Metastore once you've accessed many different buckets. I believe this patch would make the problem worse, so it's a good time to toss in the trivial fix and test it all together.
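The executor behaviour the comment above relies on can be shown in a small stand-alone demo (this is a generic JDK sketch, not the actual S3A executor wiring): with a core pool size of 0, every worker thread is eligible for reclamation once it has been idle past the keep-alive period, so a long-idle client holds no threads.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IdleThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // corePoolSize = 0: no threads are kept alive while idle; each
        // worker dies after being idle for the keep-alive period.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0,                            // core threads
                4,                            // max threads
                100, TimeUnit.MILLISECONDS,   // keep-alive for idle workers
                new LinkedBlockingQueue<>());

        pool.submit(() ->
            System.out.println("task ran on " + Thread.currentThread().getName()));

        Thread.sleep(500);  // wait well past the keep-alive period
        // the idle worker has been reclaimed, so the pool holds no threads
        System.out.println("live threads after idle: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

With a non-zero core pool size and the default settings, those core threads would instead live for the lifetime of the executor, which is exactly the leak described for processes that open many bucket-specific filesystem instances. (Calling `allowCoreThreadTimeOut(true)` is the other way to get the same effect.)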

* After the operation has completed, it <i>MUST</i> be closed so
* as to guarantee that all state is released.
*/
public class BulkOperationState implements Closeable {
Contributor

I don't see when this ever gets associated with any actual state. What's the point of it?

*/
@RetryTranslated
void put(PathMetadata meta,
@Nullable BulkOperationState BulkOperationState) throws IOException;
Contributor

A variable with the same name as it's type - is this more Java 8 craziness I haven't seen before? :)

checkArgument(!pathsToDelete.contains(sourcePath),
"File being renamed is already processed %s", destPath);
// create the file metadata and update the local structures.
S3Guard.addMoveFile(
Contributor

Hopefully by the time I'm done reviewing I'll understand the architecture here enough to answer this myself, but we've previously talked about an alternate rename strategy where we create everything, and then we delete everything, so there's always 1 copy of the complete data. How well does this change fit in with that idea? In the event that there's a failure we delete the incomplete copy on retry. If that's the source, we're now done. If that's the destination, we now start over. Nice and recoverable, and until it succeeds you haven't disturbed anything.


// check if UNGUARDED_FLAG is passed and use NullMetadataStore in
// config to avoid side effects like creating the table if not exists
Configuration conf0 = getConf();
Contributor

Rename this to something like unguardedConf please

Change-Id: I1185b21cb14d9c2238a27dc00f5eac930959f1a4
@steveloughran
Contributor Author

I've just pushed up a new version which addresses the checkstyle complaints and the various nits reported. Tested against S3 Ireland. All good except for the new test where I'm trying to create an inconsistency in the table to cause problems in prune. I think I'm going to put that to one side for now and worry about it in some tests for fsck in a subsequent patch.

After this iteration I'm going to have to move to a new PR; yetus is unhappy again.

@steveloughran steveloughran force-pushed the s3/HADOOP-15183-s3guard-rename-failures branch from 47e9d94 to 115fb77 Compare June 11, 2019 22:17
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 111 Docker mode activated.
_ Prechecks _
+1 dupname 2 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 32 new or modified test files.
_ trunk Compile Tests _
0 mvndep 21 Maven dependency ordering for branch
+1 mvninstall 1140 trunk passed
+1 compile 970 trunk passed
+1 checkstyle 140 trunk passed
+1 mvnsite 122 trunk passed
+1 shadedclient 1045 branch has no errors when building and testing our client artifacts.
+1 javadoc 105 trunk passed
0 spotbugs 76 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 221 trunk passed
-0 patch 116 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 24 Maven dependency ordering for patch
+1 mvninstall 104 the patch passed
+1 compile 1207 the patch passed
+1 javac 1207 the patch passed
-0 checkstyle 161 root: The patch generated 31 new + 100 unchanged - 2 fixed = 131 total (was 102)
+1 mvnsite 148 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 2 The patch has no ill-formed XML file.
+1 shadedclient 803 patch has no errors when building and testing our client artifacts.
-1 javadoc 39 hadoop-tools_hadoop-aws generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
+1 findbugs 235 the patch passed
_ Other Tests _
+1 unit 584 hadoop-common in the patch passed.
+1 unit 293 hadoop-aws in the patch passed.
+1 asflicense 44 The patch does not generate ASF License warnings.
7659
Subsystem Report/Notes
Docker Client=18.09.5 Server=18.09.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/8/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux e1e471b5943e 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 5740eea
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/8/artifact/out/diff-checkstyle-root.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-843/8/artifact/out/diff-javadoc-javadoc-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/8/testReport/
Max. process+thread count 1348 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/8/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 31 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 32 new or modified test files.
_ trunk Compile Tests _
0 mvndep 23 Maven dependency ordering for branch
+1 mvninstall 1094 trunk passed
+1 compile 1145 trunk passed
+1 checkstyle 144 trunk passed
+1 mvnsite 120 trunk passed
+1 shadedclient 975 branch has no errors when building and testing our client artifacts.
+1 javadoc 82 trunk passed
0 spotbugs 74 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 201 trunk passed
-0 patch 105 Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
0 mvndep 22 Maven dependency ordering for patch
+1 mvninstall 82 the patch passed
+1 compile 1009 the patch passed
+1 javac 1009 the patch passed
-0 checkstyle 144 root: The patch generated 20 new + 100 unchanged - 2 fixed = 120 total (was 102)
+1 mvnsite 126 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 xml 3 The patch has no ill-formed XML file.
+1 shadedclient 696 patch has no errors when building and testing our client artifacts.
+1 javadoc 99 the patch passed
+1 findbugs 223 the patch passed
_ Other Tests _
-1 unit 504 hadoop-common in the patch failed.
+1 unit 293 hadoop-aws in the patch passed.
+1 asflicense 51 The patch does not generate ASF License warnings.
7086
Reason Tests
Failed junit tests hadoop.util.TestDiskChecker
Subsystem Report/Notes
Docker Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-843/9/artifact/out/Dockerfile
GITHUB PR #843
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml
uname Linux 86f97d3cf0fe 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 4fecc2a
Default Java 1.8.0_212
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-843/9/artifact/out/diff-checkstyle-root.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-843/9/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-843/9/testReport/
Max. process+thread count 1348 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-843/9/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor Author

Closed to create a squashed PR and keep Yetus happy; rebased onto trunk first.

@steveloughran
Contributor Author

The new PR is #951. Please review that one, as it is close to ready to go in and in need of criticism.

shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
This PR makes the following changes:
* Moves all the state-restore code from TaskStorageManager.scala to ContainerStorageManager (and its internal private Java classes).
* Introduces a StoreMode enum parameter in StorageEngineFactory.getStorageEngine.
* Changes RocksDB store creation to use that enum and to use RocksDB's bulk-load option when creating a store in bulk-load mode.
* Changes the ContainerStorageManager to create stores in bulk-load mode when restoring, then close such persistent and changelogged stores and re-open them in read-write mode.
* Adds tests for ContainerStorageManager and changes tests for TaskStorageManager accordingly.

Author: Ray Matharu <rmatharu@linkedin.com>
Author: rmatharu <40646191+rmatharu@users.noreply.github.com>

Reviewers: Jagadish Venkatraman <vjagadish1989@gmail.com>

Closes apache#843 from rmatharu/refactoringCSM