
HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states #1344

Closed
wants to merge 12 commits

Conversation

sodonnel
Contributor

Remove the existing decommission states from the protobuf definition.

At this stage, this PR is really a test to see if the build passes with these states removed.

@sodonnel
Contributor Author

/label ozone

@elek elek added the ozone label Aug 23, 2019
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 41 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
-1 test4tests 0 The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
0 mvndep 75 Maven dependency ordering for branch
+1 mvninstall 668 trunk passed
+1 compile 381 trunk passed
+1 checkstyle 80 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 862 branch has no errors when building and testing our client artifacts.
+1 javadoc 179 trunk passed
0 spotbugs 448 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 661 trunk passed
_ Patch Compile Tests _
0 mvndep 25 Maven dependency ordering for patch
+1 mvninstall 559 the patch passed
+1 compile 393 the patch passed
+1 cc 393 the patch passed
+1 javac 393 the patch passed
+1 checkstyle 88 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 716 patch has no errors when building and testing our client artifacts.
+1 javadoc 201 the patch passed
+1 findbugs 783 the patch passed
_ Other Tests _
+1 unit 371 hadoop-hdds in the patch passed.
-1 unit 263 hadoop-ozone in the patch failed.
+1 asflicense 58 The patch does not generate ASF License warnings.
6570
Reason Tests
Failed junit tests hadoop.ozone.security.TestOzoneDelegationTokenSecretManager
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/1/artifact/out/Dockerfile
GITHUB PR #1344
Optional Tests dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle
uname Linux dd5f8f252930 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / b69ac57
Default Java 1.8.0_222
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/1/testReport/
Max. process+thread count 1138 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools U: hadoop-hdds
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@sodonnel
Contributor Author

This run got a few test failures. TestSCMNodeMetrics was a legitimate failure; I have fixed it.

TestSecureContainerServer.testClientServerRatisGrpc() was failing on trunk, but has now been fixed.

TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure() seems flaky. It has passed and failed a few times locally.

@sodonnel sodonnel force-pushed the HDDS-1982-decom-states branch 2 times, most recently from e22b8d2 to a131589 Compare September 2, 2019 16:53
@sodonnel sodonnel force-pushed the HDDS-1982-decom-states branch from a131589 to b489d6a Compare September 4, 2019 10:03
Contributor

@anuengineer anuengineer left a comment


It looks amazing; much cleaner and far better than I thought it would be. I can emphatically +1 this. You are headed in the right direction. Thank you for doing this so thoughtfully.

nodeOpStateSM.addTransition(
NodeOperationalState.ENTERING_MAINTENANCE,
NodeOperationalState.IN_MAINTENANCE,
NodeOperationStateEvent.ENTER_MAINTENANCE);
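For readers following along, the transitions being registered above can be sketched as a small table-driven state machine. The sketch below is illustrative only: the enum values and the two transitions mirror the quoted snippet, but the `TRANSITIONS`/`fire` plumbing is a simplified stand-in, not Ozone's actual StateMachine class.

```java
import java.util.EnumMap;
import java.util.Map;

public class NodeStateMachineSketch {
  enum NodeOperationalState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  enum NodeOperationStateEvent {
    START_DECOMMISSION, COMPLETE_DECOMMISSION,
    START_MAINTENANCE, ENTER_MAINTENANCE, RETURN_TO_SERVICE
  }

  // Transition table: current state -> (event -> next state).
  static final Map<NodeOperationalState,
      Map<NodeOperationStateEvent, NodeOperationalState>> TRANSITIONS =
      new EnumMap<>(NodeOperationalState.class);

  static void addTransition(NodeOperationalState from,
      NodeOperationalState to, NodeOperationStateEvent event) {
    TRANSITIONS
        .computeIfAbsent(from, k -> new EnumMap<>(NodeOperationStateEvent.class))
        .put(event, to);
  }

  static {
    // The two transitions quoted in the review snippet.
    addTransition(NodeOperationalState.ENTERING_MAINTENANCE,
        NodeOperationalState.IN_MAINTENANCE,
        NodeOperationStateEvent.ENTER_MAINTENANCE);
    addTransition(NodeOperationalState.IN_MAINTENANCE,
        NodeOperationalState.IN_SERVICE,
        NodeOperationStateEvent.RETURN_TO_SERVICE);
  }

  // Look up the next state; an event that is not valid in the current
  // state is rejected rather than silently ignored.
  static NodeOperationalState fire(NodeOperationalState current,
      NodeOperationStateEvent event) {
    Map<NodeOperationStateEvent, NodeOperationalState> row =
        TRANSITIONS.get(current);
    if (row == null || !row.containsKey(event)) {
      throw new IllegalStateException(
          "Invalid event " + event + " in state " + current);
    }
    return row.get(event);
  }

  public static void main(String[] args) {
    NodeOperationalState s = fire(NodeOperationalState.ENTERING_MAINTENANCE,
        NodeOperationStateEvent.ENTER_MAINTENANCE);
    System.out.println(s);                                                  // IN_MAINTENANCE
    System.out.println(fire(s, NodeOperationStateEvent.RETURN_TO_SERVICE)); // IN_SERVICE
  }
}
```

The benefit of the table-driven form is exactly what the review discusses: every legal edge is declared in one place, and anything not declared is an error.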
Contributor


From an English point of view, this is slightly confusing. But I see why :)

NodeOperationStateEvent.ENTER_MAINTENANCE);
nodeOpStateSM.addTransition(
NodeOperationalState.IN_MAINTENANCE, NodeOperationalState.IN_SERVICE,
NodeOperationStateEvent.RETURN_TO_SERVICE);
Contributor


How do we handle the timeout edge? Maintenance might time out -- that is, I put a node into maintenance for one day and then forget about it. Or is that handled outside the state machine?

Contributor Author


I hadn't considered where to store that yet. It will probably be outside of the state machine, but I need to consider where it fits in. Perhaps in NodeStatus, but that would change that object from an immutable value to one carrying a time.

We will need some sort of decommission / maintenance mode monitor, probably separate from the heartbeat monitor. The decommission monitor will need to check when all blocks are replicated etc., so it could also keep track of the node maintenance timeout and hence switch the node to "IN_SERVICE + DEAD" if it is dead and the timeout expires.

Contributor


Along with your consideration, do we need an edge called Timeout that leads from IN_MAINTENANCE to IN_SERVICE, or do you plan to send a RETURN_TO_SERVICE event when there is a timeout? Either works; I was wondering if we should capture the timeout edge in the state machine at all.
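To make the "handled outside the state machine" option concrete, here is one possible shape for it, purely as a sketch: a monitor records a per-node deadline and reports expired nodes, for which the caller would then fire the existing RETURN_TO_SERVICE event. All names here (`MaintenanceMonitorSketch`, `startMaintenance`, `expiredNodes`) are hypothetical; the real decommission monitor did not exist yet at the time of this discussion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class MaintenanceMonitorSketch {
  // Hypothetical: maintenance deadline (millis) per node.
  private final Map<UUID, Long> maintenanceDeadline = new HashMap<>();

  public void startMaintenance(UUID node, long durationMillis, long nowMillis) {
    maintenanceDeadline.put(node, nowMillis + durationMillis);
  }

  // Called periodically (e.g. from a scheduled executor).  Returns the
  // nodes whose maintenance window has expired; the caller would then
  // fire RETURN_TO_SERVICE on the state machine for each of them.
  public List<UUID> expiredNodes(long nowMillis) {
    List<UUID> expired = new ArrayList<>();
    Iterator<Map.Entry<UUID, Long>> it =
        maintenanceDeadline.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<UUID, Long> e = it.next();
      if (e.getValue() <= nowMillis) {
        expired.add(e.getKey());
        it.remove();
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    MaintenanceMonitorSketch monitor = new MaintenanceMonitorSketch();
    UUID node = UUID.randomUUID();
    long oneDay = 24L * 60 * 60 * 1000;
    monitor.startMaintenance(node, oneDay, 0L);        // "maintenance for one day"
    System.out.println(monitor.expiredNodes(1000L).size());       // 0: still in window
    System.out.println(monitor.expiredNodes(oneDay + 1).size());  // 1: timed out
  }
}
```

In this shape the state machine itself keeps no notion of time, which matches the suggestion of sending the ordinary RETURN_TO_SERVICE event on timeout rather than adding a separate Timeout edge.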

updateNodeState(node, healthyNodeCondition, state,
NodeLifeCycleEvent.RESURRECT);
break;
default:
}
Contributor


Not sure why we need this loop change, but it does make code reading simpler.

Contributor Author


This loop didn't strictly need to change for this patch, but it was a double loop when it didn't really need to be, and it was doing extra lookups from the NodeStateMap, so this change makes it cleaner to read and slightly more efficient too.

return new NodeStatus(HddsProtos.NodeOperationalState.IN_SERVICE,
HddsProtos.NodeState.DEAD);
}

Contributor


I am presuming that you have to define the whole cross product at some point, but right now this is all we need?

Contributor Author


Yes. I got tired of typing out the whole new NodeStatus(...) and decided to try adding the static methods. It definitely makes the code cleaner, but the cross product worries me. At the moment it's only 5 * 3 = 15 states, but what if we add a third status or a couple more states? The number of helper methods could get out of control. We can see how it develops, I guess.
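The static-helper idea under discussion might look like the following. The enum values and the two helper names follow the diff snippets quoted above, but the class body is a reconstruction for illustration, not the actual Ozone NodeStatus source.

```java
public final class NodeStatus {
  enum NodeOperationalState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }
  enum NodeState { HEALTHY, STALE, DEAD }

  private final NodeOperationalState operationalState;
  private final NodeState health;

  public NodeStatus(NodeOperationalState operationalState, NodeState health) {
    this.operationalState = operationalState;
    this.health = health;
  }

  // Static helpers for commonly used combinations, avoiding the
  // `new NodeStatus(...)` noise at every call site.  The concern raised
  // in the review: 5 operational states * 3 health states = 15 possible
  // helpers, and the count multiplies again if another status is added.
  public static NodeStatus inServiceHealthy() {
    return new NodeStatus(NodeOperationalState.IN_SERVICE, NodeState.HEALTHY);
  }
  public static NodeStatus inServiceDead() {
    return new NodeStatus(NodeOperationalState.IN_SERVICE, NodeState.DEAD);
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof NodeStatus)) {
      return false;
    }
    NodeStatus other = (NodeStatus) o;
    return operationalState == other.operationalState && health == other.health;
  }

  @Override public int hashCode() {
    return operationalState.ordinal() * 31 + health.ordinal();
  }

  public static void main(String[] args) {
    System.out.println(NodeStatus.inServiceDead().equals(
        new NodeStatus(NodeOperationalState.IN_SERVICE, NodeState.DEAD)));  // true
  }
}
```

Because the class is an immutable value object with equals/hashCode, the helpers can be used interchangeably with explicit construction, which is what makes the call sites cleaner.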

@@ -185,7 +190,7 @@ public int getNodeCount(NodeState nodestate) {
@Override
public NodeState getNodeState(DatanodeDetails datanodeDetails) {
Contributor


We might want to write an alternate version which takes the operational status too, since these calls are internal. Again, not something that needs to be done in this patch; I am just writing down things as I see them. Please don't treat my suggestions as code review requirements; they are more like things that might be useful in the long run.

Contributor Author


Yes, the 'external interface' of SCMNodeManager will need to change, but I want to get these changes to be good internally before we push them up the stack.

Thanks for taking the time to review this WIP. Glad to hear this is going in the correct direction, so I will look to tidy things up and then we can consider the next step.

Contributor

@nandakumar131 nandakumar131 left a comment


Overall the change looks pretty good, +1 on the approach.

@@ -43,7 +45,7 @@
/**
* Represents the current state of node.
*/
private final ConcurrentHashMap<NodeState, Set<UUID>> stateMap;
private final ConcurrentHashMap<UUID, NodeStatus> stateMap;
Contributor


stateMap is no longer required; we can move NodeStatus inside DatanodeInfo.

Contributor Author


Do you think it makes sense to have a field inside DatanodeInfo of type NodeStatus, so we can always pass the states around as a pair, or should we add two individual fields to DatanodeInfo - nodeHealth and nodeOperationalState?

One other thing to consider: nodeStateMap originally kept lists of healthy, stale and dead nodes, so it was possible to quickly return all nodes in a given state. Now we need to iterate over the whole list to find those nodes. One reason for this is that we have 15 different states now instead of 3. If we move nodeStatus into DatanodeInfo, it would be more difficult to optimise this later if needed, but it would simplify things if we simply remove this stateMap.

Contributor

@nandakumar131 nandakumar131 Sep 6, 2019


It is better to have NodeStatus inside DatanodeInfo rather than having two separate fields.

Yes, stateMap helped us to easily get the list/count of nodes in a specific state, but with the current changes it is not straightforward to maintain the state -> list of nodes mapping. In any case we will be iterating over all the available nodes to get the list of nodes in a given state.
The number of nodes in a cluster should not go beyond 3-4 orders of magnitude. We can revisit and optimize this if we run into any performance issue.

Contributor


Even with 15x states, the number of nodes is small: with 100 nodes there are only 1,500 state entries, and with 1,000 nodes, 15,000. It is still trivial to keep these in memory. Here is the real kicker: just as we decided not to write out the whole cross product for the NodeStatus static functions, we will end up needing lists for only the frequently accessed patterns (in my mind that would be (IN_SERVICE, HEALTHY)). All other node queries can be answered by iterating the lists as needed.
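As a sketch of the trade-off being discussed: once the old state -> Set&lt;UUID&gt; index is gone, a per-state query becomes a linear scan over the node map. The field and method names below are illustrative only (the real map holds a full NodeStatus, not just the health state), but they show why the scan is considered acceptable at cluster scale.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class StateScanSketch {
  enum NodeState { HEALTHY, STALE, DEAD }

  // Mirrors the shape of the new field in NodeStateMap: node id -> status
  // (health only here, for brevity).
  static final ConcurrentHashMap<UUID, NodeState> stateMap =
      new ConcurrentHashMap<>();

  // Linear scan replacing the old state -> Set<UUID> index.  O(n) per
  // query, but trivial for clusters of 10^3 - 10^4 nodes; a cached list
  // could later be reintroduced for hot patterns like (IN_SERVICE, HEALTHY).
  static long countInState(NodeState wanted) {
    return stateMap.values().stream().filter(s -> s == wanted).count();
  }

  public static void main(String[] args) {
    stateMap.put(UUID.randomUUID(), NodeState.HEALTHY);
    stateMap.put(UUID.randomUUID(), NodeState.HEALTHY);
    stateMap.put(UUID.randomUUID(), NodeState.DEAD);
    System.out.println(countInState(NodeState.HEALTHY));  // 2
    System.out.println(countInState(NodeState.STALE));    // 0
  }
}
```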

@anuengineer
Contributor

Just a note: originally DatanodeInfo was based on the HDFS code; then I think we copied it and created our own structure. At this point, I think diverging should not be a big deal.

@sodonnel sodonnel force-pushed the HDDS-1982-decom-states branch from e52e3b8 to 1053f63 Compare September 10, 2019 15:21
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 82 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
0 mvndep 67 Maven dependency ordering for branch
+1 mvninstall 628 trunk passed
+1 compile 390 trunk passed
+1 checkstyle 75 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 948 branch has no errors when building and testing our client artifacts.
+1 javadoc 175 trunk passed
0 spotbugs 459 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 682 trunk passed
_ Patch Compile Tests _
0 mvndep 31 Maven dependency ordering for patch
+1 mvninstall 576 the patch passed
+1 compile 388 the patch passed
+1 cc 388 the patch passed
+1 javac 388 the patch passed
-0 checkstyle 37 hadoop-hdds: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 mvnsite 0 the patch passed
+1 whitespace 1 The patch has no whitespace issues.
+1 shadedclient 736 patch has no errors when building and testing our client artifacts.
+1 javadoc 179 the patch passed
-1 findbugs 215 hadoop-hdds generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
_ Other Tests _
-1 unit 298 hadoop-hdds in the patch failed.
-1 unit 3499 hadoop-ozone in the patch failed.
+1 asflicense 61 The patch does not generate ASF License warnings.
9701
Reason Tests
FindBugs module:hadoop-hdds
Dead store to nodes in org.apache.hadoop.hdds.scm.node.NodeStateManager.getAllNodes() At NodeStateManager.java:org.apache.hadoop.hdds.scm.node.NodeStateManager.getAllNodes() At NodeStateManager.java:[line 396]
org.apache.hadoop.hdds.scm.node.states.NodeStateMap.getNodes(NodeStatus) does not release lock on all exception paths At NodeStateMap.java:on all exception paths At NodeStateMap.java:[line 156]
Failed junit tests hadoop.hdds.scm.block.TestBlockManager
hadoop.ozone.container.TestContainerReplication
hadoop.ozone.TestSecureOzoneCluster
hadoop.ozone.scm.node.TestQueryNode
hadoop.ozone.client.rpc.TestOzoneRpcClient
hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
hadoop.ozone.scm.TestContainerSmallFile
hadoop.ozone.client.rpc.Test2WayCommitInRatis
hadoop.ozone.TestMiniChaosOzoneCluster
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion
hadoop.ozone.scm.TestGetCommittedBlockLengthAndPutKey
hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/Dockerfile
GITHUB PR #1344
Optional Tests dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle
uname Linux 7a7a07260082 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / dc9abd2
Default Java 1.8.0_222
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/diff-checkstyle-hadoop-hdds.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/new-findbugs-hadoop-hdds.html
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/patch-unit-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/testReport/
Max. process+thread count 4364 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@@ -417,9 +451,12 @@ private SCMNodeStat getNodeStatInternal(DatanodeDetails datanodeDetails) {

@Override
public Map<String, Integer> getNodeCount() {
// TODO - This does not consider decom, maint etc.
Map<String, Integer> nodeCountMap = new HashMap<String, Integer>();
Contributor


Why not <NodeState, Integer>? It makes it easier for the caller to consume, in my opinion.

Contributor Author


The existing code had Map<String, Integer>, but I agree it would be better with <NodeState, Integer> or <NodeStatus, Integer>. I plan to leave this as-is for now: this method is used only for JMX, and I plan to split that out into a separate change via HDDS-2113, as there are some open questions there.

*/
private List<DatanodeInfo> filterNodes(
NodeOperationalState opState, NodeState health) {
if (opState != null && health != null) {
Contributor


Can we write lines 395-440 with one simple stream().filter()? Nothing wrong with the code itself; just a thought.

Contributor Author


I had not really looked into the Streams API before, but I changed the code to use streams and it does make it easier to follow, so I have made this change. I still kept the if statements at the start of the method: if both params are null we can just return the entire list with no searching, and if both are non-null we can search using the NodeStatus, which should be slightly more efficient.
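The filtering approach described here (early exit when both parameters are null, otherwise a stream filter where a null parameter acts as a wildcard) might look roughly like this. The class is a self-contained toy with illustrative names, not the actual SCMNodeManager code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FilterNodesSketch {
  enum NodeOperationalState { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }
  enum NodeState { HEALTHY, STALE, DEAD }

  static class DatanodeInfo {
    final NodeOperationalState opState;
    final NodeState health;
    DatanodeInfo(NodeOperationalState opState, NodeState health) {
      this.opState = opState;
      this.health = health;
    }
  }

  static final List<DatanodeInfo> allNodes = new ArrayList<>();

  // A null parameter acts as a wildcard: both null returns every node
  // with no scan at all; otherwise filter on whichever fields are set.
  static List<DatanodeInfo> filterNodes(NodeOperationalState opState,
      NodeState health) {
    if (opState == null && health == null) {
      return new ArrayList<>(allNodes);
    }
    return allNodes.stream()
        .filter(n -> opState == null || n.opState == opState)
        .filter(n -> health == null || n.health == health)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    allNodes.add(new DatanodeInfo(NodeOperationalState.IN_SERVICE, NodeState.HEALTHY));
    allNodes.add(new DatanodeInfo(NodeOperationalState.IN_SERVICE, NodeState.DEAD));
    allNodes.add(new DatanodeInfo(NodeOperationalState.IN_MAINTENANCE, NodeState.HEALTHY));
    System.out.println(filterNodes(NodeOperationalState.IN_SERVICE, null).size()); // 2
    System.out.println(filterNodes(null, NodeState.HEALTHY).size());               // 2
    System.out.println(filterNodes(null, null).size());                            // 3
  }
}
```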

Contributor


Please be aware that stream().filter() style patterns can have significant overhead compared to a plain for loop. If this code is going to be on any sort of critical path, it is better for the code to stay a plain for loop.

Please see some fixes made by Todd Lipcon in HDFS because of this issue.
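To illustrate the reviewer's point, the same kind of filter written as an explicit loop avoids the Stream pipeline and lambda allocations, which can matter on a hot path. This is a toy sketch with illustrative names, shown only to contrast the two styles; behavior is identical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PlainForFilterSketch {
  enum NodeState { HEALTHY, STALE, DEAD }

  // The stream().filter() equivalent written as a plain loop: no Stream
  // pipeline objects are created, just one result list.
  static List<NodeState> healthyOnly(List<NodeState> nodes) {
    List<NodeState> result = new ArrayList<>();
    for (NodeState n : nodes) {
      if (n == NodeState.HEALTHY) {
        result.add(n);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<NodeState> nodes = Arrays.asList(
        NodeState.HEALTHY, NodeState.DEAD, NodeState.HEALTHY);
    System.out.println(healthyOnly(nodes).size());  // 2
  }
}
```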

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 92 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
0 mvndep 24 Maven dependency ordering for branch
+1 mvninstall 649 trunk passed
+1 compile 405 trunk passed
+1 checkstyle 76 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 932 branch has no errors when building and testing our client artifacts.
+1 javadoc 182 trunk passed
0 spotbugs 499 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 750 trunk passed
_ Patch Compile Tests _
0 mvndep 32 Maven dependency ordering for patch
-1 mvninstall 299 hadoop-ozone in the patch failed.
-1 compile 247 hadoop-ozone in the patch failed.
-1 cc 247 hadoop-ozone in the patch failed.
-1 javac 247 hadoop-ozone in the patch failed.
-0 checkstyle 43 hadoop-ozone: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 739 patch has no errors when building and testing our client artifacts.
-1 javadoc 75 hadoop-hdds generated 20 new + 16 unchanged - 0 fixed = 36 total (was 16)
-1 findbugs 411 hadoop-ozone in the patch failed.
_ Other Tests _
+1 unit 339 hadoop-hdds in the patch passed.
-1 unit 466 hadoop-ozone in the patch failed.
+1 asflicense 40 The patch does not generate ASF License warnings.
6574
Subsystem Report/Notes
Docker Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/Dockerfile
GITHUB PR #1344
Optional Tests dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle
uname Linux 2edae08c6f80 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / dacc448
Default Java 1.8.0_212
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/diff-checkstyle-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-findbugs-hadoop-ozone.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/testReport/
Max. process+thread count 1247 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@sodonnel
Contributor Author

The failing unit test passes locally, and the integration tests which failed are flaky, I think. I will push a change to fix the style issue and see how the re-test goes.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 39 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
0 mvndep 65 Maven dependency ordering for branch
+1 mvninstall 609 trunk passed
+1 compile 409 trunk passed
+1 checkstyle 77 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 969 branch has no errors when building and testing our client artifacts.
+1 javadoc 169 trunk passed
0 spotbugs 428 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 627 trunk passed
_ Patch Compile Tests _
0 mvndep 32 Maven dependency ordering for patch
+1 mvninstall 541 the patch passed
+1 compile 375 the patch passed
+1 cc 374 the patch passed
+1 javac 374 the patch passed
+1 checkstyle 78 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 729 patch has no errors when building and testing our client artifacts.
-1 javadoc 81 hadoop-hdds generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16)
+1 findbugs 720 the patch passed
_ Other Tests _
+1 unit 271 hadoop-hdds in the patch passed.
-1 unit 2091 hadoop-ozone in the patch failed.
+1 asflicense 43 The patch does not generate ASF License warnings.
8200
Reason Tests
Failed junit tests hadoop.ozone.container.TestContainerReplication
hadoop.ozone.scm.TestContainerSmallFile
hadoop.ozone.TestSecureOzoneCluster
hadoop.ozone.om.TestOzoneManagerRestart
hadoop.ozone.om.TestOMRatisSnapshots
hadoop.ozone.client.rpc.TestWatchForCommit
hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/5/artifact/out/Dockerfile
GITHUB PR #1344
Optional Tests dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle
uname Linux f8e32d81502e 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / c255333
Default Java 1.8.0_222
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/5/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/5/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/5/testReport/
Max. process+thread count 5328 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/5/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 82 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
0 mvndep 80 Maven dependency ordering for branch
+1 mvninstall 639 trunk passed
+1 compile 397 trunk passed
+1 checkstyle 76 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 981 branch has no errors when building and testing our client artifacts.
+1 javadoc 234 trunk passed
0 spotbugs 531 Used deprecated FindBugs config; considering switching to SpotBugs.
+1 findbugs 786 trunk passed
_ Patch Compile Tests _
0 mvndep 44 Maven dependency ordering for patch
+1 mvninstall 723 the patch passed
+1 compile 467 the patch passed
+1 cc 467 the patch passed
+1 javac 467 the patch passed
+1 checkstyle 99 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 876 patch has no errors when building and testing our client artifacts.
-1 javadoc 95 hadoop-hdds generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16)
+1 findbugs 788 the patch passed
_ Other Tests _
+1 unit 418 hadoop-hdds in the patch passed.
-1 unit 334 hadoop-ozone in the patch failed.
+1 asflicense 60 The patch does not generate ASF License warnings.
7525
Reason Tests
Failed junit tests hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse
Subsystem Report/Notes
Docker Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/Dockerfile
GITHUB PR #1344
Optional Tests dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle
uname Linux 425bc211517e 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 9221704
Default Java 1.8.0_222
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/testReport/
Max. process+thread count 426 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@elek
Member

elek commented Sep 19, 2019

LGTM

If I understood correctly, everybody agreed with this approach, and AFAIK all of the comments have been addressed.

@anuengineer @nandakumar131 please let us know if you have any further comments.

I am planning to commit it tomorrow if no more objections.

I think we can commit it to trunk; I am not sure we need a separate branch (let me know if you prefer a feature branch):

  • It's smaller or the same size as the OM HA
  • Complexity is lower (at least for the existing code base); most of the code will be new and independent.

@elek
Member

elek commented Sep 19, 2019

@sodonnel Can you please rebase and push? Some of the integration tests are fixed on trunk, so we can double-check the test results with a new, updated push.

S O'Donnell added 10 commits September 19, 2019 13:31
…tatus) rather than just NodeState. Still WIP with several TODOs
…code to use the new NodeStatus class upto the external boundry where SCMNodeManager interfaces with other components
…state2eventmap so events are only fired for IN_SERVICE transitions.

Prevent updating the nodeStatus directly in NodeStateMap to avoid race conditions. As NodeStatus contains two states, it could lead to lost updates. Instead allow each state to be updated separately under the write lock.
…d the stateMap from NodeStateMap which is no longer needed
…e NodeStatus and fixed all the compile errors resulting from that
@sodonnel sodonnel force-pushed the HDDS-1982-decom-states branch from cc51696 to 20bef11 Compare September 19, 2019 12:32
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 35 Docker mode activated.
_ Prechecks _
+1 dupname 1 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
0 mvndep 24 Maven dependency ordering for branch
-1 mvninstall 29 hadoop-ozone in trunk failed.
-1 compile 19 hadoop-ozone in trunk failed.
+1 checkstyle 50 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 941 branch has no errors when building and testing our client artifacts.
-1 javadoc 59 hadoop-ozone in trunk failed.
0 spotbugs 233 Used deprecated FindBugs config; considering switching to SpotBugs.
-1 findbugs 30 hadoop-ozone in trunk failed.
_ Patch Compile Tests _
0 mvndep 33 Maven dependency ordering for patch
-1 mvninstall 50 hadoop-ozone in the patch failed.
-1 compile 28 hadoop-ozone in the patch failed.
-1 cc 28 hadoop-ozone in the patch failed.
-1 javac 28 hadoop-ozone in the patch failed.
+1 checkstyle 65 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 859 patch has no errors when building and testing our client artifacts.
-1 javadoc 89 hadoop-hdds generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16)
-1 javadoc 58 hadoop-ozone in the patch failed.
-1 findbugs 26 hadoop-ozone in the patch failed.
_ Other Tests _
+1 unit 255 hadoop-hdds in the patch passed.
-1 unit 33 hadoop-ozone in the patch failed.
+1 asflicense 35 The patch does not generate ASF License warnings.
3762
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/Dockerfile
GITHUB PR #1344
Optional Tests dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle
uname Linux ff3109c18cd4 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / d4205dc
Default Java 1.8.0_222
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/branch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/branch-compile-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/branch-javadoc-hadoop-ozone.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/branch-findbugs-hadoop-ozone.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-compile-hadoop-ozone.txt
cc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-compile-hadoop-ozone.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-compile-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-javadoc-hadoop-ozone.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-findbugs-hadoop-ozone.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/testReport/
Max. process+thread count 528 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/7/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@anuengineer
Contributor

> @anuengineer @nandakumar131 please let us know if you have any further comments.
>
> I am planning to commit it tomorrow if no more objections.

Let us commit this into a branch, not into trunk. Thanks.

@anuengineer
Contributor

@nandakumar131 @elek @swagle Thank you for all the comments and discussion. @sodonnel Thank you for the contribution. I have committed this patch to the HDDS-1880-Decom branch.
