Skip to content

[IOTDB-4140] Mark the datanode as removing status when execute remove-datanode.sh#7008

Merged
qiaojialin merged 10 commits intomasterfrom
beyyes/remove_datanode_precedure
Aug 19, 2022
Merged

[IOTDB-4140] Mark the datanode as removing status when execute remove-datanode.sh#7008
qiaojialin merged 10 commits intomasterfrom
beyyes/remove_datanode_precedure

Conversation

@Beyyes
Copy link
Member

@Beyyes Beyyes commented Aug 16, 2022

Description

When executed remove-datanode.sh TARGET_NODE, the node status of TARGET_NODE should be marked as REMOVING, and the LatestRegionRouteMap should be broadcasted to assure that no more read/write requests are forwarded to this datanode.

Running screenshots

1). 1 confignode and 4 datanodes.
image
image

2). after execute remove-datanode.sh 127.0.0.1:6669

In REMOVE_DATA_NODE_PREPARE state of RemoveDataNodeProcedure,
the region map order is [3,1,2] rather than [2,3,1]
image

When remove-datanode.sh is executed completely.
image
image

  1. The removing status of show cluster.
    image

The mainly changed are this method.
image

TODO:
change the datanode status when RemoveDataNodeProcedure is completed

@coveralls
Copy link

coveralls commented Aug 16, 2022

Coverage Status

Coverage decreased (-0.01%) to 41.988% when pulling 2ef345b on beyyes/remove_datanode_precedure into 544c60f on master.

Copy link
Member

@wangchao316 wangchao316 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return false;
}

public void setNodeRemoving(int dataNodeId) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add a interface, set node removing is false.
if procedure failed and rollback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change method from setNodeRemovingStatus(int dataNodeId) to setNodeRemovingStatus(int dataNodeId, boolean isRemoving)

Copy link
Contributor

@CRZbulabula CRZbulabula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL. Besides the comments above, we should better special treat the updateLoadStatistic() operation on the Removing DataNode. Because they might trigger newer broadcast of RegionRouteMap, which will replace previous broadcast map.

// TODO: This class might be split into DataNodeCache and ConfigNodeCache
// when the response time of heartbeat is more than 20s,
// the datanode is considered as down
private static final int HEARTBEAT_TIMEOUT_TIME = 20_000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HEARTBEAT_TIMEOUT_TIME can be moved to INodeCache since it's also used in ConfigNodeHeartbeatCache

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. and I think changing INodeCache from Interface to Abstract class is better. Because some variables are common used.

Comment on lines 535 to 538
int leaderDataNodeId = regionGroupCache.getLeaderDataNodeId();
if (!configManager.getNodeManager().isNodeRemoving(leaderDataNodeId)) {
result.put(consensusGroupId, leaderDataNodeId);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better put -1 instead of nothing when the leaderNode happens to be in the status of Removing. This might avoid NPE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 551 to 555
(consensusGroupId, regionGroupCache) -> {
int leaderDataNodeId = regionGroupCache.getLeaderDataNodeId();
if (!configManager.getNodeManager().isNodeRemoving(leaderDataNodeId)) {
result.put(consensusGroupId, leaderDataNodeId);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better put -1 instead of nothing when the leaderNode happens to be in the status of Removing. This might avoid NPE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

@CRZbulabula CRZbulabula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM~

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 20 Code Smells

25.5% 25.5% Coverage
0.9% 0.9% Duplication

@qiaojialin qiaojialin merged commit a68377e into master Aug 19, 2022
@qiaojialin qiaojialin deleted the beyyes/remove_datanode_precedure branch August 19, 2022 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants