-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-17050. Erasure coding: fix bug for invalidating duplicated block when two ec block at the same datanode but different storage. #5753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… when two ec block at the same datanode but different storage.
💔 -1 overall
This message was automatically generated. |
Great catch here! I think @zhangshuyan0 should be more proper to involve to take review. |
@Hexiaoqiao Sir, thanks a lot for your reply, current modification seems to have some problems and cause the failed UT. |
💔 -1 overall
This message was automatically generated. |
@@ -195,11 +195,11 @@ public void run() { | |||
|
|||
if (extraSleepTime > warnThresholdMs) { | |||
++numGcWarnThresholdExceeded; | |||
LOG.warn(formatMessage( | |||
LOG.debug(formatMessage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be changed to debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshuyan0 Hi, shuyan, thanks for your reviewing. Here is my fault, i modify this to disable JVMPause log in my loal. I will fix it.
if (BlockIdManager.convertToStripedID(blockInfo.getBlockId()) == blockGroupId) { | ||
Block blockOnOldStorage = ((BlockInfoStriped) blockInfo).getBlockOnStorage(old); | ||
if (blockOnOldStorage.getBlockId() != reportedBlock.getBlockId()) { | ||
blockIdNotEquals = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm a bit confused here. We retrieve old
using index
, How can blockOnOldStorage.getBlockId()
and reportedBlock.getBlockId()
be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshuyan0 Thanks shuyan for your reviewing, this PR still has some problems, I will fix it soonly and push it. I think it is efficent for us to disscuss after then.
final StoredReplicaState s; | ||
if (storage.getState() == State.NORMAL) { | ||
final DatanodeDescriptor node = storage.getDatanodeDescriptor(); | ||
if (nodesCorrupt != null && nodesCorrupt.contains(node)) { | ||
if (nodesCorrupt != null && nodesCorrupt.contains(node) && | ||
(alreadyCorrupt == null || !alreadyCorrupt.contains(node))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this new added condition mean? I think the current implement of corruptReplicasMap
can not distinguish between different internal blocks on the same datanode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem makes sense, I am not able to decode the solution though in one pass and I saw there is a comment saying there is some problem in the current solution, so would circle back once it is ready for review state.
but if the problem is for real, then it is indeed a nice catch :)
* <p> | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't play with the licence
Sir, thanks for your replying. Yes, current solution still have some problems. I will fix it soonly and we can discuss then. |
Have moved to HDFS-17071. |
Description of PR
Please see HDFS-17050.
How was this patch tested?
add a UT