HDFS-15869. Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang #2737
Conversation
Thanks @functioner for your work here. I left some nit comments; please take a look. Thanks.
@Override
public void run() {
  final LogSyncNotifyThread logSyncNotifyThread = new LogSyncNotifyThread();
  logSyncNotifyThread.start();
A single thread pushes the result back to the client here; I think it is safe to use multiple threads to do that, right?
    EditSyncEx editSyncEx = logSyncNotifyQ.take();
    editSyncEx.edit.logSyncNotify(editSyncEx.ex);
  }
} catch (InterruptedException ie) {} // just swallow it
InterruptedException is caught but not handled here? Could that cause the client to never get a result? IMO, we should set stopped = true here.
Thanks @Hexiaoqiao for your review. I have added the multi-threaded executor for it. For the exception handling part, I looked at the original semantics (Lines 379 to 387 in aaedc51)
and SyncEdit (Lines 347 to 353 in aaedc51).
Therefore, we don't need to add any extra handling mechanism. I have checked the test of
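For readers following the thread, here is a minimal sketch of the executor-based hand-off being discussed (illustrative class and method names; only logSyncNotify, EditSyncEx-style edits, and the fixed pool of 10 mirror the diff). It uses an anonymous Runnable rather than a lambda, in line with the later review note about lambdas in hot paths.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LogSyncNotifySketch {
  private final ExecutorService logSyncNotifyExecutor =
      Executors.newFixedThreadPool(10);

  interface Edit {
    void logSyncNotify(RuntimeException ex);
  }

  // Called by the sync thread once an edit has been durably synced.
  void notifyAsync(final Edit edit, final RuntimeException ex) {
    logSyncNotifyExecutor.execute(new Runnable() {
      @Override
      public void run() {
        // Runs off the sync thread: a stalled sendResponse() ties up one
        // pool thread instead of blocking the whole edit-log pipeline.
        edit.logSyncNotify(ex);
      }
    });
  }
}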
@functioner would you mind adding a unit test to cover this improvement?
I'm writing a unit test to cover this improvement. After I read the test cases of However, if we use this design, the test needs to access some private methods and classes in
💔 -1 overall
This message was automatically generated.
@@ -63,6 +68,9 @@
        DFS_NAMENODE_EDITS_ASYNC_LOGGING_PENDING_QUEUE_SIZE_DEFAULT);

    editPendingQ = new ArrayBlockingQueue<>(editPendingQSize);

    // the thread pool size should be configurable later, and justified with a rationale
    logSyncNotifyExecutor = Executors.newFixedThreadPool(10);
I prefer to make this improvement more configurable. Can we make the thread pool size configurable in this PR? If the pool size is configured as 0, that means this improvement is disabled.
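For illustration, a guard along the lines proposed here might look like the fragment below (a hypothetical sketch; the config key names match the ones added later in this PR, while the surrounding variables and wiring are assumptions).

int logSyncNotifyExecutorSize = conf.getInt(
    DFSConfigKeys.DFS_NAMENODE_EDITS_ASYNC_LOGSYNCNOTIFY_EXECUTOR_SIZE,
    DFSConfigKeys.DFS_NAMENODE_EDITS_ASYNC_LOGSYNCNOTIFY_EXECUTOR_SIZE_DEFAULT);
if (logSyncNotifyExecutorSize > 0) {
  logSyncNotifyExecutor = Executors.newFixedThreadPool(logSyncNotifyExecutorSize);
} else {
  // Size 0 disables the improvement: notify clients on the sync thread as before.
  logSyncNotifyExecutor = null;
}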
Sure. What should be the default value? Many users use the default value, so probably we shouldn't set it as 0 by default.
@functioner, we could make 10 the default pool size.
@functioner any plan to push this improvement forward?
@@ -117,6 +125,7 @@ void openForWrite(int layoutVersion) throws IOException {
  public void close() {
    super.close();
    stopSyncThread();
    logSyncNotifyExecutor.shutdown();
This could hit an issue when transitioning from active to standby and then back, because the executor has been shut down and no response will be sent. Moreover, I suspect the namenode process could crash in this case. In my opinion, we do not need to shut down this executor. FYI. Thanks.
@Hexiaoqiao Thanks for the comment; I removed the shutdown call. I think we also need to add some comments in the code to explain why we don't shut down this executor, in case some developers get confused, because at first glance it's also natural to think that this executor should be shut down on close(). Furthermore, I think your argument implies that the
+1.
IMO, it is true that
💔 -1 overall
This message was automatically generated.
@functioner after reviewing the code, the comment from Yiqun seems not resolved yet; do you mind making the thread pool size configurable?
@Hexiaoqiao thanks for the reminder. I've added:
Thanks @functioner for your work. LGTM, +1 from my side. Let's wait for what Yetus says.
Ping @linyiqun, do you mind taking another review?
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
@functioner would you mind submitting an addendum blank commit to trigger CI again? I just noticed that there are many failed unit tests, but they seem unrelated to this PR (not very sure). Let's wait for what Yetus says.
    DFSConfigKeys.DFS_NAMENODE_EDITS_ASYNC_LOGSYNCNOTIFY_EXECUTOR_SIZE,
    DFSConfigKeys.
        DFS_NAMENODE_EDITS_ASYNC_LOGSYNCNOTIFY_EXECUTOR_SIZE_DEFAULT);
logSyncNotifyExecutor = Executors.newFixedThreadPool(logSyncNotifyExecutorSize);
I still think it would be better to shut down logSyncNotifyExecutor in the close method and re-initialize it in the FSEditLogAsync#openForWrite function. Every time the NN does a failover, it invokes the startActiveServices method and FSEditLogAsync#openForWrite is executed. Related code: FSNamesystem.java#L1373
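For illustration, the lifecycle being suggested might look roughly like this (a sketch only; the method signatures follow the diff quoted above, while the executor field and size variable are assumptions).

@Override
void openForWrite(int layoutVersion) throws IOException {
  super.openForWrite(layoutVersion);
  // Re-create the notify pool each time the NN becomes active and the edit
  // log is opened for write (startActiveServices -> openForWrite).
  logSyncNotifyExecutor = Executors.newFixedThreadPool(logSyncNotifyExecutorSize);
}

@Override
public void close() {
  super.close();
  stopSyncThread();
  if (logSyncNotifyExecutor != null) {
    // Stop accepting new notify tasks; a fresh pool is created on the next
    // openForWrite() after a failover back to active.
    logSyncNotifyExecutor.shutdown();
    logSyncNotifyExecutor = null;
  }
}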
Thanks @linyiqun for your comments. It makes sense to me.
@functioner @Hexiaoqiao, I have another comment about the logSyncNotifyExecutor close; please see my comment above.
The discussion on PR #2878 and HDFS-15957 sounds like it is around the same concept. By definition, if there is a "network issue", the NameNode as a whole is affected because all RPCs will be problematic. So what makes this `call.sendResponse()` call get special treatment compared to other RPCs in the NameNode?
-1 on this patch.
@amahussein Thanks for the comment. |
@amahussein Thanks for your quick response. I think HDFS-15957 and HDFS-15869 are not the same concept/issue. I have left a comment at HDFS-15869 and suggested changing it to
Thanks @Hexiaoqiao for the comment.
Thanks @functioner! I really appreciate that.
@amahussein A common way to eliminate such overhead is to prepare multiple consumer threads and feed them with requests.
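To make that pattern concrete, here is a hedged sketch (a fragment with illustrative names; only the EditSyncEx-style queue element and logSyncNotify mirror the earlier snippet): several daemon threads drain a shared queue of pending notifications, so one slow client write does not delay the others.

// Hedged sketch of the consumer-thread pattern, not the PR's actual code.
final BlockingQueue<EditSyncEx> logSyncNotifyQ = new LinkedBlockingQueue<>();
for (int i = 0; i < numNotifyThreads; i++) {
  Thread consumer = new Thread(new Runnable() {
    @Override
    public void run() {
      try {
        while (true) {
          EditSyncEx e = logSyncNotifyQ.take();  // block until work arrives
          e.edit.logSyncNotify(e.ex);            // push the result to the client
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();      // exit cleanly on interrupt
      }
    }
  }, "logSyncNotify-" + i);
  consumer.setDaemon(true);
  consumer.start();
}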
@amahussein I would still like to argue about this "hanging" issue. There have been reported TCP network I/O issues which hang for >15 min without throwing any exception. ZOOKEEPER-2201 is a perfect example, and you can find the TCP-level explanation for this hanging issue in https://www.usenix.org/conference/srecon16/program/presentation/nadolny
However, in our scenario (HDFS-15869), a possible counterargument comes from hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java, Lines 3607 to 3616 in 3c57512.
However, as we point out in HDFS-15869, when the payload is huge, line 3611 won't invoke channel.write(buffer); instead, it invokes channelIO(null, channel, buffer), which brings us to hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java, Lines 3646 to 3672 in 3c57512.
If the payload is split into two batches, the second batch has to wait for the first batch to be sent out, which may encounter a high packet-loss rate and thus slow I/O. Hence, I would say the hanging problem still exists.
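To make the argument concrete, here is a hedged illustration (not Hadoop's actual Server#channelIO code; method name and chunk parameter are assumptions) of the chunked-write behaviour being described: a large buffer is written in bounded slices, so a later slice cannot be attempted until earlier ones drain from the socket buffer; with a non-blocking channel, write() returns 0 when the kernel buffer is full and the caller ends up retrying.

// Hedged illustration only; not the actual channelIO implementation.
static void chunkedWrite(java.nio.channels.SocketChannel channel,
                         java.nio.ByteBuffer buf,
                         int chunkLimit) throws java.io.IOException {
  while (buf.hasRemaining()) {
    int savedLimit = buf.limit();
    // Cap this pass at chunkLimit bytes, like the bounded I/O in channelIO.
    buf.limit(Math.min(buf.position() + chunkLimit, savedLimit));
    int written = channel.write(buf);  // non-blocking: may return 0
    buf.limit(savedLimit);
    if (written == 0) {
      // The peer isn't draining its receive buffer; the caller must retry
      // later. If it simply keeps looping here, progress stalls, which is
      // the slow/"hanging" behaviour described above.
      Thread.yield();
    }
  }
}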
Another aspect of the argument is the design for availability and fault tolerance. Distributed systems can actually tolerate such hanging issues in many scenarios, but sometimes it's seen as a bug, as in ZOOKEEPER-2201. So an important question is: when is it a bug, and when is it not (i.e., it's a feature)? I've been doing research on fault injection for some time and have submitted multiple bug reports accepted by the open source community (e.g., HADOOP-17552). My criteria for determining whether it is a bug are:
In our scenario (HDFS-15869), this possible hanging (if you agree with my argument about network hanging) can block the FSEditLogAsync sync thread. So, the network service should be responsible for all its behaviors and handle all the possible network issues (e.g., IOException, disconnection, hanging). It should determine how to handle them, e.g., by logging the error, rather than affecting other services like FSEditLogAsync. I'm not saying that we have to use a complete and slow RPC framework for this network service. But IMO, decoupling it from
Thanks @functioner
Now I am really even more confused about the Bug vs. Improvement question, so I am going to pass on reviewing.
This is not what I meant. It is recommended to avoid the use of lambda expressions in hot execution paths.
I see. I will post a summary comment in Jira after the discussion in this PR is finalized.
Actually we are on the same page; maybe my comment caused some confusion.
That's exactly what I meant. I will push a commit soon.
@amahussein Thanks for your feedback, and your time! It's not a big deal whether it's marked as a bug or an improvement. One of my bug reports (HADOOP-17552) was also ultimately marked as an improvement rather than a bug. The point is that the developers (in HADOOP-17552) eventually realized that there's a potential hanging issue, as I pointed out, and the patch (as well as the relevant discussion) is very helpful for developers and users.
My intention is to argue that we should not remove the references to "hanging" problems. In short, the discussion above can be summarized into 3 arguments:
Both Argument 3 and Argument 1 can be resolved with this patch. In conclusion, this patch not only improves performance but also enhances availability & fault tolerance. So I think the references to "hanging" problems should not be removed. If it keeps the "Improvement" tag instead of the "Bug" tag, that's fine. P.S. I will summarize our discussion with a comment in Jira after we reach a consensus.
No, it's a non-blocking write, so by definition it will never hang – unless induced by fault injection.
@daryn-sharp I considered this counterargument in #2737 (comment), where I proposed another argument that it may hang with a huge payload, because there's a while-loop. Please take a look. I'm not sure whether it fully defends my argument; can you provide more explanation? Maybe I'm not correct. Thanks!
@functioner Can you move this PR forward? I can help with review. If you don't have time, I can reassign this ticket to another contributor.
@ZanderXu According to the discussion so far, Daryn has some doubts about the issue and the patch. Do you have any ideas for moving the case forward?
Thanks for your work and discussions on this problem. I spent a long time catching up on your ideas and concerns 😂; it's so hard. I have some thoughts and questions about this ticket. Some questions on HDFS-15869:
Some thoughts on HDFS-15869: actually, I encountered this problem in our prod environment, where the thread
About "Bug" or "Improvement", I think it should be a performance improvement, since all processes are worked as expected, no blocking or hanging, just slow. Some throughs in HDFS-15957:
@functioner Looking forward to your ideas and confirmation. @daryn-sharp @Hexiaoqiao @linyiqun @amahussein Looking forward to your ideas. I hope we can push this ticket forward.
If I missed any other concerns, please let me know; we can find solutions together to push this ticket forward.
I propose a fix for HDFS-15869.