-
Notifications
You must be signed in to change notification settings - Fork 242
Refine the message handler error handling logic to avoid unnecessary retry. #1500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
Show resolved
Hide resolved
if the retry counts is large (say people put -1 instead 0), this is effectively retry forever. How can we make sure this is not going to happen? |
Let say later the participant resolve issue and can create the messageHandler for this partition replica. Then does it mean when they recover, they need to manually (or via rest) to remove this UNPROCESSED message? If so, do we have this REST API? |
@kaisun2000 What do you mean? If it is -1, then the current logic will try once and stop retrying. And according to what Junkai mentioned, I would like to check for the count even before the first try. So that should not be the case, right?
This is the plan. We have discussed this in the previous standup meeting. That task will be addressed separately. Before it is ready, we will require the application to reset the participant. |
Let us say if the message have a retry count of INT_MAX, this is not going to work. I think we can be a little bit more conservative that if the message have a retry count larger than a threshold, we just retry the threshold value.
|
@kaisun2000 , if the user explicitly sets the retry to be infinite, we shall not block it, right? Note that the retry won't cause any Helix controller issue, since it is only retried on the participant side. For ZK servers, yeah, it will lead to many ZK writes potentially. But I think that is a problem at a different level, ZK writes throttling for example. |
#1487 only makes it by default not logging to ZK. It can still log to ZK with some configuration. I think if you cap the retry count to some fixed value say 100 would be safer. Just my 2 cents. Up to you. |
junkaixue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. Good PR for cleaning up.
|
Approved by @dasahcc , I have added the comment as suggested. I will merge the PR shortly. |
Issues
Resolve #1488
Description
This PR enhances the message event processing logic to prevent silent failure and unnecessary retry.
Tests
TestHelixTaskExecutor
[WARNING] Tests run: 1237, Failures: 0, Errors: 0, Skipped: 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:00 h
[INFO] Finished at: 2020-10-29T22:59:19-07:00
[INFO] ------------------------------------------------------------------------
Documentation (Optional)
(Link the GitHub wiki you added)
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)