-
Notifications
You must be signed in to change notification settings - Fork 9.1k
YARN-11183. Federation: Remove outdated ApplicationHomeSubCluster in … #4450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
…federation state store.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
merge trunk
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
.add(ApplicationHomeSubCluster.newInstance(e.getKey(), e.getValue())); | ||
if (subClusterId == null || subClusterId.equals(e.getValue())) { | ||
result.add( | ||
ApplicationHomeSubCluster.newInstance(e.getKey(), e.getValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can e.getValue() be null?
@zhengchenyu thanks for the work, could you rebase this? @slfan1989 can you take a look? |
@goiri I will look at this pr code soon. |
@zhengchenyu Thank you very much for your contribution! From my personal point of view, I think adding AsyncDispatcher and defining a state machine to implement functions is a bit over-engineered. @goiri Thank you very much for inviting me to help review the code! I read this pr code carefully and I found that the function described by this pr has been implemented, The PRs involved are as follows: In YARN-11290, we improved the
In YARN-11323, we optimized the method of cleaning expired application data. More detailed information can be found in (#4954) From my personal point of view, I think YARN-11290 and YARN-11323 are enough, welcome to continue the discussion. |
@slfan1989 I read the PR #4954, it is lightweight way. But Why I not choose this way? removeApplicationIdFromStateStore(removeId) will connect to zookeeper. If zookeeper degrade, complete application may block the rmDispatcher. I think any operation which connect to 3rd store must do in async mode, It is why the statemachine of ZKRMStateStore is introduced.
|
@zhengchenyu Thank you very much for your answer, from my personal point of view, I think the |
@slfan1989 In AsyncDispatcher event execute in sequence. If one operation is slow, the size of queue will increase, resourcemanager will work in bad performance. We can't do any high cost operation in "RM Event dispatcher". The difference is below code. I think here should do in async mode Line 365 in a48e8c9
@szilard-nemeth @goiri We have some difference, can you give us some suggesstion? |
Description of PR
https://issues.apache.org/jira/browse/YARN-11183
Nowadays, ApplicationHomeSubCluster in federation state store can't be removed automatically.
The key is when remove ApplicationHomeSubCluster. What's the end of ApplicationHomeSubCluster's lifecycle?
We know when called getApplicationReport, we must know ApplicationHomeSubCluster. It may be the last possible to get this application's home subcluster. So I think we should remove ApplicationHomeSubCluster when the application is removed from resourcemanager's memory.
How was this patch tested?
unit test and test on our cluster.
For code changes:
Here I add AsyncDispatcher for handle federation state store. Two key point:
When application is removed from RM, will delete ApplicationHomeSubCluster from federation state store.
When ResourceManager start, will check all application whether is removed from resourcemanager memory.