-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-11122. Support getClusterNodes API in FederationClientInterceptor #4274
Conversation
Finish Juint Test With getClusterNodes
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Hi, @goiri , @szilard-nemeth Could you kindly help in reviewing this PR? Thanks. |
@@ -74,7 +76,9 @@ public final class RouterMetrics { | |||
@Metric("Total number of successful Retrieved getClusterMetrics and " | |||
+ "latency(ms)") | |||
private MutableRate totalSucceededGetClusterMetricsRetrieved; | |||
|
|||
@Metric("Total number of successful Retrieved getClusterNodes and " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the new checkstyle, you could make this one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a line indeed, I'll fix it.
import java.util.Collection; | ||
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid messing too much with the imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder, I'll fix it.
public static GetClusterNodesResponse mergeClusterNodesResponse( | ||
Collection<GetClusterNodesResponse> responses) { | ||
GetClusterNodesResponse clusterNodesResponse = Records.newRecord( | ||
GetClusterNodesResponse.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the checkstyle report from Yetus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will check yetus checkstyle report.
💔 -1 overall
This message was automatically generated. |
@goiri Thank you very much for your help in reviewing the code. I see that there is 1 Junit Test Error, which will report an error in org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST. The related problem is caused by the patch HADOOP-18167. This patch It will cause DelegationTokenSecretManagerMetrics to be registered twice. For solution discussion, please refer to HADOOP-18222, PR-4266. |
HADOOP-18222 Fixed, I try to rerun Junit Test here. |
🎊 +1 overall
This message was automatically generated. |
@goiri Please help to review the code again, thanks! |
@@ -112,6 +116,10 @@ private RouterMetrics() { | |||
getClusterMetricsLatency = | |||
registry.newQuantiles("getClusterMetricsLatency", | |||
"latency of get cluster metrics", "ops", "latency", 10); | |||
|
|||
getClusterNodesLatency = | |||
registry.newQuantiles("getClusterNodesLatency", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yetus seems not to be running the checkstyle.
Can you run it by hand and fix indentation?
CC @ayushtkn any idea on why the checkstyle is not being ran?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem, I'll fix the indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did ran @goiri for the latest 14th build, for the 5th the results aren't there now
Branch:
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4274/14/artifact/out/buildtool-branch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt
Patch:
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4274/14/artifact/out/buildtool-patch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt
Both have same number hence 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, the Yetus report is not showing and is giving a +1.
@@ -790,8 +791,32 @@ <R> Map<SubClusterId, R> invokeConcurrent(Collection<SubClusterId> clusterIds, | |||
|
|||
@Override | |||
public GetClusterNodesResponse getClusterNodes(GetClusterNodesRequest request) | |||
throws YarnException, IOException { | |||
throw new NotImplementedException("Code is not implemented"); | |||
throws YarnException, IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will fix it
throws YarnException, IOException { | ||
if (request == null) { | ||
routerMetrics.incrClusterNodesFailedRetrieved(); | ||
RouterServerUtil.logAndThrowException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will fix it.
Map<SubClusterId, SubClusterInfo> subclusters = | ||
federationFacade.getSubClusters(true); | ||
Map<SubClusterId, GetClusterNodesResponse> clusterNodes = Maps.newHashMap(); | ||
for (Map.Entry<SubClusterId, SubClusterInfo> entry : subclusters.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't use the values. We shouldn't use entrySet() but keys()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, good advice, keysets should be used.
...test/java/org/apache/hadoop/yarn/server/router/clientrm/TestFederationClientInterceptor.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri Please help to review the code again, I have fixed the indentation issue and added a test for metrics,thank you very much! |
...yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/TestRouterMetrics.java
Show resolved
Hide resolved
...yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/TestRouterMetrics.java
Show resolved
Hide resolved
public void testGetClusterNodesFailed() { | ||
long totalBadBefore = metrics.getClusterNodesFailedRetrieved(); | ||
badSubCluster.getClusterNodes(); | ||
Assert.assertEquals(totalBadBefore + 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will fix the code here.
GetClusterNodesResponse response = | ||
interceptor.getClusterNodes(GetClusterNodesRequest.newInstance()); | ||
Assert.assertEquals(subClusters.size(), | ||
response.getNodeReports().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will pay attention to related issues in the follow-up pr, your suggestions are very valuable, thank you very much.
💔 -1 overall
This message was automatically generated. |
@ayushtkn I found a problem(with JDK11), that is, OOM occurs when Maven compiles, follow as:
This problem seems to cause mvninstall, compile, javac errors. |
🎊 +1 overall
This message was automatically generated. |
Hi, @goiri I need your help to review the code, the relevant unit tests have passed, thank you!!! |
@goiri Thank you very much for helping to review the code, can you help merge to trunk branch? I will continue to follow up the related PR of YARN-10465. |
Description of PR
[Yarn-11122] Support getClusterNodes In Federation architecture (https://issues.apache.org/jira/browse/YARN-11122)
In YARN-10465, some methods have been implemented, from a personal point of view, some doubts.
Question 1: Without considering the metric implementation, it is impossible to understand the execution of related functions.
Question 2: Using multi-threading and reflection implementation, the readability of the related logic is relatively poor, and there is not much performance difference between this method and the conventional loop method to obtain the theory.
Question 3: The code is already 2 years old, merged into the local branch, there may be conflicts.
How was this patch tested?
Added Junit Test to test normal requests and NULL requests, all as expected
For code changes:
1.Use the loop method to call the getClusterNodes method of the subcluster one by one
2.Complete the Metric indicator counting function
3.Adapt to branch2.10 & branch3.3