-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-17232. RBF: Fix NoNamenodesAvailableException for a long time, when use observer. #6208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a quick scan and added a few minor comments. I've limited bandwidth til mid Nov for an in-depth review.
nsId, listObserversFirst, firstNamenodeId, namenode.getNamenodeId()); | ||
return rotatedNnContexts; | ||
|
||
// If the last namenode in the cache at this time is the namenode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the namenode
.
Should this be expanded to is the namenode whose priority needs to be lowered
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
return namenodeContexts; | ||
|
||
// Move the abnormal namenode to the end of the cache, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abnormal
-> inaccessible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your advice, done
return namenodeContexts; | ||
|
||
// Move the abnormal namenode to the end of the cache, | ||
// to ensure that the current namenode will not be accessed first next time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the current namenode here the same as the abnormal
one in the line above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I have modified the comment
...-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java
Show resolved
Hide resolved
} | ||
|
||
/** | ||
* Shutdown oberver namenode in the cluste. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo cluste
-> cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/** | ||
* Shutdown oberver namenode in the cluste. | ||
* | ||
* @param num The number of shutdown oberver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo oberver
-> observer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Thanks for reviewing this pr! |
🎊 +1 overall
This message was automatically generated. |
@@ -772,13 +779,22 @@ public static boolean isUnavailableException(IOException ioe) { | |||
* Check if the cluster of given nameservice id is available. | |||
* | |||
* @param nsId nameservice ID. | |||
* @param namenode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if (nameNode != null && nameNode.isObserverState()) { | ||
cluster.getCluster().shutdownNameNode(nnIndex); | ||
num--; | ||
if (num == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just do it in the for loop?
int numNns = cluster.getNamenodes().size();
for (nnIndex = 0; nnIndex < numNns && num > 0; nnIndex++) {
It is also a little weird to modify an input parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your advice, done
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one small comment to make documentation easier to understand.
// Use observer and the namenode that causes the exception is an observer, | ||
// false is returned so that the observer can be marked as unavailable,so other observers | ||
// or active namenode which is standby in the cache of the router can be retried. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a bit confusing to read. Maybe the following could make it clearer
If the operation is an observer read and the namenode that caused the exception is an observer ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your advice.
public FileSystem getFileSystemWithConfiguredFailoverProxyProvider() throws IOException { | ||
conf.set(DFS_NAMESERVICES, | ||
conf.get(DFS_NAMESERVICES)+ ",router-service"); | ||
conf.set(DFS_HA_NAMENODES_KEY_PREFIX + ".router-service", "router1"); | ||
conf.set(DFS_NAMENODE_RPC_ADDRESS_KEY+ ".router-service.router1", | ||
getFileSystemURI().toString()); | ||
conf.set(HdfsClientConfigKeys.Failover.PROXY_PROVIDER_KEY_PREFIX | ||
+ "." + "router-service", ConfiguredFailoverProxyProvider.class.getName()); | ||
DistributedFileSystem.setDefaultUri(conf, "hdfs://router-service"); | ||
|
||
return DistributedFileSystem.get(conf); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getFileSystemWithConfiguredFailoverProxyProvider
and getFileSystemWithObserverReadProxyProvider()
share the same configuration, beside the proxy provider. You can separate shared config these one into a separate function, then just set the proxy provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@goiri @simbadzina If no more comments here, please help merge it, thanks! |
Merged the change. Thanks for the contribution @KeeProMise |
…hen use observer. (apache#6208)
Description of PR
see https://issues.apache.org/jira/browse/HDFS-17232
How was this patch tested?
Added new unit test class TestNoNamenodesAvailableLongTime.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?