-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-17223. Add journalnode maintenance node list #6183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
|
||
int quorumThreshold = (this.loggers.size() / 2) + 1; | ||
Preconditions.checkArgument( | ||
(this.loggers.size() - skipNodesHostPort.length) >= quorumThreshold, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a QJPFakedTranslatorPB.java, can we just remove these maintenance JNs from this.loggers
array? We can also make quorumThreshold
a class member to remember the minimum number of available JNs and the minimum responses we need to commit a write to HDFS.
e2fa107
to
471585f
Compare
💔 -1 overall
This message was automatically generated. |
11b8fd2
to
fc74e49
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
fc74e49
to
2ed5c35
Compare
💔 -1 overall
This message was automatically generated. |
Thank you very much for your review, @xinglin . Based on your suggestions, I have made adjustments to the relevant logic and removed the QJPFakedTranslatorPB class. I look forward to your review again. Additionally, the test failure in hadoop.hdfs.TestDFSUtil is related to #6249. I will rebase once that issue is resolved. |
2ed5c35
to
5f415d7
Compare
@@ -1982,4 +1983,28 @@ public static void addTransferRateMetric(final DataNodeMetrics metrics, final lo | |||
LOG.warn("Unexpected value for data transfer bytes={} duration={}", read, duration); | |||
} | |||
} | |||
|
|||
/** | |||
* Retrieve InetSocketAddress set by ip port string array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: -> "Construct a HostSet from an array of "ip:port" strings.
* @param nodesHostPort ip port string array. | ||
* @return HostSet of InetSocketAddress. | ||
*/ | ||
public static HostSet convertHostSet(String[] nodesHostPort) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: convertHostSet -> getHostSet() or ConvertToHostSet()
for (String hostPort : nodesHostPort) { | ||
try { | ||
URI uri = new URI("dummy", hostPort, null, null, null); | ||
int port = uri.getPort() == -1 ? 0 : uri.getPort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it appropriate here? It seems 0 is a valid port but -1 should indicate port is not set?
A valid port value is between 0 and 65535. A port number of zero will let the system pick up an ephemeral port in a bind operation.
@@ -62,6 +66,7 @@ | |||
import org.apache.hadoop.classification.VisibleForTesting; | |||
import org.apache.hadoop.thirdparty.com.google.common.base.Joiner; | |||
import org.apache.hadoop.util.Preconditions; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this.
72216c2
to
711aa42
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
try { | ||
URI uri = new URI("dummy", hostPort, null, null, null); | ||
int port = uri.getPort(); | ||
if (port == -1 || port == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
port=0 is valid. I guess we should probably just check for -1 here?
/**
* Returns the port number of this URI.
*
* <p> The port component of a URI, if defined, is a non-negative
* integer. </p>
*
* @return The port component of this URI,
* or {@code -1} if the port is undefined
*/
public int getPort() {
return port;
}
@@ -144,7 +149,14 @@ public QuorumJournalManager(Configuration conf, | |||
this.uri = uri; | |||
this.nsInfo = nsInfo; | |||
this.nameServiceId = nameServiceId; | |||
this.loggers = new AsyncLoggerSet(createLoggers(loggerFactory)); | |||
this.loggers = new AsyncLoggerSet(createLoggers(loggerFactory), this.quorumJournalCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.quorumJournalCount is set inside createLoggers(). Can we separate them as two separate steps, like the following?
// Keep this in comment:
// createLoggers() will set quorumJournalCount to total number of journal nodes while return a list of healthy/good journal nodes.
List<AsyncLogger> loggers = createLoggers(loggerFactory);
this.loggers = new AsyncLoggerSet(loggers, this.quorumJournalCount);
@@ -250,6 +262,9 @@ Map<AsyncLogger, NewEpochResponseProto> createNewUniqueEpoch() | |||
|
|||
@Override | |||
public void format(NamespaceInfo nsInfo, boolean force) throws IOException { | |||
if (isEnableJnMaintenance()) { | |||
throw new IOException("format() does not support enabling jn maintenance mode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
isEnableJnMaintenance(
) -> isJNMainteanceEnabled()
or should it actually be IsJNInMaintenanceMode()
?
"format() does not support enabling jn maintenance mode" -> "Formatting a journal node is not supported while in JN maintenance mode"
String jid = parseJournalId(uri); | ||
for (InetSocketAddress addr : addrs) { | ||
if(skipSet.match(addr)) { | ||
LOG.info("The node {} is a maintenance node and will skip initialization.", addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "will skip initialization" -> "will be skipped"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes! Overall LGTM, with minor comments.
@@ -667,6 +700,9 @@ AsyncLoggerSet getLoggerSetForTests() { | |||
|
|||
@Override | |||
public void doPreUpgrade() throws IOException { | |||
if (isEnableJnMaintenance()) { | |||
throw new IOException("doPreUpgrade() does not support enabling jn maintenance mode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: doPreUpgrade() does not support enabling jn maintenance mode -> doPreUpgrade() is not support while in jn maintenance mode
@@ -684,6 +720,9 @@ public void doPreUpgrade() throws IOException { | |||
|
|||
@Override | |||
public void doUpgrade(Storage storage) throws IOException { | |||
if (isEnableJnMaintenance()) { | |||
throw new IOException("doUpgrade() does not support enabling jn maintenance mode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
@@ -701,6 +740,9 @@ public void doUpgrade(Storage storage) throws IOException { | |||
|
|||
@Override | |||
public void doFinalize() throws IOException { | |||
if (isEnableJnMaintenance()) { | |||
throw new IOException("doFinalize() does not support enabling jn maintenance mode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. and for others functions.
to be restarted, it will try the downed journal node through the lengthy RPC retry mechanism, | ||
resulting in a long initialization time for the Namenode to provide services. By adding the | ||
downed journal node to the maintenance nodes, the initialization time of the Namenode in such | ||
scenarios can be accelerated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
->
In the case that one out of three journal nodes being down, theoretically HDFS can still
function. However, in reality, the unavailable journal node may not recover quickly. During this period, when we need to restart an Namenode, the Namenode will try to connect to the unavailable journal node through the lengthy RPC retry mechanism,
resulting in a long initialization time for the Namenode. By adding these
unavailable journal nodes to the maintenance nodes, we will skip these unavailable journal nodes during Namenode initialization and thus reduce namenode startup time.
1-node example values: <>
2-node example values: <>
|
||
@Test | ||
public void testGetHostSet() { | ||
String[] testAddrs = new String[] {NS1_NN_ADDR, NS1_NN1_ADDR}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test case is a bit confusing. Can we just use "unreachable-host1.com:9000" instead?
assertThrows(IllegalArgumentException.class, () -> createSpyingQJM()); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we also need a test case to to verify maintenance journal nodes are indeed excluded? total JN=3, with 1 is excluded. after we initialize, verify only 2 JNs are included.
711aa42
to
dc94c93
Compare
💔 -1 overall
This message was automatically generated. |
dc94c93
to
1046925
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! thanks for iterating over this.
@@ -1171,6 +1179,47 @@ public void testSelectViaRpcAfterJNRestart() throws Exception { | |||
} | |||
} | |||
|
|||
/** | |||
* Tests to throw an exception if the jn maintenance nodes exceeds half of the journalnode number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
"if the jn maintenance nodes exceeds half of the journalnode number." -> "when more than half of the journal nodes are in maintenance mode".
🎊 +1 overall
This message was automatically generated. |
1046925
to
8c5e2b3
Compare
💔 -1 overall
This message was automatically generated. |
@gp1314, can you try not force-push next time? Just push incremental commits so that I can review the new change. thanks, |
@xinglin , thank you very much for code review and suggestions, I will pay attention next time. |
@Hexiaoqiao , could you help to review this improvement in the code? If it's feasible, can it be merged into the trunk branch? I would be very grateful. |
Thanks @gp1314 and @xinglin for your works. I am not very sure to get the total purpose here.
Do you mean that NameNode restart will cost extra over 30~40 minutes while 1/3 JN could not be available? It is interesting where it costs? IIUC, The majority JN work well, it will connect and interact well.
I used to maintain JNs online one by one, but didn't meet timeout at NameNode side. Not sure what different between them, one point is the version could have some differences(our version is based on 2.7.1 with some internal improvement.) Thanks again. Please correct me if I missed something. |
My understanding is a similar issue is happening here as what I tried to fix in HDFS-17030: when a JN is not responsive (either it is down or it hangs), the starting NN would try to connect to it anyway with retries. Thus, it would wait for |
Unfortunately, I didn't reproduce the problem. In the past, stopping a JN and restarting NN took a long time to initialize. I will pay more attention to the root cause of the problem. |
Description of PR
How was this patch tested?
unit test.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?