HDDS-2107. Datanodes should retry forever to connect to SCM in an… #1424

Merged
merged 1 commit into from
Sep 16, 2019

vivekratnavel
Contributor

@vivekratnavel vivekratnavel commented Sep 11, 2019

… unsecure environment

In an unsecure environment, the datanodes try up to 10 times, waiting 1000 milliseconds between attempts, before throwing this error:

Unable to communicate to SCM server at scm:9861 for past 0 seconds.
java.net.ConnectException: Call From scm:9861 failed on connection exception: java.net.ConnectException: Connection refused;

This PR fixes the issue by making datanodes retry forever to connect to SCM instead of throwing an error from the state machine.

I have also increased timeouts on a unit test to improve its stability.
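For reference, here is a minimal sketch of the fixed-sleep retry behavior described above. It uses a hypothetical FixedSleepRetry class and is not Hadoop's RetryUpToMaximumCountWithFixedSleep implementation. It only illustrates the semantics named in the client logs. In this sketch, a policy with maxRetries = 10 surfaces the ConnectException after the initial attempt plus 10 retries. With maxRetries = Integer.MAX_VALUE (2147483647), the policy effectively never gives up. The demo sleeps 1 ms instead of 1000 ms so it finishes quickly.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of a "retry up to N times with a fixed sleep" policy.
 * Not Hadoop's implementation; it only mirrors the behavior that the logs
 * describe as RetryUpToMaximumCountWithFixedSleep(maxRetries, sleepTime).
 */
public class FixedSleepRetry {
  private final int maxRetries;
  private final long sleepMillis;

  public FixedSleepRetry(int maxRetries, long sleepTime, TimeUnit unit) {
    this.maxRetries = maxRetries;
    this.sleepMillis = unit.toMillis(sleepTime);
  }

  /** Runs the action, retrying on failure; rethrows after maxRetries retries. */
  public <T> T call(Callable<T> action) throws Exception {
    int retries = 0;
    while (true) {
      try {
        return action.call();
      } catch (Exception e) {
        if (retries >= maxRetries) {
          throw e; // budget exhausted: the caller now sees the error
        }
        retries++;
        Thread.sleep(sleepMillis); // fixed sleep between attempts
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Bounded policy: gives up after the initial attempt plus 10 retries.
    int[] attempts = {0};
    FixedSleepRetry bounded = new FixedSleepRetry(10, 1, TimeUnit.MILLISECONDS);
    try {
      bounded.call(() -> {
        attempts[0]++;
        throw new ConnectException("Connection refused");
      });
    } catch (ConnectException e) {
      System.out.println("bounded: gave up after " + attempts[0] + " attempts");
    }

    // Effectively unbounded policy: keeps going until the server answers.
    int[] calls = {0};
    FixedSleepRetry unbounded =
        new FixedSleepRetry(Integer.MAX_VALUE, 1, TimeUnit.MILLISECONDS);
    String reply = unbounded.call(() -> {
      if (++calls[0] < 25) {
        throw new ConnectException("Connection refused");
      }
      return "registered";
    });
    System.out.println("unbounded: " + reply + " after " + calls[0] + " attempts");
  }
}
```

The retry counter is checked before it is incremented, so a budget of Integer.MAX_VALUE cannot overflow.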

@vivekratnavel
Contributor Author

/label ozone

@vivekratnavel
Contributor Author

@elek elek added the ozone label Sep 11, 2019
@hadoop-yetus

💔 -1 overall

Vote  Subsystem     Runtime  Comment
0     reexec        41       Docker mode activated.
_ Prechecks _
+1    dupname       0        No case conflicting files found.
+1    @author       0        The patch does not contain any @author tags.
+1    test4tests    0        The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
0     mvndep        67       Maven dependency ordering for branch
+1    mvninstall    589      trunk passed
+1    compile       381      trunk passed
+1    checkstyle    83       trunk passed
+1    mvnsite       0        trunk passed
+1    shadedclient  868      branch has no errors when building and testing our client artifacts.
+1    javadoc       178      trunk passed
0     spotbugs      417      Used deprecated FindBugs config; considering switching to SpotBugs.
+1    findbugs      615      trunk passed
_ Patch Compile Tests _
0     mvndep        41       Maven dependency ordering for patch
+1    mvninstall    536      the patch passed
+1    compile       387      the patch passed
+1    javac         387      the patch passed
+1    checkstyle    90       the patch passed
+1    mvnsite       0        the patch passed
+1    whitespace    0        The patch has no whitespace issues.
+1    shadedclient  678      patch has no errors when building and testing our client artifacts.
+1    javadoc       175      the patch passed
+1    findbugs      631      the patch passed
_ Other Tests _
-1    unit          280      hadoop-hdds in the patch failed.
-1    unit          2824     hadoop-ozone in the patch failed.
+1    asflicense    55       The patch does not generate ASF License warnings.
      Total         8715
Reason              Tests
Failed junit tests  hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware
                    hadoop.ozone.container.TestContainerReplication
                    hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
                    hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion
                    hadoop.ozone.client.rpc.TestContainerStateMachineFailures
                    hadoop.ozone.client.rpc.Test2WayCommitInRatis
                    hadoop.ozone.TestSecureOzoneCluster
                    hadoop.ozone.scm.TestContainerSmallFile
                    hadoop.ozone.client.rpc.TestBlockOutputStream
                    hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
                    hadoop.ozone.om.TestOzoneManagerHA
Subsystem Report/Notes
Docker Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/Dockerfile
GITHUB PR #1424
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 2f98f8163e51 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / f8f8598
Default Java 1.8.0_222
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/patch-unit-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/testReport/
Max. process+thread count 5408 (vs. ulimit of 5500)
modules C: hadoop-hdds/container-service hadoop-ozone/ozone-manager U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

Contributor

@adoroszlai adoroszlai left a comment

Hi @vivekratnavel,

As far as I can see, the DataNode already retries forever because of the main loop in the state machine:

while (context.getState() != DatanodeStates.SHUTDOWN) {
  try {
    LOG.debug("Executing cycle Number : {}", context.getExecutionCount());
    long heartbeatFrequency = context.getHeartbeatFrequency();
    nextHB.set(Time.monotonicNow() + heartbeatFrequency);
    context.execute(executorService, heartbeatFrequency,
        TimeUnit.MILLISECONDS);
    now = Time.monotonicNow();
    if (now < nextHB.get()) {
      if (!Thread.interrupted()) {
        Thread.sleep(nextHB.get() - now);
      }
    }
  } catch (InterruptedException e) {
    // Someone has sent an interrupt signal. This could be because:
    // 1. A heartbeat should be triggered immediately.
    // 2. Shutdown has been initiated.
  } catch (Exception e) {
    LOG.error("Unable to finish the execution.", e);
  }
}

You can verify this by starting the DataNode without SCM and pointing the scm hostname at the DataNode's own IP address:

cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
docker-compose up -d datanode
docker-compose exec datanode bash -c "tail -1 /etc/hosts | sed 's/\t\+[a-z0-9]*$/ scm/' | sudo tee -a /etc/hosts"
docker-compose logs -f --tail=10 datanode

Result:

...
datanode_1  | 2019-09-11 12:29:39 INFO  Client:948 - Retrying connect to server: scm/192.168.0.2:9861. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 12:29:39 ERROR EndpointStateMachine:204 - Unable to communicate to SCM server at scm:9861 for past 300 seconds.
...
datanode_1  | 2019-09-11 12:29:40 INFO  Client:948 - Retrying connect to server: scm/192.168.0.2:9861. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
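The log above shows the client's retry counter going back to 0 right after the EndpointStateMachine error. This is the point of the comment: once the bounded client retries are exhausted, the failure is logged and the state machine's loop starts another cycle with a fresh retry budget. Below is a self-contained simulation of that nesting. It uses hypothetical names (NestedRetryDemo, scmUpAfter) and a maxCycles guard so the demo terminates. The real loop only stops when the state becomes SHUTDOWN.

```java
import java.net.ConnectException;

/**
 * Hypothetical simulation of bounded client retries nested inside an outer
 * loop that never lets a failure stop it (as the DataNode state machine's
 * main loop does). Names and numbers are illustrative, not Ozone code.
 */
public class NestedRetryDemo {
  static final int MAX_RETRIES = 10; // per-call client budget, as in the log

  private final long scmUpAfter; // SCM answers once this many attempts failed
  private long totalAttempts = 0;

  NestedRetryDemo(long scmUpAfter) {
    this.scmUpAfter = scmUpAfter;
  }

  /** Inner call: one attempt plus up to MAX_RETRIES retries, then gives up. */
  void connectWithRetries() throws ConnectException {
    for (int tried = 0; ; tried++) {
      totalAttempts++;
      if (totalAttempts > scmUpAfter) {
        return; // SCM is reachable now
      }
      if (tried >= MAX_RETRIES) {
        throw new ConnectException("Connection refused");
      }
      // A real client would sleep here (1000 ms in the log).
    }
  }

  /** Outer loop: logs the failure and starts a new cycle; returns the cycle. */
  int runUntilConnected(int maxCycles) {
    for (int cycle = 1; cycle <= maxCycles; cycle++) {
      try {
        connectWithRetries();
        return cycle;
      } catch (ConnectException e) {
        System.out.println("cycle " + cycle + ": " + e.getMessage());
      }
    }
    return -1; // demo guard only; the real loop runs until SHUTDOWN
  }

  long getTotalAttempts() {
    return totalAttempts;
  }

  public static void main(String[] args) {
    NestedRetryDemo demo = new NestedRetryDemo(25);
    int cycle = demo.runUntilConnected(100);
    System.out.println("connected in cycle " + cycle + " after "
        + demo.getTotalAttempts() + " attempts");
  }
}
```

With SCM becoming reachable after 25 failed attempts, each failing cycle uses 11 attempts, so the connection succeeds early in the third cycle. No single retry budget ever has to be unbounded for the DataNode as a whole to keep trying.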

@vivekratnavel
Contributor Author

@adoroszlai You are right. With this change, we no longer get the error from EndpointStateMachine, and the result now looks like this:

datanode_1  | 2019-09-11 18:16:55 INFO  InitDatanodeState:140 - DatanodeDetails is persisted to /data/datanode.id
datanode_1  | 2019-09-11 18:16:57 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:16:58 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:16:59 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:00 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:01 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:02 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:03 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:04 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:05 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:06 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:07 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:08 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:09 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:10 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:11 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:12 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 15 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:13 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 16 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:14 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 17 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:15 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 18 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:16 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 19 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:17 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 20 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:18 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 21 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:19 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 22 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:20 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 23 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:21 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 24 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:22 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 25 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:23 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 26 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:24 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 27 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:25 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 28 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:26 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 29 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:27 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 30 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:28 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 31 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:29 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 32 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:30 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 33 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:31 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 34 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:32 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 35 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:33 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 36 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:34 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 37 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:35 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 38 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:36 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 39 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:37 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 40 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:38 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 41 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:39 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 42 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:40 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 43 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:41 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 44 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:43 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 45 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:44 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 46 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:45 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 47 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:45 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 48 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:46 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:47 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 50 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:48 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 51 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:49 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 52 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:50 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 53 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:51 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 54 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:52 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 55 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:53 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 56 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:54 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 57 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:55 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 58 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:56 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 59 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:58 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 60 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:17:59 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 61 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:00 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 62 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:01 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 63 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:02 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 64 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:03 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 65 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:04 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 66 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:05 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 67 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:06 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 68 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:07 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 69 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:08 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 70 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:09 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 71 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:10 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 72 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:11 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 73 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:12 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 74 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:13 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 75 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:14 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 76 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:15 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 77 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
datanode_1  | 2019-09-11 18:18:16 INFO  Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 78 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS)
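The maxRetries=2147483647 in this log is Integer.MAX_VALUE. Here is a quick back-of-the-envelope check of why this amounts to retrying forever. It assumes the logged 1000 ms sleep and ignores the time spent on each connection attempt. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope check of how long a fixed-sleep retry budget lasts. */
public class RetryBudget {
  /** Total sleep time in seconds, ignoring time spent on connection attempts. */
  static long budgetSeconds(long maxRetries, long sleepMillis) {
    return TimeUnit.MILLISECONDS.toSeconds(maxRetries * sleepMillis);
  }

  /** Whole 365-day years covered by the given number of seconds. */
  static long approxYears(long seconds) {
    return TimeUnit.SECONDS.toDays(seconds) / 365;
  }

  public static void main(String[] args) {
    long before = budgetSeconds(10, 1000);               // maxRetries=10
    long after = budgetSeconds(Integer.MAX_VALUE, 1000); // maxRetries=2147483647
    System.out.println("before: " + before + " s");
    System.out.println("after: " + after + " s, about "
        + approxYears(after) + " years");
  }
}
```

The new budget works out to roughly 68 years of one-second sleeps, compared with about 10 seconds of sleeping under the previous maxRetries=10.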

@hanishakoneru
Contributor

Thank you @vivekratnavel for working on this.
LGTM. +1.
Can you please update the description? It states that DNs fail after 10 retries, which is not the case.

@vivekratnavel
Contributor Author

@hanishakoneru Sure.

@hanishakoneru
Contributor

Thank you @vivekratnavel. +1. I will commit it.

@hanishakoneru hanishakoneru merged commit 66bd168 into apache:trunk Sep 16, 2019
amahussein pushed a commit to amahussein/hadoop that referenced this pull request Oct 29, 2019
RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020

5 participants