Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0 #11070
Description
Describe the bug
Running the Kubernetes Helm chart with a 2.8.0 pulsar image, Zookeeper pod 0 runs but fails repeatedly. The rest of the Pulsar cluster is waiting on Zookeeper initialization. The Zookeeper health check is not passing "echo ruok | nc localhost 2181" is not returning "imok" - just hanging I think. So the kubernetes health check is failing repeatedly. Zookeeper is also complaining about not resolving zookeeper-1 and zookeeper-2 as usual but the zookeeper-0 log have a new NullReferenceException that we've not seen before:
10:58:55.990 [epollEventLoopGroup-4-2] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Exception caught
java.lang.NullPointerException: null
at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelActive(NettyServerCnxnFactory.java:258) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:522) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [io.netty-netty-transport-native-epoll-4.1.63.Final-linux-x86_64.jar:4.1.63.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
at java.lang.Thread.run(Thread.java:829) [?:?
To Reproduce
Pulsar 2.8.0 docker image on Kubernetes as deployed by Helm chart. Changing just the zookeeper image to 2.7.0 and the cluster came up.
Additional context
Possibly java 11 is part of the problem or just a bad version of Zookeeper?