ISSUE:
Pinot controller startup fails when ZooKeeper Kerberos authentication is enabled.
2026/02/19 10:37:12.994 ERROR [HelixControllerMain] [main] Exception while starting controller
org.apache.helix.zookeeper.zkclient.exception.ZkTimeoutException: Waiting to be connected to ZK server has timed out.
at org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession(ZkClient.java:1990)
at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:775)
at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:817)
at org.apache.helix.controller.HelixControllerMain.startHelixController(HelixControllerMain.java:159)
at org.apache.pinot.controller.helix.core.util.HelixSetupUtils.setupHelixController(HelixSetupUtils.java:131)
at org.apache.pinot.controller.BaseControllerStarter.setUpHelixController(BaseControllerStarter.java:469)
at org.apache.pinot.controller.BaseControllerStarter.start(BaseControllerStarter.java:440)
at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:118)
at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:87)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:240)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:293)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:239)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183)
at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:180)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:2031)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2469)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2423)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2425)
at picocli.CommandLine.execute(CommandLine.java:2174)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:174)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:210)
at org.apache.pinot.tools.admin.PinotController.main(PinotController.java:38)
ANALYSIS:
The method org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession waits for the ZooKeeper client to reach the SyncConnected state.
However, as the logs below show, with Kerberos (SASL) authentication enabled the client reports SyncConnected and then transitions to the SaslAuthenticated state.
This causes waitForEstablishedSession to time out, since the code specifically waits for SyncConnected and does not recognize SaslAuthenticated as a valid connected state.
2026/02/19 10:34:02.799 DEBUG [ZkClient] [main] zkclient5 Awaiting connection to Zookeeper server
2026/02/19 10:34:02.799 DEBUG [ZkClient] [main] zkclient 5, Waiting for keeper state SyncConnected
2026/02/19 10:34:02.836 DEBUG [ZkClient] [main-EventThread] zkclient 5, Received event: WatchedEvent state:SyncConnected type:None path:null zxid: -1
2026/02/19 10:34:02.836 INFO [ZkClient] [main-EventThread] zkclient 5, zookeeper state changed ( SyncConnected )
....
2026/02/19 10:34:02.852 DEBUG [ZkClient] [main-EventThread] zkclient 5, Received event: WatchedEvent state:SaslAuthenticated type:None path:null zxid: -1
2026/02/19 10:34:02.852 INFO [ZkClient] [main-EventThread] zkclient 5, zookeeper state changed ( SaslAuthenticated )
2026/02/19 10:34:02.852 DEBUG [ZkClient] [main-EventThread] zkclient 5 Leaving process event
SOLUTION:
The fix needs to be implemented in the Apache Helix codebase: Helix needs to handle SaslAuthenticated internally as a valid connected state. I will raise an issue with the Helix community and collaborate on resolving it.
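
To illustrate the idea, here is a minimal, self-contained sketch of the difference between a check that accepts only SyncConnected and one that also accepts SaslAuthenticated. It is not the Helix implementation: the `State` enum, the class name, and both helper methods are hypothetical stand-ins that only mirror the state names seen in the logs, and the actual change in ZkClient may look quite different.

```java
import java.util.EnumSet;
import java.util.Set;

public class ConnectedStateSketch {
    // Hypothetical stand-in for the ZooKeeper keeper states seen in the logs.
    // Only the states relevant to this issue are listed.
    enum State { DISCONNECTED, SYNC_CONNECTED, SASL_AUTHENTICATED, EXPIRED }

    // Assumed current behaviour: only SYNC_CONNECTED counts as connected.
    static boolean isConnectedStrict(State state) {
        return state == State.SYNC_CONNECTED;
    }

    // Assumed fix: SASL_AUTHENTICATED is also treated as a connected state.
    static final Set<State> CONNECTED_STATES =
            EnumSet.of(State.SYNC_CONNECTED, State.SASL_AUTHENTICATED);

    static boolean isConnectedLenient(State state) {
        return CONNECTED_STATES.contains(state);
    }

    public static void main(String[] args) {
        // Last state reported in the logs once Kerberos authentication completes.
        State current = State.SASL_AUTHENTICATED;
        System.out.println("strict:  " + isConnectedStrict(current));
        System.out.println("lenient: " + isConnectedLenient(current));
    }
}
```

With the strict check, a client whose last reported state is SaslAuthenticated is never considered connected, which matches the timeout described above. The lenient check accepts both states.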