Pinot controller startup fails with Zookeeper Kerberos authentication enabled #17729

@arshadmohammad

ISSUE:
Pinot controller startup fails with Zookeeper Kerberos authentication enabled.

2026/02/19 10:37:12.994 ERROR [HelixControllerMain] [main] Exception while starting controller
org.apache.helix.zookeeper.zkclient.exception.ZkTimeoutException: Waiting to be connected to ZK server has timed out.
at org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession(ZkClient.java:1990)
at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:775)
at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:817)
at org.apache.helix.controller.HelixControllerMain.startHelixController(HelixControllerMain.java:159)
at org.apache.pinot.controller.helix.core.util.HelixSetupUtils.setupHelixController(HelixSetupUtils.java:131)
at org.apache.pinot.controller.BaseControllerStarter.setUpHelixController(BaseControllerStarter.java:469)
at org.apache.pinot.controller.BaseControllerStarter.start(BaseControllerStarter.java:440)
at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:118)
at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:87)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:240)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:293)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:239)
at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183)
at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:180)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:2031)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2469)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2423)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2425)
at picocli.CommandLine.execute(CommandLine.java:2174)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:174)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:210)
at org.apache.pinot.tools.admin.PinotController.main(PinotController.java:38)

ANALYSIS:

The method org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession waits for the Zookeeper client to reach the SyncConnected state.
However, as the logs below show, the client receives SyncConnected and then, about 16 ms later, transitions to the SaslAuthenticated state.
This causes waitForEstablishedSession to time out, since the code waits specifically for SyncConnected and does not recognize SaslAuthenticated as a valid connected state.

2026/02/19 10:34:02.799 DEBUG [ZkClient] [main] zkclient5 Awaiting connection to Zookeeper server
2026/02/19 10:34:02.799 DEBUG [ZkClient] [main] zkclient 5, Waiting for keeper state SyncConnected
2026/02/19 10:34:02.836 DEBUG [ZkClient] [main-EventThread] zkclient 5, Received event: WatchedEvent state:SyncConnected type:None path:null zxid: -1
2026/02/19 10:34:02.836 INFO [ZkClient] [main-EventThread] zkclient 5, zookeeper state changed ( SyncConnected )
....
2026/02/19 10:34:02.852 DEBUG [ZkClient] [main-EventThread] zkclient 5, Received event: WatchedEvent state:SaslAuthenticated type:None path:null zxid: -1
2026/02/19 10:34:02.852 INFO [ZkClient] [main-EventThread] zkclient 5, zookeeper state changed ( SaslAuthenticated )
2026/02/19 10:34:02.852 DEBUG [ZkClient] [main-EventThread] zkclient 5 Leaving process event
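
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not Helix's ZkClient: the class `StateWaitSketch`, its methods, and the reduced `State` enum are hypothetical stand-ins. It assumes that each incoming event overwrites the tracked state and that the waiter compares strictly against SyncConnected, which is what the analysis above suggests.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a ZooKeeper client wrapper. NOT Helix's ZkClient.
class StateWaitSketch {
    // Reduced set of states: the two seen in the logs, plus a starting state.
    enum State { Disconnected, SyncConnected, SaslAuthenticated }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private State current = State.Disconnected;

    // Simulates the event thread: every event overwrites the tracked state (assumption).
    void onStateEvent(State next) {
        lock.lock();
        try {
            current = next;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until the tracked state equals `expected` exactly, or the timeout elapses.
    boolean waitForState(State expected, long timeoutMs) throws InterruptedException {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (current != expected) {
                if (remainingNanos <= 0) {
                    return false; // timed out
                }
                remainingNanos = stateChanged.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StateWaitSketch client = new StateWaitSketch();
        // Same event order as in the logs above.
        client.onStateEvent(State.SyncConnected);
        client.onStateEvent(State.SaslAuthenticated);
        // The strict comparison never matches once SaslAuthenticated is the tracked state.
        boolean connected = client.waitForState(State.SyncConnected, 200);
        System.out.println("connected=" + connected); // prints "connected=false"
    }
}
```

In this sketch the session is usable, yet the waiter still times out because the last event it sees is SaslAuthenticated.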

SOLUTION:
The fix needs to be implemented in the Apache Helix codebase: the client needs to handle SaslAuthenticated as an internal state rather than as a state that fails the connection check. I will raise an issue with the Helix community and collaborate to resolve it.
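
One possible shape for such a fix, sketched under assumptions: the class `ConnectedStateSketch` and the method `satisfies` are hypothetical and are not part of Helix or ZooKeeper. The sketch assumes that SaslAuthenticated is only reported on a live session, so a waiter asking for SyncConnected can accept it as well. The actual Helix change may look quite different.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only. NOT the Helix implementation.
class ConnectedStateSketch {
    // Reduced set of states for the example.
    enum State { Disconnected, SyncConnected, SaslAuthenticated, Expired }

    // Assumption: SASL authentication completes only on an established session,
    // so both states count as "connected" for the purpose of the wait.
    static final Set<State> CONNECTED =
        EnumSet.of(State.SyncConnected, State.SaslAuthenticated);

    // Returns true when `current` is good enough for a waiter expecting `expected`.
    static boolean satisfies(State current, State expected) {
        if (expected == State.SyncConnected) {
            return CONNECTED.contains(current);
        }
        return current == expected;
    }

    public static void main(String[] args) {
        System.out.println(satisfies(State.SaslAuthenticated, State.SyncConnected)); // true
        System.out.println(satisfies(State.Disconnected, State.SyncConnected));      // false
    }
}
```

An alternative with the same effect would be to keep the tracked connection state unchanged when an authentication-only event arrives.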
