
zoo-0.zoo is not defined in the deployment #203

Closed

Description

@samv

The example/default configuration file lists 5 servers, including:

server.4=zoo-0.zoo:2888:3888:participant

This looks like a mistake. It happens to work because with 5 servers defined the quorum size is 3, so the 3 nodes that actually exist in the statefulset are just enough to form a quorum. However, it is extremely fragile: any outage of a single node leaves only 2 of the required 3 and brings the ZK cluster (and hence the Kafka deployment) hard down, e.g. bootstrap times out:

$ kafkacat -b k8s.internal.example.com:32401 -L
% ERROR: Failed to acquire metadata: Local: Broker transport failure

Observed log lines from zookeeper:

Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
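For anyone reproducing this, here's a quick way to see which ensemble members are actually serving and which one (if any) is leader. This is only a sketch: the label selector is a guess at whatever the zookeeper statefulset uses, it assumes `nc` is available in the image, and newer ZooKeeper versions may need `srvr` whitelisted via `4lw.commands.whitelist`:

```sh
# Ask each zookeeper pod for its status via the `srvr` four-letter command.
# A healthy ensemble shows one "Mode: leader" and the rest "Mode: follower";
# a node outside quorum reports that it is not currently serving requests.
for pod in $(kubectl get pods -l app=zookeeper -o name); do
  echo "== ${pod#pod/} =="
  kubectl exec "${pod#pod/}" -- sh -c 'echo srvr | nc localhost 2181' \
    | grep -E 'Mode:|not currently serving' || echo "no response"
done
```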

I'll test this theory soon by removing those two server entries and checking whether ZK stays happy through a single statefulset node failure.
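For reference, a minimal sketch of what I expect the ensemble definition to look like after dropping the two extra entries; the hostnames are placeholders for the pod-N.headless-service names the statefulset actually creates:

```
server.1=<pod-0>.<service>:2888:3888:participant
server.2=<pod-1>.<service>:2888:3888:participant
server.3=<pod-2>.<service>:2888:3888:participant
```

With only the three real servers defined, quorum drops to 2 of 3, so the ensemble can survive a single pod failure.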

While I'm here: the statefulsets should also use the Parallel pod management policy, so that if e.g. broker 0 goes down, recovery of brokers 1+ isn't serialized behind it, and the system can recover from multi-node failures faster. A sketch follows below.
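It's a one-line change on each statefulset spec. A minimal sketch (the names, labels, and image here are placeholders, not the actual manifests from this repo):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka                     # placeholder name
spec:
  serviceName: broker             # assumed headless service name
  replicas: 3
  podManagementPolicy: Parallel   # create/replace pods in parallel instead of ordinal order
  selector:
    matchLabels:
      app: kafka                  # assumed label
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: broker
          image: example/kafka    # placeholder image
```

Note that `podManagementPolicy` is immutable on an existing statefulset, so switching it means recreating the statefulset object (the pods can be left running if it's deleted without cascading).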
