LeaderNotAvailable on a simple ephemeral Kafka instance on local kubernetes cluster #636

Open
Maximus-1991 opened this issue Jan 3, 2021 · 0 comments

Maximus-1991 commented Jan 3, 2021

Hi,

I recently started learning Kafka and Kubernetes. To learn more about setting up Kubernetes and Kafka clusters, I started a simple project involving a single-node Kubernetes 'cluster'. I intentionally did not opt for Minikube, as I want to be able to add more nodes to the cluster once I have solved the error described below. The goal is merely to learn how Kubernetes and Kafka work. Unfortunately, I ran into an error that I have not been able to solve after days of trying, so I hope someone can kindly help me out here. The issue is also posted on Stack Overflow (https://stackoverflow.com/questions/65540179/problem-running-simple-ephemeral-kafka-instance-on-local-kubernetes-cluster).

I installed Kubernetes using kubeadm and use Calico as the CNI plugin.

OS:
Ubuntu 18.04.4 LTS (Bionic Beaver)

Kubernetes version

kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:45:16Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Process to reproduce

  1. Turn off swap memory (swapoff -a)
  2. Manually remove etcd (rm -rf /var/lib/etcd)
  3. Reset kubernetes cluster (kubeadm reset)
  4. kubeadm init --apiserver-advertise-address=192.168.1.37 --pod-network-cidr=10.200.0.0/16. Note that I did not use 192.168.0.0/16 as the pod-network-cidr (as described here: https://docs.projectcalico.org/getting-started/kubernetes/quickstart), since it conflicts with my home network.
  5. mkdir -p $HOME/.kube
  6. cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  7. kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
  8. kubectl create -f custom-resources.yaml. Note: Same as https://docs.projectcalico.org/manifests/custom-resources.yaml, except I changed cidr:192.168.0.0/16 to cidr: 10.200.0.0/16
  9. kubectl taint nodes --all node-role.kubernetes.io/master-
  10. kubectl apply -f zookeeper-service.yaml (see code below)
  11. kubectl apply -f kafka-service.yaml (see code below)
  12. kubectl apply -f zookeeper-deployment.yaml (see code below)
  13. kubectl apply -f kafka-deployment.yaml (see code below)
  14. kubectl describe pod | grep IP --> use this pod IP as the bootstrap server in kafka-producer.yaml (a more targeted command is sketched after this list)
  15. kubectl apply -f kafka-producer.yaml (see code below)
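
For step 14, a more targeted way to grab the broker pod IP (a sketch, assuming the app: kafka, id: "0" labels from kafka-deployment.yaml below):

kubectl get pods -l app=kafka,id=0 -o jsonpath='{.items[0].status.podIP}'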

zookeeper-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: zoo1
  labels:
    app: zookeeper-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-1

kafka-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: kafka-service
  labels:
    name: kafka
spec:
  ports:
  - port: 9092
    name: kafka-port
    protocol: TCP
  selector:
    app: kafka
    id: "0"
  type: LoadBalancer
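
A quick way to confirm that this Service's selector actually matches the broker pod is to look at its endpoints (a sketch, not part of the original setup):

kubectl get endpoints kafka-service
kubectl describe service kafka-service

If the ENDPOINTS column is empty, the selector (app: kafka, id: "0") is not matching the pod labels.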

zookeeper-deployment.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: zookeeper-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper-1
  template:
    metadata:
      labels:
        app: zookeeper-1
    spec:
      containers:
      - name: zoo1
        image: wurstmeister/zookeeper:3.4.6
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zoo1

kafka-deployment.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka
      id: "0"
  template:
    metadata:
      labels:
        app: kafka
        id: "0"
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka:2.12-2.2.0
        ports:
        - containerPort: 9092
        - containerPort: 9093
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: KAFKA_ADVERTISED_LISTENERS
          value: INSIDE://kafka:9093, OUTSIDE://$(MY_POD_IP):9092
        - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
          value: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
        - name: KAFKA_LISTENERS
          value: INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
        - name: KAFKA_INTER_BROKER_LISTENER_NAME
          value: INSIDE
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181
        - name: KAFKA_BROKER_ID
          value: "0"
        - name: KAFKA_CREATE_TOPICS
          value: testtopic25:1:1
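
The INSIDE/OUTSIDE listener split above determines which host/port combinations the broker registers in ZooKeeper and hands back to clients in metadata responses. What the broker actually registered can be checked from inside the broker pod (a sketch, using the ZooKeeper shell bundled with the Kafka image):

kubectl exec -it <kafka broker pod> -- /bin/bash
zookeeper-shell.sh zoo1:2181
get /brokers/ids/0

The endpoints field in the output should list INSIDE://kafka:9093 and OUTSIDE://<pod IP>:9092; whichever of these a client is told to use has to be resolvable and reachable from that client.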

kafka-producer.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: producer-app-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: producer-app
  template:
    metadata:
      labels:
        app: producer-app
    spec:
      containers:
      - name: producer-app
        image: maximus1991/streamer_app
        ports:
        - containerPort: 2181
        env:
        - name: ENV_BOOTSTRAP_SERVER
          value: <KAFKA BROKER POD IP>:9092
        - name: PYTHONUNBUFFERED
          value: "1"
        - name: PYTHONIOENCODING
          value: "UTF-8"
        command: [ "/bin/bash" ]
        args: [ "-c", "echo 'Giving Kafka a bit of time to start up'; sleep 30; echo 'Starting now'; python /usr/src/app/main.py --topic testtopic25" ]

main.py

import argparse
import os

from json import dumps
from kafka import KafkaProducer

parser = argparse.ArgumentParser()
parser.add_argument('--topic', help='Kafka topic')
args = parser.parse_args()
topic = args.topic
bootstrap_server = os.environ['ENV_BOOTSTRAP_SERVER']

producer = KafkaProducer(bootstrap_servers=bootstrap_server,
                         value_serializer=lambda x: dumps(x).encode('utf-8'))

data = {'price':150} # Hardcoded for now, will be replaced with sensor data
producer.send(topic=topic, value=data)
producer.flush()

kafka-producer Dockerfile

FROM python:3
WORKDIR /usr/src/app
COPY . .
RUN pip install pipenv
RUN pipenv install --system --deploy

Error

All services and pods are up and running, except that the pod running the kafka-producer keeps failing:

(Screenshots: kube-system pods, Calico pods, and service status.)

Logs of kafka-producer pod:

File "/usr/src/app/main.py", line 18, in <module>
        sc.run()
      File "/usr/local/lib/python3.9/sched.py", line 151, in run
        action(*argument, **kwargs)
      File "/usr/src/app/src/data/send_data.py", line 17, in send_data
        producer.send(topic=topic, value=data)
      File "/usr/local/lib/python3.9/site-packages/kafka/producer/kafka.py", line 576, in send
        self._wait_on_metadata(topic, self.config['max_block_ms'] / 1000.0)
      File "/usr/local/lib/python3.9/site-packages/kafka/producer/kafka.py", line 702, in _wait_on_metadata
        raise Errors.KafkaTimeoutError(
    kafka.errors.KafkaTimeoutError: KafkaTimeoutError: Failed to update metadata after 60.0 secs.

Logs of kafka-broker pod (only the last lines are included, as the log is quite long):

[2021-01-02 13:09:07,893] INFO [GroupCoordinator 0]: Starting up. (kafka.coordinator.group.GroupCoordinator)
[2021-01-02 13:09:07,894] INFO [GroupCoordinator 0]: Startup complete. (kafka.coordinator.group.GroupCoordinator)
[2021-01-02 13:09:07,897] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 3 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-01-02 13:09:07,902] INFO [ProducerId Manager 0]: Acquired new producerId block (brokerId:0,blockStartProducerId:0,blockEndProducerId:999) by writing to Zk with path version 1 (kafka.coordinator.transaction.ProducerIdManager)
[2021-01-02 13:09:07,918] INFO [TransactionCoordinator id=0] Starting up. (kafka.coordinator.transaction.TransactionCoordinator)
[2021-01-02 13:09:07,919] INFO [Transaction Marker Channel Manager 0]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2021-01-02 13:09:07,919] INFO [TransactionCoordinator id=0] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[2021-01-02 13:09:07,948] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2021-01-02 13:09:07,956] INFO [SocketServer brokerId=0] Started data-plane processors for 2 acceptors (kafka.network.SocketServer)
[2021-01-02 13:09:07,959] INFO Kafka version: 2.2.0 (org.apache.kafka.common.utils.AppInfoParser)
[2021-01-02 13:09:07,960] INFO Kafka commitId: 05fcfde8f69b0349 (org.apache.kafka.common.utils.AppInfoParser)
[2021-01-02 13:09:07,960] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
creating topics: testtopic25:1:1
Created topic testtopic25.
[2021-01-02 13:19:07,894] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-01-02 13:29:07,894] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

Logs of zookeeper pod:

2021-01-02 13:08:48,630 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /opt/zookeeper-3.4.6/bin/../conf/zoo.cfg
2021-01-02 13:08:48,630 [myid:] - INFO  [main:ZooKeeperServerMain@95] - Starting server
2021-01-02 13:08:48,633 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=zookeeper-deploy-666b4657cd-m87rw
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.7.0_65
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/opt/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.6/bin/../conf:
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2021-01-02 13:08:48,634 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=amd64
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=5.5.2-050502-generic
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
2021-01-02 13:08:48,636 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/opt/zookeeper-3.4.6
2021-01-02 13:08:48,637 [myid:] - INFO  [main:ZooKeeperServer@755] - tickTime set to 2000
2021-01-02 13:08:48,637 [myid:] - INFO  [main:ZooKeeperServer@764] - minSessionTimeout set to -1
2021-01-02 13:08:48,637 [myid:] - INFO  [main:ZooKeeperServer@773] - maxSessionTimeout set to -1
2021-01-02 13:08:48,643 [myid:] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2021-01-02 13:09:07,233 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.200.194.69:45848
2021-01-02 13:09:07,236 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.200.194.69:45848
2021-01-02 13:09:07,238 [myid:] - INFO  [SyncThread:0:FileTxnLog@199] - Creating new log file: log.1
2021-01-02 13:09:07,249 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x176c336f9890000 with negotiated timeout 6000 for client /10.200.194.69:45848
2021-01-02 13:09:07,292 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x176c336f9890000 type:create cxid:0x2 zxid:0x3 txntype:-1 reqpath:n/a Error Path:/brokers Error:KeeperErrorCode = NoNode for /brokers
2021-01-02 13:09:07,296 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x176c336f9890000 type:create cxid:0x6 zxid:0x7 txntype:-1 reqpath:n/a Error Path:/config Error:KeeperErrorCode = NoNode for /config
2021-01-02 13:09:07,300 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x176c336f9890000 type:create cxid:0x9 zxid:0xa txntype:-1 reqpath:n/a Error Path:/admin Error:KeeperErrorCode = NoNode for /admin
2021-01-02 13:09:07,438 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x176c336f9890000 type:create cxid:0x15 zxid:0x15 txntype:-1 reqpath:n/a Error Path:/cluster Error:KeeperErrorCode = NoNode for /cluster
2021-01-02 13:09:07,958 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@592] - Got user-level KeeperException when processing sessionid:0x176c336f9890000 type:multi cxid:0x38 zxid:0x1c txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election
2021-01-02 13:09:17,204 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.200.194.69:45918
2021-01-02 13:09:17,205 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.200.194.69:45918
2021-01-02 13:09:17,208 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x176c336f9890001 with negotiated timeout 30000 for client /10.200.194.69:45918
2021-01-02 13:09:17,394 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x176c336f9890001 type:setData cxid:0x4 zxid:0x1e txntype:-1 reqpath:n/a Error Path:/config/topics/testtopic25 Error:KeeperErrorCode = NoNode for /config/topics/testtopic25
2021-01-02 13:09:17,422 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x176c336f9890001
2021-01-02 13:09:17,424 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.200.194.69:45918 which had sessionid 0x176c336f9890001

The ZooKeeper logs also seem to show an error:
KeeperErrorCode = NoNode for ...

I read many Stack Overflow posts, blogs, YouTube videos, etc. to figure out what causes this, but haven't found a solution yet. I assume that the Kafka broker cannot be reached, which seems to be confirmed by what I tried in the Troubleshooting section below.

Troubleshooting

I tried disabling ufw, but the issue still persisted.
When I log in to the shell of the kafka-producer pod, I can ping the IP address of the Kafka broker pod (a TCP-level check is sketched after the listing below).
When I log in to the shell of the Kafka broker pod, I can ping the IP address of the ZooKeeper pod and I can list the topics:

kubectl exec -it <kafka broker pod> -- /bin/bash
kafka-topics.sh --zookeeper zoo1:2181 --list
testtopic25
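
Since ping only proves ICMP reachability, a TCP-level check of the broker port from the producer pod might be more telling (a sketch using the Python standard library already present in the producer image; <kafka broker pod IP> is a placeholder):

kubectl exec -it <producer pod> -- python -c "import socket; socket.create_connection(('<kafka broker pod IP>', 9092), timeout=5); print('port 9092 reachable')"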

However, when I try to send a message to this topic, I get an error:

bash-4.4# kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic25
foobar 
[2021-01-02 13:50:59,711] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 82 : {testtopic25=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2021-01-02 13:50:59,798] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 83 : {testtopic25=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2021-01-02 13:50:59,904] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 84 : {testtopic25=LEADER_NOT_AVAILABLE} 
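
LEADER_NOT_AVAILABLE typically means the client's metadata for the topic has no leader assigned. Describing the topic from the same broker shell should show whether a leader and ISR were actually assigned on the broker side (a sketch):

kafka-topics.sh --zookeeper zoo1:2181 --describe --topic testtopic25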

Does anyone have an idea what causes these errors? Your help is much appreciated!
