
Unable to start service etcd: Job for etcd.service - resolved urls do not match #11560

Open · Uncurlhalo opened this issue Sep 21, 2024 · 3 comments · May be fixed by #11725

Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments


Uncurlhalo commented Sep 21, 2024

What happened?

While attempting to stand up a k8s cluster with 3 control-plane nodes and 3 workers, the playbook failed when ensuring that etcd is running.

What did you expect to happen?

I expected the playbook to complete successfully and etcd to be running on the control-plane nodes.

How can we reproduce it (as minimally and precisely as possible)?

Install Kubespray via ansible-galaxy per the documentation and create a playbook, cluster-install.yml, with the following contents.

- name: Install Kubernetes
  ansible.builtin.import_playbook: kubernetes_sigs.kubespray.cluster

Create six VMs, provisioning each VM with one public IP and one private IP (in my use case, "public" means public to my LAN, not a host-only network on my hypervisor).

Define these IPs as shown in my Ansible inventory.

Execute the playbook with the command ansible-playbook -i inventory/inventory.ini --become --become-user=root cluster-install.yml

OS

Linux 6.10.9-200.fc40.x86_64 x86_64
NAME="Fedora Linux"
VERSION="40 (Workstation Edition)"
ID=fedora
VERSION_ID=40
VERSION_CODENAME=""
PLATFORM_ID="platform:f40"
PRETTY_NAME="Fedora Linux 40 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:40"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f40/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=40
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=40
SUPPORT_END=2025-05-13
VARIANT="Workstation Edition"
VARIANT_ID=workstation

Version of Ansible

ansible [core 2.16.10]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/jmelton/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.12/site-packages/ansible
ansible collection location = /home/jmelton/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.12.5 (main, Aug 23 2024, 00:00:00) [GCC 14.2.1 20240801 (Red Hat 14.2.1-1)] (/usr/bin/python3)
jinja version = 3.1.4
libyaml = True

Version of Python

Python 3.12.5

Version of Kubespray (commit)

89ff071

Network plugin used

calico

Full inventory with variables

[all]
k8s-control-plane-0 ansible_host=192.168.1.200 ansible_user=k8s-node ip=192.168.1.200 access_ip=10.0.1.100
k8s-control-plane-1 ansible_host=192.168.1.201 ansible_user=k8s-node ip=192.168.1.201 access_ip=10.0.1.101
k8s-control-plane-2 ansible_host=192.168.1.202 ansible_user=k8s-node ip=192.168.1.202 access_ip=10.0.1.102
k8s-worker-node-0 ansible_host=192.168.1.210 ansible_user=k8s-node ip=192.168.1.210 access_ip=10.0.1.200
k8s-worker-node-1 ansible_host=192.168.1.211 ansible_user=k8s-node ip=192.168.1.211 access_ip=10.0.1.201
k8s-worker-node-2 ansible_host=192.168.1.212 ansible_user=k8s-node ip=192.168.1.212 access_ip=10.0.1.202

[kube_control_plane]
k8s-control-plane-0
k8s-control-plane-1
k8s-control-plane-2

[etcd]
k8s-control-plane-0
k8s-control-plane-1
k8s-control-plane-2

[kube_node]
k8s-worker-node-0
k8s-worker-node-1
k8s-worker-node-2

[k8s_cluster:children]
kube_node
kube_control_plane

Command used to invoke ansible

ansible-playbook -i inventory/inventory.ini --become --become-user=root cluster-install.yml

Output of ansible run

TASK [kubernetes_sigs.kubespray.etcd : Configure | Ensure etcd is running] *****************************************************************************************************************************************************************************************************************************************************************************************
fatal: [k8s-control-plane-2]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xeu etcd.service\" for details.\n"}
fatal: [k8s-control-plane-1]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xeu etcd.service\" for details.\n"}
fatal: [k8s-control-plane-0]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xeu etcd.service\" for details.\n"}

PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************************************************************************************************
k8s-control-plane-0        : ok=414  changed=82   unreachable=0    failed=1    skipped=422  rescued=0    ignored=1   
k8s-control-plane-1        : ok=403  changed=79   unreachable=0    failed=1    skipped=403  rescued=0    ignored=1   
k8s-control-plane-2        : ok=403  changed=79   unreachable=0    failed=1    skipped=403  rescued=0    ignored=1   
k8s-worker-node-0          : ok=352  changed=56   unreachable=0    failed=0    skipped=434  rescued=0    ignored=0   
k8s-worker-node-1          : ok=327  changed=56   unreachable=0    failed=0    skipped=377  rescued=0    ignored=0   
k8s-worker-node-2          : ok=327  changed=56   unreachable=0    failed=0    skipped=377  rescued=0    ignored=0   

Anything else we need to know

A similar issue was opened here, but the user never triaged the problem. Additional relevant information from the control-plane nodes follows.

Output of journalctl -xeu etcd.service

Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706328-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ADVERTISE_CLIENT_URLS","variable-value":"https://10.0.1.100:2379"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706572-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_AUTO_COMPACTION_RETENTION","variable-value":"8"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706607-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_CERT_FILE","variable-value":"/etc/ssl/etcd/ssl/member-k8s-control-plane-0.pem"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706627-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_CLIENT_CERT_AUTH","variable-value":"true"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706653-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_DATA_DIR","variable-value":"/var/lib/etcd"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706676-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ELECTION_TIMEOUT","variable-value":"5000"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706696-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ENABLE_V2","variable-value":"true"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706732-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_EXPERIMENTAL_INITIAL_CORRUPT_CHECK","variable-value":"True"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706761-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_EXPERIMENTAL_WATCH_PROGRESS_NOTIFY_INTERVAL","variable-value":"5s"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706785-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_HEARTBEAT_INTERVAL","variable-value":"250"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.70681-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_ADVERTISE_PEER_URLS","variable-value":"https://10.0.1.100:2380"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706827-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.201:2380,etcd3=https://192.168.1.202:2380"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706845-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"new"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706868-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_TOKEN","variable-value":"k8s_etcd"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706886-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_KEY_FILE","variable-value":"/etc/ssl/etcd/ssl/member-k8s-control-plane-0-key.pem"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706923-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LISTEN_CLIENT_URLS","variable-value":"https://192.168.1.200:2379,https://127.0.0.1:2379"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706948-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LISTEN_PEER_URLS","variable-value":"https://192.168.1.200:2380"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.706991-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_METRICS","variable-value":"basic"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707008-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_NAME","variable-value":"etcd1"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707028-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_CERT_FILE","variable-value":"/etc/ssl/etcd/ssl/member-k8s-control-plane-0.pem"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707045-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_CLIENT_CERT_AUTH","variable-value":"True"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707067-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_KEY_FILE","variable-value":"/etc/ssl/etcd/ssl/member-k8s-control-plane-0-key.pem"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707083-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_TRUSTED_CA_FILE","variable-value":"/etc/ssl/etcd/ssl/ca.pem"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707102-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PROXY","variable-value":"off"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707163-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_SNAPSHOT_COUNT","variable-value":"10000"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707191-0400","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_TRUSTED_CA_FILE","variable-value":"/etc/ssl/etcd/ssl/ca.pem"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"warn","ts":"2024-09-21T13:09:42.7074-0400","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707472-0400","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/usr/local/bin/etcd"]}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707648-0400","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"warn","ts":"2024-09-21T13:09:42.70771-0400","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707734-0400","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://192.168.1.200:2380"]}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.707794-0400","caller":"embed/etcd.go:494","msg":"starting with peer TLS","tls-info":"cert = /etc/ssl/etcd/ssl/member-k8s-control-plane-0.pem, key = /etc/ssl/etcd/ssl/member-k8s-control-plane-0-key.pem, client-cert=, client-key=, trusted-ca = /etc/ssl/etcd/ssl/ca.pem, client-cert-auth = true, crl-file = ","cipher-suites":[]}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.709855-0400","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://127.0.0.1:2379","https://192.168.1.200:2379"]}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.710197-0400","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.12","git-sha":"e7b3bb6cc","go-version":"go1.20.13","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":false,"name":"etcd1","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":false,"heartbeat-interval":"250ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://10.0.1.100:2380"],"listen-peer-urls":["https://192.168.1.200:2380"],"advertise-client-urls":["https://10.0.1.100:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://192.168.1.200:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.201:2380,etcd3=https://192.168.1.202:2380","initial-cluster-state":"new","initial-cluster-token":"k8s_etcd","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"8h0m0s","auto-compaction-interval":"8h0m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.713733-0400","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"3.030533ms"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.715255-0400","caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"https://10.0.1.100:2380","host":"10.0.1.100:2380","resolved-addr":"10.0.1.100:2380"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.715354-0400","caller":"netutil/netutil.go:112","msg":"resolved URL Host","url":"https://192.168.1.200:2380","host":"192.168.1.200:2380","resolved-addr":"192.168.1.200:2380"}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.722485-0400","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"etcd1","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://10.0.1.100:2380"],"advertise-client-urls":["https://10.0.1.100:2379"]}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"info","ts":"2024-09-21T13:09:42.722713-0400","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"etcd1","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://10.0.1.100:2380"],"advertise-client-urls":["https://10.0.1.100:2379"]}
Sep 21 13:09:42 k8s-control-plane-0 etcd[11167]: {"level":"fatal","ts":"2024-09-21T13:09:42.722785-0400","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"--initial-cluster has etcd1=https://192.168.1.200:2380 but missing from --initial-advertise-peer-urls=https://10.0.1.100:2380 (resolved urls: \"https://10.0.1.100:2380\" != \"https://192.168.1.200:2380\")","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:250"}
Sep 21 13:09:42 k8s-control-plane-0 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE

The only thing of note seems to be resolved urls: "https://10.0.1.100:2380" != "https://192.168.1.200:2380". I'm going to attempt recreating the cluster with only the public IPs defined via ansible_host, but I'm pretty sure that was working fine the prior day.
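To make the mismatch concrete, here is a minimal reconstruction of the two conflicting settings, taken verbatim from the journalctl output above (the path /etc/etcd.env is an assumption about where Kubespray writes the etcd environment file; the values themselves are from the log):

# Excerpt reconstructed from the log above; the /etc/etcd.env location is assumed.
# The advertised peer URL appears to be built from access_ip...
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://10.0.1.100:2380
# ...while the cluster membership list appears to be built from ip.
ETCD_INITIAL_CLUSTER=etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.201:2380,etcd3=https://192.168.1.202:2380

etcd refuses to start because the URL it advertises for etcd1 does not appear in the initial cluster list, which is exactly the "resolved urls ... do not match" fatal error above.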

Uncurlhalo added the kind/bug label on Sep 21, 2024
Uncurlhalo (Author) commented:

Just an update: if the cluster is created without access_ip defined, everything works as expected.
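Concretely, the working inventory is the one above with the access_ip entries removed, e.g. (a sketch of the [all] group only; the other groups are unchanged):

[all]
k8s-control-plane-0 ansible_host=192.168.1.200 ansible_user=k8s-node ip=192.168.1.200
k8s-control-plane-1 ansible_host=192.168.1.201 ansible_user=k8s-node ip=192.168.1.201
k8s-control-plane-2 ansible_host=192.168.1.202 ansible_user=k8s-node ip=192.168.1.202
k8s-worker-node-0 ansible_host=192.168.1.210 ansible_user=k8s-node ip=192.168.1.210
k8s-worker-node-1 ansible_host=192.168.1.211 ansible_user=k8s-node ip=192.168.1.211
k8s-worker-node-2 ansible_host=192.168.1.212 ansible_user=k8s-node ip=192.168.1.212

With access_ip gone, etcd presumably derives both ETCD_INITIAL_ADVERTISE_PEER_URLS and ETCD_INITIAL_CLUSTER from ip, so the two lists agree.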


pavlozt commented Oct 9, 2024

I have the same problem when mixing an internal ip and a real access_ip in a cloud environment.

kubespray commit 0b64ab1 (tag v2.24.3)
ansible core 2.15.12
python 3.11.3

k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 7, 2025