
Cluster cannot start after rebooting all pods #683

Open
@chernomor

Description

Report

I set up a cluster according to https://docs.percona.com/percona-operator-for-mysql/ps/kubectl.html on a single-node k3s. All MySQL pods were working fine, but after I rebooted the k3s node, the MySQL cluster cannot start.

More about the problem

I touched the file /var/lib/mysql/sleep-forever in the master pod cluster1-mysql-0, and it is running now, but the slaves are in CrashLoopBackOff:

$ kubectl -n mysql-test get pods -o wide
NAME                                             READY   STATUS             RESTARTS          AGE   IP           NODE                                  NOMINATED NODE   READINESS GATES
cluster1-haproxy-0                               2/2     Running            4 (18h ago)       22h   10.42.0.54   rt-chernomor   <none>           <none>
cluster1-haproxy-1                               2/2     Running            4 (18h ago)       22h   10.42.0.51   rt-chernomor   <none>           <none>
cluster1-haproxy-2                               2/2     Running            4 (18h ago)       22h   10.42.0.52   rt-chernomor   <none>           <none>
percona-server-mysql-operator-78ccf4bd45-67p2j   1/1     Running            2 (18h ago)       22h   10.42.0.47   rt-chernomor   <none>           <none>
cluster1-orc-0                                   2/2     Running            4 (18h ago)       22h   10.42.0.48   rt-chernomor   <none>           <none>
cluster1-orc-2                                   2/2     Running            4 (18h ago)       22h   10.42.0.50   rt-chernomor   <none>           <none>
cluster1-orc-1                                   2/2     Running            4 (18h ago)       22h   10.42.0.53   rt-chernomor   <none>           <none>
cluster1-mysql-0                                 3/3     Running            454 (41m ago)     18h   10.42.0.56   rt-chernomor   <none>           <none>
cluster1-mysql-2                                 2/3     CrashLoopBackOff   464 (2m24s ago)   18h   10.42.0.49   rt-chernomor   <none>           <none>
cluster1-mysql-1                                 1/3     CrashLoopBackOff   468 (26s ago)     18h   10.42.0.55   rt-chernomor   <none>           <none>

Some bootstrap logs from a slave pod (cluster1-mysql-1):

$ kubectl -n mysql-test exec -it cluster1-mysql-1  -- tail -f /var/lib/mysql/bootstrap.log
Defaulted container "mysql" out of: mysql, xtrabackup, pt-heartbeat, mysql-init (init)
2024/06/26 07:54:31 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 07:54:41 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 07:54:41 bootstrap finished in 0.003150 seconds
2024/06/26 07:54:41 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 07:54:51 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 07:54:51 bootstrap finished in 0.003226 seconds
2024/06/26 07:54:51 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 07:55:01 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 07:55:01 bootstrap finished in 0.003110 seconds
2024/06/26 07:55:01 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 08:00:31 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 08:00:31 bootstrap finished in 0.003058 seconds
2024/06/26 08:00:31 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 08:00:41 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 08:00:41 bootstrap finished in 0.002679 seconds
2024/06/26 08:00:41 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 08:00:51 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 08:00:51 bootstrap finished in 0.003255 seconds
2024/06/26 08:00:51 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 08:01:01 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 08:01:01 bootstrap finished in 0.003455 seconds
2024/06/26 08:01:01 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
2024/06/26 08:01:11 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/26 08:01:11 bootstrap finished in 0.002918 seconds
2024/06/26 08:01:11 bootstrap failed: select donor: connect to 10-42-0-49.cluster1-mysql-unready.mysql-test: ping DB: dial tcp 10.42.0.49:33062: connect: connection refused
command terminated with exit code 137
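Every attempt in this log fails at the same step: donor selection tries 10-42-0-49 (10.42.0.49 is cluster1-mysql-2 in the pod list above, itself in CrashLoopBackOff) and gets "connection refused" on port 33062, which is MySQL 8.0's default admin_port. The log suggests the bootstrap gives up on the first unreachable peer instead of moving on to 10-42-0-56 (cluster1-mysql-0, the running pod), but that is an inference from the log, not something confirmed from the operator source. A minimal diagnostic sketch, assuming bash with /dev/tcp support and coreutils `timeout` are available where it runs; `check_peer` is a hypothetical helper name, and the peer names are copied from the log:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: report which bootstrap peers accept a TCP
# connection on the port the bootstrap log dials (33062).
# Assumes bash /dev/tcp redirection and coreutils `timeout` are available.

check_peer() {
  host="$1"
  port="$2"
  # Open a TCP connection via bash's /dev/tcp; give up after 2 seconds.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
  fi
}

# Peer names as printed in the "Peers:" lines of bootstrap.log.
for peer in \
  10-42-0-49.cluster1-mysql-unready.mysql-test \
  10-42-0-55.cluster1-mysql-unready.mysql-test \
  10-42-0-56.cluster1-mysql-unready.mysql-test; do
  check_peer "$peer" 33062
done
```

The peer names only resolve inside the cluster, so the check has to run from a pod, for example from a shell opened with `kubectl -n mysql-test exec -it cluster1-mysql-0 -c mysql -- bash`. Run outside the cluster, every peer will simply report unreachable.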

Steps to reproduce

  1. Set up a MySQL cluster on a single node.
  2. Reboot the node.
  3. The MySQL pods do not start.

Versions

  1. Kubernetes
    k3s version v1.29.5+k3s1 (4e53a323)
    go version go1.21.9

  2. Operator
    83b9f60, v0.7.0

  3. Database
    mysql Ver 8.0.36-28 for Linux on x86_64 (Percona Server (GPL), Release 28, Revision 47601f19)
