Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot clone slaves after restore from big backup #740

Open
chernomor opened this issue Sep 3, 2024 · 1 comment
Open

Cannot clone slaves after restore from big backup #740

chernomor opened this issue Sep 3, 2024 · 1 comment

Comments

@chernomor
Copy link

Report

I create empty async cluster with 3 nodes and restore from backup. Master is alive and running, but slave cannot make clone.

More about the problem

...
Containers:
  mysql:
    Container ID:  containerd://5304dd2d181eca0d388fb606370d35d42f639a3bf70d6ed883525bbea2e39b61
    Image:         percona/percona-server:8.0.36-28
    Image ID:      docker.io/percona/percona-server@sha256:423acd206f94b34288d10ed041c3ba42543e26e44f3706621320504a010dd41f
    Ports:         3306/TCP, 33062/TCP, 33060/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/percona/ps-entrypoint.sh
    Args:
      mysqld
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 03 Sep 2024 17:37:54 +0300
      Finished:     Tue, 03 Sep 2024 17:38:32 +0300
    Ready:          False
    Restart Count:  11
    Limits:
      memory:  24G
    Requests:
      memory:   20G
    Liveness:   exec [/opt/percona/healthcheck liveness] delay=15s timeout=10s period=10s #success=1 #failure=3
    Readiness:  exec [/opt/percona/healthcheck readiness] delay=30s timeout=3s period=5s #success=1 #failure=3
    Startup:    exec [/opt/percona/bootstrap] delay=15s timeout=43200s period=10s #success=1 #failure=1
...

Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       21m                 default-scheduler  Successfully assigned racktables-prestable/racktables-prestable-mysql-1 to iva-n01.net
  Normal   AddedInterface  21m                 multus             Add eth0 [fcff:0:675:1:c278:c2c4:8308:288/128] from k8s-pod-network
  Normal   Pulling         21m                 kubelet            Pulling image "registry/percona-server-mysql-operator:0.8.0-debug5"
  Normal   Pulled          21m                 kubelet            Successfully pulled image "registry/percona-server-mysql-operator:0.8.0-debug5" in 247ms (247ms including waiting). Image size: 101676545 bytes.
  Normal   Created         21m                 kubelet            Created container mysql-init
  Normal   Started         21m                 kubelet            Started container mysql-init
  Normal   Pulled          21m                 kubelet            Successfully pulled image "percona/percona-server:8.0.36-28" in 1.665s (1.665s including waiting). Image size: 403423459 bytes.
  Normal   Pulling         21m                 kubelet            Pulling image "percona/percona-xtrabackup:8.0.35-31"
  Normal   Started         21m                 kubelet            Started container xtrabackup
  Normal   Pulled          21m                 kubelet            Successfully pulled image "percona/percona-xtrabackup:8.0.35-31" in 1.492s (1.492s including waiting). Image size: 311446308 bytes.
  Normal   Created         21m                 kubelet            Created container xtrabackup
  Normal   Started         21m                 kubelet            Started container pt-heartbeat
  Normal   Pulled          21m                 kubelet            Successfully pulled image "percona/percona-toolkit:3.6.0" in 1.519s (1.519s including waiting). Image size: 268173151 bytes.
  Warning  Unhealthy       20m                 kubelet            Startup probe failed:
  Normal   Killing         20m                 kubelet            Container mysql failed startup probe, will be restarted
  Normal   Pulling         19m (x2 over 21m)   kubelet            Pulling image "percona/percona-server:8.0.36-28"
  Normal   Started         19m (x2 over 21m)   kubelet            Started container mysql
  Normal   Created         19m (x2 over 21m)   kubelet            Created container mysql
  Normal   Pulled          19m                 kubelet            Successfully pulled image "percona/percona-server:8.0.36-28" in 1.608s (1.608s including waiting). Image size: 403423459 bytes.
  Normal   Pulling         19m (x2 over 21m)   kubelet            Pulling image "percona/percona-toolkit:3.6.0"
  Normal   Created         19m (x2 over 21m)   kubelet            Created container pt-heartbeat
  Normal   Pulled          19m                 kubelet            Successfully pulled image "percona/percona-toolkit:3.6.0" in 1.751s (1.751s including waiting). Image size: 268173151 bytes.
  Warning  BackOff         11m (x41 over 19m)  kubelet            Back-off restarting failed container pt-heartbeat in pod racktables-prestable-mysql-1_racktables-prestable(ebd7c624-abcd-489f-aae1-f9870fd097fc)
  Warning  BackOff         78s (x70 over 16m)  kubelet            Back-off restarting failed container mysql in pod racktables-prestable-mysql-1_racktables-prestable(ebd7c624-abcd-489f-aae1-f9870fd097fc)
2024-09-03T14:18:50.181793Z 0 [System] [MY-013292] [Server] Admin interface ready for connections, address: 'fcff:0:675:1:c278:c2c4:8308:288'  port: 33062
2024-09-03T14:19:07.310339Z 11 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2024-09-03T14:19:07.394163Z 11 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Source ACK Connect.'
2024-09-03T14:19:07.394248Z 11 [Note] [MY-013457] [InnoDB] Clone Apply Begin Master Version Check
2024-09-03T14:19:07.426148Z 11 [Note] [MY-013457] [InnoDB] Clone Apply Version End Master Task ID: 0 Passed, code: 0: 
2024-09-03T14:19:07.594923Z 11 [Note] [MY-013457] [InnoDB] Clone Apply Begin Master Task
2024-09-03T14:19:07.595131Z 11 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
2024-09-03T14:19:07.595153Z 11 [Note] [MY-011977] [InnoDB] Clone Drop all user data
2024-09-03T14:19:07.797571Z 11 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 191 task: 0
2024-09-03T14:19:07.801136Z 11 [Note] [MY-011977] [InnoDB] Clone Drop User schemas
2024-09-03T14:19:07.801377Z 11 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 4 task: 0
2024-09-03T14:19:07.801490Z 11 [Note] [MY-011977] [InnoDB] Clone Drop User tablespaces
2024-09-03T14:19:07.803175Z 11 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 6 task: 0
2024-09-03T14:19:07.944354Z 11 [Note] [MY-011977] [InnoDB] Clone Drop: finished successfully
2024-09-03T14:19:07.944403Z 11 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished
2024-09-03T14:19:07.949864Z 11 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_INIT.'
2024-09-03T14:19:07.970329Z 11 [Note] [MY-013458] [InnoDB] Clone Apply State Change : Number of tasks = 1
2024-09-03T14:19:07.970613Z 11 [Note] [MY-013458] [InnoDB] Clone Apply State FILE COPY: 
2024-09-03T14:19:07.970751Z 11 [Note] [MY-011978] [InnoDB] Clone estimated size: 200.85 GiB Available space: 272.84 GiB
2024-09-03T14:19:13.188566Z 11 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Tune Threads from: 1 to: 2 prev: 1 target: 2.'
2024-09-03T14:19:13.267361Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2024-09-03T14:19:13.268066Z 0 [Note] [MY-013457] [InnoDB] Clone Apply Begin Task ID: 1
2024-09-03T14:19:13.268180Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_ATTACH.'
2024-09-03T14:19:18.406307Z 11 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Tune continue, Data: 33 MiB/sec, Target: 28 MiB/sec.'
2024-09-03T14:19:18.406368Z 11 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Tune Threads from: 2 to: 4 prev: 2 target: 4.'
2024-09-03T14:19:18.558648Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2024-09-03T14:19:18.558665Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2024-09-03T14:19:18.561384Z 0 [Note] [MY-013457] [InnoDB] Clone Apply Begin Task ID: 2
2024-09-03T14:19:18.561446Z 0 [Note] [MY-013457] [InnoDB] Clone Apply Begin Task ID: 3
2024-09-03T14:19:18.561498Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_ATTACH.'
2024-09-03T14:19:18.561593Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_ATTACH.'
2024-09-03T14:19:21.236974Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.36-28).
2024-09-03T14:19:21.238005Z 0 [Note] [MY-010067] [Server] Giving 4 client threads a chance to die gracefully
2024-09-03T14:19:21.238081Z 0 [Note] [MY-010117] [Server] Shutting down replica threads
2024-09-03T14:19:21.238099Z 0 [Note] [MY-010054] [Server] Event Scheduler: Killing the scheduler thread, thread id 7
2024-09-03T14:19:21.238112Z 0 [Note] [MY-010050] [Server] Event Scheduler: Waiting for the scheduler thread to reply
2024-09-03T14:19:21.238258Z 0 [Note] [MY-010048] [Server] Event Scheduler: Stopped
2024-09-03T14:19:21.305639Z 11 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1317 Saved Error code: 0
2024-09-03T14:19:21.305721Z 11 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1317 Saved Error code: 1317
2024-09-03T14:19:21.307106Z 11 [Note] [MY-013458] [InnoDB] Clone Set Error code: 1317 Saved Error code: 1317
2024-09-03T14:19:21.307141Z 11 [Note] [MY-013458] [InnoDB] Clone Apply set error code: 1317: Query execution was interrupted

Steps to reproduce

  1. create empty async cluster
  2. restore from big backup
  3. slave is broken

Versions

  1. Kubernetes v1.30.3
  2. Operator percona-server-mysql-operator:0.8.0 (with some debug patches)
  3. Database percona/percona-server:8.0.36-28

Anything else?

No response

@chernomor
Copy link
Author

# less bootstrap.log
...
2024/09/03 17:24:47 Clone required: true
2024/09/03 17:24:47 Checking if a clone in progress
2024/09/03 17:24:47 Clone in progress: false
2024/09/03 17:24:47 Cloning from racktables-prestable-mysql-0.racktables-prestable-mysql.racktables-prestable
2024/09/03 17:24:57 bootstrap finished in 10.125126 seconds
2024/09/03 17:24:57 bootstrap failed: clone from donor racktables-prestable-mysql-0.racktables-prestable-mysql.racktables-prestable: clone instance: invalid connection

It's like to one of these timeouts:

"readTimeout": "10s",

chernomor pushed a commit to chernomor/percona-server-mysql-operator that referenced this issue Sep 5, 2024
percona#740

2024/09/03 17:24:47 Cloning from c1-mysql-0.mysql.c1
2024/09/03 17:24:57 bootstrap failed: clone from donor c1-mysql-0.mysql.c1: clone instance: invalid connection
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant