Skip to content

5.5 server recovers to wrong position #226

Open
@temeo

Description

@temeo

Observed during seesaw test. A node2 was killed and recovered to rejoin the cluster via IST:

2015-11-01 09:17:52 INFO Sleeping for 3 secs
2015-11-01 09:17:55 INFO Sending signal SIGKILL to node2
2015-11-01 09:17:56 INFO Sleeping for 1 secs
2015-11-01 09:17:57 INFO Stopping node2
2015-11-01 09:17:57 INFO Sending signal SIGTERM to node2
2015-11-01 09:17:59 INFO Recovering node2 position
2015-11-01 09:18:03 INFO Reading recovery result
2015-11-01 09:18:03 INFO Start position 080cac6f-8070-11e5-8d16-9fb7d1087a45:219204
2015-11-01 09:18:03 INFO Starting node node2
2015-11-01 09:18:09 ERROR Caught exception:

However, joining failed because of applying error:

151101 10:18:05 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 34,
        members    = 2/3 (joined/total),
        act_id     = 221515,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 080cac6f-8070-11e5-8d16-9fb7d1087a45
151101 10:18:05 [Note] WSREP: Flow-control interval: [28, 28]
151101 10:18:05 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 221515)
151101 10:18:05 [Note] WSREP: State transfer required: 
        Group state: 080cac6f-8070-11e5-8d16-9fb7d1087a45:221515
        Local state: 080cac6f-8070-11e5-8d16-9fb7d1087a45:219204
151101 10:18:05 [Note] WSREP: New cluster view: global state: 080cac6f-8070-11e5
-8d16-9fb7d1087a45:221515, view# 35: Primary, number of nodes: 3, my index: 1, p
rotocol version 3
151101 10:18:05 [Warning] WSREP: Gap in state sequence. Need state transfer.
151101 10:18:05 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --addres
s '192.168.3.91:10023' --datadir '/home/galera/galera/node2/mysql/var/' --defaul
ts-file '/home/galera/galera/node2/mysql/etc/my.cnf' --parent '9301''
151101 10:18:05 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.0
0418S), skipping check
151101 10:18:05 [Note] WSREP: Prepared SST request: rsync|192.168.3.91:10023/rsy
nc_sst
151101 10:18:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notifica
tion.
151101 10:18:05 [Note] WSREP: REPL Protocols: 7 (3, 2)
151101 10:18:05 [Note] WSREP: Service thread queue flushed.
151101 10:18:05 [Note] WSREP: Assign initial position for certification: 221515,
 protocol version: 3
151101 10:18:05 [Note] WSREP: Service thread queue flushed.
151101 10:18:05 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168
.3.91:10022
151101 10:18:05 [Note] WSREP: Member 1.2 (node2) requested state transfer from '
*any*'. Selected 0.1 (node1)(SYNCED) as donor.
151101 10:18:05 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 221543)
151101 10:18:05 [Note] WSREP: Requesting state transfer: success, donor: 0
151101 10:18:05 [Note] WSREP: 0.1 (node1): State transfer to 1.2 (node2) complet
e.
151101 10:18:05 [Note] WSREP: Member 0.1 (node1) synced with group.
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 9496 (20151101 10:18:06.382)
WSREP_SST: [INFO] Joiner cleanup done. (20151101 10:18:06.901)
151101 10:18:06 [Note] WSREP: SST complete, seqno: 219204
151101 10:18:06 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=mysqld-bin' to avoid this problem.
151101 10:18:06 [Note] Plugin 'FEDERATED' is disabled.
151101 10:18:06 InnoDB: The InnoDB memory heap is disabled
151101 10:18:06 InnoDB: Mutexes and rw_locks use GCC atomic builtins
151101 10:18:06 InnoDB: Compressed tables use zlib 1.2.3
151101 10:18:06 InnoDB: Using Linux native AIO
151101 10:18:06 InnoDB: Initializing buffer pool, size = 128.0M
151101 10:18:06 InnoDB: Completed initialization of buffer pool
151101 10:18:06 InnoDB: highest supported file format is Barracuda.
151101 10:18:06  InnoDB: Waiting for the background threads to start
151101 10:18:07 InnoDB: 5.5.46 started; log sequence number 362665593
151101 10:18:08 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3302
151101 10:18:08 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
151101 10:18:08 [Note] Server socket created on IP: '0.0.0.0'.
151101 10:18:08 [Warning] 'proxies_priv' entry '@ root@ubuntu14' ignored in --skip-name-resolve mode.
151101 10:18:08 [Note] Event Scheduler: Loaded 0 events
151101 10:18:08 [Note] WSREP: Signalling provider to continue.
151101 10:18:08 [Note] WSREP: SST received: 080cac6f-8070-11e5-8d16-9fb7d1087a45:219204
151101 10:18:08 [Note] WSREP: Receiving IST: 2311 writesets, seqnos 219204-221515
151101 10:18:08 [Note] /home/galera/galera/node2/mysql/bin/mysqld: ready for connections.
Version: '5.5.46-log'  socket: '/home/galera/galera/node2/mysql/var/mysqld.sock'  port: 3302  MySQL Community Server (GPL), wsrep_25.12
151101 10:18:08 [ERROR] Slave SQL: Could not execute Write_rows event on table sysbench.sbtest4; Duplicate entry '3075' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 1205, Error_code: 1062
151101 10:18:08 [Warning] WSREP: RBR event 6 Write_rows apply warning: 121, 219205
151101 10:18:08 [Warning] WSREP: Failed to apply app buffer: seqno: 219205, status: 1
         at /home/jenkins/workspace/sssc-5.5/galera/galera/src/trx_handle.cpp:apply():351

Binlog from other node shows that the key entry 3075 is indeed applied in write set 219205

### UPDATE `sysbench`.`sbtest4`
### WHERE
###   @1=5002
###   @2=4995
###   @3='56360806138-90132390881-15796735454-29532277838-80305007101-1535649073
8-98613121489-26023348076-08247015366-64992154846'
###   @4='71015468605-53065053547-14284252033-33559493384-92272010741'
### SET
###   @1=5002
###   @2=4995
###   @3='21689572257-77126353993-95955501546-32645804962-70655432614-7424518978
4-58426193306-79961393837-60209028869-48767204011'
###   @4='71015468605-53065053547-14284252033-33559493384-92272010741'
# at 21713138
#151101 10:17:56 server id 1  end_log_pos 21713195      Table_map: `sysbench`.`s
btest4` mapped to number 33
# at 21713195
#151101 10:17:56 server id 1  end_log_pos 21713413      Write_rows: table id 33 
flags: STMT_END_F

BINLOG '
tMo1VhMBAAAAOQAAACtRSwEAACEAAAAAAAEACHN5c2JlbmNoAAdzYnRlc3Q0AAQDA/7+BP54/jwA
tMo1VhcBAAAA2gAAAAVSSwEAACEAAAAAAAEABP/wAwwAAHwTAAB3NDQ0NDI4OTk4NjQtNTk2NDYx
NDU5NjgtOTIwNDE4ODUyNDEtNDQ4MTg5ODU0MjItNjIyOTY4NzUwMzItNTI0Mzc1MTU3NjEtMjM5
MDk2MTQ4MDAtMDkzOTQwOTc2MzctNzk4NzE3MzU5MDItMDY1MDU2NDA2NjQ7MzA5NjE1NDAyODkt
Nzk4OTMxOTg4NDktMTcyMjA4MzE2NDAtNjgyMTA4MzY1MzMtNjg2ODk5MDM0OTQ=
'/*!*/;
### INSERT INTO `sysbench`.`sbtest4`
### SET
###   @1=3075
###   @2=4988
###   @3='44442899864-59646145968-92041885241-44818985422-62296875032-52437515761-23909614800-09394097637-79871735902-06505640664'
###   @4='30961540289-79893198849-17220831640-68210836533-68689903494'
# at 21713413
#151101 10:17:56 server id 1  end_log_pos 21713440      Xid = 219205

The same binlog didn't contain any other prim key 3075 references on the same table.

It seems that the transaction gets committed without updating tablespace header properly.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions