Open
Description
Observed during seesaw test. A node2 was killed and recovered to rejoin the cluster via IST:
2015-11-01 09:17:52 INFO Sleeping for 3 secs
2015-11-01 09:17:55 INFO Sending signal SIGKILL to node2
2015-11-01 09:17:56 INFO Sleeping for 1 secs
2015-11-01 09:17:57 INFO Stopping node2
2015-11-01 09:17:57 INFO Sending signal SIGTERM to node2
2015-11-01 09:17:59 INFO Recovering node2 position
2015-11-01 09:18:03 INFO Reading recovery result
2015-11-01 09:18:03 INFO Start position 080cac6f-8070-11e5-8d16-9fb7d1087a45:219204
2015-11-01 09:18:03 INFO Starting node node2
2015-11-01 09:18:09 ERROR Caught exception:
However, joining failed because of applying error:
151101 10:18:05 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 34,
members = 2/3 (joined/total),
act_id = 221515,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 080cac6f-8070-11e5-8d16-9fb7d1087a45
151101 10:18:05 [Note] WSREP: Flow-control interval: [28, 28]
151101 10:18:05 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 221515)
151101 10:18:05 [Note] WSREP: State transfer required:
Group state: 080cac6f-8070-11e5-8d16-9fb7d1087a45:221515
Local state: 080cac6f-8070-11e5-8d16-9fb7d1087a45:219204
151101 10:18:05 [Note] WSREP: New cluster view: global state: 080cac6f-8070-11e5
-8d16-9fb7d1087a45:221515, view# 35: Primary, number of nodes: 3, my index: 1, p
rotocol version 3
151101 10:18:05 [Warning] WSREP: Gap in state sequence. Need state transfer.
151101 10:18:05 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --addres
s '192.168.3.91:10023' --datadir '/home/galera/galera/node2/mysql/var/' --defaul
ts-file '/home/galera/galera/node2/mysql/etc/my.cnf' --parent '9301''
151101 10:18:05 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.0
0418S), skipping check
151101 10:18:05 [Note] WSREP: Prepared SST request: rsync|192.168.3.91:10023/rsy
nc_sst
151101 10:18:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notifica
tion.
151101 10:18:05 [Note] WSREP: REPL Protocols: 7 (3, 2)
151101 10:18:05 [Note] WSREP: Service thread queue flushed.
151101 10:18:05 [Note] WSREP: Assign initial position for certification: 221515,
protocol version: 3
151101 10:18:05 [Note] WSREP: Service thread queue flushed.
151101 10:18:05 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168
.3.91:10022
151101 10:18:05 [Note] WSREP: Member 1.2 (node2) requested state transfer from '
*any*'. Selected 0.1 (node1)(SYNCED) as donor.
151101 10:18:05 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 221543)
151101 10:18:05 [Note] WSREP: Requesting state transfer: success, donor: 0
151101 10:18:05 [Note] WSREP: 0.1 (node1): State transfer to 1.2 (node2) complet
e.
151101 10:18:05 [Note] WSREP: Member 0.1 (node1) synced with group.
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 9496 (20151101 10:18:06.382)
WSREP_SST: [INFO] Joiner cleanup done. (20151101 10:18:06.901)
151101 10:18:06 [Note] WSREP: SST complete, seqno: 219204
151101 10:18:06 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=mysqld-bin' to avoid this problem.
151101 10:18:06 [Note] Plugin 'FEDERATED' is disabled.
151101 10:18:06 InnoDB: The InnoDB memory heap is disabled
151101 10:18:06 InnoDB: Mutexes and rw_locks use GCC atomic builtins
151101 10:18:06 InnoDB: Compressed tables use zlib 1.2.3
151101 10:18:06 InnoDB: Using Linux native AIO
151101 10:18:06 InnoDB: Initializing buffer pool, size = 128.0M
151101 10:18:06 InnoDB: Completed initialization of buffer pool
151101 10:18:06 InnoDB: highest supported file format is Barracuda.
151101 10:18:06 InnoDB: Waiting for the background threads to start
151101 10:18:07 InnoDB: 5.5.46 started; log sequence number 362665593
151101 10:18:08 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3302
151101 10:18:08 [Note] - '0.0.0.0' resolves to '0.0.0.0';
151101 10:18:08 [Note] Server socket created on IP: '0.0.0.0'.
151101 10:18:08 [Warning] 'proxies_priv' entry '@ root@ubuntu14' ignored in --skip-name-resolve mode.
151101 10:18:08 [Note] Event Scheduler: Loaded 0 events
151101 10:18:08 [Note] WSREP: Signalling provider to continue.
151101 10:18:08 [Note] WSREP: SST received: 080cac6f-8070-11e5-8d16-9fb7d1087a45:219204
151101 10:18:08 [Note] WSREP: Receiving IST: 2311 writesets, seqnos 219204-221515
151101 10:18:08 [Note] /home/galera/galera/node2/mysql/bin/mysqld: ready for connections.
Version: '5.5.46-log' socket: '/home/galera/galera/node2/mysql/var/mysqld.sock' port: 3302 MySQL Community Server (GPL), wsrep_25.12
151101 10:18:08 [ERROR] Slave SQL: Could not execute Write_rows event on table sysbench.sbtest4; Duplicate entry '3075' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 1205, Error_code: 1062
151101 10:18:08 [Warning] WSREP: RBR event 6 Write_rows apply warning: 121, 219205
151101 10:18:08 [Warning] WSREP: Failed to apply app buffer: seqno: 219205, status: 1
at /home/jenkins/workspace/sssc-5.5/galera/galera/src/trx_handle.cpp:apply():351
Binlog from other node shows that the key entry 3075 is indeed applied in write set 219205
### UPDATE `sysbench`.`sbtest4`
### WHERE
### @1=5002
### @2=4995
### @3='56360806138-90132390881-15796735454-29532277838-80305007101-1535649073
8-98613121489-26023348076-08247015366-64992154846'
### @4='71015468605-53065053547-14284252033-33559493384-92272010741'
### SET
### @1=5002
### @2=4995
### @3='21689572257-77126353993-95955501546-32645804962-70655432614-7424518978
4-58426193306-79961393837-60209028869-48767204011'
### @4='71015468605-53065053547-14284252033-33559493384-92272010741'
# at 21713138
#151101 10:17:56 server id 1 end_log_pos 21713195 Table_map: `sysbench`.`s
btest4` mapped to number 33
# at 21713195
#151101 10:17:56 server id 1 end_log_pos 21713413 Write_rows: table id 33
flags: STMT_END_F
BINLOG '
tMo1VhMBAAAAOQAAACtRSwEAACEAAAAAAAEACHN5c2JlbmNoAAdzYnRlc3Q0AAQDA/7+BP54/jwA
tMo1VhcBAAAA2gAAAAVSSwEAACEAAAAAAAEABP/wAwwAAHwTAAB3NDQ0NDI4OTk4NjQtNTk2NDYx
NDU5NjgtOTIwNDE4ODUyNDEtNDQ4MTg5ODU0MjItNjIyOTY4NzUwMzItNTI0Mzc1MTU3NjEtMjM5
MDk2MTQ4MDAtMDkzOTQwOTc2MzctNzk4NzE3MzU5MDItMDY1MDU2NDA2NjQ7MzA5NjE1NDAyODkt
Nzk4OTMxOTg4NDktMTcyMjA4MzE2NDAtNjgyMTA4MzY1MzMtNjg2ODk5MDM0OTQ=
'/*!*/;
### INSERT INTO `sysbench`.`sbtest4`
### SET
### @1=3075
### @2=4988
### @3='44442899864-59646145968-92041885241-44818985422-62296875032-52437515761-23909614800-09394097637-79871735902-06505640664'
### @4='30961540289-79893198849-17220831640-68210836533-68689903494'
# at 21713413
#151101 10:17:56 server id 1 end_log_pos 21713440 Xid = 219205
The same binlog didn't contain any other prim key 3075 references on the same table.
It seems that the transaction gets committed without updating tablespace header properly.