HA: Reduce deadlocks via exclusive locking (SELECT ... FOR UPDATE)
#830
Within the HA transaction, we always retrieve rows from the icingadb_instance table to determine which instance is responsible. Previously, this selection used a shared lock (SELECT ... LOCK IN SHARE MODE). Selecting rows is generally not a problem on its own, but both Icinga DB instances subsequently perform an upsert on the same table, which resulted in deadlocks most of the time. To reduce deadlocks on both sides, the selected rows need an exclusive lock, which SELECT ... FOR UPDATE provides. However, deadlocks can still occur if the icingadb_instance table is empty and no rows are available to lock exclusively.
MySQL/MariaDB Tests
Empty Table
Note that none of the variants can prevent deadlocks when the table is empty.
START TRANSACTION;
START TRANSACTION;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23 LOCK IN SHARE MODE;
Empty set (0.001 sec)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42 LOCK IN SHARE MODE;
Empty set (0.001 sec)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON DUPLICATE KEY UPDATE id=VALUES(id), environment_id=VALUES(environment_id), heartbeat=VALUES(heartbeat), responsible=VALUES(responsible);
The first transaction's upsert (id 23) waited for a lock until TxB (id 42) was aborted due to the deadlock.
Query OK, 1 row affected (1.668 sec)
INSERT INTO icingadb_instance VALUES (42, 1, 123, 'y') ON DUPLICATE KEY UPDATE id=VALUES(id),environment_id=VALUES(environment_id),heartbeat=VALUES(heartbeat),responsible=VALUES(responsible);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
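Since the empty-table case still ends with one transaction being aborted, the caller has to be prepared to run the aborted transaction again, as the error message suggests. The following is a minimal Python sketch of such a retry loop, not Icinga DB's actual implementation; DatabaseError, run_with_retry, and the backoff parameters are hypothetical names chosen for illustration.

```python
import random
import time

# SQLSTATE 40001 is the class shown in the MySQL/MariaDB error above
# (ERROR 1213 (40001)). PostgreSQL reports serialization failures as 40001
# and detected deadlocks as 40P01.
RETRYABLE_SQLSTATES = {"40001", "40P01"}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver error that exposes an SQLSTATE."""

    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(transaction, max_attempts=5, base_delay=0.01):
    """Call transaction() until it succeeds or a non-retryable error occurs.

    transaction must run the whole unit of work (begin, statements, commit),
    because nothing from an aborted attempt is kept.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            if err.sqlstate not in RETRYABLE_SQLSTATES or attempt == max_attempts:
                raise
            # Randomised, growing delay so that both instances are less
            # likely to collide again in lockstep.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))


# Usage: a simulated upsert that is aborted twice before it succeeds.
attempts = []


def flaky_upsert():
    attempts.append(1)
    if len(attempts) < 3:
        raise DatabaseError("40001", "Deadlock found when trying to get lock")
    return "committed"


result = run_with_retry(flaky_upsert)
```

The sketch retries only on the listed SQLSTATE values and re-raises everything else, so unrelated errors are not hidden by the loop.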
Non Empty Table
Before
Result: same as with an empty table. One transaction is aborted with a deadlock while the other succeeds, typically the one that starts its upsert statement first.
After
START TRANSACTION;
START TRANSACTION;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23 FOR UPDATE;
Empty set (0.003 sec)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42 FOR UPDATE;
Blocked 🚫 (in waiting state)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON DUPLICATE KEY UPDATE id=VALUES(id), environment_id=VALUES(environment_id), heartbeat=VALUES(heartbeat), responsible=VALUES(responsible);
Query OK, 0 rows affected (0.001 sec)
COMMIT;
Query OK, 0 rows affected (0.001 sec)
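One way to read the MySQL/MariaDB results above is as a lock-upgrade problem. The toy model below is a simplified Python sketch under that assumption: it treats the contended part of the table as a single resource with shared (S) and exclusive (X) modes, and it does not model InnoDB's gap or next-key locking. LockTable and its methods are hypothetical names, not part of Icinga DB or any database API.

```python
from collections import defaultdict


class LockTable:
    """Toy lock manager for one resource with shared (S) and exclusive (X) modes."""

    def __init__(self):
        self.holders = {}                   # transaction name -> "S" or "X"
        self.waits_for = defaultdict(set)   # transaction name -> names it waits on

    def request(self, txn, mode):
        """Grant the lock if compatible, otherwise record who txn waits for."""
        # S is compatible only with S; X is compatible with nothing.
        blockers = {
            other for other, held in self.holders.items()
            if other != txn and (mode == "X" or held == "X")
        }
        if blockers:
            self.waits_for[txn] = blockers
            return False
        self.holders[txn] = mode
        self.waits_for.pop(txn, None)
        return True

    def deadlocked(self):
        """True if two transactions wait on each other."""
        return any(
            txn in self.waits_for.get(other, ())
            for txn, others in self.waits_for.items()
            for other in others
        )


# Before: both transactions take a shared lock for the SELECT,
# then both need an exclusive lock for the upsert.
before = LockTable()
before.request("TxA", "S")   # granted
before.request("TxB", "S")   # granted, S is compatible with S
before.request("TxA", "X")   # waits for TxB's shared lock
before.request("TxB", "X")   # waits for TxA's shared lock
before_deadlock = before.deadlocked()

# After: the SELECT already requests the exclusive lock.
after = LockTable()
after.request("TxA", "X")                 # granted
txb_granted = after.request("TxB", "X")   # blocked until TxA finishes
after_deadlock = after.deadlocked()
```

In this model the exclusive request up front turns a mutual wait into a simple queue: TxB blocks while TxA holds the lock, which corresponds to the "Blocked" state in the After transcript.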
PostgreSQL Tests
Empty Table
Before
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23;
(0 rows)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42;
(0 rows)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;
INSERT 0 1
INSERT INTO icingadb_instance VALUES (42, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;
INSERT 0 1
COMMIT;
Committed successfully
COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
After
Same result as before! The first transaction is successful, but the second one fails with:
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
Non Empty Table
Before and After (same result)
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23 FOR UPDATE;
(0 rows)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42 FOR UPDATE;
(1 row)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;
INSERT 0 0
INSERT INTO icingadb_instance VALUES (42, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;
INSERT 0 1
COMMIT;
Commit succeeded
COMMIT;
Commit succeeded
PS: this is basically the same as #788 but adds some before and after test cases as requested in #788 (comment).
closes #788