Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HA: Reduce deadlocks via exclusive locking (SELECT ... FOR UPDATE) #830

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Oct 25, 2024

Within the HA transaction, we always retrieve rows from the icingadb_instance table to determine responsibility between instances. Previously, this selection was performed using a shared lock (SELECT ... LOCK IN SHARE MODE). Although selecting rows is generally not an issue, both Icinga DB instances perform an upsert on the same table afterward, resulting in deadlocks most of the time. In order to reduce the deadlocks on both sides, an exclusive lock on the selected rows is necessary, which can be achieved using the SELECT ... FOR UPDATE command. However, deadlocks can sill occur if the icingadb_instance table is empty and no rows are available to lock exclusively.

MySQL/MariaDB Tests

MariaDB [test_test]> show create table icingadb_instance\G
*************************** 1. row ***************************
       Table: icingadb_instance
Create Table: CREATE TABLE `icingadb_instance` (
  `id` int(11) NOT NULL,
  `environment_id` int(11) NOT NULL,
  `heartbeat` int(11) NOT NULL,
  `responsible` enum('n','y') NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=DYNAMIC
Empty Table
MariaDB [test_test]> SELECT * FROM icingadb_instance;
Empty set (0.001 sec)

Note that none of the variants can prevent deadlocks when the table is empty.

Tx A Tx B
START TRANSACTION; START TRANSACTION;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23 LOCK IN SHARE MODE;
Empty set (0.001 sec)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42 LOCK IN SHARE MODE;
Empty set (0.001 sec)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON DUPLICATE KEY UPDATE id=VALUES(id), environment_id=VALUES(environment_id), heartbeat=VALUES(heartbeat), responsible=VALUES(responsible);

It was waiting for a lock until TxB was aborted due to the deadlock.

Query OK, 1 row affected (1.668 sec)
INSERT INTO icingadb_instance VALUES (42, 1, 123, 'y') ON DUPLICATE KEY UPDATE id=VALUES(id),environment_id=VALUES(environment_id),heartbeat=VALUES(heartbeat),responsible=VALUES(responsible);

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
MariaDB [test_test]> SELECT * FROM icingadb_instance;
+----+----------------+-----------+-------------+
| id | environment_id | heartbeat | responsible |
+----+----------------+-----------+-------------+
| 23 |              1 |       123 | y           |
+----+----------------+-----------+-------------+
Non Empty Table

Before

MariaDB [test_test]> SELECT * FROM icingadb_instance;
+----+----------------+-----------+-------------+
| id | environment_id | heartbeat | responsible |
+----+----------------+-----------+-------------+
| 23 |              1 |       123 | y           |
+----+----------------+-----------+-------------+

Result: Same as with an empty table, one Tx will deadlock while the other will succeed, typically the one that starts the upsert statement first.

After

MariaDB [test_test]> SELECT * FROM icingadb_instance;
+----+----------------+-----------+-------------+
| id | environment_id | heartbeat | responsible |
+----+----------------+-----------+-------------+
| 23 |              1 |       123 | y           |
+----+----------------+-----------+-------------+
Tx A Tx B
START TRANSACTION; START TRANSACTION;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23 FOR UPDATE;
Empty set (0.003 sec)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42 LOCK FOR UPDATE;
Blocked 🚫 (in waiting state)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON DUPLICATE KEY UPDATE id=VALUES(id), environment_id=VALUES(environment_id), heartbeat=VALUES(heartbeat), responsible=VALUES(responsible);

Query OK, 0 rows affected (0.001 sec)
The select statement from above is still blocked
COMMIT;
Query OK, 0 rows affected (0.001 sec)
SELECT Query from above succeeds and returned one row

PostgreSQL Tests

CREATE TABLE icingadb_instance(
   id serial,
   environment_id int NOT NULL,
   heartbeat int NOT NULL,
   responsible bool,

   PRIMARY KEY(id)
);
Empty Table
test_test=# SELECT * FROM icingadb_instance;
 id | environment_id | heartbeat | responsible 
----+----------------+-----------+-------------
(0 rows)

Before

Tx A Tx B
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE; START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23;

(0 rows)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42;

(0 rows)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;

INSERT 0 1
INSERT INTO icingadb_instance VALUES (42, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;

INSERT 0 1
COMMIT;

Committed successfully
COMMIT;

ERROR: could not serialize access due to
read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
test_test=# SELECT * FROM icingadb_instance;
 id | environment_id | heartbeat | responsible 
----+----------------+-----------+-------------
 23 |              1 |       123 | t
(1 row)

After

Same result as before! The first transaction is successful, but the second one fails with:

ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
HINT:  The transaction might succeed if retried.
Non Empty Table

Before and After (same result)

Tx A Tx B
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE; START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 23 FOR UPDATE;

(0 rows)
SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = 1 AND responsible = 'y' AND id <> 42 FOR UPDATE;

(1 row)
INSERT INTO icingadb_instance VALUES (23, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;

INSERT 0 0
INSERT INTO icingadb_instance VALUES (42, 1, 123, 'y') ON CONFLICT ON CONSTRAINT icingadb_instance_pkey DO NOTHING;

INSERT 0 1
COMMIT;
Commit succeeded
COMMIT;
Commit succeeded
test_test=# SELECT * FROM icingadb_instance;
 id | environment_id | heartbeat | responsible 
----+----------------+-----------+-------------
 23 |              1 |       123 | t
 42 |              1 |       123 | t
(2 rows)

PS: this is basically the same as #788 but adds some before and after test cases as requested in #788 (comment).

closes #788

Within the HA transaction, we always retrieve rows from the
`icingadb_instance` table to determine responsibility between the
instances. Previously, this selection was performed using a shared lock
(`SELECT ... LOCK IN SHARE MODE`). Although selectin rows is generally
not an issue, both Icinga DB instances perform an upsert on the same
table right after the select statement, resulting in deadlock errors
most of the time. In order to reduce the deadlocks on both sides, an
exclusive lock on the selected rows is necessary, which can be achieved
using the `SELECT ... FOR UDPATE` command. However, deadlocks can still
occur if the `icingadb_instance` table is empty, i.e. when rows are
available to lock exclusively.
@cla-bot cla-bot bot added the cla/signed label Oct 25, 2024
@yhabteab yhabteab added this to the 1.3.0 milestone Oct 25, 2024
@yhabteab yhabteab added enhancement New feature or request area/ha labels Oct 25, 2024
@yhabteab yhabteab requested a review from oxzi October 25, 2024 10:52
Copy link
Member

@lippserd lippserd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be tested with PostgreSQL as well. Anyway, I think we can just remove the driver switch and use FOR UPDATE for MySQL and PostgreSQL.

@yhabteab
Copy link
Member Author

Should be tested with PostgreSQL as well.

See the updated PR description!

Anyway, I think we can just remove the driver switch and use FOR UPDATE for MySQL and PostgreSQL.

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants