
Commit 29d1c0f

CuriousGeorgiy authored and Gerold103 committed
box: update synchro quorum in on_commit trigger instead of on_replace
Currently, we update the synchronous replication quorum from the `on_replace` trigger of the `_cluster` space when registering a new replica. However, during the join process, the replica cannot ack its own insertion into the `_cluster` space.

In the scope of tarantool#9723, we are going to enable synchronous replication for most of the system spaces, including the `_cluster` space. This creates several problems:

1. Joining a replica to a 1-member cluster without manually changing the quorum won't work: it is impossible to commit the insertion into the `_cluster` space with only 1 node, since the quorum becomes 2 right after the insertion.

2. Joining a replica to a 3-member cluster may fail: the quorum becomes 3 right after the insertion, and the newly joined replica cannot ACK its own insertion into the `_cluster` space, so if one of the original 3 nodes fails, the reconfiguration fails.

Generally speaking, it becomes impossible to join a new replica to the cluster whenever a quorum that includes the newly added replica (which cannot ACK) cannot be gathered.

To solve these problems, let's update the quorum in the `on_commit` trigger instead. This way we will be able to insert a node regardless of the current configuration. This somewhat contradicts the Raft specification, which requires all configuration changes to be applied in the `on_replace` trigger (i.e., as soon as they are persisted in the WAL, without quorum confirmation) but still forbids several simultaneous reconfigurations.

Closes tarantool#10087

NO_DOC=<no special documentation page devoted to cluster reconfiguration>
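For context on the numbers above: with Tarantool's default `replication_synchro_quorum` formula `"N / 2 + 1"` (N being the number of registered replicas), the quorum jumps as soon as the new row lands in `_cluster`. The sketch below is illustrative only; the `synchro_quorum` helper is hypothetical and not part of this patch, it just evaluates the documented formula with the fractional part dropped.

-- Illustrative only: evaluate the default "N / 2 + 1" quorum formula for the
-- cluster sizes discussed above (documented values: 1->1, 2->2, 3->2, 4->3).
local function synchro_quorum(n)
    return math.floor(n / 2) + 1
end

-- 1-member cluster: the quorum is 1 before the join, but becomes 2 as soon as
-- the new replica's row is inserted into `_cluster`. The joiner cannot ACK its
-- own insertion, so with an `on_replace`-time update the insert never commits.
assert(synchro_quorum(1) == 1 and synchro_quorum(2) == 2)

-- 3-member cluster: the quorum becomes 3 right after the insertion, so losing
-- any one of the original three nodes leaves the registration stuck.
assert(synchro_quorum(3) == 2 and synchro_quorum(4) == 3)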
1 parent 42c4c34 commit 29d1c0f

4 files changed (+116, -1 lines changed)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+## feature/replication
+
+* Now the synchronous replication quorum is updated after the cluster
+  reconfiguration change is confirmed by a quorum rather than immediately after
+  persisting the configuration change in the WAL (gh-10087).
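As a quick way to observe what the changelog entry describes, one can compare the configured formula with the effective value on a running instance. This is a sketch, not part of the patch; it assumes the default `"N / 2 + 1"` setting.

-- Sketch: log the configured quorum formula and the effective value derived
-- from the current `_cluster` size. With this change the effective value is
-- recalculated only once a `_cluster` insertion is confirmed by a quorum,
-- not when it is merely persisted in the WAL.
local log = require('log')
log.info('configured quorum: %s', tostring(box.cfg.replication_synchro_quorum))
log.info('effective quorum: %d', box.info.synchro.quorum)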

src/box/alter.cc

Lines changed: 18 additions & 0 deletions
@@ -4424,6 +4424,15 @@ on_replace_cluster_clear_id(struct trigger *trigger, void * /* event */)
         return 0;
 }
 
+/** Update the synchronous replication quorum. */
+static int
+on_replace_cluster_update_quorum(struct trigger * /* trigger */,
+                                 void * /* event */)
+{
+        box_update_replication_synchro_quorum();
+        return 0;
+}
+
 /** Replica definition. */
 struct replica_def {
         /** Instance ID. */
@@ -4639,6 +4648,15 @@ on_replace_dd_cluster_insert(const struct replica_def *new_def)
                          tt_uuid_str(&replica->uuid));
                 return -1;
         }
+        /*
+         * Update the quorum only after commit. Otherwise the replica would have
+         * to ack its own insertion.
+         */
+        struct trigger *on_commit = txn_alter_trigger_new(
+                on_replace_cluster_update_quorum, NULL);
+        if (on_commit == NULL)
+                return -1;
+        txn_stmt_on_commit(stmt, on_commit);
         struct trigger *on_rollback = txn_alter_trigger_new(
                 on_replace_cluster_clear_id, NULL);
         if (on_rollback == NULL)

src/box/replication.cc

Lines changed: 0 additions & 1 deletion
@@ -376,7 +376,6 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
         say_info("assigned id %d to replica %s",
                  replica->id, tt_uuid_str(&replica->uuid));
         replica->anon = false;
-        box_update_replication_synchro_quorum();
         box_broadcast_ballot();
 }

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+local t = require('luatest')
+local replica_set = require('luatest.replica_set')
+local server = require('luatest.server')
+
+local g_one_member_cluster = t.group('one_member_cluster')
+local g_three_member_cluster = t.group('three_member_cluster')
+
+g_one_member_cluster.before_all(function(cg)
+    cg.master = server:new{alias = 'master'}
+    cg.master:start()
+    -- Make `_cluster` space synchronous.
+    cg.master:exec(function()
+        box.ctl.promote()
+        box.space._cluster:alter{is_sync = true}
+    end)
+end)
+
+-- Test that synchronous insertion into a 1-member cluster works properly.
+g_one_member_cluster.test_insertion = function(cg)
+    cg.replica = server:new{alias = 'replica', box_cfg = {
+        replication = cg.master.net_box_uri,
+    }}
+    cg.replica:start()
+    cg.master:wait_for_downstream_to(cg.replica)
+    cg.replica:exec(function()
+        t.assert_not_equals(box.space._cluster:get{box.info.id}, nil)
+    end)
+end
+
+g_one_member_cluster.after_all(function(cg)
+    cg.master:drop()
+    if cg.replica ~= nil then
+        cg.replica:drop()
+    end
+end)
+
+g_three_member_cluster.before_all(function(cg)
+    cg.replica_set = replica_set:new{}
+    cg.master = cg.replica_set:build_and_add_server{alias = 'master'}
+    cg.master:start()
+    cg.replica_to_be_disabled =
+        cg.replica_set:build_and_add_server{alias = 'to_be_disabled',
+            box_cfg = {
+                replication = {
+                    cg.master.net_box_uri,
+                    server.build_listen_uri('replica', cg.replica_set.id),
+                },
+            }}
+    cg.replica = cg.replica_set:build_and_add_server{alias = 'replica',
+        box_cfg = {
+            replication = {
+                cg.master.net_box_uri,
+                server.build_listen_uri('to_be_disabled', cg.replica_set.id),
+            },
+        }}
+    cg.replica_set:start()
+
+    -- Make `_cluster` space synchronous.
+    cg.master:exec(function()
+        box.ctl.promote()
+        box.space._cluster:alter{is_sync = true}
+    end)
+
+    cg.master:wait_for_downstream_to(cg.replica_to_be_disabled)
+    cg.master:wait_for_downstream_to(cg.replica)
+end)
+
+-- Test that synchronous insertion into a 3-member cluster with 1 disabled node
+-- works properly.
+g_three_member_cluster.test_insertion = function(cg)
+    cg.replica_to_be_disabled:exec(function()
+        box.cfg{replication = ''}
+    end)
+    cg.replica_to_be_added =
+        cg.replica_set:build_and_add_server{alias = 'to_be_added',
+            box_cfg = {
+                replication = {
+                    cg.master.net_box_uri,
+                    server.build_listen_uri('to_be_disabled', cg.replica_set.id),
+                    server.build_listen_uri('replica', cg.replica_set.id),
+                },
+            }}
+    cg.replica_to_be_added:start()
+    cg.master:wait_for_downstream_to(cg.replica)
+    cg.master:wait_for_downstream_to(cg.replica_to_be_added)
+    cg.replica_to_be_added:exec(function()
+        t.assert_not_equals(box.space._cluster:get{box.info.id}, nil)
+    end)
+end
+
+g_three_member_cluster.after_all(function(cg)
+    cg.replica_set:drop()
+end)
