-
Notifications
You must be signed in to change notification settings - Fork 54
Add master-replica switching support #165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8e03331
to
c88e15d
Compare
c29a8be
to
eb163ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Thank you for the patchset. I left comments. I haven't reviewed it all yet, but I plan to keep reviewing and commenting.
5ec056e
to
54f27e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Thank you for the patchset. It looks good.
About the commit message title:
- add subsystem (like "test: add master replica switching", "sessions: move inactive...")
About the commit message body:
- add "Follows up #xxx" or "Part of #xxx"
Please check the effect of the changes on the queue performance (#139 (comment))
See comments bellow.
Queue states needs inactive_sessions on both master and replica nodes. Part of #120
b6d7d1b
to
3fd8d24
Compare
t/200-master-replica.t
Outdated
local queue_state = require('queue.abstract.queue_state') | ||
rawset(_G, 'queue', require('queue')) | ||
|
||
-- Replica connection handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
No.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some test for "switching replicaset mode" otherwise it will be break periodically.
queue/abstract.lua
Outdated
@@ -601,6 +601,61 @@ function method.create_tube(tube_name, tube_type, opts) | |||
return self | |||
end | |||
|
|||
-- Replicaset mode switch. | |||
local function migrate_queue_taken_space(mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mode
-> replicaset_mode
(or replicaset_mode_on
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Mark all commits after |
3fd8d24
to
fa612a0
Compare
queue/abstract/queue_state.lua
Outdated
local function max_lag() | ||
local max_lag = 0 | ||
|
||
for i=1, #box.info.replication do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tarantool> box.info.replication
---
- 1:
id: 1
uuid: 84560a97-1cba-4e24-8814-d49fa1f9d6a5
lsn: 1001048
3:
id: 3
uuid: 2d82e625-3a6c-427a-9075-93bfc5c07dd7
lsn: 0
...
tarantool> #box.info.replication
---
- 1
...
tarantool> table.maxn(box.info.replication)
---
- 3
...
Use table.maxn()
for possibly holed arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
fa612a0
to
7181e7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OMG, I think this set of patches has gone too far). It will be great if it doesn't crash some existing installations.
I leave a few comments and would like to clarify for @Totktonada :In the current implementation, after switching from master to replica, the user will need to re-identify on the replica.
@0x501D great job.
README.md
Outdated
@@ -846,9 +856,6 @@ Driver class must implement the following API: | |||
1. `create_space` - creates the supporting space. The arguments are: | |||
* space name | |||
* space options | |||
|
|||
Driver class should implement the following API: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must be removed in readme: add queue states description
commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
queue/abstract.lua
Outdated
-- When the queue is running in single mode, | ||
-- the space is converted to temporary mode to increase performance. | ||
-- | ||
local function switch_in_replicaset(mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use the argument name replicaset_mode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
queue/abstract.lua
Outdated
@@ -657,11 +719,11 @@ function method.start() | |||
box.space._queue_taken:drop() | |||
end | |||
|
|||
local _taken = box.space._queue_taken_2 | |||
if _taken == nil then | |||
local _mode = queue.cfg['in_replicaset'] or false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use name replicaset_mode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
queue/abstract/queue_session.lua
Outdated
-- When the queue is running in single mode, | ||
-- the space is converted to temporary mode to increase performance. | ||
-- | ||
local function switch_in_replicaset(mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use the argument name replicaset_mode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
}) | ||
end | ||
else | ||
box.space._queue_shared_sessions:truncate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't queue_session._on_session_remove(...)
be called for each deleted session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, fixed.
queue/abstract/queue_session.lua
Outdated
@@ -158,6 +185,10 @@ local function identify(conn_id, session_uuid) | |||
-- Generate new UUID for the session. | |||
cur_uuid = uuid.bin() | |||
queue_session_ids:insert{conn_id, cur_uuid} | |||
local ttr = queue_session.cfg['ttr'] or 0 | |||
if ttr > 0 then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU we must insert the session into the table _queue_shared_sessions
in any case, regardless of the value of the ttr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
queue/abstract/queue_session.lua
Outdated
@@ -207,11 +238,21 @@ local function disconnect(conn_id) | |||
{ limit = 1 })[1] | |||
|
|||
-- If a queue session doesn't have any active connections it should be | |||
-- removed (if ttr is absent) or moved to the "inactive sessions" list. | |||
-- removed (if ttr is absent) or moved to the "shared sessions" list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the session is not moved to the table, it just changes the active flag from true
to false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
queue/abstract/queue_session.lua
Outdated
local function exist_inactive(session_uuid) | ||
if box.space._queue_inactive_sessions:get{session_uuid} then | ||
local function exist_shared(session_uuid) | ||
if not box.space._queue_shared_sessions then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We create this space ourselves, why can't it exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
31e85eb
to
b005b27
Compare
b005b27
to
4ca6584
Compare
This patch adds ability to use queue in a master-replica scheme. The queue will monitor the operation mode of the tarantool and perform the necessary actions accordingly. This patch adds five states for queue: INIT, STARTUP, RUNNING, ENDING and WAITING. When the tarantool is launched for the first time, the state of the queue is always INIT until box.info.ro is false. States switching scheme: +-----------+ | RUNNING | +-----------+ ^ | (rw -> ro) | v +------+ +---------+ +--------+ | INIT | ---> | STARTUP | | ENDING | +------+ +---------+ +--------+ ^ | (ro -> rw) | v +-----------+ | WAITING | +-----------+ In the STARTUP state, the queue is waiting for possible data synchronization with other cluster members by the time of the largest upstream lag multiplied by two. After that, all taken tasks are released, except for tasks with session uuid matching inactive sessions uuids. This makes possible to take a task, switch roles on the cluster, and release the task within the timeout specified by the queue.cfg({ttr = N}) parameter. Note: all clients that take() and do not ack()/release() tasks must be disconnected before changing the role. And the last step in the STARTUP state is starting tube driver using new method called start(). Each tube driver must implement start() and stop() methods. The start() method should start the driver fibers, if any, in other words, initialize all asynchronous work with the tubes space. The stop() methods shuld stop the driver fibers, if any, respectively. In the RUNNING state, the queue is working as usually. The ENDING state calls stop() method. in the WAITING state, the queue listens for a change in the read_only flag. All states except INIT is controlled by new fiber called 'queue_state_fiber'. A new release_all() method has also been added, which forcibly returns all taken tasks to a ready state. This method can be called per tube. Closes #120
Follows up #120
Follows up #120
Since version 2.4.1, Tarantool has the popen built-in module that supports execution of external programs. popen() uses in tests for creating replica instance. Follows up #120
Running a queue in master-replica mode requires that _queue_inactive_sessions and _queue_taken_2 were not temporary spases. This results in a 20 percent performance loss. The `in_replicaset` option allows you to get the same performance in single mode. Follows up #120
In order for the user to be able to finish servicing the task after the queue switched from replica to master, the `_queue_inactive_session` space was renamed to `_queue_shared_sessions` and had an `active` field in it. When the `take` method is called, the session uuid is automatically added to the `_queue_shared_sessions` space with `exp_time` set to zero and `active` field set to true. When the client closes the session, this entry becomes inactive (`active` = false) and `exp_time` is set. At the start of the queue, we make all active sessions inactive, and set an expiration time for them. If the `ttr` setting is not set, then we delete all sessions. If `ttr` is not set shared sessions is disabled. Follows up #120
In replicaset mode we cannot use temporary tubes. Data on them will not be replicated and the correct operation of the queue will be disrupted. This patch will not allow creation of the temporary tube if `in_replicaset == true` or will not allow to set `in_replicaset = true` if temporary tubes exists. Follows up #120
4ca6584
to
3e2efcd
Compare
This patch adds ability to use queue in a master-replica scheme.
The queue will monitor the operation mode of the tarantool and
perform the necessary actions accordingly.
This patch adds five states for queue: INIT, STARTUP, RUNNING, ENDING
and WAITING. When the tarantool is launched for the first time,
the state of the queue is always INIT until box.info.ro is false.
States switching scheme:
In the STARTUP state, the queue is waiting for possible data synchronization
with other cluster members by the time of the largest upstream lag multiplied
by two. After that, all taken tasks are released, except for tasks with
session uuid matching inactive sessions uuids. This makes possible to take
a task, switch roles on the cluster, and release the task within the timeout
specified by the queue.cfg({ttr = N}) parameter. Note: all clients that take()
and do not ack()/release() tasks must be disconnected before changing the role.
And the last step in the STARTUP state is starting tube driver using new
method called start(). Each tube driver must implement start() and stop()
methods. The start() method should start the driver fibers, if any, in other
words, initialize all asynchronous work with the tubes space. The stop()
methods shuld stop the driver fibers, if any, respectively.
In the RUNNING state, the queue is working as usually. The ENDING state calls
stop() method. in the WAITING state, the queue listens for a change in the
read_only flag.
All states except INIT is controlled by new fiber called 'queue_state_fiber'.
A new release_all() method has also been added, which forcibly returns all
taken tasks to a ready state. This method can be called per tube.
Closes #120