The Problem
One of the steps of GracefulMasterTakeover is making the master instance read-only:
orchestrator/go/logic/topology_recovery.go
Lines 2170 to 2173 in 181f94a

    log.Infof("GracefulMasterTakeover: Will set %+v as read_only", clusterMaster.Key)
    if clusterMaster, err = inst.SetReadOnly(&clusterMaster.Key, true); err != nil {
        return nil, nil, err
    }

If the process fails right after that, for example here, the replica set will be intact, but the master will remain read-only.
The Proposed Solution
- Ensure the following code, or something similar, is executed before any `return nil, nil, err` that occurs after the master is set to read-only:

  orchestrator/go/logic/topology_recovery.go
  Lines 2192 to 2197 in 181f94a

      if topologyRecovery.SuccessorKey == nil {
          // Promotion fails.
          // Undo setting read-only on original master.
          inst.SetReadOnly(&clusterMaster.Key, false)
          return nil, nil, fmt.Errorf("GracefulMasterTakeover: Recovery attempted yet no replica promoted; err=%+v", err)
      }

- Add a `PostUnsuccessfulGracefulTakeoverProcesses` config entry and execute it if the graceful takeover was not successful, similar to other takeover/failover processes. This will allow users to add their own hooks to check the master status and update it if needed.
Could you please suggest which of the two solutions (or both) would be better to implement, or propose another way to work around the issue?