GracefulMasterTakeover does not set master back to writable state in case of an error  #43

@o-fedorov

The Problem

One of the steps of GracefulMasterTakeover is making the master instance read-only:

    log.Infof("GracefulMasterTakeover: Will set %+v as read_only", clusterMaster.Key)
    if clusterMaster, err = inst.SetReadOnly(&clusterMaster.Key, true); err != nil {
        return nil, nil, err
    }

If the process fails right after that step (for example, here), the replica set stays intact, but the master is left read-only and nothing switches it back to writable.

The Proposed Solution

  1. Ensure that the following code, or something similar, runs before every return nil, nil, err that comes after the master has been set to read-only:

    if topologyRecovery.SuccessorKey == nil {
        // Promotion fails.
        // Undo setting read-only on original master.
        inst.SetReadOnly(&clusterMaster.Key, false)
        return nil, nil, fmt.Errorf("GracefulMasterTakeover: Recovery attempted yet no replica promoted; err=%+v", err)
    }

  2. Add a PostUnsuccessfulGracefulTakeoverProcesses config entry and execute it whenever a graceful takeover does not succeed, similar to the hooks for other takeover/failover processes. This would let users add their own hooks to check the master's status and update it if needed.
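
To make the two options concrete, here is a minimal Go sketch of how they could be combined, assuming a deferred rollback is acceptable in this code path. It is not orchestrator's actual code: the instance type, setReadOnly, gracefulTakeover, and the onFailure callback are all hypothetical stand-ins (onFailure plays the role of the proposed PostUnsuccessfulGracefulTakeoverProcesses hook).

```go
package main

import (
	"errors"
	"fmt"
)

// instance is a hypothetical stand-in for a server tracked by orchestrator.
type instance struct {
	Key      string
	ReadOnly bool
}

// setReadOnly is a hypothetical stand-in for inst.SetReadOnly.
func setReadOnly(i *instance, readOnly bool) error {
	i.ReadOnly = readOnly
	return nil
}

// gracefulTakeover sets the master read-only and then runs promote.
// Once the master is read-only, any error return triggers the deferred
// rollback, so individual return statements do not need their own undo code.
func gracefulTakeover(
	master *instance,
	promote func() (string, error), // hypothetical: returns the successor key
	onFailure func(*instance), // hypothetical hook, may be nil
) (successor string, err error) {
	if err = setReadOnly(master, true); err != nil {
		return "", err
	}
	defer func() {
		if err == nil {
			return // takeover succeeded; the old master stays read-only
		}
		// Option 1: undo read-only on the original master.
		if undoErr := setReadOnly(master, false); undoErr != nil {
			err = fmt.Errorf("%w; also failed to restore writable state: %v", err, undoErr)
		}
		// Option 2: let a user-supplied hook inspect or repair the master.
		if onFailure != nil {
			onFailure(master)
		}
	}()

	successor, err = promote()
	if err != nil {
		return "", err
	}
	if successor == "" {
		return "", errors.New("GracefulMasterTakeover: recovery attempted yet no replica promoted")
	}
	return successor, nil
}
```

Because the rollback lives in a single deferred function keyed on the named err result, error returns added later in the function are covered as well, which is the main reason to prefer it over repeating the undo before each return.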


Could you please suggest which of the two solutions (or both) would be better to implement, or propose another way to work around the issue?
