[Docs] Step-by-step tutorial for uni-directional CCR failover #84854

Closed
@Leaf-Lin

Description

At the time of writing, CCR does not offer automatic failover. Could we add the following tutorial covering the failover scenario?

The initial setup can be skipped because it is similar to "Tutorial: Set up cross-cluster replication". It is included here for completeness.

Initial setup (uni-directional CCR with DR cluster following Production cluster)

Step1: On DR, configure a remote cluster that points to Production

### On DR cluster ###
PUT /_cluster/settings
{
  "persistent" : {
    "cluster" : {
      "remote" : {
        "production" : {
          "seeds" : [
            "127.0.0.1:9300" 
          ]
        }
      }
    }
  }
}
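
Before creating any follower index, it can help to confirm that DR has actually connected to the remote. The following is a minimal Python sketch, assuming you have already fetched the JSON body of `GET /_remote/info` on the DR cluster with an HTTP client of your choice. The function name `remote_is_connected` and the abbreviated sample payload are illustrative and not part of Elasticsearch.

```python
def remote_is_connected(remote_info: dict, alias: str) -> bool:
    """Return True if the remote cluster `alias` is reported as connected.

    `remote_info` is the parsed JSON body of `GET /_remote/info`, which is
    keyed by remote cluster alias.
    """
    entry = remote_info.get(alias)
    # A missing alias means the remote was never configured on this cluster.
    return bool(entry and entry.get("connected"))


# Hypothetical, abbreviated response used for illustration only.
sample = {"production": {"connected": True, "seeds": ["127.0.0.1:9300"]}}
print(remote_is_connected(sample, "production"))
print(remote_is_connected(sample, "dr"))
```

The same check applies later in the failback procedure, when Production is configured with the `dr` remote.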

Step2: Create an index on Production

### On Production cluster ###
PUT /my_index
POST /my_index/_doc/1
{
  "foo":"bar"
}

Step3: Create a follower index on DR

### On DR cluster ###
PUT /my_index/_ccr/follow 
{ 
  "remote_cluster" : "production", 
  "leader_index" : "my_index" 
}

Step4: Test the follower index on DR

### On DR cluster ###
GET /my_index/_search

### This should return the document created on Production (foo/bar)

⚠️ Write only to the Production cluster. Search queries can be directed to either the Production or the DR cluster.

When Production is down:

Step1: On the client side, pause ingestion of my_index into Production.

Step2: On the Elasticsearch side, turn the follower indices on DR into regular indices:

Ensure no writes are occurring on the leader index (if the data centre is down or the cluster is unavailable, no action is needed).
On DR, convert the follower index to a regular Elasticsearch index that can accept writes:

### On DR cluster ###
POST /my_index/_ccr/pause_follow
POST /my_index/_close           
POST /my_index/_ccr/unfollow    
POST /my_index/_open
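
The order of the four calls above matters, because the unfollow API expects the follower index to be paused and closed first. When several follower indices need to be converted, the sequence can be generated per index. This is a minimal Python sketch that only builds the request list; the helper name `promote_follower_requests` is hypothetical, and sending the requests is left to whatever HTTP client you use.

```python
from __future__ import annotations


def promote_follower_requests(index: str) -> list[tuple[str, str]]:
    """Ordered (method, path) pairs that turn a follower into a regular index."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow"),  # stop pulling from the leader
        ("POST", f"/{index}/_close"),             # unfollow needs a closed index
        ("POST", f"/{index}/_ccr/unfollow"),      # remove the follower metadata
        ("POST", f"/{index}/_open"),              # reopen as a writable index
    ]


for method, path in promote_follower_requests("my_index"):
    print(method, path)
```

Keeping the sequence in one place makes it harder to skip the close step when repeating the procedure during failback.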

Step3: On the client side, manually re-enable ingestion of my_index, this time into the DR cluster. You can verify that the index is now writable:

### On DR cluster ###
POST my_index/_doc/2
{
  "foo": "new"
}  

⚠️ Make sure all traffic is redirected to the DR cluster during this time.

Once Production comes back:

Step1: On the client side, stop writes to my_index on the DR cluster.

Step2: On Production, configure a remote cluster that points to DR

### On Production cluster ###
PUT _cluster/settings
{
  "persistent" : {
    "cluster" : {
      "remote" : {
        "dr" : {
          "seeds" : [
            "127.0.0.2:9300" 
          ]
        }
      }
    }
  }
}

Step3: Create follower indices on Production that follow the leader indices on DR. The former leader indices on Production contain outdated data and must be deleted first. Wait for the Production follower indices to catch up; once they have, you can turn them back into regular indices.

### On Production cluster ###
DELETE my_index

### Create follower index on Production to follow from DR cluster
PUT /my_index/_ccr/follow 
{ 
  "remote_cluster" : "dr", 
  "leader_index" : "my_index" 
}

### Wait for my_index to catch up with DR and contain all the documents.
GET my_index/_search

### Stop following from DR to turn my_index into a regular index.
POST /my_index/_ccr/pause_follow
POST /my_index/_close
POST /my_index/_ccr/unfollow
POST /my_index/_open 
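
The "wait for my_index to catch up" step above is checked by eye with `_search`. A less manual option is to compare the global checkpoints reported by `GET /my_index/_ccr/stats` before pausing the follower. The following is a minimal Python sketch, assuming the parsed JSON body of that call is already available; the function name `follower_caught_up` and the abbreviated sample payload are illustrative. The leader checkpoint in this response is the value last known to the follower, so the check is only meaningful once writes to the leader have stopped (Step1).

```python
def follower_caught_up(follow_stats: dict) -> bool:
    """Return True if every follower shard has reached its leader's checkpoint.

    `follow_stats` is the parsed JSON body of `GET /<index>/_ccr/stats`.
    """
    shards = [
        shard
        for index in follow_stats.get("indices", [])
        for shard in index.get("shards", [])
    ]
    if not shards:
        # No shard stats means nothing is being followed; do not report success.
        return False
    return all(
        s["follower_global_checkpoint"] >= s["leader_global_checkpoint"]
        for s in shards
    )


# Hypothetical, abbreviated response used for illustration only.
sample_stats = {
    "indices": [
        {
            "index": "my_index",
            "shards": [
                {
                    "shard_id": 0,
                    "leader_global_checkpoint": 1,
                    "follower_global_checkpoint": 1,
                }
            ],
        }
    ]
}
print(follower_caught_up(sample_stats))
```

Polling this until it returns True, and only then running the pause, close, unfollow, and open calls, avoids promoting a follower that is still behind.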

Step4: Delete the formerly writable indices on DR, which now contain outdated data. Then create follower indices on DR again so that all changes from Production are replicated to DR. (This is the same as the initial setup.)

### On DR cluster ###
DELETE my_index

### Create follower index on `DR` to follow from the `Production` cluster
PUT /my_index/_ccr/follow 
{ 
  "remote_cluster" : "production", 
  "leader_index" : "my_index" 
}

Step5: On the client side, manually re-enable ingestion into the Production cluster.

⚠️ Write only to the Production cluster. Search queries can be directed to either the Production or the DR cluster.

Metadata

    Labels

    :Distributed Indexing/CCR, >enhancement, Team:Distributed (Obsolete)