RFC : Repository Registration for Remote Backed Storage #8623

@psychbot

Problem Statement

OpenSearch with remote backed storage enables storing indexed data in a remote data store, which guarantees data durability. Today the user has to register the repository manually by calling PUT /_snapshot/remote-repository and then update the cluster level remote repository settings, the index level remote repository settings, or both, in order to use the remote backed storage feature.

Cluster Settings for Remote Repository -

  • cluster.remote_store.repository
  • cluster.remote_store.translog.repository

Index Settings for Remote Repository -

  • index.remote_store.segment.repository
  • index.remote_store.translog.repository

Indexed data is backed up to the remote store only after the user updates these settings. Any index created before that point will not be backed up to the remote store until #7986, which allows migrating older indices to the remote store, is built into OpenSearch.

Because of this manual step, system indices are not backed up to the remote store, as all system indices are created during cluster bootstrap.
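
The manual flow described in this section can be sketched as two REST calls. This is a minimal illustration rather than a client implementation: the helper name manual_registration_requests is hypothetical, the requests are only built and printed, and placing the cluster setting under "persistent" is an assumption of the sketch.

```python
import json


def manual_registration_requests(repo_name, repo_type, repo_settings):
    """Build the two calls a user issues today as (method, path, body) tuples."""
    # Step 1: register the repository through the snapshot repository API.
    register = (
        "PUT",
        f"/_snapshot/{repo_name}",
        {"type": repo_type, "settings": repo_settings},
    )
    # Step 2: point the cluster level remote store setting at that repository.
    # Using the "persistent" section is an assumption of this sketch.
    enable = (
        "PUT",
        "/_cluster/settings",
        {"persistent": {"cluster.remote_store.repository": repo_name}},
    )
    return [register, enable]


requests = manual_registration_requests(
    "remote-repository", "fs", {"location": "/mnt/remote"}
)
for method, path, body in requests:
    print(method, path, json.dumps(body))
```

Any index created between cluster bootstrap and the second call, including system indices, is not covered, which is the gap this RFC targets.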

Requirements

Functional

  • Existing repository registration functionality should keep working the same way it does today.
  • The repositories supplied during cluster bootstrap should be the first thing to be registered, so that system indices can be backed up to remote backed storage.
  • Add support for tagging a repository as a remote store repository, so that some of its fields cannot be altered and the repository cannot be deleted. E.g. restricted : false or restricted : true or remote_store_repository : false or remote_store_repository : true
  • The repository information will be supplied during cluster bootstrap via the yml file.

Non-Functional

  • Repository registration during bootstrap should have minimal or no impact on cluster bootstrap time.

Assumptions

  • It is the user's responsibility to keep the repository information in sync on all nodes.
  • It is the user's responsibility not to alter or delete the repository information in the yml file.

Background

OpenSearch has a plugin based architecture which allows developers to build plugins using the interfaces provided by the core and run them as part of the OpenSearch engine. Some of the plugins create system indices during cluster bootstrap and store information in them that is necessary for their functioning.

Remote backed storage in its current state can't back up these system indices, because they are created during cluster bootstrap. We therefore want to support supplying repositories via yml and registering them at the very start of cluster bootstrap.

[Solution 1] Cluster Settings based approach

In this solution we will pass the repository information in the OpenSearch yml, and during cluster bootstrap the active cluster manager will register the repository.

Algorithm

The solution will have the following steps:

  1. Supplying repository information and cluster settings - Currently we do not accept repository information via the yml file. We will allow supplying repository information via yml and use it during node bootstrap.
    The repository information and cluster settings will be supplied via yml in the following format:
    "repository_information":
        "my-remote-segment-store":
            "type": "s3"
            "settings": "{\"bucket\": \"my-s3-bucket\",\"base_path\": \"my/snapshot/directory\"}"
            "restricted": true
        "my-remote-translog-store":
            "type": "fs"
            "settings": "{\"location\": \"/mnt/remote\"}"
            "restricted": true
            
    "cluster.remote_store.repository": "my-remote-segment-store"
    "cluster.remote_store.translog.repository": "my-remote-translog-store"
  2. Registering the repository - We want the repository registration to happen as soon as the cluster manager is elected.
    There are two ways to achieve this -
    a. [Preferred] Cluster State Change Event - Listen to cluster state change events and, when the cluster manager is elected, submit the task for registering the repository. The ClusterStateListener implementation will be removed once the repository is registered.
    b. Background Thread - A background thread keeps polling the local cluster state periodically, and once the cluster manager is elected the executor will stop.
  3. Registration task should be submitted by one node - To achieve this, the repository registration logic will be functional only on the active cluster manager. Once the repository is registered, the node will remove the ClusterStateListener implementation.
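
The preferred option (a) can be sketched as a small simulation. This is not OpenSearch code: ClusterService and RepositoryRegistrationListener are hypothetical stand-ins written for illustration, and the "cluster state" is just a dictionary held by the simulated service.

```python
class ClusterService:
    """Hypothetical stand-in for the node's cluster service."""

    def __init__(self, local_node):
        self.local_node = local_node
        self.listeners = []
        self.registered = {}  # repositories held in the simulated cluster state

    def add_listener(self, listener):
        self.listeners.append(listener)

    def remove_listener(self, listener):
        self.listeners.remove(listener)

    def publish(self, cluster_manager):
        # Iterate over a copy because listeners may remove themselves.
        for listener in list(self.listeners):
            listener.cluster_changed(self, cluster_manager)


class RepositoryRegistrationListener:
    """Registers the yml repositories once a cluster manager is elected."""

    def __init__(self, repositories):
        self.repositories = repositories

    def cluster_changed(self, service, cluster_manager):
        if cluster_manager is None:
            return  # no cluster manager elected yet, keep listening
        if cluster_manager == service.local_node:
            # Only the active cluster manager registers; repositories that
            # are already present in the cluster state are left untouched.
            for name, repo in self.repositories.items():
                service.registered.setdefault(name, repo)
        # Every node drops the listener after the first post-election event.
        service.remove_listener(self)


repos = {"my-remote-segment-store": {"type": "s3"}}
service = ClusterService("node-1")
service.add_listener(RepositoryRegistrationListener(repos))
service.publish(None)       # election still pending, listener stays
service.publish("node-1")   # local node elected, repositories registered
print(service.registered, service.listeners)
```

The setdefault call also covers a restart where the repository is already in the cluster state: the listener then changes nothing and simply removes itself.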

[Figure: repository registration sequence diagram]

Failure Scenarios

  1. Handling Node/Process Restart - If the node is not the active cluster manager, the ClusterStateListener implementation will be added to StateListener during bootstrap on restart. On the first cluster state changed event it will be removed from StateListener, because this node is not the active cluster manager and the repository is already registered.
  2. Handling Node Reboot of Active Cluster Manager (Single Node Cluster) - If the node is the active cluster manager, the ClusterStateListener implementation will be added to StateListener during bootstrap on restart. On the first cluster state changed event it will check whether the repository information is already present in the cluster state. As the information will already be present, the ClusterStateListener implementation will be removed from StateListener.

Migration/Upgrade Scenarios

All the nodes which support remote backed storage will have a node attribute, for example remote_backed_storage. Below are some of the scenarios -

  1. Remote Store Node sends join request to Non Remote Store cluster - The non remote store cluster manager does not have the node attribute validator, so its validators succeed and it sends a validate join request. Because that request comes from a non remote store cluster manager, the validator on the joining node is skipped and the node joins the cluster.
  2. Remote Store Node with incorrect repository information sends join request to Non Remote Store cluster - The non remote store cluster manager does not have the node attribute validator, so its validators succeed and it sends a validate join request. Because that request comes from a non remote store cluster manager, the validator on the joining node is skipped and the node joins the cluster.
  3. Remote Store Node with incorrect repository information sends join request to Remote Store cluster - A node join request is sent from the data node to the cluster manager. Both have the node attribute, so the validators succeed. The cluster manager then sends a validate join request to the data node, where the validator checks whether the cluster state information matches the yml information. As the information is different, the validator fails and the node does not join the cluster.
  4. Non Remote Store Node sends join request to remote store cluster manager - The join request from the non remote store node fails, because the validator on the remote store cluster manager does not get the node attribute from the data node.
  5. Remote Store Node sends join request to remote store cluster - A node join request is sent from the data node to the cluster manager. Both have the node attribute, so the validators succeed. The cluster manager then sends a validate join request to the data node, where the validator checks whether the cluster state information matches the yml information. As the information is the same, the validator passes and the node joins the cluster.
  6. Remove Conflicting Nodes During Upgrades - During upgrades, once the new cluster manager (i.e. a remote store cluster manager node) is elected, it will reject node join requests from older nodes which don't have the node attribute and whose yml information doesn't match the cluster state.
  7. Repository Registration During Upgrades - Repository registration will happen only once a cluster manager node which has the repository information is elected as the active cluster manager.
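
Scenarios 1 to 5 above reduce to a small decision function. The sketch below is only an illustration of that logic under the assumptions stated in the scenarios; join_outcome and its parameters are hypothetical names, not OpenSearch APIs.

```python
def join_outcome(manager_is_remote, node_is_remote,
                 node_yml_repos=None, cluster_state_repos=None):
    """Return the result of a join attempt for Solution 1."""
    if not manager_is_remote:
        # Scenarios 1 and 2: the cluster manager has no node attribute
        # validator, and the node-side validate join check is skipped.
        return "joined"
    if not node_is_remote:
        # Scenario 4: the cluster manager does not see the node attribute.
        return "rejected: missing node attribute"
    # Scenarios 3 and 5: on the validate join request the joining node
    # compares the cluster state repositories with its own yml.
    if node_yml_repos != cluster_state_repos:
        return "rejected: repository information mismatch"
    return "joined"


state = {"my-remote-segment-store": {"type": "s3"}}
wrong = {"my-remote-segment-store": {"type": "fs"}}

print(join_outcome(False, True, wrong, None))    # scenario 2
print(join_outcome(True, True, wrong, state))    # scenario 3
print(join_outcome(True, False))                 # scenario 4
print(join_outcome(True, True, state, state))    # scenario 5
```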

Pros

  • As the repository is registered when the cluster manager is elected, this will work for a single node cluster as well.

Cons

  • The repository will be registered only when a cluster manager node which has the repository information is elected as the active cluster manager.
  • Cluster settings will be exposed to the customer and can be updated manually.

[Preferred][Solution 2] Node Attribute based approach

In this solution we will pass the information via the OpenSearch yml. During node bootstrap the repository information will be added to the node attributes, and during node join the node attributes will be passed to the active cluster manager, which uses them to register the repository and to perform validation.

Algorithm

  1. Supplying repository information and cluster settings - Currently we do not accept repository information via the yml file. We will allow supplying repository information via yml in the form of node attributes and use it during cluster bootstrap. The repository information and cluster settings will be supplied via yml in the following format:
# Node Attributes
node.attr.remote_store.segment.repository : "my-remote-segment-store"
node.attr.remote_store.repository.my-remote-segment-store.type : "s3"
node.attr.remote_store.repository.my-remote-segment-store.settings :
    bucket : "my-s3-bucket"
    base_path : "my/snapshot/directory"
    system_repository: true
node.attr.remote_store.translog.repository : "my-remote-translog-store"
node.attr.remote_store.repository.my-remote-translog-store.type : "fs"
node.attr.remote_store.repository.my-remote-translog-store.settings :
    location : "/mnt/remote"
    system_repository: true

# Cluster Settings
"cluster.remote_store.repository": "my-remote-segment-store"
"cluster.remote_store.translog.repository": "my-remote-translog-store"
  2. Registering the repository - We want the repository registration to happen as soon as the cluster is formed or forming. When a node tries to join the cluster, it will send the repository information to the active cluster manager. The cluster manager will validate it against the repository information in its own node attributes and register the repository if it matches; otherwise it will reject the node join request.

  3. Registration task should be submitted by one node - To achieve this, the repository registration logic will be functional only on the active cluster manager. The joining node will send the repository information in its node attributes to the active cluster manager, which will validate the information and register the repository if it is not already registered. For all subsequent node join requests, once the repository is registered, the registration logic will be a No-Op.
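
Reading the repository definitions back out of the node attributes could look roughly like the sketch below. It assumes that the attributes reach the reader as a flat string-to-string map with the node.attr. prefix stripped and nested yml keys flattened with dots; repositories_from_attributes is a hypothetical helper, not an existing OpenSearch method.

```python
def repositories_from_attributes(attrs):
    """Rebuild repository definitions from a flat node attribute map."""
    repos = {}
    for role in ("segment", "translog"):
        # The role attribute names the repository to look up.
        name = attrs[f"remote_store.{role}.repository"]
        prefix = f"remote_store.repository.{name}."
        settings_prefix = prefix + "settings."
        repos[name] = {
            "type": attrs[prefix + "type"],
            "settings": {
                key[len(settings_prefix):]: value
                for key, value in attrs.items()
                if key.startswith(settings_prefix)
            },
        }
    return repos


# Flattened form of the yml above (an assumption of this sketch).
attrs = {
    "remote_store.segment.repository": "my-remote-segment-store",
    "remote_store.repository.my-remote-segment-store.type": "s3",
    "remote_store.repository.my-remote-segment-store.settings.bucket": "my-s3-bucket",
    "remote_store.repository.my-remote-segment-store.settings.base_path": "my/snapshot/directory",
    "remote_store.repository.my-remote-segment-store.settings.system_repository": "true",
    "remote_store.translog.repository": "my-remote-translog-store",
    "remote_store.repository.my-remote-translog-store.type": "fs",
    "remote_store.repository.my-remote-translog-store.settings.location": "/mnt/remote",
    "remote_store.repository.my-remote-translog-store.settings.system_repository": "true",
}
parsed = repositories_from_attributes(attrs)
print(parsed["my-remote-translog-store"])
```

Looking the repository up by the name stored in the segment and translog attributes avoids having to split attribute keys on dots.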
    

Failure Scenarios

  1. Handling Node/Process Restart - If the node is not the active cluster manager, on restart it will send a join request with the repository information in its node attributes. As the repository will already be registered, this will be a No-Op.
  2. Handling Node Restarts of Active Cluster Manager (Single Node Cluster) - It is not yet clear exactly how this will be handled.

Migration/Upgrade Scenarios

Below are some of the scenarios -

  1. Remote Store Node sends join request to Non Remote Store cluster - The non remote store cluster manager does not have the node attribute validator, so its validators succeed and it sends a validate join request. Because that request comes from a non remote store cluster manager, the validator on the joining node is skipped and the node joins the cluster.
  2. Remote Store Node with incorrect repository information sends join request to Non Remote Store cluster - The non remote store cluster manager does not have the node attribute validator, so its validators succeed and it sends a validate join request. Because that request comes from a non remote store cluster manager, the validator on the joining node is skipped and the node joins the cluster.
  3. Remote Store Node with incorrect repository information sends join request to Remote Store cluster - A node join request is sent from the data node to the cluster manager. Both have the node attribute, but validation fails because the information in the joining node's attributes differs from what is present on the cluster manager node, so the node does not join the cluster.
  4. Non Remote Store Node sends join request to remote store cluster manager - The join request from the non remote store node fails, because the validator on the remote store cluster manager does not get the node attribute from the data node.
  5. Remote Store Node sends join request to remote store cluster - A node join request is sent from the data node to the cluster manager. Both have the node attribute, so the validators succeed. The cluster manager then sends a validate join request to the data node, where the validator checks whether the cluster state information matches the yml information. As the information is the same, the validator passes and the node joins the cluster.
  6. Remove Conflicting Nodes During Upgrades - During upgrades, once the new cluster manager (i.e. a remote store cluster manager node) is elected, it will reject join requests from nodes which don't have matching node attributes.
  7. Repository Registration During Upgrades - Repository registration will happen when a node with all the repository information in its node attributes tries to join the cluster; the active cluster manager will read that information and register the repository.
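
The join handling on a remote store cluster manager in this solution (validate, register once, No-Op afterwards, reject on mismatch) can be sketched as follows. ClusterManager and JoinRejected are hypothetical names used only for this illustration, and the registration counter exists purely to make the No-Op visible.

```python
class JoinRejected(Exception):
    """Raised when the cluster manager refuses a join request."""


class ClusterManager:
    """Hypothetical active cluster manager for Solution 2."""

    def __init__(self, own_repositories):
        self.own_repositories = own_repositories  # from its own node attributes
        self.registered = {}
        self.registrations = 0  # counts actual registration work

    def handle_join(self, node_repositories):
        if node_repositories is None:
            # Scenario 4: a non remote store node carries no attributes.
            raise JoinRejected("node has no remote store attributes")
        if node_repositories != self.own_repositories:
            # Scenario 3: attributes differ from the cluster manager's own.
            raise JoinRejected("repository information does not match")
        for name, repo in node_repositories.items():
            if name not in self.registered:
                self.registered[name] = repo
                self.registrations += 1
            # Already registered: subsequent joins are a No-Op.


repos = {"my-remote-segment-store": {"type": "s3"}}
manager = ClusterManager(repos)
manager.handle_join(dict(repos))   # first join registers the repository
manager.handle_join(dict(repos))   # later joins and restarts are a No-Op
print(manager.registered, manager.registrations)
```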

Pros

  • This overcomes the limitation of the first approach, where the repository registration happens only when a cluster manager which has the repository information in yml is elected. With this approach, once a remote store node joins the cluster with all the information in its node attributes, the cluster manager will register the repository during node join.
  • No cluster settings are exposed, and the repository information cannot be updated manually as it is a node level attribute.

[Solution 3] Extended Node Attribute based approach

This approach is similar to the second approach, but instead of storing the node attributes as string to string key value pairs, each attribute will map a string key to a JSON serialized object. A node reading the attribute will have to deserialize the object to get the information stored under it.
Below is the high level idea of how the information will be stored -

node.attr.remote_store.repository_information : "JsonSerializedObject@1234"
node.attr.remote_store.translog.repository_information : "JsonSerializedObject@1234"
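
A rough sketch of the serialize and deserialize round trip is given here. The object layout (name, type, settings) and both helper names are assumptions made for illustration; the RFC does not define the serialized format.

```python
import json

REQUIRED_FIELDS = {"name", "type", "settings"}  # hypothetical object layout


def serialize_repository(name, repo_type, settings):
    """Produce the string stored as the node attribute value."""
    return json.dumps(
        {"name": name, "type": repo_type, "settings": settings},
        sort_keys=True,
    )


def deserialize_repository(attribute_value):
    """Return the repository dict, or raise ValueError so the join is rejected."""
    try:
        repo = json.loads(attribute_value)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed repository attribute: {exc}") from exc
    if not isinstance(repo, dict) or not REQUIRED_FIELDS.issubset(repo):
        raise ValueError("repository attribute is missing required fields")
    return repo


value = serialize_repository("my-remote-translog-store", "fs", {"location": "/mnt/remote"})
print(value)
print(deserialize_repository(value)["type"])
```

Sorting the keys keeps the serialized string stable across nodes, which matters if attribute values are compared as plain strings.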

Pros

  • Provides a better and stronger validation mechanism, as the data in the node attributes is serialized; if the serialization fails, the node join request will be rejected.

Cons

  • If we update the object format in the future, we will have to consider backward compatibility and avoid any backward incompatible change.
  • Even a minor mistake while serializing the repository information can cause the node join to fail, as the node attribute will be incorrect or incompatible.

FAQ

  1. What will happen if only some of the repositories register successfully?
    We will add retries for the repositories which could not be registered the first time. If the failure persists, we will let the cluster changed event kick in and handle the flow again.
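
The retry behaviour in this answer might look like the sketch below. The attempt limit, the absence of any backoff, and the function name register_with_retries are all assumptions; the RFC does not specify a retry policy.

```python
def register_with_retries(repositories, register, max_attempts=3):
    """Try to register every repository; return the names that still failed."""
    pending = dict(repositories)
    for _ in range(max_attempts):
        for name in list(pending):
            try:
                register(name, pending[name])
            except Exception:
                continue  # keep it pending for the next attempt
            del pending[name]
        if not pending:
            break
    # Anything left is handed back so a later cluster changed event can retry.
    return sorted(pending)


attempts = {}


def flaky_register(name, repo):
    """Simulated registration that fails once for the translog repository."""
    attempts[name] = attempts.get(name, 0) + 1
    if name == "my-remote-translog-store" and attempts[name] == 1:
        raise RuntimeError("transient failure")


failed = register_with_retries(
    {
        "my-remote-segment-store": {"type": "s3"},
        "my-remote-translog-store": {"type": "fs"},
    },
    flaky_register,
)
print(failed, attempts)
```

Only the repositories that failed are retried, so a repository that registered on the first pass is not registered twice.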

Appendix

Migration/Upgrade Scenario

[Figures: four screenshots of the migration/upgrade scenarios, dated 2023-07-26]

Labels: RFC, Storage, Storage:Durability, enhancement, v2.10.0
