Upgrading Clusters

Foreword

Before proceeding, one should become familiar with custom resource definitions and with the API spec document (in particular, with the AerospikeCluster and AerospikeNamespaceBackup custom resource definitions).

Upgrading an Aerospike cluster

In order to benefit from the latest features, improvements and bug fixes made to Aerospike, one will most likely need to upgrade an Aerospike cluster to a later version during its lifetime. aerospike-operator provides first-class support for performing version upgrades in live Aerospike clusters.

Pre-requisites

Before starting an upgrade operation, aerospike-operator performs a mandatory backup of the Aerospike namespace managed by the target Aerospike cluster. This is done in order to guarantee the safety of the data in case of a major failure during the upgrade process. Hence, before upgrading an Aerospike cluster, one must configure automatic pre-upgrade backups for it. This is done by making sure that the pre-requisites for the core backup functionality have been met, and by specifying a spec for these backups in the associated AerospikeCluster resource.

⚠️
Although aerospike-operator performs pre-upgrade backups of the Aerospike namespace managed by the target Aerospike cluster before actually starting the upgrade process, automatic restore of these backups in case of a failure during the upgrade is NOT supported.
⚠️
For the remainder of this document, it is assumed that the core backup functionality was adequately configured in one’s Kubernetes cluster by following the steps detailed in the Pre-requisites section of the Backing-up Namespaces document.

Pre-upgrade backups are configured via the .spec.backupSpec field of an AerospikeCluster resource. This field contains a nested storage field that is similar in structure and semantics to the .spec.storage field of an AerospikeNamespaceBackup resource:

apiVersion: aerospike.travelaudience.com/v1alpha2
kind: AerospikeCluster
(...)
spec:
  backupSpec:
    storage:
      type: gcs
      bucket: aerospike-backup
      secret: gcs-secret
      secretNamespace: kubernetes-namespace-0
      secretKey: key.json
  (...)
ℹ️
secretNamespace must be set to the name of the Kubernetes namespace where the secret to be used exists. It is an optional field that defaults to the name of the Kubernetes namespace the AerospikeCluster resource belongs to.
ℹ️
secretKey must be set to the name of the field inside the secret that contains the credentials to be used. It is also an optional field and defaults to key.json.

The .spec.backupSpec field can be specified either when first creating the AerospikeCluster resource or at a later time by updating it (e.g. using kubectl edit). aerospike-operator will refuse to upgrade an Aerospike cluster for which this field has not been specified [1]:

$ kubectl edit asc as-cluster-0
(...)
error: aerospikeclusters "as-cluster-0" could not be patched: admission webhook "aerospikeclusters.aerospike.travelaudience.com" denied the request: no value for .spec.backupSpec has been specified
ℹ️
The .spec.backupSpec field is only required if one intends to perform version upgrades on the target Aerospike cluster. In simpler usage scenarios, such as when creating an Aerospike cluster for testing purposes, this field is not strictly required and can be omitted.
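
The defaulting rules for secretNamespace and secretKey, and the refusal to upgrade without .spec.backupSpec, can be illustrated with a short Python sketch. This is not the operator's actual code: the function name resolve_backup_storage is hypothetical, and the resource is assumed to be available as a plain dictionary (for example, parsed from the YAML manifest).

```python
from typing import Any, Dict


def resolve_backup_storage(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Return .spec.backupSpec.storage with the documented defaults applied.

    Hypothetical helper for illustration; it only mirrors the rules stated
    in the text above, not the operator's implementation.
    """
    backup_spec = resource.get("spec", {}).get("backupSpec")
    if backup_spec is None:
        # Same message as the one reported by the admission webhook.
        raise ValueError("no value for .spec.backupSpec has been specified")
    storage = dict(backup_spec["storage"])
    # secretNamespace defaults to the namespace of the AerospikeCluster resource.
    storage.setdefault("secretNamespace", resource["metadata"]["namespace"])
    # secretKey defaults to key.json.
    storage.setdefault("secretKey", "key.json")
    return storage


cluster = {
    "metadata": {"name": "as-cluster-0", "namespace": "kubernetes-namespace-0"},
    "spec": {
        "backupSpec": {
            "storage": {
                "type": "gcs",
                "bucket": "aerospike-backup",
                "secret": "gcs-secret",
            }
        }
    },
}
resolved = resolve_backup_storage(cluster)
```

Here resolved carries secretNamespace set to kubernetes-namespace-0 and secretKey set to key.json, since neither was given explicitly.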

Supported versions and upgrades

In order to minimize the chances of a failed upgrade, aerospike-operator includes a whitelist of supported and tested Aerospike versions. aerospike-operator will refuse to upgrade an Aerospike cluster to a version of Aerospike that is not whitelisted. In practice this means that before upgrading an Aerospike cluster to a later version one may need to upgrade aerospike-operator itself as described in the Upgrading aerospike-operator document. The current version of aerospike-operator supports the following Aerospike CE versions:

Future versions of aerospike-operator will introduce support for new minor, patch and release versions as they become available.

⚠️
At any given time, the availability of a given version of Aerospike is dependent on the existence of the respective tag in the aerospike/aerospike-server official repository.

It should be noted that after upgrading an Aerospike cluster to a later version, downgrading is NOT supported. To downgrade to an older version one must create a new AerospikeCluster resource based on the desired version and restore the managed Aerospike namespace using the pre-upgrade backup created as part of the upgrade process.
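
The two rules above (the target version must be whitelisted, and downgrades are not supported) can be sketched in Python as follows. The names parse_version, check_upgrade and SUPPORTED are hypothetical, and the list of versions is a placeholder built from the versions used in this document's example, not the operator's real whitelist.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "4.2.0.4" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def check_upgrade(current: str, target: str, supported: Iterable[str]) -> None:
    """Raise ValueError if moving from current to target breaks either rule."""
    if target not in set(supported):
        raise ValueError(f"version {target} is not supported")
    # Compare numerically, component by component, so that 4.2.0.10 is
    # correctly treated as later than 4.2.0.4.
    if parse_version(target) < parse_version(current):
        raise ValueError(
            f"downgrading from {current} to {target} is not supported"
        )


# Placeholder whitelist (assumption, for illustration only).
SUPPORTED = ["4.2.0.3", "4.2.0.4"]

check_upgrade("4.2.0.3", "4.2.0.4", SUPPORTED)  # passes silently
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "4.2.0.10" sorts before "4.2.0.4" lexicographically.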

Performing an upgrade

The interface for upgrading an Aerospike cluster managed by aerospike-operator is the AerospikeCluster custom resource definition. To perform an upgrade on a given Aerospike cluster, one must specify the desired target version in the .spec.version field of the associated AerospikeCluster resource. Changes in the value of this field will cause aerospike-operator to perform a rolling upgrade [2] on the associated Aerospike cluster.

⚠️
Maximum service availability during the rolling upgrade process can only be guaranteed when the target Aerospike cluster consists of more than one node (i.e., has a value of .spec.nodeCount greater than one). Similarly, maximum data availability can only be ensured if the managed Aerospike namespace has a replication factor greater than one (i.e. .spec.namespaces[0].replicationFactor is greater than one).
⚠️
In order to ensure that the upgrade operation has the least possible impact on service and data availability, aerospike-operator will refuse to perform any configuration or topology changes on an Aerospike cluster while it is being upgraded. This means, for example, that upgrading the cluster to a later version and scaling it up or down at the same time is not supported. To perform both operations, one should first perform the upgrade operation, wait for it to succeed and only then scale the cluster up or down.
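
As a rough client-side illustration of this ordering, the following Python sketch splits a desired spec into an upgrade step followed by a step with the remaining changes. It is only a sketch under the assumption that specs are handled as plain dictionaries; changed_spec_fields and plan_changes are hypothetical names and are not part of aerospike-operator.

```python
from typing import Any, Dict, List


def changed_spec_fields(current: Dict[str, Any], desired: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level spec fields whose values differ."""
    keys = set(current) | set(desired)
    return sorted(k for k in keys if current.get(k) != desired.get(k))


def plan_changes(current: Dict[str, Any], desired: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split a combined change into ordered steps: upgrade first, the rest after."""
    changed = changed_spec_fields(current, desired)
    steps: List[Dict[str, Any]] = []
    if "version" in changed:
        steps.append({"version": desired["version"]})
    rest = {k: desired.get(k) for k in changed if k != "version"}
    if rest:
        steps.append(rest)
    return steps


current_spec = {"version": "4.2.0.3", "nodeCount": 2}
desired_spec = {"version": "4.2.0.4", "nodeCount": 3}
steps = plan_changes(current_spec, desired_spec)
```

With the values above, steps contains the version change first and the nodeCount change second; the second step should only be applied once the upgrade has finished.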

The upgrade procedure is better understood using an example. For illustration purposes, it is assumed that the following AerospikeCluster resource has previously been created:

apiVersion: aerospike.travelaudience.com/v1alpha2
kind: AerospikeCluster
metadata:
  name: as-cluster-0
  namespace: kubernetes-namespace-0
spec:
  backupSpec:
    storage:
      type: gcs
      bucket: aerospike-backup
      secret: gcs-secret
  version: "4.2.0.3"
  nodeCount: 2
  namespaces:
  - name: as-namespace-0
    replicationFactor: 2
    memorySize: 1G
    defaultTTL: 0s
    storage:
      type: file
      size: 1G

At this point, setting .spec.version to 4.2.0.4 in the as-cluster-0 resource will cause aerospike-operator to start the upgrade procedure:

$ kubectl -n kubernetes-namespace-0 edit asc as-cluster-0  # .spec.version was set to 4.2.0.4
(...)
aerospikecluster.aerospike.travelaudience.com "as-cluster-0" edited
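
For scripted upgrades, a non-interactive alternative to kubectl edit is a JSON merge patch that touches only .spec.version. The Python sketch below merely builds the kubectl command line. The helper name build_version_patch_command is hypothetical, and the use of kubectl patch with a merge patch against this custom resource is an assumption that has not been verified against aerospike-operator.

```python
import json
from typing import List


def build_version_patch_command(namespace: str, name: str, version: str) -> List[str]:
    """Build a kubectl invocation that sets .spec.version via a merge patch."""
    # The patch contains only the field to change; everything else is kept.
    patch = json.dumps({"spec": {"version": version}})
    return [
        "kubectl", "-n", namespace,
        "patch", "asc", name,
        "--type", "merge",
        "-p", patch,
    ]


command = build_version_patch_command(
    "kubernetes-namespace-0", "as-cluster-0", "4.2.0.4"
)
```

The resulting list can be passed to subprocess.run(command, check=True). Since the patch changes nothing but the version, it also avoids accidentally combining the upgrade with other spec changes.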

After a few moments, an AerospikeNamespaceBackup resource will have been created, and a ClusterAutoBackupStarted condition will have been appended to the AerospikeCluster resource:

$ kubectl -n kubernetes-namespace-0 get aerospikenamespacebackups
NAME                               TARGET CLUSTER   TARGET NAMESPACE   AGE
as-namespace-0-4203-4203-upgrade   as-cluster-0     as-namespace-0     2m
$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
  Conditions:
    Last Transition Time:  2018-07-02T16:01:59Z
    Message:               cluster backup started
    Reason:                ClusterAutoBackupStarted
    Status:                True
    Type:                  AutoBackupStarted
(...)
Events:
  Type    Reason                     Age   From              Message
  ----    ------                     ----  ----              -------
(...)
  Normal  ClusterUpgradeStarted      2m    aerospikecluster  cluster backup started

Depending on the size of the managed Aerospike namespace, it can take from a few minutes to a few hours for this backup to complete. Once the underlying job is complete, a ClusterAutoBackupFinished condition is appended to the AerospikeCluster resource:

$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
  Conditions:
    Last Transition Time:  2018-07-02T16:01:59Z
    Message:               cluster backup started
    Reason:                ClusterAutoBackupStarted
    Status:                True
    Type:                  AutoBackupStarted
    Last Transition Time:  2018-07-02T16:05:34Z
    Message:               cluster backup finished
    Reason:                ClusterAutoBackupFinished
    Status:                True
    Type:                  AutoBackupFinished
(...)
Events:
  Type    Reason                     Age   From              Message
  ----    ------                     ----  ----              -------
(...)
  Normal  ClusterUpgradeStarted      1h    aerospikecluster  cluster backup started
  Normal  ClusterUpgradeStarted      2m    aerospikecluster  cluster backup finished

At this point, aerospike-operator will start working on the upgrade itself, and a ClusterUpgradeStarted condition will be appended to the AerospikeCluster resource:

$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
  Conditions:
    Last Transition Time:  2018-07-02T16:01:59Z
    Message:               cluster backup started
    Reason:                ClusterAutoBackupStarted
    Status:                True
    Type:                  AutoBackupStarted
    Last Transition Time:  2018-07-02T16:05:34Z
    Message:               cluster backup finished
    Reason:                ClusterAutoBackupFinished
    Status:                True
    Type:                  AutoBackupFinished
    Last Transition Time:  2018-07-02T16:05:35Z
    Message:               upgrade from version 4.2.0.3 to 4.2.0.4 started
    Reason:                ClusterUpgradeStarted
    Status:                True
    Type:                  UpgradeStarted
(...)
Events:
  Type    Reason                     Age   From              Message
  ----    ------                     ----  ----              -------
(...)
  Normal  ClusterUpgradeStarted      1h    aerospikecluster  cluster backup started
  Normal  ClusterUpgradeStarted      2m    aerospikecluster  cluster backup finished
  Normal  ClusterUpgradeStarted      2m    aerospikecluster  upgrade from version 4.2.0.3 to 4.2.0.4 started

As aerospike-operator progresses through each of the pods, it reports the current state by associating events with the AerospikeCluster resource. Once the upgrade procedure finishes, a ClusterUpgradeFinished condition is appended to the AerospikeCluster resource:

$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
  Conditions:
    Last Transition Time:  2018-07-02T16:01:59Z
    Message:               cluster backup started
    Reason:                ClusterAutoBackupStarted
    Status:                True
    Type:                  AutoBackupStarted
    Last Transition Time:  2018-07-02T16:05:34Z
    Message:               cluster backup finished
    Reason:                ClusterAutoBackupFinished
    Status:                True
    Type:                  AutoBackupFinished
    Last Transition Time:  2018-07-02T16:05:35Z
    Message:               upgrade from version 4.2.0.3 to 4.2.0.4 started
    Reason:                ClusterUpgradeStarted
    Status:                True
    Type:                  UpgradeStarted
    Last Transition Time:  2018-07-02T16:25:43Z
    Message:               finished upgrade from version 4.2.0.3 to 4.2.0.4
    Reason:                ClusterUpgradeFinished
    Status:                True
    Type:                  UpgradeFinished
(...)
Events:
  Type    Reason                     Age   From              Message
  ----    ------                     ----  ----              -------
(...)
  Normal  ClusterUpgradeStarted      2h    aerospikecluster  cluster backup started
  Normal  ClusterUpgradeStarted      1h    aerospikecluster  cluster backup finished
  Normal  ClusterUpgradeStarted      1h    aerospikecluster  upgrade from version 4.2.0.3 to 4.2.0.4 started
(...)
  Normal  ClusterUpgradeFinished     2m    aerospikecluster  finished upgrade from version 4.2.0.3 to 4.2.0.4

At this point, all the pods that make up the Aerospike cluster will be running the 4.2.0.4 version of Aerospike:

$ kubectl -n kubernetes-namespace-0 logs as-cluster-0-0
Jul 02 2018 16:10:03 GMT: INFO (as): (as.c:319) <><><><><><><><><><>  Aerospike Community Edition build 4.2.0.4  <><><><><><><><><><>
(...)
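
The sequence of conditions shown throughout this example can be turned into a simple progress indicator. The Python sketch below is an illustration only: upgrade_phase is a hypothetical helper, and it assumes that the conditions are available as a list of dictionaries with lowercase type and status keys, as one would typically get from .status.conditions in the JSON form of the resource.

```python
from typing import Dict, List

# Condition types taken from the example output, latest stage first.
PHASES = [
    ("UpgradeFinished", "upgrade finished"),
    ("UpgradeStarted", "upgrade in progress"),
    ("AutoBackupFinished", "backup finished"),
    ("AutoBackupStarted", "backup in progress"),
]


def upgrade_phase(conditions: List[Dict[str, str]]) -> str:
    """Return a description of the most advanced stage that has been reached."""
    active = {c["type"] for c in conditions if c.get("status") == "True"}
    for condition_type, phase in PHASES:
        if condition_type in active:
            return phase
    return "no upgrade in progress"


conditions = [
    {"type": "AutoBackupStarted", "status": "True"},
    {"type": "AutoBackupFinished", "status": "True"},
    {"type": "UpgradeStarted", "status": "True"},
]
phase = upgrade_phase(conditions)
```

Because conditions are appended rather than replaced, the helper checks the latest stage first and falls back to earlier ones.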

Failed upgrades

An upgrade operation can fail for a number of reasons, such as the inability to perform the pre-upgrade backup or the inability to start one of the pods running the target version. If a failure occurs during the upgrade process, aerospike-operator appends either an AutoBackupFailed or a ClusterUpgradeFailed condition to the AerospikeCluster resource. From that moment on, aerospike-operator stops processing this Aerospike cluster and manual disaster recovery is required. In such a scenario, the best approach to disaster recovery is to create a new Aerospike cluster and restore the pre-upgrade backup made by aerospike-operator by following the steps detailed in Restoring Namespaces.
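
Since a failed upgrade requires manual intervention, it may be useful to detect the failure conditions from a monitoring script. The Python sketch below is a hedged illustration: find_failure is a hypothetical helper, the conditions are assumed to be dictionaries with lowercase type and reason keys, and, because the text does not say whether AutoBackupFailed and ClusterUpgradeFailed are condition types or reasons, both fields are checked.

```python
from typing import Dict, List, Optional

# Names taken from the text above; whether they appear as a condition's
# type or as its reason is an assumption, so both fields are inspected.
FAILURE_NAMES = {"AutoBackupFailed", "ClusterUpgradeFailed"}


def find_failure(conditions: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first condition that signals a failed upgrade, if any."""
    for condition in conditions:
        if (condition.get("type") in FAILURE_NAMES
                or condition.get("reason") in FAILURE_NAMES):
            return condition
    return None


conditions = [
    {"type": "AutoBackupStarted", "reason": "ClusterAutoBackupStarted"},
    {"type": "AutoBackupFailed", "reason": "ClusterAutoBackupFailed"},
]
failure = find_failure(conditions)
```

A non-None result indicates that aerospike-operator has stopped processing the cluster and that the recovery procedure described above should be followed.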


1. Assuming that the validating admission webhook has not been disabled.
2. For further details on the upgrade procedure one should refer to the design document.