Before proceeding, one should make themselves familiar with custom resource definitions and with the API spec document (in particular with the AerospikeCluster and AerospikeNamespaceBackup custom resource definitions).
In order to benefit from the latest features, improvements and bug fixes made to Aerospike, one will certainly find the need to upgrade an Aerospike cluster to a later version during its lifetime. aerospike-operator
provides first-class support for performing version upgrades in live Aerospike clusters.
Before actually starting an upgrade operation, aerospike-operator
performs a mandatory backup of the Aerospike namespace managed by the target Aerospike cluster. This is done in order to guarantee the safety of the data in case of a major failure during the upgrade process. Hence, and before being able to upgrade an Aerospike cluster, one must configure automatic pre-upgrade backups for the target Aerospike cluster. This is done by making sure that the pre-requisites for the core backup functionality have been met, and by specifying a spec for these backups in the associated AerospikeCluster
resource.
|
Although aerospike-operator performs pre-upgrade backups of the Aerospike namespace managed by the target Aerospike cluster before actually starting the upgrade process, automatic restore of these backups in case of a failure during the upgrade is NOT supported.
|
|
For the remainder of this document, it is assumed that the core backup functionality was adequately configured in one’s Kubernetes cluster by following the steps detailed in the Pre-requisites section of the Backing-up Namespaces document. |
Pre-upgrade backups are configured via the .spec.backupSpec
field of an AerospikeCluster
resource. This field contains a nested storage
field that is similar in structure and semantics to the .spec.storage
field of an AerospikeNamespaceBackup
resource:
apiVersion: aerospike.travelaudience.com/v1alpha2
kind: AerospikeCluster
(...)
spec:
backupSpec:
storage:
type: gcs
bucket: aerospike-backup
secret: gcs-secret
secretNamespace: kubernetes-namespace-0
secretKey: key.json
(...)
ℹ️
|
secretNamespace must be set to the name of the Kubernetes namespace where the secret to be used exists. It is an optional field that defaults to the name of the Kubernetes namespace the AerospikeCluster resource belongs to.
|
ℹ️
|
secretKey must be set to the name of the field inside the secret that contains the credentials to be used. It is also an optional field and defaults to key.json .
|
The .spec.backupSpec
field can be specified either when first creating the AerospikeCluster
resource or at a later time by updating it (e.g. using kubectl edit
). aerospike-operator
will refuse to upgrade an Aerospike cluster for which this field has not been specified [1]:
$ kubectl edit asc as-cluster-0
(...)
error: aerospikeclusters "as-cluster-0" could not be patched: admission webhook "aerospikeclusters.aerospike.travelaudience.com" denied the request: no value for .spec.backupSpec has been specified
ℹ️
|
The .spec.backupSpec field is only required if one intends to perform version upgrades on the target Aerospike cluster. In simpler usage scenarios, such as when creating an Aerospike cluster for testing purposes, this field is not strictly required and can be omitted.
|
In order to minimize the chances of a failed upgrade, aerospike-operator
includes a whitelist of supported and tested Aerospike versions. aerospike-operator
will refuse to upgrade an Aerospike cluster to a version of Aerospike that is not whitelisted. In practice this means that before upgrading an Aerospike cluster to a later version one may need to upgrade aerospike-operator
itself as described in the Upgrading aerospike-operator
document. The current version of aerospike-operator
supports the following Aerospike CE versions:
Future versions of aerospike-operator
will introduce support for new minor, patch and release versions as they become available.
|
At any given time, the availability of a given version of Aerospike is dependent on the existence of the respective tag in the aerospike/aerospike-server official repository.
|
It should be noted that after upgrading an Aerospike cluster to a later version, downgrading is NOT supported. To downgrade to an older version one must create a new AerospikeCluster
resource based on the desired version and restore the managed Aerospike namespace using the pre-upgrade backup created as part of the upgrade process.
The interface for upgrading an Aerospike cluster managed by aerospike-operator
is the AerospikeCluster custom resource definition. To perform an upgrade on a given Aerospike cluster, one must specify the desired target version in the .spec.version
field of the associated AerospikeCluster
resource. Changes in the value of this field will cause aerospike-operator
to perform a rolling upgrade [2] on the associated Aerospike cluster.
|
Maximum service availability during the rolling upgrade process can only be guaranteed when the target Aerospike cluster consists of more than one node (i.e., has a value of .spec.nodeCount greater than one). Similarly, maximum data availability can only be ensured if the managed Aerospike namespace has a replication factor greater than one (i.e. .spec.namespaces[0].replicationFactor is greater than one).
|
|
In order to ensure that the upgrade operation has the least possible impact on service and data availability, aerospike-operator will refuse to perform any configuration or topology changes on an Aerospike cluster while is is being upgraded. This means, for example, that upgrading the cluster to a later version and scaling it up or down at the same time is not supported. To perform both operations, one should first perform the upgrade operation, wait for it to succeed and only them scale the cluster up or down.
|
The upgrade procedure is better understood using an example. For illustration purposes, it is assumed that the following AerospikeCluster
resource has previously been created:
apiVersion: aerospike.travelaudience.com/v1alpha2
kind: AerospikeCluster
metadata:
name: as-cluster-0
namespace: kubernetes-namespace-0
spec:
backupSpec:
storage:
type: gcs
bucket: aerospike-backup
secret: gcs-secret
version: "4.2.0.3"
nodeCount: 2
namespaces:
- name: as-namespace-0
replicationFactor: 2
memorySize: 1G
defaultTTL: 0s
storage:
type: file
size: 1G
At this point, setting .spec.version
to 4.2.0.4
in the as-cluster-0
resource will cause aerospike-operator
to start the upgrade procedure:
$ kubectl -n kubernetes-namespace-0 edit asc as-cluster-0 # .spec.version was set to 4.2.0.4
(...)
aerospikecluster.aerospike.travelaudience.com "as-cluster-0" edited
After a few moments, an AerospikeNamespaceBackup
resource will have been created, and a ClusterAutoBackupStarted
condition will have been appended to the AerospikeCluster
resource:
$ kubectl -n kubernetes-namespace-0 get aerospikenamespacebackups
NAME TARGET CLUSTER TARGET NAMESPACE AGE
as-namespace-0-4203-4203-upgrade as-cluster-0 as-namespace-0 2m
$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
Conditions:
Last Transition Time: 2018-07-02T16:01:59Z
Message: cluster backup started
Reason: ClusterAutoBackupStarted
Status: True
Type: AutoBackupStarted
(...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
(...)
Normal ClusterUpgradeStarted 2m aerospikecluster cluster backup started
Depending on the size of the managed Aerospike namespace, it can take from a few minutes to a few hours for this backup to complete. By the time the underlying job are complete, a ClusterAutoBackupFinished
condition will be appended to the AerospikeCluster
resource:
$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
Conditions:
Last Transition Time: 2018-07-02T16:01:59Z
Message: cluster backup started
Reason: ClusterAutoBackupStarted
Status: True
Type: AutoBackupStarted
Last Transition Time: 2018-07-02T16:05:34Z
Message: cluster backup finished
Reason: ClusterAutoBackupFinished
Status: True
Type: AutoBackupFinished
(...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
(...)
Normal ClusterUpgradeStarted 1h aerospikecluster cluster backup started
Normal ClusterUpgradeStarted 2m aerospikecluster cluster backup finished
At this point, aerospike-operator
will start working on the upgrade itself, and a ClusterUpgradeStarted
condition will be appended to the AerospikeCluster
resource:
$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
Conditions:
Last Transition Time: 2018-07-02T16:01:59Z
Message: cluster backup started
Reason: ClusterAutoBackupStarted
Status: True
Type: AutoBackupStarted
Last Transition Time: 2018-07-02T16:05:34Z
Message: cluster backup finished
Reason: ClusterAutoBackupFinished
Status: True
Type: AutoBackupFinished
Last Transition Time: 2018-07-02T16:05:35Z
Message: upgrade from version 4.2.0.3 to 4.2.0.4 started
Reason: ClusterUpgradeStarted
Status: True
Type: UpgradeStarted
(...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
(...)
Normal ClusterUpgradeStarted 1h aerospikecluster cluster backup started
Normal ClusterUpgradeStarted 2m aerospikecluster cluster backup finished
Normal ClusterUpgradeStarted 2m aerospikecluster upgrade from version 4.2.0.3 to 4.2.0.4 started
As aerospike-operator
progresses through each of the pods, it will report the current state by associating events with the AerospikeCluster
resource. By the time the upgrade procedure finishes, a ClusterUpgradeFinished
condition is appended to the AerospikeCluster
resource:
$ kubectl -n kubernetes-namespace-0 describe asc as-cluster-0
(...)
Status:
Conditions:
Last Transition Time: 2018-07-02T16:01:59Z
Message: cluster backup started
Reason: ClusterAutoBackupStarted
Status: True
Type: AutoBackupStarted
Last Transition Time: 2018-07-02T16:05:34Z
Message: cluster backup finished
Reason: ClusterAutoBackupFinished
Status: True
Type: AutoBackupFinished
Last Transition Time: 2018-07-02T16:05:35Z
Message: upgrade from version 4.2.0.3 to 4.2.0.4 started
Reason: ClusterUpgradeStarted
Status: True
Type: UpgradeStarted
Last Transition Time: 2018-07-02T16:25:43Z
Message: finished upgrade from version 4.2.0.3 to 4.2.0.4
Reason: ClusterUpgradeFinished
Status: True
Type: UpgradeFinished
(...)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
(...)
Normal ClusterUpgradeStarted 2h aerospikecluster cluster backup started
Normal ClusterUpgradeStarted 1h aerospikecluster cluster backup finished
Normal ClusterUpgradeStarted 1h aerospikecluster upgrade from version 4.2.0.3 to 4.2.0.4 started
(...)
Normal ClusterUpgradeFinished 2m aerospikecluster finished upgrade from version 4.2.0.3 to 4.2.0.4
At this point, all the pods that make up the Aerospike cluster will be running the 4.2.0.4
version of Aerospike:
$ kubectl -n kubernetes-namespace-0 logs as-cluster-0-0
Jul 02 2018 16:10:03 GMT: INFO (as): (as.c:319) <><><><><><><><><><> Aerospike Community Edition build 4.2.0.4 <><><><><><><><><><>
(...)
An upgrade operation can fail for a number of reasons, such as the inability to perform the pre-upgrade backup or the inability to start one of the pods running the target version. In the presence of a failure during the upgrade process, aerospike-operator
appends either an AutoBackupFailed
or a ClusterUpgradeFailed
condition to the AerospikeCluster
resource. From that moment on, aerospike-operator
stops processing this Aerospike cluster and manual disaster recovery is required. In such a scenarion, the best approach to proper disaster recovery is to create a new Aerospike cluster and restore the pre-upgrade backup made by aerospike-operator
by following the steps detailed in Restoring Namespaces.