How is it useful? Simply add an annotation to your PersistentVolume
or
PersistentVolumeClaim
resources, and let this tool create and expire snapshots
according to your specifications.
Supported Environments:
- Google Compute Engine disks.
- AWS EBS disks.
Want to help adding support for other backends? It's pretty straightforward. Have a look at the API that backends need to implement.
Let's run k8s-snapshots in your cluster:
cat <<EOF | kubectl create -f -
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: k8s-snapshots
namespace: kube-system
spec:
replicas: 1
template:
metadata:
labels:
app: k8s-snapshots
spec:
containers:
- name: k8s-snapshots
image: elsdoerfer/k8s-snapshots:latest
EOF
Add a backup annotation to one of your persistent volumes:
kubectl patch pv pvc-01f74065-8fe9-11e6-abdd-42010af00148 -p \
'{"metadata": {"annotations": {"backup.kubernetes.io/deltas": "P1D P30D P360D"}}}'
k8s-snapshots will now run in your cluster, and per the deltas given, it will create a daily snapshot of the volume. It will keep 30 daily snapshots, and then for one year it will keep a monthly snapshot. If the daemon is not running for a while, it will still try to approximate your desired snapshot scheme as closely as possible.
k8s-snapshots need EBS and S3 permissions to take and save snapshots. Under the kops IAM Role scheme, only Masters have these permissions. The easiest solution is to run k8s-snapshots on Masters.
To run on a Master, we need to:
To do this, add the following to the above manifest for the k8s-snapshots Deployment:
spec:
...
template:
...
spec:
...
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Equal"
value: ""
effect: "NoSchedule"
nodeSelector:
kubernetes.io/role: master
The expiry logic of tarsnapper is used.
The generations are defined by a list of deltas formatted as ISO 8601
durations (this differs from
tarsnapper). PT60S
or PT1M
means a minute, PT12H
or P0.5D
is
half a day, P1W
or P7D
is a week. The number of backups in each
generation is implied by it's and the parent generation's delta.
For example, given the deltas PT1H P1D P7D
, the first generation will
consist of 24 backups each one hour older than the previous
(or the closest approximation possible given the available backups),
the second generation of 7 backups each one day older than the previous,
and backups older than 7 days will be discarded for good.
The most recent backup is always kept.
The first delta is the backup interval.
Use the example deployment file given above to start off. If you run the daemon on a Google Container Engine cluster, it should already have access to all the resources it needs.
However, depending on your configuration, you may need to assign the correct RBAC rules, or give it access to a Google Cloud identity that has permissions to create snapshots. See below for more on the available configuration options.
To backup a volume, you should add an annotation with the name
backup.kubernetes.io/deltas
to either your PersistentVolume
or
PersistentVolumeClaim
resources.
Since PersistentVolumes
are often created automatically for you
by Kubernetes, you may want to annotate the volume claim in your
resource definition file. Alteratively, you can kubectl edit pv
a PersistentVolume
created by Kubernetes and add the annotation.
The value of the annotation are a set of deltas that define how often a snapshot is created, and how many snapshots should be kept. See the section above for more information on how deltas work.
In the end, your annotation may look like this:
backup.kubernetes.io/deltas: PT1H P2D P30D P180D
There is also the option of manually specifying the volume names to be backed up as options to the k8s-snapshots daemon. See below for more information.
If there are no default credentials to Kubernetes and the Cloud snapshot API, or the default credentials do not have the required access scope, you may need to configure these.
CLOUD_PROVIDER | Set to 'google' to use gcloud exclusively. Can be detected based on volume spec gcePersistentDisk. |
GCLOUD_PROJECT | Name of the Google Cloud project. This is required to use the Google Cloud API, but if it's not given, we try to read the value from the [instance metadata service](https://cloud.google.com/compute/docs/storing-retrieving-metadata) which will usually work. |
GCLOUD_CREDENTIALS_FILE | Filename to the JSON gcloud credentials file used to authenticate. You'll want to mount it into the container. By default set to here for for PyKube: ~/.config/gcloud/application_default_credentials.json PyKube doesn't use env to locate the config but GOOGLE_APPLICATION_CREDENTIALS takes precedence. |
GOOGLE_APPLICATION_CREDENTIALS | The contents of the JSON keyfile that is used to authenticate. |
KUBE_CONFIG_FILE | Authentification with the Kubernetes API. By default, the pod service account is used. |
When using a service account with a custom role to access the Google Cloud API, the following permissions are required:
compute.disks.createSnapshot
compute.snapshots.create
compute.snapshots.delete
compute.snapshots.get
compute.snapshots.list
compute.snapshots.setLabels
compute.zoneOperations.get
AWS will need the following role
If there are no default credentials to the Cloud API, or the default credentials do not have the required access scope, you may need to configure these environment variables.
AWS_ACCESS_KEY_ID | AWS IAM Access Key ID that is used to authenticate. |
AWS_SECRET_ACCESS_KEY | AWS IAM Secret Access Key that is used to authenticate. |
AWS_REGION | The region is usually detected via the meta data service. You can override the value. |
In Kubernetes clusters with RBAC, the required permissions need to be provided to the k8s-snapshots
pods to watch and list persistentvolume
or persistentvolumeclaims
. We provide a manifest to setup a ServiceAccount
with a minimal set of permissions in rbac.yaml.
kubectl apply -f manifests/rbac.yaml
Furthermore, under GKE, "Because of the way Container Engine checks permissions when you create a Role or ClusterRole, you must first create a RoleBinding that grants you all of the permissions included in the role you want to create."
If the above kubectl apply command produces an error about "attempt to grant extra privileges", the following will grant your user the necessary privileges first, so that you can then bind them to the service account:
kubectl create clusterrolebinding your-user-cluster-admin-binding --clusterrole=cluster-admin --user=your.google.cloud.email@example.org
Finally, adjust the deployment by adding serviceAccountName: k8s-snapshots
to the spec (else you'll end up using the "default" service account), as follows:
<snip>
spec:
serviceAccountName: k8s-snapshots
containers:
- name: k8s-snapshots
image: elsdoerfer/k8s-snapshots:v2.0
</snip>
PING_URL | We'll send a GET request to this url whenever a backup completes. This is useful for integrating with monitoring services like Cronitor or Dead Man's Snitch. |
If your persistent volumes are auto-provisioned by Kubernetes, then
you'll end up with snapshot names such as
pv-pvc-01f74065-8fe9-11e6-abdd-42010af00148
. If you want that
prettier, set the enviroment variable USE_CLAIM_NAME=true
. Instead
of the auto-generated name of the persistent volume, k8s-snapshots
will instead use the name that you give to your
PersistentVolumeClaim
.
It's possible to ask k8s-snapshots to create snapshots of volumes
for which no PersistentVolume
object exists within the Kubernetes
cluster. For example, you might have a volume at your Cloud provider
that you use within Kubernetes by referencing it directly.
To do this, we use a custom Kubernetes resource, SnapshotRule
.
First, you need to create this custom resource.
On Kubernetes 1.7 and higher:
cat <<EOF | kubectl create -f -
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: snapshotrules.k8s-snapshots.elsdoerfer.com
spec:
group: k8s-snapshots.elsdoerfer.com
version: v1
scope: Namespaced
names:
plural: snapshotrules
singular: snapshotrule
kind: SnapshotRule
shortNames:
- sr
EOF
Or on Kubernetes 1.6 and lower:
cat <<EOF | kubectl create -f -
apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
name: snapshot-rule.k8s-snapshots.elsdoerfer.com
description: "Defines snapshot management rules for a disk."
versions:
- name: v1
EOF
You can then create SnapshotRule
resources:
cat <<EOF | kubectl apply -f -
apiVersion: "k8s-snapshots.elsdoerfer.com/v1"
kind: SnapshotRule
metadata:
name: mysql
spec:
deltas: P1D P30D
backend: aws
disk:
region: eu-west-1
volumeId: vol-0aa6f44aad0daf9f2
EOF
This is an example for backing up an EBS disk on the Amazon cloud. The
disk
option requires different keys, depending on the backend. See
the examples folder.
You may also point SnapshotRule
resources to PersistentVolumes (or
PersistentVolumeClaims). This is intended as an alternative to adding
an annotation; it may be desirable for some to separate the snapshot
functionality from the resource.
cat <<EOF | kubectl apply -f -
apiVersion: "k8s-snapshots.elsdoerfer.com/v1"
kind: SnapshotRule
metadata:
name: mysql
spec:
deltas: P1D P30D
persistentVolumeClaim: datadir-mysql
EOF
After setting up the custom resource definitions (see previous section), use
snapshot rules as defined in the examples/backup-kops-etcd.yml
file. Reference
the volume ids of your etcd volumes.
LOG_LEVEL | **Default: INFO**. Possible values: DEBUG, INFO, WARNING, ERROR |
JSON_LOG | **Default: False**. Output the log messages as JSON objects for easier processing. |
TZ | **Default: UTC**. Used to change the timezone. ie. TZ=America/Montreal |
What if I manually create snapshots for the same volumes that k8s-snapshots manages?
Starting with v0.3, when k8s-snapshots decides when to create the next snapshot, and which snapshots it deletes, it no longer considers snapshots that are not correctly labeled by it.
For local development, you can still connect to an existing Google Cloud Project and Kubernetes cluster using the config options available. If you are lucky, your local workstation is already setup the way you need it. If we can find credentials for Google Cloud or Kubernetes, they will be used automatically.
However, depending on the backend, you need to provide some options that otherwise would be read from the instance metadata:
For AWS:
$ AWS_REGION=eu-west-1 python -m k8s_snapshots
For Google Cloud:
$ GCLOUD_PROJECT=revolving-randy python -m k8s_snapshots