Commit f3e1e80

Add logical backup (zalando#442)
* Add k8s cron job to spawn logical backups
* Minor doc updates
1 parent 2c02b37 commit f3e1e80

24 files changed: +526 -55 lines changed

charts/postgres-operator/templates/clusterrole.yaml

Lines changed: 11 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -141,4 +141,15 @@ rules:
141141
- bind
142142
resourceNames:
143143
- {{ template "postgres-operator.fullname" . }}
144+
- apiGroups:
145+
- batch
146+
resources:
147+
- cronjobs # enables logical backups
148+
verbs:
149+
- create
150+
- delete
151+
- get
152+
- list
153+
- patch
154+
- update
144155
{{ end }}

charts/postgres-operator/values.yaml

Lines changed: 3 additions & 0 deletions
@@ -62,6 +62,9 @@ config:
6262
pod_management_policy: "ordered_ready"
6363
enable_pod_antiaffinity: "false"
6464
pod_antiaffinity_topology_key: "kubernetes.io/hostname"
65+
logical_backup_schedule: "30 00 * * *"
66+
logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
67+
logical_backup_s3_bucket: ""
6568
rbac:
6669
# Specifies whether RBAC resources should be created
6770
create: true

docs/administrator.md

Lines changed: 15 additions & 6 deletions
@@ -340,9 +340,18 @@ Postgres database cluster:
340340

341341
## Understanding rolling update of Spilo pods
342342

343-
The operator logs reasons for a rolling update with the `info` level and
344-
a diff between the old and new StatefulSet specs with the `debug` level.
345-
To read the latter log entry with the escaped characters rendered, view it
346-
in CLI with `echo -e`. Note that the resultant message will contain some
347-
noise because the `PodTemplate` used by the operator is yet to be updated
348-
with the default values used internally in Kubernetes.
343+
The operator logs reasons for a rolling update with the `info` level and a diff between the old and new StatefulSet specs with the `debug` level. To benefit from numerous escape characters in the latter log entry, view it in CLI with `echo -e`. Note that the resultant message will contain some noise because the `PodTemplate` used by the operator is yet to be updated with the default values used internally in Kubernetes.
344+
345+
## Logical backups
346+
347+
The operator can manage k8s cron jobs to run logical backups of Postgres clusters. The cron job periodically spawns a batch job that runs a single pod. The backup script within this pod's container can connect to a DB for a logical backup. The operator updates cron jobs during Sync if the job schedule changes; the job name acts as the job identifier. These jobs are to be enabled for each individual Postgres cluster by setting `enableLogicalBackup: true` in its manifest. Notes:
348+
349+
1. The provided `registry.opensource.zalan.do/acid/logical-backup` image implements the backup via `pg_dumpall` and upload of (compressed) results to an S3 bucket; `pg_dumpall` requires `superuser` access to a DB and runs on the replica when possible.
350+
351+
2. Due to the [limitations of Kubernetes cron jobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations), it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of operator responsibilities.
352+
353+
3. The operator does not remove old backups.
354+
355+
4. You may use your own image by overwriting the relevant field in the operator configuration. Any such image must ensure the logical backup is able to finish [in the presence of pod restarts](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#handling-pod-and-container-failures) and [simultaneous invocations](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) of the backup cron job.
356+
357+
5. For this feature to work, your RBAC policy must enable operations on the `cronjobs` resource from the `batch` API group for the operator service account. See the [example RBAC](../manifests/operator-service-account-rbac.yaml).
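
The sync behaviour described in the "Logical backups" section above (jobs identified by name, updated when the schedule changes) could be sketched roughly as follows. This is an illustration only, not the operator's code: the `cronJob` type, the `syncAction` helper and the job name `logical-backup-acid-batman` are hypothetical, and creating a job when none exists is an assumption.

```go
package main

import "fmt"

// cronJob is a hypothetical, trimmed-down stand-in for a k8s CronJob.
// It keeps only the fields needed to illustrate the sync decision.
type cronJob struct {
	Name     string
	Schedule string
}

// syncAction compares the desired job with the existing jobs, keyed by
// job name (the name acts as the identifier), and returns the action
// a sync pass would take: "create", "update" or "none".
func syncAction(existing map[string]cronJob, desired cronJob) string {
	current, found := existing[desired.Name]
	switch {
	case !found:
		// Assumption: a missing job is created on sync.
		return "create"
	case current.Schedule != desired.Schedule:
		return "update"
	default:
		return "none"
	}
}

func main() {
	// The job name below is made up for this example.
	existing := map[string]cronJob{
		"logical-backup-acid-batman": {Name: "logical-backup-acid-batman", Schedule: "30 00 * * *"},
	}
	desired := cronJob{Name: "logical-backup-acid-batman", Schedule: "0 3 * * *"}
	fmt.Println(syncAction(existing, desired))
}
```

Comparing only the schedule keeps the sketch aligned with the text; a real implementation may need to compare more of the job spec.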

docs/developer.md

Lines changed: 4 additions & 1 deletion
@@ -203,7 +203,7 @@ localhost:8080 by doing:
203203
The inner 'query' gets the name of the postgres operator pod, and the outer
204204
enables port forwarding. Afterwards, you can access the operator API with:
205205

206-
$ curl http://127.0.0.1:8080/$endpoint| jq .
206+
$ curl --location http://127.0.0.1:8080/$endpoint | jq .
207207

208208
The available endpoints are listed below. Note that the worker ID is an integer
209209
from 0 up to 'workers' - 1 (value configured in the operator configuration and
@@ -323,6 +323,9 @@ be updated. As explained [here](reference/operator_parameters.md), it's possible
323323
to configure the operator either with a ConfigMap or CRD, but currently we aim
324324
to synchronize parameters everywhere.
325325

326+
When choosing a parameter name for a new option in a PG manifest, keep in mind
327+
the naming conventions there. The `snake_case` variables come from the Patroni/Postgres world, while the `camelCase` ones come from the k8s world.
328+
326329
Note: If one option is defined in the operator configuration and in the cluster
327330
[manifest](../manifests/complete-postgres-manifest.yaml), the latter takes
328331
precedence.

docs/reference/cluster_manifest.md

Lines changed: 9 additions & 1 deletion
@@ -14,6 +14,8 @@ measurements. Please, refer to the [Kubernetes
1414
documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/)
1515
for the possible values of those.
1616

17+
:exclamation: If both operator configmap/CRD and a Postgres cluster manifest define the same parameter, the value from the Postgres cluster manifest is applied.
18+
1719
## Manifest structure
1820

1921
A postgres manifest is a `YAML` document. On the top level both individual
@@ -45,7 +47,7 @@ Those parameters are grouped under the `metadata` top-level key.
4547

4648
## Top-level parameters
4749

48-
Those are parameters grouped directly under the `spec` key in the manifest.
50+
These parameters are grouped directly under the `spec` key in the manifest.
4951

5052
* **teamId**
5153
name of the team the cluster belongs to. Changing it after the cluster
@@ -117,6 +119,12 @@ Those are parameters grouped directly under the `spec` key in the manifest.
117119
is `false`, then no volume will be mounted no matter how operator was
118120
configured (so you can override the operator configuration).
119121

122+
* **enableLogicalBackup**
123+
Determines if the logical backup of this cluster should be taken and uploaded to S3. Default: false.
124+
125+
* **logicalBackupSchedule**
126+
Schedule for the logical backup k8s cron job. Please take [the reference schedule format](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule) into account. Default: "30 00 \* \* \*"
127+
120128
## Postgres parameters
121129

122130
Those parameters are grouped under the `postgresql` top-level key.
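
As a rough illustration of the precedence rule stated in this file's diff (a value in the cluster manifest wins over the operator configuration), the choice of backup schedule might look like the sketch below. The `clusterSpec` type and `effectiveSchedule` helper are hypothetical, and falling back to the operator-level schedule when the manifest leaves `logicalBackupSchedule` empty is an assumption, not something the diff confirms.

```go
package main

import "fmt"

// clusterSpec mirrors only the two backup-related manifest parameters.
// It is a hypothetical stand-in, not the operator's PostgresSpec.
type clusterSpec struct {
	EnableLogicalBackup   bool
	LogicalBackupSchedule string
}

// effectiveSchedule returns the schedule to use and whether a backup job
// is wanted at all. A manifest value takes precedence; an empty manifest
// value is assumed to fall back to the operator-level default.
func effectiveSchedule(spec clusterSpec, operatorDefault string) (string, bool) {
	if !spec.EnableLogicalBackup {
		return "", false
	}
	if spec.LogicalBackupSchedule != "" {
		return spec.LogicalBackupSchedule, true
	}
	return operatorDefault, true
}

func main() {
	spec := clusterSpec{EnableLogicalBackup: true, LogicalBackupSchedule: "0 3 * * *"}
	schedule, enabled := effectiveSchedule(spec, "30 00 * * *")
	fmt.Println(schedule, enabled)
}
```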

docs/reference/operator_parameters.md

Lines changed: 15 additions & 1 deletion
@@ -51,6 +51,8 @@ parameters, those parameters have no effect and are replaced by the
5151
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` environment variables.
5252
They will be deprecated and removed in the future.
5353

54+
For the configmap operator configuration, the [default parameter values](https://github.com/zalando-incubator/postgres-operator/blob/master/pkg/util/config/config.go#L14) mentioned here are likely to be overwritten in your local operator installation via your local version of the operator configmap. If you use the operator CRD, all the CRD defaults are provided in the [operator's default configuration manifest](https://github.com/zalando-incubator/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml).
55+
5456
Variable names are underscore-separated words.
5557

5658

@@ -476,4 +478,16 @@ scalyr sidecar. In the CRD-based configuration they are grouped under the
476478
Memory limit value for the Scalyr sidecar. The default is `1Gi`.
477479

478480

479-
For the configmap operator configuration, the [default parameter values](https://github.com/zalando/postgres-operator/blob/master/pkg/util/config/config.go#L14) mentioned here are likely to be overwritten in your local operator installation via your local version of the operator configmap. In the case you use the operator CRD, all the CRD defaults are provided in the [operator's default configuration manifest](https://github.com/zalando/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml)
481+
## Logical backup
482+
483+
These parameters configure a k8s cron job managed by the operator to produce Postgres logical backups.
484+
In the CRD-based configuration those parameters are grouped under the `logical_backup` key.
485+
486+
* **logical_backup_schedule**
487+
Backup schedule in the cron format. Please take [the reference schedule format](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule) into account. Default: "30 00 \* \* \*"
488+
489+
* **logical_backup_docker_image**
490+
Docker image for the pods of the cron job. Must implement backup logic and correctly handle pod and job restarts. The default image runs `pg_dumpall` (on a replica if possible) and uploads compressed results to an S3 bucket under the key `/spilo/pg_cluster_name/cluster_k8s_uuid/logical_backups`. Default: "registry.opensource.zalan.do/acid/logical-backup"
491+
492+
* **logical_backup_s3_bucket**
493+
S3 bucket to store backup results. The bucket has to be present and accessible by Postgres pods. Default: empty.
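
To make the key layout `/spilo/pg_cluster_name/cluster_k8s_uuid/logical_backups` concrete, here is a minimal sketch that assembles such a prefix. The `backupKeyPrefix` helper and the sample UID are hypothetical; how the default image actually builds the key is not shown in this commit.

```go
package main

import (
	"fmt"
	"path"
)

// backupKeyPrefix joins the documented key components into one S3 key
// prefix. The function name and signature are made up for illustration.
func backupKeyPrefix(clusterName, clusterUID string) string {
	return path.Join("/spilo", clusterName, clusterUID, "logical_backups")
}

func main() {
	// The UID below is a placeholder, not a real cluster UID.
	fmt.Println(backupKeyPrefix("acid-batman", "00000000-0000-0000-0000-000000000000"))
}
```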

docs/user.md

Lines changed: 8 additions & 0 deletions
@@ -347,3 +347,11 @@ every 6 hours.
347347
Note that if the statefulset is scaled down before resizing the size changes
348348
are only applied to the volumes attached to the running pods. The size of the
349349
volumes that correspond to the previously running pods is not changed.
350+
351+
## Logical backups
352+
353+
If you add
354+
```
355+
enableLogicalBackup: true
356+
```
357+
to the cluster manifest, the operator will create and sync a k8s cron job to do periodic logical backups of this particular Postgres cluster. Due to the [limitations of Kubernetes cron jobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations), it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of operator responsibilities. See [configuration reference](reference/cluster_manifest.md) and [administrator documentation](administrator.md) for details on how backups are executed.

manifests/complete-postgres-manifest.yaml

Lines changed: 4 additions & 0 deletions
@@ -64,6 +64,10 @@ spec:
6464
# cluster: "acid-batman"
6565
# timestamp: "2017-12-19T12:40:33+01:00" # timezone required (offset relative to UTC, see RFC 3339 section 5.6)
6666
# s3_wal_path: "s3://custom/path/to/bucket"
67+
68+
# run periodic backups with k8s cron jobs
69+
# enableLogicalBackup: true
70+
# logicalBackupSchedule: "30 00 * * *"
6771
maintenanceWindows:
6872
- 01:00-06:00 #UTC
6973
- Sat:00:00-04:00

manifests/configmap.yaml

Lines changed: 4 additions & 0 deletions
@@ -54,3 +54,7 @@ data:
5454
resource_check_interval: 3s
5555
resource_check_timeout: 10m
5656
resync_period: 5m
57+
58+
# logical_backup_schedule: "30 00 * * *"
59+
# logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
60+
# logical_backup_s3_bucket: ""

manifests/minimal-postgres-manifest.yaml

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ spec:
1717
# role for application foo
1818
foo_user: []
1919

20-
2120
#databases: name->owner
2221
databases:
2322
foo: zalando

manifests/operator-service-account-rbac.yaml

Lines changed: 11 additions & 1 deletion
@@ -142,7 +142,17 @@ rules:
142142
- bind
143143
resourceNames:
144144
- zalando-postgres-operator
145-
145+
- apiGroups:
146+
- batch
147+
resources:
148+
- cronjobs # enables logical backups
149+
verbs:
150+
- create
151+
- delete
152+
- get
153+
- list
154+
- patch
155+
- update
146156
---
147157
apiVersion: rbac.authorization.k8s.io/v1
148158
kind: ClusterRoleBinding

manifests/postgresql-operator-default-configuration.yaml

Lines changed: 4 additions & 1 deletion
@@ -91,4 +91,7 @@ configuration:
9191
# scalyr_api_key: ""
9292
# scalyr_image: ""
9393
# scalyr_server_url: ""
94-
94+
logical_backup:
95+
logical_backup_schedule: "30 00 * * *"
96+
logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
97+
logical_backup_s3_bucket: ""

pkg/apis/acid.zalan.do/v1/operator_configuration_type.go

Lines changed: 26 additions & 19 deletions
@@ -143,25 +143,26 @@ type ScalyrConfiguration struct {
143143

144144
// OperatorConfigurationData defines the operation config
145145
type OperatorConfigurationData struct {
146-
EtcdHost string `json:"etcd_host,omitempty"`
147-
DockerImage string `json:"docker_image,omitempty"`
148-
Workers uint32 `json:"workers,omitempty"`
149-
MinInstances int32 `json:"min_instances,omitempty"`
150-
MaxInstances int32 `json:"max_instances,omitempty"`
151-
ResyncPeriod Duration `json:"resync_period,omitempty"`
152-
RepairPeriod Duration `json:"repair_period,omitempty"`
153-
Sidecars map[string]string `json:"sidecar_docker_images,omitempty"`
154-
PostgresUsersConfiguration PostgresUsersConfiguration `json:"users"`
155-
Kubernetes KubernetesMetaConfiguration `json:"kubernetes"`
156-
PostgresPodResources PostgresPodResourcesDefaults `json:"postgres_pod_resources"`
157-
SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
158-
Timeouts OperatorTimeouts `json:"timeouts"`
159-
LoadBalancer LoadBalancerConfiguration `json:"load_balancer"`
160-
AWSGCP AWSGCPConfiguration `json:"aws_or_gcp"`
161-
OperatorDebug OperatorDebugConfiguration `json:"debug"`
162-
TeamsAPI TeamsAPIConfiguration `json:"teams_api"`
163-
LoggingRESTAPI LoggingRESTAPIConfiguration `json:"logging_rest_api"`
164-
Scalyr ScalyrConfiguration `json:"scalyr"`
146+
EtcdHost string `json:"etcd_host,omitempty"`
147+
DockerImage string `json:"docker_image,omitempty"`
148+
Workers uint32 `json:"workers,omitempty"`
149+
MinInstances int32 `json:"min_instances,omitempty"`
150+
MaxInstances int32 `json:"max_instances,omitempty"`
151+
ResyncPeriod Duration `json:"resync_period,omitempty"`
152+
RepairPeriod Duration `json:"repair_period,omitempty"`
153+
Sidecars map[string]string `json:"sidecar_docker_images,omitempty"`
154+
PostgresUsersConfiguration PostgresUsersConfiguration `json:"users"`
155+
Kubernetes KubernetesMetaConfiguration `json:"kubernetes"`
156+
PostgresPodResources PostgresPodResourcesDefaults `json:"postgres_pod_resources"`
157+
SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
158+
Timeouts OperatorTimeouts `json:"timeouts"`
159+
LoadBalancer LoadBalancerConfiguration `json:"load_balancer"`
160+
AWSGCP AWSGCPConfiguration `json:"aws_or_gcp"`
161+
OperatorDebug OperatorDebugConfiguration `json:"debug"`
162+
TeamsAPI TeamsAPIConfiguration `json:"teams_api"`
163+
LoggingRESTAPI LoggingRESTAPIConfiguration `json:"logging_rest_api"`
164+
Scalyr ScalyrConfiguration `json:"scalyr"`
165+
LogicalBackup OperatorLogicalBackupConfiguration `json:"logical_backup"`
165166
}
166167

167168
// OperatorConfigurationUsers defines configuration for super user
@@ -174,3 +175,9 @@ type OperatorConfigurationUsers struct {
174175

175176
//Duration shortens this frequently used name
176177
type Duration time.Duration
178+
179+
type OperatorLogicalBackupConfiguration struct {
180+
Schedule string `json:"logical_backup_schedule,omitempty"`
181+
DockerImage string `json:"logical_backup_docker_image,omitempty"`
182+
S3Bucket string `json:"logical_backup_s3_bucket,omitempty"`
183+
}
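
A small sketch of how the new `logical_backup` section could be decoded with the JSON tags introduced above. The `OperatorLogicalBackupConfiguration` struct is copied from the diff; the `configuration` wrapper and `decodeConfiguration` helper are hypothetical, trimmed stand-ins, and the bucket name is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OperatorLogicalBackupConfiguration is copied from the diff above.
type OperatorLogicalBackupConfiguration struct {
	Schedule    string `json:"logical_backup_schedule,omitempty"`
	DockerImage string `json:"logical_backup_docker_image,omitempty"`
	S3Bucket    string `json:"logical_backup_s3_bucket,omitempty"`
}

// configuration is a hypothetical wrapper that keeps only the new
// logical_backup field of OperatorConfigurationData.
type configuration struct {
	LogicalBackup OperatorLogicalBackupConfiguration `json:"logical_backup"`
}

// decodeConfiguration parses a JSON document into the wrapper type.
func decodeConfiguration(raw []byte) (configuration, error) {
	var cfg configuration
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
	  "logical_backup": {
	    "logical_backup_schedule": "30 00 * * *",
	    "logical_backup_docker_image": "registry.opensource.zalan.do/acid/logical-backup",
	    "logical_backup_s3_bucket": "my-example-bucket"
	  }
	}`)
	cfg, err := decodeConfiguration(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", cfg.LogicalBackup)
}
```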

pkg/apis/acid.zalan.do/v1/postgresql_type.go

Lines changed: 14 additions & 12 deletions
@@ -3,7 +3,7 @@ package v1
33
import (
44
"time"
55

6-
"k8s.io/api/core/v1"
6+
v1 "k8s.io/api/core/v1"
77
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
88
)
99

@@ -43,17 +43,19 @@ type PostgresSpec struct {
4343
// load balancers' source ranges are the same for master and replica services
4444
AllowedSourceRanges []string `json:"allowedSourceRanges"`
4545

46-
NumberOfInstances int32 `json:"numberOfInstances"`
47-
Users map[string]UserFlags `json:"users"`
48-
MaintenanceWindows []MaintenanceWindow `json:"maintenanceWindows,omitempty"`
49-
Clone CloneDescription `json:"clone"`
50-
ClusterName string `json:"-"`
51-
Databases map[string]string `json:"databases,omitempty"`
52-
Tolerations []v1.Toleration `json:"tolerations,omitempty"`
53-
Sidecars []Sidecar `json:"sidecars,omitempty"`
54-
InitContainers []v1.Container `json:"init_containers,omitempty"`
55-
PodPriorityClassName string `json:"pod_priority_class_name,omitempty"`
56-
ShmVolume *bool `json:"enableShmVolume,omitempty"`
46+
NumberOfInstances int32 `json:"numberOfInstances"`
47+
Users map[string]UserFlags `json:"users"`
48+
MaintenanceWindows []MaintenanceWindow `json:"maintenanceWindows,omitempty"`
49+
Clone CloneDescription `json:"clone"`
50+
ClusterName string `json:"-"`
51+
Databases map[string]string `json:"databases,omitempty"`
52+
Tolerations []v1.Toleration `json:"tolerations,omitempty"`
53+
Sidecars []Sidecar `json:"sidecars,omitempty"`
54+
InitContainers []v1.Container `json:"init_containers,omitempty"`
55+
PodPriorityClassName string `json:"pod_priority_class_name,omitempty"`
56+
ShmVolume *bool `json:"enableShmVolume,omitempty"`
57+
EnableLogicalBackup bool `json:"enableLogicalBackup,omitempty"`
58+
LogicalBackupSchedule string `json:"logicalBackupSchedule,omitempty"`
5759
}
5860

5961
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go

Lines changed: 17 additions & 0 deletions
Some generated files are not rendered by default.
