Add logical backup (zalando#442)
* Add k8s cron job to spawn logical backups

* Minor doc updates
sdudoladov committed May 16, 2019
1 parent 2c02b37 commit f3e1e80
Showing 24 changed files with 526 additions and 55 deletions.
11 changes: 11 additions & 0 deletions charts/postgres-operator/templates/clusterrole.yaml
@@ -141,4 +141,15 @@ rules:
- bind
resourceNames:
- {{ template "postgres-operator.fullname" . }}
- apiGroups:
- batch
resources:
- cronjobs # enables logical backups
verbs:
- create
- delete
- get
- list
- patch
- update
{{ end }}
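
One way to reason about whether a role grants what the logical backup feature needs is to model the rule above in plain Go. This is a minimal sketch, not operator code: `PolicyRule` is a simplified, hypothetical stand-in for the Kubernetes RBAC rule type (it ignores `*` wildcards and `resourceNames`), and `allows` is an illustrative helper.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical stand-in for an RBAC rule.
// Wildcards ("*") and resourceNames are deliberately not modeled.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether s is an element of list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on group/resource.
func allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// cronJobRule mirrors the rule added to the ClusterRole in this change.
var cronJobRule = PolicyRule{
	APIGroups: []string{"batch"},
	Resources: []string{"cronjobs"},
	Verbs:     []string{"create", "delete", "get", "list", "patch", "update"},
}

func main() {
	rules := []PolicyRule{cronJobRule}
	for _, verb := range []string{"create", "delete", "get", "list", "patch", "update", "watch"} {
		fmt.Printf("%-6s batch/cronjobs: %v\n", verb, allows(rules, "batch", "cronjobs", verb))
	}
}
```

Note that `watch` is not in the rule's verb list, so the check reports `false` for it.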
3 changes: 3 additions & 0 deletions charts/postgres-operator/values.yaml
@@ -62,6 +62,9 @@ config:
pod_management_policy: "ordered_ready"
enable_pod_antiaffinity: "false"
pod_antiaffinity_topology_key: "kubernetes.io/hostname"
logical_backup_schedule: "30 00 * * *"
logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
logical_backup_s3_bucket: ""
rbac:
# Specifies whether RBAC resources should be created
create: true
21 changes: 15 additions & 6 deletions docs/administrator.md
@@ -340,9 +340,18 @@ Postgres database cluster:

## Understanding rolling update of Spilo pods

The operator logs reasons for a rolling update with the `info` level and
a diff between the old and new StatefulSet specs with the `debug` level.
To read the latter log entry with the escaped characters rendered, view it
in CLI with `echo -e`. Note that the resultant message will contain some
noise because the `PodTemplate` used by the operator is yet to be updated
with the default values used internally in Kubernetes.
The operator logs reasons for a rolling update with the `info` level and a diff between the old and new StatefulSet specs with the `debug` level. The latter log entry contains numerous escape characters; to render them, view it in the CLI with `echo -e`. Note that the resultant message will contain some noise because the `PodTemplate` used by the operator is yet to be updated with the default values used internally in Kubernetes.

## Logical backups

The operator can manage k8s cron jobs to run logical backups of Postgres clusters. The cron job periodically spawns a batch job that runs a single pod. The backup script within this pod's container can connect to a DB for a logical backup. The operator updates cron jobs during Sync if the job schedule changes; the job name acts as the job identifier. These jobs must be enabled for each individual Postgres cluster by setting `enableLogicalBackup: true` in its manifest. Notes:

1. The provided `registry.opensource.zalan.do/acid/logical-backup` image implements the backup via `pg_dumpall` and uploads the (compressed) results to an S3 bucket; `pg_dumpall` requires `superuser` access to a DB and runs on a replica when possible.

2. Due to the [limitations of Kubernetes cron jobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of the operator's responsibilities.

3. The operator does not remove old backups.

4. You may use your own image by overriding the relevant field in the operator configuration. Any such image must ensure the logical backup is able to finish [in the presence of pod restarts](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#handling-pod-and-container-failures) and [simultaneous invocations](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) of the backup cron job.

5. For this feature to work, your RBAC policy must enable operations on the `cronjobs` resource from the `batch` API group for the operator service account. See the [example RBAC](../manifests/operator-service-account-rbac.yaml).
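
The Sync behaviour described above (the job name identifies a cron job, and an existing job is updated only when its schedule changes) can be modeled as a pure reconcile function. This is an illustrative sketch under stated assumptions, not the operator's implementation: `Action`, `reconcile`, the name-to-schedule maps, and the job names in `main` are all hypothetical, and deleting jobs that are no longer desired is an assumption of this sketch rather than documented behaviour.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is a hypothetical description of one step a sync would perform.
type Action struct {
	Kind string // "create", "update" or "delete"
	Name string // cron job name, which acts as the identifier
}

// reconcile compares desired and existing cron jobs (name -> schedule).
// Deleting jobs that are no longer desired is an assumption of this sketch.
func reconcile(desired, existing map[string]string) []Action {
	var actions []Action
	for name, schedule := range desired {
		current, ok := existing[name]
		switch {
		case !ok:
			actions = append(actions, Action{Kind: "create", Name: name})
		case current != schedule:
			actions = append(actions, Action{Kind: "update", Name: name})
		}
	}
	for name := range existing {
		if _, ok := desired[name]; !ok {
			actions = append(actions, Action{Kind: "delete", Name: name})
		}
	}
	// Sort by name so the result is deterministic despite random map order.
	sort.Slice(actions, func(i, j int) bool { return actions[i].Name < actions[j].Name })
	return actions
}

func main() {
	desired := map[string]string{"backup-a": "30 00 * * *", "backup-b": "00 03 * * *"}
	existing := map[string]string{"backup-b": "30 00 * * *", "backup-c": "30 00 * * *"}
	for _, a := range reconcile(desired, existing) {
		fmt.Println(a.Kind, a.Name)
	}
}
```

Keeping the comparison free of any Kubernetes client makes the decision logic easy to test in isolation; a real controller would then translate each action into an API call.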
5 changes: 4 additions & 1 deletion docs/developer.md
@@ -203,7 +203,7 @@ localhost:8080 by doing:
The inner 'query' gets the name of the postgres operator pod, and the outer
enables port forwarding. Afterwards, you can access the operator API with:

$ curl http://127.0.0.1:8080/$endpoint| jq .
$ curl --location http://127.0.0.1:8080/$endpoint | jq .
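
`curl --location` makes curl follow HTTP redirects. If you prefer to script the same call in Go, a rough equivalent is sketched below, assuming the port forwarding above is active on `127.0.0.1:8080`. Go's `http.Get` uses `http.DefaultClient`, which follows up to 10 redirects by default, so no extra option is needed. `fetchPretty` and `prettyJSON` are hypothetical helpers, and `json.Indent` stands in for `jq .`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// prettyJSON indents a JSON document with two spaces, similar to `jq .`.
func prettyJSON(body []byte) (string, error) {
	var out bytes.Buffer
	if err := json.Indent(&out, body, "", "  "); err != nil {
		return "", err
	}
	return out.String(), nil
}

// fetchPretty GETs baseURL/endpoint (following redirects, like curl --location)
// and returns the indented JSON body.
func fetchPretty(baseURL, endpoint string) (string, error) {
	url := strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(endpoint, "/")
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return prettyJSON(body)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . <endpoint>")
		return
	}
	out, err := fetchPretty("http://127.0.0.1:8080", os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Pass the same value you would use for `$endpoint` in the curl command as the first argument.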

The available endpoints are listed below. Note that the worker ID is an integer
from 0 up to 'workers' - 1 (value configured in the operator configuration and
@@ -323,6 +323,9 @@ be updated. As explained [here](reference/operator_parameters.md), it's possible
to configure the operator either with a ConfigMap or CRD, but currently we aim
to synchronize parameters everywhere.

When choosing a parameter name for a new option in a PG manifest, keep in mind
the naming conventions there. The `snake_case` variables come from the Patroni/Postgres world, while the `camelCase` ones come from the k8s world.

Note: If one option is defined in the operator configuration and in the cluster
[manifest](../manifests/complete-postgres-manifest.yaml), the latter takes
precedence.
10 changes: 9 additions and 1 deletion docs/reference/cluster_manifest.md
@@ -14,6 +14,8 @@ measurements. Please, refer to the [Kubernetes
documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/)
for the possible values of those.

:exclamation: If both operator configmap/CRD and a Postgres cluster manifest define the same parameter, the value from the Postgres cluster manifest is applied.
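
The precedence rule can be expressed as a small merge function. This is a sketch assuming that an empty manifest value means "not set"; `effectiveSchedule` is a hypothetical helper rather than the operator's code, and `"30 00 * * *"` is the operator-level default shown in the configuration files of this change.

```go
package main

import "fmt"

// defaultSchedule is the operator-level default from the operator configuration.
const defaultSchedule = "30 00 * * *"

// effectiveSchedule returns the cluster manifest value if it is set,
// otherwise the value from the operator ConfigMap/CRD.
func effectiveSchedule(manifestValue, operatorValue string) string {
	if manifestValue != "" {
		return manifestValue
	}
	return operatorValue
}

func main() {
	fmt.Println(effectiveSchedule("", defaultSchedule))            // falls back to the operator value
	fmt.Println(effectiveSchedule("00 04 * * 0", defaultSchedule)) // the manifest value wins
}
```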

## Manifest structure

A postgres manifest is a `YAML` document. On the top level both individual
@@ -45,7 +47,7 @@ Those parameters are grouped under the `metadata` top-level key.

## Top-level parameters

Those are parameters grouped directly under the `spec` key in the manifest.
These parameters are grouped directly under the `spec` key in the manifest.

* **teamId**
name of the team the cluster belongs to. Changing it after the cluster
@@ -117,6 +119,12 @@ Those are parameters grouped directly under the `spec` key in the manifest.
is `false`, then no volume will be mounted no matter how operator was
configured (so you can override the operator configuration).

* **enableLogicalBackup**
Determines whether a logical backup of this cluster should be taken and uploaded to S3. Default: false.

* **logicalBackupSchedule**
Schedule for the logical backup k8s cron job. See [the reference schedule format](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule) for the syntax. Default: "30 00 \* \* \*"

## Postgres parameters

Those parameters are grouped under the `postgresql` top-level key.
16 changes: 15 additions and 1 deletion docs/reference/operator_parameters.md
@@ -51,6 +51,8 @@ parameters, those parameters have no effect and are replaced by the
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` environment variables.
They will be deprecated and removed in the future.

For the configmap operator configuration, the [default parameter values](https://github.com/zalando-incubator/postgres-operator/blob/master/pkg/util/config/config.go#L14) mentioned here are likely to be overwritten in your local operator installation via your local version of the operator configmap. If you use the operator CRD, all the CRD defaults are provided in the [operator's default configuration manifest](https://github.com/zalando-incubator/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml).
Variable names are underscore-separated words.
@@ -476,4 +478,16 @@ scalyr sidecar. In the CRD-based configuration they are grouped under the
Memory limit value for the Scalyr sidecar. The default is `1Gi`.
For the configmap operator configuration, the [default parameter values](https://github.com/zalando/postgres-operator/blob/master/pkg/util/config/config.go#L14) mentioned here are likely to be overwritten in your local operator installation via your local version of the operator configmap. If you use the operator CRD, all the CRD defaults are provided in the [operator's default configuration manifest](https://github.com/zalando/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml).
## Logical backup
These parameters configure a k8s cron job managed by the operator to produce Postgres logical backups.
In the CRD-based configuration those parameters are grouped under the `logical_backup` key.
* **logical_backup_schedule**
Backup schedule in the cron format. See [the reference schedule format](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule) for the syntax. Default: "30 00 \* \* \*"
* **logical_backup_docker_image**
Docker image for the pods of the cron job. Must implement backup logic and correctly handle pod and job restarts. The default image runs `pg_dumpall` (on a replica if possible) and uploads compressed results to an S3 bucket under the key `/spilo/pg_cluster_name/cluster_k8s_uuid/logical_backups`. Default: "registry.opensource.zalan.do/acid/logical-backup"
* **logical_backup_s3_bucket**
S3 bucket to store backup results. The bucket must already exist and be accessible by the Postgres pods. Default: empty.
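
The key layout documented for the default image can be illustrated with `path.Join`. This is a sketch only, assuming `pg_cluster_name` and `cluster_k8s_uuid` are placeholders for the cluster name and its Kubernetes UID: `backupPrefix` is a hypothetical helper, `my-backup-bucket`, `acid-minimal-cluster` and `0123-abcd` are made-up values, and the actual object keys (including any file names below the prefix) are decided by the backup image, not by this code.

```go
package main

import (
	"fmt"
	"path"
)

// backupPrefix builds the documented layout
// /spilo/<pg_cluster_name>/<cluster_k8s_uuid>/logical_backups.
// Empty components are rejected because path.Join would silently drop them.
func backupPrefix(clusterName, clusterUID string) (string, error) {
	if clusterName == "" || clusterUID == "" {
		return "", fmt.Errorf("cluster name and UID must both be set")
	}
	return path.Join("/spilo", clusterName, clusterUID, "logical_backups"), nil
}

func main() {
	bucket := "my-backup-bucket" // hypothetical value of logical_backup_s3_bucket
	prefix, err := backupPrefix("acid-minimal-cluster", "0123-abcd")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("s3://%s%s\n", bucket, prefix)
}
```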
8 changes: 8 additions & 0 deletions docs/user.md
@@ -347,3 +347,11 @@ every 6 hours.
Note that if the statefulset is scaled down before resizing the size changes
are only applied to the volumes attached to the running pods. The size of the
volumes that correspond to the previously running pods is not changed.

## Logical backups

If you add
```yaml
enableLogicalBackup: true
```
to the cluster manifest, the operator will create and sync a k8s cron job to do periodic logical backups of this particular Postgres cluster. Due to the [limitations of Kubernetes cron jobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of the operator's responsibilities. See the [configuration reference](reference/cluster_manifest.md) and the [administrator documentation](administrator.md) for details on how backups are executed.
4 changes: 4 additions & 0 deletions manifests/complete-postgres-manifest.yaml
@@ -64,6 +64,10 @@ spec:
# cluster: "acid-batman"
# timestamp: "2017-12-19T12:40:33+01:00" # timezone required (offset relative to UTC, see RFC 3339 section 5.6)
# s3_wal_path: "s3://custom/path/to/bucket"

# run periodic backups with k8s cron jobs
# enableLogicalBackup: true
# logicalBackupSchedule: "30 00 * * *"
maintenanceWindows:
- 01:00-06:00 #UTC
- Sat:00:00-04:00
4 changes: 4 additions & 0 deletions manifests/configmap.yaml
@@ -54,3 +54,7 @@ data:
resource_check_interval: 3s
resource_check_timeout: 10m
resync_period: 5m

# logical_backup_schedule: "30 00 * * *"
# logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
# logical_backup_s3_bucket: ""
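
In the ConfigMap the three options are flat string keys (commented out here, so defaults apply), while the CRD nests them under `logical_backup`. A minimal sketch of reading the flat keys with fallbacks to the documented defaults; `LogicalBackupConfig` and `fromConfigMap` are hypothetical names, not the operator's actual configuration loader, and treating an empty value as "use the default" is an assumption.

```go
package main

import "fmt"

// LogicalBackupConfig is a hypothetical holder for the three options.
type LogicalBackupConfig struct {
	Schedule    string
	DockerImage string
	S3Bucket    string
}

// fromConfigMap reads the flat ConfigMap keys, falling back to the
// documented defaults for missing or empty values.
func fromConfigMap(data map[string]string) LogicalBackupConfig {
	get := func(key, def string) string {
		if v, ok := data[key]; ok && v != "" {
			return v
		}
		return def
	}
	return LogicalBackupConfig{
		Schedule:    get("logical_backup_schedule", "30 00 * * *"),
		DockerImage: get("logical_backup_docker_image", "registry.opensource.zalan.do/acid/logical-backup"),
		S3Bucket:    get("logical_backup_s3_bucket", ""),
	}
}

func main() {
	cfg := fromConfigMap(map[string]string{"logical_backup_s3_bucket": "my-backup-bucket"})
	fmt.Printf("%+v\n", cfg)
}
```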
1 change: 0 additions & 1 deletion manifests/minimal-postgres-manifest.yaml
@@ -17,7 +17,6 @@ spec:
# role for application foo
foo_user: []


#databases: name->owner
databases:
foo: zalando
12 changes: 11 additions & 1 deletion manifests/operator-service-account-rbac.yaml
@@ -142,7 +142,17 @@ rules:
- bind
resourceNames:
- zalando-postgres-operator

- apiGroups:
- batch
resources:
- cronjobs # enables logical backups
verbs:
- create
- delete
- get
- list
- patch
- update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
5 changes: 4 additions & 1 deletion manifests/postgresql-operator-default-configuration.yaml
@@ -91,4 +91,7 @@ configuration:
# scalyr_api_key: ""
# scalyr_image: ""
# scalyr_server_url: ""

logical_backup:
logical_backup_schedule: "30 00 * * *"
logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
logical_backup_s3_bucket: ""
45 changes: 26 additions & 19 deletions pkg/apis/acid.zalan.do/v1/operator_configuration_type.go
@@ -143,25 +143,26 @@ type ScalyrConfiguration struct {

// OperatorConfigurationData defines the operation config
type OperatorConfigurationData struct {
EtcdHost string `json:"etcd_host,omitempty"`
DockerImage string `json:"docker_image,omitempty"`
Workers uint32 `json:"workers,omitempty"`
MinInstances int32 `json:"min_instances,omitempty"`
MaxInstances int32 `json:"max_instances,omitempty"`
ResyncPeriod Duration `json:"resync_period,omitempty"`
RepairPeriod Duration `json:"repair_period,omitempty"`
Sidecars map[string]string `json:"sidecar_docker_images,omitempty"`
PostgresUsersConfiguration PostgresUsersConfiguration `json:"users"`
Kubernetes KubernetesMetaConfiguration `json:"kubernetes"`
PostgresPodResources PostgresPodResourcesDefaults `json:"postgres_pod_resources"`
SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
Timeouts OperatorTimeouts `json:"timeouts"`
LoadBalancer LoadBalancerConfiguration `json:"load_balancer"`
AWSGCP AWSGCPConfiguration `json:"aws_or_gcp"`
OperatorDebug OperatorDebugConfiguration `json:"debug"`
TeamsAPI TeamsAPIConfiguration `json:"teams_api"`
LoggingRESTAPI LoggingRESTAPIConfiguration `json:"logging_rest_api"`
Scalyr ScalyrConfiguration `json:"scalyr"`
EtcdHost string `json:"etcd_host,omitempty"`
DockerImage string `json:"docker_image,omitempty"`
Workers uint32 `json:"workers,omitempty"`
MinInstances int32 `json:"min_instances,omitempty"`
MaxInstances int32 `json:"max_instances,omitempty"`
ResyncPeriod Duration `json:"resync_period,omitempty"`
RepairPeriod Duration `json:"repair_period,omitempty"`
Sidecars map[string]string `json:"sidecar_docker_images,omitempty"`
PostgresUsersConfiguration PostgresUsersConfiguration `json:"users"`
Kubernetes KubernetesMetaConfiguration `json:"kubernetes"`
PostgresPodResources PostgresPodResourcesDefaults `json:"postgres_pod_resources"`
SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
Timeouts OperatorTimeouts `json:"timeouts"`
LoadBalancer LoadBalancerConfiguration `json:"load_balancer"`
AWSGCP AWSGCPConfiguration `json:"aws_or_gcp"`
OperatorDebug OperatorDebugConfiguration `json:"debug"`
TeamsAPI TeamsAPIConfiguration `json:"teams_api"`
LoggingRESTAPI LoggingRESTAPIConfiguration `json:"logging_rest_api"`
Scalyr ScalyrConfiguration `json:"scalyr"`
LogicalBackup OperatorLogicalBackupConfiguration `json:"logical_backup"`
}

// OperatorConfigurationUsers defines configuration for super user
@@ -174,3 +175,9 @@ type OperatorConfigurationUsers struct {

//Duration shortens this frequently used name
type Duration time.Duration

// OperatorLogicalBackupConfiguration defines the configuration of logical backups
type OperatorLogicalBackupConfiguration struct {
Schedule string `json:"logical_backup_schedule,omitempty"`
DockerImage string `json:"logical_backup_docker_image,omitempty"`
S3Bucket string `json:"logical_backup_s3_bucket,omitempty"`
}
26 changes: 14 additions & 12 deletions pkg/apis/acid.zalan.do/v1/postgresql_type.go
@@ -3,7 +3,7 @@ package v1
import (
"time"

"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

@@ -43,17 +43,19 @@ type PostgresSpec struct {
// load balancers' source ranges are the same for master and replica services
AllowedSourceRanges []string `json:"allowedSourceRanges"`

NumberOfInstances int32 `json:"numberOfInstances"`
Users map[string]UserFlags `json:"users"`
MaintenanceWindows []MaintenanceWindow `json:"maintenanceWindows,omitempty"`
Clone CloneDescription `json:"clone"`
ClusterName string `json:"-"`
Databases map[string]string `json:"databases,omitempty"`
Tolerations []v1.Toleration `json:"tolerations,omitempty"`
Sidecars []Sidecar `json:"sidecars,omitempty"`
InitContainers []v1.Container `json:"init_containers,omitempty"`
PodPriorityClassName string `json:"pod_priority_class_name,omitempty"`
ShmVolume *bool `json:"enableShmVolume,omitempty"`
NumberOfInstances int32 `json:"numberOfInstances"`
Users map[string]UserFlags `json:"users"`
MaintenanceWindows []MaintenanceWindow `json:"maintenanceWindows,omitempty"`
Clone CloneDescription `json:"clone"`
ClusterName string `json:"-"`
Databases map[string]string `json:"databases,omitempty"`
Tolerations []v1.Toleration `json:"tolerations,omitempty"`
Sidecars []Sidecar `json:"sidecars,omitempty"`
InitContainers []v1.Container `json:"init_containers,omitempty"`
PodPriorityClassName string `json:"pod_priority_class_name,omitempty"`
ShmVolume *bool `json:"enableShmVolume,omitempty"`
EnableLogicalBackup bool `json:"enableLogicalBackup,omitempty"`
LogicalBackupSchedule string `json:"logicalBackupSchedule,omitempty"`
}
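
The two new `PostgresSpec` fields use `camelCase` JSON names, following the manifest naming convention. A sketch decoding just those fields with `encoding/json`; `specSubset` and `decode` are hypothetical, and a trimmed copy is used instead of the real type to avoid pulling in the Kubernetes API packages. Because `EnableLogicalBackup` is a plain `bool` (unlike the `*bool` used for `ShmVolume`), an omitted key and an explicit `false` decode identically, which matches the documented default of `false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// specSubset is a hypothetical, trimmed-down copy of PostgresSpec with
// only the fields added for logical backups.
type specSubset struct {
	EnableLogicalBackup   bool   `json:"enableLogicalBackup,omitempty"`
	LogicalBackupSchedule string `json:"logicalBackupSchedule,omitempty"`
}

// decode unmarshals a manifest spec fragment into specSubset.
func decode(raw string) (specSubset, error) {
	var s specSubset
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	for _, raw := range []string{
		`{"enableLogicalBackup": true, "logicalBackupSchedule": "30 00 * * *"}`,
		`{}`, // omitted keys decode to zero values: false and ""
	} {
		s, err := decode(raw)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%+v\n", s)
	}
}
```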

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
17 changes: 17 additions & 0 deletions pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go
