Add logical backup #442

Merged: 63 commits merged on May 16, 2019
63 commits:

- ad39704 Extend Postgres manifest with new params (Jan 8, 2019)
- 4088e00 Sketch operator conf/docs (Jan 8, 2019)
- 64e1640 respond to review (Jan 9, 2019)
- 5811175 Address review comments (Jan 11, 2019)
- ac9442d minor changes (Jan 11, 2019)
- 5c7233a Add a stub method for cron job creation (Jan 11, 2019)
- 144a7ce Extend KubeClient to work with cron jobs (Jan 11, 2019)
- 68755ff Generate empty cron job spec (Jan 14, 2019)
- 11019f5 Submit hello world cron job (Jan 15, 2019)
- 9d430fc resolve merge conflicts (Feb 27, 2019)
- f4d8ec2 update generated code (Feb 27, 2019)
- 18e2d7a address a code review (Feb 27, 2019)
- 825c513 Merge branch 'master' into add-logical-backup (Mar 5, 2019)
- bc0923d use custmom schedule (Mar 7, 2019)
- 0e5ed5e Merge branch 'master' into add-logical-backup (Apr 1, 2019)
- 918860b resolve merge conflict (Apr 2, 2019)
- 5d44904 minor doc fixes (Apr 2, 2019)
- a44da93 Update RBAC (Apr 2, 2019)
- de9ffb6 minor formatting things (Apr 2, 2019)
- 4017563 properly generate podTemplate for the cron job's pod (Apr 2, 2019)
- 99a3712 Delete the cron job on cluster deletion (Apr 3, 2019)
- 2f12c89 minor bug fixes (Apr 3, 2019)
- 6d46f9c remove unnecessary volume mount (Apr 3, 2019)
- c95aa29 generate env for logical backup pod (Apr 3, 2019)
- d30dd9e Document S3 bucket for logical backups (Apr 5, 2019)
- 9ad38d5 Minor doc/code fixes (Apr 5, 2019)
- daf4e3a add basic affinity setup (Apr 5, 2019)
- 76c00f3 Consistently name the docker image param (Apr 5, 2019)
- 6f855d5 minore changes (Apr 8, 2019)
- 0f1e196 add code skeleton for CronJob Sync (Apr 8, 2019)
- b79ee0d fix static check violations (Apr 11, 2019)
- f1dbd3c Merge branch 'master' into add-logical-backup (Apr 15, 2019)
- a020314 update branch after merging master (Apr 15, 2019)
- a816483 add update skeleton (Apr 15, 2019)
- d4ebd33 minor changes (Apr 16, 2019)
- fe1d1b2 add special case for Sync (Apr 16, 2019)
- 2d720c9 properly log job's name (Apr 16, 2019)
- dc2f707 adjust deletion and cronjob conf (Apr 16, 2019)
- 273c8e3 address static checks (Apr 16, 2019)
- 54c4a04 doc and conf updates (Apr 16, 2019)
- eb82078 minor code fixes for cluster creation (Apr 16, 2019)
- a73b986 Minor code changes (Apr 16, 2019)
- bd9d0fc code cleanup (Apr 17, 2019)
- 4cf37c4 properly set the job name (Apr 17, 2019)
- e2a4876 change delete logic (Apr 17, 2019)
- 65b87d1 remove unnecessary pointer (Apr 17, 2019)
- 20364e5 update doc (Apr 17, 2019)
- 01b066b remove namespace from backup name (Apr 17, 2019)
- 0cac277 add suffix for the logical backup bucket (Apr 23, 2019)
- 0f9d6dc adjust names of env vars (Apr 23, 2019)
- b2d7034 minor correction (Apr 23, 2019)
- 6dc97d6 Address first part of the code review (May 3, 2019)
- 587ff74 address 2nd part of code review (May 3, 2019)
- 3430506 remove enable_logical_backup from operator configmap/crd (May 3, 2019)
- 422d803 merge master (May 15, 2019)
- 9c547c9 fix complete manifest (May 15, 2019)
- 24b9e9f fix env vars (May 15, 2019)
- 7d8d8c3 remove unused parameter (May 16, 2019)
- 88199ff Address docs' review (May 16, 2019)
- b88c133 unify default values (May 16, 2019)
- 53b797b fix complete manifest after incorrect merge (May 16, 2019)
- d9f465e address code review (May 16, 2019)
- faf668b fix helm chart (May 16, 2019)
11 changes: 11 additions & 0 deletions charts/postgres-operator/templates/clusterrole.yaml
@@ -141,4 +141,15 @@ rules:
- bind
resourceNames:
- {{ template "postgres-operator.fullname" . }}
+- apiGroups:
+- batch
+resources:
+- cronjobs # enables logical backups
+verbs:
+- create
+- delete
+- get
+- list
+- patch
+- update
{{ end }}
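The rule above grants the operator service account six verbs on `cronjobs` in the `batch` API group. To make that requirement concrete, here is a small Go sketch that reports which of those verbs a set of rules fails to grant. It is illustrative only: `policyRule` is a hypothetical, simplified stand-in for the Kubernetes RBAC `PolicyRule` type (it ignores `resourceNames`, non-resource URLs and role aggregation), and the verb list is copied from this template rather than derived from the operator code.

```go
package main

import "fmt"

// policyRule is a hypothetical, simplified stand-in for an RBAC PolicyRule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// cronJobVerbs are the verbs this template grants on batch/cronjobs.
var cronJobVerbs = []string{"create", "delete", "get", "list", "patch", "update"}

// matches reports whether list contains want or the RBAC wildcard "*".
func matches(list []string, want string) bool {
	for _, v := range list {
		if v == want || v == "*" {
			return true
		}
	}
	return false
}

// missingCronJobVerbs returns, in order, the cron job verbs no rule grants.
func missingCronJobVerbs(rules []policyRule) []string {
	var missing []string
	for _, verb := range cronJobVerbs {
		granted := false
		for _, r := range rules {
			if matches(r.APIGroups, "batch") && matches(r.Resources, "cronjobs") && matches(r.Verbs, verb) {
				granted = true
				break
			}
		}
		if !granted {
			missing = append(missing, verb)
		}
	}
	return missing
}

func main() {
	rules := []policyRule{{
		APIGroups: []string{"batch"},
		Resources: []string{"cronjobs"},
		Verbs:     []string{"get", "list", "create"},
	}}
	fmt.Println(missingCronJobVerbs(rules)) // [delete patch update]
}
```

A rule that lists only read verbs would let the operator see existing cron jobs but not create, change or remove them, which is why all six appear in the template.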
3 changes: 3 additions & 0 deletions charts/postgres-operator/values.yaml
@@ -62,6 +62,9 @@ config:
pod_management_policy: "ordered_ready"
enable_pod_antiaffinity: "false"
pod_antiaffinity_topology_key: "kubernetes.io/hostname"
+logical_backup_schedule: "30 00 * * *"
+logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
+logical_backup_s3_bucket: ""
rbac:
# Specifies whether RBAC resources should be created
create: true
21 changes: 15 additions & 6 deletions docs/administrator.md
@@ -340,9 +340,18 @@ Postgres database cluster:

## Understanding rolling update of Spilo pods

-The operator logs reasons for a rolling update with the `info` level and
-a diff between the old and new StatefulSet specs with the `debug` level.
-To read the latter log entry with the escaped characters rendered, view it
-in CLI with `echo -e`. Note that the resultant message will contain some
-noise because the `PodTemplate` used by the operator is yet to be updated
-with the default values used internally in Kubernetes.
+The operator logs reasons for a rolling update with the `info` level and a diff between the old and new StatefulSet specs with the `debug` level. To render the numerous escape characters in the latter log entry, view it in CLI with `echo -e`. Note that the resultant message will contain some noise because the `PodTemplate` used by the operator is yet to be updated with the default values used internally in Kubernetes.
+
+## Logical backups
+
+The operator can manage k8s cron jobs to run logical backups of Postgres clusters. The cron job periodically spawns a batch job that runs a single pod. The backup script within this pod's container can connect to a DB for a logical backup. The operator updates cron jobs during Sync if the job schedule changes; the job name acts as the job identifier. These jobs must be enabled for each individual Postgres cluster by setting `enableLogicalBackup: true` in its manifest. Notes:
+
+1. The provided `registry.opensource.zalan.do/acid/logical-backup` image implements the backup via `pg_dumpall` and uploads the (compressed) results to an S3 bucket; `pg_dumpall` requires `superuser` access to a DB and runs on a replica when possible.
+
+2. Due to the [limitations of Kubernetes cron jobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of operator responsibilities.
+
+3. The operator does not remove old backups.
+
+4. You may use your own image by overriding the relevant field in the operator configuration. Any such image must ensure the logical backup is able to finish [in the presence of pod restarts](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#handling-pod-and-container-failures) and [simultaneous invocations](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) of the backup cron job.
+
+5. For this feature to work, your RBAC policy must enable operations on the `cronjobs` resource from the `batch` API group for the operator service account. See [example RBAC](../manifests/operator-service-account-rbac.yaml).
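The Sync behaviour described above (the job name identifies the cron job, and an existing job is updated when the schedule changes) can be sketched as a pure decision function. This is a minimal sketch, not the operator's code: `cronJobState`, the `syncAction` values and `decideSync` are hypothetical names, the job name in `main` is made up, and the delete branch assumes that turning `enableLogicalBackup` off for a running cluster should remove its job, which the text above does not state.

```go
package main

import "fmt"

// syncAction is a hypothetical enumeration of what Sync could do with the
// logical backup cron job of one Postgres cluster.
type syncAction int

const (
	noOp syncAction = iota
	createJob
	updateJob
	deleteJob
)

func (a syncAction) String() string {
	return [...]string{"no-op", "create", "update", "delete"}[a]
}

// cronJobState is a hypothetical view of an existing cron job. The job is
// looked up by name, so only the name and the schedule are kept here.
type cronJobState struct {
	Name     string
	Schedule string
}

// decideSync compares the desired state from the cluster manifest with the
// cron job found in Kubernetes (existing is nil if no job was found).
// Assumption: disabling logical backups removes an existing job.
func decideSync(enabled bool, desiredSchedule string, existing *cronJobState) syncAction {
	switch {
	case enabled && existing == nil:
		return createJob
	case enabled && existing.Schedule != desiredSchedule:
		return updateJob
	case !enabled && existing != nil:
		return deleteJob
	default:
		return noOp
	}
}

func main() {
	// Made-up job name, for illustration only.
	current := &cronJobState{Name: "logical-backup-acid-minimal-cluster", Schedule: "30 00 * * *"}
	fmt.Println(decideSync(true, "30 00 * * *", current)) // no-op
	fmt.Println(decideSync(true, "00 03 * * *", current)) // update
	fmt.Println(decideSync(false, "", current))           // delete
	fmt.Println(decideSync(true, "30 00 * * *", nil))     // create
}
```

Keeping the decision separate from the Kubernetes API calls makes the "update only on schedule change" rule easy to test on its own.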
5 changes: 4 additions & 1 deletion docs/developer.md
@@ -203,7 +203,7 @@ localhost:8080 by doing:
The inner 'query' gets the name of the postgres operator pod, and the outer
enables port forwarding. Afterwards, you can access the operator API with:

-$ curl http://127.0.0.1:8080/$endpoint| jq .
+$ curl --location http://127.0.0.1:8080/$endpoint | jq .

The available endpoints are listed below. Note that the worker ID is an integer
from 0 up to 'workers' - 1 (value configured in the operator configuration and
@@ -323,6 +323,9 @@ be updated. As explained [here](reference/operator_parameters.md), it's possible
to configure the operator either with a ConfigMap or CRD, but currently we aim
to synchronize parameters everywhere.

+When choosing a parameter name for a new option in a PG manifest, keep in mind
+the naming conventions there. The `snake_case` variables come from the Patroni/Postgres world, while the `camelCase` ones come from the k8s world.
+
Note: If one option is defined in the operator configuration and in the cluster
[manifest](../manifests/complete-postgres-manifest.yaml), the latter takes
precedence.
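For the new logical backup schedule, this precedence rule can be sketched as a three-step fallback: a non-empty value in the cluster manifest wins, otherwise the operator-wide value applies, otherwise a built-in default. A minimal sketch only; `effectiveSchedule` and `defaultLogicalBackupSchedule` are hypothetical names, and the fallback value is the `30 00 * * *` default documented in this PR.

```go
package main

import "fmt"

// defaultLogicalBackupSchedule mirrors the documented default.
const defaultLogicalBackupSchedule = "30 00 * * *"

// effectiveSchedule applies the precedence rule: cluster manifest first,
// then the operator configuration, then the built-in default.
func effectiveSchedule(manifestSchedule, operatorSchedule string) string {
	if manifestSchedule != "" {
		return manifestSchedule
	}
	if operatorSchedule != "" {
		return operatorSchedule
	}
	return defaultLogicalBackupSchedule
}

func main() {
	fmt.Println(effectiveSchedule("00 03 * * *", "30 00 * * 0")) // 00 03 * * *
	fmt.Println(effectiveSchedule("", "30 00 * * 0"))            // 30 00 * * 0
	fmt.Println(effectiveSchedule("", ""))                       // 30 00 * * *
}
```

Treating the empty string as "not set" matches the `omitempty` JSON tags on the new fields, where an omitted key decodes to `""`.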
10 changes: 9 additions & 1 deletion docs/reference/cluster_manifest.md
@@ -14,6 +14,8 @@ measurements. Please, refer to the [Kubernetes
documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/)
for the possible values of those.

+:exclamation: If both the operator configmap/CRD and a Postgres cluster manifest define the same parameter, the value from the Postgres cluster manifest is applied.
+
## Manifest structure

A postgres manifest is a `YAML` document. On the top level both individual
@@ -45,7 +47,7 @@ Those parameters are grouped under the `metadata` top-level key.

## Top-level parameters

-Those are parameters grouped directly under the `spec` key in the manifest.
+These parameters are grouped directly under the `spec` key in the manifest.

* **teamId**
name of the team the cluster belongs to. Changing it after the cluster
@@ -117,6 +119,12 @@ Those are parameters grouped directly under the `spec` key in the manifest.
is `false`, then no volume will be mounted no matter how operator was
configured (so you can override the operator configuration).

+* **enableLogicalBackup**
+Determines if the logical backup of this cluster should be taken and uploaded to S3. Default: false.
+
+* **logicalBackupSchedule**
+Schedule for the logical backup k8s cron job. Please take [the reference schedule format](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule) into account. Default: "30 00 \* \* \*"
+
## Postgres parameters

Those parameters are grouped under the `postgresql` top-level key.
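`logicalBackupSchedule` uses the five-field cron format (minute, hour, day of month, month, day of week), so the default `30 00 * * *` fires once a day at 00:30. The Go sketch below is a rough client-side sanity check of such a string, for example before committing a manifest. It is an illustration under stated assumptions, not what the operator or Kubernetes does: `checkSchedule` is hypothetical, it accepts only `*`, plain numbers, comma lists and `a-b` ranges (no step values such as `*/5` and no names such as `MON`), it assumes day of week runs 0-6, and the Kubernetes API remains the authority on which schedules are valid.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fieldBounds lists the five cron fields in order with their allowed ranges.
var fieldBounds = [5]struct {
	name     string
	min, max int
}{
	{"minute", 0, 59},
	{"hour", 0, 23},
	{"day of month", 1, 31},
	{"month", 1, 12},
	{"day of week", 0, 6},
}

// checkSchedule accepts a subset of cron syntax: "*", plain numbers,
// comma-separated lists and "a-b" ranges. Steps and names are rejected.
func checkSchedule(schedule string) error {
	fields := strings.Fields(schedule)
	if len(fields) != len(fieldBounds) {
		return fmt.Errorf("expected %d fields, got %d", len(fieldBounds), len(fields))
	}
	for i, field := range fields {
		b := fieldBounds[i]
		if field == "*" {
			continue
		}
		for _, part := range strings.Split(field, ",") {
			lo, hi, isRange := strings.Cut(part, "-")
			if !isRange {
				hi = lo // a single value is treated as a range of one
			}
			from, errFrom := strconv.Atoi(lo)
			to, errTo := strconv.Atoi(hi)
			if errFrom != nil || errTo != nil {
				return fmt.Errorf("%s: unsupported value %q", b.name, part)
			}
			if from < b.min || to > b.max || from > to {
				return fmt.Errorf("%s: %q must lie within %d-%d with start <= end", b.name, part, b.min, b.max)
			}
		}
	}
	return nil
}

func main() {
	for _, s := range []string{"30 00 * * *", "0 3 * * 1-5", "61 00 * * *", "*/5 * * * *"} {
		fmt.Printf("%q: %v\n", s, checkSchedule(s))
	}
}
```

Rejecting step values keeps the sketch short; a real check would either support the full syntax or defer to an existing cron parser.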
16 changes: 15 additions & 1 deletion docs/reference/operator_parameters.md
@@ -51,6 +51,8 @@ parameters, those parameters have no effect and are replaced by the
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` environment variables.
They will be deprecated and removed in the future.

+For the configmap operator configuration, the [default parameter values](https://github.com/zalando-incubator/postgres-operator/blob/master/pkg/util/config/config.go#L14) mentioned here are likely to be overwritten in your local operator installation via your local version of the operator configmap. If you use the operator CRD, all the CRD defaults are provided in the [operator's default configuration manifest](https://github.com/zalando-incubator/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml).
+
Variable names are underscore-separated words.


@@ -476,4 +478,16 @@ scalyr sidecar. In the CRD-based configuration they are grouped under the
Memory limit value for the Scalyr sidecar. The default is `1Gi`.


-For the configmap operator configuration, the [default parameter values](https://github.com/zalando/postgres-operator/blob/master/pkg/util/config/config.go#L14) mentioned here are likely to be overwritten in your local operator installation via your local version of the operator configmap. In the case you use the operator CRD, all the CRD defaults are provided in the [operator's default configuration manifest](https://github.com/zalando/postgres-operator/blob/master/manifests/postgresql-operator-default-configuration.yaml)
+## Logical backup
+
+These parameters configure a k8s cron job managed by the operator to produce Postgres logical backups.
+In the CRD-based configuration those parameters are grouped under the `logical_backup` key.
+
+* **logical_backup_schedule**
+Backup schedule in the cron format. Please take [the reference schedule format](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule) into account. Default: "30 00 \* \* \*"
+
+* **logical_backup_docker_image**
+Docker image for the pods of the cron job. Must implement backup logic and correctly handle pod and job restarts. The default image runs `pg_dumpall` (on a replica if possible) and uploads compressed results to an S3 bucket under the key `/spilo/pg_cluster_name/cluster_k8s_uuid/logical_backups`. Default: "registry.opensource.zalan.do/acid/logical-backup"
+
+* **logical_backup_s3_bucket**
+S3 bucket to store backup results. The bucket has to be present and accessible by Postgres pods. Default: empty.
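The key layout given for the default image can be illustrated with a helper that builds the upload prefix from the configured bucket, the cluster name and the cluster UID. This is an assumption-laden sketch: `backupPrefix` is hypothetical, it reads `pg_cluster_name` and `cluster_k8s_uuid` as placeholders for the Postgres cluster's name and its Kubernetes UID, the sample values in `main` are made up, and it says nothing about how the default image names the files inside that prefix.

```go
package main

import (
	"fmt"
	"path"
)

// backupPrefix returns an s3:// URL following the documented key layout
// /spilo/<pg_cluster_name>/<cluster_k8s_uuid>/logical_backups.
func backupPrefix(bucket, clusterName, clusterUID string) (string, error) {
	if bucket == "" {
		// logical_backup_s3_bucket defaults to empty and has to be configured.
		return "", fmt.Errorf("logical_backup_s3_bucket is not configured")
	}
	key := path.Join("/spilo", clusterName, clusterUID, "logical_backups")
	return "s3://" + bucket + key, nil
}

func main() {
	p, err := backupPrefix("my-backup-bucket", "acid-minimal-cluster", "2b0a3f5e-0000-0000-0000-000000000000")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p)
}
```

Including the UID keeps backups of a deleted and re-created cluster with the same name apart, which matters because the operator does not remove old backups.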
8 changes: 8 additions & 0 deletions docs/user.md
@@ -347,3 +347,11 @@ every 6 hours.
Note that if the statefulset is scaled down before resizing the size changes
are only applied to the volumes attached to the running pods. The size of the
volumes that correspond to the previously running pods is not changed.
+
+## Logical backups
+
+If you add
+```
+enableLogicalBackup: true
+```
+to the cluster manifest, the operator will create and sync a k8s cron job to do periodic logical backups of this particular Postgres cluster. Due to the [limitations of Kubernetes cron jobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations) it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of operator responsibilities. See [configuration reference](reference/cluster_manifest.md) and [administrator documentation](administrator.md) for details on how backups are executed.
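To see how the two new manifest keys are read, the sketch below decodes a `spec` fragment into a trimmed-down copy of the fields this PR adds to `PostgresSpec`. The JSON tags match the diff; `logicalBackupSpec` and `parseSpec` are hypothetical stand-ins for the full type and its decoding, and JSON stands in for the YAML manifest because the Go types carry `json` tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logicalBackupSpec is a hypothetical, trimmed-down stand-in for PostgresSpec
// that keeps only the two fields added in this PR, with the same JSON tags.
type logicalBackupSpec struct {
	EnableLogicalBackup   bool   `json:"enableLogicalBackup,omitempty"`
	LogicalBackupSchedule string `json:"logicalBackupSchedule,omitempty"`
}

// parseSpec decodes a cluster manifest's spec section. encoding/json
// ignores keys without a matching field, such as teamId below.
func parseSpec(data []byte) (logicalBackupSpec, error) {
	var spec logicalBackupSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec([]byte(`{"teamId": "acid", "enableLogicalBackup": true}`))
	if err != nil {
		panic(err)
	}
	// logicalBackupSchedule was omitted, so it decodes to the empty string.
	fmt.Printf("%+v\n", spec)
}
```

With the schedule left empty, the operator-level `logical_backup_schedule` presumably applies, following the precedence rule that a value set in the cluster manifest wins over the operator configuration.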
4 changes: 4 additions & 0 deletions manifests/complete-postgres-manifest.yaml
@@ -64,6 +64,10 @@ spec:
# cluster: "acid-batman"
# timestamp: "2017-12-19T12:40:33+01:00" # timezone required (offset relative to UTC, see RFC 3339 section 5.6)
# s3_wal_path: "s3://custom/path/to/bucket"
+
+# run periodic backups with k8s cron jobs
+# enableLogicalBackup: true
+# logicalBackupSchedule: "30 00 * * *"
maintenanceWindows:
- 01:00-06:00 #UTC
- Sat:00:00-04:00
4 changes: 4 additions & 0 deletions manifests/configmap.yaml
@@ -54,3 +54,7 @@ data:
resource_check_interval: 3s
resource_check_timeout: 10m
resync_period: 5m
+
+# logical_backup_schedule: "30 00 * * *"
+# logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
+# logical_backup_s3_bucket: ""
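With the ConfigMap-based configuration every value arrives as a string under `data`, and keys that stay commented out, as above, are simply absent. A minimal sketch of filling the new settings from such a map and falling back to the defaults listed in the operator parameter reference; `logicalBackupConfig` (a copy of the fields of `OperatorLogicalBackupConfiguration` without the JSON tags) and `fromConfigMapData` are hypothetical, and this is not how the operator's own config loader is written.

```go
package main

import "fmt"

// logicalBackupConfig is a hypothetical copy of the fields of
// OperatorLogicalBackupConfiguration from this PR (JSON tags omitted).
type logicalBackupConfig struct {
	Schedule    string
	DockerImage string
	S3Bucket    string
}

// fromConfigMapData reads the logical_backup_* keys from ConfigMap data.
// Missing or empty keys fall back to the documented defaults.
func fromConfigMapData(data map[string]string) logicalBackupConfig {
	get := func(key, fallback string) string {
		if v := data[key]; v != "" {
			return v
		}
		return fallback
	}
	return logicalBackupConfig{
		Schedule:    get("logical_backup_schedule", "30 00 * * *"),
		DockerImage: get("logical_backup_docker_image", "registry.opensource.zalan.do/acid/logical-backup"),
		S3Bucket:    get("logical_backup_s3_bucket", ""),
	}
}

func main() {
	// Keys commented out in configmap.yaml are simply absent from the map.
	cfg := fromConfigMapData(map[string]string{
		"resync_period":           "5m",
		"logical_backup_schedule": "00 03 * * *",
	})
	fmt.Printf("%+v\n", cfg)
}
```

Note that the bucket default is empty, so a working setup still has to provide `logical_backup_s3_bucket` explicitly.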
1 change: 0 additions & 1 deletion manifests/minimal-postgres-manifest.yaml
@@ -17,7 +17,6 @@ spec:
# role for application foo
foo_user: []

-
#databases: name->owner
databases:
foo: zalando
12 changes: 11 additions & 1 deletion manifests/operator-service-account-rbac.yaml
@@ -142,7 +142,17 @@ rules:
- bind
resourceNames:
- zalando-postgres-operator
-
+- apiGroups:
+- batch
+resources:
+- cronjobs # enables logical backups
+verbs:
+- create
+- delete
+- get
+- list
+- patch
+- update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
5 changes: 4 additions & 1 deletion manifests/postgresql-operator-default-configuration.yaml
@@ -91,4 +91,7 @@ configuration:
# scalyr_api_key: ""
# scalyr_image: ""
# scalyr_server_url: ""

+logical_backup:
+logical_backup_schedule: "30 00 * * *"
+logical_backup_docker_image: "registry.opensource.zalan.do/acid/logical-backup"
+logical_backup_s3_bucket: ""
45 changes: 26 additions & 19 deletions pkg/apis/acid.zalan.do/v1/operator_configuration_type.go
@@ -143,25 +143,26 @@ type ScalyrConfiguration struct {

// OperatorConfigurationData defines the operation config
type OperatorConfigurationData struct {
-EtcdHost string `json:"etcd_host,omitempty"`
-DockerImage string `json:"docker_image,omitempty"`
-Workers uint32 `json:"workers,omitempty"`
-MinInstances int32 `json:"min_instances,omitempty"`
-MaxInstances int32 `json:"max_instances,omitempty"`
-ResyncPeriod Duration `json:"resync_period,omitempty"`
-RepairPeriod Duration `json:"repair_period,omitempty"`
-Sidecars map[string]string `json:"sidecar_docker_images,omitempty"`
-PostgresUsersConfiguration PostgresUsersConfiguration `json:"users"`
-Kubernetes KubernetesMetaConfiguration `json:"kubernetes"`
-PostgresPodResources PostgresPodResourcesDefaults `json:"postgres_pod_resources"`
-SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
-Timeouts OperatorTimeouts `json:"timeouts"`
-LoadBalancer LoadBalancerConfiguration `json:"load_balancer"`
-AWSGCP AWSGCPConfiguration `json:"aws_or_gcp"`
-OperatorDebug OperatorDebugConfiguration `json:"debug"`
-TeamsAPI TeamsAPIConfiguration `json:"teams_api"`
-LoggingRESTAPI LoggingRESTAPIConfiguration `json:"logging_rest_api"`
-Scalyr ScalyrConfiguration `json:"scalyr"`
+EtcdHost string `json:"etcd_host,omitempty"`
+DockerImage string `json:"docker_image,omitempty"`
+Workers uint32 `json:"workers,omitempty"`
+MinInstances int32 `json:"min_instances,omitempty"`
+MaxInstances int32 `json:"max_instances,omitempty"`
+ResyncPeriod Duration `json:"resync_period,omitempty"`
+RepairPeriod Duration `json:"repair_period,omitempty"`
+Sidecars map[string]string `json:"sidecar_docker_images,omitempty"`
+PostgresUsersConfiguration PostgresUsersConfiguration `json:"users"`
+Kubernetes KubernetesMetaConfiguration `json:"kubernetes"`
+PostgresPodResources PostgresPodResourcesDefaults `json:"postgres_pod_resources"`
+SetMemoryRequestToLimit bool `json:"set_memory_request_to_limit,omitempty"`
+Timeouts OperatorTimeouts `json:"timeouts"`
+LoadBalancer LoadBalancerConfiguration `json:"load_balancer"`
+AWSGCP AWSGCPConfiguration `json:"aws_or_gcp"`
+OperatorDebug OperatorDebugConfiguration `json:"debug"`
+TeamsAPI TeamsAPIConfiguration `json:"teams_api"`
+LoggingRESTAPI LoggingRESTAPIConfiguration `json:"logging_rest_api"`
+Scalyr ScalyrConfiguration `json:"scalyr"`
+LogicalBackup OperatorLogicalBackupConfiguration `json:"logical_backup"`
}

// OperatorConfigurationUsers defines configration for super user
@@ -174,3 +175,9 @@ type OperatorConfigurationUsers struct {

//Duration shortens this frequently used name
type Duration time.Duration
+
+type OperatorLogicalBackupConfiguration struct {
+Schedule string `json:"logical_backup_schedule,omitempty"`
+DockerImage string `json:"logical_backup_docker_image,omitempty"`
+S3Bucket string `json:"logical_backup_s3_bucket,omitempty"`
+}
26 changes: 14 additions & 12 deletions pkg/apis/acid.zalan.do/v1/postgresql_type.go
@@ -3,7 +3,7 @@ package v1
import (
"time"

-"k8s.io/api/core/v1"
+v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

@@ -43,17 +43,19 @@ type PostgresSpec struct {
// load balancers' source ranges are the same for master and replica services
AllowedSourceRanges []string `json:"allowedSourceRanges"`

-NumberOfInstances int32 `json:"numberOfInstances"`
-Users map[string]UserFlags `json:"users"`
-MaintenanceWindows []MaintenanceWindow `json:"maintenanceWindows,omitempty"`
-Clone CloneDescription `json:"clone"`
-ClusterName string `json:"-"`
-Databases map[string]string `json:"databases,omitempty"`
-Tolerations []v1.Toleration `json:"tolerations,omitempty"`
-Sidecars []Sidecar `json:"sidecars,omitempty"`
-InitContainers []v1.Container `json:"init_containers,omitempty"`
-PodPriorityClassName string `json:"pod_priority_class_name,omitempty"`
-ShmVolume *bool `json:"enableShmVolume,omitempty"`
+NumberOfInstances int32 `json:"numberOfInstances"`
+Users map[string]UserFlags `json:"users"`
+MaintenanceWindows []MaintenanceWindow `json:"maintenanceWindows,omitempty"`
+Clone CloneDescription `json:"clone"`
+ClusterName string `json:"-"`
+Databases map[string]string `json:"databases,omitempty"`
+Tolerations []v1.Toleration `json:"tolerations,omitempty"`
+Sidecars []Sidecar `json:"sidecars,omitempty"`
+InitContainers []v1.Container `json:"init_containers,omitempty"`
+PodPriorityClassName string `json:"pod_priority_class_name,omitempty"`
+ShmVolume *bool `json:"enableShmVolume,omitempty"`
+EnableLogicalBackup bool `json:"enableLogicalBackup,omitempty"`
+LogicalBackupSchedule string `json:"logicalBackupSchedule,omitempty"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
17 changes: 17 additions & 0 deletions pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.