Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design proposal for Restore hooks #2465

Merged
merged 3 commits into from
Jul 20, 2020
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add details to restore hooks design
Signed-off-by: Marc Campbell <marc.e.campbell@gmail.com>
  • Loading branch information
Andrew Reed authored and marccampbell committed Jul 17, 2020
commit 2c1b489489dc7b9f0e2b1fe43a67a77e02d588aa
92 changes: 83 additions & 9 deletions design/restore-hooks.md
Original file line number Diff line number Diff line change
@@ -1,35 +1,109 @@
# Restore Hooks

Velero supports Backup Hooks to execute commands at before and/or after a backup.
This enables a user to, among other things, prepare data to be backed up.
An example of this would be to attach an empty volume to a Postgres pod, use a backup hook to execute `pg_dump` from the data volume, and back up the volume containing the export.
The problem is that there's no easy or automated way to include an automated restore process.
After a restore with the example configuration above, the postgres pod will be empty, but there will be a need to manually exec in and run `pg_restore`.
This is not specific to postgres backups, but any many database engines and other applications that have application-specific utilities to back up and restore the data.
This document proposes a solution that allows a user to specify Restore Hooks, much like Backup Hooks, that can be executed during the restore process.

## Goals

- Enable custom commands to be run during a restore in order to mirror the commands that are available to the backup process.
- Provide observability into the result of commands run in restored pods.

## Non Goals

- Handling any application specific scenarios (postgres, mongo, etc)

## Background

(See introduction)
Velero supports Backup Hooks to execute commands before and/or after a backup.
This enables a user to, among other things, prepare data to be backed up without having to freeze an in-use volume.
An example of this would be to attach an empty volume to a Postgres pod, use a backup hook to execute `pg_dump` from the data volume, and back up the volume containing the export.
The problem is that there's no easy or automated way to include an automated restore process.
After a restore with the example configuration above, the postgres pod will be empty, but there will be a need to manually exec in and run `pg_restore`.

## High-Level Design

The Restore spec
The Restore spec will have a `spec.hooks` section matching the same section on the Backup spec except no `pre` hooks can be defined - only `post`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are different parts of the pod lifecycle that we would want to model/enable hooks for here. For example, we could theoretically support both (1) running a restore hook as an init container, so guaranteed to run before the main containers in the pod; and (2) running a restore hook after the pod is running, so as an exec hook inside the main container. Does anyone have thoughts on if something like this would be useful?

If we don't model this, then I believe you're proposing that we only support (2) from above, i.e. run the hook inside one of the existing containers in the pod once it's up and running.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our use case (working on TAS backup-and-restore), this use-case (2) looks like exactly what we need: we'd like to be able to re-hydrate a database after the pod running the database server is ready.

Use-case (1) looks potentially useful to us in future, but no where near as immediately useful as use-case (2). So we're totally in agreement with adding use-case (2) right now, and maybe opening a new design PR later if we end up with a real-world use for (1).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that the primary purpose of these hooks are for application data restoration and the fact that these hooks will be run once the pod becomes ready. Most applications would then need to be restarted to recognize the restored data. For this reason, the init container approach may be better suited.
As we are designing this, we should add both flavors of this restore hook and not just running commands. The approach of init containers will also have better error handling. In that, the pod doesn't run if the data isn't restored correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help with the design if we had a concrete example of when initContainers would be useful during restore. The closest thing I could come up with is a scenario like this:

You have an API deployment that uses Postgres as its DB. You have your migrations in an initContainer for that deployment. You have a pre-snapshot hook that executes against the API pod and runs pg_dump to a volume in the pod. Then you would want an initContainer to be injected before the migrations initContainer to run pg_restore.

I'm not sure the above scenario would appear in the field since most API deployments have multiple replicas and it would be weird to run pg_dump/pg_restore multiple times for snapshot and restore. Do you have another use case in mind for initContainers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@areed
The scenario I had was restoring of the stateful application itself and most stateful applications maintain state by writing files to disk.

In the case of postgres, if postgres were to start before running pg_dump/pg_restore, it would start by writing data into files that pg_dump/pg_restore would write when the restore hooks are run. The hooks will then overwrite the data that postgres initialized itself with. For postgres to heal, it may require a restart of postgres itself. Which means there will be a container restart. When this happens, it is possible for the pod, running the application, to be relocated and the new pod wouldn't be controlled by Velero for it to run the restore hooks to restore data. This is scope for data-loss.

Furthermore, when restoring a stateful application, say like Apache Zookeeper, which uses transaction semantics to maintain/manage state, overwriting data (performed in the restore hook) that the application started with will lead to data corruption causing the application to crash. Making the restore unsuccessful.

For this reason, the restore hooks should be run before the application can start.

Annotations comparable to the annotations used during backup can also be set on pods.
For each restored pod, the Velero server will check if there are any hooks applicable to the pod.
If a restored pod has any applicable hooks, Velero will wait for the pod to reach status Ready and then execute the hooks in the pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that, for the restic integration, certain users have seen issues where the pod that velero restores (and attempts to run the restic restore for) ends up being replaced by a controlling ReplicaSet shortly after restore, which ultimately makes that restic restore fail. See #1981 (comment) and the subsequent discussion for a lot more detail on that.

I do think, though, that for most environments, Velero's assumption that the pod that it restores will be "adopted" by its controller, seems to work fine. It sounds like, in order for the restore hooks to work as expected, we'd be making that same assumption?

(I know this is a somewhat vague comment, probably merits a live discussion at some point, but hopefully I wrote enough to point in the right direction for some context).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this design makes the same assumption.

An alternative to avoid this issue would be:

  • Drop support for restore annotations
  • Move restore hook execution stage to after the restic restore wait
  • Then use the labelSelector in the Restore hook spec to list Pods and then execute the hook in all that match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the annotations are included to mirror the current backup hooks design, but it certainly looks like there's some cases where it's likely to cause trouble.

Perhaps we don't include them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to dig into this some more and see if I can get a better understanding of the scenarios where the pod adoption wouldn't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skriss Have you been able to do this research? Any further thoughts here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the latest community meeting the consensus was that since adoption behavior is already relied on for the restic integration it's safe to rely on for restore hooks.


The Restore log will include the results of each hook and the Restore object status will incorporate the results of hooks.

## Detailed Design

Post-restore hooks can be defined by annotation and/or by an array of resource hooks in the Restore spec.

The following annotations are supported:
- post.hook.restore.velero.io/container
- post.hook.restore.velero.io/command
- post.hook.restore.velero.io/on-error
- post.hook.restore.velero.io/timeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also need a timeout to configure how long Velero should wait for the pod to be come running.



This is an example of defining hooks in the Restore spec.
```yaml
apiVersion: velero.io/v1
kind: Restore
spec:
...
hooks:
resources:
-
name: my-hook
includedNamespaces:
- '*'
excludedNamespaces:
- some-namespace
includedResources:
- pods
excludedResources: []
labelSelector:
matchLabels:
app: velero
component: server
post:
-
exec:
container: my-container
command:
- /bin/uname
- -a
onError: Fail
timeout: 10s
```

As with Backups, if an annotation is defined on a pod then no hooks from the Restore spec will be applied.

### Implementation

The types and function in pkg/backup/item_hook_handler.go will be moved to a new package (pkg/util/hooks) and exported so they can be used for both backups and restores.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat minor, but I'd promote the package to be pkg/hooks vs being inside pkg/util.


The restore hooks implementation will closely follow the design of restoring pod volumes with restic.
The pkg/restore.context type will have new fields `hooksWaitGroup` and `hooksErrs` comparable to `resticWaitGroup` and `resticErr`.
The pkg/restore.context.execute function will start a goroutine for each pod with applicable hooks and then continue with restoring other items.
Each hooks goroutine will create a pkg/util/hooks.ItemHookHandler for each pod and send any error on the context.hooksErrs channel.
The ItemHookHandler already includes stdout and stderr and other metadata in the Backup log so the same logs will automatically be added to the Restore log (passed as the first argument to the ItemHookhandler.HandleHooks method.)

The pkg/restore.context.execute function will wait for the hooksWaitGroup before returning.
Any errors received on context.hooksErrs will be added to errs.Velero.

One difference compared to the restic restore design is that any error on the context.hooksErrs channel will cancel the context of all hooks, since errors are only reported on this channel if the hook specified `onError: Fail`.
However, canceling the hooks goroutines will not cancel the restic goroutines.
nrb marked this conversation as resolved.
Show resolved Hide resolved
In practice the restic goroutines will complete before the hooks since the hooks do not run until a pod is ready, but it's possible a hook will be executed and fail while a different pod is still in the pod volume restore phase.

Failed hooks with `onError: Continue` will appear in the Restore log but will not affect the status of the parent Restore.
Failed hooks with `onError: Fail` will cause the parent Restore to have status Partially Failed.

## Alternatives Considered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple other related upstream efforts to consider:

https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20200120-generic-data-populators.md

I don't think that either of those are in a state where we'd want to change our approach here for now, but worth look at/keeping an eye on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The execution-hook proposal is now obsolete and is being replaced by Draft ContainerNotifier Proposal


Wait for all restored Pods to report Ready, then execute the first hook in all applicable Pods simultaneously, then proceed to the next hook, etc.
That could introduce deadlock, e.g. if an API pod cannot be ready until the DB pod is restored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have some option to enable execution simultaneously, since in some cases simultaneous execution may not required, for example running db repair or file system repair can be executed sequentially.


Put the restore hooks on the Backup spec as a third lifecycle event named `restore` along with `pre` and `post`.
That would be confusing since `pre` and `post` would appear in the Backup log but `restore` would only be in the Restore log.

Copy link

@totherme totherme May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like calling this hook a "post" hook, since that leaves us open to add "pre" hooks in future if we want them (along the lines of skriss' comments above) 👍

Copy link

@neil-hickey neil-hickey May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of a pre hook might be:

  • I have a corrupted database / invalid state of a DB I am restoring into. My pre-hook will delete the corrupted data prior to restore, or reconcile a good state of the world prior to restore. (This can be done via one all encompassing hook also. But the separation may be cleaner)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be the lifecycle phase of the pod for running a pre-restore-hook?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a need for pre-restore-hooks right now, but I can imagine it being useful to have one that is guaranteed to run:

  • before the main containers in the restored pod start ; and
  • after the PVCs have been mounted.

This would allow us to do what neil-hickey suggests in the event of a corrpted DB: replace the bad data before starting a database process.

Having said all that, I don't think there's any need at all for pre-restore-hooks in an MVP. I just like the idea of leaving the option open for future work.

Execute restore hooks in parallel for each Pod.
That would not match the behavior of Backups.

## Security Considerations

N/A
Stdout or stderr in the Restore log may contain sensitive information, but the same risk already exists for Backup hooks.