Restore progress reporting

The Velero Backup resource reports real-time progress of an ongoing backup through a Progress field in the CR. The Velero Restore resource, on the other hand, only shows the phase of the ongoing restore (InProgress, Completed, PartiallyFailed, or Failed). This document proposes detailed progress reporting for Velero Restore. With the proposed Progress field, a Velero Restore CR will look like:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-restore
  namespace: velero
spec:
  [...]
status:
  phase: InProgress
  progress:
    itemsRestored: 100
    totalItems: 140

Goals

Enable progress reporting for Velero Restore

Non Goals

Estimate time to completion

Background

The Restore CR currently tells users only whether a restore is in progress or has finished (successfully or not). This is useful, but the user experience can be improved: the Restore CR can show detailed progress as the number of resources restored so far and the total number of resources to be restored. This is particularly useful for long-running restores. Such progress reporting already exists for Velero Backup; this document proposes a similar implementation for Velero Restore.

High-Level Design

We propose to divide the restore process into two steps. The first step collects all the items to be restored from the backup tarball. It applies the label selector and the include/exclude rules to the resources and items, and stores the surviving items (preserving the priority order) in an in-memory data structure. The second step reads the collected items and restores them.

Detailed Design

Progress struct

A new struct will be introduced to store progress information:

type RestoreProgress struct {
    TotalItems    int `json:"totalItems,omitempty"`
    ItemsRestored int `json:"itemsRestored,omitempty"`
}

RestoreStatus will include the above struct:

type RestoreStatus struct {
    [...]

    Progress *RestoreProgress `json:"progress,omitempty"`
}
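
To illustrate how the new field would appear in a serialized status, here is a minimal, self-contained sketch. It assumes a trimmed-down RestoreStatus with only a phase and the progress field, and uses a plain string for the phase; the helper marshalStatus is hypothetical and exists only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RestoreProgress mirrors the struct proposed above.
type RestoreProgress struct {
	TotalItems    int `json:"totalItems,omitempty"`
	ItemsRestored int `json:"itemsRestored,omitempty"`
}

// RestoreStatus is trimmed to two fields for this sketch; the real struct
// has more fields, and Phase is simplified to a plain string here.
type RestoreStatus struct {
	Phase    string           `json:"phase,omitempty"`
	Progress *RestoreProgress `json:"progress,omitempty"`
}

// marshalStatus is a hypothetical helper that renders a status as JSON.
func marshalStatus(s RestoreStatus) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := marshalStatus(RestoreStatus{
		Phase:    "InProgress",
		Progress: &RestoreProgress{TotalItems: 140, ItemsRestored: 100},
	})
	if err != nil {
		panic(err)
	}
	// prints {"phase":"InProgress","progress":{"totalItems":140,"itemsRestored":100}}
	fmt.Println(out)
}
```

Because both counters carry omitempty, a zero value is dropped from the output: itemsRestored would be absent until the first item is restored, and a nil Progress pointer omits the whole progress object.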

Modifications to restore.go

Currently, the restore process works by looping through the resources in the backup tarball and restoring them one-by-one in the same pass:

func (ctx *context) execute(...) {
    [...]

    for _, resource := range getOrderedResources(...) {
        [...]

        for namespace, items := range resourceList.ItemsByNamespace {
            [...]

            for _, item := range items {
                [...]

                // restore item here
                w, e := restoreItem(...)
            }
        }
    }
}

We propose to remove the call to restoreItem() from the innermost loop and instead store the item in a data structure. Once all the items are collected, we loop through the slice of collected items and call restoreItem() for each one:

func (ctx *context) getOrderedResourceCollection(...) []restoreResource {
    collectedResources := []restoreResource{}
    for _, resource := range getOrderedResources(...) {
        [...]

        // one restoreResource per group resource; the map must be
        // initialized before items are added to it
        collectedResource := restoreResource{
            itemsByNamespace: make(map[string][]restoreItem),
        }
        for namespace, items := range resourceList.ItemsByNamespace {
            [...]
            for _, item := range items {
                [...]

                // store item in a data structure and keep the count in sync
                collectedResource.itemsByNamespace[originalNamespace] = append(collectedResource.itemsByNamespace[originalNamespace], item)
                collectedResource.totalItems++
            }
        }
        collectedResources = append(collectedResources, collectedResource)
    }
    return collectedResources
}

func (ctx *context) execute(...) {
    [...]

    // get all items
    resources := ctx.getOrderedResourceCollection(...)

    for _, resource := range resources {
        [...]

        for _, items := range resource.itemsByNamespace {
            [...]

            for _, item := range items {
                [...]

                // restore the item
                w, e := restoreItem(...)
            }
        }
    }

    [...]
}

We introduce two new structs to hold the collected items:

type restoreResource struct {
    resource            string
    itemsByNamespace    map[string][]restoreItem
    totalItems          int
}

type restoreItem struct {
    targetNamespace string
    name            string
}

Each group resource is represented by a restoreResource. The map itemsByNamespace is keyed by originalNamespace, and each value is the list of items in that original namespace. totalItems is the count of all items across the namespaces in the map; it is updated every time an item is added. Each item is represented by a restoreItem, which holds the item's name and its resolved targetNamespace.
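
As a minimal sketch of how these two structs could be kept consistent, the example below adds items through a single helper so that totalItems always matches the contents of the map. The addItem method and the sample resource, namespace, and item names are hypothetical and not part of the proposal.

```go
package main

import "fmt"

type restoreItem struct {
	targetNamespace string
	name            string
}

type restoreResource struct {
	resource         string
	itemsByNamespace map[string][]restoreItem
	totalItems       int
}

// addItem is a hypothetical helper: it files the item under its original
// namespace and increments totalItems in the same step.
func (r *restoreResource) addItem(originalNamespace string, item restoreItem) {
	if r.itemsByNamespace == nil {
		r.itemsByNamespace = make(map[string][]restoreItem)
	}
	r.itemsByNamespace[originalNamespace] = append(r.itemsByNamespace[originalNamespace], item)
	r.totalItems++
}

func main() {
	pods := restoreResource{resource: "pods"}
	pods.addItem("dev", restoreItem{targetNamespace: "dev-restored", name: "web-0"})
	pods.addItem("dev", restoreItem{targetNamespace: "dev-restored", name: "web-1"})
	pods.addItem("ops", restoreItem{targetNamespace: "ops", name: "agent"})

	// prints 3 2
	fmt.Println(pods.totalItems, len(pods.itemsByNamespace["dev"]))
}
```

Routing every insertion through one helper is only one possible design; incrementing the counter inline next to the append works equally well as long as the two always happen together.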

Calculating progress

The total number of items is the sum of totalItems across all collected resources:

totalItems := 0

for _, resource := range collectedResources {
	totalItems += resource.totalItems
}

Additional items returned by plugins are not known up front; they are discovered only when the plugins execute. totalItems will be adjusted to include them, so the total is expected to change whenever plugins execute:

    i := 0
    for _, resource := range resources {
        [...]

        for _, items := range resource.itemsByNamespace {
            [...]

            for _, item := range items {
                [...]

                // restore the item
                w, e := restoreItem(...)
                i++
                // calculate the actual count of resources
                actualTotalItems := len(ctx.restoredItems) + (totalItems - i)
            }
        }
    }
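
The adjustment above can be traced with a small simulation. This is a sketch under stated assumptions: items are plain strings, a map stands in for ctx.restoredItems, the additional map is a hypothetical stand-in for plugins returning extra items, no extra item also appears in the collected list, and itemsRestored is taken to be the size of the restored set.

```go
package main

import "fmt"

// progress is one snapshot of the counters after a collected item is restored.
type progress struct {
	itemsRestored int
	totalItems    int
}

// simulateRestore walks the collected items in order. additional maps an item
// to the extra items a (hypothetical) plugin returns for it. One snapshot is
// recorded per collected item.
func simulateRestore(collected []string, additional map[string][]string) []progress {
	restored := map[string]struct{}{} // stands in for ctx.restoredItems
	totalItems := len(collected)
	snapshots := make([]progress, 0, len(collected))

	i := 0
	for _, item := range collected {
		restored[item] = struct{}{}
		for _, extra := range additional[item] {
			restored[extra] = struct{}{}
		}
		i++
		// everything restored so far plus the collected items still pending
		actualTotalItems := len(restored) + (totalItems - i)
		snapshots = append(snapshots, progress{
			itemsRestored: len(restored),
			totalItems:    actualTotalItems,
		})
	}
	return snapshots
}

func main() {
	collected := []string{"item-a", "item-b", "item-c"}
	additional := map[string][]string{"item-a": {"extra-1"}}

	// prints 2/4, then 3/4, then 4/4
	for _, p := range simulateRestore(collected, additional) {
		fmt.Printf("%d/%d\n", p.itemsRestored, p.totalItems)
	}
}
```

In this run the initial total of 3 grows to 4 as soon as the plugin for item-a returns one additional item, which is the change in total that users should expect to see mid-restore.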

Updating progress

Updates to the progress field in the CR can be sent on a channel as soon as an item is restored. A goroutine receiving updates on that channel can call Update() to update the Restore CR. This requires passing an instance of RestoresGetter to the kubernetesRestorer struct.
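
The channel-and-goroutine pattern described above could look roughly like the following sketch. It deliberately avoids the real client: updateFunc is a hypothetical stand-in for the Update() call on the Restore CR, and progressUpdate and startProgressUpdater are illustrative names, not part of the proposal.

```go
package main

import (
	"fmt"
	"sync"
)

// progressUpdate carries the counters to be written to the CR.
type progressUpdate struct {
	itemsRestored int
	totalItems    int
}

// updateFunc is a hypothetical stand-in for the call that writes progress
// to the Restore CR.
type updateFunc func(progressUpdate) error

// startProgressUpdater launches a goroutine that applies every update it
// receives. Closing the returned channel stops the goroutine; the returned
// wait function blocks until all received updates have been applied.
func startProgressUpdater(update updateFunc) (chan<- progressUpdate, func()) {
	ch := make(chan progressUpdate)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for p := range ch {
			if err := update(p); err != nil {
				// a failed progress update should not fail the restore
				fmt.Println("progress update failed:", err)
			}
		}
	}()
	return ch, wg.Wait
}

func main() {
	var applied []progressUpdate
	ch, wait := startProgressUpdater(func(p progressUpdate) error {
		applied = append(applied, p) // a real updater would update the CR here
		return nil
	})

	for i := 1; i <= 3; i++ {
		ch <- progressUpdate{itemsRestored: i, totalItems: 3}
	}
	close(ch)
	wait()

	// prints 3 {3 3}
	fmt.Println(len(applied), applied[len(applied)-1])
}
```

The channel in this sketch is unbuffered, so the restore loop waits while an update is in flight. A buffered channel, or dropping intermediate updates when the receiver is busy, are possible variations if API latency turns out to slow down the restore.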

Alternatives Considered

As an alternative, we have considered an approach that doesn't divide the restore process into two steps.

With that approach, the total number of items will be read from the Backup CR. We will keep three counters, totalItems, skippedItems and restoredItems:

status:
  phase: InProgress
  progress:
    totalItems: 100
    skippedItems: 20
    restoredItems: 79

This approach doesn't require us to find the number of total items beforehand.

Security Considerations

Omitted

Compatibility

Omitted

Implementation

TBD

Open Issues

vmware-tanzu#21
