-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pod namespace from backup when matching PVBs #3475
Use pod namespace from backup when matching PVBs #3475
Conversation
In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes vmware-tanzu#3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
77b9ea1
to
77c2bd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
pkg/restore/restic_restore_action.go
Outdated
var sourcePod corev1.Pod | ||
if err := runtime.DefaultUnstructuredConverter.FromUnstructured(input.ItemFromBackup.UnstructuredContent(), &sourcePod); err != nil { | ||
return nil, errors.Wrap(err, "unable to convert source pod from runtime.Unstructured") | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for reference, we could have also reverse looked up the RestoreItemActionExecuteInput.Restore.Spec.NamespaceMapping
. But this is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about doing that but there was actually an issue with that approach. At the point when this action's Execute
function is called in restoreItem
, the namespace mapping hasn't yet been applied to the object being restored. That doesn't happen until later in the function. This means that, under the current implementation, both pod
and sourcePod
have the same namespace when this is called. Rather than assuming they would be the same, I opted to look at the item from the backup. It seemed too risky for this change to change when the namespace mapping would be applied in restoreItem
, but checking the pod from the backup also prevents this issue occurring again if when we do change when the mapping is applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add a comment though to explain why we can't rely on the namespace mapping from the restore spec. Thanks for reminding me about why I couldn't use that approach! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zubron Not sure I follow. The NamespaceMapping
that I was referring to is a field on the restore object spec. This field is set using the value of the namespace-mappings
flag that is passed at the CLI running velero restore create
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying @zubron.
Just capturing our discussion here. Changes at line79-82 weren't necessary and we could have used pod.Namespace
at https://github.com/vmware-tanzu/velero/pull/3475/files#diff-9f110ab28d31cd60e15e9ec43ad813f22bd8b9329eaf0ebe70c0b6590592b76dR100
and added unit tests to catch any future code reordering that may break. The changes at line79-82 is future proofing.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
dbd3d54
to
0d56725
Compare
Before reviewing, I think this has the potential for backporting so that v1.5.x users can benefit from the fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the added test and the comment about why we can't do reverse lookup!
* Use pod namespace from backup when matching PVBs In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes vmware-tanzu#3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Explain why the namespace mapping can't be used Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Use pod namespace from backup when matching PVBs In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes vmware-tanzu#3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Explain why the namespace mapping can't be used Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Restore CAPI cluster objects in a better order Restoring CAPI workload clusters without this ordering caused the capi-controller-manager code to panic, resulting in an unhealthy cluster state. This can be worked around (https://community.pivotal.io/s/article/5000e00001pJyN41611954332537?language=en_US), but we provide the inclusion of these resources as a default in order to provide a better out-of-the-box experience. Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Add changelog Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Use pod namespace from backup when matching PVBs (#3475) * Use pod namespace from backup when matching PVBs In #3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes #3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Explain why the namespace mapping can't be used Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Allow Dockerfiles to be configurable (#3634) For internal builds of Velero, we need to be able to specify an alternative Dockerfile which uses an alternative image registry to pull the base images from. This change adapts our Makefile such that both the main Dockerfile and build image Dockerfile can be overridden. We have some special handling for the build image to only build when the Dockerfile has changed. In this case, we check whether a custom Dockerfile has been provided, and always rebuild in that case. For custom build image Dockerfiles, use a fixed tag rather than the one based on commit SHA of the original file. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Combine CRD install verification into 1 job, and update k8s versions (#3448) * Validate CRDs against latest Kubernetes versions Add Kubernetes v1.19 and v1.20 series images, and consolidate the job into a single file to reduce repetition. Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Ignore job if the changes are only site/design Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Fix codespell error Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Cache Velero binary for reuse on workers This will cache the Velero binary based on the PR number and a SHA256 of the generated binary. This way, the runners testing each version of Kubernetes do not need to build it independently. Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Fix GitHub event access Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Wrap output path in quotes Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Move code checkout to build step Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Also cache go modules Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Fix syntax issues Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Download cached binary on each node Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Use cached go modules on main CI Signed-off-by: Nolan Brubaker <brubakern@vmware.com> * Add changelog for v1.5.4 Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> Co-authored-by: Nolan Brubaker <brubakern@vmware.com>
* Use pod namespace from backup when matching PVBs In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes vmware-tanzu#3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Explain why the namespace mapping can't be used Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Use pod namespace from backup when matching PVBs In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes vmware-tanzu#3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Explain why the namespace mapping can't be used Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Use pod namespace from backup when matching PVBs In vmware-tanzu#3051, we introduced an additional check to ensure that a PVB matched a particular pod by checking both the name and the namespace of the pod. This caused an issue when using a namespace mapping on restore. In the case where a namespace mapping is being used, the check for whether a PVB matches a particular pod will fail as the PVB was created for the original pod namespace and is not aware of the new namespace mapping being used. This resulted in PVRs not being created for pods that were being restored into new namespaces. The restic init containers were being created to wait on the volume restore, however this would cause the restored pods to block indefinitely as they would be waiting for a volume restore that was not scheduled. To fix this, we use the original namespace of the pod from the backup to match the PVB to the pod being restored, not the new namespace where the pod is being restored into. Fixes vmware-tanzu#3467. Signed-off-by: Bridget McErlean <bmcerlean@vmware.com> * Explain why the namespace mapping can't be used Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
Please add a summary of your change
In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.
To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.
Signed-off-by: Bridget McErlean bmcerlean@vmware.com
Does your change fix a particular issue?
Fixes #3467
Please indicate you've done the following:
/kind changelog-not-required
.site/content/docs/main
.