E2E migration cases don't support k8s cluster switch correctly #8292
Description
What steps did you take and what happened:
Prepare two k8s clusters.
Run the Velero E2E test cases (including the migration cases) on those clusters.
Take this CLI as an example:
CLOUD_PROVIDER=azure \
VELERO_SERVER_DEBUG_MODE=true \
DEFAULT_CLUSTER=nightly-test-1728875035788-azure-default-6-default \
STANDBY_CLUSTER=nightly-test-1728875035788-azure-standby-6-standby \
DEFAULT_CLUSTER_NAME=nightly-test-1728875035788-azure-default-6 \
STANDBY_CLUSTER_NAME=nightly-test-1728875035788-azure-standby-6 \
PLUGINS=gcr.io/velero-gcp/velero-plugin-for-microsoft-azure:main \
CREDS_FILE=/velero/workspace/E2E-debug/azure-credential BSL_CONFIG=resourceGroup=velero-nightly,storageAccount=veleronightly,subscriptionId=2261f3e7-d159-48fe-95a3-0e6a96e11159 \
BSL_BUCKET=velero-e2e-testing-1728875035788 \
ADDITIONAL_BSL_PLUGINS=gcr.io/velero-gcp/velero-plugin-for-aws:main \
ADDITIONAL_OBJECT_STORE_PROVIDER=aws ADDITIONAL_BSL_CONFIG=region=minio,s3ForcePathStyle=true,s3Url=http://minio.minio.svc:9000/ \
ADDITIONAL_BSL_BUCKET=velero-e2e-testing ADDITIONAL_BSL_PREFIX=additional \
ADDITIONAL_CREDS_FILE=/velero/workspace/E2E-debug/minio-credential-additional \
VELERO_IMAGE=gcr.io/velero-gcp/velero:main \
RESTORE_HELPER_IMAGE=gcr.io/velero-gcp/velero-restore-helper:main VERSION=main \
STANDBY_CLUSTER_CLOUD_PROVIDER=azure \
STANDBY_CLUSTER_OBJECT_STORE_PROVIDER=aws \
STANDBY_CLUSTER_PLUGINS=gcr.io/velero-gcp/velero-plugin-for-microsoft-azure:main \
DISABLE_INFORMER_CACHE=true \
VERSION=main \
REGISTRY_CREDENTIAL_FILE=/root/.docker/config.json \
GINKGO_LABELS=(!LongTime) \
KIBISHII_DIRECTORY=/velero/workspace/E2E-debug/e2e/distributed-data-generator/kubernetes/yaml/ \
make test-e2e
The E2E run failed intermittently. The failure always occurred after a migration case had run.
What did you expect to happen:
The E2E should run successfully.
[FAILED] in [It] - /velero/workspace/E2E-debug/e2e/velero/test/e2e/backups/deletion.go:76 @ 10/14/24 03:36:10.91
Test case failed and fail fast is enabled. Skip resource clean up.
• [FAILED] [21.079 seconds]
Velero tests of snapshot backup deletion when kibishii is the sample workload [It] Deleted backups are deleted from object storage and backups deleted from object storage can be deleted locally [Backups, Deletion, Snapshot, SkipVanillaZfs]
/velero/workspace/E2E-debug/e2e/velero/test/e2e/backups/deletion.go:75
[FAILED] Failed to run backup deletion test
Expected success, but got an error:
<*errors.withStack | 0xc0008302b8>:
Failed to install and prepare data for kibishii backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0: Failed to install Kibishii workload: failed to install kibishii, stderr=# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
Error from server (NotFound): error when creating "github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure": namespaces "backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0" not found
: exit status 1
{
error: <*errors.withMessage | 0xc00090a380>{
cause: <*errors.withStack | 0xc000830258>{
error: <*errors.withMessage | 0xc00090a360>{
cause: <*errors.withStack | 0xc000830228>{
error: <*errors.withMessage | 0xc00090a340>{
cause: <*exec.ExitError | 0xc00090a320>{
ProcessState: {
pid: 23290,
status: 256,
rusage: {
Utime: {Sec: ..., Usec: ...},
Stime: {Sec: ..., Usec: ...},
Maxrss: 176904,
Ixrss: 0,
Idrss: 0,
Isrss: 0,
Minflt: 41783,
Majflt: 0,
Nswap: 0,
Inblock: 0,
Oublock: 133816,
Msgsnd: 0,
Msgrcv: 0,
Nsignals: 0,
Nvcsw: 11355,
Nivcsw: 5676,
},
},
Stderr: nil,
},
msg: "failed to install kibishii, stderr=# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating \"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\nError from server (NotFound): error when creating 
\"github.com/vmware-tanzu-experiments/distributed-data-generator/kubernetes/yaml/azure\": namespaces \"backup-deletion-1-a924f5cd-2fb0-4958-9167-6a1546f648a0\" not found\n",
},
stack: [0x1e935dd, 0x1e949e5, 0x1e98650, 0x1e97aa5, 0x89a393, 0x8ae54d, 0x47b261],
},
msg: "Failed to install Kibishii worklo...
Gomega truncated this representation as it exceeds 'format.MaxLength'.
Consider having the object provide a custom 'GomegaStringer' representation
or adjust the parameters in Gomega's 'format' package.
The following information will help us better understand what's going on:
This error happened because the current E2E test cases communicate with the Kubernetes API server in more than one way.
- The cases use the kubectl CLI to switch cluster contexts, and the velero CLI to create and delete the backup and restore resources.
- The cases also use client-go to talk to the Kubernetes API server to create k8s resources.
The migration cases use the kubectl CLI to switch between the k8s clusters. That switch modifies the kubeconfig, so every CLI command that depends on ~/.kube/config picks it up.
But client-go does not pick up the same cluster switch.
The test case failed because kubectl had switched to the standby cluster to install Velero, while client-go created the backup target namespaces on the active cluster. As a result, the subsequent steps on the standby cluster could not find the created namespaces.
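The mismatch described above can be reproduced with a small model. The sketch below is not the Velero E2E code and does not use client-go; Cluster, Kubeconfig, Client, and their methods are hypothetical names invented for illustration. It assumes one plausible mechanism: CLI commands re-read the kubeconfig's current context on every call, while a long-lived client resolves its target cluster once, when it is constructed.

```go
package main

import "fmt"

// Cluster is a toy stand-in for a Kubernetes API server; it only tracks namespaces.
type Cluster struct {
	Name       string
	Namespaces map[string]bool
}

// Kubeconfig models ~/.kube/config: named contexts plus a mutable current context.
type Kubeconfig struct {
	Contexts       map[string]*Cluster
	CurrentContext string
}

// UseContext models a CLI context switch: it rewrites the current context.
func (k *Kubeconfig) UseContext(name string) error {
	if _, ok := k.Contexts[name]; !ok {
		return fmt.Errorf("context %q not found", name)
	}
	k.CurrentContext = name
	return nil
}

// CLIApply models a CLI call that re-reads the kubeconfig on every invocation
// and needs the namespace to exist already in the current cluster.
func (k *Kubeconfig) CLIApply(namespace string) error {
	c := k.Contexts[k.CurrentContext]
	if !c.Namespaces[namespace] {
		return fmt.Errorf("namespaces %q not found", namespace)
	}
	return nil
}

// Client models a long-lived API client (assumption: target resolved only once).
type Client struct{ target *Cluster }

// NewClient captures the cluster behind the given context at construction time.
func NewClient(k *Kubeconfig, context string) (*Client, error) {
	c, ok := k.Contexts[context]
	if !ok {
		return nil, fmt.Errorf("context %q not found", context)
	}
	return &Client{target: c}, nil
}

// CreateNamespace creates the namespace on the cluster captured by NewClient.
func (c *Client) CreateNamespace(name string) { c.target.Namespaces[name] = true }

// newKubeconfig returns a kubeconfig with two empty clusters, "default" active.
func newKubeconfig() *Kubeconfig {
	return &Kubeconfig{
		Contexts: map[string]*Cluster{
			"default": {Name: "default", Namespaces: map[string]bool{}},
			"standby": {Name: "standby", Namespaces: map[string]bool{}},
		},
		CurrentContext: "default",
	}
}

func main() {
	k := newKubeconfig()
	stale, _ := NewClient(k, k.CurrentContext) // built before the switch
	_ = k.UseContext("standby")                // migration case switches clusters

	// The stale client still writes to "default"; the CLI now reads "standby".
	stale.CreateNamespace("backup-deletion-1")
	fmt.Println("stale client:", k.CLIApply("backup-deletion-1"))

	// Rebuilding the client from the current context makes both paths agree.
	fresh, _ := NewClient(k, k.CurrentContext)
	fresh.CreateNamespace("backup-deletion-1")
	fmt.Println("fresh client:", k.CLIApply("backup-deletion-1"))
}
```

In this model the fix is to rebuild the client, or to pin both paths to an explicit context, after every switch. With client-go, one way to pin a context explicitly is the CurrentContext field of clientcmd.ConfigOverrides; whether that fits the E2E framework is not something this issue establishes.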
If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.
If you are using earlier versions:
Please provide the output of the following commands (pasting long output into a GitHub gist or other pastebin is fine):
- kubectl logs deployment/velero -n velero
- velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
- velero backup logs <backupname>
- velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
- velero restore logs <restorename>
Anything else you would like to add:
Environment:
- Velero version (use velero version):
- Velero features (use velero client config get features):
- Kubernetes version (use kubectl version):
- Kubernetes installer & version:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):