OCPBUGS-56104,OCPCLOUD-2893: Add related objects to must-gather config by honza · Pull Request #267 · openshift/cluster-capi-operator

honza · 2025-03-03T18:17:27Z

No description provided.

openshift-ci-robot · 2025-03-03T18:17:51Z

@honza: This pull request references OCPCLOUD-2893 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

nrb · 2025-03-11T21:06:22Z

/assign @damdo

nrb

I think as a first pass this is good - ideally, we'd use discovery to grab all CRDs in the group, but I think that at the moment it's more important to have the functionality than to spend time on the generator.

Also, the resources should be plural; hopefully you can just accept the suggestions to have it done automatically.

manifests/0000_30_cluster-api_12_clusteroperator.yaml

honza · 2025-03-12T15:58:04Z

ideally, we'd use discovery to grab all CRDs in the group,

Is there any prior art on this? I looked but haven't found anything. I have some uncommitted WIP code that tries to do this, but it feels wrong.

nrb · 2025-03-12T16:07:06Z

Is there any prior art on this?

Not that I can find; I was hoping so, but perhaps it doesn't exist on purpose.

it feels wrong.

Why's that?

honza · 2025-03-12T16:12:55Z

func (r *ClusterOperatorStatusClient) relatedObjects() []configv1.ObjectReference {
	// TBD: Add an actual set of object references from getResources method
	infra, err := util.GetInfra(context.TODO(), r.Client)
	if err != nil {
		// TODO: handle the error
	}
	platform, err := util.GetPlatform(context.TODO(), infra)
	if err != nil {
		// TODO: handle the error
	}
	_ = platform // not used yet; needed for the platform matching TODO below

	// Walk every type registered in the scheme and keep only Cluster API objects.
	for groupVersionKind, t := range r.Scheme().AllKnownTypes() {
		// Skip helper kinds such as *Options and WatchEvent.
		if strings.Contains(groupVersionKind.Kind, "Options") {
			continue
		}
		if groupVersionKind.Kind == "WatchEvent" {
			continue
		}
		if strings.Contains(groupVersionKind.Group, "cluster.x-k8s.io") {
			// Skip kinds without an ObjectMeta field (for example list types).
			if _, found := t.FieldByName("ObjectMeta"); !found {
				continue
			}

			// TODO: match based on platform ^^^
		}
	}

	return []configv1.ObjectReference{
		{Resource: "namespaces", Name: controllers.DefaultManagedNamespace},
		{Group: configv1.GroupName, Resource: "clusteroperators", Name: controllers.ClusterOperatorName},
		{Resource: "namespaces", Name: r.ManagedNamespace},
		{Group: "", Resource: "serviceaccounts", Name: "cluster-capi-operator", Namespace: controllers.DefaultManagedNamespace},
		{Group: "", Resource: "configmaps", Name: "cluster-capi-operator-images", Namespace: controllers.DefaultManagedNamespace},
		{Group: "apps", Resource: "deployments", Name: "cluster-capi-operator", Namespace: controllers.DefaultManagedNamespace},
		{Group: "cluster.x-k8s.io", Resource: "clusters", Namespace: controllers.DefaultManagedNamespace},
		{Group: "cluster.x-k8s.io", Resource: "machines", Namespace: controllers.DefaultManagedNamespace},
	}
}
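
The kind filtering in the loop above can be tried in isolation. The following is only a sketch under stated assumptions: `gvk` is a hypothetical local stand-in for `schema.GroupVersionKind`, the sample kinds and versions in `main` are illustrative, and the `ObjectMeta` field check from the WIP code is left out because it needs the real Go types.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gvk is a hypothetical, trimmed-down stand-in for schema.GroupVersionKind.
type gvk struct {
	Group, Version, Kind string
}

// isGatherCandidate applies the same name-based checks as the WIP loop:
// drop *Options and WatchEvent kinds, keep anything in a Cluster API group.
func isGatherCandidate(k gvk) bool {
	if strings.Contains(k.Kind, "Options") || k.Kind == "WatchEvent" {
		return false
	}
	// Substring match, so provider subgroups such as
	// infrastructure.cluster.x-k8s.io are matched as well.
	return strings.Contains(k.Group, "cluster.x-k8s.io")
}

// candidateKinds returns the matching kinds as sorted "group/Kind" strings.
func candidateKinds(known []gvk) []string {
	var out []string
	for _, k := range known {
		if isGatherCandidate(k) {
			out = append(out, k.Group+"/"+k.Kind)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative input only; a real scheme holds many more entries.
	known := []gvk{
		{"cluster.x-k8s.io", "v1beta1", "Machine"},
		{"cluster.x-k8s.io", "v1beta1", "ListOptions"},
		{"infrastructure.cluster.x-k8s.io", "v1beta2", "AWSCluster"},
		{"apps", "v1", "Deployment"},
	}
	for _, k := range candidateKinds(known) {
		fmt.Println(k)
	}
}
```

Because the group check is a substring match, it keeps provider groups for every platform registered in the scheme, which is why the WIP code still needs the platform matching step.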

damdo · 2025-03-14T07:27:42Z

Hey @honza thanks for this PR.

Looking at the gather-extra from one of the e2e jobs I only see machines/machinesets, and not for example core CAPI cluster.

From the classic must-gather instead I also see cluster, but I still don't see awscluster for example.

Any ideas on why?

theobarberbany · 2025-04-28T12:12:11Z

👋 Reviving this, as having this functionality in must-gathers would be awesome.

Is there anything I can do to help?

@honza, I think your code in the above comment looks OK as a starting point. If we can get it committed, we can review it? :)

honza · 2025-04-28T18:09:29Z

Let me clean it up and push.

honza · 2025-04-28T22:40:44Z

In the classic gather, we now have:

$ tree namespaces/openshift-cluster-api/infrastructure.cluster.x-k8s.io/
infrastructure.cluster.x-k8s.io/
├── awsclusters
│   └── ci-op-yi9j3qyc-c3c99-58vxn.yaml
├── awsmachines
│   ├── ci-op-yi9j3qyc-c3c99-58vxn-worker-us-east-2b-7r2bh.yaml
│   ├── ci-op-yi9j3qyc-c3c99-58vxn-worker-us-east-2b-fk8k8.yaml
│   └── ci-op-yi9j3qyc-c3c99-58vxn-worker-us-east-2c-2qrcx.yaml
└── awsmachinetemplates
    ├── ci-op-yi9j3qyc-c3c99-58vxn-worker-us-east-2b.yaml
    └── ci-op-yi9j3qyc-c3c99-58vxn-worker-us-east-2c.yaml

4 directories, 6 files

honza · 2025-04-28T22:53:23Z

I think the extra gather commands are here. We could add a big switch statement and collect the infra resources based on platform.

pkg/operatorstatus/operator_status.go

damdo · 2025-05-02T16:04:21Z

I think the extra gather commands are here. We could add a big switch statement and collect the infra resources based on platform.

Yeah we could 👍
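
A rough sketch of what such a switch might look like follows. It is not the code that was merged: `objectReference` is a hypothetical local stand-in for `configv1.ObjectReference`, the platform names are plain strings assumed to match the platform type values, and the resource lists contain only the plural names that show up in the gather output in this thread, so they are probably incomplete.

```go
package main

import "fmt"

// objectReference is a hypothetical local stand-in for configv1.ObjectReference.
type objectReference struct {
	Group, Resource, Name, Namespace string
}

// infraGroup is the API group seen in the gathered directory names.
const infraGroup = "infrastructure.cluster.x-k8s.io"

// infraResources maps a platform name to plural infrastructure resource names.
// Only resources observed in this thread are listed (an assumption, not a
// complete inventory of each provider's types).
func infraResources(platform string) []string {
	switch platform {
	case "AWS":
		return []string{"awsclusters", "awsmachines", "awsmachinetemplates"}
	case "Azure":
		return []string{"azureclusters", "azureclusteridentities"}
	case "VSphere":
		return []string{"vsphereclusters"}
	default:
		// Unknown or unsupported platform: add nothing.
		return nil
	}
}

// platformRelatedObjects builds one reference per resource. Name is left empty
// so that every object of that resource in the namespace is collected.
func platformRelatedObjects(platform, namespace string) []objectReference {
	var refs []objectReference
	for _, resource := range infraResources(platform) {
		refs = append(refs, objectReference{
			Group:     infraGroup,
			Resource:  resource,
			Namespace: namespace,
		})
	}
	return refs
}

func main() {
	for _, ref := range platformRelatedObjects("AWS", "openshift-cluster-api") {
		fmt.Printf("%s/%s in %s\n", ref.Group, ref.Resource, ref.Namespace)
	}
}
```

Keeping the per-platform lists in one function makes the hand-maintained part small, at the cost of having to update it whenever a provider adds a type worth gathering.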

honza · 2025-05-02T18:58:56Z

openshift/release#64484

damdo · 2025-05-09T14:37:08Z

@honza yes openshift/release#56322 covers the gather-extra, but for the must-gather, the change was missing. Thanks for adding it, let's confirm this is in the must-gather artifact before proceeding with the merge.

wking · 2025-05-09T16:04:21Z

manifests/0000_30_cluster-api_12_clusteroperator.yaml

+  - group: "infrastructure.cluster.x-k8s.io"
+    name: ""
+    namespace: openshift-cluster-api
+    resource: awsclusters


If you're actively managing these entries in your operator (which it looks like you are), I don't think you need to go through the work of manually listing them here in your ClusterOperator manifest. You need enough in the manifest that a must-gather while your operator isn't running collects enough to figure out why the operator isn't running. And it seems unlikely that folks would need to get all the way out to cloud-specific types to figure out why the operator wasn't running?

But also, 🤷, if you don't mind managing this list by hand or you find some way to automate it, having the cloud-specific types in this manifest doesn't hurt.
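
One way to picture the split described here is a small static baseline combined with entries the operator adds at runtime. The sketch below is an illustration only: `objectReference` is a hypothetical stand-in for `configv1.ObjectReference`, `mergeRelatedObjects` is an invented helper, and the namespace value is assumed from the manifest snippet above.

```go
package main

import "fmt"

// objectReference is a hypothetical local stand-in for configv1.ObjectReference.
// All fields are strings, so values are comparable and usable as map keys.
type objectReference struct {
	Group, Resource, Name, Namespace string
}

// mergeRelatedObjects returns base followed by extra, dropping exact
// duplicates and keeping the first occurrence of each reference.
func mergeRelatedObjects(base, extra []objectReference) []objectReference {
	seen := make(map[objectReference]struct{}, len(base)+len(extra))
	merged := make([]objectReference, 0, len(base)+len(extra))
	for _, list := range [][]objectReference{base, extra} {
		for _, ref := range list {
			if _, ok := seen[ref]; ok {
				continue
			}
			seen[ref] = struct{}{}
			merged = append(merged, ref)
		}
	}
	return merged
}

func main() {
	const ns = "openshift-cluster-api" // assumed managed namespace

	// Baseline: enough to debug an operator that is not running.
	base := []objectReference{
		{Resource: "namespaces", Name: ns},
		{Group: "apps", Resource: "deployments", Name: "cluster-capi-operator", Namespace: ns},
	}
	// Runtime additions: cloud-specific types the running operator knows about.
	extra := []objectReference{
		{Resource: "namespaces", Name: ns}, // duplicate of a baseline entry
		{Group: "infrastructure.cluster.x-k8s.io", Resource: "awsclusters", Namespace: ns},
	}

	for _, ref := range mergeRelatedObjects(base, extra) {
		fmt.Printf("%q %s %q\n", ref.Group, ref.Resource, ref.Name)
	}
}
```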

honza · 2025-05-09T17:52:47Z

$ tree namespaces/openshift-cluster-api/cluster.x-k8s.io/machinesets/
namespaces/openshift-cluster-api/cluster.x-k8s.io/machinesets/
├── ci-op-hq03lh3v-c3c99-2nvtb-worker-us-east-2b.yaml
└── ci-op-hq03lh3v-c3c99-2nvtb-worker-us-east-2c.yaml

1 directory, 2 files

honza · 2025-05-09T17:57:28Z

I also noticed this in the Azure job:

$ tree infrastructure.cluster.x-k8s.io
infrastructure.cluster.x-k8s.io
├── azureclusteridentities
│   └── ci-op-hq03lh3v-d9d1a-g4jn8.yaml
└── azureclusters
    └── ci-op-hq03lh3v-d9d1a-g4jn8.yaml

3 directories, 2 files

honza · 2025-05-09T18:25:59Z

vsphere is the same

My theory is that the e2e test deletes those resources before we gather. Does that sound right?

damdo · 2025-05-12T08:19:37Z

I also noticed this in the Azure job:

$ tree infrastructure.cluster.x-k8s.io
infrastructure.cluster.x-k8s.io
├── azureclusteridentities
│   └── ci-op-hq03lh3v-d9d1a-g4jn8.yaml
└── azureclusters
    └── ci-op-hq03lh3v-d9d1a-g4jn8.yaml

3 directories, 2 files

Yes, on Azure we create an AzureClusterIdentity in the infracluster controller, since it is required there.
We only do this for Azure.

cluster-capi-operator/pkg/controllers/infracluster/azure.go

Lines 178 to 200 in ac5aa33

    
azureClusterIdentity := &azurev1.AzureClusterIdentity{
	ObjectMeta: metav1.ObjectMeta{
		Name:      r.Infra.Status.InfrastructureName,
		Namespace: defaultCAPINamespace,
		Annotations: map[string]string{
			// The ManagedBy Annotation is set so CAPI infra providers ignore the InfraCluster object,
			// as that's managed externally, in this case by the cluster-capi-operator's infracluster controller.
			clusterv1.ManagedByAnnotation: managedByAnnotationValueClusterCAPIOperatorInfraClusterController,
		},
	},
	Spec: azurev1.AzureClusterIdentitySpec{
		Type:              azurev1.ServicePrincipal,
		AllowedNamespaces: &azurev1.AllowedNamespaces{NamespaceList: []string{defaultCAPINamespace}},
		ClientID:          string(azureClientID),
		TenantID:          string(azureTenantID),
		ClientSecret:      corev1.SecretReference{Name: clusterSecretName, Namespace: defaultCAPINamespace},
	},
}
// The Azure Cluster Identity does not exist, so it needs to be created.
if err := r.Create(ctx, azureClusterIdentity); err != nil && !cerrors.IsAlreadyExists(err) {
	return fmt.Errorf("failed to create Azure Cluster Identity: %w", err)
}

vsphere is the same

My theory is that the e2e test deletes those resources before we gather. Does that sound right?

@honza What do you mean by "vsphere is the same"?
I didn't notice anything peculiar in the vSphere gathering:

[~/Downloads/must-gather (6)/registry-apps-build02-vmc-ci-openshift-org-ci-op-hq03lh3v-stable-sha256-72b6d67be00f7b3b2f2e00ac24632b102079b7d2c4f4f1effd38295b199fa19f] $ tree -L 2 namespaces/openshift-cluster-api/infrastructure.cluster.x-k8s.io/
namespaces/openshift-cluster-api/infrastructure.cluster.x-k8s.io/
└── vsphereclusters
    └── ci-op-hq03lh3v-6e495-d7vzg.yaml

2 directories, 1 file

honza · 2025-05-12T12:40:13Z

What do you mean by "vsphere is the same"? I didn't notice anything peculiar in the vSphere gathering

I was distracted by the missing infra machines in the gather. It looks like we're good to go?

damdo · 2025-05-12T16:54:10Z

Yeah, at the moment I would only expect AWS to still have machines at the end of the e2e-capi-techpreview test, when the must-gather runs.

The e2e tests for all of the other platforms clean up the machines/machinesets they create during the test before considering it passed.

damdo

/unhold

/lgtm

damdo · 2025-05-12T16:56:34Z

Thanks for working on this @honza 🎉

honza · 2025-05-12T17:07:48Z

/test e2e-aws-ovn-serial

damdo · 2025-05-12T17:15:19Z

The serial job passed but timed out on tear-down hitting the 4h mark: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/267/pull-ci-openshift-cluster-capi-operator-main-e2e-aws-ovn-serial/1920845242664226816

I am happy to override it

/override ci/prow/e2e-aws-ovn-serial

openshift-ci · 2025-05-12T17:16:51Z

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-aws-ovn-serial

damdo · 2025-05-12T17:37:01Z

/retest

openshift-ci-robot · 2025-05-12T20:54:57Z

/retest-required

Remaining retests: 0 against base HEAD d131690 and 2 for PR HEAD 706d6f9 in total

damdo · 2025-05-13T07:34:22Z

The serial job passed but timed out on tear-down hitting the 4h mark: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/267/pull-ci-openshift-cluster-capi-operator-main-e2e-aws-ovn-serial/1920845242664226816

I am happy to override it

/override ci/prow/e2e-aws-ovn-serial

openshift-ci · 2025-05-13T07:35:10Z

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-aws-ovn-serial

damdo · 2025-05-13T09:02:00Z

/test unit

damdo · 2025-05-13T10:42:28Z

/test unit

openshift-ci · 2025-05-13T11:06:13Z

@honza: all tests passed!

damdo · 2025-05-13T12:05:28Z

/cherry-pick release-4.19

openshift-cherrypick-robot · 2025-05-13T12:06:09Z

@damdo: new pull request created: #296

damdo · 2025-05-13T12:20:55Z

/retitle OCPBUGS-56104,OCPCLOUD-2893: Add related objects to must-gather config

openshift-ci-robot · 2025-05-13T12:21:05Z

@honza: Jira Issue OCPBUGS-56104: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-56104 has been moved to the MODIFIED state.

openshift-bot · 2025-05-14T08:37:26Z

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-capi-operator
This PR has been included in build ose-cluster-capi-operator-container-v4.20.0-202505140744.p0.g39748f2.assembly.stream.el9.
All builds following this will include this PR.

honza changed the title ~~Add related objects to must-gather config~~ OCPCLOUD-2893: Add related objects to must-gather config Mar 3, 2025

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 3, 2025

openshift-ci bot requested review from elmiko and racheljpg March 3, 2025 18:18

honza changed the title ~~OCPCLOUD-2893: Add related objects to must-gather config~~ WIP: OCPCLOUD-2893: Add related objects to must-gather config Mar 4, 2025

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2025

honza force-pushed the related-objects branch 2 times, most recently from 5ab4e21 to be85c0f Compare March 6, 2025 21:59

openshift-ci bot assigned damdo Mar 11, 2025

nrb reviewed Mar 11, 2025

honza force-pushed the related-objects branch from be85c0f to 0954a1c Compare March 12, 2025 15:59

Add related objects to must-gather config

919ba0b

honza force-pushed the related-objects branch from 0954a1c to 5517e54 Compare April 28, 2025 18:47

honza changed the title ~~WIP: OCPCLOUD-2893: Add related objects to must-gather config~~ OCPCLOUD-2893: Add related objects to must-gather config May 1, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 1, 2025

damdo reviewed May 2, 2025

honza force-pushed the related-objects branch from 5517e54 to 7e7360c Compare May 2, 2025 17:07

honza force-pushed the related-objects branch from 7e7360c to 21481d8 Compare May 5, 2025 13:18

wking reviewed May 9, 2025

damdo reviewed May 12, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2025

openshift-merge-bot bot merged commit 39748f2 into openshift:main May 13, 2025
23 checks passed

openshift-cherrypick-robot mentioned this pull request May 13, 2025

[release-4.19] OCPBUGS-56105,OCPCLOUD-2893: Add related objects to must-gather config #296

Merged

openshift-ci bot changed the title ~~OCPCLOUD-2893: Add related objects to must-gather config~~ OCPBUGS-56104,OCPCLOUD-2893: Add related objects to must-gather config May 13, 2025
