OCPBUGS-56104,OCPCLOUD-2893: Add related objects to must-gather config #267
Conversation
|
@honza: This pull request references OCPCLOUD-2893 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Force-pushed from 5ab4e21 to be85c0f
|
/assign @damdo |
nrb left a comment
I think as a first pass this is good - ideally, we'd use discovery to grab all CRDs in the group, but I think that at the moment it's more important to have the functionality than to spend time on the generator.
Also, the resources should be plural; hopefully you can just accept the suggestions to have it done automatically.
Is there any prior art on this? I looked but haven't found anything. I have some WIP, uncommitted code that tries to do this, but it feels wrong. |
Not that I can find; I was hoping so, but perhaps it doesn't exist on purpose.
Why's that? |
func (r *ClusterOperatorStatusClient) relatedObjects() []configv1.ObjectReference {
    // TBD: Add an actual set of object references from getResources method
    infra, err := util.GetInfra(context.TODO(), r.Client)
    if err != nil {
        // TODO: handle the error (fall back to the static list below?)
    }
    platform, err := util.GetPlatform(context.TODO(), infra)
    if err != nil {
        // TODO: handle the error
    }
    _ = platform // TODO: use this to filter the discovered types below
    // Walk every type registered in the scheme and pick out the CAPI ones.
    m := r.Scheme().AllKnownTypes()
    for groupVersionKind, t := range m {
        // Skip helper types that aren't real API resources.
        if strings.Contains(groupVersionKind.Kind, "Options") {
            continue
        }
        if groupVersionKind.Kind == "WatchEvent" {
            continue
        }
        if strings.Contains(groupVersionKind.Group, "cluster.x-k8s.io") {
            // Only keep types that embed ObjectMeta, i.e. actual API objects.
            if _, found := t.FieldByName("ObjectMeta"); !found {
                continue
            }
            // TODO: match based on the platform fetched above ^^^
        }
    }
    return []configv1.ObjectReference{
        {Resource: "namespaces", Name: controllers.DefaultManagedNamespace},
        {Group: configv1.GroupName, Resource: "clusteroperators", Name: controllers.ClusterOperatorName},
        {Resource: "namespaces", Name: r.ManagedNamespace},
        {Group: "", Resource: "serviceaccounts", Name: "cluster-capi-operator", Namespace: controllers.DefaultManagedNamespace},
        {Group: "", Resource: "configmaps", Name: "cluster-capi-operator-images", Namespace: controllers.DefaultManagedNamespace},
        {Group: "apps", Resource: "deployments", Name: "cluster-capi-operator", Namespace: controllers.DefaultManagedNamespace},
        {Group: "cluster.x-k8s.io", Resource: "clusters", Namespace: controllers.DefaultManagedNamespace},
        {Group: "cluster.x-k8s.io", Resource: "machines", Namespace: controllers.DefaultManagedNamespace},
    }
} |
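For comparison, here is a rough sketch of the discovery-based variant nrb mentioned above. This is hypothetical and not part of the PR: it assumes client-go's discovery client, and the capiRelatedObjects helper name and its namespace parameter are made up for illustration.

package operatorstatus // illustrative placement only

import (
    "strings"

    configv1 "github.com/openshift/api/config/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/discovery"
)

// capiRelatedObjects is a hypothetical helper that lists every resource served
// under a *.cluster.x-k8s.io group, instead of hard-coding each provider type.
func capiRelatedObjects(dc discovery.DiscoveryInterface, namespace string) ([]configv1.ObjectReference, error) {
    lists, err := dc.ServerPreferredResources()
    if err != nil {
        return nil, err
    }

    var refs []configv1.ObjectReference
    for _, list := range lists {
        gv, err := schema.ParseGroupVersion(list.GroupVersion)
        if err != nil {
            continue
        }
        // Keep the core CAPI group and the provider groups, e.g.
        // cluster.x-k8s.io and infrastructure.cluster.x-k8s.io.
        if !strings.Contains(gv.Group, "cluster.x-k8s.io") {
            continue
        }
        for _, res := range list.APIResources {
            // Skip subresources such as machines/status.
            if strings.Contains(res.Name, "/") {
                continue
            }
            refs = append(refs, configv1.ObjectReference{
                Group:     gv.Group,
                Resource:  res.Name, // discovery already returns the plural name
                Namespace: namespace,
            })
        }
    }
    return refs, nil
}

One nice side effect of going through discovery is that the resource names come back already pluralised, which would also take care of the plural-resources suggestion above.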
|
Hey @honza, thanks for this PR. Looking at the gather-extra from one of the e2e jobs, I only see machines/machinesets, and not, for example, the core CAPI cluster. From the classic must-gather, on the other hand, I do also see the cluster, but I still don't see the awscluster, for example. Any ideas on why? |
|
👋 Reviving this, as having this functionality in must-gathers would be awesome. Is there anything I can do to help? @honza, I think your code in the above comment looks OK as a starting point. If we can get it committed, we can review? :) |
|
Let me clean it up and push. |
|
In the classic gather, we now have: |
|
I think the extra gather commands are here. We could add a big switch statement and collect the infra resources based on platform. |
Yeah we could 👍 |
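Purely as an illustration of that switch idea (the real gather commands live in a script in openshift/release, and the infraGatherResources name and exact provider resource lists below are assumptions, not the actual implementation):

package gather // illustrative placement only

import configv1 "github.com/openshift/api/config/v1"

const capiInfraGroup = "infrastructure.cluster.x-k8s.io"

// infraGatherResources is a hypothetical helper mapping the cluster platform to
// the provider-specific CAPI infrastructure resources worth collecting.
func infraGatherResources(platform configv1.PlatformType) []string {
    switch platform {
    case configv1.AWSPlatformType:
        return []string{"awsclusters." + capiInfraGroup, "awsmachines." + capiInfraGroup, "awsmachinetemplates." + capiInfraGroup}
    case configv1.AzurePlatformType:
        return []string{"azureclusters." + capiInfraGroup, "azuremachines." + capiInfraGroup, "azuremachinetemplates." + capiInfraGroup}
    case configv1.GCPPlatformType:
        return []string{"gcpclusters." + capiInfraGroup, "gcpmachines." + capiInfraGroup, "gcpmachinetemplates." + capiInfraGroup}
    case configv1.VSpherePlatformType:
        return []string{"vsphereclusters." + capiInfraGroup, "vspheremachines." + capiInfraGroup, "vspheremachinetemplates." + capiInfraGroup}
    default:
        return nil
    }
}

The gather script could then pass these group-qualified names to whatever oc get / oc adm inspect calls it already makes for the managed namespace.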
|
@honza yes openshift/release#56322 covers the gather-extra, but for the must-gather, the change was missing. Thanks for adding it, let's confirm this is in the must-gather artifact before proceeding with the merge. |
- group: "infrastructure.cluster.x-k8s.io"
  name: ""
  namespace: openshift-cluster-api
  resource: awsclusters
If you're actively managing these entries in your operator (which it looks like you are), I don't think you need to go through the work of manually listing them here in your ClusterOperator manifest. You need enough in the manifest that a must-gather taken while your operator isn't running collects enough to figure out why the operator isn't running. And it seems unlikely that folks would need to go all the way out to the cloud-specific types to figure out why the operator wasn't running?
But also, 🤷, if you don't mind managing this list by hand or you find some way to automate it, having the cloud-specific types in this manifest doesn't hurt.
|
|
I also noticed this in the Azure job: |
|
vSphere is the same. My theory is that the e2e test deletes those resources before we gather. Does that sound right? |
Yes, in Azure we create one in cluster-capi-operator/pkg/controllers/infracluster/azure.go, lines 178 to 200 in ac5aa33
@honza What do you mean by "vsphere is the same"? |
I was distracted by the missing infra machines in the gather. It looks like we're good to go? |
|
Yeah, at the moment I would only expect AWS to still have machines at the end of the e2e-capi-techpreview test when the must-gather runs. The e2e tests for all of the other platforms do clean up the machines/machinesets they create during the test before considering it passed. |
|
Thanks for working on this @honza 🎉 |
|
/test e2e-aws-ovn-serial |
|
The serial job passed but timed out on tear-down, hitting the 4h mark: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-capi-operator/267/pull-ci-openshift-cluster-capi-operator-main-e2e-aws-ovn-serial/1920845242664226816 I am happy to override it. /override ci/prow/e2e-aws-ovn-serial |
|
@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-aws-ovn-serial. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
/test unit |
|
@honza: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/cherry-pick release-4.19 |
|
@damdo: new pull request created: #296. In response to this: |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retitle OCPBUGS-56104,OCPCLOUD-2893: Add related objects to must-gather config |
|
@honza: Jira Issue OCPBUGS-56104: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-56104 has been moved to the MODIFIED state. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-capi-operator |
No description provided.