-
Notifications
You must be signed in to change notification settings - Fork 4.7k
NO-JIRA: [TNF] add Two Node Fencing exception to accept less than two etcd endpoints #30058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
NO-JIRA: [TNF] add Two Node Fencing exception to accept less than two etcd endpoints #30058
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: clobrano The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
func isTwoNodeFencingCheck(clientConfig *rest.Config) (bool, error) { | ||
configClient, err := clientconfigv1.NewForConfig(clientConfig) | ||
if err != nil { | ||
logrus.WithError(err).Error("Error creating config client to check for Single Node configuration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error message is misleading :D
@@ -52,6 +52,12 @@ func testStableSystemOperatorStateTransitions(events monitorapi.Intervals, clien | |||
isSingleNode = false | |||
} | |||
|
|||
isTwoNodeFencing, err := isTwoNodeFencingCheck(clientConfig) | |||
if err != nil { | |||
logrus.Warnf("Error checking for TwoNodeFencing Node configuration on upgrade (unable to make exception): %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably rephrase this:
"Error checking for DualReplica controlPlaneTopology (i.e. Two Node OpenShift with Fencing)"
@@ -156,6 +168,15 @@ func isSingleNodeCheck(clientConfig *rest.Config) (bool, error) { | |||
return exutil.IsSingleNode(context.Background(), configClient) | |||
} | |||
|
|||
func isTwoNodeFencingCheck(clientConfig *rest.Config) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're already printing the error here and not doing anything with it when it's bubbled up, I would just return a bool here and just print out the error, then simplify the call site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. In this case, however, I intentionally mirrored the signature of isSingleNodeCheck
. Would you like to change that function as well, for consistency?
origin/pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go
Lines 162 to 169 in b55b947
func isSingleNodeCheck(clientConfig *rest.Config) (bool, error) { | |
configClient, err := clientconfigv1.NewForConfig(clientConfig) | |
if err != nil { | |
logrus.WithError(err).Error("Error creating config client to check for Single Node configuration") | |
return false, err | |
} | |
return exutil.IsSingleNode(context.Background(), configClient) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, that would be great, thank you!
b55b947
to
236a8ba
Compare
I am not sure this change is actually working /hold |
236a8ba
to
058c1cf
Compare
…ints In a Two Node Fencing cluster (DualReplicaTopology) is it acceptable to have less than two etcd endpoints.
058c1cf
to
4894db4
Compare
The original change wasn't effective. I was looking at condition Message, but the pattern wasn't there. It seemed to work, because the test failure is not 100% reproducible. The change was moved to the library looking for duplicated pathological events, and activated only if topology is DualReplica Sorry for the initial wasted reviews, but now the PR is ready again 🙇 /unhold |
@clobrano: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Job Failure Risk Analysis for sha: 4894db4
|
return &SimplePathologicalEventMatcher{ | ||
name: "EtcdEndpointsConfigMissingDuringTwoNodeTests", | ||
locatorKeyRegexes: map[monitorapi.LocatorKey]*regexp.Regexp{ | ||
monitorapi.LocatorNamespaceKey: regexp.MustCompile(`^openshift-kube-apiserver-operator$`), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this apply to openshift-apiserver and oauth-apiserver too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I only observed events from openshift-kube-apiserver-operator
, do you think oauth-apiserver
might incurr in the same error?
}, | ||
messageReasonRegex: regexp.MustCompile(`^ConfigMissing$`), | ||
messageHumanRegex: regexp.MustCompile(`apiServerArguments\.etcd-servers has less than two live etcd endpoints`), | ||
topology: &dualReplicaTopology, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is single node filtered out too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comes from APIServer knowing there is more than 1 endpoint during normal lifecycle, and 1 is a degraded state, but not failure. So I'd say no, only DualReplica.
/jira refresh |
@clobrano: No Jira issue is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@clobrano: No Jira issue is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@clobrano: No Jira issue is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@clobrano: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@clobrano: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
In a Two Node Fencing cluster is it acceptable to have less than two etcd endpoints.