-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDN-5568: only provision workloads when network creation has started #29348
SDN-5568: only provision workloads when network creation has started #29348
Conversation
/test e2e-aws-ovn-virt-techpreview |
47890fa
to
7542b3d
Compare
udnManifest := generateUserDefinedNetworkManifest(&c) | ||
By(fmt.Sprintf("Creating UserDefinedNetwork %s/%s", c.namespace, c.name)) | ||
Expect(applyManifest(c.namespace, udnManifest)).To(Succeed()) | ||
Expect(waitForUserDefinedNetworkReady(c.namespace, c.name, udnCrReadyTimeout)).To(Succeed()) | ||
return networkAttachmentConfig{networkAttachmentConfigParams{networkName: c.namespace + "." + c.name}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormergi how can I have access to the name of the network ?
Right now, the only thing I think I can do is to read the corresponding NAD, unmarshal, and return that ...
Is there something simpler ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npinaeva how about you ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fetched it from the NAD generated for the UDN.
I think we can assume the name is the same (i.e. NAD.metadata.Name == UDN.metadata.Name)
If that changes, we'll need to be able to derive it somehow. Not thinking much about that, because the change must be backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would fetch the NAD that created by the UDN controller and parse the network-name it got.
Doing so ensure tests wont break even if network-name will be changed, no need to go back to tests and do some adjustments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that would make sense. I have the owner reference for that, right ? @ormergi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can also fetch the NAD with the meta.name as the UDN object as well
/test e2e-aws-ovn-virt-techpreview |
3961085
to
f009222
Compare
/test e2e-aws-ovn-virt-techpreview |
Job Failure Risk Analysis for sha: f009222
|
/test e2e-aws-ovn-virt-techpreview |
/hold fix the product, not the test. |
Job Failure Risk Analysis for sha: 3e93667
|
out, err = runOcWithRetry(oc.AsAdmin(), "get", | ||
"pods", | ||
"-o", "name", | ||
"-n", "openshift-ovn-kubernetes", | ||
"--field-selector", fmt.Sprintf("spec.nodeName=%s", nodeName), | ||
"-l", "app=ovnkube-node") | ||
if err != nil { | ||
return false, err | ||
} | ||
outReader := bufio.NewScanner(strings.NewReader(out)) | ||
re := regexp.MustCompile("^pod/(.*)") | ||
for outReader.Scan() { | ||
match := re.FindSubmatch([]byte(outReader.Text())) | ||
if len(match) != 2 { | ||
continue | ||
} | ||
podName = string(match[1]) | ||
break | ||
} | ||
if podName == "" { | ||
return false, fmt.Errorf("could not find a valid ovnkube-node pod on node '%s'", nodeName) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this code looks for ovnkube-node pod name, cant it be done using the programmatic go-client (similar to line 508)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can, but I just copied this code from somewhere else - I still need to make this coalesce.
And still, I guess staff engineering opposes this PR: #29348 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the code. At least now it's clear I'm re-using existent code that can be improved on a follow-up.
How about creating a copy of this test that is showing the UDN problem and making it minimum reproducer. Then gate that test on UDN. This ensures that the failure mode is obvious and we don't lose the signal. With the signal preserved, it is then ok to adjust other tests to compensate for a product bug because the signal won't be lost and will block the promotion of UDN. |
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Otherwise, we risk facing a situation where the primary UDN will not be plumbed into the worlkload. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
This way, we can better debug when things go wrong. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
This way, we can re-use this logic in other places without duplicating code Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
3e93667
to
f2451e8
Compare
/test e2e-aws-ovn-virt-techpreview
Did this in the top most commit: Isolated the raceful conditions in a test tagged w/ the UDN FG. Let's see if CI likes the patch ... |
Job Failure Risk Analysis for sha: 5fd52b8
|
The UDN "smoke signal" test was executed and passed in https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29348/pull-ci-openshift-origin-master-e2e-aws-ovn-virt-techpreview/1868669031075024896 |
5fd52b8
to
db5ca09
Compare
… race is avoided Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
db5ca09
to
69491f0
Compare
@maiqueb: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Thanks for adding the targetted test /hold cancel |
/approve |
@maiqueb: This pull request references SDN-5568 which is a valid jira issue. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, knobunc, maiqueb The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Job Failure Risk Analysis for sha: 69491f0
|
e3f94ae
into
openshift:master
/cherry-pick release-4.18 |
@jluhrsen: new pull request created: #29413 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
No description provided.