-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce "Failed" phase in ANP status #4608
Conversation
Modify the ANP status phase as "Failed" when all Agents have reported the status and at least one Agent reports failure. Signed-off-by: wenyingd <wenyingd@vmware.com>
Codecov Report
@@ Coverage Diff @@
## main #4608 +/- ##
==========================================
+ Coverage 65.52% 69.61% +4.09%
==========================================
Files 390 400 +10
Lines 58147 58236 +89
==========================================
+ Hits 38098 40539 +2441
+ Misses 17361 14913 -2448
- Partials 2688 2784 +96
|
/test-all |
} else if currentNodes+len(failedNodes) == desiredNodes { | ||
phase = crdv1alpha1.NetworkPolicyFailed | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this calculation is needed? we may just check failedNodes only like below:
if len(failedNodes) > 0 {
phase = crdv1alpha1.NetworkPolicyFailed
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess this is one of the reasons why K8s API doc recommends to use conditions over phase.
Rather than encouraging clients to infer implicit properties from phases, we prefer to explicitly expose the individual conditions that clients need to monitor.
When there are both failed Nodes and realizing Nodes, it's hard to say which phase name is more proper, "Failed" or "Realizing", user may infer different properties regardless of which one is used. But since there is already an individual condition "RealizationFailure" to indicate failed Node exist, I think the phase should describe the overall progress of realization instead of whether there is any failure, so if currentNodes+len(failedNodes) == desiredNodes
makes more sense to me.
cc @jianjuns for opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - in K8s Node case, do we have permanent realization failure - that is agent already gave up retrying realizing?
I agree "failed" and "realized" seems not match, but on the other hand people also wonder why we just have "realized" which seems a condition too (rather a phase), but no phase to express failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Pod case:
- If the Pod's restart policy is never
- If any of containers is still running, the phase is Running
- If none of containers is running, and there is any failure, the phase is Failed
- If none of contaienrs is running, and there is no failure, the phase is Succeeded
- If the Pod's restart policy is always
- The phase is running regardless whethere there is running Pod or failed Pod
Its principle seems: if the system keeps trying to reach desired state, the phase will always be "XXXing"; if the system is asked to not retry, the phase will be "XXXed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our case, does agent always keep realizing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Antrea Agent always try to realize the ANP rules. Nephe may set "RealizationFailure: true" when the ANP rules are failed to realize on cloud.
Any other comments on this patch @tnqn @jianjuns @luolanzone ? |
/test-all |
/test-e2e |
/test-e2e |
Set the ANP status phase as "Failed" when all Agents have reported the status and at least one Agent reports failure. Signed-off-by: wenyingd <wenyingd@vmware.com>
Modify the ANP status phase as "Failed" when all Agents have reported the status and at least one failure is received.
Signed-off-by: wenyingd wenyingd@vmware.com