-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Emit WorkflowNodeRunning Event #5531
feat: Emit WorkflowNodeRunning Event #5531
Conversation
1afc23a
to
4d40abe
Compare
Codecov Report
@@ Coverage Diff @@
## master #5531 +/- ##
==========================================
- Coverage 46.91% 46.84% -0.08%
==========================================
Files 244 244
Lines 15209 15286 +77
==========================================
+ Hits 7135 7160 +25
- Misses 7169 7218 +49
- Partials 905 908 +3
Continue to review full report at Codecov.
|
4d40abe
to
b1ef536
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about comparing woc.orig to woc.wf at the end of persistUpdates?
Hmm, that does look like a potentially interesting approach. I'll take a look tonight and see if I can make that work. My initial thought is if it does work, then we won't need node event logic anywhere else, but I may be misunderstanding what |
I have an unstated secondary goal here. The existing code is not robust. If we can fix it in this PR by a little redesign, that’s a win. |
@alexec, I have updated the implementation to do all I don't think this is a problem, as event consumers must assume that the order may not be totally deterministic (and Argo or Kubernetes don't make any guarantees along those lines). |
Shoot, that functional test failure looks related to these changes. I'll have to take another look on Friday. |
Running those tests locally succeeded. Closing/reopening to try again. |
Functional test failed in a timeout waiting for events, probably waiting for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're making good progess here
Implemented your suggestions in my local checkout, but it appears the functional test has revealed a real problem with these changes. This is what I see running locally:
Still trying to find a solution to that issue... UPDATE: There is a Slack conversation about this for anyone looking to provide assistance. :) |
I checkout out your code changes and found the failed unit tests passes locally. I would sync with master and retry. |
1818a70
to
b4f8f1d
Compare
Rebased onto latest master, but still see the same error running the functional test:
Full error output from the controller logs:
|
- Issue argoproj#5320 Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
- Remove redundant onNodeComplete function. - Separate recording event in assess_node_status to its own conditional statement. - Rename function for clarity. Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
- Consolidate all WorkflowNode* events to one code path. Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
- Remove extra loop in event-sending logic. - Move phase event recording to immediately after woc is updated. Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
- Reassign woc.wf.TypeMeta during persistUpdates, since the Workflow representation returned from the server does not add this field. Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
fafdef0
to
26ba3cd
Compare
@alexec This is ready for review whenever you are available. |
- Use DeepCopy for event reporting to avoid modifying the active node. Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
- Use objects instead of pointers in the method signature to simplify things. Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’re there!
@kennytrytek I have installed Argo version 3.0.7 but I still don't see WorkflowNodeRunning event, is it expected? |
@Sushant20, I believe it was released in Argo 3.1.0. I know it's mentioned in the v3.0.3 cherry-pick, but that does not accurately reflect the release version. |
Signed-off-by: Kenny Trytek kenneth.g.trytek@gmail.com
Checklist
Feature Request
This pull request adds a
Running
event to WorkflowNode execution, outlined in #5320.Testing This Change
Besides the updated unit and functional tests, I ran this change locally with Argo Events installed and a sensor and trigger set up to deliver webhooks to another locally-running server.
Below is a running log of my testing steps as I iterated on this change. Discrete thoughts/iterations are separated by a horizontal line.
Running the hello-world example, I have so far not seen any
WorkflowNodeRunning
event anywhere. I'm not sure why this is the case, but I will continue to look and see what I am missing. I thought writing towoc.eventRecorder
would be sufficient to create these events, but maybe there is another step that I am not aware of.Running this example from the
operator_test.go
, I get a different result locally than in the test.Workflow:
Test result in
operator_test.go
:Local result from
argo submit
of that workflow with an Argo Events sensor:Note that
"Normal WorkflowNodeRunning Running node steps-events[0].a"
is missing. I will continue to investigate why this occurs.Simplifying this test, I see the same results.
Workflow:
operator_test.go
result:Local Argo result:
Okay, found it. It's this case where the node completes very quickly and is never registered as "running."
argo-workflows/workflow/controller/operator.go
Line 1598 in d964fe4
Now to decide how to handle it. Should Argo always emit a
WorkflowNodeRunning
event in the case the node ran to completion, even if the event is artificial? I think it should, so that the events emitted are predictable and not susceptible to race conditions.Okay, force-pushed my changes since this PR is still in draft.
Latest results running this workflow:
With an Argo event source configured to send events to a locally-running web server, I get these results:
The results here are exactly what was expected: WorkflowRunning, WorkflowNodeRunning, WorkflowNodeSucceeded, WorkflowSucceeded.
✅