[GOBBLIN-2011] Fix bug where another host could report the job as skipped, then the …#3888
[GOBBLIN-2011] Fix bug where another host could report the job as skipped, then the …#3888Will-Lo wants to merge 2 commits intoapache:masterfrom
Conversation
…skipped jobstatus is used to check for concurrent flows
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3888 +/- ##
============================================
- Coverage 46.67% 46.66% -0.01%
+ Complexity 11154 11152 -2
============================================
Files 2219 2219
Lines 87657 87660 +3
Branches 9621 9624 +3
============================================
- Hits 40911 40910 -1
- Misses 43055 43057 +2
- Partials 3691 3693 +2 ☔ View full report in Codecov by Sentry. |
| if (flowStatusList == null || flowStatusList.isEmpty()) { | ||
| return false; | ||
| } | ||
| FlowStatus flowStatus = flowStatusList.get(0); |
There was a problem hiding this comment.
can you make a comment to make clear that the first one is the most recent and may or may not match the pending flowExecutionId attempt and the second index is an older one? It's a bit hard to keep track of what's going on here without reading ur commit desc
|
I think we should not set the status "Failed" when the last execution is running. We should instead emit a new event "SKIPPED". With this any further execution should be able to correctly decide whether a new job should run or not. |
|
@arjun4084346 I agree that we should emit a new event SKIPPED but we will need to make sure that this is compatible with all clients, which can cause issues if they are set to expect only certain outputs. |
…skipped jobstatus is used to check for concurrent flows
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
There's a bug that causes GaaS multileader to kick off unintended concurrent flows which happens in the order described below:
To resolve this, we need to ensure that we are looking at the past 2 flow executions, and follow the behavior:
Tests
Unit tests
Commits