-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Adding resiliency tests and relevant changes for QueueDAO. #1921
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #1921 +/- ##
=============================================
+ Coverage 23.76% 65.79% +42.02%
+ Complexity 4052 3977 -75
=============================================
Files 372 301 -71
Lines 79070 18676 -60394
Branches 12640 1748 -10892
=============================================
- Hits 18792 12288 -6504
+ Misses 58335 5542 -52793
+ Partials 1943 846 -1097 Continue to review full report at Codecov.
|
@@ -187,6 +187,7 @@ public void removeTaskFromQueue(@PathParam("taskType") String taskType, | |||
return taskService.getAllPollData(); | |||
} | |||
|
|||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
// It's possible the terminate workflow call hits an exception as well, in that case we want to log both | ||
// errors to help diagnosis. | ||
try { | ||
terminateWorkflow(workflowId, "Error when restarting the workflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that a try...catch
block here is the right way to address this. The exception when terminating a workflow should be caught at a granular level within the terminate logic and this should be logged/handled accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception is primarily for executionDAOFacade.createWorkflow(workflow)
, and terminateWorkflow
is just a clean up as part of that exception. I'd still like to throw the exception for createWorkflow
, irrespective of what happened with terminateWorkflow
, while logging terminateWorkflow
exception. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the exception handling for createWorkflow needs to be handled.
My comment is regarding the extended handling for the terminate part which seems out of place here. Something like this -
try {
executionDAOFacade.createWorkflow(workflow);
} catch (Exception e) {
Monitors.recordWorkflowStartError(workflowDef.getName(), WorkflowContext.get().getClientApp());
LOGGER.error("Unable to restart workflow: {}", workflowDef.getName(), e);
terminateWorkflow(workflowId, "Error when restarting the workflow");
throw e;
}
@@ -0,0 +1,557 @@ | |||
package com.netflix.conductor.test.resiliency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add license header.
pollResult == null | ||
} | ||
|
||
def "Verify updateTask succeeds when QueueDAO is unavailable"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to add another test case for updateTask with IN_PROGRESS state?
This case only covers the updateTask with COMPLETED state, which does not directly use the queue, but the former uses queue to postpone.
taskResult.setCallbackAfterSeconds(120) | ||
def result = taskResource.updateTask(taskResult) | ||
|
||
then: "updateTask returns successfully without any exceptions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the text here?
def result = taskResource.updateTask(taskResult) | ||
|
||
then: "updateTask returns successfully without any exceptions" | ||
2 * queueDAO.postpone(*_) >> { throw new IllegalStateException("Queue postpone failed from Spy") } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is postpone invoked twice? Given that we retry twice, shouldn't this be 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RetryUtil we're using here retries up to the provided number, which should be greater than >=1.
…pdate task is called with IN_PROGRESS state.
25641f5
to
61d403f
Compare
No description provided.