This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Adding resiliency tests and relevant changes for QueueDAO. #1921

Merged
kishorebanala merged 2 commits into dev from feature/queuedao_resiliency_improvements on Oct 14, 2020

Conversation

kishorebanala
Contributor

No description provided.

@codecov

codecov bot commented Oct 12, 2020

Codecov Report

Merging #1921 into dev will increase coverage by 42.02%.
The diff coverage is 97.29%.

Impacted file tree graph

@@              Coverage Diff              @@
##                dev    #1921       +/-   ##
=============================================
+ Coverage     23.76%   65.79%   +42.02%     
+ Complexity     4052     3977       -75     
=============================================
  Files           372      301       -71     
  Lines         79070    18676    -60394     
  Branches      12640     1748    -10892     
=============================================
- Hits          18792    12288     -6504     
+ Misses        58335     5542    -52793     
+ Partials       1943      846     -1097     
Impacted Files Coverage Δ Complexity Δ
...tflix/conductor/server/resources/TaskResource.java 100.00% <ø> (+4.16%) 20.00 <0.00> (+1.00)
.../netflix/conductor/dyno/DynomiteConfiguration.java 57.14% <ø> (-2.86%) 5.00 <0.00> (-1.00)
...m/netflix/conductor/bootstrap/BootstrapModule.java 100.00% <ø> (ø) 2.00 <0.00> (ø)
...lix/conductor/core/execution/WorkflowExecutor.java 79.67% <96.42%> (+1.45%) 181.00 <0.00> (-2.00) ⬆️
...n/java/com/netflix/conductor/metrics/Monitors.java 67.93% <100.00%> (+0.49%) 45.00 <1.00> (+1.00)
...com/netflix/conductor/dao/mysql/MySQLQueueDAO.java 85.60% <100.00%> (+0.35%) 51.00 <2.00> (+2.00)
...tflix/conductor/dao/postgres/PostgresQueueDAO.java 91.12% <100.00%> (+0.21%) 54.00 <2.00> (+2.00)
...ava/com/netflix/conductor/server/ServerModule.java 100.00% <100.00%> (ø) 2.00 <0.00> (ø)
.../main/java/com/netflix/conductor/dao/QueueDAO.java 42.85% <0.00%> (-14.29%) 1.00% <0.00%> (-1.00%)
.../com/netflix/conductor/dao/mysql/MySQLBaseDAO.java 67.04% <0.00%> (-4.55%) 17.00% <0.00%> (-2.00%)
... and 89 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76bf84f...61d403f.

@@ -187,6 +187,7 @@ public void removeTaskFromQueue(@PathParam("taskType") String taskType,
return taskService.getAllPollData();
}

@Deprecated
Collaborator

Nice

// It's possible the terminate workflow call hits an exception as well, in that case we want to log both
// errors to help diagnosis.
try {
terminateWorkflow(workflowId, "Error when restarting the workflow");
Collaborator

I am not convinced that a try...catch block here is the right way to address this. The exception when terminating a workflow should be caught at a granular level within the terminate logic and this should be logged/handled accordingly.

Contributor Author

This exception handling is primarily for executionDAOFacade.createWorkflow(workflow); terminateWorkflow is just cleanup as part of handling that exception. I'd still like to throw the exception from createWorkflow, irrespective of what happened with terminateWorkflow, while logging the terminateWorkflow exception. Thoughts?

Collaborator

Yes, the exception from createWorkflow needs to be handled.
My comment is about the extended handling for the terminate part, which seems out of place here. Something like this:

try {
    executionDAOFacade.createWorkflow(workflow);
} catch (Exception e) {
    Monitors.recordWorkflowStartError(workflowDef.getName(), WorkflowContext.get().getClientApp());
    LOGGER.error("Unable to restart workflow: {}", workflowDef.getName(), e);
    terminateWorkflow(workflowId, "Error when restarting the workflow");
    throw e;
}
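
For context, the shape that the diff hunk above and the author's reply describe is a nested try/catch: the cleanup call to terminateWorkflow is wrapped so a failure there is logged alongside the original error instead of masking it, and the createWorkflow exception is still rethrown. A minimal sketch of that shape, assuming the same surrounding WorkflowExecutor method as the snippet above (the inner log message is a hypothetical placeholder):

try {
    executionDAOFacade.createWorkflow(workflow);
} catch (Exception e) {
    Monitors.recordWorkflowStartError(workflowDef.getName(), WorkflowContext.get().getClientApp());
    LOGGER.error("Unable to restart workflow: {}", workflowDef.getName(), e);
    // The terminate call may fail as well; log both errors to help diagnosis,
    // but still rethrow the original createWorkflow exception.
    try {
        terminateWorkflow(workflowId, "Error when restarting the workflow");
    } catch (Exception terminateException) {
        LOGGER.error("Error terminating workflow: {} during restart cleanup", workflowId, terminateException);
    }
    throw e;
}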

@@ -0,0 +1,557 @@
package com.netflix.conductor.test.resiliency
Collaborator

Please add license header.

pollResult == null
}

def "Verify updateTask succeeds when QueueDAO is unavailable"() {
Collaborator

Do you want to add another test case for updateTask with IN_PROGRESS state?
This case only covers updateTask with the COMPLETED state, which does not directly use the queue; the IN_PROGRESS case uses the queue to postpone.
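
The distinction being drawn here can be sketched roughly as follows; TaskQueue and onTaskUpdate are hypothetical simplifications for illustration, not Conductor's actual QueueDAO or task-update path:

// Hypothetical simplification: a terminal update (e.g. COMPLETED) removes the queue message,
// while an IN_PROGRESS update postpones it -- the latter is the path that exercises the queue.
class QueueAwareUpdateSketch {

    interface TaskQueue {
        void postpone(String taskId, long delaySeconds); // re-offer the message later
        void remove(String taskId);                       // drop the message for terminal states
    }

    static void onTaskUpdate(String taskId, String status, long callbackAfterSeconds, TaskQueue queue) {
        if ("IN_PROGRESS".equals(status)) {
            // Re-deliver the task after callbackAfterSeconds; this requires the queue to be reachable.
            queue.postpone(taskId, callbackAfterSeconds);
        } else {
            // COMPLETED (and other terminal states) only remove the message.
            queue.remove(taskId);
        }
    }
}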

taskResult.setCallbackAfterSeconds(120)
def result = taskResource.updateTask(taskResult)

then: "updateTask returns successfully without any exceptions"
Collaborator

Update the text here?

def result = taskResource.updateTask(taskResult)

then: "updateTask returns successfully without any exceptions"
2 * queueDAO.postpone(*_) >> { throw new IllegalStateException("Queue postpone failed from Spy") }
Collaborator

Why is postpone invoked twice? Given that we retry twice, shouldn't this be 3?

Contributor Author

The RetryUtil we're using here retries up to the provided number, which should be >= 1.
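
In other words, the count passed to the retry helper is treated as the total number of attempts (and must be >= 1), not as the number of retries after the first failure, so a value of 2 means postpone is invoked exactly twice. A generic sketch of that semantic, not Conductor's actual RetryUtil API:

// Generic illustration of "attempts" vs. "retries": maxAttempts = 2 runs the operation at most
// twice in total, which matches the 2 * queueDAO.postpone(*_) expectation in the spec above.
class RetrySemanticsSketch {

    static <T> T callWithAttempts(java.util.function.Supplier<T> op, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure; try again if attempts remain
            }
        }
        throw last;
    }
}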

Improvements to QueueDAO resiliency changes to verify the case when update task is called with IN_PROGRESS state.
kishorebanala force-pushed the feature/queuedao_resiliency_improvements branch from 25641f5 to 61d403f on October 14, 2020 00:05
kishorebanala merged commit ddafc5a into dev on Oct 14, 2020
kishorebanala deleted the feature/queuedao_resiliency_improvements branch on October 14, 2020 06:44
TwoUnderscorez pushed a commit to TwoUnderscorez/conductor that referenced this pull request Jul 23, 2021

* Adding resiliency tests and relevant changes for QueueDAO.

* Improvements to QueueDAO resiliency changes to verify the case when update task is called with IN_PROGRESS state.