Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sticky query support #2899

Merged
merged 8 commits into from
Dec 12, 2019

Conversation

andrewjdawson2016
Copy link
Contributor

@andrewjdawson2016 andrewjdawson2016 commented Dec 10, 2019

Sticky query has many edge cases and small bugs it also lacks visibility and the dynamic domain controls. This diff fixes many of these issues. The code around sticky query is still messy and likely has some bugs in edge cases (it is tricky code). A long term fix that should be implemented is the worker should respond back to the server when stickiness is cleared this would greatly improve the quality of this bit of code.

At a high level this code does the following:

  • There was a sticky ttl for workflows which do not produce many tasks but it was not being used. Here I make sure it was used - here I ensure its used.
  • I added a dynamic config to disable sticky query per domain
  • I added a lot of metrics around query, sticky query and various time out cases
  • I fixed a bug in which a closed workflow attempts to reset stickiness and wrongly produces an error from QueryWorkflow.

@coveralls
Copy link

coveralls commented Dec 11, 2019

Coverage Status

Coverage increased (+0.5%) to 68.367% when pulling 6b61022 on andrewjdawson2016:StickyQuerySupport into 5f71093 on uber:master.

Copy link
Contributor

@venkat1109 venkat1109 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some minor comments

@@ -92,6 +92,7 @@ struct GetMutableStateResponse {
150: optional i32 workflowState
160: optional i32 workflowCloseState
170: optional shared.VersionHistories versionHistories
180: optional bool isStickyTaskListEnabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this explicit field necessary ? can we simply set stickyTaskList to nil instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this new filed. The reason is that we want the result of this call to match what we have in the database. Unless we clear stickiness in the database we cannot correctly return an empty stickyTaskList name.

return &h.QueryWorkflowResponse{Response: matchingResp}, nil
}
if yarpcError, ok := err.(*yarpcerrors.Status); !ok || yarpcError.Code() != yarpcerrors.CodeDeadlineExceeded {
e.logger.Error("query directly though matching on sticky failed, will not attempt query on non-sticky",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer emitting a metric (possibly domain scoped) to a log message here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this log already existed

return nil, err
}
if msResp.GetIsWorkflowRunning() {
e.logger.Info("query direct through matching failed on sticky, clearing sticky before attempting on non-sticky",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about logging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this log will be run a subset of the times that the log below will be run and the log below was already present.

}

if err := common.IsValidContext(ctx); err != nil {
e.logger.Info("query context timed out before query on non-sticky task list could be attempted",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can cause log noise - prefer metric to logging here

}
}

if err := common.IsValidContext(ctx); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also check if there is any time left before the context deadline ? There is no point in making a call if the context will be valid only for another 2ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the IsValidContext method to account for this generally.

@andrewjdawson2016 andrewjdawson2016 merged commit 97c01c9 into uber:master Dec 12, 2019
@andrewjdawson2016 andrewjdawson2016 deleted the StickyQuerySupport branch December 12, 2019 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants