-
Notifications
You must be signed in to change notification settings - Fork 801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sticky query support #2899
Sticky query support #2899
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, some minor comments
@@ -92,6 +92,7 @@ struct GetMutableStateResponse { | |||
150: optional i32 workflowState | |||
160: optional i32 workflowCloseState | |||
170: optional shared.VersionHistories versionHistories | |||
180: optional bool isStickyTaskListEnabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this explicit field necessary ? can we simply set stickyTaskList to nil instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this new filed. The reason is that we want the result of this call to match what we have in the database. Unless we clear stickiness in the database we cannot correctly return an empty stickyTaskList name.
return &h.QueryWorkflowResponse{Response: matchingResp}, nil | ||
} | ||
if yarpcError, ok := err.(*yarpcerrors.Status); !ok || yarpcError.Code() != yarpcerrors.CodeDeadlineExceeded { | ||
e.logger.Error("query directly though matching on sticky failed, will not attempt query on non-sticky", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer emitting a metric (possibly domain scoped) to a log message here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this log already existed
return nil, err | ||
} | ||
if msResp.GetIsWorkflowRunning() { | ||
e.logger.Info("query direct through matching failed on sticky, clearing sticky before attempting on non-sticky", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this log will be run a subset of the times that the log below will be run and the log below was already present.
} | ||
|
||
if err := common.IsValidContext(ctx); err != nil { | ||
e.logger.Info("query context timed out before query on non-sticky task list could be attempted", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can cause log noise - prefer metric to logging here
} | ||
} | ||
|
||
if err := common.IsValidContext(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also check if there is any time left before the context deadline ? There is no point in making a call if the context will be valid only for another 2ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the IsValidContext method to account for this generally.
This reverts commit 6189d1d.
ae33e68
to
6b61022
Compare
Sticky query has many edge cases and small bugs it also lacks visibility and the dynamic domain controls. This diff fixes many of these issues. The code around sticky query is still messy and likely has some bugs in edge cases (it is tricky code). A long term fix that should be implemented is the worker should respond back to the server when stickiness is cleared this would greatly improve the quality of this bit of code.
At a high level this code does the following: