-
Notifications
You must be signed in to change notification settings - Fork 103
Update Client: Adds Workflow-Reset API #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We intentionally do not expose reset as a high-level client call in any SDK as it is considered an advanced/operator call and not one expected to be invoked regularly (well we kinda expose in Go but make you use the proto objects there too). There are several other calls we don't expose high-level client calls for either (e.g. Out of curiosity, can you give more details on your programmatic reset use case? |
@cretz cheers for looking into this! Let me start with this:
While I cannot give you the details due to NDA, I'm just going to cook one example up (so pls imagine this was a serious example instead of a contrived one it actually is). The main idea is that we need a workflow which can be reset to prior point by the end user. The end user doesn't know this is temporal behind the scenes and so on. We provide an overlay DSL and graphs with lineages so the user actually sees that Like imagine a web-form where each page is backed by a temporal activity and subsequent page can only be filled once the current page is submitted because it depends on some data from the prev page. However the user is free to go back to any page and edit and redo things from there (ofc losing all the work from the point of rewind).
I don't see what is "advanced" about it, can you please explain the perils of exposing it? The way I see it is, I can very easily do this via the If there are critical drawbacks to using it, it should a part of documentation, just like I'm sure it would have been a part of the
That would be annoying UX for devs. I hope you reconsider. But ofc my points are based on (1) my lack of understanding of what is so "advanced" about this (2) critical drawback in using this feature (which otherwise has been working perfectly for me for nearly a week now and clients are happy). So if you answer those then I might educate myself.
I didn't even know |
Reset is meant for workflow-stuck situations and to be manually/explicitly performed by an operator on individual workflows. If you have a user need like you have, you should consider having this in your workflow logic. Maybe that's a loop that is re-iterated based on some user signal. It is clearer/safer in code to model the ability for users to restart some activities than to rely on the blunt reset instrument.
This would be much better modeled in workflow logic than using reset to go back. You can use a simple state machine and/or keep track of the steps completed and discard ones if you want to jump back or whatever. Can share/discard state however you might. Think of how you might write this in Python without Temporal from a Python logic approach. |
But isn't that what temporal is doing for me out of the box when I use the Reset feature? If I wasn't using temporal then I would be doing tons more of state management anyway and yes would have no choice but to state-manage this too. But I am using temporal and it does this bit of state-management for me OOB, so why can't I leverage it? I can understand if you didn't intend this to be a use case while designing the feature but I still don't see the:
Which is where my confusion is. You are asserting a few things (and I acknowledge it), I'm just asking you to evidence it. Eg. (1) Why cleaner? (2) Why safer? (3) What's so blunt about reset that relying on it is being frowned upon and I'm being asked to write potentially a lot more boilerplate in application logic? |
Because logic defined in a workflow is clearer to reason about than logic outside of a workflow. If you have code you need to re-execute for whatever reason, it is clearer for authors/readers of the code to see that logic is expected. It is easier to understand that code may be re-run if the code logic shows that it may be re-run. It also will be easier to reason about in history. You'll also be able to have more control over what re-runs, what state is retained, and other advanced things in code than you will with reset.
There are limitations to workflow reset such as not working with child workflows, limitations with how workflow execution timeout works, etc. You also have to understand stable reset points which can be hard to grasp.
Both reasons stated above. Reset has limitations, is advanced because of those limitations (and understanding its use), and it is clearer to reason about workflows whose code represents the code paths that will be taken. |
Cheers! Yeah the thing is we don't hardcode a workflow. It's a DSL on top of which folks (which is also us) can cobble up a workflow. As with many DSLs, it will have limitations in expressibility compared to writing every workflow out in a general purpose programming language. Can we change it in such a way as to accommodate internal resets? Definitely. However it's a bit nuanced in that we need to suppress graph arrows (otherwise every node will have a connection to every other node etc) and remember what the input to the node being reset to was so some extra state management needs to be done. Again all this is ofc possible, however I still find vending out reset points and leveraging what temporal already does to be simpler. Yeah I agree that finding reset points might be a bit surprising at first since there is no API that directly takes in an activity and resets to that (or something similar). What I do right now is take the activity name to reset to from the users (they just click on the right "box" in the displayed DSL graph in UI), get the event history, find the immediately preceding workflow-task-completed and use that as the reset point. This works, specially if the point of reset in the graph was linear with no parallel activities (sibling activities). We don't have child workflows so we are not affected by that (yet). I still feel that the shortcomings should be caveated but the API exposed for those who want to make use of it, even if for an interim solution. It's there in the UI after all, so just feels off to not be able to easily do it programmatically. In AWS for instance it's usually the opposite, everything that can be done in the console can be done via sdk, but vice versa is definitely not true for so many services. |
It is made more easily available for operators in UI and CLI because we do want to make it easy for operators. We do offer it via direct workflow service form as
This is also true in Temporal, including reset |
@cretz Is this the same as this:
in this PR? Can you give me some pointers on how to correctly use it? This is my understanding:
However I'm a little unsure about the This is the part that I'm a bit unsure about. As a library-caller, am I going to miss something subtle if I rely on |
Yes, similar to what is in this PR, just called from the outside, e.g. await my_client.workflow_service.reset_workflow_execution(
temporalio.api.workflowservice.v1.ResetWorkflowExecutionRequest(
namespace=my_client.namespace,
workflow_execution=temporalio.api.common.v1.WorkflowExecution(
workflow_id="my-wf-id"
run_id="my-optional-run-id"
),
reason="my-optional-reason",
workflow_task_finish_event_id="my-event-id",
request_id=str(uuid.uuid4()),
reset_reapply_exclude_types=my_exclude_types,
)
) A workflow handle is not involved since that is effectively just a high-level workflow/run ID holder.
Not really going to miss out on anything since this is basically what we call from the high-level client. Interceptors are for high-level calls only, which this is not.
Yes, we very intentionally make the |
Ah right, that was my only concern and why I raised this PR and also wrote that in the PR description above:
But if that's not a problem then yeah I don't really care whether there's a high or a low level facility to call and get the job done. Thank you very much for engaging and clarifying this! I suppose I should close this PR then (since you won't be exposing this via Thanks once again! |
No prob, happy to help. I can close. |
What was changed
The client
diddidn't have a reset workflow api exposed, although the implementation modules have the necessary bits. This PR add the reset API.Why?
This is just like any other missing api (eg. If
terminate()
was missing). Without this I need to create boilerplate to reset a workflow in my python codebase. However since that is external to this library it doesn't get the exact same treatment. For eg. the client-impl in this library is wrapped in a series of interceptors if interceptors were provided. But an external code doesn't benefit from any of this, unless we write even more boilerplate.Checklist
Closes
How was this tested:
This is the code similar to what I implemented in my application and that works.