Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Client: Adds Workflow-Reset API #687

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ustulation
Copy link

@ustulation ustulation commented Nov 8, 2024

What was changed

The client did didn't have a reset workflow api exposed, although the implementation modules have the necessary bits. This PR add the reset API.

Why?

This is just like any other missing api (eg. If terminate() was missing). Without this I need to create boilerplate to reset a workflow in my python codebase. However since that is external to this library it doesn't get the exact same treatment. For eg. the client-impl in this library is wrapped in a series of interceptors if interceptors were provided. But an external code doesn't benefit from any of this, unless we write even more boilerplate.

Checklist

  1. Closes

  2. How was this tested:
    This is the code similar to what I implemented in my application and that works.

@ustulation ustulation requested a review from a team as a code owner November 8, 2024 17:56
@CLAassistant
Copy link

CLAassistant commented Nov 8, 2024

CLA assistant check
All committers have signed the CLA.

@cretz
Copy link
Member

cretz commented Nov 12, 2024

We intentionally do not expose reset as a high-level client call in any SDK as it is considered an advanced/operator call and not one expected to be invoked regularly (well we kinda expose in Go but make you use the proto objects there too). There are several other calls we don't expose high-level client calls for either (e.g. delete). At the least we'd need to design a high-level form across all SDKs and not specific to Python. But like delete and others, we do not want this seen as a regular client call, so we encourage use of the advanced/direct gRPC API if needing to do this advanced thing.

Out of curiosity, can you give more details on your programmatic reset use case?

@ustulation
Copy link
Author

@cretz cheers for looking into this!

Let me start with this:

Out of curiosity, can you give more details on your programmatic reset use case?

While I cannot give you the details due to NDA, I'm just going to cook one example up (so pls imagine this was a serious example instead of a contrived one it actually is). The main idea is that we need a workflow which can be reset to prior point by the end user. The end user doesn't know this is temporal behind the scenes and so on. We provide an overlay DSL and graphs with lineages so the user actually sees that activity-0 is the parent of activity-1, which is the parent of activity-2 and 3 which are thus siblings (executed in parallel) and they converge into activity-4 and so on. In normal case they will run "uninterrupted". However more often than not, activity-1 (for eg) had fetched data which was wrong but which the user had told us was right. Only when a much later activity-8 was being executed was this caught. The user was told that we are unable to progress even after retries and are stuck on activity-8. So they check and conclude that their external data was wrong and they have now corrected it. They now want to rewind all the way back to activity-1 and start from exactly there again. Work done by all the activities prior to activity-1 is still valid thus all the states (inputs and outputs) leading up to activity-1 are valid. Just that activity-1 needs to re-fetch updated data which will thus affect its output and thus all the downstream activities, invalidating them.

Like imagine a web-form where each page is backed by a temporal activity and subsequent page can only be filled once the current page is submitted because it depends on some data from the prev page. However the user is free to go back to any page and edit and redo things from there (ofc losing all the work from the point of rewind).

Reset achieved this perfectly. It preserves all the state up to the point of reset (no activities are actually executed), discards the rest of it (well it terminates the original workflow which is also what we are happy with) and resumes from the point of reset as if it were the first time the workflow is running them.


We intentionally do not expose reset as a high-level client call in any SDK as it is considered an advanced/operator call [...]

I don't see what is "advanced" about it, can you please explain the perils of exposing it? The way I see it is, I can very easily do this via the UI, just like I can very easily terminate or cancel via UI too. If I can trivially do it via the UI I want to also be able to trivially do it via the API. IMO, the decision of whether one presses the button (or equivalently invokes the functionality through the sdk) should lie with the button-clicker/sdk-invoker.

If there are critical drawbacks to using it, it should a part of documentation, just like I'm sure it would have been a part of the UI design via popup-warnings with notes about what you should be aware of before proceeding etc. (there isn't any).

so we encourage use of the advanced/direct gRPC API if needing to do this advanced thing

That would be annoying UX for devs. I hope you reconsider. But ofc my points are based on (1) my lack of understanding of what is so "advanced" about this (2) critical drawback in using this feature (which otherwise has been working perfectly for me for nearly a week now and clients are happy). So if you answer those then I might educate myself.

There are several other calls we don't expose high-level client calls for either (e.g. delete).

I didn't even know delete was a thing. Is it there in the UI? If no then it's hidden anyway. IMO sdk-clients is pretty solid and a joy to use. It should handhold and protect users via documentation, not by deliberate omission of otherwise accessible features.

@cretz
Copy link
Member

cretz commented Nov 13, 2024

I don't see what is "advanced" about it, can you please explain the perils of exposing it?

Reset is meant for workflow-stuck situations and to be manually/explicitly performed by an operator on individual workflows. If you have a user need like you have, you should consider having this in your workflow logic. Maybe that's a loop that is re-iterated based on some user signal. It is clearer/safer in code to model the ability for users to restart some activities than to rely on the blunt reset instrument.

Like imagine a web-form where each page is backed by a temporal activity and subsequent page can only be filled once the current page is submitted because it depends on some data from the prev page. However the user is free to go back to any page and edit and redo things from there (ofc losing all the work from the point of rewind).

This would be much better modeled in workflow logic than using reset to go back. You can use a simple state machine and/or keep track of the steps completed and discard ones if you want to jump back or whatever. Can share/discard state however you might. Think of how you might write this in Python without Temporal from a Python logic approach.

@ustulation
Copy link
Author

You can use a simple state machine and/or keep track of the steps completed and discard ones if you want to jump back or whatever. Can share/discard state however you might. Think of how you might write this in Python without Temporal from a Python logic approach.

But isn't that what temporal is doing for me out of the box when I use the Reset feature? If I wasn't using temporal then I would be doing tons more of state management anyway and yes would have no choice but to state-manage this too. But I am using temporal and it does this bit of state-management for me OOB, so why can't I leverage it?

I can understand if you didn't intend this to be a use case while designing the feature but I still don't see the:

It is clearer/safer in code to model the ability for users to restart some activities than to rely on the blunt reset instrument.

Which is where my confusion is. You are asserting a few things (and I acknowledge it), I'm just asking you to evidence it. Eg. (1) Why cleaner? (2) Why safer? (3) What's so blunt about reset that relying on it is being frowned upon and I'm being asked to write potentially a lot more boilerplate in application logic?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants