Skip to content

Conversation

@dandavison
Copy link
Contributor

Adds a new feature that confirms the behavior when a client sends an update to a workflow that is continuing-as-new. Specifically, the test guarantees that the update is admitted while the WFT is in-flight. This confirms that:

  1. Updates do not block continue as new (unlike signals)
  2. An update that is admitted while the workflow is continuing as new lands on the post-CAN workflow run.

Only implemented for Python so far; let's confirm the approach before extending implementation to other languages.

@dandavison dandavison force-pushed the can-update branch 3 times, most recently from 3f05068 to 95386da Compare January 21, 2025 00:05
async def _check_result() -> None:
# See docstring at top of file.
# Cause an update to be admitted while the first WFT is in progress
first_run_wft_is_in_progress.wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a thread-blocking call in asyncio. Can you use https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor? Would also request on the in-workflow wait but that makes a bit less sense since blocking the workflow event loop is the goal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a thread-blocking call in asyncio.

Yes that's deliberate, could you read the docstring at the top of the file and let me know if you agree with the strategy?

Copy link
Contributor Author

@dandavison dandavison Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither run_in_executor nor wait_condition can be used here since the purpose is to block the event loop in order to hold up WFT processing.

EDIT: Ignore, on re-reading I see you were explicitly not referring to the thread-blocking in workflow code.

Copy link
Member

@cretz cretz Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
first_run_wft_is_in_progress.wait()
asyncio.get_running_loop().run_in_executor(None, first_run_wft_is_in_progress.wait)

^ That is the safe way to make a thread-blocking call like this. You don't want to block the primary event loop (blocking the workflow's event loop is different). Blocking the primary event loop prevents all other asyncio systems in the process from running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the code is doing this correctly already. This entire coroutine is run in a new event loop in a new thread. See line 93 below:

await asyncio.to_thread(lambda: asyncio.run(_check_result()))

Copy link
Member

@cretz cretz Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, then I think this can bring up a different issue: our client is not safe for running in separate event loops (because it is not safe for running in separate threads)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Modified to just do the thread-blocking synchronization itself in a new thread, using asyncio.to_thread:

await asyncio.to_thread(first_run_wft_is_in_progress.wait)

update_has_been_admitted.set()
# The workflow will now CAN. Wait for the update result
update_run_id = await update_task
# The update should have been handled on the post-CAN run.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be worth two extra assertions: 1) that the update event is not in the history of the pre-CAN run, and 2) that the update is in the history of the post-CAN run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, added these.

@dandavison dandavison force-pushed the can-update branch 2 times, most recently from b327842 to e00988f Compare January 23, 2025 13:42
@dandavison dandavison requested a review from cretz January 23, 2025 13:46
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do fear with single-language features there is no incentive or even reminder to do these in other languages. I wonder if we should track the work anywhere.

@dandavison
Copy link
Contributor Author

I do fear with single-language features there is no incentive or even reminder to do these in other languages.

It's not clear that this should be a feature test at all, so didn't seem any point in doing it in multiple langauges. If the retries are done by Frontend / History gRPC interceptors / clients as they are supposed to be then, while it's fine to add it as an integration test in SDKs, it's unclear that there's a need for a feature test across SDKs.

I wonder if we should track the work anywhere.

One possibility would be to set up a nag of some sort that finds feature tests with less than the full complement of SDKs.

@dandavison dandavison merged commit 45af634 into main Jan 23, 2025
21 checks passed
@dandavison dandavison deleted the can-update branch January 23, 2025 22:56
mihaelabalas84 added a commit to fairmoney/temporal-features that referenced this pull request Dec 16, 2025
* Build docker images on all main CI jobs (temporalio#545)

* Add PHP dockerfile building (temporalio#541)

Co-authored-by: Josh Berry <josh.berry@temporal.io>

* Publish PHP in the Dockerhub (temporalio#546)

* Centralize dynamic config (temporalio#543)

* Update update tests for Java SDK v1.26.1 (temporalio#555)

Update SDK to support Java SDK v1.26.0

* Update gradle to v8 (temporalio#554)

* PHP SDK integration (temporalio#485)

* feat: add PHP initialization

* chore(PHP): implement RoadRunner run command generator

* feat(PHP): add ClassLocator; implement Workflow and Activity classes detection and loading

* chore(PHP): RoadRunner now is run from PHP script that will start also client code

* chore(PHP): implemented checks starting; refactoring; implemented query/successful_query feature

* chore(PHP): better notification about failures

* fix(PHP): fixed task queue binding and workflow run with correct task queue

* chore(PHP): add workflow injector that runs workflow with correct queue and options based on an attribute

* feat(PHP): add the RoadRunner runner service to be able to stop and rerun RoadRunner; add `query/timeout_due_to_no_active_workers` feature

* chore(PHP): add feature `query/unexpected_arguments`

* chore(PHP): finish all the `query` features

* chore(PHP): add activity features: `basic_no_workflow_timeout`, `cancel_try_cancel` and `retry_on_error`

* chore(PHP): add Child Workflow feature `signal`

* chore(PHP): add Child Workflow features: `result` and `throws_on_execute`

* chore(PHP): add feature `continue_as_new/continue_as_same`

* chore(PHP): add feature `eager_workflow/successful_start"`

* feat(PHP): add ability to inject client Interceptor provider via feature attributes

* chore(PHP): add feature `data_converter/json`

* chore(PHP): add feature `data_converter/json_protobuf`

* chore(PHP): add feature `data_converter/empty`

* chore(PHP): add feature `data_converter/failure`

* feat(PHP): support custom Payload Converters in client and server sides

* chore(PHP): add feature `data_converter/codec`

* fix(PHP): fix constants conflict in feature files; place binary proto converter after the json proto converter

* chore(PHP): add feature `data_converter/binary_protobuf`

* chore(PHP): add feature `data_converter/binary`

* chore(PHP): add feature `eager_activity/non_remote_activities_worker`

* chore(PHP): add feature `schedule\backfill`

* chore(PHP): add feature `schedule\basic`

* chore(PHP): add feature `schedule\pause`

* chore(PHP): add feature `schedule\trigger`

* chore(PHP): add feature `signal\activities`

* chore(PHP): add feature `signal\basic`

* chore(PHP): add feature `signal\child_workflow`

* chore(PHP): add feature `signal\external`

* chore(PHP): add feature `signal\prevent_close`

* chore(PHP): add feature `signal\signal_with_start`

* chore(PHP): add case into the `signal\signal_with_start` feature

* chore(PHP): add case `update\activities`

* chore(PHP): add case `update\async_accepted`

* chore(PHP): add case `update/basic`

* chore(PHP): add case `update/basic_async`

* chore(PHP): add case `update/client_interceptor`

* chore(PHP): add case `update/deduplication`

* chore(PHP): add case `update/non_durable_reject`

* chore(PHP): add case `update/self`

* chore(PHP): add case `update/task_failure`

* chore(PHP): add case `update/validation_replay`

* feat(PHP): configure KV module; add case `update/worker_restart`

* chore(PHP): support testing in a separated dir

* chore(PHP): add PHP dockerfile; cleanup

* chore(PHP): fix PHP dockerfile; add github workflow

* chore(PHP): polish update/* tests

* chore(PHP): fix todos

* chore: Sync with PHP SDK 2.11

* ci: Fix PHP version detection before Docker image building

* ci: add commands to build PHP image

* ci: add prefix `v` for php-ver input

* ci: fix typo

* Ignore platform req on composer install

* Skip data_converter/failure last check

* Skip one of update/task_failure tests

* Fix schedule/basic test: add 10 sec timeout to find schedule

* Fix installing dependencies on PHP image building

* Optimize PHP dockerfile

* Mark eager_activity tests skipped

* Add GITHUB_TOKEN to download RoadRunner without a limit

* Improve comment about BuildPhpProgramOptions.Version

* Skip `eager_workflow` test if the server doesn't support it on the ServerCapabilities level

* Fix dynamic config values being passed to cli

* Replace `frontend` with `system` for `forceSearchAttributesCacheRefreshOnRead`,  `enableActivityEagerExecution` and `enableEagerWorkflowStart` options

* Include `dynamicconfig` into php docker image

---------

Co-authored-by: Spencer Judge <spencer@temporal.io>

* PHP: fix a floating error in the worker_restart feature check (temporalio#570)

* Upgrade go sdk, ignore Deployment history field (temporalio#578)

* Fix CI (temporalio#577)

* Bump PHP version

* Log at WARN level

* Fix PHP test

* Bump sdk-go

* Scrub new Deployment field

* Use temporal. prefix

* Skip check with env var

* Skip failing PHP test

* Delete PHP feature

* Python: reduce log spam and show failed features (temporalio#574)

* Reduce log spam and show failed features

* Replace ListOpenWorkflow and ListClosedWorkflow with ListWorkflow (temporalio#579)

* Skip one more history check in cloud (temporalio#583)

* Skip one more history check in cloud

* Bump python 1.7.0 => 1.9.0 (temporalio#569)

* Use EventuallyWithT to prevent reset_and_delete feature flakiness (temporalio#586)


---------

Co-authored-by: Chad Retz <chad.retz@gmail.com>

* updates_do_not_block_continue_as_new (temporalio#584)

* Test update doesn't block CAN and is handled on next run

* Rename Go harness Runner.Assert to SoftAssert and document it (temporalio#587)

* uv + maturin migration (temporalio#600)

* [PHP] Update CI and tests (temporalio#601)

* Restore PHP workflows

* Use actions/download-artifact v4

* Enable logging for RR run command

* Fix workflow id in `continue as new` test

* Add debugging on date interval unmarshalling

* Start workflows with WorkflowIdReusePolicy as `AllowDuplicate`
Remove debug

* Update min PHP SDK version

* Increase RPC operations timeouts

* Add deployment version features (temporalio#611)

* Don't rely on TS FailureConverter returning a TemporalFailure (temporalio#618)

See related change: temporalio/sdk-typescript#1685

Also upgraded Go in docker image to ensure the Go image build action passes.

* Bump sdk version (temporalio#621)

* bump sdk-python (temporalio#622)

* bump sdk-python

* Add CODEOWNERS

* Pin protobufjs to 7.5.1 (temporalio#625)

* Fixed SDK version mismatch in .Net build. Made `--version` optional in `prepare`. Updated vulnerable dependencies for .Net. (temporalio#628)

Previously, .Net harness and tests were built against the hardcoded SDK version but run with the version specified in command line. This would make the code sometimes fail mysteriously due to binary incompatibilities. Now, the harness and tests are always built and run with the same SDK version.

`run` command already supported omitting `--version` argument (individual languages still do a check if it's required for them). `prepare` still required it for no real reason, so I changed it to match `run`'s behavior.

.Net SDK 9 changed behavior of NuGet audit to also check transitive dependencies by default. We were using a few vulnerable transitive dependencies, which caused .Net build to fail when using .Net 9.0 (it works with 8.0). Updating dependencies fixed the issue.

* Bump python (temporalio#629)

* TypeScript: Remove pin on grpc-js v1.10.10 (temporalio#626)

* Bump Python SDK (temporalio#638)

* Explicitly use rust in python build (temporalio#640)

* Fix flake in timeout_due_to_no_active_workers on TypeScript (temporalio#641)

* Update Rust to at least 1.88 (temporalio#642)

* Add missing TS dependency to @grpc/grpc-js (temporalio#643)

* Versioning breaking changes (temporalio#631)

* Update gomod to use unreleased version

* Handle breaking changes

* Update gomods

---------

Co-authored-by: Andrew Yuan <theandrewyuan@gmail.com>

* Fix broken pyproj version lookup (temporalio#654)

* Update graceful worker shutdown (temporalio#627)

* Edit README for shutdown

* PR feedback

* heartbeating blurb

* Be more specific about core vs. lang timeout behavior

* Add links, add TS TODO

* Build with custom stdout/stderr (temporalio#662)

* [TS] Fix the client injection activity interceptor conflicting with new upstream API (temporalio#667)

* Activity shutdown tests (temporalio#660)

* Wrote Go test

* Added Java

* Dotnet done

* ts and python

* clean up comments

* formatting

* fix TS test

* clean up, increase timeout to allow for 3rd activity to be scheduled

* test seems fine locally, run in CI with extra prints

* test passed in CI, remove prints

* wait on event instead of sleep

* forgot to run format on python and java

* Offline Python (temporalio#679)

* Add TLS Server Name Config (temporalio#686)

* Update TS Dockerfile to Node 22/Bullseye (temporalio#688)

* Bump Python SDK version and saner version syntax (temporalio#668)

* Handle if crt/key are Uint8Array (temporalio#689)

* Handle if crt/key are Uint8Array

* formatting/linting

* [PHP] Add test case for child_workflow/cancel_abandon (temporalio#644)

* Add test case for child_workflow/cancel_abandon

* Resolve TODO

* Update Cancel Abandoned Child Workflow tests

* Add close method to cancel workflow

* Add the ability to pass the ca cert for doing mtls server verification (temporalio#690)

* Set explicit permissions for GitHub Actions workflows (temporalio#693)

* Update to python >= 3.10 and sdk-python 1.19.0 (temporalio#699)

* Update to python >= 3.10 and sdk-python 1.19.0

* update features python program to new min python version

* chore(typescript): switch to pnpm (temporalio#698)

* chore: switch to pnpm

* skip setting up pnpm cache

* update python sdk to 1.20.0 (temporalio#704)

* remove autosetup (temporalio#707)

* remove autosetup

* casing and remove env vars that are not needed

* Revert "remove autosetup (temporalio#707)" (temporalio#710)

This reverts commit c1fa10c.

* Remove Autosetup and Add default namespace creation for GHA (temporalio#709)

* create default namespace after server start

* add some debugging logs

* more logging

* move to local docker compose

* add retries to test with cloud

* log heatlh check response

* refine the restart config and add some logging to the curl command

* try different retry logic

* Use FM forked repos

---------

Co-authored-by: Roey Berman <roey@temporal.io>
Co-authored-by: Aleksei Gagarin <roxblnfk@ya.ru>
Co-authored-by: Josh Berry <josh.berry@temporal.io>
Co-authored-by: Spencer Judge <spencer@temporal.io>
Co-authored-by: Quinn Klassen <klassenq@gmail.com>
Co-authored-by: Ivan Gantsev <ivangancev@yandex.ru>
Co-authored-by: Antonio Lain <135073478+antlai-temporal@users.noreply.github.com>
Co-authored-by: Dan Davison <dan.davison@temporal.io>
Co-authored-by: Chad Retz <chad.retz@gmail.com>
Co-authored-by: James Watkins-Harvey <mjameswh@users.noreply.github.com>
Co-authored-by: Maciej Dudkowski <maciej.dudkowski@temporal.io>
Co-authored-by: tconley1428 <tconley1428@gmail.com>
Co-authored-by: Andrew Yuan <theandrewyuan@gmail.com>
Co-authored-by: Andrew Yuan <andrew.yuan@temporal.io>
Co-authored-by: Stephan Behnke <stephanos@users.noreply.github.com>
Co-authored-by: kepe-temporal <kepe.bonner@temporal.io>
Co-authored-by: Thomas Hardy <thestaffofmoses@gmail.com>
Co-authored-by: Kent Gruber <kent.picat.gruber@gmail.com>
Co-authored-by: Alex Mazzeo <alex.mazzeo@temporal.io>
Co-authored-by: Chris Olszewski <chrisdolszewski@gmail.com>
Co-authored-by: Alex Stanfield <13949480+chaptersix@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants