Skip to content

Conversation

@igancev
Copy link
Contributor

@igancev igancev commented Dec 31, 2024

What's wrong

The WorkerRestartTest has a floating error (race condition between phpunit runtime and runtime into Roadrunner workers activity). The WorkerRestartTest is unstable.

How it works now:

The key-value cache (StorageInterface) has an in-memory implementation. Therefore set values are reset upon roadrunner restart.

  1. First the activity is locked, key KV_ACTIVITY_BLOCKED is set to true
  2. Then Roadrunner is restarted
    2.1. $runner->stop();, KV_ACTIVITY_BLOCKED key is reset, because storage is in-memory
    2.2. $runner->start();
    2.3. Roadrunner starts the workers which may execute activity code immediately
  3. Then the activity is unlocked (key KV_ACTIVITY_BLOCKED is set to false)

The activity code throws an exception, if key KV_ACTIVITY_BLOCKED has a value that is not boolean (null by default if key does't exists).

What's the problem

If step 3 starts before step 2.2, the test will pass.

But if an activity on step 2.3 in a roadrunner worker starts executing before step 3, (race condition), the test fails with ApplicationFailure KV BLOCKED key not set exception.

It depends only on the test execution environment and on the luck.

Proposed solution

By default the activity is blocked, no matter if a value in the cache exists. The activity is only unblocked after setting the KV_ACTIVITY_BLOCKED key to false. No exception is thrown.

Also the problem would not exist if the storage was persistent of course.

@vercel
Copy link

vercel bot commented Dec 31, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
php ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 31, 2024 3:30pm

@CLAassistant
Copy link

CLAassistant commented Dec 31, 2024

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@roxblnfk
Copy link
Collaborator

roxblnfk commented Jan 1, 2025

Nice catch!

@roxblnfk
Copy link
Collaborator

roxblnfk commented Jan 1, 2025

The same can be fixed there https://github.com/temporalio/features/blob/main/features/update/worker_restart/feature.php

@roxblnfk roxblnfk merged commit 6598397 into temporalio:master Jan 1, 2025
58 checks passed
@igancev
Copy link
Contributor Author

igancev commented Jan 1, 2025

The same can be fixed there https://github.com/temporalio/features/blob/main/features/update/worker_restart/feature.php

pr ready temporalio/features#570

@roxblnfk roxblnfk added the Tests Update tests or testing tools label Jan 4, 2025
@roxblnfk roxblnfk added this to the 2.13.0 milestone Jan 4, 2025
@roxblnfk roxblnfk modified the milestones: 2.13.0, 2.12.1 Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Tests Update tests or testing tools

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants