Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDScript: Avoid deadlock possibility in multi-threaded load #93032

Merged
merged 2 commits into from
Jun 28, 2024

Conversation

RandomShaper
Copy link
Member

@RandomShaper RandomShaper commented Jun 11, 2024

We already have a mechanism in WorkerThreadPool by which it collaborates with CommandQueueMT by unlocking its mutex before starting a wait and relocking it afterwards. That is a deadlock prevention mechanism that lets certain otherwise ill situations to make progress.

I found that in multi-threaded load situations (with RESOURCE_CACHE_MODE_IGNORE), the GDScript cache can incurr in a deadlock. That's the case if multiple threads interact with GDScriptCache::get_full_script(). By letting the WorkerThreadPool unlock the corresponding mutex, the deadlock is broken.

The first commit just makes the existing system more generic (mostly a refactor). The second makes GDScriptCache opt-in for it. It's interesting to keep both changes separate for bisections.

NOTE: Due to a risk of regressions, as this is slippery ground, you power maintainers may want to reschedule to 4.4.

UPDATE: Fixes #85255.

@RandomShaper RandomShaper added this to the 4.3 milestone Jun 11, 2024
@RandomShaper RandomShaper force-pushed the wtp_antilock branch 8 times, most recently from 40ad0fa to 6ce9a98 Compare June 12, 2024 16:30
@RandomShaper RandomShaper marked this pull request as ready for review June 12, 2024 16:32
@RandomShaper RandomShaper requested review from a team as code owners June 12, 2024 16:32
@clayjohn clayjohn modified the milestones: 4.3, 4.4 Jun 12, 2024
@clayjohn
Copy link
Member

NOTE: Due to a risk of regressions, as this is slippery ground, you power maintainers may want to reschedule to 4.4.

I agree. The code seems good, but since it isn't fixing any reported regressions it's best to wait until 4.4

@RandomShaper
Copy link
Member Author

This turns out to fix an important issue. We may want to reasses milestone.

@akien-mga akien-mga modified the milestones: 4.4, 4.3 Jun 19, 2024
@akien-mga
Copy link
Member

This seems to break the threads=no build:

In file included from platform/web/http_client_web.cpp:31:
In file included from platform/web/http_client_web.h:34:
In file included from ./core/io/http_client.h:34:
In file included from ./core/crypto/crypto.h:36:
In file included from ./core/io/resource_loader.h:36:
./core/object/worker_thread_pool.h:240:18: error: class member cannot be redeclared
  240 |         static uint32_t thread_enter_unlock_allowance_zone(BinaryMutex *p_mutex);
      |                         ^
./core/object/worker_thread_pool.h:239:18: note: previous declaration is here
  239 |         static uint32_t thread_enter_unlock_allowance_zone(Mutex *p_mutex);
      |                         ^

@RandomShaper
Copy link
Member Author

This seems to break the threads=no build:

In file included from platform/web/http_client_web.cpp:31:
In file included from platform/web/http_client_web.h:34:
In file included from ./core/io/http_client.h:34:
In file included from ./core/crypto/crypto.h:36:
In file included from ./core/io/resource_loader.h:36:
./core/object/worker_thread_pool.h:240:18: error: class member cannot be redeclared
  240 |         static uint32_t thread_enter_unlock_allowance_zone(BinaryMutex *p_mutex);
      |                         ^
./core/object/worker_thread_pool.h:239:18: note: previous declaration is here
  239 |         static uint32_t thread_enter_unlock_allowance_zone(Mutex *p_mutex);
      |                         ^

I wonder why it doesn't support the overloading. However, I can just preprocess out those for web builds. I'll push with that.

@RandomShaper RandomShaper force-pushed the wtp_antilock branch 2 times, most recently from 836026b to 25ccc06 Compare June 19, 2024 09:13
@akien-mga
Copy link
Member

I wonder why it doesn't support the overloading. However, I can just preprocess out those for web builds. I'll push with that.

That's because in !defined(THREAD_ENABLED), those are defined to be the same:

using Mutex = MutexImpl;
using BinaryMutex = MutexImpl;

So it's the same definition twice.

…eneric mechanism

This is strictly beyond a refactor because it also changes when the mutexes are relocked,
but that's only for extra safety.
@akien-mga akien-mga merged commit 6f8b90e into godotengine:master Jun 28, 2024
16 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ResourceLoader.load_threaded_request() hangs for use_sub_threads = true with no error messages
3 participants