Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink, consolidate and document Mutex's owner contract #3626

Open
qwwdfsad opened this issue Feb 14, 2023 · 5 comments
Open

Rethink, consolidate and document Mutex's owner contract #3626

qwwdfsad opened this issue Feb 14, 2023 · 5 comments

Comments

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Feb 14, 2023

Mutex primitive provides the concept of ownership -- an additional token passed to every operation to ensure that the lock is not being operated with the same owner concurrently, catches (not supported) reentrancy and prevents logical races that are otherwise hard to debug.

Notably, every locking operation has the following note:

When owner is specified (non-null value) and this mutex is already locked with the same token (same identity), this function throws [IllegalStateException].

This contract makes sense in theory -- e.g. it deterministically prevents reentrancy issues and some trivial races, thus proving itself quite useful.
But it is much more complex in practice, for example, consider the following scenario:

mutex.lock(o1)
launch { mutex.lock(o2) }
launch { mutex.lock(o2) }
// ... sometimes later ...
mutex.unlock(o1)

This code works, though, depending on the interpretation, it should or should not throw IllegalStateException.
Things get more complicated when lock implementation is taken into account (tryLock + slow path) as well as when actual concurrency is present: detecting such scenarios in a robust and linearizable manner doesn't seem to be practical and requires O(n) operations (where n is a number of waiters).

We have to make a decision about the reasonable implementation complexity and thoroughly document the contract, even if we are going to leave it as is

qwwdfsad added a commit that referenced this issue Feb 14, 2023
…is detected (#3620)


* Behaviour compatible with the previous implementation is restored
* For everything else, there is #3626

Signed-off-by: Nikita Koval <ndkoval@ya.ru>
Co-authored-by: Nikita Koval <ndkoval@ya.ru>
@elizarov
Copy link
Contributor

Just 5 cents here. An alternative is to deprecate (or leave alone?) the non-reentrant Mutex and provide a reentrant version as a replacement. The reentrant Mutex does not have the original problem that owner is trying to solve.

@qwwdfsad
Copy link
Collaborator Author

It could be the case, indeed, though reentrancy brings its own bag of surprises :)

@yorickhenning
Copy link
Contributor

Please don't change Mutex from non-reentrant to reentrant in place between versions. If coroutines offers a reentrant lock, it should be in a separate class.

I'll second that reentrance is a kettle of fish. Deadlocks are usually noticeable. Fixing a deadlock isn't always easy, but the fact of its existence is usually indisputable. When one thread of control passes through a reentrant barrier multiple recurrently, the resulting state corruption bugs are tougher to locate.

If it's interesting, we've tried a system for throwing on recursive access to a Mutex that keeps held locks' identities in the CoroutineContext while they're held. It's heuristic - a coroutine that waits for another coroutine waiting on the lock still silently deadlocks - but it turns many deadlocks into crashes.

@amal
Copy link

amal commented Mar 2, 2023

@yorickhenning it would be very interesting to see an example of such a system. Maybe s gist?

@yorickhenning
Copy link
Contributor

https://gist.github.com/yorickhenning/3275ea38e1619a5c8aa0efac9370484d

It's a blunt instrument, untested, unoptimized, and gisted from memory.

The withContext {} should execute undispatched, and withLock {} is reached immediately in the function calling it, giving the function the same properties as withLock {} has when called.

If a block launches a child coroutine while holding a Mutex and that child coroutine tries to lock the same Mutex, the child coroutine will crash. I forget whether we thought about that much, but I believe it'd cause @qwwdfsad's motivating example to crash every time via context inheritance. Since the example's doing suspicious things with mutexes, crashing it might not be such a bad thing.

It should be possible to use a hash set of mutexes as the container instead of the CoroutineContext to improve efficiency, but speed isn't really the point.

Credit to my esteemed colleague @vadimsemenov for the idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants