Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic deadlock on concurrent RPC calls #189

Closed
lthibault opened this issue Nov 15, 2021 · 0 comments · Fixed by #225
Closed

Non-deterministic deadlock on concurrent RPC calls #189

lthibault opened this issue Nov 15, 2021 · 0 comments · Fixed by #225
Assignees
Labels
Milestone

Comments

@lthibault
Copy link
Collaborator

As discussed in the Matrix channel, concurrent RPC calls seem to provoke an intermittent deadlock.

I am attaching scratch.zip, which contains a working example to reproduce this behavior. The example was built on commit e33fd9381f266af736f49ab7366c49f0397e3c16 of go-capnproto2 and version 0.9.1 of the capnp tool.

Summarizing a few key points from the Matrix discussion:

  1. This bug also manifests using out-of-process transports, notably TCP to localhost.
  2. The deadlock appears to occur when n>2 concurrent calls are made
    • This suggests a full buffer/channel somewhere
    • The synchronous semantics of net.Pipe likely cause this to happen sooner

@zenhack IFF you have the time, I would be super interested in pair-programming to debug this. This seems like a good opportunity to get more involved in the internals.

@lthibault lthibault added the bug label Nov 15, 2021
@lthibault lthibault added this to the 3.0 milestone Nov 15, 2021
zenhack added a commit to zenhack/go-capnp that referenced this issue Feb 20, 2022
This should fix capnproto#189, though right now the tests are deadlocking and
I don't have the energy to debug, so I'm putting it down.

This also obviates MaxConcurrentCalls; in the current version of the
patch it is still present, but unused. Once this is working, a
subsequent commit will remove the Policy struct.

This significantly restructures the internals of *Server, and I
think the result is easier to understand: the methods on *Server
that start calls just stick the calls in a queue, and there is a
goroutine that pulls them out and handles them. This even gets rid
of Server.mu entirely, since nothing actually needed the lock
anymore!

Hopefully it will stay simple when I actually get it working.
zenhack added a commit to zenhack/go-capnp that referenced this issue Apr 16, 2022
This should fix capnproto#189, though right now the tests are deadlocking and
I don't have the energy to debug, so I'm putting it down.

This also obviates MaxConcurrentCalls; in the current version of the
patch it is still present, but unused. Once this is working, a
subsequent commit will remove the Policy struct.

This significantly restructures the internals of *Server, and I
think the result is easier to understand: the methods on *Server
that start calls just stick the calls in a queue, and there is a
goroutine that pulls them out and handles them. This even gets rid
of Server.mu entirely, since nothing actually needed the lock
anymore!

Hopefully it will stay simple when I actually get it working.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants