Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/unbounded send queue #224

Merged
merged 42 commits into from
Apr 16, 2022
Merged

Bugfix/unbounded send queue #224

merged 42 commits into from
Apr 16, 2022

Conversation

lthibault
Copy link
Collaborator

Fixes #190 and #194.

Copy link
Contributor

@zenhack zenhack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's broadly on the right track. A couple things in-line.

rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
@lthibault lthibault marked this pull request as ready for review February 21, 2022 04:48
@lthibault lthibault marked this pull request as draft February 21, 2022 04:58
Copy link
Contributor

@zenhack zenhack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some immediate thoughts about the error handling package. Haven't looked through in detail yet, but will try to find time in the next couple days to do a more thorough review.

exc/types.go Show resolved Hide resolved
@lthibault
Copy link
Collaborator Author

lthibault commented Feb 21, 2022

This is more or less ready for review (modulo the deadlock in TestRecvAbort).

This refactor ended up introducing an issue that prompted me to refactor capnp exceptions. Briefly, context.Canceled is returned from send operations during shutdown, causing a number of tests to fail because they were expecting nil errors. The fix was to wrap errors with fmt.Errorf and the %w verb instead of flattening everything to strings with %v or err.Error(), allowing me to test using errors.Is from the standard library.

Note that the use of the standard library errors package introduced a name collision with our own internal/errors package, which has already been an annoyance in the past. More generally, I've found it difficult to inspect error values returned by the RPC code due to this "flattening" effect, and have been thinking quite a bit about how this could be improved.

Both of the above issues prompted me to make changes to internal/errors, with the goal of:

  1. Facilitating the inspection of error values using standard libarary errors.Is and errors.As.
  2. Exporting an Exception type to users.
  3. Eventually supporting a richer error system (e.g. providing a SendError type that wraps a more specific error type, itself a capnp Exception).

As a result, I performed the following actions:

  1. Renamed the errors package to exc (short for "exception")
  2. Renamed exc.Error to Exception
  3. Implemented an Annotator type to facilitate package-level annotations (e.g. with the "rpc" prefix)
  4. Refactored error handling to make use of the new Annotator type
  5. Added an exc.IsType function that asserts exception types in a manner analogous to errors.Is. Although I ended up not using it in this PR, I nevertheless opted to keep it in the code base as it will be useful for future improvements.

Let me know if any of this is unclear or seems harebrained!

rpc/rpc.go Show resolved Hide resolved
@zenhack
Copy link
Contributor

zenhack commented Feb 22, 2022 via email

@lthibault
Copy link
Collaborator Author

Ok, I think this is now good to go. 🤞

I don't think that will ever happen? in particular, send doesn't return until the bgctx is canceled, which happens in shutdown, which is after called after this wait.

bgctx expires when either of the errgroup's goroutines returns a non-nil error. There were actually two bugs here:

  1. I mistakenly thought bgctx would expire on nil errors as well.
  2. bgctx is always expired when we enter shutdown(), so we need an additional flag to suppress duplicate calls.

Re bug 2: I took the liberty of simplifying the Close() and shutdown() logic a bit, as well as apply some opinionated readability improvements. In particular:

  • Close no longer returns ExcAlreadyClosed. Repeated calls to Close() are idempotent, and will simply return nil.
  • The c.closed flag was renamed to closing and is used to detect duplicate calls to shutdown.

@lthibault lthibault marked this pull request as ready for review February 22, 2022 21:29
@lthibault
Copy link
Collaborator Author

Ugh ... unit tests are still blocking with the -race flag. Going to investigate a bit, but they definitely pass on my machine (go1.17.7).

@lthibault
Copy link
Collaborator Author

Hmm, I'm also seeing quite a few data-races. I'll investigate this later. Input welcome, but I know you're busy so no rush.

@zenhack
Copy link
Contributor

zenhack commented Feb 25, 2022

Yeah, definitely need to fix the races. Looks like it's happening in Encoder.Encode, so my first guess is that you missed some spot outside of the send() goroutine that is still directly sending data on the transport. I will have more time to stare at it next week.

@zenhack
Copy link
Contributor

zenhack commented Feb 26, 2022

Ok, yeah, grepping for c.transport.NewMessage comes up with a bunch of places where we're sending a message outside of the send goroutine. So those will need to be fixed to delegate to the send goroutine.

rpc/rpc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@zenhack zenhack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in-line. A mix of minor style/docs things and also some major problems/questions.

rpc/answer.go Outdated Show resolved Hide resolved
rpc/answer.go Outdated Show resolved Hide resolved
rpc/receiveranswer_test.go Outdated Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@zenhack zenhack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments in-line (including bumps on a few things I'd asked about last time). Mostly looks good though.

rpc/answer.go Outdated Show resolved Hide resolved
rpc/answer.go Outdated Show resolved Hide resolved
rpc/bench_test.go Outdated Show resolved Hide resolved
rpc/bench_test.go Outdated Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
rpc/rpc.go Outdated Show resolved Hide resolved
rpc/rpc.go Show resolved Hide resolved
@zenhack zenhack merged commit 22545e6 into main Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible deadlock when receive loops on both sides of an unbuffered connection try to send.
2 participants