-
Notifications
You must be signed in to change notification settings - Fork 27
v23/context: fix goroutine leak in WithRootCancel #214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,8 @@ import ( | |
| gocontext "context" | ||
| "fmt" | ||
| "os" | ||
| "runtime" | ||
| "strings" | ||
| "sync" | ||
| "testing" | ||
| "time" | ||
|
|
@@ -320,6 +322,36 @@ func TestRootCancelChain(t *testing.T) { | |
| } | ||
| } | ||
|
|
||
| func TestRootCancelGoroutineLeak(t *testing.T) { | ||
| rootCtx, rootcancel := context.RootContext() | ||
| const iterations = 1024 | ||
| for i := 0; i != iterations; i++ { | ||
| _, cancel := context.WithRootCancel(rootCtx) | ||
| cancel() | ||
| } | ||
|
|
||
| // Arbitrary threshold to wait for the goroutines in the created contexts | ||
| // above to exit. This threshold was arbitrarily created after running | ||
| // `go test -count=10000 -run TestRootCancelGoroutineLeak$` and verifying | ||
| // that the tests did not fail flakily. | ||
| const waitThreshold = 8*time.Millisecond | ||
| time.Sleep(waitThreshold) | ||
|
|
||
| // Verify that goroutines no longer exist in the runtime stack. | ||
| buf := make([]byte, 2<<20) | ||
| buf = buf[:runtime.Stack(buf, true)] | ||
| count := 0 | ||
| for _, g := range strings.Split(string(buf), "\n\n") { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is likely going to be flaky, at least in terms of getting to 0. You may want to just check that some, say a third, of the goroutines have exited since there's no way any of them would exit without your change for this test.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great point. This is indeed flaky after running
Approach 1 has been a little flaky in terms of figuring out the threshold. Approach 2 seems to work. From what I've tested, it works for 10000 runs of the tests, but there's still always the possibility that it doesn't quite work depending on how go schedules the goroutines. It's also not great to have a test sleep and arbitrarily take up unnecessary time. This approach could also be done in combination with approach 1 but I prefer checking that all goroutines have really exited. Approach 3 exposes unnecessary internals of the context package for the purposes of a test which isn't great imo. Approach 4 requires some finagling to propagate the goroutine's changed value to the cancel context which violates some boundaries (or requires some more heavy handed adjustments). I've implemented approach 2 since it's pretty easy to do and the least intrusive to the general codebase. Let me know your thoughts and happy to switch it around as needed. The following has run without any problems from my testing so far though.
richhx marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| if strings.Contains(g, "v.io/v23/context.WithRootCancel.func1") { | ||
| count++ | ||
| } | ||
| } | ||
| if count != 0 { | ||
| t.Errorf("expected 0 but got %d: goroutine leaking in WithRootCancel", count) | ||
| } | ||
| rootcancel() | ||
| } | ||
|
|
||
| func TestRootCancel_GoContext(t *testing.T) { | ||
| root, rootcancel := context.RootContext() | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this line isn't really needed since it's a noop. I left this here to be consistent though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need it for the case that the child context is canceled but the root is not - this will be the common case for servers that forward requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I mean the
cancel()on line 310. Thecase <-ctx.Done():for sure is needed but I believe if the child context is canceled or finished, it'll enter the select case and run cancel(), which I believe would just be a noop.