Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions v23/context/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -303,8 +303,12 @@ func WithRootCancel(parent *T) (*T, CancelFunc) {
// Forward the cancelation from the root context to the newly
// created context.
go func() {
<-rootCtx.Done()
cancel()
select {
case <-rootCtx.Done():
cancel()
case <-ctx.Done():
cancel()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this line isn't really needed since it's a noop. I left this here to be consistent though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need it for the case that the child context is canceled but the root is not - this will be the common case for servers that forward requests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I mean the cancel() on line 310. The case <-ctx.Done(): for sure is needed but I believe if the child context is canceled or finished, it'll enter the select case and run cancel(), which I believe would just be a noop.

}
}()
} else if atomic.AddInt32(&nRootCancelWarning, 1) < 3 {
vlog.Errorf("context.WithRootCancel: context %+v is not derived from root v23 context.\n", parent)
Expand Down
32 changes: 32 additions & 0 deletions v23/context/context_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ import (
gocontext "context"
"fmt"
"os"
"runtime"
"strings"
"sync"
"testing"
"time"
Expand Down Expand Up @@ -320,6 +322,36 @@ func TestRootCancelChain(t *testing.T) {
}
}

func TestRootCancelGoroutineLeak(t *testing.T) {
rootCtx, rootcancel := context.RootContext()
const iterations = 1024
for i := 0; i != iterations; i++ {
_, cancel := context.WithRootCancel(rootCtx)
cancel()
}

// Arbitrary threshold to wait for the goroutines in the created contexts
// above to exit. This threshold was arbitrarily created after running
// `go test -count=10000 -run TestRootCancelGoroutineLeak$` and verifying
// that the tests did not fail flakily.
const waitThreshold = 8*time.Millisecond
time.Sleep(waitThreshold)

// Verify that goroutines no longer exist in the runtime stack.
buf := make([]byte, 2<<20)
buf = buf[:runtime.Stack(buf, true)]
count := 0
for _, g := range strings.Split(string(buf), "\n\n") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is likely going to be flaky, at least in terms of getting to 0. You may want to just check that some, say a third, of the goroutines have exited since there's no way any of them would exit without your change for this test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. This is indeed flaky after running go test -count=1000 -run TestRootCancelGoroutineLeak.
There were a few options I thought of:

  1. Do what you suggested, which is check some percentage of them exited.
  2. Make the test sleep for say 5-10ms after canceling all of the contexts and then check.
  3. Adjust the context API to expose whether the context's goroutine has ended via some internal context value check (e.g. similar to checking whether the context is done).
  4. Move the test to be an internal test (i.e. white box test) and adjust the test to wait for all of the contexts' goroutines to have finished via some internal cancel context value check (also similar to checking whether the context is done).

Approach 1 has been a little flaky in terms of figuring out the threshold.

Approach 2 seems to work. From what I've tested, it works for 10000 runs of the tests, but there's still always the possibility that it doesn't quite work depending on how go schedules the goroutines. It's also not great to have a test sleep and arbitrarily take up unnecessary time. This approach could also be done in combination with approach 1 but I prefer checking that all goroutines have really exited.

Approach 3 exposes unnecessary internals of the context package for the purposes of a test which isn't great imo.

Approach 4 requires some finagling to propagate the goroutine's changed value to the cancel context which violates some boundaries (or requires some more heavy handed adjustments).

I've implemented approach 2 since it's pretty easy to do and the least intrusive to the general codebase. Let me know your thoughts and happy to switch it around as needed. The following has run without any problems from my testing so far though.

go test -count=10000 -run TestRootCancelGoroutineLeak$ 

if strings.Contains(g, "v.io/v23/context.WithRootCancel.func1") {
count++
}
}
if count != 0 {
t.Errorf("expected 0 but got %d: goroutine leaking in WithRootCancel", count)
}
rootcancel()
}

func TestRootCancel_GoContext(t *testing.T) {
root, rootcancel := context.RootContext()

Expand Down