Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to shutdown http cache fetcher to avoid go-routine leaks. #928

Closed
screwyprof opened this issue May 25, 2023 · 10 comments · Fixed by #929
Closed

Provide a way to shutdown http cache fetcher to avoid go-routine leaks. #928

screwyprof opened this issue May 25, 2023 · 10 comments · Fixed by #929
Assignees

Comments

@screwyprof
Copy link

Abstract
At the moment jwx uses github.com/lestrrat-go/httprc under the hood to run a fetcher. The problem is that the globalFetcher instance created in the init function starts a go-routine which will run perpetually with no way to shutdown it.

We use go-leak to check for go-routines leaks in our tests:

func TestMain(m *testing.M) {
	code := m.Run()

	goleak.VerifyTestMain(m,
		goleak.IgnoreTopFunction("github.com/lestrrat-go/httprc"), //jwk has it running under the hood with no way to stop
	)

	os.Exit(code)
}

It would be good if jwx had a way to shutdown the fetcher.

Describe the proposed solution/change
jwx could expose a method to do a shutdown for example jwk.Shutdown() which will stop the fetcher's go-routines.

Analysis
Alternatively, jwx could provide an option to disable the fetch/http cache all together:

jwx.UseCaching(false)
@lestrrat
Copy link
Collaborator

Hmmm, I tried allowing setting a fetcher object like so (pseudocode)

ctx, cancel := context.WithCancel(context.Background())
defer cancel()
jwk.SetFlobalFetcher( httprc.NewFetcher(ctx) )
goleak.VerifyTestMain(m)

But I don't think this does the trick because presumably goleak is checking for unstopped goroutines before the context can be canceled. So... that means we can't allow globals to start goroutines, yeah?

Of course, this problem can be fixed easily by requiring jwk.Fetch to always spawn a fresh set of httprc.Fetcher object upon each call, but that incurs a performance penalty (albeit small), which I wanted to avoid.

I may not be understanding how goleak works completely, but ATM I'm thinking that goleak is inherently unfriendly with any usage of globals that spawn goroutines (even if they can be cleaned), and I am currently suspecting that none of your suggested workarounds work either, at least for goleak.VerifyTestMain() scenario -- so I'm kind of stuck, because I want to use a global, dangit :)

If I'm mistaken or if there are ways to force cleanup before goleak.VerifyTestMain() reports an error, please let me know.

@screwyprof
Copy link
Author

screwyprof commented May 25, 2023

I guess the issue you have is not related to goleak. goleak is a pretty simple tool it simply counts the number of gorutines running after the test. If the application has a proper shutdown, there should be none.

I guess in your case above the problem is that you still use init() function. Once it's called it will spawn the go-routines, so even when you then use jwk.SetGlobalFetcher which will overwrite the contents of globalFecher, the previous instance will still be running.

If the init() function is commented, these tests will pass:

func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}

func TestFetcherShutdown(t *testing.T) {
        // arrange
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	fetcher := httprc.NewFetcher(ctx, httprc.WithFetcherWorkerCount(1))
	jwk.SetGlobalFetcher(fetcher)

	// act
	cancel()

	// assert
	goleak.VerifyNone(t)
}

@screwyprof
Copy link
Author

screwyprof commented May 25, 2023

How about something along the lines?

diff --git a/jwk/fetch.go b/jwk/fetch.go
index daca177..300295d 100644
--- a/jwk/fetch.go
+++ b/jwk/fetch.go
@@ -20,6 +20,7 @@ func (f FetchFunc) Fetch(ctx context.Context, u string, options ...FetchOption)
 	return f(ctx, u, options...)
 }
 
+var fetcherCanceller context.CancelFunc
 var globalFetcher httprc.Fetcher
 
 func init() {
@@ -32,7 +33,15 @@ func init() {
 		nworkers = 3
 	}
 
-	globalFetcher = httprc.NewFetcher(context.Background(), httprc.WithFetcherWorkerCount(nworkers))
+	var ctx context.Context
+
+	ctx, fetcherCanceller = context.WithCancel(context.Background())
+
+	globalFetcher = httprc.NewFetcher(ctx, httprc.WithFetcherWorkerCount(nworkers))
+}
+
+func Shutdown() {
+	fetcherCanceller()
 }
 
 // Fetch fetches a JWK resource specified by a URL. The url must be
func TestFetcherShutdown(t *testing.T) {
	jwk.Shutdown()

	// assert
	goleak.VerifyNone(t)
}

A bit ugly, but backward-compatible and does what's expected.

@lestrrat
Copy link
Collaborator

I guess in your case above the problem is that you still use init() function. Once it's called it will spawn the go-routines, so even when you then use jwk.SetGlobalFetcher which will overwrite the contents of globalFecher, the previous instance will still be running.

No, when I made my changes I removed init(), made it so that a global variable is initialized upon first call to jwk.Fetch, and the problem persists. Counting or not counting, what I'm saying is that goleak.VerifyTestMain() checks for the remaining goroutines before I get the chance to clean them up.

Furthermore, I think your code below will break:

func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}

func TestFetcherShutdown(t *testing.T) {
        // arrange
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	fetcher := httprc.NewFetcher(ctx, httprc.WithFetcherWorkerCount(1))
	jwk.SetGlobalFetcher(fetcher)

	// act
	cancel()

	// assert
	goleak.VerifyNone(t)
}

I'm sure TestFetcherShutdown() will work, but if there are other tests/code that relies on jwk.Fetch after that test, they will fail because they will no longer be able to fetch remote JWKs. But if we re-instate the global fetcher after the test, then goleak.VerifyTestMain() will once again complain that there are goroutines still hanging around.

And as for jwk.Shutdown():

func TestFetcherShutdown(t *testing.T) {
	jwk.Shutdown()

	// assert
	goleak.VerifyNone(t)
}

Again, this is okay for jwx tests, but it will again break once somebody uses goleak.VerifyTestMain(). That just means that even though we added an extra API, support wise we solved nothing


So I'd like to help you, but here's my dilemma:

  • jwk.Fetch is something that our users actually use... a lot. I'd like it to be performant, while providing the same feel as other tools such as jwk.Cache -- this is why I reuse the same fetching mechanism in both places.
  • I use globals here because jwk.Fetch is a function, and I should be able to reuse the infrastructure for multiple invocations of jwk.Fetch.
  • jwk.Shutdown sounds like a function solely added to make a single test pattern work. if it worked for goleak.VerifyTestMain() in one call, I'd accept it, but it doesn't, so while it may work for your particular test case, I'd have to cope with future inquiries of the same nature. Also, other than solving your test issue, jwk.Shutdown has no other purpose. That's zero value add for us -- in fact, it's a net minus because that's one more API that we need to support.

Taking the above, and the fact that you can get away with marking particular packages as exempt in goleak, I'm inclined to do the following:

  • Add jwk.SetGlobalFetcher so that if you are paranoid, you can set your own httprc.Fetcher whose lifetime you can control at the beginning of your program
  • Document that there is a global object behind jwk.Fetch, and indicate that there's no way to properly (and easily) workaround things like goleak.VerifyTestMain() when globals are involved. Meanwhile you already can make your tests pass by ignoring httprc, or creating a localized test shown below
func TestGH922(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	jwk.SetGlobalFetcher(httprc.NewFetcher(ctx))
	defer jwk.SetGlobalFetcher(nil) // This must be included to restore default behavior

	cancel() // stop fetcher goroutines

	goleak.VerifyNone(t)
}

The benefits

  • jwk.SetGlobalFetcher is useful in contexts other than testing -- for example, when you want to customize the behavior of the fetcher by including global whitelists
  • jwk.SetGlobalFetcher can be used as an escape hatch for those who really want to clean the environment themselves, while keeping everything else the same as before. For example, assuming you are building an app, you can do this:
// your main.go, or whatever top-most level code. Make sure that there are
// no other code that calls jwk.Fetch after `Run()` is called
func Run() {
    	ctx, cancel := context.WithCancel(context.Background())
	jwk.SetGlobalFetcher(httprc.NewFetcher(ctx))
       defer cancel()

      // your main code
}

// your test code:
func TestMain(m *testing.M) {
   goleak.VerifyTestMain(m) // call tests that use Run()
}

So, to conclude: To your original request, the answer is "no, we can't fix it the way you want us to". However, we can add ways to avoid the same problem if you are careful to now call jwk.Fetch after you cleanup the global fetcher. Also, this behavior will be documented.

Does that sound acceptable?

@screwyprof
Copy link
Author

Thank you for your detailed answer. It all makes sense. Forgive me if I’m missing anything, but my understanding is that the option to provide a global fetcher will only work for us if the init function won’t run another fetcher before or if there will be a way to optionally ignore the init function (via an env variable for example?).

@lestrrat
Copy link
Collaborator

@screwyprof yes, that's correct, and as I (briefly) wrote elsewhere my intention is to get rid of init(), and provide a getGlobalFetcher() that works with SetGlobalFetcher()

@lestrrat
Copy link
Collaborator

I have a PoC, let me just make a PR with it...

@lestrrat
Copy link
Collaborator

@screwyprof Please see if this works for you?

@screwyprof
Copy link
Author

Hey @lestrrat. This does work for me, thank you very much! Thank you for a quick and nice solution. The docs also look slick!

@lestrrat
Copy link
Collaborator

Thanks, I'll merge to develop/v2 shortly. As for releasing, I'm kind of waiting for #924 and #925.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants