Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[config/configtls] Enable goleak check #9220

Closed
wants to merge 10 commits into from

Conversation

crobert-1
Copy link
Member

Description:

This change enables goleak to check the config/configtls package for memory and goroutine leaks. There was originally a goroutine leak which is fixed by calling the newly added method TLSServerSetting.Shutdown.

The major change of this PR is to make the *clientCAsFileReloader object a member of the TLSServerSetting struct, as well as adding a public Shutdown method to TLSServerSetting. This was the cleanest way I could add a way to shut down the reloader, otherwise it will be a leaked goroutine.

Also, I had to change LoadTLSConfig() to be a pointer type receiver, so that it could modify the given object by reference instead of only having the value copy of the TLSServerSetting object it was called on.

Link to tracking Issue:
#9165

Testing:
Goleak is passing successfully, as well as all existing tests.

@crobert-1 crobert-1 requested review from a team and jpkrohling January 4, 2024 22:52
Copy link

codecov bot commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.69%. Comparing base (93cbae4) to head (28e1284).
Report is 194 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9220      +/-   ##
==========================================
- Coverage   91.71%   91.69%   -0.03%     
==========================================
  Files         406      406              
  Lines       18947    18958      +11     
==========================================
+ Hits        17378    17383       +5     
- Misses       1209     1214       +5     
- Partials      360      361       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jpkrohling
Copy link
Member

Sorry about the go.mod conflict. Once that is fixed, this can be merged. Thank you!

@jpkrohling jpkrohling added the ready-to-merge Code review completed; ready to merge by maintainers label Jan 22, 2024
if err != nil {
return nil, err
}
if c.ReloadClientCAFile {
err = reloader.startWatching()
err = c.reloader.startWatching()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange logic that we create a reloader even when not starting, maybe split the logic? Separate PR would be preferred for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand, you're saying it's possible to get here even when the collector is not starting, so we should fix this logic to make sure the reloader only starts watching during start, am I following your thought?

config/configtls/configtls.go Outdated Show resolved Hide resolved
@bogdandrutu bogdandrutu removed the ready-to-merge Code review completed; ready to merge by maintainers label Jan 22, 2024
@bogdandrutu
Copy link
Member

@jpkrohling removed the ready-to-use since I believe this does not work for multiple calls to get settings, because of that I think is better to design a different API.

Copy link
Contributor

github-actions bot commented Feb 7, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@@ -132,7 +132,7 @@ func (r *clientCAsFileReloader) handleWatcherEvents() {

func (r *clientCAsFileReloader) shutdown() error {
if r.shutdownCH == nil {
return fmt.Errorf("client CAs file watcher is not running")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should return an error if the shutdown channel is nil.

From documentation:

	// This method must be safe to call:
	//   - without Start() having been called
	//   - if the component is in a shutdown state already

Maybe I'm misinterpreting what safe to call means, but I don't think we should return an error unless there's an issue shutting down. I don't think we should error if shutdown is successful. (Which I believe is the second point from docs here).

Even though this itself isn't specifically a collector component, I believe docs still apply as it's a part of collector shutdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your change makes sense. I interpret that comment to mean "calling shutdown on something not started should not error".

.chloggen/goleak_configtls.yaml Outdated Show resolved Hide resolved
@crobert-1 crobert-1 marked this pull request as draft February 7, 2024 23:15
@github-actions github-actions bot removed the Stale label Feb 8, 2024
@crobert-1 crobert-1 marked this pull request as ready for review February 8, 2024 21:54
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Feb 28, 2024
@crobert-1
Copy link
Member Author

@jpkrohling and @bogdandrutu: I believe I've addressed the concerns mentioned, another look would be appreciated when you're able!

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 22, 2024
config/configtls/configtls.go Outdated Show resolved Hide resolved
config/configtls/configtls.go Outdated Show resolved Hide resolved
Comment on lines 190 to 197
if r.cfg != nil {
if r.cfg.HTTP != nil && r.cfg.HTTP.TLSSetting != nil {
err = r.cfg.HTTP.TLSSetting.Shutdown()
}

if r.cfg.GRPC != nil && r.cfg.GRPC.TLSSetting != nil {
err = multierr.Append(err, r.cfg.GRPC.TLSSetting.Shutdown())
}
}
Copy link
Member

@TylerHelmuth TylerHelmuth Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this some more I really don't like that any component that uses configtls has to add this check. I am worried about components not realizing that the configtls.TLSSetting embedded in confighttp.ServerConfig needs a Shutdown call.

I think it is out of scope of this PR, which is only trying to solve a leak added by existing functionality, but I think we should take another look at how config modules should handle goroutines and Shutdown.

Copy link
Contributor

github-actions bot commented May 2, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 2, 2024
@github-actions github-actions bot removed the Stale label May 9, 2024
@yurishkuro
Copy link
Member

Q: why do we implement reloading of client CA file with a background goroutine instead of using the same approach as certReloader which does it on-demand when GetCertificate is called?

type certReloader struct {

This would avoid a goroutine leak and the unpleasant side-effects of having to call Shutdown on a "config" object.

Copy link
Contributor

github-actions bot commented Jun 5, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 5, 2024
@github-actions github-actions bot removed the Stale label Jun 8, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 23, 2024
@github-actions github-actions bot removed the Stale label Jul 8, 2024
@mx-psi
Copy link
Member

mx-psi commented Jul 18, 2024

@crobert-1 What's the status of this PR? Does it need more reviews? Do you need to fix something? (No rush, just trying to clean up our backlog)

@crobert-1
Copy link
Member Author

@crobert-1 What's the status of this PR? Does it need more reviews? Do you need to fix something? (No rush, just trying to clean up our backlog)

Just waiting for reviews 👍

@mx-psi
Copy link
Member

mx-psi commented Jul 19, 2024

Can you take a look at Yuri's question above?

@mx-psi
Copy link
Member

mx-psi commented Aug 6, 2024

@crobert-1 Mind taking a look at #9220 (comment) ?

@crobert-1
Copy link
Member Author

@crobert-1 Mind taking a look at #9220 (comment) ?

My apologies for the delay, thanks for the ping! I'm taking a look at this. There's no context from the original issue or implementing PR why this design was chosen. However, the issue says it should be done like the certReloader, which, as @yurishkuro pointed out, is done in a lazy manner on Get.

I'm working on implementing this to simply read the file on Get rather than the proactive watching approach that's currently implemented. If it works properly, I'll open a new PR instead of this approach.

Marking as draft until I get a chance to properly prove the alternative approach will work.

@crobert-1 crobert-1 marked this pull request as draft August 8, 2024 17:42
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 23, 2024
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Sep 11, 2024
@akorompai
Copy link
Contributor

akorompai commented Oct 15, 2024

@crobert-1 Mind taking a look at #9220 (comment) ?

My apologies for the delay, thanks for the ping! I'm taking a look at this. There's no context from the original issue or implementing PR why this design was chosen. However, the issue says it should be done like the certReloader, which, as @yurishkuro pointed out, is done in a lazy manner on Get.

I'm working on implementing this to simply read the file on Get rather than the proactive watching approach that's currently implemented. If it works properly, I'll open a new PR instead of this approach.

Marking as draft until I get a chance to properly prove the alternative approach will work.

@crobert-1 Sorry for all the troubles my PR is causing, I didn't realize that the shutdown concern wasn't addressed before it got merged (#6708 (comment))
Please note that a dedicated client certReloader was implemented in a previous PR, but got rejected in favor of a fs watcher solution (#6525 (comment))
I'm happy to recreate the PR with some additional tests if needed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants