source/local: Fix fsnotify watcher resource leak #2412
ArangoGutierrez wants to merge 1 commit into kubernetes-sigs:master
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ArangoGutierrez
Pull request overview
This PR addresses a resource leak by ensuring the fsnotify watcher is properly closed when the context is cancelled in the local source implementation. The fix adds a cleanup function that closes the watcher and resets its reference.
Changes:
- Added cleanupWatcher() method to close the fsnotify watcher and reset the reference to nil
- Modified runNotifier() to defer the cleanup function, ensuring watcher cleanup on context cancellation
Force-pushed from 554a4f4 to aaeca24.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
source/local/local.go:334
- The runNotifier function accesses s.fsWatcher.Events and s.fsWatcher.Errors without holding the mutex lock. Since cleanupWatcher() can set s.fsWatcher to nil under the lock, there's a potential race condition where runNotifier could be reading from a nil watcher's channels after cleanup, leading to a panic. Consider holding a local reference to the watcher at the start of runNotifier before entering the select loop, or add nil checks before accessing the watcher's channels.
case event := <-s.fsWatcher.Events:
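One way to read the "local reference" suggestion is sketched below. It assumes a hypothetical fakeWatcher in place of *fsnotify.Watcher, and the ready channel exists only so the demonstration can clear the shared field at a known point; neither is taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// fakeWatcher is a hypothetical stand-in for *fsnotify.Watcher.
type fakeWatcher struct {
	Events chan string
	Errors chan error
}

type localSource struct {
	mu        sync.Mutex
	fsWatcher *fakeWatcher
}

// runNotifier snapshots the watcher once under the lock and then uses only
// the local copy, so a later s.fsWatcher = nil cannot cause a nil dereference.
func (s *localSource) runNotifier(ctx context.Context, out chan<- string, ready chan<- struct{}) {
	s.mu.Lock()
	w := s.fsWatcher
	s.mu.Unlock()
	close(ready) // demo-only signal that the snapshot has been taken
	if w == nil {
		return
	}
	for {
		select {
		case ev := <-w.Events:
			out <- ev
		case <-w.Errors:
			// Error handling is omitted in this sketch.
		case <-ctx.Done():
			return
		}
	}
}

// demo (hypothetical helper) clears the shared field while the notifier is
// running and shows that an event is still delivered via the local reference.
func demo() string {
	w := &fakeWatcher{Events: make(chan string), Errors: make(chan error)}
	s := &localSource{fsWatcher: w}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	out := make(chan string, 1)
	ready := make(chan struct{})
	go s.runNotifier(ctx, out, ready)
	<-ready

	s.mu.Lock()
	s.fsWatcher = nil // simulate a concurrent cleanup
	s.mu.Unlock()

	w.Events <- "file changed"
	return <-out
}

func main() {
	fmt.Println(demo()) // prints "file changed"
}
```

The sketch only covers the nil-pointer concern; what the loop should do once the real watcher has been closed is a separate question it does not answer.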
/assign @marquiz
marquiz left a comment
Thank you @ArangoGutierrez for the shot at fixing issues in the implementation. Currently, in practice, proper operation relies on the fact that SetNotifyChannel() only gets called once.
I see some lurking problems in the PR which would be good to fix. Calling SetNotifyChannel() again would start a new notifier (runNotifier()) but reuse the existing watcher. Events would then be delivered to only one channel (picked at random), and the others would not get notified. I see two evident ways to address this:
- There can be only one events channel at a time. On SetNotifyChannel(), tear down BOTH the watcher and the notifier and create new ones.
- Support multiple channels (we should probably rename to AddNotifyChannel() or smth). Start only one notifier and watcher, localSource holds a list of channels (protected by the mutex), and the (single) notifier broadcasts to all of them.
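For illustration, the second option (a single notifier broadcasting to a mutex-protected list of channels) might look roughly like the sketch below. The string event type, the broadcast helper, and the choice to drop events for full channels are all assumptions made for the example, not part of the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// localSource here is a reduced, hypothetical version holding only the
// subscriber list described in the review comment.
type localSource struct {
	mu       sync.Mutex
	channels []chan string
}

// AddNotifyChannel registers one more subscriber channel.
func (s *localSource) AddNotifyChannel(ch chan string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.channels = append(s.channels, ch)
}

// broadcast copies the subscriber list under the lock and then sends ev to
// each channel without blocking, so one slow subscriber cannot stall the rest.
func (s *localSource) broadcast(ev string) {
	s.mu.Lock()
	chans := append([]chan string(nil), s.channels...)
	s.mu.Unlock()
	for _, ch := range chans {
		select {
		case ch <- ev:
		default:
			// Assumption: drop the event if the subscriber's buffer is full.
		}
	}
}

// demo (hypothetical helper) registers two subscribers and broadcasts once.
func demo() (string, string) {
	s := &localSource{}
	a := make(chan string, 1)
	b := make(chan string, 1)
	s.AddNotifyChannel(a)
	s.AddNotifyChannel(b)
	s.broadcast("changed")
	return <-a, <-b
}

func main() {
	fmt.Println(demo()) // prints "changed changed"
}
```

Whether dropping is acceptable depends on how consumers use the notification; a blocking send with a context check would be the alternative.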
Thoughts?
/cc @ozhuraki
@marquiz: GitHub didn't allow me to request PR reviews from the following users: ozhuraki. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.
@marquiz PTAL
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
source/local/local.go:413
- The SetNotifyChannel method and the resource cleanup logic introduced in this PR lack test coverage. Given the complexity of concurrent goroutine management and resource cleanup, tests should be added to verify:
- Watcher is properly closed when context is cancelled
- Multiple calls to SetNotifyChannel properly clean up previous watchers
- No resource leaks occur under concurrent access
- The notifier goroutine terminates when the context is done
```go
func (s *localSource) SetNotifyChannel(ctx context.Context, ch chan *source.FeatureSource) error {
	info, err := os.Stat(featureFilesDir)
	if err != nil {
		return err
	}
	if info.IsDir() {
		// Create watcher before acquiring lock to minimize lock hold time
		watcher, err := createWatcher()
		if err != nil {
			return err
		}
		s.mu.Lock()
		// Stop any existing notifier; it will close its own watcher
		s.stopNotifier()
		// Create a cancellable context for the notifier goroutine
		notifyCtx, cancel := context.WithCancel(ctx)
		s.cancelFunc = cancel
		s.mu.Unlock()
		go s.runNotifier(notifyCtx, ch, watcher)
	}
	return nil
}
```
Force-pushed from 0cbddf8 to cf8f5fc.
Close the fsnotify watcher when the context is cancelled to prevent resource leaks. Ensure proper cleanup and re-initialization when SetNotifyChannel is called multiple times.

Changes:
- Add cancelFunc and done channel to track notifier goroutine lifecycle
- Add stopNotifier() to cancel active notifier before starting new one
- Add createWatcher() helper that always creates a fresh watcher
- Pass watcher to runNotifier() to avoid shared state in hot path
- Use done channel to wait for goroutine exit before starting new one
- Add unit tests for cleanup, reinitialization, and concurrent calls

This ensures only one notifier goroutine exists at a time, with proper cleanup of both the watcher and notifier when SetNotifyChannel is called again or when context is cancelled.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
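The lifecycle this commit message describes (cancelFunc plus a done channel, with stopNotifier() waiting for the old goroutine to exit) could be sketched as below. The field and method names follow the message, but the bodies are guesses: fakeWatcher is a hypothetical stand-in for *fsnotify.Watcher, the watcher is passed in rather than created by createWatcher(), and events are plain strings.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// fakeWatcher is a hypothetical stand-in for *fsnotify.Watcher.
type fakeWatcher struct {
	Events chan string
	closed bool
}

func (w *fakeWatcher) Close() error { w.closed = true; return nil }

type localSource struct {
	mu         sync.Mutex
	cancelFunc context.CancelFunc
	done       chan struct{}
}

// stopNotifier cancels the active notifier, if any, and waits until its
// goroutine has exited. The caller must hold s.mu.
func (s *localSource) stopNotifier() {
	if s.cancelFunc == nil {
		return
	}
	s.cancelFunc()
	<-s.done
	s.cancelFunc, s.done = nil, nil
}

// runNotifier owns the watcher it is given. It must not take s.mu, because
// stopNotifier waits on done while the lock is held.
func (s *localSource) runNotifier(ctx context.Context, ch chan<- string, w *fakeWatcher, done chan<- struct{}) {
	defer close(done) // runs last: signals that the watcher is already closed
	defer w.Close()
	for {
		select {
		case ev := <-w.Events:
			select {
			case ch <- ev:
			case <-ctx.Done(): // do not block on a stalled consumer
				return
			}
		case <-ctx.Done():
			return
		}
	}
}

// SetNotifyChannel replaces any running notifier with a new one.
func (s *localSource) SetNotifyChannel(ctx context.Context, ch chan<- string, w *fakeWatcher) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stopNotifier()
	notifyCtx, cancel := context.WithCancel(ctx)
	s.cancelFunc = cancel
	s.done = make(chan struct{})
	go s.runNotifier(notifyCtx, ch, w, s.done)
}

// demo (hypothetical helper) calls SetNotifyChannel twice and reports whether
// each watcher was closed right after the second call.
func demo() (firstClosed, secondClosed bool) {
	s := &localSource{}
	ch := make(chan string)
	w1 := &fakeWatcher{Events: make(chan string)}
	w2 := &fakeWatcher{Events: make(chan string)}
	s.SetNotifyChannel(context.Background(), ch, w1)
	s.SetNotifyChannel(context.Background(), ch, w2)
	firstClosed, secondClosed = w1.closed, w2.closed
	s.mu.Lock()
	s.stopNotifier() // stop the second notifier so nothing leaks
	s.mu.Unlock()
	return firstClosed, secondClosed
}

func main() {
	fmt.Println(demo()) // prints "true false"
}
```

Waiting on done while holding the mutex keeps the "only one notifier at a time" guarantee simple, at the cost of requiring that the notifier goroutine never acquires the same lock.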
Force-pushed from cf8f5fc to a650db0.
Close the fsnotify watcher when the context is cancelled to prevent resource leaks. The watcher reference is also reset to nil to allow proper re-initialization if SetNotifyChannel is called again.