source/local: Fix fsnotify watcher resource leak #2412

Open
ArangoGutierrez wants to merge 1 commit into kubernetes-sigs:master from ArangoGutierrez:fix/fswatcher-cleanup

Conversation

@ArangoGutierrez
Contributor

Close the fsnotify watcher when the context is cancelled to prevent resource leaks. The watcher reference is also reset to nil to allow proper re-initialization if SetNotifyChannel is called again.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ArangoGutierrez

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jan 14, 2026
@netlify
netlify bot commented Jan 14, 2026

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit a650db0
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-nfd/deploys/696e8ceec9d1b00008d1fc5c
😎 Deploy Preview https://deploy-preview-2412--kubernetes-sigs-nfd.netlify.app

@k8s-ci-robot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels on Jan 14, 2026

Copilot AI left a comment

Pull request overview

This PR addresses a resource leak by ensuring the fsnotify watcher is properly closed when the context is cancelled in the local source implementation. The fix adds a cleanup function that closes the watcher and resets its reference.

Changes:

  • Added cleanupWatcher() method to close the fsnotify watcher and reset the reference to nil
  • Modified runNotifier() to defer the cleanup function, ensuring watcher cleanup on context cancellation

@k8s-ci-robot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) and removed the size/S label (Denotes a PR that changes 10-29 lines, ignoring generated files.) on Jan 14, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

source/local/local.go:334

  • The runNotifier function accesses s.fsWatcher.Events and s.fsWatcher.Errors without holding the mutex lock. Since cleanupWatcher() can set s.fsWatcher to nil under the lock, there's a potential race condition where runNotifier could be reading from a nil watcher's channels after cleanup, leading to a panic. Consider holding a local reference to the watcher at the start of runNotifier before entering the select loop, or add nil checks before accessing the watcher's channels.
		case event := <-s.fsWatcher.Events:
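
A rough sketch of the "local reference" suggestion, under simplifying assumptions: `watcher` is a hypothetical stand-in for `*fsnotify.Watcher`, and `currentWatcher` and `drain` are illustrative helpers that do not appear in the PR. Only the field names `mu` and `fsWatcher` and the method name `cleanupWatcher` are taken from the discussion.

```go
package main

import "sync"

// watcher is a hypothetical stand-in for *fsnotify.Watcher.
type watcher struct {
	Events chan string
}

type localSource struct {
	mu        sync.Mutex
	fsWatcher *watcher
}

// currentWatcher returns a snapshot of the watcher pointer taken under the lock.
func (s *localSource) currentWatcher() *watcher {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.fsWatcher
}

// cleanupWatcher resets the shared reference under the lock.
func (s *localSource) cleanupWatcher() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.fsWatcher = nil
}

// drain reads n events through a local reference captured once up front,
// so a concurrent cleanupWatcher cannot turn the receive into a nil
// pointer dereference. It returns nil if no watcher is installed.
func (s *localSource) drain(n int) []string {
	w := s.currentWatcher()
	if w == nil {
		return nil
	}
	got := make([]string, 0, n)
	for i := 0; i < n; i++ {
		got = append(got, <-w.Events)
	}
	return got
}
```

The snapshot only protects against the pointer being reset; whether the channels are still serviced after the watcher is closed depends on the watcher implementation.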

@ArangoGutierrez
Contributor Author

/assign @marquiz

Contributor

@marquiz marquiz left a comment

Thank you @ArangoGutierrez for taking a shot at fixing issues in the implementation. Currently, in practice, proper operation relies on the fact that SetNotifyChannel() only gets called once.

I see some lurking problems in the PR that would be good to fix. Calling SetNotifyChannel() a second time would start a new notifier (runNotifier()) but re-use the existing watcher. Each event would then be delivered to only one channel (picked randomly), and the others would not get notified. I see two evident ways to address this:

  1. There can be only one events channel at a time. On SetNotifyChannel(), tear down BOTH the watcher and the notifier and create new ones.
  2. Support multiple channels (we should probably rename to AddNotifyChannel() or something similar). Start only one notifier and one watcher; localSource holds a list of channels (protected by the mutex), and the (single) notifier broadcasts to all of them.

Thoughts?

/cc @ozhuraki
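
For illustration, option 2 above (a single notifier broadcasting to a mutex-protected list of channels) might look roughly like this. The `broadcaster` type and its `broadcast` method are hypothetical; only the name `AddNotifyChannel` comes from the comment. The non-blocking send is an assumption of this sketch, chosen so that one slow subscriber cannot stall the others.

```go
package main

import "sync"

// broadcaster is a hypothetical holder for the subscriber list that
// option 2 would keep on localSource.
type broadcaster struct {
	mu   sync.Mutex
	subs []chan<- string
}

// AddNotifyChannel registers one more subscriber channel.
func (b *broadcaster) AddNotifyChannel(ch chan<- string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs = append(b.subs, ch)
}

// broadcast offers ev to every registered channel and returns how many
// accepted it. Sends are non-blocking: a subscriber whose channel is full
// misses the event instead of blocking the notifier.
func (b *broadcaster) broadcast(ev string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.subs {
		select {
		case ch <- ev:
			delivered++
		default:
		}
	}
	return delivered
}
```

A blocking send with a context check would be the alternative if dropped notifications are not acceptable.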

@k8s-ci-robot
Copy link
Contributor

@marquiz: GitHub didn't allow me to request PR reviews from the following users: ozhuraki.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

@k8s-ci-robot added the do-not-merge/invalid-commit-message label (Indicates that a PR should not merge because it has an invalid commit message.) on Jan 19, 2026
@ArangoGutierrez
Contributor Author

@marquiz PTAL

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

source/local/local.go:413

  • The SetNotifyChannel method and the resource cleanup logic introduced in this PR lack test coverage. Given the complexity of concurrent goroutine management and resource cleanup, tests should be added to verify:
  1. Watcher is properly closed when context is cancelled
  2. Multiple calls to SetNotifyChannel properly clean up previous watchers
  3. No resource leaks occur under concurrent access
  4. The notifier goroutine terminates when the context is done
func (s *localSource) SetNotifyChannel(ctx context.Context, ch chan *source.FeatureSource) error {
	info, err := os.Stat(featureFilesDir)
	if err != nil {
		return err
	}

	if info.IsDir() {
		// Create watcher before acquiring lock to minimize lock hold time
		watcher, err := createWatcher()
		if err != nil {
			return err
		}

		s.mu.Lock()
		// Stop any existing notifier; it will close its own watcher
		s.stopNotifier()

		// Create a cancellable context for the notifier goroutine
		notifyCtx, cancel := context.WithCancel(ctx)
		s.cancelFunc = cancel
		s.mu.Unlock()

		go s.runNotifier(notifyCtx, ch, watcher)
	}

	return nil
}

@k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) on Jan 19, 2026
@k8s-ci-robot removed the do-not-merge/invalid-commit-message label (Indicates that a PR should not merge because it has an invalid commit message.) on Jan 19, 2026
Close the fsnotify watcher when the context is cancelled to prevent
resource leaks. Ensure proper cleanup and re-initialization when
SetNotifyChannel is called multiple times.

Changes:
- Add cancelFunc and done channel to track notifier goroutine lifecycle
- Add stopNotifier() to cancel active notifier before starting new one
- Add createWatcher() helper that always creates a fresh watcher
- Pass watcher to runNotifier() to avoid shared state in hot path
- Use done channel to wait for goroutine exit before starting new one
- Add unit tests for cleanup, reinitialization, and concurrent calls

This ensures only one notifier goroutine exists at a time, with proper
cleanup of both the watcher and notifier when SetNotifyChannel is called
again or when context is cancelled.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants