-
Notifications
You must be signed in to change notification settings - Fork 244
[shell-operator] fix: stop informer correctly #805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Timur Tuktamyshev <timur.tuktamyshev@flant.com>
Signed-off-by: Timur Tuktamyshev <timur.tuktamyshev@flant.com>
1bc1b7c to
b6e1fd0
Compare
71b5130 to
cde1198
Compare
Signed-off-by: Timur Tuktamyshev <timur.tuktamyshev@flant.com>
cde1198 to
cb63c6f
Compare
Signed-off-by: Timur Tuktamyshev <timur.tuktamyshev@flant.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements graceful shutdown for client-go informers with proper lifecycle tracking and completion signaling. The changes ensure that all informer goroutines fully exit before resources are cleaned up, preventing potential race conditions and panics during shutdown.
Changes:
- Replaced
PauseHandleEvents()method withStop()andWait()methods across the kube events manager stack - Added proper lifecycle tracking to
Factorywith adonechannel that signals when informer goroutines exit - Implemented
WaitStopped()inFactoryStoreusing condition variables to allow blocking until factory cleanup completes - Enhanced operator shutdown with structured logging at each phase of the shutdown sequence
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/kube_events_manager/factory.go | Added done channel to Factory, implemented WaitStopped() with condition variable, ensured single informer.Run() per factory |
| pkg/kube_events_manager/kube_events_manager.go | Replaced PauseHandleEvents() with Stop() and Wait(), snapshots monitors before waiting to avoid deadlocks |
| pkg/kube_events_manager/monitor.go | Converted PauseHandleEvents() to Wait() that blocks until all informers stop |
| pkg/kube_events_manager/resource_informer.go | Replaced pauseHandleEvents() with wait() that blocks on FactoryStore.WaitStopped() |
| pkg/kube_events_manager/namespace_informer.go | Added done channel and wait() method, wrapped Run() to signal completion |
| pkg/shell-operator/operator.go | Added structured shutdown logs and proper Wait() call after Stop() for informers |
| test/integration/kube_event_manager/kube_event_manager_test.go | Updated test cleanup to use Stop() instead of PauseHandleEvents() |
Comments suppressed due to low confidence (1)
pkg/kube_events_manager/resource_informer.go:64
- The
stoppedfield is still checked inhandleWatchEvent()(line 281), but it's never set to true in the new implementation sincepauseHandleEvents()was removed. This appears to be dead code that should either be removed or properly set when the context is canceled.
stopped bool
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Pavel Okhlopkov <pavel.okhlopkov@flant.com>
Signed-off-by: Pavel Okhlopkov <pavel.okhlopkov@flant.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Overview
Implement graceful shutdown for client-go informers and add structured shutdown logs across shell-operator components.
What this PR does / why we need it
donechannel toFactoryand startsinformer.Run(...)exactly once per factory; closesdonewhen the goroutine exits.FactoryStore.Stop(...)now, upon removing the last handler, cancels the shared context and waits for<-donebefore deleting the factory entry.FactoryStore.WaitStopped(index)to block until a factory for a givenFactoryIndexfully stops.resourceInformer.wait()waits for shared informer termination viaFactoryStore.WaitStopped.namespaceInformerwrapsRun()with adonechannel and exposeswait().Monitornow hasWait()to wait for all resource/namespace informers to stop.KubeEventsManagernow hasWait()to aggregateMonitor.Wait()calls without holding locks while blocking.Shutdown()now logs each stage, cancelsKubeEventsManager, then waits for completion (Wait()), then stops and waits for queues.Special notes for your reviewer
KubeEventsManager.Wait()snapshots monitors underRLockand releases the lock before waiting, avoiding deadlocks and map iteration races.FactoryStore.Stop(...)waits outside the lock for thedonechannel, then reacquires the lock to delete the factory and broadcast to waiters.operator.Shutdown(); feel free to request additional log details or levels.cancel(), waits will block. Current behavior intentionally favors correctness. If desired, we can add timeouts toWait()paths in a follow-up.