Description
When starting a controller with controller.Start(ctx) and later stopping it by cancelling the provided context, watches defined on the controller continue to trigger their event handlers. Since the watches are owned by the controller, I would expect all of them to be terminated once the controller is terminated.
A reproduction repository is here: https://github.com/dash0hq/controller-runtime-reproducer/tree/main
It covers the following steps:
- controller creation
- adding the watch to the controller
- starting the controller
- stopping it 30 seconds later
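To make the expectation concrete, here is a toy model of those four steps using only the Go standard library. It is not controller-runtime code: eventSource, runController and simulate are hypothetical names, and the event source merely stands in for whatever delivers events to the handler. The sketch shows the behavior I would expect, namely that the handler is removed when the controller's context is cancelled.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// eventSource is a hypothetical stand-in for whatever delivers events:
// it fans each event out to every registered handler.
type eventSource struct {
	mu       sync.Mutex
	nextID   int
	handlers map[int]func(string)
}

func newEventSource() *eventSource {
	return &eventSource{handlers: map[int]func(string){}}
}

// register adds a handler and returns a function that removes it again.
func (s *eventSource) register(h func(string)) (remove func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	id := s.nextID
	s.nextID++
	s.handlers[id] = h
	return func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		delete(s.handlers, id)
	}
}

// emit delivers one event to all currently registered handlers.
func (s *eventSource) emit(event string) {
	s.mu.Lock()
	hs := make([]func(string), 0, len(s.handlers))
	for _, h := range s.handlers {
		hs = append(hs, h)
	}
	s.mu.Unlock()
	for _, h := range hs {
		h(event)
	}
}

// runController registers the watch, blocks until ctx is cancelled, and
// removes the watch before returning (the expected behavior).
func runController(ctx context.Context, src *eventSource, h func(string), ready chan<- struct{}) {
	remove := src.register(h)
	defer remove()
	close(ready)
	<-ctx.Done()
}

// simulate mirrors the steps: create, add watch, start, stop.
// It returns the events the handler actually received.
func simulate() []string {
	src := newEventSource()
	var mu sync.Mutex
	var seen []string
	handler := func(e string) {
		mu.Lock()
		defer mu.Unlock()
		seen = append(seen, e)
	}

	ctx, cancel := context.WithCancel(context.Background())
	ready := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		runController(ctx, src, handler, ready)
	}()

	<-ready
	src.emit("update before stop")
	cancel() // the reproducer does this 30 seconds after starting
	<-done
	src.emit("update after stop")

	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), seen...)
}

func main() {
	fmt.Println(simulate()) // prints [update before stop]
}
```

In this model the handler sees nothing after the stop. The log output from the real reproducer shows the opposite.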
The behavior can be reproduced with the test script contained in the repository, which continuously creates events that trigger the watch (if it is active).
This produces the following output:
2024-10-14T14:48:39Z INFO setup successfully created a new watch
2024-10-14T14:48:39Z INFO setup starting manager
2024-10-14T14:48:39Z INFO Starting EventSource {"controller": "example_controller", "source": "kind source: *v1.Pod"}
2024-10-14T14:48:39Z INFO Starting Controller {"controller": "example_controller"}
2024-10-14T14:48:39Z INFO starting server {"name": "health probe", "addr": ":8081"}
2024-10-14T14:48:39Z INFO controller-runtime.metrics Starting metrics server
2024-10-14T14:48:39Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
2024-10-14T14:48:39Z INFO received create event
...
2024-10-14T14:48:39Z INFO received create event
2024-10-14T14:48:39Z INFO Starting workers {"controller": "example_controller", "worker count": 1}
2024-10-14T14:48:39Z INFO received update event
2024-10-14T14:48:49Z INFO received update event
2024-10-14T14:48:53Z INFO received update event
2024-10-14T14:49:09Z INFO setup stopping controller/cancelling controller context
2024-10-14T14:49:09Z INFO setup controller context has been cancelled
2024-10-14T14:49:09Z INFO Shutdown signal received, waiting for all workers to finish {"controller": "example_controller"}
2024-10-14T14:49:09Z INFO All workers finished {"controller": "example_controller"}
2024-10-14T14:49:09Z INFO setup controller has been stopped
2024-10-14T14:49:23Z INFO received update event
2024-10-14T14:49:24Z INFO received update event
2024-10-14T14:49:24Z INFO received update event
2024-10-14T14:49:24Z INFO received delete event
2024-10-14T14:49:29Z INFO received create event
2024-10-14T14:49:29Z INFO received update event
2024-10-14T14:49:29Z INFO received update event
...
As you can see, the event handler receives updates after the controller has been stopped.
In case you are curious about the wider context: my actual use case is stopping/removing a watch dynamically. I want to handle create/update events for third-party resource types (monitoring.coreos.com.PrometheusRule, for example). I do not know in advance whether the third-party CRD is deployed. Thus I have a reconciler watching apiextensionsv1.CustomResourceDefinition with a filter predicate. If the CRD in question is created, I start a new controller/reconciler watching that resource type. If the CRD is deleted later, I would really like to stop or remove the watch. Otherwise the following error message is emitted to the logs every couple of seconds, apparently from within controller-runtime:
"Unhandled Error" err="pkg/mod/k8s.io/client-go@v0.31.1/tools/cache/reflector.go:243: Failed to watch monitoring.coreos.com/v1, Kind=PrometheusRule: the server could not find the requested resource" logger="UnhandledError"
So far I have not found any way to stop or remove a watch.
These previous issues seem to be related, but it does not seem that any of them ever resulted in something that allows stopping watches.