
High memory when informer error occurs. #1377

Closed as not planned
@zhulinwei

Description

I have a custom controller that uses an informer. The controller lists and watches more than 2000 nodes and 50000 pods; api, apimachinery, and client-go are all at v0.24.0.

Code sample:

package main

import (
	"github.com/golang/glog"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", homedir.HomeDir()+"/.kube/config")
	if err != nil {
		glog.Fatalf("error building kubernetes config: %v", err)
	}
	kubeClient, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		glog.Fatalf("error creating kubernetes clientset: %v", err)
	}

	// Shared informer factory with resync disabled (period 0).
	factory := informers.NewSharedInformerFactory(kubeClient, 0)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			// do something
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			// do something
		},
		DeleteFunc: func(obj interface{}) {
			// do something
		},
	})

	// One stop channel for Start and WaitForCacheSync, then keep the process running.
	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	select {} // block forever so the informer keeps running
}

Normally only 800MB of memory is needed:

[screenshot: Snipaste_2024-09-20_10-26-01]

But when an error occurs, memory instantly doubles, then decreases slightly, yet it remains higher than the memory used before the error.

[screenshot: Snipaste_2024-09-20_10-31-09]

W0920 04:58:56.453165 1 reflector.go:442] pkg/mod/k8s.io/client-go@v0.24.0/tools/cache/reflector.go:167: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=4092, cap=40960") has prevented the request from succeeding

W0920 04:58:43.401539 1 reflector.go:442] pkg/mod/k8s.io/client-go@v0.24.0/tools/cache/reflector.go:167: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=882, cap=20480") has prevented the request from succeeding
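
For reference, watch failures like these can also be surfaced explicitly with a watch error handler, which makes it easier to correlate each dropped watch with the memory spike that follows. A minimal sketch, meant for the same file as the sample above (SetWatchErrorHandler, Reflector, and DefaultWatchErrorHandler are from k8s.io/client-go/tools/cache; the log wording is mine):

// registerWatchLogging logs every dropped watch explicitly; each such failure
// is followed by a full relist, which is when the memory spike appears.
// It must be called before factory.Start, otherwise SetWatchErrorHandler returns an error.
func registerWatchLogging(informer cache.SharedIndexInformer) {
	if err := informer.SetWatchErrorHandler(func(r *cache.Reflector, err error) {
		glog.Warningf("watch dropped, a full relist will follow: %v", err)
		cache.DefaultWatchErrorHandler(r, err) // keep the default logging and backoff behaviour
	}); err != nil {
		glog.Errorf("failed to set watch error handler: %v", err)
	}
}

In the sample above this would be called as registerWatchLogging(podInformer), right after AddEventHandler and before factory.Start.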

Memory used after error occurs:
[screenshot: Snipaste_2024-09-20_10-34-30]

As I understand it, when a network anomaly occurs, the informer re-pulls the full set of the above resources from kube-apiserver. At that moment, because the old and new resource objects exist at the same time, memory surges; after a while GC reclaims the old objects and memory falls back. But I don't understand why the memory stays higher than it was before the error occurred.
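
One mitigation I am considering (a sketch only, and I am not sure it addresses the root cause) is trimming cached objects before they are stored, so that the old and new copies held during a relist are both smaller. SetTransform exists on SharedInformer in recent client-go releases, though I am not certain it is in v0.24.0 exactly. The sketch assumes the same file as the sample above plus corev1 "k8s.io/api/core/v1"; dropping managedFields is my own choice:

// trimPods installs a transform that strips fields the event handlers never read,
// so every Pod object kept in the informer cache is smaller.
// It must be called before the informer is started.
func trimPods(informer cache.SharedIndexInformer) {
	if err := informer.SetTransform(func(obj interface{}) (interface{}, error) {
		if pod, ok := obj.(*corev1.Pod); ok {
			pod.ManagedFields = nil // usually the largest field that handlers never read
		}
		return obj, nil
	}); err != nil {
		glog.Errorf("failed to set transform: %v", err)
	}
}

This would be called as trimPods(podInformer) before factory.Start, but it only shrinks the cache; it does not explain the elevated baseline after a relist.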

Is this a bug? How can I fix it? What should I do?


Labels

lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
