I run a Flask app on Gunicorn 20.0.4 and Kubernetes; my Python version is 3.7.5. I have a problem with CPU usage increasing over time.
After investigating stack traces with py-spy I noticed that the issue is caused by the Prometheus metrics library.
It doesn't clean up metric files from old workers, so over time the merge process takes more and more CPU.
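For context, the scrape path in prometheus_client's multiprocess mode re-reads and merges every `*.db` file under `prometheus_multiproc_dir` on each scrape, so the cost grows with the number of leftover files. A minimal sketch of such a `/metrics` handler (simplified, not our exact code):

```python
from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client import multiprocess

def metrics():
    # Build a fresh registry and merge all per-PID *.db files from
    # prometheus_multiproc_dir on every scrape; the more stale files
    # left behind by dead workers, the more work this does per request.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)
```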
After deleting the following files the CPU usage dropped significantly (a sketch of this cleanup follows the listing):
ll -lh
total 7.0M
drwxr-xr-x 2 root root 4.0K Jul 24 06:13 ./
drwxrwxrwt 1 root root 4.0K Jul 19 14:14 ../
-rw-r--r-- 1 root root 1.0M Jul 24 08:14 counter_106.db
-rw-r--r-- 1 root root 1.0M Jul 23 18:41 counter_112.db
-rw-r--r-- 1 root root 1.0M Jul 24 04:07 counter_118.db
-rw-r--r-- 1 root root 1.0M Jul 24 04:54 counter_136.db
-rw-r--r-- 1 root root 1.0M Jul 24 08:40 counter_142.db
-rw-r--r-- 1 root root 1.0M Jul 20 16:44 counter_16.db
-rw-r--r-- 1 root root 1.0M Jul 20 11:24 counter_17.db
-rw-r--r-- 1 root root 1.0M Jul 21 01:40 counter_18.db
-rw-r--r-- 1 root root 1.0M Jul 21 20:14 counter_40.db
-rw-r--r-- 1 root root 1.0M Jul 21 17:17 counter_52.db
-rw-r--r-- 1 root root 1.0M Jul 21 21:29 counter_58.db
-rw-r--r-- 1 root root 1.0M Jul 23 07:19 counter_70.db
-rw-r--r-- 1 root root 1.0M Jul 22 19:49 counter_82.db
-rw-r--r-- 1 root root 1.0M Jul 22 18:59 counter_88.db
-rw-r--r-- 1 root root 1.0M Jul 24 08:43 histogram_106.db
-rw-r--r-- 1 root root 1.0M Jul 24 04:15 histogram_112.db
-rw-r--r-- 1 root root 1.0M Jul 24 05:02 histogram_118.db
-rw-r--r-- 1 root root 1.0M Jul 24 08:43 histogram_136.db
-rw-r--r-- 1 root root 1.0M Jul 24 08:43 histogram_142.db
-rw-r--r-- 1 root root 1.0M Jul 20 16:46 histogram_16.db
-rw-r--r-- 1 root root 1.0M Jul 20 11:45 histogram_17.db
-rw-r--r-- 1 root root 1.0M Jul 21 01:51 histogram_18.db
-rw-r--r-- 1 root root 1.0M Jul 21 22:41 histogram_40.db
-rw-r--r-- 1 root root 1.0M Jul 21 17:45 histogram_52.db
-rw-r--r-- 1 root root 1.0M Jul 22 01:44 histogram_58.db
-rw-r--r-- 1 root root 1.0M Jul 23 07:37 histogram_70.db
-rw-r--r-- 1 root root 1.0M Jul 23 01:01 histogram_82.db
-rw-r--r-- 1 root root 1.0M Jul 22 23:40 histogram_88.db
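A rough sketch of that manual cleanup, assuming the file names keep the `<type>_<pid>.db` pattern shown above; note that this throws away the dead workers' accumulated counter and histogram values, and a recycled PID could make a stale file look live:

```python
import os
import re

def remove_stale_metric_files(directory):
    # Delete multiprocess metric files whose worker PID no longer exists.
    for name in os.listdir(directory):
        match = re.match(r"[a-z_]+_(\d+)\.db$", name)
        if not match:
            continue
        pid = int(match.group(1))
        try:
            os.kill(pid, 0)  # signal 0 only checks whether the process exists
        except ProcessLookupError:
            os.remove(os.path.join(directory, name))
```

Pointing this at `prometheus_multiproc_dir` from a periodic task keeps the directory from growing, at the cost of losing those samples.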
The issue is related to #275.
Can we avoid this somehow without periodically restarting the k8s pod? Maybe multiprocess mode should use the PID plus a UUID generated on the worker for file names, rather than just the PID, so that the master could remove/merge files from dead workers?
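For reference, the only supported cleanup hook I know of is Gunicorn's `child_exit` calling `multiprocess.mark_process_dead()`, as described in prometheus_client's multiprocess documentation. As far as I can tell it only removes the live-gauge files for the dead PID, so the counter_*.db and histogram_*.db files listed above still accumulate, which is why some merge-and-remove step in the master would help:

```python
# gunicorn.conf.py
from prometheus_client import multiprocess

def child_exit(server, worker):
    # Called by the Gunicorn master after a worker exits; lets
    # prometheus_client clean up that worker's live-gauge files.
    multiprocess.mark_process_dead(worker.pid)
```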