Closed
Description
I found the CPU metrics of a container will be 0 after running for several hours. The processes and cgroup of the container changes.
I can reproduce this using kops 1.5.3:
$ kops version
Version 1.5.3 (git-46364f6)
$ kops create cluster --cloud=aws --zones ap-northeast-1a ${NAME}
# just changed the nodes number to 1 from 2
$ kops update cluster ${NAME} --yes
(AMI: k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-05-02 (ami-56695f31))
Then I save some useful(maybe) info:
$ sudo docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 3
Server Version: 1.12.3
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.65-k8s
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.862 GiB
Name: ip-172-20-36-154
ID: J2K6:5YGH:LAHZ:DMRU:W2XB:OILL:EH53:WANY:UK4J:7UIY:ULVP:44PQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
Insecure Registries:
127.0.0.0/8
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
04ce9bd6939b gcr.io/google_containers/kube-proxy:v1.5.7 "/bin/sh -c 'echo -99" 16 minutes ago Up 16 minutes k8s_kube-proxy.2c2d203f_kube-proxy-ip-172-20-36-154.ap-northeast-1.compute.internal_kube-system_7b35e1830aa0d2f2e3e3bf9fbae75ac5_3da5329f
$ sudo docker inspect -f '{{.State.Pid}}' 04ce9bd6939b
1510
$ cat /proc/1510/cgroup
10:pids:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
9:perf_event:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
8:net_cls,net_prio:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
7:freezer:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
6:devices:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
5:memory:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
4:blkio:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
3:cpu,cpuacct:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
2:cpuset:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
1:name=systemd:/system.slice/docker.service/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
$ sudo docker top 04ce9bd6939b
UID PID PPID C STIME TTY TIME CMD
root 1510 1495 0 03:10 ? 00:00:00 /bin/sh -c echo -998 > /proc/$$/oom_score_adj && kube-proxy --kubeconfig=/var/lib/kube-proxy/kubeconfig --resource-container="" --master=https://api.internal.test0.llstestk8s.com --v=2 1>>/var/log/kube-proxy.log 2>&1
root 1520 1510 0 03:10 ? 00:00:02 kube-proxy --kubeconfig=/var/lib/kube-proxy/kubeconfig --resource-container= --master=https://api.internal.test0.llstestk8s.com --v=2
Then I run this script to monitor the cgroup changes:
#!/bin/bash
while :
do
l=$(grep docker /proc/1510/cgroup | wc -l)
if [ $l != 10 ]; then
# send notification to Slack
else
printf "."
fi
sleep 60
done
After about 4 hours(May 31 06:50 UTC), I got the slack notification, and I checked the corresponding info and some logs:
$ cat /proc/1510/cgroup
10:pids:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
9:perf_event:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
8:net_cls,net_prio:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
7:freezer:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
6:devices:/
5:memory:/
4:blkio:/
3:cpu,cpuacct:/
2:cpuset:/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
1:name=systemd:/system.slice/docker.service/docker/04ce9bd6939b0921b31506cc8ed9e57bb5e85697b93422154957d2ec927cce16
$ sudo docker top 04ce9bd6939b
UID PID PPID C STIME TTY TIME CMD
$ less /var/log/syslog
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Stopping RPC Port Mapper.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Stopped target RPC Port Mapper.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Stopping LSB: RPC portmapper replacement...
May 31 06:50:18 ip-172-20-36-154 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
May 31 06:50:18 ip-172-20-36-154 rpcbind[23556]: Stopping rpcbind daemon....
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Stopped LSB: RPC portmapper replacement.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:18 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:18 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:19 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:19 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:19 ip-172-20-36-154 systemd[1]: Stopped Provide limited super user privileges to specific users.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Reloading.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: [/lib/systemd/system/docker.service:20] Unknown lvalue 'Delegate' in section 'Service'
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Starting LSB: RPC portmapper replacement...
May 31 06:50:20 ip-172-20-36-154 rpcbind[24955]: Starting rpcbind daemon....
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Started LSB: RPC portmapper replacement.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Starting RPC Port Mapper.
May 31 06:50:20 ip-172-20-36-154 systemd[1]: Reached target RPC Port Mapper.
The cAdvisor result is wrong now:
Any idea?
Metadata
Metadata
Assignees
Labels
No labels