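lee-lib changed the title on Feb 15, 2022
from: How to add pod information when node-problem-detector sends an event to the apiServer after a container is OOM-killed, so that the specific pod that hit OOM can be identified
to: When a container is OOM-killed, how can node-problem-detector add pod information to the event it sends to the apiServer, so that the specific pod that hit OOM can be identified?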
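Reference: the Alibaba Cloud Container Service (ACK) documentation at https://help.aliyun.com/knowledge_detail/178479.html
That document states that, starting with the July 2020 image version registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f, pod information can be added to OOMKilling events.
I built my container from this version of the node-problem-detector image, but when I simulate an OOM kill, the event still carries no pod information, only Node-level information. The YAML file is as follows: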
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-problem-detector
  namespace: kube-system
  labels:
    app: node-problem-detector
spec:
  selector:
    matchLabels:
      app: node-problem-detector
  template:
    metadata:
      labels:
        app: node-problem-detector
    spec:
      containers:
      - name: node-problem-detector
        command:
        - /node-problem-detector
        - --logtostderr
        - --system-log-monitors=/config/kernel-monitor.json,/config/docker-monitor.json
        - --apiserver-override=http://192.168.1.228:8080?inClusterConfig=false
        image: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f
        resources:
          limits:
            cpu: 10m
            memory: 80Mi
          requests:
            cpu: 10m
            memory: 80Mi
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: log
          mountPath: /var/log
          readOnly: true
        - name: kmsg
          mountPath: /dev/kmsg
          readOnly: true
        # Make sure node problem detector is in the same timezone
        # with the host.
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
        - name: config
          mountPath: /config
          readOnly: true
      volumes:
      - name: log
        # Config `log` to your system log directory
        hostPath:
          path: /var/log/
      - name: kmsg
        hostPath:
          path: /dev/kmsg
      - name: localtime
        hostPath:
          path: /etc/localtime
      - name: config
        configMap:
          name: node-problem-detector-config
          items:
          - key: kernel-monitor.json
            path: kernel-monitor.json
          - key: docker-monitor.json
            path: docker-monitor.json
The log produced by the simulated OOM is shown below. Its involvedObject.kind is still Node, so no Pod information is available:
I0214 18:22:00.008987 1 mysql.go:73] {
  "metadata": {
    "name": "k8s-master.16d39fea6f67823e",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/events/k8s-master.16d39fea6f67823e",
    "uid": "9c9cdb3b-240a-40e8-9db2-b4490dfc4f42",
    "resourceVersion": "8981033",
    "creationTimestamp": "2022-02-14T10:21:58Z",
    "managedFields": [
      {
        "manager": "node-problem-detector",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2022-02-14T10:21:58Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:count": {},
          "f:firstTimestamp": {},
          "f:involvedObject": {
            "f:kind": {},
            "f:name": {},
            "f:uid": {}
          },
          "f:lastTimestamp": {},
          "f:message": {},
          "f:reason": {},
          "f:source": {
            "f:component": {},
            "f:host": {}
          },
          "f:type": {}
        }
      }
    ]
  },
  "involvedObject": {
    "kind": "Node",
    "name": "k8s-master",
    "uid": "k8s-master"
  },
  "reason": "OOMKilling",
  "message": "Memory cgroup out of memory: Kill process 2235 (stress) score 0 or sacrifice child\nKilled process 2235 (stress) total-vm:515612kB, anon-rss:168728kB, file-rss:32kB, shmem-rss:0kB",
  "source": {
    "component": "kernel-monitor",
    "host": "k8s-master"
  },
  "firstTimestamp": "2022-02-14T10:21:58Z",
  "lastTimestamp": "2022-02-14T10:21:58Z",
  "count": 1,
  "type": "Warning",
  "eventTime": null,
  "reportingComponent": "",
  "reportingInstance": ""
}
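For anyone reproducing this, the event above can be checked programmatically. The following is only a minimal sketch: it parses a trimmed copy of the event, reports which object kind it is attached to, and pulls the killed PID and process name out of the kernel message. The names `summarize_oom_event` and `KILLED_RE` are hypothetical helpers written for this illustration and are not part of node-problem-detector or kube-eventer.

```python
import json
import re

# Trimmed copy of the event shown above; only the fields this sketch reads.
EVENT_JSON = r'''
{
  "involvedObject": {"kind": "Node", "name": "k8s-master", "uid": "k8s-master"},
  "reason": "OOMKilling",
  "message": "Memory cgroup out of memory: Kill process 2235 (stress) score 0 or sacrifice child\nKilled process 2235 (stress) total-vm:515612kB, anon-rss:168728kB, file-rss:32kB, shmem-rss:0kB"
}
'''

# Hypothetical pattern matching the "Killed process <pid> (<name>)" part
# of the message text seen in the event above.
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)")


def summarize_oom_event(event: dict) -> dict:
    """Return the involved object and the killed process found in the message."""
    involved = event.get("involvedObject", {})
    match = KILLED_RE.search(event.get("message", ""))
    return {
        "kind": involved.get("kind"),
        "name": involved.get("name"),
        "pid": int(match.group("pid")) if match else None,
        "process": match.group("comm") if match else None,
        # True only if the event is attached to a Pod rather than a Node.
        "has_pod_info": involved.get("kind") == "Pod",
    }


summary = summarize_oom_event(json.loads(EVENT_JSON))
print(summary)
```

With the event from this report, `has_pod_info` is false: the message identifies the killed process (PID 2235, `stress`), but nothing in the event names the pod that owned it.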
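Could the developers advise: when a container is OOM-killed, how should the node-problem-detector.yaml and kube-eventer.yaml files be configured so that the Pod information is available?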