forked from xujianming2017/bcc

commit fe430e5 (1 parent: 2187813)
Showing 4 changed files with 137 additions and 0 deletions.
@@ -0,0 +1,55 @@
.TH oomkill 8 "2016-02-09" "USER COMMANDS"
.SH NAME
oomkill \- Trace oom_kill_process(). Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B oomkill
.SH DESCRIPTION
This traces the kernel out-of-memory killer and prints basic details,
including the system load averages at the time of the OOM kill. This can
provide more context on the system state at the time: was it getting busier
or steady, based on the load averages? This tool may also be useful to
customize for investigations; for example, by adding other task_struct
details at the time of OOM.

This program is also a basic example of eBPF/bcc.

Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH EXAMPLES
.TP
Trace OOM kill events:
#
.B oomkill
.SH FIELDS
.TP
Triggered by ...
The process ID and process name of the task that was running when the OOM kill
was triggered; this can be the killed task itself.
.TP
OOM kill of ...
The process ID and name of the target process that was OOM killed.
.TP
loadavg
Contents of /proc/loadavg. The first three numbers are 1, 5, and 15 minute
load averages (where the average is an exponentially damped moving sum, and
those numbers are constants in the equation); then there is the number of
running tasks, a slash, and the total number of tasks; and then the last number
is the last PID to be created.
.SH OVERHEAD
Negligible.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
memleak(8)
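The loadavg field described in the man page is a single line of five whitespace-separated values. The Python sketch below splits such a line into named parts; the helper name `parse_loadavg` and its dictionary keys are illustrative assumptions, not part of oomkill (the tool prints the raw line). It parses a sample line copied from the example output instead of reading /proc/loadavg, so it also runs on systems without that file.

```python
def parse_loadavg(line):
    """Split one line of /proc/loadavg into its five fields.

    Hypothetical helper for illustration; oomkill itself prints the raw line.
    """
    avg1, avg5, avg15, tasks, last_pid = line.split()
    running, total = tasks.split("/")
    return {
        "avg1": float(avg1),        # 1 minute load average
        "avg5": float(avg5),        # 5 minute load average
        "avg15": float(avg15),      # 15 minute load average
        "running": int(running),    # currently runnable tasks
        "total": int(total),        # total number of tasks
        "last_pid": int(last_pid),  # most recently created PID
    }


# Sample line from the oomkill example output; on Linux you could instead
# pass open("/proc/loadavg").read().
fields = parse_loadavg("0.99 0.39 0.30 3/282 22724")
print(fields)
```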
@@ -0,0 +1,42 @@
#!/usr/bin/env python
#
# oomkill   Trace oom_kill_process(). For Linux, uses BCC, eBPF.
#
# This traces the kernel out-of-memory killer, and prints basic details,
# including the system load averages. This can provide more context on the
# system state at the time of OOM: was it getting busier or steady, based
# on the load averages? This tool may also be useful to customize for
# investigations; for example, by adding other task_struct details at the time
# of OOM.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 09-Feb-2016   Brendan Gregg   Created this.

from bcc import BPF
from time import strftime

# linux stats
loadavg = "/proc/loadavg"

# initialize BPF
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/oom.h>
void kprobe__oom_kill_process(struct pt_regs *ctx, struct oom_control *oc,
    struct task_struct *p, unsigned int points, unsigned long totalpages)
{
    bpf_trace_printk("OOM kill of PID %d (\\"%s\\"), %lu pages\\n", p->pid,
        p->comm, totalpages);
}
""")

# print output
print("Tracing oom_kill_process()... Ctrl-C to end.")
while 1:
    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    with open(loadavg) as stats:
        avgline = stats.read().rstrip()
    print("%s Triggered by PID %d (\"%s\"), %s, loadavg: %s" % (
        strftime("%H:%M:%S"), pid, task, msg, avgline))
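The final print statement joins the trace_fields() output with the /proc/loadavg line to produce one line per OOM kill. Below is a minimal sketch of just that formatting step, pulled into a hypothetical `format_event()` helper so it runs without BPF or root. It assumes that under Python 3 bcc's trace_fields() may return the task name and message as bytes, which would otherwise print as `b'...'`; the hypothetical `_text()` helper decodes bytes and passes str through unchanged. The sample values come from the example output.

```python
def _text(value):
    # bcc may hand back bytes under Python 3; decode those, pass str through.
    if isinstance(value, bytes):
        return value.decode("utf-8", "replace")
    return value


def format_event(timestamp, pid, task, msg, avgline):
    """Build one oomkill output line (hypothetical helper)."""
    return '%s Triggered by PID %d ("%s"), %s, loadavg: %s' % (
        timestamp, pid, _text(task), _text(msg), avgline)


line = format_event(
    "21:03:39", 3297, b"ntpd",
    b'OOM kill of PID 22516 ("perl"), 3850642 pages',
    "0.99 0.39 0.30 3/282 22724")
print(line)
```

With these inputs the result matches the first line of the example output exactly.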
@@ -0,0 +1,39 @@
Demonstrations of oomkill, the Linux eBPF/bcc version.


oomkill is a simple program that traces the Linux out-of-memory (OOM) killer,
and shows basic details on one line per OOM kill:

# ./oomkill
Tracing oom_kill_process()... Ctrl-C to end.
21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
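Because each OOM kill produces one line with a fixed layout, the output is easy to post-process. A sketch using Python's `re` module; the pattern and its group names are assumptions written against the two sample lines above, not a format the tool guarantees.

```python
import re

# Pattern written against the sample output above (an assumption, not a spec).
OOM_LINE = re.compile(
    r'^(?P<time>\d\d:\d\d:\d\d) '
    r'Triggered by PID (?P<trigger_pid>\d+) \("(?P<trigger_comm>[^"]*)"\), '
    r'OOM kill of PID (?P<victim_pid>\d+) \("(?P<victim_comm>[^"]*)"\), '
    r'(?P<pages>\d+) pages, '
    r'loadavg: (?P<loadavg>.*)$'
)

sample = ('21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 '
          '("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932')

m = OOM_LINE.match(sample)
if m:
    event = m.groupdict()
    # The triggering task and the victim can be the same process, as here.
    print(event["trigger_pid"] == event["victim_pid"], event["pages"])
```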
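
The first line shows that PID 22516, with process name "perl", was OOM killed.
The 3850642 pages (usually 4 Kbytes per page) is the totalpages value passed
to oom_kill_process(): the total memory the OOM killer was considering (by
default, RAM plus swap), not the size of the killed process, which is
consistent with both lines showing the same value. This OOM kill happened to
be triggered by PID 3297, process name "ntpd", doing some memory allocation.
In the second line, the "perl" process that triggered the OOM kill (PID 22517)
was itself the victim.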
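To express the page count from the sample output in bytes, multiply by the page size. A sketch assuming 4 Kbyte (4096-byte) pages, the usual size mentioned above; on the traced system the actual value can be checked with `os.sysconf("SC_PAGE_SIZE")`.

```python
PAGE_SIZE = 4096  # assumed 4 Kbyte pages; check os.sysconf("SC_PAGE_SIZE")


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB (2**30 bytes)."""
    return pages * page_size / 2**30


total = pages_to_gib(3850642)
print("%.2f GiB" % total)  # prints 14.69 GiB
```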

The system log (dmesg) shows pages of details and system context about an OOM
kill. What it currently lacks, however, is context on how the system had been
changing over time. I've seen OOM kills where I wanted to know if the system
was at steady state at the time, or if there had been a recent increase in
workload that triggered the OOM event. oomkill provides some context: at the
end of the line is the load average information from /proc/loadavg. For both
of the OOM kills here, we can see that the system was getting busier at the
time (a higher 1 minute "average" of 0.99, compared to the 15 minute "average"
of 0.30).
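Why does a recent increase in workload show up as a 1 minute average well above the 15 minute average? Linux load averages are exponentially damped: about every 5 seconds, each average keeps a fraction exp(-5/60), exp(-5/300) or exp(-5/900) of its old value and moves the rest of the way toward the current number of active (runnable or uninterruptible) tasks. The sketch below models this in floating point; the kernel uses fixed-point arithmetic, so treat it as an approximation. It starts from an idle system and adds one active task for one minute.

```python
import math

SAMPLE_SECS = 5  # the kernel recomputes the load averages about every 5 seconds


def decay(window_secs):
    # Per-sample damping factor for a 1, 5 or 15 minute average.
    return math.exp(-SAMPLE_SECS / window_secs)


def update(load, active, factor):
    # Exponentially damped moving average: keep `factor` of the old value and
    # blend in the rest from the current number of active tasks.
    return load * factor + active * (1 - factor)


avg1 = avg5 = avg15 = 0.0  # idle system
for _ in range(60 // SAMPLE_SECS):  # one minute with 1 active task
    avg1 = update(avg1, 1, decay(60))
    avg5 = update(avg5, 1, decay(300))
    avg15 = update(avg15, 1, decay(900))

print("%.2f %.2f %.2f" % (avg1, avg5, avg15))  # prints 0.63 0.18 0.06
```

After one minute the 1 minute average has covered about 63% (1 - 1/e) of the step, while the 15 minute average has barely moved, which is the "getting busier" pattern seen in the oomkill output.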

oomkill can also be the basis of other tools and customizations. For example,
you can edit it to include other task_struct details from the target PID at
the time of the OOM kill.


The following commands can be used to test this program by invoking a
memory-consuming process that exhausts system memory and is OOM killed:

sysctl -w vm.overcommit_memory=1              # always overcommit
perl -e 'while (1) { $a .= "A" x 1024; }'     # eat all memory

WARNING: This exhausts system memory after disabling some overcommit checks.
Only test in a lab environment.