[CASSANDRA-20457] Limit number of OOM heap dumps to avoid full disk issue #4166
base: trunk
Changes to `conf/cassandra-env.sh` (defaults for the new heap dump management options):

```shell
@@ -199,6 +199,44 @@ if [ "x$CASSANDRA_HEAPDUMP_DIR" = "x" ]; then
fi
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"

# Cassandra heap dump files management options:
# if not set externally, this script assigns defaults:
# - enable heap dump files clean up
# - keeping the last 2 files
# - keeping the 1 oldest file to help identify the first OOM incident
if [ "x$CASSANDRA_HEAPDUMP_CLEAN" = "x" ]; then
    CASSANDRA_HEAPDUMP_CLEAN=1
fi
if [ "x$CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES" = "x" ]; then
    CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES=2
fi
if [ "x$CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES" = "x" ]; then
    CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES=1
fi
```

(smiklosovic marked the conversation on `CASSANDRA_HEAPDUMP_CLEAN` as resolved.)

Review thread on `CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES`:

- @gowa Why do we even have this property? "Keep the newest n files" makes sense to me; why would we want to keep the oldest n files? Under what circumstances would an operator want to keep the oldest files around and remove the newest? Don't you think that keeping just the latest one (or the few latest ones) makes more sense? This complicates the logic unnecessarily. I would also rename the properties to contain "oldest" / "newest" in them. I am not sure "last" and "first" are understood correctly here — does "last" mean newest or oldest? It would be good to know how the properties work without consulting the documentation above.
- @smiklosovic, in my experience it is often important to see and analyze the first occurrence of the issue, so you know the time and can think about what changed in the process and led to the OOM, while recurrences often happen 'automatically'. It can be removed and the awk eliminated; I suggested it as an option. If it does not look right, I will drop it. Agree about last/first.
- @netudima, have you thought about having two properties, to keep both the newest and the oldest?
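The defaulting idiom in the hunk above ("assign only when unset") can be exercised in isolation; a minimal sketch, with the pre-set value of 5 chosen purely for illustration:

```shell
# Sketch of the "assign default only when unset" pattern used above.
# The variable is pre-set here to show the default being skipped;
# unset it and the default of 2 would be assigned instead.
CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES=5
if [ "x$CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES" = "x" ]; then
    CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES=2
fi
echo "$CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES"   # prints 5
```

Because the test only checks for an empty string, an operator-exported value always wins over the script default.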
```shell
clean_heap_dump_files()
{
    if [ "$CASSANDRA_HEAPDUMP_CLEAN" -eq 1 ] && \
       [ "$CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES" -ge 0 ] && \
       [ "$CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES" -ge 0 ] && \
       [ -d "$CASSANDRA_HEAPDUMP_DIR" ]; then
        # list all files under CASSANDRA_HEAPDUMP_DIR,
        # sorted by last modification date descending,
        # and print those that need to be removed
        ls -pt1 "$CASSANDRA_HEAPDUMP_DIR" | grep -v / | \
        grep "^cassandra-.*-pid.*[.]hprof$" | \
        awk "BEGIN{ f=0; }{ files[f]=\$0; f+=1; }END{
            for(i = $CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES; i < f - $CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES; i++) {
                print files[i];
            }
        }" | while IFS= read -r file; do
            rm -f "$CASSANDRA_HEAPDUMP_DIR/$file"
        done
    fi
}
```

Review thread on the cleanup pipeline:

- I believe there is a simpler way to do this. You can use …
- Hi @smiklosovic! Thanks for your review. Will try to address your comments properly.
- Kept awk; removed find, xargs, and head -n. Using `ls -pt1` as suggested.
- If we simplify this patch to just one property as suggested above, that is "keep newest n and remove the rest", is it not true that …
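The awk selection in `clean_heap_dump_files` can be exercised standalone. In this sketch, `KEEP_LAST`, `KEEP_FIRST`, and the `dumpN` names are illustrative stand-ins for the `CASSANDRA_HEAPDUMP_*` settings and real `.hprof` file names:

```shell
# Input simulates `ls -pt1` output: file names sorted newest-first.
# The awk body skips the first KEEP_LAST entries (newest) and the final
# KEEP_FIRST entries (oldest) and prints the middle ones — the deletion
# candidates that the real function pipes into `rm -f`.
KEEP_LAST=2
KEEP_FIRST=1
removed=$(printf '%s\n' dump5 dump4 dump3 dump2 dump1 | \
awk "BEGIN{ f=0; }{ files[f]=\$0; f+=1; }END{
    for(i = $KEEP_LAST; i < f - $KEEP_FIRST; i++) {
        print files[i];
    }
}")
echo "$removed"   # prints dump3 and dump2; dump5, dump4 (newest) and dump1 (oldest) survive
```

Note that with `KEEP_FIRST=0` the loop degenerates to "delete everything but the newest n", which is the single-property simplification discussed in the thread above.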
```shell
clean_heap_dump_files_defined=1

# stop the jvm on OutOfMemoryError as it can result in some data corruption
# uncomment the preferred option
# ExitOnOutOfMemoryError and CrashOnOutOfMemoryError require a JRE greater or equals to 1.7 update 101 or 1.8 update 92
```
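For context, the options the comment above refers to typically look like the following in `cassandra-env.sh`; this is a hedged sketch of that excerpt, not part of this diff, and the exact lines in the real file may differ:

```shell
# Uncomment exactly one of these (hypothetical excerpt for illustration):
# JVM_OPTS="$JVM_OPTS -XX:+ExitOnOutOfMemoryError"
# JVM_OPTS="$JVM_OPTS -XX:+CrashOnOutOfMemoryError"
# JVM_OPTS="$JVM_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
```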
Review comment on `clean_heap_dump_files_defined`:

- @gowa why don't you just remove the `x` from both sides? I have never seen this construct. I would also rename it to `should_clean_heap_dump_files` instead of using a `_defined` suffix.
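The `"x$VAR" = "x"` idiom is a historical guard against ancient shells mishandling empty or dash-leading operands; on any POSIX shell the reviewer's plainer form behaves identically. A small sketch contrasting the two (variable names are illustrative):

```shell
# Both tests detect an unset-or-empty variable the same way on POSIX shells.
VAR=""
old_style="no"
new_style="no"
if [ "x$VAR" = "x" ]; then old_style="empty"; fi   # legacy x-prefix idiom
if [ -z "$VAR" ]; then new_style="empty"; fi       # POSIX -z test
echo "$old_style $new_style"   # prints: empty empty
```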