[CASSANDRA-20457] Limit number of OOM heap dumps to avoid full disk issue #4166
base: trunk
Conversation
…ssue

New env vars are introduced:

CASSANDRA_HEAPDUMP_CLEAN:
  1 - clean (default)
  otherwise - do not clean
CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES:
  -1 - no heap dump files will be removed
  0 - all heap dump files will be removed
  N > 0 - up to N most recent files will be kept
  default: 2
CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES:
  -1 - no heap dump files will be removed
  0 - all heap dump files will be removed
  N > 0 - up to N least recent files will be kept
  default: 1

Currently, only 100 heap dump files are searched for, therefore setting CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES or CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES greater than 100 makes no sense.
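The keep-first-M / keep-last-N semantics above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: `prune_heap_dumps`, `keep_first`, and `keep_last` are illustrative names, and for simplicity this sketch disables removal entirely if either knob is -1.

```shell
# Hypothetical sketch of the keep-first-M / keep-last-N cleanup described
# above; prune_heap_dumps is an illustrative helper, not the actual patch.
prune_heap_dumps() {
    dir=$1
    keep_first=$2   # oldest files to keep; -1 disables removal here
    keep_last=$3    # newest files to keep; -1 disables removal here
    [ "$keep_first" -ge 0 ] && [ "$keep_last" -ge 0 ] || return 0
    tmp=$(mktemp)
    # ls -t lists newest first; cap the search at 100 files as the patch does
    ls -t "$dir"/cassandra-*-pid*.hprof 2>/dev/null | head -n 100 > "$tmp"
    total=$(wc -l < "$tmp")
    # rows 1..keep_last are the newest, the final keep_first rows the oldest;
    # everything in between gets deleted
    awk -v last="$keep_last" -v first="$keep_first" -v total="$total" \
        'NR > last && NR <= total - first' "$tmp" |
    while IFS= read -r f; do rm -f -- "$f"; done
    rm -f "$tmp"
}
```

For example, `prune_heap_dumps "$CASSANDRA_HEAPDUMP_DIR" 1 2` would keep the single oldest and the two newest dumps and delete the rest.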
056692e to af5b7d3
find "$CASSANDRA_HEAPDUMP_DIR" -name "cassandra-*-pid*.hprof" -type f | \
head -n 100 | \
xargs ls -t1 2>/dev/null | \
awk "BEGIN{ f=0; }{ files[f]=\$0; f+=1; }END{
I believe there is a simpler way to do this. You can use tail -n and head -n or similar.
Hi @smiklosovic! Thanks for your review. Will try to address your comments properly.
I decided to rely on awk because it is already used in this script, and because the head/tail solution is not portable (at least ChatGPT reported that head's negative line counts cannot be used on macOS, while the switch on uname earlier in the script suggests portability is expected), so I went with awk, using only its basics.
Kept awk; removed find, xargs, and head -n. Using ls -pt1 as suggested.
If we simplify this patch to just one property as suggested above, that is, "keep newest n and remove the rest", isn't it true that awk will not be necessary anymore and we can do it with head -n N, N being the number of newest files to keep? head -n is portable.
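A minimal sketch of the "keep newest N, remove the rest" variant discussed here. Note that head -n N on ls -t output selects the files to *keep*, so the complement for deletion can be taken with POSIX tail -n +K instead; keep_newest is an illustrative name, not from the patch.

```shell
# Hypothetical sketch: keep the N newest heap dumps, delete the rest.
keep_newest() {
    dir=$1
    n=$2   # number of newest dumps to keep
    # ls -t lists newest first; tail -n +K (POSIX) prints from line K onward,
    # i.e. everything after the N files we want to keep
    ls -t "$dir"/cassandra-*-pid*.hprof 2>/dev/null |
    tail -n +"$((n + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
```

For example, `keep_newest "$CASSANDRA_HEAPDUMP_DIR" 2` would delete everything except the two most recent dumps.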
    CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES=2
fi
if [ "x$CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES" = "x" ]; then
    CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES=1
@gowa Why do we even have this property? "keep newest n files" makes sense to me, but why would we want to keep the oldest n files? Under what circumstance would an operator want to keep the oldest around and remove the newest? Don't you think that keeping just the latest one (or a few latest ones) makes more sense? This complicates the logic unnecessarily.
I would also rename the properties to have "oldest / newest" in them. I am not sure "last" and "first" would be understood correctly here. Does "last" mean newest or oldest? It would be good to know how the properties function without consulting the documentation above.
@smiklosovic, in my experience it was often important to see and analyze the first occurrence of the issue, so you know the time and can think about what changed in the process and led to the OOM, while recurrences often happen 'automatically'.
It is a heuristic anyway. I suggested this property based on my gut feeling; I felt it would be beneficial to have the very first heap dump in the series.
It can be removed, and awk can be eliminated with it. I suggested it as an option; if it does not look right, I will drop it.
Agree about last/first.
@netudima have you thought about having two properties to keep both the newest and the oldest dumps?
@@ -188,6 +188,10 @@ launch_service()
    cassandra_parms="$cassandra_parms -Dcassandra-pidfile=$pidpath"
fi

if [ "x$clean_heap_dump_files_defined" = "x1" ]; then
@gowa why don't you just remove the x from both sides? I have never seen this construct. I would also rename it to should_clean_heap_dump_files instead of the _defined suffix.
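For context, the "x" prefix is a legacy guard so that empty or option-like values cannot be mistaken for operators by very old test(1) implementations; with POSIX test and quoted variables it is generally unnecessary. A small sketch comparing the idioms (the variable names here are illustrative, not from the patch):

```shell
var=""

# Legacy idiom: prefix both sides with "x" so an empty value cannot
# confuse ancient test(1) implementations
if [ "x$var" = "x" ]; then
    legacy_empty=1
fi

# Modern POSIX equivalent: quoting alone is sufficient; -z tests for empty
if [ -z "$var" ]; then
    posix_empty=1
fi

# Direct value comparison, as suggested for the 0/1 flag
should_clean=1
if [ "$should_clean" = "1" ]; then
    will_clean=1
fi
```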
test script