[CASSANDRA-20457] Limit number of OOM heap dumps to avoid full disk issue #4166

Status: Open. Wants to merge 2 commits into trunk.
6 changes: 6 additions & 0 deletions bin/cassandra
@@ -188,6 +188,10 @@ launch_service()
cassandra_parms="$cassandra_parms -Dcassandra-pidfile=$pidpath"
fi

if [ "x$clean_heap_dump_files_defined" = "x1" ]; then
@smiklosovic (Contributor) commented on Jun 2, 2025:
@gowa, why don't you just remove the x from both sides? I have never seen this construct. I would also rename it to should_clean_heap_dump_files instead of using the _defined suffix.
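For context on this review comment: the `x` prefix is a legacy Bourne-shell defensive idiom; with both operands quoted, a plain comparison behaves identically in any POSIX shell. A minimal standalone sketch (variable name `flag` is illustrative):

```shell
#!/bin/sh
flag=1

# Legacy defensive idiom: prefix both operands with a literal "x" so that
# neither side can be empty or start with "-" and confuse old test(1) builds.
if [ "x$flag" = "x1" ]; then
    echo "legacy form matched"
fi

# Modern equivalent: with both sides quoted, POSIX test(1) parses this
# unambiguously, so the "x" prefix is redundant.
if [ "$flag" = "1" ]; then
    echo "modern form matched"
fi
```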

clean_heap_dump_files
fi

# The cassandra-foreground option will tell CassandraDaemon not
# to close stdout/stderr, but it's up to us not to background.
if [ "x$foreground" != "x" ]; then
@@ -247,6 +251,8 @@ while true; do
;;
-H)
properties="$properties -XX:HeapDumpPath=$2"
# disable automatic heap dump files management as HeapDumpPath was overridden
CASSANDRA_HEAPDUMP_CLEAN=0
shift 2
;;
-E)
38 changes: 38 additions & 0 deletions conf/cassandra-env.sh
@@ -199,6 +199,44 @@ if [ "x$CASSANDRA_HEAPDUMP_DIR" = "x" ]; then
fi
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"

# Cassandra heap dump file management options:
# if not set externally, this script assigns defaults:
# - enable heap dump file clean-up
# - keep the last (newest) 2 files
# - keep the 1 oldest file to help identify the first OOM incident
if [ "x$CASSANDRA_HEAPDUMP_CLEAN" = "x" ]; then
CASSANDRA_HEAPDUMP_CLEAN=1
fi
if [ "x$CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES" = "x" ]; then
CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES=2
fi
if [ "x$CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES" = "x" ]; then
CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES=1
@smiklosovic (Contributor) commented on Jun 2, 2025:
@gowa Why do we even have this property? "keep newest n files" makes sense to me, but why would we want to keep the oldest n files? Under what circumstances would an operator want to keep the oldest around and remove the newest? Don't you think that keeping just the latest one (or the few latest) makes more sense? This complicates the logic unnecessarily.

I would also rename the properties to have "oldest" / "newest" in them. I am not sure "last" and "first" will be understood correctly here: does "last" mean newest or oldest? It would be good if the properties' behavior were clear without consulting the documentation above.

Author replied:

@smiklosovic, in my experience it was often important to see and analyze the first occurrence of the issue, so you know the time and can think about what changed in the process and led to the OOM, while a recurrence often happens 'automatically'.
It is a heuristic anyway. I suggested this property based on my gut feeling; I felt it would be beneficial to have the very first heap dump in a series.

It can be removed and awk can be eliminated. I suggested it as an option; if it does not look right, I will drop it.

Agree about last/first.

@smiklosovic (Contributor) replied:
@netudima, have you thought about having two properties to keep both newest and oldest?

fi

clean_heap_dump_files()
{
if [ "$CASSANDRA_HEAPDUMP_CLEAN" -eq 1 ] && \
[ "$CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES" -ge 0 ] && \
[ "$CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES" -ge 0 ] && \
[ -d "$CASSANDRA_HEAPDUMP_DIR" ]; then
# find all files under CASSANDRA_HEAPDUMP_DIR,
# sort by last modification date descending,
# print those, that need to be removed
ls -pt1 "$CASSANDRA_HEAPDUMP_DIR" | grep -v / | \
grep "^cassandra-.*-pid.*[.]hprof$" | \
awk "BEGIN{ f=0; }{ files[f]=\$0; f+=1; }END{
@smiklosovic (Contributor) commented:
I believe there is a simpler way to do this. You can use tail -n and head -n or similar.

Author replied:
Hi @smiklosovic! Thanks for your review. I will try to address your comments properly.
I decided to rely on awk because it is already used in this script, and because a head/tail solution is not portable (at least ChatGPT reported that head's negative line numbers cannot be used on macOS, and the switch on uname earlier in the script suggests portability matters), so I went with awk using only its basic features.

Author replied:
Kept awk; removed find, xargs, and head -n. Using ls -pt1 as suggested.

@smiklosovic (Contributor) commented on Jun 2, 2025:
If we simplify this patch to just one property as suggested above, that is, "keep the newest n and remove the rest", isn't it true that awk will no longer be necessary and we can get by with head -n N, N being the number of newest files to keep? head -n is portable.

for(i = $CASSANDRA_HEAPDUMP_KEEP_LAST_N_FILES; i < f - $CASSANDRA_HEAPDUMP_KEEP_FIRST_N_FILES; i++) {
print files[i];
}
}" | while IFS= read -r file; do
rm -f "$CASSANDRA_HEAPDUMP_DIR/$file"
done
fi
}
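The awk selection in the function above can be illustrated in isolation. In this standalone sketch (dump file names are hypothetical), five dumps are listed newest-first, as `ls -pt1` would emit them; keeping the 2 newest and the 1 oldest leaves the middle two to be printed for deletion:

```shell
#!/bin/sh
KEEP_NEWEST=2   # number of newest files to keep
KEEP_OLDEST=1   # number of oldest files to keep
# files[0] is the newest; the loop skips the first KEEP_NEWEST entries
# and stops short of the last KEEP_OLDEST entries.
printf '%s\n' dump5 dump4 dump3 dump2 dump1 | \
awk "BEGIN{ f=0; }{ files[f]=\$0; f+=1; }END{
    for (i = $KEEP_NEWEST; i < f - $KEEP_OLDEST; i++) {
        print files[i];
    }
}"
# prints dump3 and dump2: the candidates for removal
```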

clean_heap_dump_files_defined=1

# stop the jvm on OutOfMemoryError as it can result in some data corruption
# uncomment the preferred option
# ExitOnOutOfMemoryError and CrashOnOutOfMemoryError require a JRE greater or equals to 1.7 update 101 or 1.8 update 92
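As a sketch of what the commented-out options below this point typically look like (the -XX flag names are standard HotSpot options; exactly which lines ship in cassandra-env.sh is not shown in this diff):

```shell
# Write a heap dump, then terminate the JVM on the first OutOfMemoryError,
# so a supervisor can restart the node from a clean state:
# JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# JVM_OPTS="$JVM_OPTS -XX:+ExitOnOutOfMemoryError"
# Alternatively, produce a fatal error report (and a core dump, if enabled):
# JVM_OPTS="$JVM_OPTS -XX:+CrashOnOutOfMemoryError"
```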