Add automatic crash detection and restarts#257
Add automatic crash detection and restarts#257michael2847 wants to merge 3 commits intoMinecraftServerControl:mainfrom michael2847:main
Conversation
Add automatic crash detection and restarts. Need to remove flock dependency
update wording
|
|
|
I agree with @zanix, 0 should be disabled and 1 enabled. It would be confusing otherwise. A per world option in addition to the global option would be best. Default should be disabled I think. Sorry about the slow review. I've been busy. |
| echo 0 | ||
| else | ||
| return 1 | ||
| echo 1 |
There was a problem hiding this comment.
I don't think this change is necessary. It forces a lot of other changes that make the code less readable.
E.g.
- while serverRunning $1; do
+ while [ $(serverRunning $1) -eq 0 ]; do
There was a problem hiding this comment.
Usually 0 means false, and 1 means true. UNIX land returns 0 when a command successfully ran and a value >0 for error states. This return trick was used here to make the code more readable, but the values are inverted from what you mean when changed to echo.
There was a problem hiding this comment.
Thanks for clarifying this. It's confusing because, as you said, in UNIX land 0 is returned when the command ran successfully. In my head, I associate success with being "TRUE", so it makes me think it should return 1 based on the 0=false and 1=true convention, but it doesn't.... But anyway, I think I understand now, thanks
There was a problem hiding this comment.
We should probably add comments to the serverRunning method to note this UNIX oddity.
serverRunning() {
# Try to determine if the world is running.
if [ $(getJavaPID "$1") -gt 0 ]; then
- return 0
+ return 0 # Return a true value.
else
- return 1
+ return 1 # Return a false value.
fi
}
There was a problem hiding this comment.
@sandain do you know why if [ serverRunning $1 ] doesn't work? it only works when i omit the brackets and I can't figure out how to do it if I want to negate the expression
There was a problem hiding this comment.
You need spaces next to the square brackets.
if [ serverRunning $1 ]; then
to negate:
if [ ! serverRunning $1 ]; then
There was a problem hiding this comment.
Although, I think the brackets are optional.
if ! serverRunning $1; then
|
|
||
| printf "[$(timestamp)] [INFO]: Server monitoring started for $1. Server PID: $(getJavaPID $1). Monitor PID: $MONITOR_PID.\n" | ||
| # Run monitor until the server is stopped and the PID file is removed (i.e. clean shutdown). | ||
| until [ $(serverRunning $1) -eq 1 ] && [ ! -f "$WORLDS_LOCATION/$1.pid" ]; do |
There was a problem hiding this comment.
This line would become:
until [ ! serverRunning $1 ] && [ ! -f "$WORLDS_LOCATION/$1.pid" ]; do
Which I think is more readable.
|
I'm going to close this PR and re-open one that is up-to-date with the current code. |
Automatic crash detection and restarts
This is a preliminary PR for #225 that enables the option to restart worlds after a crash is detected using the
mscs-enable-restart-after-crashflag (viamscs.defaults). I set the default to be 0 (true), but we can change the default to be 1 (false) if you think that's better. Additionally, we could add the functionality for it to be available on a per world basis viamscs.propertiesif you think that's a good idea as well.How it works:
When a world is started, the
startServerMonitor()function is called. This checks if themscs-enable-restart-after-crashflag is enabled. If so, it checks if there is another instance of the server monitor already running (which would occur if the server was restarted after a crash). To do this check, it usesflock. I tried for several hours to use thecreateLockFile()function, but it didn't work for me for whatever reason--not sure if it was because I was doing something wrong or if it was just an issue with the function itself. @sandain If you're able to get thecreateLockFile()function working instead of using flock, that'd be good so we don't have to useflockas a dependency.If the monitor is not running, the
startServerMonitor()function essentially calls thenohupcommand on theserverMonitor()function and runs it in the background, so you can close the terminal and the crash monitor will keep running.nohupdoesn't allow you to pass functions, but I managed to recreate the functionality through some research. The PID of the monitor is stored in$WORLD_DIR/monitor.pid.The
serverMonitor()function is essentially a loop that detects if the server has crashed by a combination of the a) the existence of the$WORLD.pidfile and b) whether the server is running via theserverRunning()function. If the PID file exists but the server isn't running, then this means the server probably crashed (since MSCS hasn't removed the PID file).Additionally, I had to modify the serverRunning() function to use
echoinstead ofreturnto return values so I could call the function in the test condition of theuntilloop and check the return value using$()(I didn't think I could do this the way the function was, but maybe I'm wrong). I subsequently modified all of the references in the script to call serverRunning using$().There is also a log for the monitor written to
$WORLD_DIR/logs/mscs.monitor.log. This log is deleted after$LOG_DURATIONto make sure it doesn't get too large. This check occurs in thestartServerMonitor()function. There is also a log created calledlast-start-status.logthat stores a message to be displayed later in themscs statuscommand.When the
mscs stopcommand is called, it sends a request to kill the server monitor using thestopServerMonitor()function. At this point the monitor should be killed.When a user runs the
mscs statusfor the first time after a server has crashed, they will see a notice that the world has crashed:This notice goes away after the first time. The code behind this is simply a check if whether or not the
last-start-status.logfile exists and is not empty. If this is true, the message is displayed and the file is deleted afterwards (so the message is only displayed once).P.S.: I tested crashes on the code by doing the
/stopcommand in-game, I believe it should replicate what happens with an actual crash.Please let me know of any questions or recommendations you have or bugs you find. Thanks!!
Summary of TODO: