Description
A Swift/T app may fail nondeterministically. The TURBINE_APP_RETRIES environment variable already allows a failed app to be retried, but only on the same host and MPI rank.
However, it may be more useful to retry the app on a different rank, in case one of the hosts is unavailable, there is a network issue, or similar. This pull request makes Turbine retry a failed app on a different MPI rank.
A simple test is added in which an external app (an infinite loop) runs in the background and Turbine attempts to kill it twice. The first kill succeeds. The second kill fails, and Turbine either exits immediately (if no retries are allowed) or retries TURBINE_APP_RETRIES times on different ranks and then exits.
Another simple test creates a file and then attempts to delete it twice. Creation and the first deletion succeed. The second deletion fails and is retried on different MPI ranks up to TURBINE_APP_RETRIES times, after which the script exits.
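The retry policy described above can be illustrated with a minimal Python sketch. This is not the Turbine implementation. It assumes that TURBINE_APP_RETRIES counts retries after the first attempt (so an app runs at most 1 + TURBINE_APP_RETRIES times) and that each retry moves round-robin to the next rank. `run_with_rank_retries`, `AppFailed`, and the simulated `delete_file` app are hypothetical names used only for illustration.

```python
from typing import Callable, List


class AppFailed(Exception):
    """Hypothetical: raised when an app fails on every attempted rank."""


def run_with_rank_retries(
    app: Callable[[int], bool],
    ranks: List[int],
    start_rank: int,
    retries: int,
) -> List[int]:
    """Run `app` on `start_rank`; on failure, retry up to `retries` times,
    each time on the next rank in round-robin order (assumed policy).

    Returns the ranks attempted, ending with the one that succeeded.
    Raises AppFailed if all 1 + retries attempts fail.
    """
    start = ranks.index(start_rank)
    attempted: List[int] = []
    for attempt in range(retries + 1):
        rank = ranks[(start + attempt) % len(ranks)]
        attempted.append(rank)
        if app(rank):  # app reports success (True) or failure (False)
            return attempted
    raise AppFailed(f"app failed on ranks {attempted}")


if __name__ == "__main__":
    existing = {"out.txt"}  # simulated filesystem state

    def delete_file(rank: int) -> bool:
        # Simulated "rm out.txt": succeeds only while the file exists.
        if "out.txt" in existing:
            existing.remove("out.txt")
            return True
        return False

    ranks = [0, 1, 2, 3]
    # First deletion succeeds on the first rank tried.
    print(run_with_rank_retries(delete_file, ranks, start_rank=1, retries=2))
    # Second deletion fails on ranks 1, 2 and 3, then gives up.
    try:
        run_with_rank_retries(delete_file, ranks, start_rank=1, retries=2)
    except AppFailed as err:
        print(err)
```

Moving to a different rank on each retry is what distinguishes this from retrying in place: a failure tied to one host or rank does not consume every attempt.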