Subcycling leads to time drift leading to desync #1866
Description
Describe your setup
Operating system (e.g. Linux distribution and version): Ubuntu 00.00
preCICE Version: develop at 05541eb
Describe the problem
Using subcycling leads to a time drift, which can result in a desync of participants and a crash once only one participant decides that isCouplingOngoing() == false.
Problem by @BenjaminRodenberg:
max_time = 1.0
precice_dt = 0.2
solver_dt = precice_dt / 640  # = 0.0003125
t = 0
NUMERICAL_ZERO_DIFFERENCE = 1e-14
while abs(t - max_time) > NUMERICAL_ZERO_DIFFERENCE:
    i = 0
    t_loc = 0
    # subcycle until the local time reaches the end of the time window
    while abs(t_loc - precice_dt) > 10e-14:
        i += 1
        t_loc += solver_dt
    # advance the global time by the accumulated (not the nominal) window size
    t += t_loc
    print(f"Reached {t} after {i} time steps")
    if t > max_time:
        break
We are using the computed time window to advance timeWindowStart in the BaseCouplingScheme. The start should always be moved by _timeWindowSize if available.
This leads to a drift which builds up over time.
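A minimal sketch of how this drift can separate two participants, assuming each one advances its window start by its own accumulated substep sum. The helper `run_participant` and its `snap` flag are hypothetical names for illustration and are not part of the preCICE API.

```python
def run_participant(n_substeps, n_windows=5, window_size=0.2, snap=False):
    """Return the window start time after n_windows time windows.

    snap=False: advance by the accumulated substep sum (may drift).
    snap=True:  advance by window_size itself (same arithmetic for everyone).
    """
    solver_dt = window_size / n_substeps
    window_start = 0.0
    for _ in range(n_windows):
        computed_part = 0.0
        for _ in range(n_substeps):
            computed_part += solver_dt  # rounding error accumulates here
        window_start += window_size if snap else computed_part
    return window_start


p1 = run_participant(1)    # P1: one time step per window
p2 = run_participant(640)  # P2: 640 substeps per window
print(f"P1: {p1!r}, P2: {p2!r}, difference: {p2 - p1:.3e}")

# With snapping, both participants perform identical arithmetic.
print(run_participant(1, snap=True) == run_participant(640, snap=True))
```

Any nonzero difference in the first line of output means the two participants compare different times against max-time, so they can disagree on whether the coupling is still ongoing.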
Steps to reproduce
- Set max-time = 1.0 and time-window-size = 0.2. P1 does 1 time step per time window, P2 does 640 time steps of size 0.2/640.
- Run the simulation
- P1 finalizes while P2 crashes as it is waiting for data, but the communication was closed.
Expected behaviour
No drift, no crash.
Additional context
This is related to but not caused by #1788, which advances the time window size correctly.
This used to simply hang with the participant wait in finalize, which is now disabled by default (#1600).
There are 3 relevant cases at the end of a time window:
- the next time step is too small, so we move to the real end (Change valid-digits to min-timestep #1788)
- the next time step size is numerically 0 (this doesn't move to the next time window correctly)
- there is no time window size (nothing to correct)
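The three cases above could be told apart roughly as follows. This is only an illustrative sketch: `classify_window_end`, its parameters, and the returned labels are hypothetical, and `remaining` is assumed to be non-negative up to the tolerance.

```python
NUMERICAL_ZERO_DIFFERENCE = 1e-14  # tolerance value used in the Python snippet of this issue


def classify_window_end(remaining, min_time_step, has_time_window_size=True):
    """Classify the situation near the end of a time window.

    remaining:      time left until the end of the time window
    min_time_step:  smallest step worth taking (hypothetical parameter)
    """
    if not has_time_window_size:
        return "no-window-size"    # nothing to correct
    if abs(remaining) <= NUMERICAL_ZERO_DIFFERENCE:
        return "numerically-zero"  # the case reported as drifting
    if remaining < min_time_step:
        return "too-small"         # move to the real end (#1788)
    return "inside-window"         # window not finished yet


for remaining in (0.1, 1e-10, 3e-15):
    print(remaining, classify_window_end(remaining, min_time_step=1e-8))
```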
Possible solution
Always "snap" the provided time of the last timestep to the end of the time window. This also requires correcting the time stamps of the data samples.
precice/src/cplscheme/BaseCouplingScheme.cpp
Lines 298 to 325 in 05541eb
if (reachedEndOfTimeWindow()) {
  _timeWindows += 1; // increment window counter. If not converged, will be decremented again later.
  if (hasTimeWindowSize()) {
    //If preCICE has stopped before the end of the time window we have to duplicate the last available sample and put it at the end of the time window.
    // We have to exclude the case where coupling scheme does not have a time window size, since this will cause problem with the interpolation later on
    if (getNextTimeStepMaxSize() > math::NUMERICAL_ZERO_DIFFERENCE) {
      addTimeStepAtWindowEnd();
      // Update the _computedTimeWindowPart in order to keep the time within preCICE synchronised
      // Has to be done before the second exchange, since the serial coupling scheme moves to the new time window before updating _timeWindowStartTime
      _computedTimeWindowPart = _timeWindowSize;
    } else {
      // snap final data samples to the end
      snapDataToTimeWindowEnd();
      _computedTimeWindowPart = _timeWindowSize;
    }
  }
  exchangeFirstData();
}
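The snapping idea can be sketched outside of preCICE as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `snap_to_window_end`, the `(time, value)` sample layout, and the default `tolerance` are all hypothetical.

```python
def snap_to_window_end(samples, window_start, window_size, tolerance=1e-12):
    """Return (snapped_samples, computed_part) for one finished time window.

    samples: list of (time, value) pairs sorted by time. The last sample is
             assumed to lie numerically at the end of the time window.
    """
    window_end = window_start + window_size
    last_time, last_value = samples[-1]
    if abs(last_time - window_end) > tolerance:
        raise ValueError("last sample is not at the end of the time window")
    # Replace the drifted time stamp by the exact end of the time window.
    snapped = samples[:-1] + [(window_end, last_value)]
    # Report the full window size as computed, so no drift is carried over.
    return snapped, window_size


solver_dt = 0.2 / 640
t, samples = 0.0, []
for step in range(640):
    t += solver_dt
    samples.append((t, float(step)))  # dummy payload

snapped, computed_part = snap_to_window_end(samples, 0.0, 0.2)
print(snapped[-1][0] == 0.2, computed_part == 0.2)
```

Only the last time stamp changes; earlier samples keep their original times, so interpolation inside the window is unaffected apart from the corrected end point.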