Subcycling leads to time drift leading to desync #1866

Closed
@fsimonis

Description

Describe your setup

Operating system (e.g. Linux distribution and version): Ubuntu 00.00
preCICE Version: develop at 05541eb

Describe the problem

Using subcycling leads to a time drift, which can desynchronize the participants and cause a crash as soon as only one of them decides that isCouplingOngoing() == false.

Problem by @BenjaminRodenberg:

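max_time   = 1.0
precice_dt = 0.2
solver_dt  = precice_dt / 640  # = 0.0003125

t = 0
NUMERICAL_ZERO_DIFFERENCE = 1e-14

# Outer loop: one iteration per time window
while abs(t - max_time) > NUMERICAL_ZERO_DIFFERENCE:
    i = 0
    t_loc = 0
    # Inner loop: subcycle until the window end (note: 10e-14 == 1e-13)
    while abs(t_loc - precice_dt) > 10e-14:
        i += 1
        t_loc += solver_dt
    t += t_loc  # carries the rounding error of t_loc into the global time
    print(f"Reached {t} after {i} time steps")
    if t > max_time:
        break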

We are using the computed time window to advance timeWindowStart in the BaseCouplingScheme, which leads to a drift that builds up over time.
Instead, timeWindowStart should always be moved by _timeWindowSize if available.
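
The difference between the two ways of advancing the window start can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not preCICE code: the function name `window_end_times` and its parameters are hypothetical, and the subcycling loop runs a fixed number of steps instead of checking a tolerance.

```python
def window_end_times(window_size, substeps, n_windows, snap):
    """Return the time reached at the end of each time window.

    snap=False: advance by the accumulated sum of the substeps (may drift).
    snap=True:  advance by the nominal window size.
    """
    solver_dt = window_size / substeps
    t = 0.0
    times = []
    for _ in range(n_windows):
        t_loc = 0.0
        for _ in range(substeps):
            t_loc += solver_dt  # rounding errors accumulate here
        t += window_size if snap else t_loc
        times.append(t)
    return times


# P2 from the reproduction: 640 substeps per window of size 0.2
accumulated = window_end_times(0.2, 640, 5, snap=False)
snapped = window_end_times(0.2, 640, 5, snap=True)
print("accumulated:", accumulated[-1], "error:", abs(accumulated[-1] - 1.0))
print("snapped:    ", snapped[-1], "error:", abs(snapped[-1] - 1.0))
```

With a single step per window (P1 in the reproduction), both variants advance by exactly the same amount, so only the subcycling participant can drift away from the nominal window ends.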

Steps To Reproduce

  1. Set max-time = 1.0 and time-window-size = 0.2. P1 does 1 time step per time window, P2 does 640 time steps of size 0.2/640.
  2. Run the simulation
  3. P1 finalizes while P2 crashes: it is still waiting for data, but the communication has already been closed.

Expected behaviour

No drift, no crash.

Additional context

This is related to but not caused by #1788, which advances the time window size correctly.
This used to simply hang with the participant wait in finalize, which is now disabled by default (#1600).

There are 3 relevant cases at the end of a time window:

  1. the next time step is too small, so we move to the real end (see "Change valid-digits to min-timestep", #1788)
  2. the next time step size is numerically 0 (this doesn't move to the next time window correctly)
  3. there is no time window size (nothing to correct)
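
The three cases above can be told apart roughly as in the following Python sketch. All names here are hypothetical and chosen for illustration only: `classify_window_end`, the `min_time_step` parameter, and the returned labels are assumptions, and the tolerance value is simply the one used in the reproduction script.

```python
NUMERICAL_ZERO_DIFFERENCE = 1e-14  # tolerance taken from the reproduction script


def classify_window_end(remaining, min_time_step, window_size):
    """Classify the state at the end of a solver time step.

    remaining:     time left until the end of the current time window
    min_time_step: smallest step considered worth taking (assumed parameter)
    window_size:   nominal window size, or None if the scheme has none
    """
    if window_size is None:
        return "no-window-size"    # case 3: nothing to correct
    if abs(remaining) <= NUMERICAL_ZERO_DIFFERENCE:
        return "numerically-zero"  # case 2: needs snapping to the window end
    if remaining < min_time_step:
        return "too-small"         # case 1: move to the real end
    return "inside-window"         # not at the end of the window yet


print(classify_window_end(1e-16, 1e-10, 0.2))
print(classify_window_end(1e-12, 1e-10, 0.2))
print(classify_window_end(0.0, 1e-10, None))
```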

Possible solution

Always "snap" the provided time of the last timestep to the end of the time window. This also requires correcting the time stamps of the data samples.

Current implementation:

void BaseCouplingScheme::firstExchange()
{
  PRECICE_TRACE(_timeWindows, getTime());
  checkCompletenessRequiredActions();
  PRECICE_ASSERT(_isInitialized, "Before calling advance() coupling scheme has to be initialized via initialize().");
  _hasDataBeenReceived  = false;
  _isTimeWindowComplete = false;
  PRECICE_ASSERT(_couplingMode != Undefined);
  if (reachedEndOfTimeWindow()) {
    _timeWindows += 1; // increment window counter. If not converged, will be decremented again later.
    // If preCICE has stopped before the end of the time window we have to duplicate the last available sample and put it at the end of the time window.
    // We have to exclude the case where coupling scheme does not have a time window size, since this will cause problem with the interpolation later on
    if (getNextTimeStepMaxSize() > math::NUMERICAL_ZERO_DIFFERENCE && hasTimeWindowSize()) {
      addTimeStepAtWindowEnd();
      // Update the _computedTimeWindowPart in order to keep the time within preCICE synchronised
      // Has to be done before the second exchange, since the serial coupling scheme moves to the new time window before updating _timeWindowStartTime
      _computedTimeWindowPart = _timeWindowSize;
    }
    exchangeFirstData();
  }
}

Proposed change to the end-of-window branch:

if (reachedEndOfTimeWindow()) {
  _timeWindows += 1; // increment window counter. If not converged, will be decremented again later.

  if (hasTimeWindowSize()) {
    // If preCICE has stopped before the end of the time window we have to duplicate the last available sample and put it at the end of the time window.
    // We have to exclude the case where coupling scheme does not have a time window size, since this will cause problem with the interpolation later on
    if (getNextTimeStepMaxSize() > math::NUMERICAL_ZERO_DIFFERENCE) {
      addTimeStepAtWindowEnd();

      // Update the _computedTimeWindowPart in order to keep the time within preCICE synchronised
      // Has to be done before the second exchange, since the serial coupling scheme moves to the new time window before updating _timeWindowStartTime
      _computedTimeWindowPart = _timeWindowSize;
    } else {
      // snap final data samples to the end
      snapDataToTimeWindowEnd();
      _computedTimeWindowPart = _timeWindowSize;
    }
  }

  exchangeFirstData();
}
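
What snapping the final data sample could look like is sketched below in Python. This is not the preCICE implementation: the function `snap_samples_to_window_end` and the representation of samples as `(time, value)` tuples are assumptions made for illustration, and the sketch assumes the caller has already established that the last sample lies numerically at the window end.

```python
def snap_samples_to_window_end(samples, window_start, window_size):
    """Return a copy of samples whose last timestamp is the exact window end.

    samples: list of (time, value) tuples sorted by time (assumed layout).
    """
    if not samples:
        return []
    window_end = window_start + window_size
    snapped = list(samples)                 # do not modify the caller's list
    _, last_value = snapped[-1]
    snapped[-1] = (window_end, last_value)  # keep the value, fix the timestamp
    return snapped


# The last sample has drifted slightly away from the window end at 0.2
samples = [(0.1, "a"), (0.2 - 3e-15, "b")]
print(snap_samples_to_window_end(samples, window_start=0.0, window_size=0.2))
```

Only the timestamp of the last sample changes. The stored value is kept, which matches the idea of correcting time stamps without touching the data.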

Metadata

Labels

bug: preCICE does not behave the way we want and we should look into it (and fix it if possible)
