Skip to content

Commit

Permalink
Use a longer backoff for CASE retries. (#25736)
Browse files Browse the repository at this point in the history
We have a common failure mode for Thread devices where the commissioner sends
Sigma1 prior to sending CommissioningComplete, the commissionee responds, but
the Sigma2 (and all its retries) never reach the commissioner.

When this happens, our existing CASE retries do not help because they happen too
quickly: the commissionee is waiting for Sigma3, and will ignore Sigma1 messages
until it gets the Sigma3 or times out.

To address this, include the time it would take the commissionee to time out in
our CASE retry delay for every other retry.  This allows us to retry quickly
once, then give the commissionee time to start listening again, then retry
quickly one or two times (depending on how many retries were requested), etc.
  • Loading branch information
bzbarsky-apple authored and pull[bot] committed Jan 19, 2024
1 parent 3be890d commit 9503020
Show file tree
Hide file tree
Showing 3 changed files with 24 additions and 0 deletions.
12 changes: 12 additions & 0 deletions src/app/OperationalSessionSetup.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -532,6 +532,18 @@ CHIP_ERROR OperationalSessionSetup::ScheduleSessionSetupReattempt(System::Clock:
timerDelay = System::Clock::Seconds16(
static_cast<uint16_t>(CHIP_DEVICE_CONFIG_AUTOMATIC_CASE_RETRY_INITIAL_DELAY_SECONDS
<< min((mAttemptsDone - 1), CHIP_DEVICE_CONFIG_AUTOMATIC_CASE_RETRY_MAX_BACKOFF)));
if (mAttemptsDone % 2 == 0)
{
// It's possible that the other side received one of our Sigma1 messages
// and then failed to get its Sigma2 back to us. If that's the case, it
// will be waiting for that Sigma2 to time out before it starts
// listening for Sigma1 messages again.
//
// To handle that, on every other retry, add the amount of time it would
// take the other side to time out.
auto additionalTimeout = CASESession::ComputeSigma2ResponseTimeout(GetLocalMRPConfig().ValueOr(GetDefaultMRPConfig()));
timerDelay += std::chrono::duration_cast<System::Clock::Seconds16>(additionalTimeout);
}
CHIP_ERROR err = mInitParams.exchangeMgr->GetSessionManager()->SystemLayer()->StartTimer(timerDelay, TrySetupAgain, this);
// The cast on count() is needed because the type count() returns might not
// actually be uint16_t; on some platforms it's int.
Expand Down
8 changes: 8 additions & 0 deletions src/protocols/secure_channel/CASESession.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1984,4 +1984,12 @@ System::Clock::Timeout CASESession::ComputeSigma1ResponseTimeout(const ReliableM
kExpectedSigma1ProcessingTime;
}

System::Clock::Timeout CASESession::ComputeSigma2ResponseTimeout(const ReliableMessageProtocolConfig & remoteMrpConfig)
{
return GetRetransmissionTimeout(remoteMrpConfig.mActiveRetransTimeout, remoteMrpConfig.mIdleRetransTimeout,
// Assume peer is idle, as a worst-case assumption.
System::Clock::kZero, Transport::kMinActiveTime) +
kExpectedHighProcessingTime;
}

} // namespace chip
4 changes: 4 additions & 0 deletions src/protocols/secure_channel/CASESession.h
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,10 @@ class DLL_EXPORT CASESession : public Messaging::UnsolicitedMessageHandler,
// how long it will take to detect that our Sigma1 did not get through.
static System::Clock::Timeout ComputeSigma1ResponseTimeout(const ReliableMessageProtocolConfig & remoteMrpConfig);

// Compute our Sigma2 response timeout. This can give consumers an idea of
// how long it will take to detect that our Sigma1 did not get through.
static System::Clock::Timeout ComputeSigma2ResponseTimeout(const ReliableMessageProtocolConfig & remoteMrpConfig);

// TODO: remove Clear, we should create a new instance instead reset the old instance.
/** @brief This function zeroes out and resets the memory used by the object.
**/
Expand Down

0 comments on commit 9503020

Please sign in to comment.