[nrf toup] Handle network connection errors gracefully #619

Open: wants to merge 1 commit into base: master
23 changes: 13 additions & 10 deletions src/messaging/ReliableMessageMgr.cpp
@@ -444,22 +444,25 @@ CHIP_ERROR ReliableMessageMgr::MapSendError(CHIP_ERROR error, uint16_t exchangeI
 {
     if (
 #if CHIP_SYSTEM_CONFIG_USE_LWIP
-        error == System::MapErrorLwIP(ERR_MEM)
+        error == System::MapErrorLwIP(ERR_MEM) || error == System::MapErrorLwIP(ERR_CONN)
 #else
-        error == CHIP_ERROR_POSIX(ENOBUFS)
+        error == CHIP_ERROR_POSIX(ENOBUFS) || error == CHIP_ERROR_POSIX(ENETDOWN)
 #endif // CHIP_SYSTEM_CONFIG_USE_LWIP
     )
     {
-        // sendmsg on BSD-based systems never blocks, no matter how the
-        // socket is configured, and will return ENOBUFS in situation in
-        // which Linux, for example, blocks.
+        // Treat specific send errors as transient and non-fatal:
         //
-        // This is typically a transient situation, so we pretend like this
-        // packet drop happened somewhere on the network instead of inside
-        // sendmsg and will just resend it in the normal MRP way later.
+        // - Errors caused by lack of transmit (TX) buffers (e.g. ERR_MEM, ENOBUFS):
+        //   These indicate that the system temporarily cannot allocate memory for sending data,
+        //   often due to momentary buffer exhaustion under high load.
         //
-        // Similarly, on LwIP an ERR_MEM on send indicates a likely
-        // temporary lack of TX buffers.
+        // - Errors caused by network connection issues (e.g. ERR_CONN, ENETDOWN):
+        //   These can occur when the connection is temporarily lost or the interface goes down.
+        //   Such conditions may resolve shortly without requiring a full teardown.
+        //
+        // These errors are treated as recoverable to avoid prematurely closing exchanges
+        // or tearing down subscriptions during transient conditions.
+
         ChipLogError(ExchangeManager, "Ignoring transient send error: %" CHIP_ERROR_FORMAT " on exchange " ChipLogFormatExchangeId,
                      error.Format(), ChipLogValueExchangeId(exchangeId, isInitiator));
         error = CHIP_NO_ERROR;
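
For readers without the CHIP headers at hand, the mapping in the hunk above can be approximated in isolation. The following is only a sketch under stated assumptions: it covers the non-LwIP branch using plain POSIX errno values, and `IsTransientSendErrno`, `MapSendErrno`, and `ExampleUsage` are hypothetical names that do not appear in this PR or in the CHIP code base. A return value of 0 stands in for `CHIP_NO_ERROR`.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper (not part of the PR): reports whether an errno value
// returned by a send call belongs to the set the diff treats as transient.
constexpr bool IsTransientSendErrno(int err)
{
    return err == ENOBUFS     // temporary lack of TX buffers
        || err == ENETDOWN;   // interface or network temporarily down
}

// Hypothetical helper: maps transient errors to 0 (standing in for
// CHIP_NO_ERROR) so the caller can rely on a later retransmission.
// Every other error is passed through unchanged.
constexpr int MapSendErrno(int err)
{
    return IsTransientSendErrno(err) ? 0 : err;
}

// Small usage example exercising both the transient and the fatal path.
void ExampleUsage()
{
    assert(MapSendErrno(ENOBUFS) == 0);
    assert(MapSendErrno(ENETDOWN) == 0);
    assert(MapSendErrno(ECONNREFUSED) == ECONNREFUSED);
}
```

Keeping the transient set in a single predicate means that adding another code only touches one place, which mirrors how the PR extends the existing condition instead of adding a second branch.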