
Mesh network traffic overflow ungraceful stop. (MMFAR Address: 0x0)  #14714

@WilliamGFish

Description

When running a highly interactive BLE Mesh application, the program throws an unhandled exception while pushing the boundaries of radio/network traffic saturation.
It appears that when either the RX or TX buffers are overloaded, they are unable to process acknowledgement messages, so the stack moves to remove them from a list (dlist.h: sys_dlist_remove). Unfortunately, a list node pointer may be NULL at that point.

The fault consistently points to this function:
//
static inline void sys_dlist_remove(sys_dnode_t *node)
{
	node->prev->next = node->next; /* fails at this point, see below */
	node->next->prev = node->prev;
	sys_dnode_init(node);
}
//
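To illustrate the kind of guard I have in mind, here is a minimal, self-contained sketch. It is not the Zephyr implementation: `struct dnode`, `dnode_is_linked` and `dlist_remove_checked` are hypothetical names, and it assumes (as the final `sys_dnode_init(node)` call above suggests) that an unlinked node has NULL `next`/`prev` links. Under that assumption, removing the same node twice would dereference NULL at exactly the marked line, and a check like this would turn it into a reportable error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sys_dnode_t so the sketch compiles on its own. */
struct dnode {
	struct dnode *next;
	struct dnode *prev;
};

/* Mark a node as unlinked (assumed convention: both links are NULL). */
static inline void dnode_init(struct dnode *node)
{
	node->next = NULL;
	node->prev = NULL;
}

/* Initialise an empty circular list: the head points at itself. */
static inline void dlist_init(struct dnode *list)
{
	list->next = list;
	list->prev = list;
}

/* Append a node at the tail of the list. */
static inline void dlist_append(struct dnode *list, struct dnode *node)
{
	assert(list != NULL && node != NULL);
	node->next = list;
	node->prev = list->prev;
	list->prev->next = node;
	list->prev = node;
}

/* A node counts as linked only if both of its links are set. */
static inline bool dnode_is_linked(const struct dnode *node)
{
	return node != NULL && node->next != NULL && node->prev != NULL;
}

/* Guarded removal: returns false instead of dereferencing a NULL link. */
static inline bool dlist_remove_checked(struct dnode *node)
{
	if (!dnode_is_linked(node)) {
		return false;
	}
	node->prev->next = node->next;
	node->next->prev = node->prev;
	dnode_init(node);
	return true;
}
```

With this, a second removal of the same node returns false, and the caller can log or drop the message instead of faulting in the ISR.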

To Reproduce
Devices used: 2x nrf52-PCA10040, nrf52840-PCA10056 and Particle Boron.
Similar issues occur with the BBC micro:bit, but I was unable to debug it there.

Steps to reproduce the behaviour:

  1. Create Mesh with SRV Model with publish period of 300ms (Broadcast: 0xc000)
  2. Define each board as Model Client and Server
  3. Define each board as Repeater
  4. Configure and run
  5. Wait; the failure can take 20 minutes to appear
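For anyone reproducing step 1, the 300 ms publish period is carried as a single byte in the mesh publication settings. The sketch below shows that encoding as I understand it from the Mesh Profile specification (low 6 bits are the number of steps, top 2 bits select the step resolution). The helper names `pub_period_encode` and `pub_period_ms` are hypothetical and only for illustration; they are not Zephyr APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Step resolutions of the publish period field (top 2 bits). */
enum pub_res {
	PUB_RES_100MS = 0,
	PUB_RES_1S    = 1,
	PUB_RES_10S   = 2,
	PUB_RES_10MIN = 3,
};

/* Pack a step count (0..63) and a resolution into one period byte. */
static inline uint8_t pub_period_encode(uint8_t steps, enum pub_res res)
{
	assert(steps <= 0x3f);
	return (uint8_t)((steps & 0x3f) | ((uint8_t)res << 6));
}

/* Decode a period byte back into milliseconds. */
static inline uint32_t pub_period_ms(uint8_t period)
{
	static const uint32_t res_ms[] = { 100, 1000, 10000, 600000 };

	return (uint32_t)(period & 0x3f) * res_ms[period >> 6];
}
```

Under this encoding, 300 ms is 3 steps at the 100 ms resolution, so `pub_period_encode(3, PUB_RES_100MS)` yields 0x03.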

I have increased the buffer sizes and the RX and TX settings, which only delayed the failure:
CONFIG_BT_MESH_ADV_BUF_COUNT=256
CONFIG_BT_MESH_TX_SEG_MAX=32
CONFIG_BT_MESH_TX_SEG_MSG_COUNT=24
CONFIG_BT_MESH_RX_SDU_MAX=384
CONFIG_BT_MESH_RX_SEG_MSG_COUNT=24

Expected behaviour
Gracefully catch the error rather than "crash".

Impact
This is an annoyance and a potential showstopper, as stressed networks become increasingly likely with larger numbers of nodes.

Environment (please complete the following information):

  • OS: Windows 10
  • Toolchain: Zephyr SDK, latest Git 1.14.0 rc2
  • Booting Zephyr OS v1.14.0-rc1-1294-g3dea408405fd

Screenshots or console output

Debugged failure
***** MPU FAULT *****
Data Access Violation
MMFAR Address: 0x0
***** Hardware exception *****
Current thread ID = 0x20003360
Faulting instruction address = 0x13b44
Fatal fault in ISR! Spinning...

Original pre-debug (consistent address across 2 boards)
***** MPU FAULT *****
Data Access Violation
MMFAR Address: 0x0
***** Hardware exception *****
Current thread ID = 0x2000228c
Faulting instruction address = 0xe174
Fatal fault in ISR! Spinning...

Labels

area: Bluetooth, bug, priority: medium
