
K64F: Memory leak in network driver #3118

@infinnovation-dev

Description


Bug

Target
K64F

Toolchain:
GCC_ARM

mbed-os sha:
de41409

Expected behaviour
The ability to continue sending large (1400-byte) UDP packets

Actual behaviour
After sending two bursts of packets, I am unable to send any more large packets

Steps to reproduce
See demonstration program and description below.

Analysis

While investigating another issue, I was reading the code of
k64f_emac.c when I came across the following stanza:

  /* Wait until a descriptor is available for the transfer. */
  /* THIS WILL BLOCK UNTIL THERE ARE A DESCRIPTOR AVAILABLE */
  while (g_handle.txBdCurrent->control & ENET_BUFFDESCRIPTOR_TX_READY_MASK)
    osSemaphoreWait(k64f_enet->xTXDCountSem.id, osWaitForever);

This looks reasonable on the face of it, until you find that the only other reference to xTXDCountSem is its creation, i.e. nowhere is there a matching osSemaphoreRelease. I haven't fully examined the workings of the 8-item ring buffer, but it seems that this supposed safeguard against over-filling the ring buffer is never actually exercised.

Consequently, by sending packets to the driver fast enough, and before the tx_clean thread has had a chance to free buffers already sent by the ENET device, it is possible to overwrite buffer pointers, resulting in a memory leak.
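
To make the suspected mechanism concrete, here is a minimal host-side sketch in plain C. It is not the k64f_emac.c code: tx_ring_t, ring_send, ring_clean_one and simulate_burst are hypothetical names invented for illustration, and a simple counter stands in for the counting semaphore. It only shows how writing into a ring slot that has not yet been reclaimed loses the old buffer pointer, and how a send-side check paired with a release on the clean-up side avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 8  /* the issue mentions an 8-item ring; all else is hypothetical */

typedef struct {
    void  *buf[RING_SIZE]; /* buffer owned by each slot until it is reclaimed */
    size_t head;           /* next slot the sender writes */
    size_t tail;           /* next slot the cleaner reclaims */
    size_t free_slots;     /* plays the role of a counting semaphore */
} tx_ring_t;

static void ring_init(tx_ring_t *r)
{
    for (size_t i = 0; i < RING_SIZE; i++) {
        r->buf[i] = NULL;
    }
    r->head = 0;
    r->tail = 0;
    r->free_slots = RING_SIZE;
}

/* Queue one packet. With guard set, the sender refuses when no slot is free
 * (a real driver would block on the semaphore instead of returning). */
static bool ring_send(tx_ring_t *r, size_t len, bool guard, size_t *leaked)
{
    assert(leaked != NULL);
    if (guard && r->free_slots == 0) {
        return false;
    }
    void *p = malloc(len);
    if (p == NULL) {
        return false;
    }
    if (r->buf[r->head] != NULL) {
        /* The slot still owns an unreclaimed buffer. A driver would lose this
         * pointer here; the sketch counts it as leaked and frees it so that
         * the demo itself stays clean. */
        (*leaked)++;
        free(r->buf[r->head]);
    }
    r->buf[r->head] = p;
    r->head = (r->head + 1) % RING_SIZE;
    if (r->free_slots > 0) {
        r->free_slots--;
    }
    return true;
}

/* Reclaim one sent buffer: the counterpart of releasing the semaphore. */
static bool ring_clean_one(tx_ring_t *r)
{
    if (r->buf[r->tail] == NULL) {
        return false;
    }
    free(r->buf[r->tail]);
    r->buf[r->tail] = NULL;
    r->tail = (r->tail + 1) % RING_SIZE;
    r->free_slots++;
    return true;
}

/* Send count packets back to back, with no clean-up in between, and return
 * how many buffers had their pointers overwritten. */
size_t simulate_burst(size_t count, bool guard)
{
    tx_ring_t r;
    size_t leaked = 0;

    ring_init(&r);
    for (size_t i = 0; i < count; i++) {
        (void)ring_send(&r, 1400, guard, &leaked);
    }
    while (ring_clean_one(&r)) {
        /* drain whatever is still queued */
    }
    return leaked;
}
```

Calling simulate_burst(12, false) models a sender that outruns the cleaner by four packets; simulate_burst(12, true) models the same burst with the guard in place. In a real RTOS driver the guard would be a blocking wait on the semaphore and the release would happen in the tx clean-up path.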

The bug is partially masked by the quite tight timings: the 120MHz CPU is hard pushed to generate packets faster than the ENET can transmit them on a 100Mbps link (given the several layers of software stack between user code and driver). It is also partially masked by the memory pool used by the ring buffer, into which the packets are copied (trying to write too many large packets fails at this stage as there is only space for 8 full-sized tx frames and the 16 rx frames already allocated).

However, I have managed to create a demonstration program which illustrates the misbehaviour.

The program repeatedly sends a burst of eight 1400-byte UDP packets followed by twelve 20-byte packets. After the second burst, no further 1400-byte packets can be sent, and lwIP's statistics show the heap almost entirely used.

Sample output:

k64f-net-leak

MEM HEAP
        avail: 36784
        used: 25096
        max: 27440
        err: 0
big=8 small=12

MEM HEAP
        avail: 36784
        used: 25096
        max: 28284
        err: 0
big=7 small=12

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 1
big=0 small=2

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 3
big=0 small=2

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 5
big=0 small=2

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 7
big=0 small=2

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 9
big=0 small=2

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 11
big=0 small=4

MEM HEAP
        avail: 36784
        used: 36432
        max: 36708
        err: 13
big=0 small=4
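
The figures above can be checked with a little arithmetic. The sketch below assumes that `avail` is the total heap size and `used` is the number of bytes currently allocated, and it ignores per-allocation overhead and packet headers; heap_free and can_allocate are hypothetical helper names, not lwIP functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes left in the heap, assuming avail is the total size and used is the
 * amount currently allocated (an assumption about the statistics output). */
size_t heap_free(size_t avail, size_t used)
{
    return (avail > used) ? (avail - used) : 0;
}

/* Whether a request of the given size could fit, ignoring allocator
 * overhead and fragmentation. */
bool can_allocate(size_t avail, size_t used, size_t request)
{
    return heap_free(avail, used) >= request;
}
```

With the early readings, heap_free(36784, 25096) leaves 11688 bytes, which is ample for a 1400-byte payload. With the later readings, heap_free(36784, 36432) leaves only 352 bytes: too little for a 1400-byte packet but still enough for a few 20-byte ones, which is consistent with the `big=0` lines and the small non-zero `small=` counts.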

Note that I was not able to reproduce the failure mode with mbed-os 5.1. There don't appear to be any relevant changes to k64f_emac.c between 5.1 and 5.2, so I suspect it is some subtlety of the timing.

I have a suspicion that the bug may be the cause of #2553. The failure mode (packets not being sent) seems to fit the observed symptoms.
