Description
Description of defect
We have a system running an MQTT client (mbed-mqtt) on top of a TLSSocket. The transport is a cellular PPP connection. We have noticed that when we are not sending data and the MQTT client is idle (only calling recv() to check for incoming data), and the cellular modem then reports carrier lost, the thread hangs when we try to clean up resources by deleting the TLSSocket.
Here's our system:
- Several threads are running. One of them handles networking: it makes sure the modem boots up, the cellular context is activated and the PPP connection is established. This thread also connects to our MQTT server using a TLSSocket and transmits data regularly.
- If our networking thread detects an issue after everything has started correctly, we first try to close everything gracefully and clean up all resources used. This includes deleting our dynamically allocated TLSSocket and creating a new one.
- Our modem is a Gemalto PLS62-W, driven by a custom class derived from AT_CellularDevice with some added features, for example for setting the modem configuration. We communicate with it over a BufferedSerial connection at 115200 baud.
- The context is set up with hup (hang-up detection) on the modem's DCD pin.
While trying to pinpoint where the thread hangs, we enabled every debug output we could find and captured a log of one incident where the thread hangs (attached). Our best guess so far is that somewhere down in the socket layers, when the TLSSocket closes, it tries to shut down the TLS session gracefully: Mbed TLS tries to send a close_notify alert to the peer. When this data is sent through the underlying socket it reaches the PPP layer, but the PPP link has already been terminated. Instead of returning an error code, the send waits indefinitely for something that will never happen.
In the log, [IOTH] is our networking thread. [CELL], [TLSW] and all [ppp*] messages are printed from that thread. Once the PPP link has been terminated and we try to delete the TLSSocket, the thread hangs. Other threads, such as the [MBOX] thread, keep running.
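To make our guess concrete, here is a minimal model of it in plain standard C++. It is not Mbed OS code: `ModelLink`, `SendResult` and the timeout values are hypothetical. A waiter that is only woken when the lower layer reports the frame as sent will block forever once the link is gone, which matches what we observe. The model instead wakes on send completion, on link termination, or after a timeout, so the caller always gets a result back. We assume, without having confirmed it in the PPP code, that something like the `terminate()` notification below is what is missing in the real stack.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical result codes; not the Mbed OS nsapi error codes.
enum class SendResult { Ok, LinkDown, Timeout };

// Hypothetical stand-in for the PPP output path.
// Simplification: at most one outstanding send at a time.
class ModelLink {
public:
    // Called by the lower layer once the frame has actually been written out.
    void complete_send() {
        std::lock_guard<std::mutex> lock(mutex_);
        sent_ = true;
        cv_.notify_all();
    }

    // Called when carrier is lost (e.g. DCD drops) and the link is torn down.
    // Waking waiters here is what lets a pending send fail instead of hang.
    void terminate() {
        std::lock_guard<std::mutex> lock(mutex_);
        up_ = false;
        cv_.notify_all();
    }

    // Bounded send: returns on completion, on link termination, or on timeout.
    SendResult send(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!up_) {
            return SendResult::LinkDown;  // link already gone: fail immediately
        }
        sent_ = false;
        const bool woke = cv_.wait_for(lock, timeout, [this] { return sent_ || !up_; });
        if (!woke) {
            return SendResult::Timeout;   // fallback if no event ever arrives
        }
        return sent_ ? SendResult::Ok : SendResult::LinkDown;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool up_ = true;
    bool sent_ = false;
};
```

Either wake-up path would be enough for the close_notify send to fail with an error instead of hanging; the timeout only covers the case where the termination event is never delivered.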
Target(s) affected by this defect ?
Custom target running on an STM32F413VG.
Toolchain(s) (name and version) displaying this defect ?
arm-none-eabi-gcc-9.2.1.exe (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025
What version of Mbed-os are you using (tag or sha) ?
mbed-os-6.2.1
What version(s) of tools are you using. List all that apply (E.g. mbed-cli)
mbed-cli 1.10.2
How is this defect reproduced ?
- Start a thread that runs the networking.
- Establish a PPP connection using DCD hup.
- Connect to a server using a TLSSocket.
- Make the PPP link terminate via the DCD pin (for example by disconnecting the modem antenna).
- Try to delete (or close) the TLSSocket.
- Observe that the thread is now blocked forever.
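To detect the hang automatically while reproducing it, the cleanup step can be run under a simple deadline check. Below is a host-side sketch of that idea in standard C++, not our target code: `finishes_within` runs the cleanup on a helper thread and reports whether it returned before the deadline, and `StuckClose` is a hypothetical stand-in for deleting the TLSSocket that blocks until it is released.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Runs `cleanup` on a helper thread and returns true if it finished within
// `deadline`. If it did not, `release` is called so the helper can be joined.
bool finishes_within(const std::function<void()>& cleanup,
                     std::chrono::milliseconds deadline,
                     const std::function<void()>& release) {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    std::thread worker([&] {
        cleanup();
        std::lock_guard<std::mutex> lock(m);
        done = true;
        cv.notify_all();
    });
    bool finished;
    {
        std::unique_lock<std::mutex> lock(m);
        finished = cv.wait_for(lock, deadline, [&] { return done; });
    }
    if (!finished) {
        release();  // unblock the stuck cleanup so the helper thread can exit
    }
    worker.join();
    return finished;
}

// Hypothetical stand-in for a TLSSocket close that waits on a send the dead
// link never completes: it blocks until release() is called.
struct StuckClose {
    std::mutex m;
    std::condition_variable cv;
    bool released = false;

    void close() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return released; });
    }

    void release() {
        std::lock_guard<std::mutex> lock(m);
        released = true;
        cv.notify_all();
    }
};
```

The `release` callback only exists so the example can join its helper thread and exit; on the target there is nothing to release, which is exactly the problem reported here.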