PPP carrier lost causes TLSSocket::close() to hang #13505

Closed
@Uruloke

Description of defect

We have a system running an MQTT (mbed-mqtt) client on top of a TLSSocket. The transport is a cellular PPP connection. We've noticed that when the MQTT client is idle (not sending data, only calling recv() to check for incoming data) and the cellular modem reports carrier lost, the thread hangs when we try to clean up resources by deleting the TLSSocket.

Here's our system:

  • A number of threads are running. One of them handles networking: it makes sure the modem boots up, the cellular context activates and the PPP connection is established. This thread also connects to our MQTT server using a TLSSocket and regularly transmits data.
  • If an issue is detected in our networking thread after everything has started correctly, we first try to close everything gracefully and clean up any resources used. This includes deleting our dynamically allocated TLSSocket and creating a new one.
  • Our modem is a Gemalto PLS62-W, driven by a custom class inherited from AT_CellularDevice with some added features to set modem configuration and the like. We communicate over a BufferedSerial connection at 115200 baud.
  • The context is set up with hang-up (hup) detection on the modem DCD pin.

While trying to pinpoint where the thread hangs, we've enabled all the debug output we could find; attached below is a log of one incident where the thread hangs. Our best guess so far is that, somewhere down in the socket layers, TLSSocket tries to close the TLS session gracefully when it closes: Mbed TLS tries to issue a close notify to the peer. When that data is sent through the underlying socket it reaches the PPP layer, but the PPP link is already terminated. Instead of reporting an error code back, the send waits indefinitely for something that will never happen.
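
To make the suspected failure mode concrete, here's a minimal host-side model in standard C++. It is not Mbed OS code and does not reflect how the PPP or socket layers are actually implemented: FakeLink, close_with_deadline() and demo_close_after_carrier_loss() are hypothetical names made up for this sketch. It only shows the behaviour we'd expect instead of the hang, which is that a send issued after carrier loss waits for a bounded time and then reports failure, so the close can finish and resources can be released.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

using std::chrono::milliseconds;

// Hypothetical stand-in for the PPP link. Not an Mbed OS class.
class FakeLink {
public:
    // Mark the link as down without waking any waiter, mimicking a
    // carrier loss that is never reported up to the socket layer.
    void drop_carrier() {
        std::lock_guard<std::mutex> lock(mutex_);
        up_ = false;
    }

    // Returns true if the data was "sent", false if the wait timed out.
    bool send(milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (up_) {
            return true;  // link up: the send completes immediately in this model
        }
        // Link down: nothing ever sets sent_. A plain wait() here would block
        // forever, which is the hang we observe. wait_for() bounds the wait.
        return sent_cv_.wait_for(lock, timeout, [this] { return sent_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable sent_cv_;
    bool up_ = true;
    bool sent_ = false;
};

// Models a graceful close: try to send the close notify, then carry on
// with cleanup whether or not the peer could be notified.
bool close_with_deadline(FakeLink &link, milliseconds deadline) {
    const bool notified = link.send(deadline);
    // Resource release would happen here regardless of `notified`.
    return notified;
}

// Closes once on a healthy link, then once after carrier loss.
// Returns the result of the second close.
bool demo_close_after_carrier_loss() {
    FakeLink link;
    assert(close_with_deadline(link, milliseconds(50)));  // link still up
    link.drop_carrier();
    return close_with_deadline(link, milliseconds(50));   // times out
}
```

In this model the second close returns false after roughly the deadline instead of blocking. Whether the real stack can be made to behave like this from application code is exactly what we haven't been able to work out.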

ppp-tls-down-hang.log

In the above log, [IOTH] is our networking thread; [CELL], [TLSW] and all [ppp*] entries are logged from that thread. Once the PPP link has been terminated and we try to delete the TLSSocket, the thread hangs. Other threads, such as the [MBOX] thread, keep running.

Target(s) affected by this defect ?

Custom target running on an STM32F413VG.

Toolchain(s) (name and version) displaying this defect ?

arm-none-eabi-gcc-9.2.1.exe (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 9.2.1 20191025

What version of Mbed-os are you using (tag or sha) ?

mbed-os-6.2.1

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

mbed-cli 1.10.2

How is this defect reproduced ?

  1. Start a thread for running networking.
  2. Establish PPP connection using DCD hup.
  3. Connect to a server using TLSSocket.
  4. Make the PPP link terminate via the DCD pin (for example, by disconnecting the modem antenna).
  5. Try to delete (or close) the TLSSocket.
  6. Observe that the thread is now blocked forever.
