Description
Target: DISCO_F769NI
Toolchain: GCC_ARM 6.3.1 20170215
The background is that I'm working on an application that has a Modbus server component. I'm using modpoll (http://www.modbusdriver.com/modpoll.html) with the command
modpoll <ip-address> -r 1 -c 72 -t 4:hex -l 0
to perform torture testing. I found that the connection eventually fails, after which the device stops responding, sometimes even to ping requests.
I managed to create a working minimal example at https://github.com/pauluap/mbed-os/tree/lwip_cx11_issue
The twist is that the issue doesn't appear when building with -std=gnu++98 (with the command mbed compile -m DISCO_F769NI -t GCC_ARM), but it does appear when building with -std=c++11 (I did this by running mbed export -i GCC_ARM -m DISCO_F769NI and then changing the two instances of -std=gnu++98 in the generated Makefile to -std=c++11).
I ended up getting my hands on a logic analyzer, and from the captures it seems that there may be multiple issues within this problem.
I'm attaching Saleae logic analyzer captures as well as Wireshark captures. You're more than welcome to download Saleae Logic (www.saleae.com/downloads) and Wireshark to open and view the captures.
capture_fail_201803041016.zip
One aspect of the problem seems to be that the transmit half of TCP stops working.
In the complete capture screenshot above, the first Attach signal leads to a period of more or less normal communication (the solid-looking gray bars), and then something happens. The client retries a few times, closes the socket, and tries to reopen it. The spikes after the gray bars that have no accompanying spike in the Attach signal are the retries; the later Attach spikes are the client closing and trying to reopen the socket.
This is a capture of a single normal frame.
The client kicker is the ISR signal, which transfers handling out of the ISR context onto the event queue context (indicated by the Pending signal being raised). The pending event is then handled, receiving a frame. The received frame is then "processed" (the dummy implementation only sets enough to generate a valid response, then pushes a new event onto the event queue) and finally transmitted. The socket op communications at the bottom are debug messages reporting on all calls of socket.recv and socket.send. In this particular screenshot, the communications are (from left to right):
8 = recv(&[0], 8)
4 = recv(&[8], 4)
-3001 = recv(&[0], 8)
-3001 = recv(&[0], 8)
153 = send(&[0], 153)
And the cycle is supposed to repeat ad infinitum. And it does for the gnu++98 build.
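As a sanity check on the byte counts above, here is a minimal sketch of the frame-size arithmetic. It assumes that modpoll's -t 4:hex -c 72 produces a standard Modbus TCP "read holding registers" (function 0x03) request. Under that assumption, the 8-byte read would correspond to the 7-byte MBAP header plus the function code, the 4-byte read to the start address and register count, and the 153-byte send to the complete response. The -3001 value matches mbed's NSAPI_ERROR_WOULD_BLOCK, i.e. no data was available on a non-blocking socket. The helper names are hypothetical and are not taken from the example project.

```cpp
#include <cassert>
#include <cstddef>

// Modbus TCP framing sizes.
// MBAP header: transaction id (2) + protocol id (2) + length (2) + unit id (1).
static const std::size_t MBAP_HEADER_SIZE = 7;
static const std::size_t FUNCTION_CODE_SIZE = 1;

// Mirrors the numeric value of mbed's NSAPI_ERROR_WOULD_BLOCK (hypothetical local name).
static const int WOULD_BLOCK = -3001;

// Request for function 0x03: MBAP + function code + start address (2) + quantity (2).
inline std::size_t read_request_size()
{
    return MBAP_HEADER_SIZE + FUNCTION_CODE_SIZE + 2 + 2;
}

// Response for function 0x03: MBAP + function code + byte count (1) + 2 bytes per register.
inline std::size_t read_response_size(std::size_t register_count)
{
    // The Modbus specification allows 1..125 registers per 0x03 request.
    assert(register_count >= 1 && register_count <= 125);
    return MBAP_HEADER_SIZE + FUNCTION_CODE_SIZE + 1 + 2 * register_count;
}

// True when a recv/send return value means "nothing to do right now" rather than a hard error.
inline bool is_would_block(int return_code)
{
    return return_code == WOULD_BLOCK;
}
```

With these assumptions, read_request_size() gives 12 (the 8 + 4 bytes received) and read_response_size(72) gives 153 (the bytes sent), so the socket calls in a healthy cycle are at least consistent with correctly framed traffic.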
In the C++11 build though, things fall off the cliff.
The last socket operation here is a 153 = send(&[0], 153), and then silence.
If I observe a retry, it seems that the incoming frame was captured, processed, and transmitted. However, the client never appears to receive the transmission, so it declares a timeout failure and retries. After a while, the client decides that it's a socket failure, closes the socket, and tries to reopen it and continue communications. The same thing as in the screenshot above happens, but with the addition of the Attach signal.
So, the issue seems to be tied to C++11. Why is there even a problem? The LWIP sources are all in C, correct? And they are still built using -std=gnu99.
I've been developing my codebase all throughout this and other issues, and have a pretty extensive C++11 codebase that I would really love to preserve (not to mention that I'd rather not return to the bad old days of C++98), so I'm willing to help in getting to the bottom of this issue. As you can see, I have a logic analyzer that can assist. The changing data in the Modbus response is worrying though, pointing to some kind of memory address assignment issue within the GCC_ARM toolchain.