-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LWIP memory leak? #3419
Comments
I tried your sketch with chunksize=4000 and lwIPv2 PR #3362.
(is this similar to #2925 ?) |
Thanks for the info, https://github.com/esp8266/Arduino/issues/2925 looks similar, but I don't have high latency. Log for the sample sketch (git, master version, not lwip2): WiFi connected
IP address:
192.168.1.110
Heap: 43408
Time: 1
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
>>> Client Timeout !
:ur 1
:close
:del
Heap: 38384
Time: 8
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
pm open,type:2 0
>>> Client Timeout !
:ur 1
:close
:del
Heap: 32616
Time: 15
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
>>> Client Timeout !
:ur 1
:close
:del
Heap: 27624
Time: 22
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
>>> Client Timeout !
:ur 1
:close
:del
Heap: 22632
Time: 29
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
>>> Client Timeout !
:ur 1
:close
:del
Heap: 17640
Time: 36
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
>>> Client Timeout !
:ur 1
:close
:del
Heap: 12648
Time: 43
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 2
>>> Client Timeout !
:ur 1
:close
:del
Heap: 7656
Time: 50
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 0
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 0
send time: 3
>>> Client Timeout !
:ur 1
:close
:del
Heap: 2664
Time: 57
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 -1
:sent 1280
:wr 464 464 1280
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 -1
send time: 29
:sent 464
>>> Client Timeout !
:ur 1
:close
:del
Heap: 2304
Time: 64
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 -1
:sent 1280
:wr 464 464 1280
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 -1
send time: 19
:sent 464
>>> Client Timeout !
:ur 1
:close
:del
Heap: 2304
Time: 71
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 -1
:sent 1280
:wr 464 464 1280
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 -1
send time: 19
:sent 464
>>> Client Timeout !
:ur 1
:close
:del
Heap: 2304
Time: 78
connecting to xyz.xyz.com
:ref 1
Connected.
Requesting URL: /
:wr 2000 2000 0
:wrc 256 2000 0
:wrc 256 1744 0
:wrc 256 1488 0
:wrc 256 1232 0
:wrc 256 976 0
:wrc 256 720 -1
:sent 1280
:wr 464 464 1280
:wrc 256 464 0
:wrc 208 208 0
:wr 126 126 0
:wrc 126 126 -1
send time: 28
:sent 464
>>> Client Timeout !
:ur 1
:close
:del |
Also strange, that the server that does not work is a subdomain (ie abc.xyz.com), sending to the top level (ie xyz.com) works fine! Same IP address. |
lwip2v2 is supposed to work better. did you experience ap+sta crash with it ? |
yup, same sketch, ap+sta exception. However I recompiled the original lwip with MSS=536 and memory limited to 8000 from 16000 now it seems to work better, MSS is truly limited. Now I'll run it for a long time and will see. |
I tried with lwip2v2, ap+sta is working like with sta alone, even with my mobile connected to AP. |
Closing due to lack of feedback in several months. |
Basic Infos
Main question: can there be a situation during a tcp communication session, where lwip crashes or memory is not freed up? If yes, when, how to avoid it?
Can anything cause this on the server/network side?
Hardware
Hardware: ESP-12E
Core Version: most recent git master, also tried git sdk 2.1.0
Description
I'm experiencing some strange issues with (assuming) the TCP/IP library.
It is very difficult to debug and and I'm still investigating it, however it already took more time than I wanted. Sorry in advance...
So I have a rather large piece of software, POSTing data to a server regularly, every 10 sec.
Everything was working fine (weeks without any issues), now I changed the server to a private one instead of a shared one at a hosting company. PC to his server connection is fine.
Now what I see is that sometimes (quite random, from every few requests to one out of several hours), data is not sent by WiFiClient (debug does not show ":sent"), and the buffer seems to be not freed. client.stop() s called after a timeout period. So slowly memory runs out and no more request can be made. The system is still running (more or less), but useless.
I debugged my heap usage, I free up everything that I allocated, I'm pretty sure the problem is in LWIP or close to it.
On the server side (Apache) I see missing requests (nothing appears in the access log) or request timeouts as no data arrived to the server (such as [12/Jul/2017:12:22:41 +0200] "-" 408 0 "-" "-").
At the first case the connection did not build up successfully, at the second, connection is ok, but packets were lost. Normally this shouldn't happen with tcp.
I did tcp dumps on the server that confirms this, some packets didn't reach the server. (However providing an ssh tunnel to the server from a localhost provided better results - still investigating).
I added delay after the client.write() function, it seemed to help, sending smaller chucks to write, also helped, but the problem still persists.
It occures pretty random, so not easy to tell.
According to wireshark if the connection was built up and the problem arises, I saw several duped ACKs, overlapping ACKs, strange things, but simetimes the tcp stream is just cut off during the session:
Server side: SYN, SYN ACK, ACK, PACKET, ACK, and nothing more, however an other packet should have been sent by the ESP.
Earlier I assumed WiFiClient.write is blocking, until ack received, so I assumed the missing packes were sent from the ESP but it looks like not. I also had some suspicion of the dynamic memory allocation for ClientContext in WiFiClient, maybe it is not freed up?
I also tried switching to the async tcp library, in order to have better control over the packets, which seemes great, but the problem seems to persist. For now it seems much more rare, but difficult to tell. I haven't yet seen memory leaks here, only crashes.
See the log at the bottom, an ack should have arrived.
I assume there might be some strange constellation between some kind of a network error (?) and a bug in lwip: not handling retransmission or acks in some strange situation.
Still investigating, but I wanted to know if anyone has experienced anything like this.
EDIT: I managed to reproduce the problem with a simple sketch. See below.
Try changing host and chunksize.
With google.com only small chunk size works (<=1460), but with for example httpbin.org size 2000 works without problems.
However, if I connect the ESP to mobile hotspot, it works!
I'm really confused now.
Settings in IDE
IDE: VMICRO+Visual Studio
Module: Generic ESP8266 Module
Flash Size: 4MB
CPU Frequency: 80Mhz
Flash Mode: qio
Flash Frequency: 40Mhz
Upload Using: SERIAL
Reset Method: nodemcu
Sketch
Debug Messages
Async Httpclient start connecting
Async Httpclient connected
Async Httpclient write len: 1400
Async Httpclient ACK len: 1400 time: 58
Async Httpclient write len: 748
Async Httpclient send finished
mac 985
ets Jan 8 2013,rst cause:4, boot mode:(1,6)
wdt reset
(second ACK should have arrived!)
The text was updated successfully, but these errors were encountered: