cpu/esp8266: Tracking open problems of esp_wifi netdev driver #10861

Open
gschorcht opened this issue Jan 25, 2019 · 1 comment
Labels
Area: cpu Area: CPU/MCU ports Area: network Area: Networking Platform: ESP Platform: This PR/issue effects ESP-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Type: tracking The issue tracks and organizes the sub-tasks of a larger effort

Comments

@gschorcht
Contributor

gschorcht commented Jan 25, 2019

Description

During stress tests of the esp_wifi module for esp8266, including de-authentication attacks, the following issues occurred sporadically:

  1. Reconnecting may fail after deauthentication and lead to a system crash while excessive traffic is being sent to the esp8266. If the AP sends a deauthentication frame, the esp8266 tries to reconnect automatically. Under normal network load, the reconnect works as expected. Under excessive traffic, however, the esp8266 cannot reconnect and keeps retrying until the memory is exhausted and it crashes. The memory seems to be consumed by the Espressif SDK 😟

    [esp_wifi] disconnected from ssid BSHS1, reason 7 (ASSOCED)
    [esp_wifi] heap: 15416 (used 5928, free 9488)
    [esp_wifi] disconnected from ssid BSHS1, reason 202 (FAIL)
    [esp_wifi] heap: 15416 (used 6128, free 9288)
    [esp_wifi] disconnected from ssid BSHS1, reason 2 (AUTH_EXPIRE)
    [esp_wifi] heap: 15416 (used 7568, free 7848)
    [esp_wifi] disconnected from ssid BSHS1, reason 2 (AUTH_EXPIRE)
    [esp_wifi] heap: 15416 (used 10576, free 4840)
    [esp_wifi] disconnected from ssid BSHS1, reason 2 (AUTH_EXPIRE)
    [esp_wifi] heap: 15416 (used 13584, free 1832)
    [esp_wifi] trying to reconnect to ssid BSHS1
    heap: 15416 (used 14936, free 480)
    E:M 40
    

    The problem might be related to problem 6.

  2. The send function may block completely under very heavy network load. Disconnecting and reconnecting sometimes helps, but not always; if it does not, the esp8266 has to be rebooted.
    Solved with PR cpu/esp8266: Fixes and improvements of esp_wifi netdev driver #10862

  3. Sporadically, LoadProhibitedCause exception occurs on very heavy network load.
    Seems to be solved by PR gnrc_icmpv6_echo: avoid crashing when pktbuf full #10869.

  4. The GNRC packet buffer runs full under very heavy network load because packets get stuck in it. Communication with the esp8266 is then no longer possible. The packet buffer can be inspected with the pktbuf command, which is provided by the module gnrc_pktbuf_cmd.
    Seems to be solved by PR cpu/esp8266: Fixes and improvements of esp_wifi netdev driver #10862.

  5. Sporadically, error message dev 1500 occurs on very heavy network load and esp8266 crashes after that with LoadProhibitedCause exception.
    Seems to be solved by PR gnrc_icmpv6_echo: avoid crashing when pktbuf full #10869.

  6. Connecting to the access point while excessive traffic is being sent to the esp8266 often fails, and the error message LmacRxBlk:1 appears repeatedly. The esp8266 is then not usable at all and has to be reset. This might be related to problem 1, where the esp8266 tries to reconnect while excessive traffic is being sent to it.

    The problem can be reproduced if at least one host is pinging the esp8266 with the maximum data size and an interval of 0 while the esp8266 is trying to connect to the AP. Start pinging first and then reset the esp8266.

    According to online sources, the error message LmacRxBlk:1 means that the internal MAC layer buffer has overflowed. The problem normally occurs when an interrupt service routine takes longer than the allowed 10 µs. It may also be that the esp8266 does not have enough performance to handle such a large number of frames while connecting, see Correct Error Handling for reconnect (better than range.py?) peterhinch/micropython-mqtt#3 (comment).

From today's perspective, this problem can't be solved with the means provided by the SDK.
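As a reading aid for the log in problem 1, the following sketch extracts the heap lines and prints how much the used heap grows between consecutive reports. It only parses the excerpt quoted above; the helper names `heap_samples` and `used_deltas` are made up for this illustration and are not part of RIOT or the SDK.

```python
import re

# Log excerpt copied verbatim from problem 1 above.
LOG = """\
[esp_wifi] disconnected from ssid BSHS1, reason 7 (ASSOCED)
[esp_wifi] heap: 15416 (used 5928, free 9488)
[esp_wifi] disconnected from ssid BSHS1, reason 202 (FAIL)
[esp_wifi] heap: 15416 (used 6128, free 9288)
[esp_wifi] disconnected from ssid BSHS1, reason 2 (AUTH_EXPIRE)
[esp_wifi] heap: 15416 (used 7568, free 7848)
[esp_wifi] disconnected from ssid BSHS1, reason 2 (AUTH_EXPIRE)
[esp_wifi] heap: 15416 (used 10576, free 4840)
[esp_wifi] disconnected from ssid BSHS1, reason 2 (AUTH_EXPIRE)
[esp_wifi] heap: 15416 (used 13584, free 1832)
[esp_wifi] trying to reconnect to ssid BSHS1
heap: 15416 (used 14936, free 480)
E:M 40
"""

HEAP_RE = re.compile(r"heap: (\d+) \(used (\d+), free (\d+)\)")


def heap_samples(log):
    """Return a (total, used, free) tuple for every heap line in the log."""
    return [tuple(int(g) for g in m.groups()) for m in HEAP_RE.finditer(log)]


def used_deltas(samples):
    """Return the growth of the used heap between consecutive samples."""
    used = [s[1] for s in samples]
    return [after - before for before, after in zip(used, used[1:])]


samples = heap_samples(LOG)
deltas = used_deltas(samples)

# The first sample has no predecessor, so its delta is shown as 0.
for (total, used, free), delta in zip(samples, [0] + deltas):
    print(f"total={total} used={used:5d} free={free:5d} delta={delta:+d}")
```

For this excerpt the used heap grows by 200, 1440, 3008, 3008 and 1352 bytes between reports, about 9 kB in total, and is never released before the `E:M 40` message.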

Steps to reproduce the issue

Ping one esp8266 node from three different machines with different data sizes as fast as possible:

term1> sudo ping6 fe80::5ecf:7fff:fe80:3f08 -Ieth0 -s1392 -i 0
term2> sudo ping6 fe80::5ecf:7fff:fe80:3f08 -Ieth0 -s512 -i 0
term3> sudo ping6 fe80::5ecf:7fff:fe80:3f08 -Ieth0 -s52 -i 0
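For repeated test runs, the three commands above could be generated and started from one script. This is only a sketch: the function names `build_ping_commands` and `launch` are hypothetical, and the address, interface, and sizes are the ones quoted above and have to be adapted to the local setup.

```python
import subprocess


def build_ping_commands(target, iface, sizes, interval):
    """Build one ping6 argument list per payload size, mirroring the commands above."""
    return [
        ["sudo", "ping6", target, f"-I{iface}", f"-s{size}", "-i", str(interval)]
        for size in sizes
    ]


def launch(commands):
    """Start all pings concurrently; the caller has to terminate them again."""
    return [subprocess.Popen(cmd) for cmd in commands]


# Values taken from the three terminal commands above.
FLOOD = build_ping_commands("fe80::5ecf:7fff:fe80:3f08", "eth0", [1392, 512, 52], 0)

for cmd in FLOOD:
    print(" ".join(cmd))
```

Nothing is sent until `launch(FLOOD)` is called explicitly; the returned process objects can be stopped with `terminate()` once the test is finished.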

Expected results

All of the problems above only occur under very heavy network load. Under normal conditions esp_wifi works stably, for example under the following conditions:

term1> sudo ping6 fe80::5ecf:7fff:fe80:3f08 -Ieth0 -s1392 -i 0.15
term2> sudo ping6 fe80::5ecf:7fff:fe80:3f08 -Ieth0 -s512 -i 0.15
term3> sudo ping6 fe80::5ecf:7fff:fe80:3f08 -Ieth0 -s52 -i 0.05
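To put a rough number on "normal conditions", the nominal load of the three commands above can be computed from their `-s` and `-i` values. This is back-of-the-envelope arithmetic only: it counts ICMPv6 echo payload bytes, ignores all headers and the echo replies, and assumes every ping keeps its configured interval exactly.

```python
# (payload size in bytes, interval in seconds) of the three pings above
NORMAL = [(1392, 0.15), (512, 0.15), (52, 0.05)]


def offered_load(streams):
    """Return (echo requests per second, payload bytes per second) for all streams."""
    packets_per_s = sum(1.0 / interval for _, interval in streams)
    bytes_per_s = sum(size / interval for size, interval in streams)
    return packets_per_s, bytes_per_s


pps, bps = offered_load(NORMAL)
print(f"{pps:.1f} echo requests/s, {bps / 1000:.1f} kB/s of echo payload")
```

Under these assumptions the stable case corresponds to roughly 33 echo requests per second and about 13.7 kB/s of payload. The `-i 0` case has no fixed interval, so the same calculation does not apply to it.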
@gschorcht gschorcht added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: network Area: Networking Platform: ESP Platform: This PR/issue effects ESP-based platforms Area: cpu Area: CPU/MCU ports Type: tracking The issue tracks and organizes the sub-tasks of a larger effort labels Jan 25, 2019
@miri64 miri64 added this to the Release 2020.07 milestone Jul 4, 2020
@benpicco
Contributor

benpicco commented May 3, 2021

I noticed that with esp_now the esp8266 would lock up after a few minutes (when connected to a border router).
It only prints

2021-05-03 15:13:17,642 # scandone
2021-05-03 15:13:27,381 # LmacRxBlk:0
2021-05-03 15:13:28,382 # LmacRxBlk:0
2021-05-03 15:13:29,383 # LmacRxBlk:0
2021-05-03 15:13:30,384 # LmacRxBlk:0
2021-05-03 15:13:31,384 # LmacRxBlk:0
2021-05-03 15:13:32,385 # LmacRxBlk:0
2021-05-03 15:13:33,386 # LmacRxBlk:0
2021-05-03 15:13:34,387 # LmacRxBlk:0
2021-05-03 15:13:35,387 # LmacRxBlk:0
2021-05-03 15:13:36,388 # LmacRxBlk:0

and does not react to shell input anymore.

(can be triggered by ping -f to the esp8266's address)
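When such a terminal log is captured during longer test runs, the lock-up pattern can be spotted automatically. The sketch below parses the excerpt above and applies a simple heuristic: the log ends in an uninterrupted run of `LmacRxBlk` lines. The timestamp format is inferred from the excerpt, and the function names and the threshold of five lines are arbitrary choices for illustration.

```python
import re
from datetime import datetime

# Terminal output copied verbatim from the comment above.
LOG = """\
2021-05-03 15:13:17,642 # scandone
2021-05-03 15:13:27,381 # LmacRxBlk:0
2021-05-03 15:13:28,382 # LmacRxBlk:0
2021-05-03 15:13:29,383 # LmacRxBlk:0
2021-05-03 15:13:30,384 # LmacRxBlk:0
2021-05-03 15:13:31,384 # LmacRxBlk:0
2021-05-03 15:13:32,385 # LmacRxBlk:0
2021-05-03 15:13:33,386 # LmacRxBlk:0
2021-05-03 15:13:34,387 # LmacRxBlk:0
2021-05-03 15:13:35,387 # LmacRxBlk:0
2021-05-03 15:13:36,388 # LmacRxBlk:0
"""

LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) # (.*)$")


def parse(log):
    """Return (timestamp, message) pairs for every line matching the terminal format."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append((stamp, match.group(2)))
    return entries


def looks_locked_up(entries, marker="LmacRxBlk", min_run=5):
    """Heuristic: the last min_run log lines all contain the marker."""
    tail = entries[-min_run:]
    return len(tail) == min_run and all(marker in msg for _, msg in tail)


entries = parse(LOG)
blocked = [stamp for stamp, msg in entries if "LmacRxBlk" in msg]
# Time in seconds between consecutive LmacRxBlk messages.
gaps = [(b - a).total_seconds() for a, b in zip(blocked, blocked[1:])]

print("locked up:", looks_locked_up(entries))
print("gaps between LmacRxBlk lines (s):", gaps)
```

In this excerpt the message repeats about once per second, which matches the observation that the node prints nothing else once it stops reacting.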

@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021