Random hangs [Arduino UNO WiFi REV2] #106

Open
DesktopMan opened this issue Mar 7, 2020 · 12 comments
Labels
type: imperfection Perceived defect in any part of project

Comments

@DesktopMan

Library version: 1.5.0
Firmware version: 1.3.0

I've been stability testing by sending one HTTP GET request per second, and I see random hangs after anywhere from 5 to 1000 requests. The whole Arduino freezes, so it looks like the hang happens inside one of the WiFiNINA library functions.

Full test code is attached. server and port must be set to a server that responds to HTTP GET requests, e.g. one hosting a simple index.html with python -m SimpleHTTPServer.

The test code outputs a period on every new connect and a dash when the previous connect hasn't been completed yet. One hundred statuses per line.
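
For readers without the attachment, the output format described above can be modelled roughly as follows. This is a desktop C++ sketch, not the code from stability.txt; the StatusPrinter class and its members are made-up names used only for illustration.

```cpp
#include <string>

// Hypothetical helper, not taken from stability.txt: collects one status
// character per tick and wraps the output after 100 characters.
class StatusPrinter {
public:
  // Record one tick: '.' when a new connect was started,
  // '-' when the previous connect had not completed yet.
  void tick(bool previousConnectDone) {
    out_ += previousConnectDone ? '.' : '-';
    if (++column_ == kPerLine) {
      out_ += '\n';  // start a new line after 100 statuses
      column_ = 0;
    }
  }

  const std::string& output() const { return out_; }

private:
  static constexpr int kPerLine = 100;  // "one hundred statuses per line"
  std::string out_;
  int column_ = 0;
};
```

With this model, the early hang shown below corresponds to five successful ticks and then no further output at all.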

Here's example output from an early hang (5 requests):

20:52:16.473 -> Booting up.
20:52:17.211 -> Wi-Fi firmware: 1.3.0
20:52:22.235 -> Connecting to Wi-Fi... Connected.
20:52:26.548 -> .....

You can also enable the watchdog to force a reset if you want to see how stable it is over time. Note that it will just reboot forever if connect takes longer than 8 seconds. There really should be a non-blocking version of this library...

This might be a firmware issue, but even so, the library shouldn't hang, so I'm posting it here first.

Code: stability.txt

@Ravenbs

Ravenbs commented Mar 13, 2020

I think it is the same as #103.

Tip for your watchdog:
Use a timer library with a timer ISR that runs once per second.
Use this ISR to reset the watchdog, but only conditionally. Something like this:

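// Timer ISR, called once per second.
void TimerCallback0(void)
{
  if ((millis() - watchdogTimerMillis) > WD_TIMEOUT)
  {
    // The main loop has not checked in for WD_TIMEOUT ms: stop feeding the
    // hardware watchdog so that it resets the board.
    Serial.println("WD ALARM - going for Reset soon!");
  } else {
    Watchdog.reset();  // main loop is alive, keep feeding the watchdog
  }
}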

Now, in your main loop, call:

// Record that the main loop is still running.
void wdReset()
{
  watchdogTimerMillis = millis();
}

This allows for a 2-minute watchdog.
Because reconnecting to Wi-Fi usually takes longer than 8 s, this at least lets you reconnect and still use a watchdog.
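
The decision the ISR makes can be isolated into a small pure function, which makes the timeout logic easy to check on a desktop. This is only a sketch: shouldFeedWatchdog and WD_TIMEOUT_MS are made-up names, and the 120000 ms value is an assumption taken from the 2-minute figure above.

```cpp
#include <stdint.h>

// Hypothetical constant: mirrors the 2-minute window mentioned above.
constexpr uint32_t WD_TIMEOUT_MS = 120000UL;

// Returns true while the main loop has checked in recently enough that the
// timer ISR should keep feeding the hardware watchdog.
// Unsigned subtraction keeps the comparison correct when the millisecond
// counter rolls over from 0xFFFFFFFF to 0.
inline bool shouldFeedWatchdog(uint32_t nowMs, uint32_t lastKickMs,
                               uint32_t timeoutMs = WD_TIMEOUT_MS) {
  return static_cast<uint32_t>(nowMs - lastKickMs) <= timeoutMs;
}
```

In the ISR this would be used as a guard around Watchdog.reset(), with millis() and watchdogTimerMillis as the arguments. Since that variable is written in the main loop and read in the ISR, it should probably be declared volatile.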

@roberthartung

I am not seeing WiFi hangs, but rather corrupt data!

@Ravenbs

Ravenbs commented Mar 13, 2020

@roberthartung:
Yes, but you captured my thread/bug, which is about disconnects, for your stuff. So the link is for Issue 103, which is about random disconnects. And I think this topic is also about those disconnects, followed by a watchdog reset due to the long reconnect time.

@roberthartung

> @roberthartung:
> Yes, but you captured my thread/bug, which is about disconnects, for your stuff. So the link is for Issue 103, which is about random disconnects. And I think this topic is also about those disconnects, followed by a watchdog reset due to the long reconnect time.

Well it linked to my comment before, that's why I was clarifying. And your issue is "unstable wifi" not "random disconnects" ;)

@Ravenbs

Ravenbs commented Mar 13, 2020

It was my fault for using the link from the e-mail notification. You are right, and I fixed this.

I hope they find the issue soon; I think your input is very valuable.
Some routers disconnect on corrupt packets, so maybe this is the root cause.

@DesktopMan
Author

With the watchdog disabled, this test script hangs forever when the problem happens, so it's not just that the connect is taking a long time. In fact, the Connecting to Wi-Fi... text is never printed, so I believe it hangs somewhere else.

I think this should be investigated separately for now.

@per1234 added the type: imperfection (Perceived defect in any part of project) label on Mar 14, 2020

@szaiftamas

szaiftamas commented May 8, 2020

I have the same problem with the MKR1010. When the Wi-Fi router goes away,
int WiFiClient::connect(IPAddress ip, uint16_t port) can hang.
I analyzed the library in depth, and the hang occurs in this function:

void ServerDrv::startClient(uint32_t ipAddress, uint16_t port, uint8_t sock, uint8_t protMode)

The problem points to these three calls:

SpiDrv::spiSlaveDeselect();
//Wait the reply elaboration
SpiDrv::waitForSlaveReady();
SpiDrv::spiSlaveSelect();

waitForSlaveReady() contains an infinite loop on digitalRead(SLAVEREADY), so it can hang.
spiSlaveSelect() also loops on digitalRead(SLAVEREADY), but that loop is bounded by a timeout.
Why is a timeout used in the second case but not in the first?
Can I add a timeout for this?
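
To illustrate what such a timeout could look like, here is a minimal sketch of a bounded polling loop. It is not the WiFiNINA implementation: waitForReadyWithTimeout is a made-up name, isReady stands in for a check on digitalRead(SLAVEREADY), and nowMs stands in for millis(). Both are passed in as parameters so the logic can be exercised without hardware.

```cpp
#include <stdint.h>

// Hypothetical sketch: poll a "ready" condition until it becomes true or
// timeoutMs milliseconds have elapsed.
// Returns true if the condition became ready, false on timeout.
template <typename ReadyFn, typename ClockFn>
bool waitForReadyWithTimeout(ReadyFn isReady, ClockFn nowMs,
                             uint32_t timeoutMs) {
  const uint32_t start = nowMs();
  while (!isReady()) {
    // Unsigned subtraction stays correct across counter rollover.
    if (static_cast<uint32_t>(nowMs() - start) >= timeoutMs) {
      return false;  // timed out; the caller has to handle the error
    }
  }
  return true;
}
```

The important design point is the return value: a caller such as startClient() would need to check it and report a failed connect instead of continuing the SPI transaction.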

@apetryk2

I added a comment to issue #18 where you can reproduce a similar hang by trying to ping the IP address of a host that is offline. This reproduces reliably for me on a MKR WiFi 1010.

@apetryk2

Something else I've observed is that pinging the Arduino from an external source will cause the hangs.

This is easily reproduced:

  1. Connect the Arduino to Wi-Fi.
  2. Have the Arduino ping, in a loop, an address where there is no host to reply. The ping command will time out after 5 seconds (by design) and repeat.
  3. From an external source, ping the Arduino every second or so.
  4. Now observe that the loop in step 2 'hangs'.
  5. Stop pinging from the external source. The loop resumes (after 5 seconds).

@leonbrag

I have a similar problem. I have an Uno WiFi Rev 2 automation that sends an SSL request every 30 seconds and is controlled through a web server. WiFiNINA is used for both.

After 10 hours, or sometimes less, the SSL HTTP requests stop and the web server no longer accepts connections.

@DesktopMan
Author

I updated the library to 1.7.1 and the firmware to 1.4.1 to see if this has been improved by the recent fixes. Now the test code in my original post behaves as if the connection to the server is never closed. This did not occur with library 1.7.1 + firmware 1.3.0.

@Uup115

Uup115 commented Jan 10, 2022

This is related to #207

8 participants