FW crash: Enable the watchdog to self-detect/recover a FW crash #349
Comments
Please install the v2.4.2 firmware. This release contains stability fixes. |
Hi, |
Hello, Improving the firmware robustness is certainly a good thing. However, regardless of how robust a firmware is, there can always be unexpected borderline exceptions, especially for devices running 24/7. I therefore think it would be good to enable the watchdog (its purpose is to catch unpredictable crashes and reset and recover the system). I always ask the firmware engineers I work with to do this (I am an electrical engineer at Apple). I don't know the details of the ESP32, but nearly every microcontroller has a watchdog function, so I believe the ESP32 should have one too. The tricky part might be finding a good place in the code to service the watchdog regularly (reset the watchdog timer) to show that the system is still alive and prevent the timeout that resets the system, but I am sure you can find the right place. Another option, much simpler and one that could be implemented right away as a band-aid, would be to add a new function, configurable by the user in the UI, that restarts the system on a schedule defined by the user (for instance every Thursday night at 3am). In that case, however, it would be good to remember the blind status/position from before the restart. Any chance you can implement one of these recovery mechanisms? Or maybe you can come up with an even smarter one. Thank you very much. Best regards, Gerd
On 2 May 2024, at 11:30, magtimmermans ***@***.***> wrote:
Hi,
I have the same problem, and more often with the v2.4.2 release. The device stops responding, with the result that the blinds won't react. I must say that I am very impressed by this software and how it works. Great job! Hopefully, you can also improve the stability.
|
I will be adding wdt to either the next release or the release after. There are some challenges with it at the moment though as I need to get my head around the interrupts that control the radio receiver. For OTA and git requests I will have to disable it I believe. EDIT: Are you sure you are running the official release? There were issues with v2.4.1 where UDP packet responses were leaking memory. See #273 |
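The watchdog idea discussed above can be sketched as a small host-side model. This is only an illustration under stated assumptions, not the firmware's implementation: the class name `SoftWatchdog` and its methods are hypothetical, time is passed in as a millisecond counter, and a real ESP32 build would use the hardware task watchdog rather than this software model. The `pause`/`resume` pair mirrors the idea of disabling the watchdog during long but legitimate operations such as an OTA update.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a watchdog timer; not the firmware's code.
class SoftWatchdog {
public:
    explicit SoftWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }

    // Call regularly from the main loop to show the system is still alive.
    void feed(uint32_t nowMs) { lastFedMs_ = nowMs; }

    // Suspend checks during long, legitimate operations (e.g. an OTA update).
    void pause() { paused_ = true; }

    // Resume checks and restart the countdown from now.
    void resume(uint32_t nowMs) {
        paused_ = false;
        lastFedMs_ = nowMs;
    }

    // True when the watchdog was not fed within the timeout.
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    bool expired(uint32_t nowMs) const {
        return !paused_ && (nowMs - lastFedMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastFedMs_ = 0;
    bool paused_ = false;
};
```

In this sketch the main loop would call `feed()` once per cycle, and a supervising check would restart the system when `expired()` returns true. The timeout value is left to the caller because the right setting depends on how long the slowest legitimate loop cycle takes.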
I too have had the device stop responding about 3 times since I set it up, about 3-4 weeks ago now. A reboot fixes it. I don't have any definite way, that I know of, to see what's causing it, but it just happened last night. I also had another device last night report that it lost its internet connection, but it recovered. So I suspect that these random issues might be the device not recovering from a dropped network connection. I have not looked at the code yet, but I have done a bit of ESP32 programming and have worked with lost WiFi connection recovery code. Do you think it's possible the WiFi recovery might need a look? Also, is there some way of capturing logs after an event to help troubleshoot? Thanks for the great product, I love it! |
In the v2.4.3 pre-release there is now a watchdog timer that will reboot the ESP if it encounters any long-running processes. @geokscott which version of the firmware are you running? There was a UDP leak, fixed in v2.4.2, that after a period of several days would cause the device to stop responding when the SSDP traffic was high. Essentially the datagram response was not being destroyed. ESPSomfy RTS now does roaming to ensure it has the best possible connection on mesh networks, and the connection is checked on each loop cycle. |
Yeah @rstrouse, I forgot to say, I've been running 2.4.2 from the start. I have tried with and without Roaming. I originally had it turned on, but after a couple of losses of service, I turned it off. I thought maybe that fixed it, but then last night the problem cropped up again. You didn't say: is there any way of capturing logs? Can I run the device off USB and see logging info? I would be willing to do that and do some testing. PS: Sorry, I just noticed this thread is for version 2.4.1. I just assumed it was 2.4.2. |
If you connect it with a data serial connection you can watch the logs with esphome.io. |
@rstrouse version 2.4.2 still crashes every few days. I'm running on ESP32 C3. |
I did upgrade the core to 2.0.16 on the pre-release. Install v2.4.3 pre version since that core fixes serial interrupt problems. Also if you are using HA you can watch the memory usage. There are memory entities now on the device. |
Thanks. I've just upgraded the firmware to 2.4.3 pre. Let's see how it goes. |
@cvhoang, @geokscott, and @cvhoang let me know if the update to the pre-release v2.4.3 solves the issue with your boards. |
I also installed pre-2.4.3 and updated my HA component a couple days ago. So far so good. I will let you know if anything changes. |
Me too, 2 days ago and still running fine. |
This morning the system was not available for a few minutes (2.4.3 pre). I am testing with Uptime Kuma every 2 minutes. |
If it went off for a few minutes and came back, then either it lost the connection and had to go through a cycle to come back, or it lost the WiFi connection long enough to trigger AP mode for 3 minutes. Do you have HA, and what were the network RSSI and free memory at the time? I am still getting a handle on how long it needs before the wdt is triggered. Currently, I have it set to 5 seconds but that may be too short. |
I have increased the wdt timeout to 7 seconds and shortened the internet check. v2.4.3 has been released, so please open a new issue for any new problems. |
@rstrouse: I forgot to reply to this thread. V2.4.3 has been working well for me. Thank you for your work. Really fantastic stuff. |
Thanks for letting me know. I think the stability issues are a thing of the past. |
I meant to respond also. I tried to find this thread last week but didn't see it because it was a closed item. Mine has been stable also. I have noticed only one oddity since the new version, and it happened only once. I have a wind sensor, an anemometer type. It has been working fine, but one day when there was little wind the web interface was reporting that wind had been sensed (the yellow warning icon was showing). It seems it was stuck on from the day before. I did a reboot from the web interface and it cleared. I have been keeping the web interface open in a browser tab and checking it daily since updating the firmware, which is the only reason I noticed. BUT other than that, it's been rock solid. Good job! SO, when are you going to make these a for-sale product and submit your component to HA to make it official? If you need any help with PCB designs or hardware, let me know. Also 3D printing enclosures. I tinker in all of that. |
Thanks for confirming. I chased this issue for a while and even had several devices that never exhibited the issue. The wind sensor does not reset until it gets a sensor frame indicating no wind. Since this is not a persisted state it cleared on reboot. |
Hello, thank you very much for all the updates in the past weeks. Version 2.4.3 is working well so far. Great job! In case you have some time, I would still recommend adding an option in the config page to auto-reboot the device every so often (weekly, monthly, or so, user configurable). Even with the most stable operating systems, it is a good habit to reboot from time to time (auto-reboot can typically be set with the "pmset repeat wakeorpoweron" command line in recent macOS versions, which is useful for systems running 24/7). It is better to do this reboot at a scheduled non-critical time (like Monday nights at 3am) than to run the system until it reboots on its own on an unknown day and time. Besides that, I want to reiterate what I said in the original post: the system works great and I am really impressed by the work that you did and its maturity! Not only the firmware itself, but also the quality of the documentation! |
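The scheduled reboot suggested above comes down to working out how long it is until the next occurrence of a weekly time. The following is a minimal sketch under assumptions: the `WeeklyTime` struct and both function names are hypothetical, weekday 0 is taken to mean Sunday, and the current local time is assumed to be supplied by the caller (for example from NTP). It is not an existing feature of the firmware.

```cpp
#include <cassert>

// Hypothetical weekly schedule entry: weekday 0 = Sunday ... 6 = Saturday.
struct WeeklyTime {
    int weekday;
    int hour;
    int minute;
};

// Minutes elapsed since Sunday 00:00 for a given weekly time.
int minutesIntoWeek(const WeeklyTime& t) {
    assert(t.weekday >= 0 && t.weekday <= 6);
    assert(t.hour >= 0 && t.hour <= 23);
    assert(t.minute >= 0 && t.minute <= 59);
    return (t.weekday * 24 + t.hour) * 60 + t.minute;
}

// Minutes from `now` until the next scheduled reboot (0 means it is due now).
int minutesUntilReboot(const WeeklyTime& schedule, const WeeklyTime& now) {
    const int minutesPerWeek = 7 * 24 * 60;
    return (minutesIntoWeek(schedule) - minutesIntoWeek(now) + minutesPerWeek)
           % minutesPerWeek;
}
```

With a schedule of Monday 03:00, written as `WeeklyTime{1, 3, 0}`, a caller could poll once a minute and restart when the result is 0. As noted earlier in the thread, the blind positions would need to be saved before such a restart.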
So what you're saying is the wind sensor sends a close command when it is tripped, and is supposed to send a no-wind command when the wind dies down? Well, either my wind sensor is NOT sending a message when the wind stops OR the receiver is not picking it up. I cannot tell which for sure. I have been monitoring the wind sensor warning in the UI over the past 3 days while I've been outside, and it has triggered and closed my awning at least once every day. Each time the warning never goes away. I don't know how to verify this, but I suspect this wind sensor is NOT sending an all-clear message at all. It only sends a close command when it has a sustained wind speed of X amount. If that IS the case, could your software be modified to automatically clear the sensor warning icon after a period of time elapses with no wind sensor triggers? Maybe other types of sensors send clear messages, but I'm pretty sure this one does not. This is the sensor I'm using: https://www.somfysystems.com/en-us/products/9012499/eolis-rts-wind-sensor-24v-dc-kit-includes-sensor-and-transformer EDIT 6/3/2024: Today is a breezy day. I slightly adjusted the location of my receiver to check for possible reception issues. I opened the awning and waited for the wind to close it. A while after it was closed, the wind warning icon cleared. So, obviously I had a reception issue before! I guess the wind sensor does not have much range, or not very reliable range. It's only maybe 60 feet from my receiver. Thanks - George |
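The auto-clear request above can be modelled as a warning flag with a fallback timeout. This is a hypothetical sketch, not how the firmware handles sensor frames: the class `WindWarning`, its method names, and the timeout parameter are all assumptions made for illustration. A no-wind frame still clears the flag immediately, as described earlier in the thread, and the timeout only covers the case where that frame is never received.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a wind warning flag; names are illustrative only.
class WindWarning {
public:
    explicit WindWarning(uint32_t autoClearMs) : autoClearMs_(autoClearMs) {
        assert(autoClearMs > 0);
    }

    // A sensor frame reporting wind sets (or refreshes) the warning.
    void onWindFrame(uint32_t nowMs) {
        active_ = true;
        lastWindMs_ = nowMs;
    }

    // A sensor frame reporting no wind clears the warning immediately.
    void onNoWindFrame() { active_ = false; }

    // Fallback: drop the warning if no wind frame arrived within the timeout.
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    bool isActive(uint32_t nowMs) {
        if (active_ && (nowMs - lastWindMs_) >= autoClearMs_) {
            active_ = false;
        }
        return active_;
    }

private:
    uint32_t autoClearMs_;
    uint32_t lastWindMs_ = 0;
    bool active_ = false;
};
```

The trade-off in this design is that a timeout that is too short could clear the warning while the wind is still blowing, so the value would need to be longer than the interval at which the sensor repeats its wind frames.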
Hardware
ESP32
Firmware version
v2.4.1
Application version
v2.4.1
What happened? What did you expect to happen?
First: The system works great and I am really impressed by the work that you did and its maturity.
The system has been running for a couple of months now, and I have noticed that every few weeks the firmware crashes (the web server/page of the device is not reachable) and the shades are not controllable. After a power cycle it works again.
I am not a software engineer and have some difficulty reading the code; however, searching the source for "Watchdog" I could not find anything.
I think that it would be good to enable the ESP32 watchdog, so that if the FW crashes (for any reason, even one that is not a bug, like ESD discharge, temperature, or anything else unexpected), the system will restart. It just makes the system more robust.
I could of course put a timer on the power socket and power cycle the device each night, but I think that it would be much more elegant to use the watchdog for this (handling this kind of thing is the purpose of the watchdog).
Thank you
How to reproduce it (step by step)
Logs
No response