Controller reporting "Jammed" and then "Ready", operations hang #104230

Open
ember1205 opened this issue Nov 20, 2023 · 149 comments
Comments

@ember1205

The problem

I am commonly seeing my USB device (Aeotec Z-Stick 7 Plus) report being "Jammed" and then shortly after report being "Ready." Around this time, any operations will fail to occur and the system does not go back and "recover" from missed actions / automations.

What version of Home Assistant Core has the issue?

System Information

version | core-2023.11.2
installation_type | Home Assistant OS
dev | false
hassio | true
docker | true
user | root
virtualenv | false
python_version | 3.11.6
os_name | Linux
os_version | 6.1.59
arch | x86_64
timezone | America/New_York
config_dir | /config

Home Assistant Cloud

logged_in | false
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 11.1
update_channel | stable
supervisor_version | supervisor-2023.11.3
agent_version | 1.6.0
docker_version | 24.0.6
disk_total | 228.5 GB
disk_used | 6.1 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | Z-Wave JS (0.3.0), Let's Encrypt (4.12.9), Z-Wave JS UI (3.0.2)

Dashboards

dashboards | 2
resources | 0
views | 1
mode | storage

Recorder

oldest_recorder_run | November 10, 2023 at 7:57 PM
current_recorder_run | November 16, 2023 at 3:32 PM
estimated_db_size | 60.77 MiB
database_engine | sqlite
database_version | 3.41.2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zwave-js

Link to integration documentation on our website

No response

Diagnostics information

zwave_js-6d4b40be1e77c3e33d78bdf24d66a9c3-USB Controller-d288fb52eab027710676687815136717.json (1).txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

I can provide debug logs if useful for zwave-js: Please let me know if you want an entire day's worth of info or if you would prefer a certain amount of time leading up to and then after an event.

@home-assistant

Hey there @home-assistant/z-wave, mind taking a look at this issue as it has been labeled with an integration (zwave_js) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zwave_js can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zwave_js Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zwave_js documentation
zwave_js source
(message by IssueLinks)

@markus99

markus99 commented Nov 27, 2023

Very similar issues here, ever since ZWave JS upgraded to 12.x.x it's been doing this. Absolutely maddening. Whole point of ZWave is to turn on / off / set levels of devices - now it just hangs and hangs and hangs.

Was using an Aeotec Gen5+, upgraded to Zooz 800 series stick.

image

ZWaveJSLog.txt

zwave_js-6d4b40be1e77c3e33d78bdf24d66a9c3-USB.Controller-d288fb52eab027710676687815136717.json.1.txt

@ember1205
Author

A few points of note that may, or may not, be of value...

Switching between 800LR and 500/700 series controllers seems to require a rebuild. Last I checked (somewhat recently), there wasn't a reliable way to create a backup of an 800LR controller, let alone restore it, even to the same controller. And there are apparently some compatibility issues between the 800 and older systems where the data can't be migrated.

My 700-based device generally works ok. I have read decent results from the 800LR community as well.

I was able to significantly reduce the occurrence of the "Jammed" situation by turning off, or turning down the frequency of, information reports from the various endpoints to 'bare minimums.' Many devices seem to come factory-configured to be super chatty, sending all kinds of info all of the time, and I had already modified a fair number of these in my setup. But a few days ago I went through the WHOLE system and modified every device that I could: I turned off automatic "periodic" reports, changed reports to Basic where I still needed reporting, and removed data points that I didn't need from various devices (like light levels from a sensor that I'm only using to collect humidity levels). Judging by the significant reduction in log file size over a 24-hour period after the changes, I estimate that I cut my network traffic by at least a further half.

The "Jammed" status still appears, but it takes significantly longer to do so now and there does still appear to be the possibility of a relation between trying to communicate with devices that are "too far away" for smooth communications and the situation occurring.

When my device reports "Jammed", it reports "Ready" again within a few seconds. My network is about 40 devices spread across three floors of my house (basement plus two living space stories above) with the controller being on the second floor.

@markus99

markus99 commented Dec 1, 2023

This was working, from everything I've seen, BEFORE the 12.x ZWave upgrade and now hasn't for almost 2 months. It's ruined my HA experience. My old Aeotec Gen5+ was running pretty solid, now it's trash. Upgraded to Zooz 800, same issues. Now trying to wait for Hubitat to ship their device to me to go back to running a 3rd party hub (was on SmartThings prior to trying [and failing miserably] to run Zwave/Zigbee [SkyConnect] locally). What a disaster - and everyone's pointing to Silicon Labs driver being the issue...

@ember1205
Author

Your comments are self-conflicting. You can't say "everyone's pointing to Silicon Labs driver being the issue" (implying you think that's inaccurate / untrue) while also stating both of your ZWave controllers are "trash" and that you "failed miserably" with SmartThings. It's all the same controller chips, designed and manufactured by SL, that you're ultimately hitting issues with.

Fix your devices' configurations, regardless of the platform you tie them into, or you're going to continue to have problems.

My setup still encounters the "Jammed" error but the interval now is down to once every 24-36 hours. And the system gets itself back into an operational state fairly quickly with the Soft Reset option.

The SDK needs to be validated or fixed, as does the driver built from it. Once that's done, additional steps can be taken by those that develop software like node-zwave-js and Z-Wave JS UI to ensure they have the updates in place as well. Also, this will need to be addressed by the hardware vendors to ensure the firmware for their controllers gets updated as necessary.

@markus99

markus99 commented Dec 1, 2023

I have ~25 devices, only 3 of which are sensors - the rest are switches (typically Jasco/GE [about 1/3 dimmer, 2/3 toggle only]). I reduced / updated all 3 sensors to the bare minimum of reporting - to the best I was able / knowledgeable to do.

I have not added any add'l endpoints in the past year and the system, when on SmartThings was rock solid - as it was 99.9% of the time before October of this year.

From what I've seen on HA Community forums, and here on github - I'm not the only one having issues - and the firmware of my USB stick (the original Aeotec Gen5+) hadn't been updated in well over a year - ceteris paribus, seems to me it's the ZWave JS 11.x -> 12.x update(s).

@ember1205
Author

Are you all ZWave? I'm having a hard time following the bouncing ball with some of your comments... You stated ST was "rock solid" but also said you "failed miserably" with it when trying to get ZWave running locally. I'm not sure what to really make of this pair of statements.

For ZWave only, these comments may be helpful:

The GE/Jasco devices tend to be less configurable than other similar devices. They also tend to be less chatty on the network overall, which should be a -good- thing.

If your sensors are ALSO running ZWave, you will want to look at things like the wakeup interval, whether they are sending automatic reports, what kinds of reports, and on what intervals. Example: I have an Aeotec door/window sensor installed on a bedroom closet. When you open the doors, the light goes on, close the doors, light goes off. The sensor is configured with the following details:

Report When Open
Low Battery Threshold 20%
Low Battery Check enabled
Low Battery Check interval 86640 (that's just slightly less than once per day)
Motion Sensor Triggered Command: Basic CC Report
Wakeup Interval (settable within ZWave JS UI only): 3600 (once per hour)

That's it. If the door moves, it sends a notification that it was opened/closed, checks the battery level once per day, wakes up to check in with the controller once per hour.

If you hadn't updated the firmware on your Gen5+ controller stick to anything higher than 7.18.x, then the driver issue isn't at all related to the issues you saw with it. You need to be debug logging your setup and looking through the logs for chatty nodes and dropped packets that are occurring leading up to the Jammed state. In my case, I have a couple of devices in my basement that are having a tougher time communicating with the controller, which is on the second floor. Dropped packets from a sensor in the basement were causing communications issues and 'upsetting' the controller. Drastic changes to its configuration and what/when it reports have helped significantly and were my only option at the time since I couldn't move the controller closer.

Looking at the node map in ZWave JS UI and understanding where there are poor quality routes helped me to identify nodes that needed some adjustments that were more drastic than others. It also reminded me of "dead" nodes that I had unplugged that the controller was still trying to communicate with periodically... I removed them from the network to assist in cleaning up the amount of traffic.

As I've said... I still get the error - there is still a problem. But the frequency has gone down by orders of magnitude and it self-corrects much more reliably. My setup is functional "most of the time" again and should only get better once this issue is corrected.
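For anyone wanting to try the "look for chatty nodes" step described above, here is a minimal sketch in Python. It assumes the driver log tags lines with a node marker of the form `[Node 012]`; that log shape, and the function names `count_node_mentions` and `chattiest`, are assumptions for illustration, so adjust the pattern to whatever your own log actually contains.

```python
import re
from collections import Counter

# Assumption: log lines that concern a node carry a tag like "[Node 012]".
NODE_TAG = re.compile(r"\[Node (\d+)\]")


def count_node_mentions(lines):
    """Return a Counter mapping node ID to the number of log lines that mention it."""
    counts = Counter()
    for line in lines:
        match = NODE_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def chattiest(lines, top=5):
    """Return the `top` most frequently mentioned nodes as (node_id, count) pairs."""
    return count_node_mentions(lines).most_common(top)


if __name__ == "__main__":
    # Hypothetical log lines, only to show the call shape.
    sample = [
        "2023-11-20T12:00:00.000Z DRIVER [Node 012] report received",
        "2023-11-20T12:00:01.000Z DRIVER [Node 012] report received",
        "2023-11-20T12:00:02.000Z DRIVER [Node 003] report received",
    ]
    for node_id, count in chattiest(sample):
        print(f"node {node_id}: {count} lines")
```

A node that dominates the counts over a day is a reasonable first candidate for turning down report intervals; the count alone does not say whether its traffic is actually the cause.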

@markus99

markus99 commented Dec 1, 2023

Started using Zwave in '15/'16 via SmartThings / IFTTT / Stringify. When the latter shut down I moved over to HA in ~2019, integrating Zwave devices via SmartThings. When Samsung bought SmartThings and started screwing with it, I decided to move to HA 'local' Zwave control with an Aeotec stick, in late 2020 I think it was. No issues throughout the entirety of those changes (other than Jasco / GE switches getting the flashing blue light of death issue).

Anyhow, send a command, switch does what you want - flip switch manually, HA updates accordingly / quickly, etc. I run very few sensors (Aeotec MultiSensor 6's, and all hard wired to power and a single Zooz motion), the rest are light switches or outlet-type plug-in on/off devices.

9/26/23 is the node-zwave-js 12.0.0 release (zwave-js/node-zwave-js@51aa4ba) - and that's when it all hit the fan.

Thought my Aeotec Gen5+ stick might have been the issue (being 3-4 years old), so I tried upgrading (good times) to the Zooz 800 series. Same issues. Controller jammed. Commands not affecting switches. Lights on when should be off, vice-versa. Constantly.

Wrote a bunch of scripts to 'retry' turn on / turn off with do / while loops. Updated countless automations to use them vs. 'regular' switch.turn_on or light.turn_off commands. They help, but not always.
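The retry approach described above can be sketched roughly as follows. This is plain Python rather than a Home Assistant script, and `send_command` and `read_state` are hypothetical callables standing in for whatever turns the device on/off and reads its reported state; nothing here is an actual Home Assistant or Z-Wave JS API.

```python
import time


def set_with_retry(send_command, read_state, desired,
                   attempts=5, delay_s=2.0, sleep=time.sleep):
    """Send a command until the device reports `desired`, or attempts run out.

    Returns the attempt number that succeeded, or None if none did.
    `send_command` and `read_state` are caller-supplied (hypothetical) hooks.
    """
    for attempt in range(1, attempts + 1):
        send_command(desired)
        sleep(delay_s)  # give the device time to act and report back
        if read_state() == desired:
            return attempt
    return None


if __name__ == "__main__":
    # A fake switch that silently drops the first two commands.
    state = {"value": "off", "dropped": 0}

    def send(value):
        if state["dropped"] < 2:
            state["dropped"] += 1
        else:
            state["value"] = value

    result = set_with_retry(send, lambda: state["value"], "on", delay_s=0.0)
    print("succeeded on attempt", result)
```

A bounded attempt count matters here: an unbounded loop against a jammed controller only adds more traffic to a network that is already struggling, which is consistent with the observation that the retries help but not always.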

I've tried 'streamlining' my network for the 3 sensors I do have and the like (thank you for these suggestions), no changes in stability.

Keep in mind, for ~4 years+, all has been fine and I've only added 3-4 new switches over that time - nothing major.

I do run a Zigbee (SkyConnect USB) network as well in the home and that's been ok. But Zwave, until the 12.0.0 release was rock solid, no issues, like ever. Now, it's awful. Click an entity from a glance card in Lovelace and >50% of the time it fails - quicker now to just get up and hit the thing manually. Balks thru Google Home as well. #failarmy

The old Aeotec Gen5+ hadn't had its firmware updated in some time, actually just checked and it was on:

image

So the old USB stick / old firmware had issues, the new stick w/ new firmware (7.19.3) has issues and the only change is this node-zwave-js -> 12.x.x change.

Seems to me (as it shows in other threads as well) that this is the culprit. I'm no expert, but it's what my experience shows everything pointing towards.

None of this frustration is aimed at anyone, merely the situation. I've spent countless hours, have zero wife-approval factor, and now hundreds of $$s trying to fix the problem - all to be told to wait for Silicon Labs to update something - when (and I realize most ppl work for free on HA {yes, I do support and do subscribe to their annual subscription}) to me it's not even an SI firmware issue. Regardless, appreciate everyone's help here, but it's still over-the-moon frustrating regardless.

@ember1205
Author

I have some of those Aeotec MultiSensor 6 devices... they can be SUPER chatty if you aren't careful with the configuration. They can track a lot of different types of variables, but I only use one to track humidity. I have the automatic reporting disabled and I'm -only- sending selective reports using the Humidity Thresholds to notify when it's above a certain level (to turn on the smart plug for the dehumidifier) and when it's below a certain level (to turn that plug off).

I saw what appeared to be a relatively functional setup until actually BEFORE the release you mentioned. My issues started somewhere more around the 9/10-ish range and actually settled back down around 9/14 or so. I can't say definitively whether I was seeing the "Jammed" reports or not because I hadn't yet discovered that piece of information, but something was definitely misbehaving.

One of the ways I knew something was up was due to how the two plugs that operate my coffee makers were operating... I have a drip maker and a single cup maker plugged into a smart plug each and then connected to the same household circuit. This is bad in the sense that trying to operate both at once will overload the circuit (15A) since each one can easily draw 12A+ when heating. The plugs monitor the power use and a high draw from the drip maker powers off the single cup brewer to prevent circuit overloads.

Any time I have had issues, the single cup brewer's plug clicks almost continuously for maybe 30 seconds or so. Sometimes it repeats after a 10-20 second "break". I believed I potentially had a bad plug and swapped it out. Problem persisted. Each time I felt like I made progress with settings and such, that clicking was a dead giveaway that something was still off. I haven't heard that happen now in quite a few days.

There's some discussion about possible frequency interference between the USB controller / bus and the ZWave radio, but I'm not convinced. First, the information seems to be related explicitly to USB3.0, which operates at 2.5GHz. This WILL cause some crosstalk and interference with Zigbee (2.4GHz), but not ZWave, since that's around 908MHz in the US.

USB2.0 is a little different since it operates at 240MHz, which at 4x frequency would be 960MHz. This could be sufficiently close to the 908 to generate some interference, but ONLY if it were to actually oscillate at that multiplied frequency.
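The harmonic arithmetic in the paragraph above can be checked with a few lines of Python. The 240 MHz figure is the one quoted in the comment; the 908.42 MHz US Z-Wave frequency and the helper name `nearest_harmonic` are assumptions of this sketch.

```python
USB2_FUNDAMENTAL_MHZ = 240.0  # figure quoted in the comment above
ZWAVE_US_MHZ = 908.42         # assumption: US Z-Wave frequency


def nearest_harmonic(fundamental_mhz, target_mhz):
    """Return (n, harmonic_mhz, offset_mhz) for the harmonic closest to the target."""
    n = max(1, round(target_mhz / fundamental_mhz))
    harmonic = n * fundamental_mhz
    return n, harmonic, abs(harmonic - target_mhz)


if __name__ == "__main__":
    n, harmonic, offset = nearest_harmonic(USB2_FUNDAMENTAL_MHZ, ZWAVE_US_MHZ)
    print(f"harmonic #{n} at {harmonic:.2f} MHz, {offset:.2f} MHz away from Z-Wave")
```

Under these numbers the closest harmonic is the fourth, at 960 MHz, roughly 50 MHz away from the Z-Wave frequency. Whether that is close enough to matter depends on the radio's filtering, which this calculation says nothing about.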

In short, if your compute device has USB3.0 ports, and/or you have a USB3.0 SSD running it, you could be getting some odd behavior with the Zigbee. Moving that well away with an extension cable with shielding may assist there. And, if the Zigbee is being interfered with and it's clogging the communication bus to where ZWave stuff can't go out or come in, then there could be a domino effect there. This is all speculation, though... Your debug logs may be helpful - look for activity leading up to a Jammed event to see what's actually happening in those last 5-10 seconds beforehand.
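Pulling out the last few seconds of log activity before each Jammed event could look something like the sketch below. It assumes each log line starts with an ISO 8601 timestamp and that the event line contains the word "jammed"; both the log shape and the helper names are assumptions, so check them against a real log before relying on the output.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    """Parse a leading ISO 8601 timestamp; return None if the line has none."""
    token = line.split(" ", 1)[0].replace("Z", "+00:00")
    try:
        return datetime.fromisoformat(token)
    except ValueError:
        return None


def lines_before_jam(lines, window_seconds=10, marker="jammed"):
    """For each line containing `marker`, collect the lines logged in the preceding window.

    Returns a list of (jam_line, context_lines) pairs. Lines without a
    parseable timestamp are skipped. Assumes all timestamps share one format.
    """
    stamped = [(parse_ts(line), line) for line in lines]
    stamped = [(ts, line) for ts, line in stamped if ts is not None]
    results = []
    for index, (ts, line) in enumerate(stamped):
        if marker in line.lower():
            start = ts - timedelta(seconds=window_seconds)
            context = [prev for prev_ts, prev in stamped[:index] if prev_ts >= start]
            results.append((line, context))
    return results


if __name__ == "__main__":
    # Hypothetical log lines, only to show the call shape.
    sample = [
        "2023-11-20T12:00:00.000Z DRIVER [Node 012] report received",
        "2023-11-20T12:00:08.000Z DRIVER [Node 031] no response",
        "2023-11-20T12:00:12.000Z CNTRLR The controller is jammed",
    ]
    for jam_line, context in lines_before_jam(sample, window_seconds=10):
        print(jam_line)
        for entry in context:
            print("    ", entry)
```

Comparing the context blocks across several Jammed events is one way to see whether the same nodes keep showing up right before the controller stalls.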

@otterlo

otterlo commented Jan 8, 2024

I have been following the above discussion, but I don't think it is a setup problem. My Z-Wave setup with the 700 series worked excellently for months in a row until sometime in December, when I started experiencing the same as above: jammed receiver, cannot switch any module on or off. I haven't changed anything. The only thing I do is the normal HA and Z-Wave JS upgrades, so these errors must be related to the recent changes from HA upgrades. I never thought I would leave Z-Wave, but options are running out now.

@Polosaz

Polosaz commented Jan 9, 2024

Thanks to this thread, I realized that my problem (reported here: #106827 ) is also this one.

I hope there is a solution soon because it has dismantled the proper functioning of my home automation.

image

@ember1205
Author

The biggest reason that this problem isn't getting resolved is because no one will acknowledge it exists. Everywhere it has been reported, the code owners for that piece are, in essence, claiming "my code is fine" and the project piece owners are not working together to understand the underlying issue.

The complete lack of any ability to even understand how to track this issue down to the core component will likely result in me leaving HA and my project is barely six months old. It has taught me a lot about ZWave, but my learnings are likely going to simply be put to use elsewhere because no one is willing to own this problem.

@markus99

markus99 commented Jan 9, 2024

@raman325 - I commented, as have numerous others on this issue - which I believe to be the same as #106827. Any thoughts here?

Tagging @MartinHjelmare and @AlCalzone as people who've had commits in the last 3-4 months as well. Appreciate the help here all.

@MartinHjelmare
Member

Please don't tag people unless they've asked you to do that.

The integration can't do anything about a jammed controller. That problem is on the device or driver side. The driver project is another issue tracker.

I recommend reading this troubleshooting section for how to improve your network health.

https://zwave-js.github.io/node-zwave-js/#/troubleshooting/network-health

@AlCalzone
Contributor

AlCalzone commented Jan 9, 2024

This is a firmware bug. The driver (node-zwave-js) tries its best to work around it.
Before that workaround was added you'd randomly have nodes marked as dead, even if they weren't.

It may be fixed in the firmware based on SDK 7.21.0, but Silicon Labs are not 100% certain about it.
It seems like this isn't fixed yet, as of SDK 7.21.0

@ember1205
Author

This is a firmware bug. The driver (node-zwave-js) tries its best to work around it. It may be fixed in the firmware based on SDK 7.21.0, but Silicon Labs are not 100% certain about it.

Is there any information provided by SL that identifies it as a firmware bug? If you have any links to anything that has been published by SL, it would be interesting to read.

7.21 is not addressing the issue as some folks are running firmware based on that SDK and the issue persists.

@ember1205
Author

Please don't tag people unless they've asked you to do that.

The integration can't do anything about a jammed controller. That problem is on the device or driver side. The driver project is another issue tracker.

I recommend reading this troubleshooting section for how to improve your network health.

https://zwave-js.github.io/node-zwave-js/#/troubleshooting/network-health

Where can we learn more about the specific issue with the device or the driver? This is affecting MANY people with various chipsets (500, 700, 800, etc.) across devices from many manufacturers, using various versions of the node-zwave-js software. And if it is a problem with a device, driver, or even firmware, how will addressing details for network health remove the error?

The point here is that this issue just keeps getting deflected from one section of the code to another, people are being told "it isn't our issue", and no one has any actual details about what's really going on or how to actually dig into the weeds to find it. The easy response is to blame SL, firmware, the SDK, and the manufacturers of the devices. But there's no actual evidence to support any of that that we have seen.

@AlCalzone
Contributor

AlCalzone commented Jan 9, 2024

7.21 is not addressing the issue as some folks are running firmware based on that SDK and the issue persists.

Do we have driver logs of this?

And as for the evidence, we're in direct contact with Silicon Labs and they have confirmed the issue is on their side.

@ember1205
Author

I don't have anything, but have chatted with others running 7.21 that indicate the "Jammed" status still occurs. I have stopped any additional time/effort troubleshooting for this and have my controller remaining at 7.17 until there's a known fix. I've also discontinued any updates to HA core or the components until there are fixes that have been tested and verified.

My personal next step -might- be to revert to the last known good working instance of all things HA prior to the early/mid September release that so many have pointed to as being the point when this showed up.

@AlCalzone
Contributor

Do you happen to know which addon/driver version the affected users are running? Can you send them here to provide driver logs?

@ember1205
Author

Not only do I now know what software pieces they are running, I couldn't tell you what I'm running. I can't differentiate between node-zwave-js, ZwaveJS, ZwaveJS UI, etc. Honestly, it would be immensely helpful to add a button to the console to dump certain core data pieces like that in simple terms that are not only useful to the developers but would be something that the users could understand as well.

@Polosaz

Polosaz commented Jan 9, 2024

Do you happen to know which addon/driver version the affected users are running? Can you send them here to provide driver logs?

I run 7.19.2 Aeotec Firmware, 0.4.3 Zwave JS addon, HA Core 2024.1.1, HA Operating System 11.3.

You have my log in #106827

@otterlo

otterlo commented Jan 9, 2024

I am pleased to see that the discussion on this topic is continuing, though I'm not sure if the fix will come.
For me, Z-Wave will remain one of the most important home automation protocols, and I have invested a lot in the devices and in HA. When HA took up developing the integration itself, it was very good news for me: I could avoid having a 3rd party Z-Wave hub.

It seems we are ultimately struggling because of the Silicon Labs drivers / firmware, which I hope we can overcome soon. Thanks to all contributing to find a solution, much appreciated.

@kpine
Contributor

kpine commented Jan 9, 2024

Not only do I now know what software pieces they are running

Then could you provide a link to wherever these users are talking about 7.21.0 behavior?

I couldn't tell you what I'm running. I can't differentiate between node-zwave-js, ZwaveJS, ZwaveJS UI, etc.

The Z-Wave integration configuration panel will tell you the versions:

image

The URL will give an indication of what you've installed, but you should be able to tell based on how you installed it.

You can also download the integration diagnostics and device diagnostics, which will report which versions are being used (and other significant info used for troubleshooting).

image

Honestly, it would be immensely helpful to add a button to the console to dump certain core data pieces like that in simple terms that are not only useful to the developers but would be something that the users could understand as well.

Could you clarify what "the console" is? Do the diagnostics above fulfill this request, or are you looking for something else? The diagnostics are mostly for developers though, not sure what kind of "dump" information would be accessible for users.

As for debug logs, you can enable these from HA with one click in the integration panel. The instructions are listed in the integration documentation.

image

Or, if you are using Z-Wave JS UI, you can get the driver logs directly from it, see their documentation: https://zwave-js.github.io/zwave-js-ui/#/troubleshooting/generating-logs?id=driver-logs

Or, if you are using the official core add-on, you can enable logging to files in the settings and grab the log files later (default 7 days saved) using the File editor add-on. See instructions in the add-on documentation.

The last two options are best, IMO, for driver troubleshooting. The integration debug logs add a lot of extra noise, but are still better than nothing, and easiest to obtain.

@kpine
Contributor

kpine commented Jan 9, 2024

I run 7.19.2 Aeotec Firmware, 0.4.3 Zwave JS addon, HA Core 2024.1.1, HA Operating System 11.3.

I assume the request was to get new data from someone running the new 7.21.0 firmware. 7.19.2 is already known to have the issue. You can request the 7.21.0 firmware from Aeotec tech support if you want to give it a try.

@ember1205
Author

@kpine - While I appreciate the details you provided, they are way beyond what is "necessary" for a discussion like this. By "console", I generally mean somewhere in the admin UI. When someone asks "what integration are you running", there should be a single displayed piece of data somewhere that any user can click on to show main components that are in operation. Having to go to six different pages in the admin UI to find 9 different pieces of data is painful and causes these conversations to derail quickly because the information is just way too hard to locate.

I've been involved with all kinds of computer technology for thirty years - it's core to my career. I have a moderately complex home network with a home lab that I use as part of my job. I've run ZWave devices in my house for close to a decade. HA just isn't intuitive or simple when it comes to being able to find basic information like "what driver are you running?" From what I have seen, the software components use different version numbers to indicate the same pieces of software which makes it that much more confusing.

@kpine
Contributor

kpine commented Jan 9, 2024

By "console", I generally mean somewhere in the admin UI.

OK, thanks for the clarification. The screenshots I posted can all be navigated to from the "Settings" page, which is what I would personally call the "admin UI".

When someone asks "what integration are you running", there should be a single displayed piece of data somewhere that any user can click on to show main components that are in operation.

You can find this under Settings (admin UI) -> Devices & services. This page defaults to showing all of the installed integrations. You will see Z-Wave listed. It only takes a few clicks to get there. One extra click for "Configuration" gets you to details about the Z-Wave integration including the component versions. To clarify, since we are talking about Z-Wave JS, the only integration relevant to HA is the Z-Wave integration.

causes these conversations to derail quickly because the information is just way too hard to locate.

Well, hopefully the information provided clarifies how to locate the information. I feel it is pretty easy to access, but I am likely biased. Improvements are welcome, of course. If my instructions were not helpful to you, hopefully they are for anyone else stumbling upon this issue who is thinking of testing firmware 7.21.0. 🤞 My apologies if I have derailed the conversation.

From what I have seen, the software components use different version numbers to indicate the same pieces of software which makes it that much more confusing.

Agreed, it can be confusing for the average user; unfortunately this is a reality of a tech stack that involves multiple independent components. The component versioning can't be perfectly aligned, as they are independently developed projects with different release cadences. A handy website to clarify the version confusion is https://zwave-js.github.io/which-version/.

@otterlo

otterlo commented Jan 9, 2024

For what it is worth: based on the above post I checked my firmware version and decided to upgrade from 7.19 to 7.20.2.
I used the All frequencies version and, surprisingly, everything works OK now.

No jammed controller, and all devices respond again except for some battery-operated door switches, but maybe they first have to wake up properly.

So far I am 2 hours without problems... not sure if it will remain so, though.

@ember1205
Author

I can easily get 24-36 hours in between "Jammed" events. I suspect you will absolutely see the error return "soon."

@ember1205
Author

ember1205 commented May 29, 2024

No praise needed, but I appreciate the sentiment... Just trying to share what (little bit extra) I learned from all of the banging of my own head against the wall that I did. With the exception of a few device-specific suggestions for certain config items, the majority of what I've tried to share is basically just me parroting what I took from somewhere else.

Don't get me wrong with my comment about 'incentive' previously. I would love to understand a way to put more pressure on SILabs around this, but I'm not even sure they interact with the end users / general consumers. It -seems- like they are most willing to communicate with developers that are using the SDK for their own needs.

While I personally don't see any real value in review-bombing manufacturers for an issue that they honestly can't control, I do wonder if partnering with them to help them champion our fight back to SILabs might actually have some value. For example, find some of the more common vendors for the controller sticks and see how we, as consumers, might be able to support THEM with data and details to push SILabs for a fix. I know that Chris @ Aeotec is generally pretty easy to work with and does his best to be as helpful as he can for their products. He sent me pretty much their entire firmware library for me to test with and I was able to give him basic info based on what I was seeing. What if we, as a community, could provide him with a lot more specific information about the failures so that he could aggregate it and push back on SILabs with that data to try and get updated code sooner?

@jmwhite5
Contributor

jmwhite5 commented May 29, 2024

I have worked for many software and hardware manufacturers. There's real power in some exec at Zooz or Aeotec getting annoyed that their customers are having issues because one of their suppliers (Silicon Labs) is unresponsive about fixing bugs. They'll make a call and put pressure on Silicon Labs. Some VP or director at SL will then give marching orders to pay attention to this issue.

Sharing info on this thread is useful for other end users battling the same issues, but companies usually pay more attention to signals that could impact their bottom line. Providing realistic reviews of a product is a good way to align all the stakeholders. Zooz and Aeotec will feel the pressure and in turn will put pressure on their supplier.

@ember1205
Author

Fair enough, but...

SILabs is an $800,000,000 annual revenue company. ZooZ is about $1,500,000 annual revenue. If you were to assume that ALL of their revenues were derived from ZWave devices, their hardware expenses to buy the radios from SILabs might be around $100,000/year. That means that ZooZ's purchases would account for 0.0125% of SILabs' annual revenue.

That's effectively ZERO leverage that ZooZ holds with SILabs.

Aeotec is a larger company at about $2 million annually, but the two together would still account for about 1/50 of one percent of SILabs' annual revenues at most. Again, effectively zero leverage. ZooZ and Aeotec could both completely stop buying radios from SILabs and the impact just wouldn't be felt.
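
For anyone who wants to check the arithmetic, here is a minimal sketch. The revenue and spend figures are the rough estimates from this comment, not verified numbers, and the names are purely illustrative.

```python
# Rough estimates taken from the comment above; none of these are verified figures.
SILABS_ANNUAL_REVENUE = 800_000_000  # USD per year
ZOOZ_RADIO_SPEND = 100_000           # USD per year, assumed spend on radios


def share_of_revenue_percent(spend: float, revenue: float) -> float:
    """Return `spend` expressed as a percentage of `revenue`."""
    return spend / revenue * 100


zooz_share = share_of_revenue_percent(ZOOZ_RADIO_SPEND, SILABS_ANNUAL_REVENUE)
print(f"ZooZ share of SILabs revenue: {zooz_share:.4f}%")

# Combined annual spend that "1/50 of one percent" of revenue would correspond to.
combined_spend = SILABS_ANNUAL_REVENUE * (1 / 50) / 100
print(f"Spend implied by 1/50 of 1%: ${combined_spend:,.0f}")
```

Under those assumptions the combined figure works out to roughly $160,000 per year in radio purchases across both vendors.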

@jmwhite5
Contributor

@ember1205 I like your earlier comment about working with Aeotec (or Zooz) to help solve this issue. I just emailed Chris asking about how we could help him get traction with Silicon Labs. Will share when I hear back.

@ember1205
Author

I'm happy to join the fray there as well as I have a controller AND a decent number of controllable devices from them.

@jmwhite5
Contributor

Chris from Aeotec is relaying this message from engineering:

I think going forward, these few things will be helpful for us when pushing reports over to Silicon Labs.

  1. Any gateway logs and, if at all possible, Zniffer logs captured with PC Controller 5 to tag the errors when they occur (we may need the S2 and S0 security keys so that we can investigate them if S2 or S0 is being used).

What we'll need:
Gateway / HA / ZWaveJS logs of the issue
If Zniffer logs are provided (recommended if a user has this capability):
We'll likely need the S2 and S0 security keys so we can decode the logs
The Z-Wave SDK firmware version used for Series 700/800 sticks (to pinpoint the issue on that specific firmware version)

  2. With lots of feedback on the same issue, sent by anyone with gateway/Zniffer logs, we'll be able to report them directly to Silicon Labs.

So it'll mainly be Zniffer and gateway logs that will be helpful in this case. The SDK / firmware version used will be important so that the issues can be aggregated by specific firmware version.
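
To make that list easier to act on, here is a rough sketch of a checklist for one report. This is only an illustration: the `JamReport` class, its field names, and the example file names are all hypothetical and are not anything Aeotec or Silicon Labs has asked for in this form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JamReport:
    """Hypothetical container for one report; field names are illustrative only."""

    gateway_logs: Optional[str] = None          # path to gateway / HA / Z-Wave JS logs
    zniffer_logs: Optional[str] = None          # path to a Zniffer capture (optional)
    uses_security: bool = False                 # True if S0 or S2 is in use on the network
    security_keys_included: bool = False        # only relevant alongside Zniffer logs
    sdk_firmware_version: Optional[str] = None  # e.g. "7.21.4"

    def missing_items(self) -> List[str]:
        """List what is still missing before the report is worth sending."""
        missing = []
        if not self.gateway_logs:
            missing.append("gateway logs")
        if not self.sdk_firmware_version:
            missing.append("SDK / firmware version")
        # Keys only matter when an encrypted Zniffer capture has to be decoded.
        if self.zniffer_logs and self.uses_security and not self.security_keys_included:
            missing.append("S2/S0 security keys")
        return missing


report = JamReport(gateway_logs="zwavejs.log", zniffer_logs="capture.zlf", uses_security=True)
print(report.missing_items())
```

In this example the report would still be missing the firmware version and the security keys.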

@rohrsh

rohrsh commented Jun 22, 2024

What do we think of 7.22 firmware?

Edit: sorry 7.22 appears to be 800 series only

@otterlo

otterlo commented Aug 28, 2024

I read that 7.23 is released now for the 700 series. I can't find the firmware file to download yet. If anyone knows the URL, I'd be happy if you could share it.

@ember1205
Author

I read that 7.23 is released now for the 700 series. I can't find the firmware file to download yet. If anyone knows the URL, I'd be happy if you could share it.

Released where? From SILabs? If so, that would be the SDK (Software Development Kit) that vendors use to build firmware as opposed to firmware itself. Each vendor will need to collect the new SDK, build their firmware from it, then release that themselves (and we would collect it from the various vendors whose products we have). That would apply for the USB controllers and the edge devices although the controller is the main one to start with.

@otterlo

otterlo commented Aug 28, 2024

I think you are right. I did not know that.

I found the announcement and release notes here:

https://github.com/SiliconLabs/gecko_sdk/releases

@ember1205
Author

Where are you seeing 7.23? I see 7.21.4 (8/14/24 release) in the first section for ZWave... am I missing something?

Additionally, the Release Notes for 7.21.4 indicate that the lockup bug is still not fixed.

@otterlo

otterlo commented Aug 28, 2024

Sorry, my fault. It is indeed 7.21.4.
From the notes on this page I understood that it was mostly solved, but a soft reset may be required from time to time.

Apologies if I confused the topic even further.
https://community.silabs.com/s/question/0D5Vm0000036y3PKAQ/controller-reports-being-jammed-during-usage-and-provokes-the-zwave-network-to-stall-for-several-seconds?language=en_US

@ember1205
Author

The specific issue is called out in that thread as being the "lockup" issue. That one is still a "Known Issue" in 7.21.4 as per their release notes. There is essentially ZERO contribution from SILabs on that thread which indicates that they simply don't care. ZWave doesn't represent enough of their revenue that "fear of loss" is incenting them to take action and fix this.

@otterlo

otterlo commented Aug 28, 2024

Ah. As I have been having the same issue for a long, long time, I understood it was solved for the 800 series in 7.22 and yet to come for the 700. My Z-Wave network is more stable with the correct and latest firmware, but it still jams a few times a day.

@ember1205
Author

Mine still hits brick walls periodically... sometimes a couple of times per day, sometimes once every couple of days - no way to predict it. My controller is 700 series and I have a mix of 500 and 700 series based devices. To minimize traffic, I turned off every bit of communication from end devices that I do not absolutely need, and it has helped. But the complete lack of ANY movement on a fix has me looking for alternative options, including dropping my controller down to a 500 series device and dumping the devices that I have that are 700 series based. In fact, I'm now wondering if the random issues I have had with certain controllable plugs are actually due to them being based on the 700 series chips. My Aeotec SS6 devices never give me an issue, but I see odd happenings with the SS7 devices where they will simply turn off and become unresponsive with seemingly no warning or reason.

@PeteRager
Contributor

I run on a 500 series stick. I have 500, 700, 800 series devices. I have implemented a lot of device monitoring. It's been rock solid.

@austwhite

I started having this same issue with a 700 series controller on HA Yellow, so I'm just commenting so I can keep updated on it. There's nothing much more I can add, as my log files are similar to the ones pasted earlier by others.

@tony-park

tony-park commented Oct 19, 2024 via email

@austwhite

austwhite commented Oct 19, 2024

@tony-park
I'd try upgrading if I was in your position. I upgraded my RazBerry 7 (from zwave.me) and I have not had any issues since upgrading to the latest firmware with the newer SiLabs SDK. It's been rock solid and no issues at all. On the previous firmware, I was getting the "jammed" almost daily needing the ZwaveJS server to be restarted to fix it.
Only restarts I have done since the firmware upgrade are when I have updated Home Assistant. Z-Wave JS has been rock solid and although I had one jam, it self recovered within 5 minutes and did not require intervention from me. I only saw it when I checked the logs. Not had any others since.

Definitely worth a try if the new firmware updates to the new SDK as well.

Edit: I updated my firmware about 4 weeks ago, so it was a couple of weeks after I posted on this issue last.

Edit2: I haven't seen anything from Aeotec, but I am sure they will get to it eventually. I do admit, the slow updates from Aeotec were one reason I went to a different controller. I also like that the Razberry just neatly fits on the HA Yellow and looks like it is meant to be part of it.

@clowgg

clowgg commented Oct 20, 2024

I have this SiLabs UZB-7 Z-Wave 700 Stick device which shows as Device 0x0000 0x0004-0x0004 in Z-Wave JS UI.
It currently has FW: v7.18.8.

Can someone please point me at the right location to download the 7.21.4 firmware for it?

Cheers,

@tony-park

tony-park commented Oct 20, 2024 via email

@jgjestad

Has anyone tried FW 7.21.5?
I'm on 7.21.4 and still have issues.

@tony-park

tony-park commented Dec 17, 2024 via email

@ember1205
Author

SILabs has changed the description of the issue in their release notes stating that it can be mitigated by the host. I'm wondering if they have basically just shelved any efforts at this point to work on it as it has been well over a year with zero progress.
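
For what "mitigated by the host" could look like in practice, here is a minimal sketch of a host-side watchdog. It is an assumption about the general idea, not what SILabs or Z-Wave JS actually does: the `JamWatchdog` class, the status strings, and the `recover` callback (standing in for something like a soft reset) are all hypothetical.

```python
import time
from typing import Callable, Optional


class JamWatchdog:
    """Calls `recover` if the controller stays jammed for `max_jam_seconds`.

    Hypothetical sketch: `recover` stands in for whatever the host can do
    (for example a soft reset). It is not a real Z-Wave JS API.
    """

    def __init__(
        self,
        recover: Callable[[], None],
        max_jam_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._recover = recover
        self._max_jam_seconds = max_jam_seconds
        self._clock = clock
        self._jammed_since: Optional[float] = None

    def on_status(self, status: str) -> None:
        """Feed every controller status change ("Jammed", "Ready", ...) in here."""
        if status == "Jammed":
            # Remember only the start of the current jam.
            if self._jammed_since is None:
                self._jammed_since = self._clock()
        else:
            self._jammed_since = None

    def poll(self) -> bool:
        """Call periodically; returns True if recovery was triggered."""
        if self._jammed_since is None:
            return False
        if self._clock() - self._jammed_since >= self._max_jam_seconds:
            self._recover()
            self._jammed_since = None
            return True
        return False
```

The host would call `on_status` from wherever it receives controller status changes and call `poll` on a timer. A jam that clears on its own before the threshold never triggers the recovery action.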

@billdwhite

SILabs has changed the description of the issue in their release notes stating that it can be mitigated by the host. I'm wondering if they have basically just shelved any efforts at this point to work on it as it has been well over a year with zero progress.

Yes, at this point, I'm thinking you're right.

@tony-park

tony-park commented Dec 29, 2024 via email

@otterlo

otterlo commented Dec 29, 2024

Hi

No. I had all of these issues, but since the upgrade it has been very stable in Home Assistant.

Sorry for asking an obvious question, but did you power down the unit after the upgrade? I did a complete power down and then powered it back on, just to make sure.

@austwhite

@tony-park
Mine still freezes up occasionally since I updated the firmware on my Zwave Me Razberry 7 to the latest.
Prior to the update it would freeze at least twice a day.

It is a known issue on SiLabs' end.

@tony-park

tony-park commented Dec 30, 2024 via email
