Matter Server: All device offline all of a sudden #126136

Closed
3oris opened this issue Sep 17, 2024 · 30 comments

Comments

@3oris

3oris commented Sep 17, 2024

The problem

After about 5 days of operation, all Matter devices become unavailable. The devices are still online in the other (Google Home) fabric, though.

The devices are still pingable from the device info page, and pinging a device brings that specific device back online.

Doing this manually is not feasible, though, with over 90 Matter devices in the system.
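
A rough sketch of how the per-device ping could be scripted instead of clicked. It only builds the request payloads for the Matter Server's WebSocket API and does not send them. The command name `ping_node` and the `message_id` / `command` / `args` envelope are assumptions based on the python-matter-server project and should be checked against the installed version. The helper name `build_ping_requests` is made up for this example, and the node IDs are placeholders.

```python
import json
from itertools import count


def build_ping_requests(node_ids, start_id=1):
    """Return one JSON request string per node ID.

    Assumption: the server accepts messages shaped like
    {"message_id": ..., "command": "ping_node", "args": {"node_id": ...}}.
    """
    message_ids = count(start_id)
    return [
        json.dumps(
            {
                "message_id": str(next(message_ids)),
                "command": "ping_node",  # assumed command name
                "args": {"node_id": node_id},
            }
        )
        for node_id in node_ids
    ]


# Placeholder node IDs; a real list would come from the server itself.
for request in build_ping_requests([2, 3]):
    print(request)
```

Sending these messages needs a WebSocket client connected to the Matter Server, which is deliberately left out here.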

Matter devices

  • Eve sensors and plugs (Thread)
  • Nanoleaf lights (Thread)
  • Onvis S4 plugs (Thread)
  • innovation matters (Wi-Fi)
  • Ledvance lights (Wi-Fi)
  • WiZ lights (Wi-Fi)

Border routers

  • 1 OTBR hosted on RPI 3
  • 5 Nest Hubs 2nd gen (updated to F20)

What version of Home Assistant Core has the issue?

core-2024.9.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Matter

Link to integration documentation on our website

No response

Diagnostics information

core_matter_server_2024-09-17T15-42-59.844Z.log
matter-c921cb8346a353e6865401775d822fe4-Essentials GU10-80fecbd596935ee1f84171a5c0aac88b.json

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

@3oris
Author

3oris commented Sep 17, 2024

Restarting the matter server fixes it (after some time).

@tornenen

Same problem for me.
Using Home Assistant OS and core-2024.9.2

@deveylder

Verify your network settings in Home Assistant. Mine had changed to something completely different. Setting a static address solved the issue.

@tornenen

Verify your network settings in Home Assistant. Mine had changed to something completely different. Setting a static address solved the issue.

No, still the same for me.

@agners
Member

agners commented Sep 18, 2024

@3oris (and others) when the devices go unavailable, does reloading the integration help? Settings -> Devices & services -> Matter -> Three dot menu -> Reload.

What Home Assistant OS and Matter Server add-on version are you using?

@home-assistant

Hey there @home-assistant/matter, mind taking a look at this issue as it has been labeled with an integration (matter) you are listed as a code owner for? Thanks!


@3oris
Author

3oris commented Sep 19, 2024

@3oris (and others) when the devices go unavailable, does reloading the integration help? Settings -> Devices & services -> Matter -> Three dot menu -> Reload.

@agners : will check as soon as it happens again (probably tomorrow or Saturday). Restarting the add-on does help, at the very least.

What Home Assistant OS and Matter Server add-on version are you using?

  • HAOS: 13.1
  • Supervisor: 2024.09.1
  • Home Assistant Core: 2024.9.1 (last time it happened, now .2)
  • Matter Server: 6.5.1

@3oris
Author

3oris commented Sep 19, 2024

@agners -- Also, I was wondering if it might be a regression in 6.5.1 home-assistant-libs/python-matter-server#882 , but you probably will know anyways.

@marcelveldt
Member

marcelveldt commented Sep 19, 2024

@agners -- Also, I was wondering if it might be a regression in 6.5.1 home-assistant-libs/python-matter-server#882 , but you probably will know anyways.

You would have a SEVERE issue with mdns if that cleanup is causing your nodes now to be offline.

What is the state of the nodes within the Matter Server's own UI ?

@ThomasKoppensteiner

Hello,
I think I have a similar issue with the 6.5.1 matter server.
I don't see any nodes in the Web UI.

[Screenshot 2024-09-19 at 22:02:45] [Screenshot 2024-09-19 at 22:02:29]

@marcelveldt
Member

Hello, I think I have a similar issue with the 6.5.1 matter server. I don't see any nodes in the Web UI.

Well, that is another issue. Maybe you (accidentally) reinstalled the whole Matter integration?
You need to restore a backup to get your nodes back as the data is stored in the matter addon data.

@3oris
Author

3oris commented Sep 20, 2024

@3oris (and others) when the devices go unavailable, does reloading the integration help? Settings -> Devices & services -> Matter -> Three dot menu -> Reload.

@agners -- So, it happened again. I restarted the integration, and the devices came back very, very slowly. Only a few minutes after they were all back, they all disappeared again, and the Matter Server was once again in the state of #124647, which I hadn't seen since the upgrade to 6.5.0b2.

Before I restarted the Matter server I took the logs:
matter-server.log

@tornenen

I guess my problem just went away. After hitting this issue 3 times and restarting the Matter Server each time, it has now been running for 2 days without problems.

@3oris
Author

3oris commented Sep 20, 2024

@agners -- Also, I was wondering if it might be a regression in 6.5.1 home-assistant-libs/python-matter-server#882 , but you probably will know anyways.

You would have a SEVERE issue with mdns if that cleanup is causing your nodes now to be offline.

What is the state of the nodes within the Matter Server's own UI ?

@marcelveldt -- Will tell next time it happens.

@ThomasKoppensteiner

You need to restore a backup to get your nodes back as the data is stored in the matter addon data.

@marcelveldt yes, I reinstalled the matter integration, but why does a reinstall not create a new node?
Isn't this an issue?

If so should I create a new github issue?

@ThomasKoppensteiner

Resetting my HomeAssistant VM to a previous state fixed the problem for me.
Now I see the nodes again. Running version 6.4.1 now.

@marcelveldt
Member

@marcelveldt yes, I reinstalled the matter integration, but why does a reinstall not create a new node? Isn't this an issue?

If you reinstall the Matter integration, all data gets reset. So you basically destroyed your Matter network by uninstalling Matter from HA.

@agners
Member

agners commented Sep 23, 2024

Resetting my HomeAssistant VM to a previous state fixed the problem for me.
Now I see the nodes again. Running version 6.4.1 now.

If you do a regular update, the nodes should not get lost. Can you try updating the add-on (again)? Worst case you should be able to restore 6.4.1.

That said, while the outcome of your issue is similar to the original poster, I don't think you suffer the same problem: In your case the store on the Matter Server lost all devices. If this happens with the second update attempt again, can you open a separate issue for this? This would be some type of add-on update issue 🤔

@agners
Member

agners commented Sep 23, 2024

@agners -- So, it happened again. I restarted the integration, and the devices came back very, very slowly. Only a few minutes after they were all back, they all disappeared again, and the Matter Server was once again in the state of #124647, which I hadn't seen since the upgrade to 6.5.0b2.

Hm, that sounds like your whole system is completely overwhelmed somehow. I guess the Matter Server doesn't respond in time for the Core, so the Core gives up communicating. I wonder if the Matter Server gets itself into a state where things just go awry.

There are some messages I haven't seen so far; they sound as if the message got corrupted 🤔

2024-09-20 05:56:18.928 (Dummy-2) CHIP_ERROR [chip.native.EM] Dropping unexpected message of type 0x5 with protocolId (0, 1) and MessageCounter:141254017 on exchange 44431i with Node: <00000000000000E2, 1>

From what I can tell you run this on a Raspberry Pi 3? 🤔 Maybe this is just a bit too much for it to handle 😢
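
Add-on logs copied from a terminal often carry ANSI color escape sequences (fragments such as `[32m` or `[0m` around the timestamp and log level), which make lines like the one quoted above hard to read and to search. A minimal sketch for stripping them before sharing a log, assuming only color/style sequences are present. The helper name `strip_ansi` is made up for this example, and the pattern also accepts the U+FFFD replacement character that the ESC byte often turns into when text is pasted into a web page.

```python
import re

# ESC (or the U+FFFD character it may be converted to when pasted),
# followed by "[", optional numeric parameters, and a final "m".
ANSI_COLOR = re.compile(r"[\x1b\ufffd]\[[0-9;]*m")


def strip_ansi(line: str) -> str:
    """Remove ANSI color/style sequences from a single log line."""
    return ANSI_COLOR.sub("", line)


sample = (
    "\x1b[32m2024-09-20 05:56:18.928\x1b[0m (Dummy-2) "
    "\x1b[1;30mCHIP_ERROR\x1b[0m \x1b[34m[chip.native.EM]\x1b[0m"
)
print(strip_ansi(sample))
# 2024-09-20 05:56:18.928 (Dummy-2) CHIP_ERROR [chip.native.EM]
```

The logger name in square brackets is left alone because the pattern only matches a bracket that directly follows the escape character.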

@marcelveldt
Member

Resetting my HomeAssistant VM to a previous state fixed the problem for me.
Now I see the nodes again. Running version 6.4.1 now.

If you do a regular update, the nodes should not get lost. Can you try updating the add-on (again)? Worst case you should be able to restore 6.4.1.

That said, while the outcome of your issue is similar to the original poster, I don't think you suffer the same problem: In your case the store on the Matter Server lost all devices. If this happens with the second update attempt again, can you open a separate issue for this? This would be some type of add-on update issue 🤔

He removed the Matter integration (to reinstall) but that also removed the matter add-on with its configuration.
So that is what got his nodes lost. It reminds me that we should probably add a confirmation to HA when trying to remove Matter, Z-Wave or Zigbee that this may lead to loss of data without a backup.

@3oris
Author

3oris commented Sep 25, 2024

@agners -- Also, I was wondering if it might be a regression in 6.5.1 home-assistant-libs/python-matter-server#882 , but you probably will know anyways.

You would have a SEVERE issue with mdns if that cleanup is causing your nodes now to be offline.
What is the state of the nodes within the Matter Server's own UI ?

@marcelveldt -- Will tell next time it happens.

@marcelveldt -- they just all show offline in the Matter server add-on UI

@3oris
Author

3oris commented Sep 25, 2024

@agners -- So, it happened again. I restarted the integration, and the devices came back very, very slowly. Only a few minutes after they were all back, they all disappeared again, and the Matter Server was once again in the state of #124647, which I hadn't seen since the upgrade to 6.5.0b2.

Hm, that sounds like your whole system is completely overwhelmed somehow. I guess the Matter Server doesn't respond in time for the Core, so the Core gives up communicating. I wonder if the Matter Server gets itself into a state where things just go awry.

There are some messages I haven't seen so far; they sound as if the message got corrupted 🤔

2024-09-20 05:56:18.928 (Dummy-2) CHIP_ERROR [chip.native.EM] Dropping unexpected message of type 0x5 with protocolId (0, 1) and MessageCounter:141254017 on exchange 44431i with Node: <00000000000000E2, 1>

From what I can tell you run this on a Raspberry Pi 3? 🤔 Maybe this is just a bit too much for it to handle 😢

@agners -- no, this is Home Assistant running on HA Green. What I run on the RPi3 is the OTBR, which I run isolated from HA and compile myself in order to have some observability into the Thread network via the CLI: channel monitor, TREL connectivity, child node distribution, link quality and so on. This also allowed me to choose a Thread channel with literally no Wi-Fi interference (as far as I can tell). Also, it makes no difference to the Matter fabric if I take the OTBR or any one of the Nest Hubs out of the Thread network. (I cannot take two or more TBRs out of the network, though, because then total coverage is too low and the Thread network gets overloaded.)

The points I am trying to make here:

  • The matter server should be good in terms of resources (HA Green, with reasonable CPU usage of the matter server)
  • The thread network should also be good by and large.

@ThomasKoppensteiner

If you do a regular update, the nodes should not get lost. Can you try updating the add-on (again)? Worst case you should be able to restore 6.4.1.

That said, while the outcome of your issue is similar to the original poster, I don't think you suffer the same problem: In your case the store on the Matter Server lost all devices. If this happens with the second update attempt again, can you open a separate issue for this? This would be some type of add-on update issue 🤔

Hey, I did another upgrade to 6.5.1 and this time it works as expected. The old nodes were visible right after the update and were also available soon afterwards. Additionally, I was able to add new Matter devices as well (this was also not working before).

My issue is fixed. Thank you for the support.

@AndreasMouskos

AndreasMouskos commented Sep 27, 2024

I have the same issue with my Eve Matter devices (motion, door, energy), starting at the exact time I updated my iPhone to iOS 18 and my HomePod to the latest version. Matter Server is also 6.5.1, I have no pending updates on anything in HA, and HA is also on the latest version. My Eve devices work in the Eve app and in the Home app. I also cannot re-add them; for some reason it keeps failing.
Here are my logs:

2024-09-27 18:45:23.435 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Setup for node failed: Unable to establish CASE session with Node 2
2024-09-27 18:45:23.435 (MainThread) INFO [matter_server.server.device_controller] <Node:2> Retrying node setup in 60 seconds...
2024-09-27 18:45:27.963 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964488 on exchange 28264i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:45:34.630 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:45:37.635 (MainThread) INFO [matter_server.server.sdk] <Node:3> Attempting to establish CASE session... (attempt 2 of 2)
2024-09-27 18:46:19.261 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964489 on exchange 28265i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:46:23.438 (MainThread) INFO [matter_server.server.device_controller] <Node:2> Setting-up node...
2024-09-27 18:46:26.609 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:46:26.611 (MainThread) WARNING [matter_server.server.device_controller] <Node:3> Setup for node failed: Unable to establish CASE session with Node 3
2024-09-27 18:46:26.611 (MainThread) INFO [matter_server.server.device_controller] <Node:3> Retrying node setup in 60 seconds...
2024-09-27 18:47:04.691 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964490 on exchange 28266i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:47:12.163 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:47:26.613 (MainThread) INFO [matter_server.server.device_controller] <Node:3> Setting-up node...
2024-09-27 18:47:54.767 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964491 on exchange 28267i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:48:00.684 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:48:03.689 (MainThread) INFO [matter_server.server.sdk] <Node:2> Attempting to establish CASE session... (attempt 2 of 2)
2024-09-27 18:48:07.901 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964492 on exchange 28268i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:48:15.340 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:48:45.746 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964493 on exchange 28269i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:48:52.418 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:48:56.840 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964494 on exchange 28270i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:49:03.867 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:49:06.872 (MainThread) INFO [matter_server.server.sdk] <Node:3> Attempting to establish CASE session... (attempt 2 of 2)
2024-09-27 18:49:32.548 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964495 on exchange 28271i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:49:40.942 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:49:40.944 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Setup for node failed: Unable to establish CASE session with Node 2
2024-09-27 18:49:40.945 (MainThread) INFO [matter_server.server.device_controller] <Node:2> Retrying node setup in 60 seconds...
2024-09-27 18:49:47.108 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964496 on exchange 28272i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:49:55.714 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:49:55.716 (MainThread) WARNING [matter_server.server.device_controller] <Node:3> Setup for node failed: Unable to establish CASE session with Node 3
2024-09-27 18:49:55.716 (MainThread) INFO [matter_server.server.device_controller] <Node:3> Retrying node setup in 60 seconds...
2024-09-27 18:50:40.947 (MainThread) INFO [matter_server.server.device_controller] <Node:2> Setting-up node...
2024-09-27 18:50:55.719 (MainThread) INFO [matter_server.server.device_controller] <Node:3> Setting-up node...
2024-09-27 18:51:21.878 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964497 on exchange 28273i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:51:29.677 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:51:38.724 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964498 on exchange 28274i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:51:44.442 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:52:12.310 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964499 on exchange 28275i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:52:18.203 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:52:21.207 (MainThread) INFO [matter_server.server.sdk] <Node:2> Attempting to establish CASE session... (attempt 2 of 2)
2024-09-27 18:52:26.424 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964500 on exchange 28276i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:52:32.970 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:52:35.976 (MainThread) INFO [matter_server.server.sdk] <Node:3> Attempting to establish CASE session... (attempt 2 of 2)
2024-09-27 18:53:01.039 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964501 on exchange 28277i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:53:09.931 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:53:17.511 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964502 on exchange 28278i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:53:24.817 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:53:24.819 (MainThread) WARNING [matter_server.server.device_controller] <Node:3> Setup for node failed: Unable to establish CASE session with Node 3
2024-09-27 18:53:24.820 (MainThread) WARNING [matter_server.server.device_controller] <Node:3> Node setup not completed after 30 minutes, giving up.
2024-09-27 18:53:50.917 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964503 on exchange 28279i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:53:58.447 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:53:58.449 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Setup for node failed: Unable to establish CASE session with Node 2
2024-09-27 18:53:58.450 (MainThread) INFO [matter_server.server.device_controller] <Node:2> Retrying node setup in 60 seconds...
2024-09-27 18:54:58.457 (MainThread) INFO [matter_server.server.device_controller] <Node:2> Setting-up node...
2024-09-27 18:55:42.408 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964504 on exchange 28280i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:55:47.179 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:56:30.409 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964505 on exchange 28281i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:56:35.708 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:56:38.713 (MainThread) INFO [matter_server.server.sdk] <Node:2> Attempting to establish CASE session... (attempt 2 of 2)
2024-09-27 18:57:22.536 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964506 on exchange 28282i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:57:27.442 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:58:09.472 (Dummy-2) CHIP_ERROR [chip.native.EM] Failed to Send CHIP MessageCounter:217964507 on exchange 28283i with Node: <0000000000000000, 0> sendCount: 4 max retries: 4
2024-09-27 18:58:15.960 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1
2024-09-27 18:58:15.962 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Setup for node failed: Unable to establish CASE session with Node 2
2024-09-27 18:58:15.962 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Node setup not completed after 30 minutes, giving up.
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service matter-server: stopping
2024-09-27 19:15:30.842 (MainThread) WARNING [aiorun] Stopping the loop
2024-09-27 19:15:30.842 (MainThread) INFO [aiorun] Entering shutdown phase.
2024-09-27 19:15:30.842 (MainThread) INFO [aiorun] Executing provided shutdown_callback.
2024-09-27 19:15:30.842 (MainThread) INFO [matter_server.server.server] Stopping the Matter Server...
2024-09-27 19:15:30.843 (MainThread) INFO [matter_server.server.client_handler] [139977044284496] Connection closed by client
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
2024-09-27 19:15:30.848 (MainThread) INFO [matter_server.server.stack] Shutting down the Matter stack...
2024-09-27 19:15:30.848 (MainThread) CHIP_ERROR [chip.native.CTL] Shutting down the stack...
2024-09-27 19:15:30.850 (MainThread) CHIP_ERROR [chip.native.DIS] Failed to advertise records: src/inet/UDPEndPointImplSockets.cpp:416: OS Error 0x02000065: Network is unreachable
2024-09-27 19:15:30.853 (MainThread) CHIP_ERROR [chip.native.DIS] Failed to advertise records: src/lib/dnssd/minimal_mdns/Server.cpp:344: CHIP Error 0x00000046: No endpoint was available to send the message
2024-09-27 19:15:30.854 (MainThread) CHIP_ERROR [chip.native.DL] Inet Layer shutdown
2024-09-27 19:15:30.854 (MainThread) CHIP_ERROR [chip.native.DL] BLE shutdown
2024-09-27 19:15:30.854 (MainThread) CHIP_ERROR [chip.native.DL] System Layer shutdown
2024-09-27 19:15:30.855 (MainThread) INFO [aiorun] Waiting for executor shutdown.
2024-09-27 19:15:30.855 (MainThread) INFO [aiorun] Shutting down async generators
2024-09-27 19:15:30.855 (MainThread) INFO [aiorun] Closing the loop.
2024-09-27 19:15:30.855 (MainThread) INFO [aiorun] Leaving. Bye!
[16:15:31] INFO: matter-server service exited with code 0 (by signal 0).
s6-rc: info: service matter-server successfully stopped
s6-rc: info: service banner: stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service banner: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started 
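
A log this long is easier to reason about once the failures are tallied per node. The sketch below counts the "Setup for node failed" warnings by node ID, assuming the line format shown in the log above. The function name `count_setup_failures` is made up for this example, and the sample lines are copied from the log.

```python
import re
from collections import Counter

# Matches the device_controller warning and captures the node ID.
SETUP_FAILED = re.compile(r"<Node:(\d+)> Setup for node failed")


def count_setup_failures(lines):
    """Return a Counter mapping node ID -> number of failed setup attempts."""
    failures = Counter()
    for line in lines:
        match = SETUP_FAILED.search(line)
        if match:
            failures[int(match.group(1))] += 1
    return failures


sample = [
    "2024-09-27 18:45:23.435 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Setup for node failed: Unable to establish CASE session with Node 2",
    "2024-09-27 18:45:34.630 (Dummy-2) CHIP_ERROR [chip.native.SC] CASESession timed out while waiting for a response from the peer. Current state was 1",
    "2024-09-27 18:46:26.611 (MainThread) WARNING [matter_server.server.device_controller] <Node:3> Setup for node failed: Unable to establish CASE session with Node 3",
    "2024-09-27 18:49:40.944 (MainThread) WARNING [matter_server.server.device_controller] <Node:2> Setup for node failed: Unable to establish CASE session with Node 2",
]
print(count_setup_failures(sample))
# Counter({2: 2, 3: 1})
```

To run it over a saved log file, pass the open file object instead of the sample list, since the function only needs an iterable of lines.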

@jvmahon
Contributor

jvmahon commented Sep 30, 2024

I've had this happen, but I concluded that the issue wasn't HA (or at least that the issue also involved other equipment). I found that to bring devices back online, I needed to reboot my Google Wifi Pro 6e WiFi routers (which also include my OTBRs).

Also, I have both Nest OTBRs and 3 Apple TV OTBRs and have found that if I leave the Nest enabled (and unplug the Apple TVs), all seems OK and stable, but if I add more than 1 Apple OTBR, it can cause instability.

I'm thinking there may be something going on when you have a mix of OTBRs from different vendors. In my case, it particularly seems to happen when Apple OTBRs and Nest OTBRs try to join into a single Thread network. As long as Apple and Google Nest maintain separate Thread networks, it's more stable. None of this really makes much sense, but it points to issues that may be beyond HA. Also, the entire setup destabilizes if I use Matter 1.0 devices (hello Eve!).

@AndreasMouskos

I've had this happen, but I concluded that the issue wasn't HA (or at least that the issue also involved other equipment). I found that to bring devices back online, I needed to reboot my Google Wifi Pro 6e WiFi routers (which also include my OTBRs).

Also, I have both Nest OTBRs and 3 Apple TV OTBRs and have found that if I leave the Nest enabled (and unplug the Apple TVs), all seems OK and stable, but if I add more than 1 Apple OTBR, it can cause instability.

I'm thinking there may be something going on when you have a mix of OTBRs from different vendors. In my case, it particularly seems to happen when Apple OTBRs and Nest OTBRs try to join into a single Thread network. As long as Apple and Google Nest maintain separate Thread networks, it's more stable. None of this really makes much sense, but it points to issues that may be beyond HA. Also, the entire setup destabilizes if I use Matter 1.0 devices (hello Eve!).

Maybe your case is different because, as I mentioned, everything was fine for 1 year until I upgraded to HomePod OS 18 and iOS 18. The devices work in all my other apps except Home Assistant. I am also not able to re-add them anymore; it keeps failing.

@3oris
Author

3oris commented Oct 2, 2024

@agners @marcelveldt -- an update:

I have been running on 6.5.2b0 with way less trouble over the last 1.5 weeks. I also see there is a 6.5.2 release but I don't seem to receive it.

Anyhow, with 6.5.2b0 only a few devices at a time go offline in the HA fabric while still being pingable from the device info page. So, not all devices any more. These devices are then also reported as unavailable in the Matter add-on UI, and as before I can ping them back online into the HA Matter fabric.

Also, but this is guessing now, the devices that go offline in chunks seem to be connected to the same TBR (Nest Hub G2 F20) at that time, which is also a bit contradictory to the fact that they are pingable. Continuing to keep an eye on it...

@marcelveldt
Member

Let's try to prevent duplicate issues. We're tracking the availability issue in this report:
#123835

In general, using multiple border routers is simply broken at the moment.
Use just one and it will be stable. There is a lot of back-and-forth going on about whether this is an Apple issue, a general Thread issue, a TREL issue or a combination. In any case, the issue is not unique to HA.

@3oris
Copy link
Author

3oris commented Oct 30, 2024

This is not an Apple issue, it's all Nest Hubs and one OTBR (on a dedicated RPi3b).

Lowering the number of BRs is not an option, since the node count is already 135 and one Nest Hub BR is only able to handle about 20 nodes max (be it due to hardware capacity or Thread channel congestion). So 7 BRs seems to be a reasonable number.
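
The arithmetic behind that number, as a small sketch: with 135 nodes and roughly 20 nodes per border router, rounding up gives 7. The function name `min_border_routers` is made up for this example, and the 20-node figure is the rough estimate given above, not a documented limit.

```python
import math


def min_border_routers(node_count: int, nodes_per_router: int) -> int:
    """Smallest number of border routers that can carry all nodes."""
    return math.ceil(node_count / nodes_per_router)


print(min_border_routers(135, 20))  # 7
```

This is only a lower bound based on capacity; it says nothing about radio coverage, which may require more routers.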

With the recent update to Fuchsia 20.1 things really started to become more stable. I cannot tell why though. Also, TREL seems to work actually well with Nest Hubs. E.g. it makes a huge difference if I disable TREL in the OTBR.

So, in general, I would not follow your statement that using multiple BRs is broken.

But I am also fine with closing this issue here since I feel that the same issue is now popping up every other week, and I see that you guys are actually on the topic.

@Puller

Puller commented Nov 12, 2024

I had the same issue! I restored my Home Assistant VM from the backup taken before the last update, and all Eve / Matter devices are back! So there must be something wrong with the latest update!
