fix(sonos): Improved handling of bonded set membership changes. #2302

dljsjr · 2025-07-29T22:48:38Z

Type of Change

Bug fix

Checklist

I have performed a self-review of my code
I have commented my code in hard-to-understand areas
I have verified my changes by testing with a device or have communicated a plan for testing

Description of Change

We have historically ignored non-primary devices in bonded sets during discovery/onboarding, going back to the pre-Edge days.

Bonded sets are things like stereo pairs or home theater setups.

These devices cannot be controlled via the WSS LAN API at all, and they aren't treated as being part of a Group in the Sonos group model; the intent is for API consumers to treat the entire bonded set as a single "player". Only one of the devices in the set exposes an API endpoint, which differs from a Group where all the players expose an endpoint. While groups service most commands via calling the API endpoint on the Group Coordinator, certain commands like volume can be sent to non-coordinator players. This is not the case with bonded sets; the entire set always acts atomically, and only the primary device in the set can service commands. Hence, we treat non-primary devices as not controllable in a bonded set, and we don't onboard them.

However, we didn't really have very robust handling of devices that become part of a bonded set at runtime after onboarding. This led to unnecessary CPU and RAM usage by creating a few tight loops due to unexpected failure/edge cases when making certain calls.

Because we would fail to discover this device under normal circumstances, the preferred way to handle this transition is to mark the device as being offline. We discussed emitting a delete for these devices, but it seems that it wouldn't be too uncommon for some users to create and destroy bonded sets ephemerally the way one might do with a Group. For this reason we decided we would avoid deleting the records.

If a non-primary in a bonded set is removed from the bonded set, it will be marked as online again.

Summary of Completed Tests

Tested on my personal setup by overriding the driver files directly. Will need to be regression tested by internal QA once this lands on Alpha; the OAuth stuff precludes this from being tested using the PR channel invite.

github-actions · 2025-07-29T22:49:50Z

Invitation URL:
https://bestow-regional.api.smartthings.com/invite/oDM8QqRyYY2L

github-actions · 2025-07-29T22:50:08Z

Test Results

68 files 447 suites 0s ⏱️
2 324 tests 2 324 ✅ 0 💤 0 ❌
3 923 runs 3 923 ✅ 0 💤 0 ❌

Results for commit 99b6f2b.

♻️ This comment has been updated with latest results.

github-actions · 2025-07-29T22:50:09Z

Minimum allowed coverage is 90%

Generated by 🐒 cobertura-action against 99b6f2b

dljsjr · 2025-07-30T14:48:12Z

drivers/SmartThings/sonos/src/api/event_handlers.lua

@@ -12,41 +14,48 @@ CapEventHandlers.PlaybackStatus = {
  Playing = "PLAYBACK_STATE_PLAYING",
 }

+local function _do_emit(device, attribute_event)


The reason we collapsed everything down to this helper function is because certain activity payloads that come in on a single speaker will contain information about all Players in a group/set/zone/etc., and we will dispatch to all appropriate device records that can be associated with one of the payload entries.

Emitting information about that speaker if it's known to be part of a bonded set would lead to the device record being marked as Online in certain situations, which we don't want.

We have historically ignored non-primary devices in bonded sets during discovery/onboarding, going back to the pre-Edge days. Bonded sets are things like stereo pairs or home theater setups. These devices cannot be controlled via the WSS LAN API at all, and they aren't treated as being part of a Group in the Sonos group model; the intent is for API consumers to treat the entire bonded set as a single "player". Only one of the devices in the set exposes an API endpoint, which differs from a Group where all the players expose an endpoint. While groups service most commands via calling the API endpoint on the Group Coordinator, certain commands like volume can be sent to non-coordinator players. This is not the case with bonded sets; the entire set always acts atomically, and only the primary device in the set can service commands. Hence, we treat non-primary devices as not controllable in a bonded set, and we don't onboard them. However, we didn't really have very robust handling of devices that become part of a bonded set at runtime after onboarding. This led to unnecessary CPU and RAM usage by creating a few tight loops due to unexpected failure/edge cases when making certain calls. Because we would fail to discover this device under normal circumstances, the preferred way to handle this transition is to mark the device as being offline. We discussed emitting a delete for these devices, but it seems that it wouldn't be too uncommon for some users to create and destroy bonded sets ephemerally the way one might do with a Group. For this reason we decided we would avoid deleting the records. If a non-primary in a bonded set is removed from the bonded set, it will be marked as online again.

varzac · 2025-07-30T16:07:57Z

drivers/SmartThings/sonos/src/sonos_driver.lua

 ---
 ---@field public ssdp_task SonosPersistentSsdpTask?
 ---@field private ssdp_event_thread_handle table?
 local SonosDriver = {}

+---@param device SonosDevice
+function SonosDriver:update_bonded_device_tracking(device)


The logic in this function is very confusing to me, can you explain what is intended to be tracked here?

The intent is to track when a device record's bonded status changes.

The device's current bonded status is tracked on the device itself via the BONDED field, which we always update as soon as possible so that we can do stuff like short circuiting emits so that we don't mark an offline device as online.

The driver shadows this information in a map that is updated independently and with less urgency, so that it can be used to look up the previous state before updating itself.

This lets us determine if a device's bonded status changes. If a device becomes bonded, we mark it offline. If a device was bonded and is no longer bonded, we re-initialize it.

varzac · 2025-07-30T16:08:21Z

drivers/SmartThings/sonos/src/sonos_driver.lua

-        "Player %s is a non-primary bonded Sonos device, ignoring",
-        info.discovery_info.device.name
-      )
+  -- if  then


Commented code

varzac · 2025-07-30T16:13:58Z

drivers/SmartThings/sonos/src/api/sonos_connection.lua

@@ -358,19 +365,28 @@ function SonosConnection.new(driver, device)
        local household_id, current_coordinator =
          self.driver.sonos:get_coordinator_for_device(self.device)
        local _, player_id = self.driver.sonos:get_player_for_device(self.device)
-        self.driver.sonos:update_household_info(header.householdId, body, self.device)
+        self.driver.sonos:update_household_info(header.householdId, body, self.driver)


this change seems odd, but looking at the function definition change, it looks like the device was just an extra arg before?

Yeah, this looks weird on the surface.

before the OAuth refactor, this method would conditionally take a device record as a flag that it needed to perform some updates. Which was a bad heuristic, so when I refactored it, I got rid of that heuristic, but forgot to remove it from the call sites; Lua being Lua, it didn't throw any errors or cause any problems.

With the refactor for the bonded sets, we changed it so that it always takes a driver record to be able to do some tracking of bonded/unbonded state transitions, hence the updates to the call sites.

tpmanley · 2025-07-30T16:16:11Z

drivers/SmartThings/sonos/src/api/sonos_connection.lua

        self.driver.sonos:update_device_record_from_state(header.householdId, self.device)
        local _, updated_coordinator = self.driver.sonos:get_coordinator_for_device(self.device)

+        local bonded = self.device:get_field(PlayerFields.BONDED)
+        if bonded then


Could this be combined with the if not bonded below just to simplify the logic?

Not easily; we want to conditionally call :stop(), but then always call cleanup_unused_sockets, and we want to stop the bonded player entirely before calling that function.

The reason being is that cleanup_unused_sockets executes the cleanup logic for when the set of known Coordinators change, so that we get rid of certain namespace subscriptions and connections if a player that was a coordinator is no longer a coordinator.

This cleanup logic is applicable on any Group membership update for the topology regardless of whether or not bonded set membership shifts.

After we do the cleanup, we then want to validate our own coordinator connection (if our coordinator changed) and subscriptions only in the case that we're not part of a bonded set.

tpmanley · 2025-07-30T16:16:54Z

drivers/SmartThings/sonos/src/api/sonos_ssdp_discovery.lua

@@ -300,7 +300,7 @@ function sonos_ssdp.spawn_persistent_ssdp_task()
    if info_to_send then
      if not (info_to_send.discovery_info and info_to_send.discovery_info.device) then
        log.error_with(
-          { hub_logs = true },
+          { hub_logs = false },


Is there a reason we don't want to include this in hub logs?

It can be chatty, and we've been asked by trim down the amount of logging that the LAN Edge Drivers forward to hub-core. It was set to true during initial development of the refactor for debugging purposes but doesn't really need to be that way anymore. With the ability to set the broker.promoted_driver_logs flag in LD to promote all logs for a driver, there isn't as much need for this in general unless it's on a path that is really pathological, IMO.

tpmanley · 2025-07-30T16:17:18Z

drivers/SmartThings/sonos/src/lifecycle_handlers.lua

-                  "Error handling Sonos player initialization: %s, error code: %s",
-                  error,
-                  (error_code or "N/A")
+                  { hub_logs = false },


Same question about not including this in hub logs

This can be chatty as well, and based on logs that I've looked at from the fleet, it's almost always only hit by non-primary bonded players, so it's not providing useful information.

cjswedes

LGTM

dljsjr requested review from tpmanley, varzac, cjswedes and NoahCornell July 29, 2025 22:48

dljsjr force-pushed the fix/sonos-bonded-set-handling branch from e357b8f to 55bd1a7 Compare July 29, 2025 22:50

dljsjr mentioned this pull request Jul 30, 2025

chore: Improve Sonos Memory Usage #2306

Draft

7 tasks

dljsjr commented Jul 30, 2025

View reviewed changes

dljsjr force-pushed the fix/sonos-bonded-set-handling branch from 55bd1a7 to c5ed33e Compare July 30, 2025 15:00

dljsjr force-pushed the fix/sonos-bonded-set-handling branch from c5ed33e to 99b6f2b Compare July 30, 2025 15:18

varzac reviewed Jul 30, 2025

View reviewed changes

tpmanley reviewed Jul 30, 2025

View reviewed changes

cjswedes reviewed Jul 31, 2025

View reviewed changes

cjswedes approved these changes Aug 1, 2025

View reviewed changes

fix(sonos): Improved handling of bonded set membership changes. #2302

Are you sure you want to change the base?

fix(sonos): Improved handling of bonded set membership changes. #2302

Uh oh!

Conversation

dljsjr commented Jul 29, 2025

Type of Change

Checklist

Description of Change

Summary of Completed Tests

Uh oh!

github-actions bot commented Jul 29, 2025

Uh oh!

github-actions bot commented Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

github-actions bot commented Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dljsjr Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dljsjr Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dljsjr Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cjswedes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Jul 29, 2025 •

edited

Loading

github-actions bot commented Jul 29, 2025 •

edited

Loading

dljsjr Jul 30, 2025 •

edited

Loading

dljsjr Aug 1, 2025 •

edited

Loading

dljsjr Aug 1, 2025 •

edited

Loading