Skip to content

fix(sonos): Improved handling of bonded set membership changes. #2302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dljsjr
Copy link
Member

@dljsjr dljsjr commented Jul 29, 2025

Type of Change

  • Bug fix

Checklist

  • I have performed a self-review of my code
  • I have commented my code in hard-to-understand areas
  • I have verified my changes by testing with a device or have communicated a plan for testing

Description of Change

We have historically ignored non-primary devices in bonded sets during discovery/onboarding, going back to the pre-Edge days.

Bonded sets are things like stereo pairs or home theater setups.

These devices cannot be controlled via the WSS LAN API at all, and they aren't treated as being part of a Group in the Sonos group model; the intent is for API consumers to treat the entire bonded set as a single "player". Only one of the devices in the set exposes an API endpoint, which differs from a Group where all the players expose an endpoint. While groups service most commands via calling the API endpoint on the Group Coordinator, certain commands like volume can be sent to non-coordinator players. This is not the case with bonded sets; the entire set always acts atomically, and only the primary device in the set can service commands. Hence, we treat non-primary devices as not controllable in a bonded set, and we don't onboard them.

However, we didn't really have very robust handling of devices that become part of a bonded set at runtime after onboarding. This led to unnecessary CPU and RAM usage by creating a few tight loops due to unexpected failure/edge cases when making certain calls.

Because we would fail to discover this device under normal circumstances, the preferred way to handle this transition is to mark the device as being offline. We discussed emitting a delete for these devices, but it seems that it wouldn't be too uncommon for some users to create and destroy bonded sets ephemerally the way one might do with a Group. For this reason we decided we would avoid deleting the records.

If a non-primary in a bonded set is removed from the bonded set, it will be marked as online again.

Summary of Completed Tests

Tested on my personal setup by overriding the driver files directly. Will need to be regression tested by internal QA once this lands on Alpha; the OAuth stuff precludes this from being tested using the PR channel invite.

Copy link

Copy link

github-actions bot commented Jul 29, 2025

Test Results

   68 files    447 suites   0s ⏱️
2 324 tests 2 324 ✅ 0 💤 0 ❌
3 923 runs  3 923 ✅ 0 💤 0 ❌

Results for commit 99b6f2b.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Jul 29, 2025

Minimum allowed coverage is 90%

Generated by 🐒 cobertura-action against 99b6f2b

@dljsjr dljsjr force-pushed the fix/sonos-bonded-set-handling branch from e357b8f to 55bd1a7 Compare July 29, 2025 22:50
@dljsjr dljsjr mentioned this pull request Jul 30, 2025
7 tasks
@@ -12,41 +14,48 @@ CapEventHandlers.PlaybackStatus = {
Playing = "PLAYBACK_STATE_PLAYING",
}

local function _do_emit(device, attribute_event)
Copy link
Member Author

@dljsjr dljsjr Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we collapsed everything down to this helper function is because certain activity payloads that come in on a single speaker will contain information about all Players in a group/set/zone/etc., and we will dispatch to all appropriate device records that can be associated with one of the payload entries.

Emitting information about that speaker if it's known to be part of a bonded set would lead to the device record being marked as Online in certain situations, which we don't want.

@dljsjr dljsjr force-pushed the fix/sonos-bonded-set-handling branch from 55bd1a7 to c5ed33e Compare July 30, 2025 15:00
We have historically ignored non-primary devices in bonded sets during
discovery/onboarding, going back to the pre-Edge days. 

Bonded sets are things like stereo pairs or home theater setups.

These devices cannot be controlled via the WSS LAN API at all, and they
aren't treated as being part of a Group in the Sonos group model; the
intent is for API consumers to treat the entire bonded set as a single
"player". Only one of the devices in the set exposes an API endpoint,
which differs from a Group where all the players expose an endpoint.
While groups service most commands via calling the API endpoint on the
Group Coordinator, certain commands like volume can be sent to
non-coordinator players. This is not the case with bonded sets; the
entire set always acts atomically, and only the primary device in the
set can service commands. Hence, we treat non-primary devices as not
controllable in a bonded set, and we don't onboard them.

However, we didn't really have very robust handling of devices that
become part of a bonded set at runtime after onboarding. This led to
unnecessary CPU and RAM usage by creating a few tight loops due to
unexpected failure/edge cases when making certain calls.

Because we would fail to discover this device under normal
circumstances, the preferred way to handle this transition is to mark
the device as being offline. We discussed emitting a delete for these
devices, but it seems that it wouldn't be too uncommon for some users to
create and destroy bonded sets ephemerally the way one might do with a
Group. For this reason we decided we would avoid deleting the records.

If a non-primary in a bonded set is removed from the bonded set, it will
be marked as online again.
@dljsjr dljsjr force-pushed the fix/sonos-bonded-set-handling branch from c5ed33e to 99b6f2b Compare July 30, 2025 15:18
---
---@field public ssdp_task SonosPersistentSsdpTask?
---@field private ssdp_event_thread_handle table?
local SonosDriver = {}

---@param device SonosDevice
function SonosDriver:update_bonded_device_tracking(device)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in this function is very confusing to me, can you explain what is intended to be tracked here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to track when a device record's bonded status changes.

The device's current bonded status is tracked on the device itself via the BONDED field, which we always update as soon as possible so that we can do stuff like short circuiting emits so that we don't mark an offline device as online.

The driver shadows this information in a map that is updated independently and with less urgency, so that it can be used to look up the previous state before updating itself.

This lets us determine if a device's bonded status changes. If a device becomes bonded, we mark it offline. If a device was bonded and is no longer bonded, we re-initialize it.

"Player %s is a non-primary bonded Sonos device, ignoring",
info.discovery_info.device.name
)
-- if then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code

@@ -358,19 +365,28 @@ function SonosConnection.new(driver, device)
local household_id, current_coordinator =
self.driver.sonos:get_coordinator_for_device(self.device)
local _, player_id = self.driver.sonos:get_player_for_device(self.device)
self.driver.sonos:update_household_info(header.householdId, body, self.device)
self.driver.sonos:update_household_info(header.householdId, body, self.driver)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change seems odd, but looking at the function definition change, it looks like the device was just an extra arg before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks weird on the surface.

before the OAuth refactor, this method would conditionally take a device record as a flag that it needed to perform some updates. Which was a bad heuristic, so when I refactored it, I got rid of that heuristic, but forgot to remove it from the call sites; Lua being Lua, it didn't throw any errors or cause any problems.

With the refactor for the bonded sets, we changed it so that it always takes a driver record to be able to do some tracking of bonded/unbonded state transitions, hence the updates to the call sites.

self.driver.sonos:update_device_record_from_state(header.householdId, self.device)
local _, updated_coordinator = self.driver.sonos:get_coordinator_for_device(self.device)

local bonded = self.device:get_field(PlayerFields.BONDED)
if bonded then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be combined with the if not bonded below just to simplify the logic?

Copy link
Member Author

@dljsjr dljsjr Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easily; we want to conditionally call :stop(), but then always call cleanup_unused_sockets, and we want to stop the bonded player entirely before calling that function.

The reason being is that cleanup_unused_sockets executes the cleanup logic for when the set of known Coordinators change, so that we get rid of certain namespace subscriptions and connections if a player that was a coordinator is no longer a coordinator.

This cleanup logic is applicable on any Group membership update for the topology regardless of whether or not bonded set membership shifts.

After we do the cleanup, we then want to validate our own coordinator connection (if our coordinator changed) and subscriptions only in the case that we're not part of a bonded set.

@@ -300,7 +300,7 @@ function sonos_ssdp.spawn_persistent_ssdp_task()
if info_to_send then
if not (info_to_send.discovery_info and info_to_send.discovery_info.device) then
log.error_with(
{ hub_logs = true },
{ hub_logs = false },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we don't want to include this in hub logs?

Copy link
Member Author

@dljsjr dljsjr Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be chatty, and we've been asked by trim down the amount of logging that the LAN Edge Drivers forward to hub-core. It was set to true during initial development of the refactor for debugging purposes but doesn't really need to be that way anymore. With the ability to set the broker.promoted_driver_logs flag in LD to promote all logs for a driver, there isn't as much need for this in general unless it's on a path that is really pathological, IMO.

"Error handling Sonos player initialization: %s, error code: %s",
error,
(error_code or "N/A")
{ hub_logs = false },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about not including this in hub logs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be chatty as well, and based on logs that I've looked at from the fleet, it's almost always only hit by non-primary bonded players, so it's not providing useful information.

Copy link
Contributor

@cjswedes cjswedes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants