-
Notifications
You must be signed in to change notification settings - Fork 502
fix(sonos): Improved handling of bonded set membership changes. #2302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Invitation URL: |
Test Results 68 files 447 suites 0s ⏱️ Results for commit 99b6f2b. ♻️ This comment has been updated with latest results. |
Minimum allowed coverage is Generated by 🐒 cobertura-action against 99b6f2b |
e357b8f
to
55bd1a7
Compare
@@ -12,41 +14,48 @@ CapEventHandlers.PlaybackStatus = { | |||
Playing = "PLAYBACK_STATE_PLAYING", | |||
} | |||
|
|||
local function _do_emit(device, attribute_event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we collapsed everything down to this helper function is because certain activity payloads that come in on a single speaker will contain information about all Players in a group/set/zone/etc., and we will dispatch to all appropriate device records that can be associated with one of the payload entries.
Emitting information about that speaker if it's known to be part of a bonded set would lead to the device record being marked as Online in certain situations, which we don't want.
55bd1a7
to
c5ed33e
Compare
We have historically ignored non-primary devices in bonded sets during discovery/onboarding, going back to the pre-Edge days. Bonded sets are things like stereo pairs or home theater setups. These devices cannot be controlled via the WSS LAN API at all, and they aren't treated as being part of a Group in the Sonos group model; the intent is for API consumers to treat the entire bonded set as a single "player". Only one of the devices in the set exposes an API endpoint, which differs from a Group where all the players expose an endpoint. While groups service most commands via calling the API endpoint on the Group Coordinator, certain commands like volume can be sent to non-coordinator players. This is not the case with bonded sets; the entire set always acts atomically, and only the primary device in the set can service commands. Hence, we treat non-primary devices as not controllable in a bonded set, and we don't onboard them. However, we didn't really have very robust handling of devices that become part of a bonded set at runtime after onboarding. This led to unnecessary CPU and RAM usage by creating a few tight loops due to unexpected failure/edge cases when making certain calls. Because we would fail to discover this device under normal circumstances, the preferred way to handle this transition is to mark the device as being offline. We discussed emitting a delete for these devices, but it seems that it wouldn't be too uncommon for some users to create and destroy bonded sets ephemerally the way one might do with a Group. For this reason we decided we would avoid deleting the records. If a non-primary in a bonded set is removed from the bonded set, it will be marked as online again.
c5ed33e
to
99b6f2b
Compare
--- | ||
---@field public ssdp_task SonosPersistentSsdpTask? | ||
---@field private ssdp_event_thread_handle table? | ||
local SonosDriver = {} | ||
|
||
---@param device SonosDevice | ||
function SonosDriver:update_bonded_device_tracking(device) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this function is very confusing to me, can you explain what is intended to be tracked here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is to track when a device record's bonded status changes.
The device's current bonded status is tracked on the device itself via the BONDED
field, which we always update as soon as possible so that we can do stuff like short circuiting emits so that we don't mark an offline device as online.
The driver shadows this information in a map that is updated independently and with less urgency, so that it can be used to look up the previous state before updating itself.
This lets us determine if a device's bonded status changes. If a device becomes bonded, we mark it offline. If a device was bonded and is no longer bonded, we re-initialize it.
"Player %s is a non-primary bonded Sonos device, ignoring", | ||
info.discovery_info.device.name | ||
) | ||
-- if then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented code
@@ -358,19 +365,28 @@ function SonosConnection.new(driver, device) | |||
local household_id, current_coordinator = | |||
self.driver.sonos:get_coordinator_for_device(self.device) | |||
local _, player_id = self.driver.sonos:get_player_for_device(self.device) | |||
self.driver.sonos:update_household_info(header.householdId, body, self.device) | |||
self.driver.sonos:update_household_info(header.householdId, body, self.driver) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change seems odd, but looking at the function definition change, it looks like the device was just an extra arg before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this looks weird on the surface.
before the OAuth refactor, this method would conditionally take a device record as a flag that it needed to perform some updates. Which was a bad heuristic, so when I refactored it, I got rid of that heuristic, but forgot to remove it from the call sites; Lua being Lua, it didn't throw any errors or cause any problems.
With the refactor for the bonded sets, we changed it so that it always takes a driver record to be able to do some tracking of bonded/unbonded state transitions, hence the updates to the call sites.
self.driver.sonos:update_device_record_from_state(header.householdId, self.device) | ||
local _, updated_coordinator = self.driver.sonos:get_coordinator_for_device(self.device) | ||
|
||
local bonded = self.device:get_field(PlayerFields.BONDED) | ||
if bonded then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be combined with the if not bonded
below just to simplify the logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easily; we want to conditionally call :stop()
, but then always call cleanup_unused_sockets
, and we want to stop
the bonded player entirely before calling that function.
The reason being is that cleanup_unused_sockets
executes the cleanup logic for when the set of known Coordinators change, so that we get rid of certain namespace subscriptions and connections if a player that was a coordinator is no longer a coordinator.
This cleanup logic is applicable on any Group membership update for the topology regardless of whether or not bonded set membership shifts.
After we do the cleanup, we then want to validate our own coordinator connection (if our coordinator changed) and subscriptions only in the case that we're not part of a bonded set.
@@ -300,7 +300,7 @@ function sonos_ssdp.spawn_persistent_ssdp_task() | |||
if info_to_send then | |||
if not (info_to_send.discovery_info and info_to_send.discovery_info.device) then | |||
log.error_with( | |||
{ hub_logs = true }, | |||
{ hub_logs = false }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't want to include this in hub logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be chatty, and we've been asked by trim down the amount of logging that the LAN Edge Drivers forward to hub-core. It was set to true
during initial development of the refactor for debugging purposes but doesn't really need to be that way anymore. With the ability to set the broker.promoted_driver_logs
flag in LD to promote all logs for a driver, there isn't as much need for this in general unless it's on a path that is really pathological, IMO.
"Error handling Sonos player initialization: %s, error code: %s", | ||
error, | ||
(error_code or "N/A") | ||
{ hub_logs = false }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about not including this in hub logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be chatty as well, and based on logs that I've looked at from the fleet, it's almost always only hit by non-primary bonded players, so it's not providing useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Type of Change
Checklist
Description of Change
We have historically ignored non-primary devices in bonded sets during discovery/onboarding, going back to the pre-Edge days.
Bonded sets are things like stereo pairs or home theater setups.
These devices cannot be controlled via the WSS LAN API at all, and they aren't treated as being part of a Group in the Sonos group model; the intent is for API consumers to treat the entire bonded set as a single "player". Only one of the devices in the set exposes an API endpoint, which differs from a Group where all the players expose an endpoint. While groups service most commands via calling the API endpoint on the Group Coordinator, certain commands like volume can be sent to non-coordinator players. This is not the case with bonded sets; the entire set always acts atomically, and only the primary device in the set can service commands. Hence, we treat non-primary devices as not controllable in a bonded set, and we don't onboard them.
However, we didn't really have very robust handling of devices that become part of a bonded set at runtime after onboarding. This led to unnecessary CPU and RAM usage by creating a few tight loops due to unexpected failure/edge cases when making certain calls.
Because we would fail to discover this device under normal circumstances, the preferred way to handle this transition is to mark the device as being offline. We discussed emitting a delete for these devices, but it seems that it wouldn't be too uncommon for some users to create and destroy bonded sets ephemerally the way one might do with a Group. For this reason we decided we would avoid deleting the records.
If a non-primary in a bonded set is removed from the bonded set, it will be marked as online again.
Summary of Completed Tests
Tested on my personal setup by overriding the driver files directly. Will need to be regression tested by internal QA once this lands on Alpha; the OAuth stuff precludes this from being tested using the PR channel invite.