Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backends/winrt: fix max pdu update race #1552

Merged
merged 1 commit into from
Apr 29, 2024
Merged

backends/winrt: fix max pdu update race #1552

merged 1 commit into from
Apr 29, 2024

Conversation

dlech
Copy link
Collaborator

@dlech dlech commented Apr 29, 2024

Users have reported that the max pdu size was wrong on some devices. This was happening because the max_pdu_size_changed event was happeing after the get_services() call returned.

This adds an event to wait for the max_pdu_size_changed event to happen and updates the characteristics in the service dictionary after the fact to ensure they have the correct value.

The service_explorer example is also updated to show the value to help with future troubleshooting.

Fixes: #1497

Users have reported that the max pdu size was wrong on some devices.
This was happening because the max_pdu_size_changed event was happeing
after the get_services() call returned.

This adds an event to wait for the max_pdu_size_changed event to happen
and updates the characteristics in the service dictionary after the
fact to ensure they have the correct value.

The service_explorer example is also updated to show the value to help
with future troubleshooting.

Fixes: #1497
@dlech dlech merged commit cfe5a15 into develop Apr 29, 2024
14 checks passed
@dlech dlech deleted the winrt-fix-pdu-race branch April 29, 2024 01:56
@JPHutchins
Copy link
Contributor

Confirmed that this fixed the issue, thanks!

@JPHutchins
Copy link
Contributor

I may have spoken too soon but need to investigate further to understand.

@JPHutchins
Copy link
Contributor

I think that I made need to await asyncio.sleep(1) to benefit from this consistently?

@dlech
Copy link
Collaborator Author

dlech commented May 11, 2024

Do you see "max_pdu_size_changed event not received, using default" if you enable debug logging?

@JPHutchins
Copy link
Contributor

Do you see "max_pdu_size_changed event not received, using default" if you enable debug logging?

Thank you, I will look next time I am running!

@JPHutchins
Copy link
Contributor

JPHutchins commented May 11, 2024

Do you see "max_pdu_size_changed event not received, using default" if you enable debug logging?

Yes, I get the log in question. I see that the correct value does arrive!

DEBUG:smpclient.transport.ble:Found device: device=BLEDevice(F2:0D:69:5C:11:6F, A SMP DUT), connecting...
DEBUG:bleak.backends.winrt.client:Connecting to BLE device @ F2:0D:69:5C:11:6F
DEBUG:bleak.backends.winrt.client:getting services (service_cache_mode=None, cache_mode=None)...
DEBUG:bleak.backends.winrt.client:max_pdu_size_changed event not received, using default
DEBUG:bleak.backends.winrt.client:session_status_changed_event_handler: id: BluetoothLE#BluetoothLEe0:d4:64:58:c4:91-f2:0d:69:5c:11:6f, error: <BluetoothError.SUCCESS: 0>, status: <GattSessionStatus.ACTIVE: 1>
DEBUG:smpclient.transport.ble:Connected to device=BLEDevice(F2:0D:69:5C:11:6F, A SMP DUT)
DEBUG:bleak.backends.winrt.client:max_pdu_size_changed_handler: 498
self._smp_characteristic.max_write_without_response_size=20

I suppose that increasing the timeout here might resolve the issue but I understand the concern with it already being 1 second:

# REVISIT: Devices that don't support > default PDU size
# may be punished by this timeout with a slow connection
# time. We may want to consider an option to ignore this
# timeout for such devices.
async with async_timeout(1):
await pdu_size_event.wait()

It appears that the max_pdu_size_changed_handler does not mutate any state of the client or characteristic.

@dlech
Copy link
Collaborator Author

dlech commented May 11, 2024

Can you add timestamps to the debug logs so we can see how long it actually takes before the size changed event comes?

@JPHutchins
Copy link
Contributor

Can you add timestamps to the debug logs so we can see how long it actually takes before the size changed event comes?

Good idea, here's an example:

2024-05-11 23:16:56 DEBUG    Connecting to BLE device @ C4:35:E3:1E:09:DB
2024-05-11 23:16:56 DEBUG    getting services (service_cache_mode=None, cache_mode=None)...
2024-05-11 23:16:57 DEBUG    max_pdu_size_changed event not received, using default
2024-05-11 23:16:57 DEBUG    session_status_changed_event_handler: id: BluetoothLE#BluetoothLEe0:d4:64:58:c4:91-c4:35:e3:1e:09:db, error: <BluetoothError.SUCCESS: 0>, status: <GattSessionStatus.ACTIVE: 1>
2024-05-11 23:16:57 DEBUG    Connected to device=BLEDevice(C4:35:E3:1E:09:DB, A SMP DUT)
2024-05-11 23:16:58 DEBUG    max_pdu_size_changed_handler: 498
self._smp_characteristic.max_write_without_response_size=20

Second run with milliseconds, this time it got the value on time:

2024-05-11 23:25:08.352 DEBUG    Connecting to BLE device @ C4:35:E3:1E:09:DB
2024-05-11 23:25:08.444 DEBUG    getting services (service_cache_mode=None, cache_mode=None)...
2024-05-11 23:25:09.173 DEBUG    session_status_changed_event_handler: id: BluetoothLE#BluetoothLEe0:d4:64:58:c4:91-c4:35:e3:1e:09:db, error: <BluetoothError.SUCCESS: 0>, status: <GattSessionStatus.ACTIVE: 1>
2024-05-11 23:25:09.298 DEBUG    max_pdu_size_changed_handler: 498
2024-05-11 23:25:09.482 DEBUG    Connected to device=BLEDevice(C4:35:E3:1E:09:DB, A SMP DUT)
self._smp_characteristic.max_write_without_response_size=495

@dlech
Copy link
Collaborator Author

dlech commented May 12, 2024

Interesting, it seems like it varies quite a bit. It would be interesting to see some Bluetooth packet captures from it connecting and disconnecting a bunch of times so we can see the general distribution of how long it takes. I'm curious to see if there is a delay in sending the request or a delay in the peripheral sending the reply or both. What kind of device is this anyway?

@JPHutchins
Copy link
Contributor

The BLE device is an NRF52840. The laptop has an Intel bluetooth chip.

Do you suggest Wireshark for the captures? My gut feeling is that it's related to Window's model for Bluetooth devices. Any ideas for a workaround?

@JPHutchins
Copy link
Contributor

I have a minimal workaround that produces this log:

2024-05-19 16:56:52.390 DEBUG    Connecting to BLE device @ C4:35:E3:1E:09:DB
2024-05-19 16:56:52.572 DEBUG    getting services (service_cache_mode=None, cache_mode=None)...
2024-05-19 16:56:53.311 DEBUG    session_status_changed_event_handler: id: BluetoothLE#BluetoothLEe0:d4:64:58:c4:91-c4:35:e3:1e:09:db, error: <BluetoothError.SUCCESS: 0>, status: <GattSessionStatus.ACTIVE: 1>
2024-05-19 16:56:53.348 DEBUG    session_status_changed_event_handler: id: BluetoothLE#BluetoothLEe0:d4:64:58:c4:91-c4:35:e3:1e:09:db, error: <BluetoothError.SUCCESS: 0>, status: <GattSessionStatus.CLOSED: 0>
2024-05-19 16:56:53.629 DEBUG    max_pdu_size_changed event not received, using default
2024-05-19 16:56:53.630 DEBUG    Connected to device=BLEDevice(C4:35:E3:1E:09:DB, A SMP DUT)
2024-05-19 16:56:53.630 DEBUG    Found SMP characteristic: smp_characteristic=<bleak.backends.winrt.characteristic.BleakGATTCharacteristicWinRT object at 0x000001415EEAEEA0>
2024-05-19 16:56:53.630 INFO     smp_characteristic.max_write_without_response_size=20
2024-05-19 16:56:53.631 WARNING  The SMP characteristic MTU is 20 bytes, possibly a Windows bug, checking again
2024-05-19 16:56:54.375 DEBUG    session_status_changed_event_handler: id: BluetoothLE#BluetoothLEe0:d4:64:58:c4:91-c4:35:e3:1e:09:db, error: <BluetoothError.SUCCESS: 0>, status: <GattSessionStatus.ACTIVE: 1>
2024-05-19 16:56:54.385 DEBUG    max_pdu_size_changed_handler: 498
2024-05-19 16:56:55.637 WARNING  smp_characteristic.max_write_without_response_size=495
            logger.debug(f"Found SMP characteristic: {smp_characteristic=}")
            logger.info(f"{smp_characteristic.max_write_without_response_size=}")
            if (
                platform.system() == "Windows"
                and smp_characteristic.max_write_without_response_size == 20
            ):
                # https://github.com/hbldh/bleak/pull/1552#issuecomment-2105573291
                logger.warning(
                    "The SMP characteristic MTU is 20 bytes, possibly a Windows bug, checking again"
                )
                await asyncio.sleep(2)
                smp_characteristic._max_write_without_response_size = (
                    self._client._backend._session.max_pdu_size - 3  # type: ignore
                )
                logger.warning(f"{smp_characteristic.max_write_without_response_size=}")

Wait another 2 seconds for the ._session.max_pdu_size to be mutated by the max_pdu_size_changed_handler.

@dlech
Copy link
Collaborator Author

dlech commented May 20, 2024

Do you suggest Wireshark for the captures?

Yes, that would let us see exactly what is going on.

@JPHutchins
Copy link
Contributor

Are we expecting this to be an issue in Linux as well? I have received a similar report: intercreate/smpclient#21

Should I really be using the max write without response size or is there a better way to get the MTU reliably across all platforms?

@dlech
Copy link
Collaborator Author

dlech commented May 22, 2024

Is this with the same peripheral device?

@JPHutchins
Copy link
Contributor

I believe that the chip is also an NRF52840. Nordic has like 50% market share 🤣. Uncertain of exactly what the peripheral is exactly.

https://iotbusinessnews.com/2023/09/29/78978-tsr-market-update-bluetooth-low-energy-market/

@dlech
Copy link
Collaborator Author

dlech commented May 22, 2024

If it happens on Linux too, then I will probably just redo windows to avoid the wait/timeout and update the characteristics dict later.

If these peripherals just take a long time to exchange this information, then there is nothing you can do besides wait a long time to get the correct info. Again, Bluetooth packet capture would let us actually see if this is really what is happening.

@tomaszduda23
Copy link

The MTU is always exchanged correctly between devices. Only reported value seems to be wrong. Bellow API returns correct MTU

# BlueZ doesn't have a proper way to get the MTU, so we have this hack.
        # If this doesn't work for you, you can set the client._mtu_size attribute
        # to override the value instead.
        if self._client._backend.__class__.__name__ == "BleakClientBlueZDBus":
            await self._client._backend._acquire_mtu()

@dlech
Copy link
Collaborator Author

dlech commented May 22, 2024

The MTU is always exchanged correctly between devices.

Yes, but the problem I understand here is when this happens, not if it happens.

@tomaszduda23
Copy link

I can capture data if you tell me what exactly is needed.

@dlech
Copy link
Collaborator Author

dlech commented May 22, 2024

See https://bleak.readthedocs.io/en/latest/troubleshooting.html#capture-bluetooth-traffic for how to capture Bluetooth packets. Looking for something that can be opened in Wireshark.

@tomaszduda23
Copy link

I do a thing which I'm not supposed to do https://bleak.readthedocs.io/en/latest/troubleshooting.html#calling-asyncio-run-more-than-once. I will check that first 😃

@tomaszduda23
Copy link

After changing to single asyncio run the same problem occures. wireshark log in attachment.
mtu_20.zip

@dlech
Copy link
Collaborator Author

dlech commented May 25, 2024

Thanks for the logs. I can see that the time between the MTU request and response was within 200 ms.

image

Was this on Windows? How long did it take to get the PDU size changed event in Bleak during this capture?

@tomaszduda23
Copy link

It is ubuntu 22.

@tomaszduda23
Copy link

The value is different depending how you check it. The second on is always correct. The first is correct only sometimes.

INFO smp_characteristic.max_write_without_response_size=20
INFO self._max_write_without_response_size=495

You can see the code here https://github.com/intercreate/smpclient/blob/8b9111deb9f41a9173e21bc29e456e53263166d9/smpclient/transport/ble.py#L110

@dlech
Copy link
Collaborator Author

dlech commented May 26, 2024

I would suggest starting a new issue abut the same thing happening on BlueZ since this issue was about Windows specifically.

If you have to call _acquire_mtu() on BluetZ though, it could just mean that you have an older version of BlueZ and it is expected that you would need that workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MTU size incorrect on Windows
3 participants