-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: scan: Implement non-blocking ext-adv report reassembly #78982
base: main
Are you sure you want to change the base?
Bluetooth: scan: Implement non-blocking ext-adv report reassembly #78982
Conversation
bc3c4c7
to
5f26984
Compare
This enables the bluetooth/host/scan/slow test on nRF53 in the Zephyr Continuous Integration pipeline. Why waste more CI cycles? The 5340 build will: 1. stress-test the HCI IPC driver (controller + host sides) 2. use a different buffer allocation model, where the allocator doesn't know what other events are in the queue 2. is especially important, as this is the model a lot of host users will be running with. Any controller driver that is not directly on-chip _cannot_ know what is in the HCI queues, and cannot take the same convenient shortcuts as the on-chip function-called drivers. Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
Add some advertising data and validate it on the peer. Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
We allocate extended advertising reports from the `sync_evt_pool`. That pool will always have a buffer available when `bt_recv()` is called. This necessitates a new API for the drivers to get a buffer, as the old one does not have enough information to detect extended advertising reports. Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
This is done in preparation to select flags based on the LE meta event code. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
Since advertising report event processing defers the application callback to a different context, it's non-blocking and should be performed on in prio context to avoid blocking prio events. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
bbc17f0
to
d4eb246
Compare
d4eb246
to
b483d1e
Compare
There may be some funnies left in there. Do report and we will fix. |
b483d1e
to
22354ee
Compare
* | ||
* @return HCI event flags for the specified event. | ||
*/ | ||
static inline uint8_t bt_hci_evt_get_flags(uint8_t evt) | ||
static inline uint8_t bt_hci_evt_get_flags(uint8_t evt[static 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen the static
keyword used in an array size designator like that. What kind of sorcery is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And more importantly, which C standard is this from, since there's a risk that it may introduce some compatibility issues (see https://docs.zephyrproject.org/latest/develop/languages/c/index.html#language-standards)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alwa-nordic is the C language lawyer lol.
another nugget was we had while (num_reports --> 0)
. Which is also valid 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's from C11, so we should be fine there: https://stackoverflow.com/questions/3430315/what-is-the-purpose-of-static-keyword-in-array-parameter-of-function-like-char
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alwa-nordic is the C language lawyer lol.
another nugget was we had
while (num_reports --> 0)
. Which is also valid 🙃
perfectly valid code: value += (value = value == value --> value <-- value), value++;
I see myself out now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first declaration tells the compiler that someArray is at least 100 elements long. This can be used for optimizations. For example, it also means that someArray is never NULL.
Does that mean that if the function is called with [2]
or NULL
that it gives a compiler error/warning?
@@ -161,6 +161,8 @@ static inline enum bt_buf_type bt_buf_get_type(struct net_buf *buf) | |||
->type; | |||
} | |||
|
|||
struct net_buf *bt_buf_get_evt_but_better(uint8_t evt, uint8_t meta, k_timeout_t timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we'd want to deprecate bt_buf_get_evt()
as well? That said, you'd need to update all upstream HCI drivers first, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I made this to allow a soft-migration. Now that I think of it, I should update bt_buf_can_steal_ext_adv_report_buf
to take buf
and double-check the pool too. Because it won't work if the driver uses a different pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. Found (and fixed) a bug in the process, yay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I made this to allow a soft-migration.
I assume that it's planned that @alwa-nordic will take over and finalize this?
If not, we should create a GH issue for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alwa-nordic will take over, that is the plan.
* - There are buffers in the `free` LIFO | ||
* - There are un-initialized buffers | ||
*/ | ||
return (!k_fifo_is_empty(&sync_adv_pool.free)) || (sync_adv_pool.uninit_count != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost begs for a proper net_buf_pool API to test for "is it empty?". That said, it's inherently racy in premptible contexts, so we might not want to give people such a foot gun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. Ideally net_buf
would init all buffers on boot and place them in the free
LIFO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though that's a bit of a separate issue. Even with everything in the LIFO it's still racy to first check and then try to allocate (or make any other decisions based on the pool status)
529cfd3
to
76d9699
Compare
@jhedberg @Thalley Given that sequence:
Where What callbacks and data should we expect as a user? The current behavior (as tested by Do we ask for a new spec? with a new status |
subsys/bluetooth/host/scan.c
Outdated
uintptr_t data = POINTER_TO_UINT(_data); | ||
uintptr_t start = POINTER_TO_UINT(buf->__buf); | ||
uintptr_t end = POINTER_TO_UINT(buf->__buf + buf->len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right. buf->len
describes the amount of valid data behind buf->data
. buf->size
OTOH is the maximum theoretical amount of data that the net_buf can hold, i.e. buf->size
and buf->__buf
go together, and buf->data
and buf->len
go together.
76d9699
to
9e244fe
Compare
Throwback: #39352 (comment) Yes, the spec is broken. There is no way to know if you lost the beginning of the adv report. We have to rely on the controller not interleaving fragmented reports. We tried raising an errata last time and it didnt go anywhere. |
https://bluetooth.atlassian.net/jira/software/c/projects/ES/issues/ES-17839 The suggested solution is to keep a list of advertiser address and data status. Completely ignoring that this data needs to be stored for an unbounded number of advertisers, and that one advertiser may have multiple sets and anonymous advertising exists. |
return bt_buf_get_evt(h4->rx.evt.evt, h4->rx.discardable, timeout); | ||
case BT_HCI_H4_EVT: { | ||
uint8_t evt = h4->rx.evt.evt; | ||
uint8_t meta = h4->rx.hdr[3]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any define for the offset instead of the magic number 3
?
@@ -161,6 +161,8 @@ static inline enum bt_buf_type bt_buf_get_type(struct net_buf *buf) | |||
->type; | |||
} | |||
|
|||
struct net_buf *bt_buf_get_evt_but_better(uint8_t evt, uint8_t meta, k_timeout_t timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I made this to allow a soft-migration.
I assume that it's planned that @alwa-nordic will take over and finalize this?
If not, we should create a GH issue for it
#if defined(CONFIG_BT_OBSERVER) && defined(CONFIG_BT_EXT_ADV) | ||
switch (node_rx->hdr.type) { | ||
case NODE_RX_TYPE_EXT_1M_REPORT: | ||
case NODE_RX_TYPE_EXT_2M_REPORT: | ||
case NODE_RX_TYPE_EXT_CODED_REPORT: | ||
*evt = BT_HCI_EVT_LE_META_EVENT; | ||
*meta = BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT; | ||
return true; | ||
case NODE_RX_TYPE_EXT_SCAN_TERMINATE: | ||
*evt = BT_HCI_EVT_LE_META_EVENT; | ||
*meta = BT_HCI_EVT_LE_SCAN_TIMEOUT; | ||
return true; | ||
default: | ||
return false; | ||
} | ||
|
||
#else | ||
return false; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if defined(CONFIG_BT_OBSERVER) && defined(CONFIG_BT_EXT_ADV) | |
switch (node_rx->hdr.type) { | |
case NODE_RX_TYPE_EXT_1M_REPORT: | |
case NODE_RX_TYPE_EXT_2M_REPORT: | |
case NODE_RX_TYPE_EXT_CODED_REPORT: | |
*evt = BT_HCI_EVT_LE_META_EVENT; | |
*meta = BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT; | |
return true; | |
case NODE_RX_TYPE_EXT_SCAN_TERMINATE: | |
*evt = BT_HCI_EVT_LE_META_EVENT; | |
*meta = BT_HCI_EVT_LE_SCAN_TIMEOUT; | |
return true; | |
default: | |
return false; | |
} | |
#else | |
return false; | |
#endif | |
if IS_ENABLED((CONFIG_BT_OBSERVER) && IS_ENABLED(CONFIG_BT_EXT_ADV)) { | |
switch (node_rx->hdr.type) { | |
case NODE_RX_TYPE_EXT_1M_REPORT: | |
case NODE_RX_TYPE_EXT_2M_REPORT: | |
case NODE_RX_TYPE_EXT_CODED_REPORT: | |
*evt = BT_HCI_EVT_LE_META_EVENT; | |
*meta = BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT; | |
return true; | |
case NODE_RX_TYPE_EXT_SCAN_TERMINATE: | |
*evt = BT_HCI_EVT_LE_META_EVENT; | |
*meta = BT_HCI_EVT_LE_SCAN_TIMEOUT; | |
return true; | |
default: | |
return false; | |
} | |
else { | |
return false; | |
} |
Indeed, broken spec. Nothing much we can do about that. We can only do best effort and keep incomplete advertiser data for a specific amount of time before discarding it |
1e4ebd9
to
b16609f
Compare
Re-assemble the extended advertising reports into a dedicated buffer pool and notify the application asynchronously. To achieve this we: - decompose the re-assembly function into fragmented and non-frag parts - for reassembled reports: push any other info onto the reassembled buffer - for non-frag reports: pass the raw event buffer directly - put that buffer on a FIFO - trigger a (sys)work item that pulls and calls the cb The optimization for 80% of cases is kept: Processing non-fragmented extended advertising reports will not double-copy: We add a buffer pool similar to `sync_evt_pool` and apply the following algo: - if report is non-frag & we have at least 1 buf available in pool -> ref() that buffer and put it on the FIFO directly - else: -> allocate from re-assembly pool and copy into it Note that we can only apply this optimization when we are processing the last report in the event buffer. Note to reviewers: The diff of this commit is garbage, you're probably better of comparing with the previous implementation manually. Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no> Co-authored-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
Sorry we don't have color support yet.. On the non-joke side, this option will print the number of dropped extended advertising reports every second. Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no> Co-authored-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
b16609f
to
db675f5
Compare
static char *status_to_str(uint8_t status) | ||
{ | ||
switch (status) { | ||
case COMPLETE: | ||
return "COMPLETE"; | ||
case INCOMPLETE_MORE_DATA_TO_COME: | ||
return "INCOMPLETE-MD"; | ||
case INCOMPLETE_DATA_TRUNCATED_NO_MORE_TO_COME: | ||
return "INCOMPLETE-NM"; | ||
case STATUS_RFU: | ||
return "RFU"; | ||
default: | ||
__ASSERT(0, "Unknown data status %x", status); | ||
return ""; | ||
} | ||
} | ||
|
||
is_new_advertiser = reassembling_advertiser.state == FRAG_ADV_INACTIVE || | ||
!fragmented_advertisers_equal(&reassembling_advertiser, | ||
&evt->addr, evt->sid); | ||
static bool process_fragmented_report(struct bt_hci_evt_le_ext_advertising_info *evt) | ||
{ | ||
/* This function parses the event and re-assembles fragmented reports | ||
* into synthetic events. | ||
* | ||
* Returns true if the event was processed. I.e. will return false for | ||
* reports that fit a single event and are complete. | ||
*/ | ||
struct fragmented_advertiser *r = &reassembling_advertiser; | ||
|
||
if (is_new_advertiser && is_report_complete) { | ||
/* Only advertising report from this advertiser. | ||
* Create event immediately. | ||
*/ | ||
create_ext_adv_info(evt, &scan_info); | ||
le_adv_recv(&evt->addr, &scan_info, &buf->b, evt->length); | ||
goto cont; | ||
} | ||
uint16_t evt_type = sys_le16_to_cpu(evt->evt_type); | ||
uint16_t data_status = BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS(evt_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status_to_str
is called with data_status
which has the return value of BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS
.
However the values returned by BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS
are not the same values as status_to_str
matches on.
BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS
should return one of the following:
BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_COMPLETE
BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_PARTIAL
BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_INCOMPLETE
/* Core Spec v5.4. 4.E.7.7.65.13 says controller uses latest | ||
* RSSI. Host wants to be like controller when grow up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Host wants to be like controller when grow up.
Can you elaborate on what that means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I'll comment with my personal account :)
@alwa-nordic should rephrase it. Basically, it means that the host should not have a surprising behavior: if the spec has a guideline for the controller, the host should also (unless otherwise stated) try to follow it.
Here the guideline for the LL is that it should use the RSSI from the latest AUX packet received for enqueuing the HCI ext-adv-report event. In the host we translate that to "use the RSSI value from the last ext-adv-report fragment in the chain when you notify the application".
I hope that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense.
Re-assemble the extended advertising reports into a dedicated buffer
pool and notify the application asynchronously.
To achieve this we:
The optimization for 80% of cases is kept: Processing non-fragmented
extended advertising reports will not double-copy: We add a buffer pool
similar to
sync_evt_pool
and apply the following algo:-> ref() that buffer and put it on the FIFO directly
-> allocate from re-assembly pool and copy into it
Note that we can only apply this optimization when we are processing the
last report in the event buffer.
Note to reviewers: The diff of this commit is garbage, you're probably
better of comparing with the previous implementation manually.
Signed-off-by: Jonathan Rico jonathan.rico@nordicsemi.no
Co-authored-by: Aleksander Wasaznik aleksander.wasaznik@nordicsemi.no
Fixes #50786