Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: scan: Implement non-blocking ext-adv report reassembly #78982

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jori-nordic
Copy link
Collaborator

@jori-nordic jori-nordic commented Sep 25, 2024

Re-assemble the extended advertising reports into a dedicated buffer
pool and notify the application asynchronously.

To achieve this we:

  • decompose the re-assembly function into fragmented and non-frag parts
  • for reassembled reports: push any other info onto the reassembled buffer
  • for non-frag reports: pass the raw event buffer directly
  • put that buffer on a FIFO
  • trigger a (sys)work item that pulls and calls the cb

The optimization for 80% of cases is kept: Processing non-fragmented
extended advertising reports will not double-copy: We add a buffer pool
similar to sync_evt_pool and apply the following algo:

  • if report is non-frag & we have at least 1 buf available in pool
    -> ref() that buffer and put it on the FIFO directly
  • else:
    -> allocate from re-assembly pool and copy into it

Note that we can only apply this optimization when we are processing the
last report in the event buffer.

Note to reviewers: The diff of this commit is garbage, you're probably
better of comparing with the previous implementation manually.

Signed-off-by: Jonathan Rico jonathan.rico@nordicsemi.no
Co-authored-by: Aleksander Wasaznik aleksander.wasaznik@nordicsemi.no

Fixes #50786

@jori-nordic jori-nordic force-pushed the blocking-is-bad-mmkay branch 3 times, most recently from bc3c4c7 to 5f26984 Compare September 26, 2024 07:39
jori-nordic and others added 5 commits September 26, 2024 14:24
This enables the bluetooth/host/scan/slow test on nRF53 in the Zephyr
Continuous Integration pipeline.

Why waste more CI cycles?

The 5340 build will:
1. stress-test the HCI IPC driver (controller + host sides)
2. use a different buffer allocation model, where the allocator doesn't
know what other events are in the queue

2. is especially important, as this is the model a lot of host users
will be running with. Any controller driver that is not directly on-chip
_cannot_ know what is in the HCI queues, and cannot take the same
convenient shortcuts as the on-chip function-called drivers.

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
Add some advertising data and validate it on the peer.

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
We allocate extended advertising reports from the `sync_evt_pool`. That
pool will always have a buffer available when `bt_recv()` is called.

This necessitates a new API for the drivers to get a buffer, as the old
one does not have enough information to detect extended advertising
reports.

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
This is done in preparation to select flags based on the LE meta event
code.

Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
Since advertising report event processing defers the application
callback to a different context, it's non-blocking and should be
performed on in prio context to avoid blocking prio events.

Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
@jori-nordic jori-nordic changed the title Re-assemble adv reports synchronously Bluetooth: scan: Implement non-blocking ext-adv report reassembly Sep 26, 2024
@jori-nordic
Copy link
Collaborator Author

There may be some funnies left in there. Do report and we will fix.

*
* @return HCI event flags for the specified event.
*/
static inline uint8_t bt_hci_evt_get_flags(uint8_t evt)
static inline uint8_t bt_hci_evt_get_flags(uint8_t evt[static 3])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen the static keyword used in an array size designator like that. What kind of sorcery is this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And more importantly, which C standard is this from, since there's a risk that it may introduce some compatibility issues (see https://docs.zephyrproject.org/latest/develop/languages/c/index.html#language-standards)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwa-nordic is the C language lawyer lol.

another nugget was we had while (num_reports --> 0). Which is also valid 🙃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwa-nordic is the C language lawyer lol.

another nugget was we had while (num_reports --> 0). Which is also valid 🙃

perfectly valid code: value += (value = value == value --> value <-- value), value++; I see myself out now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://stackoverflow.com/questions/3430315/what-is-the-purpose-of-static-keyword-in-array-parameter-of-function-like-char :

The first declaration tells the compiler that someArray is at least 100 elements long. This can be used for optimizations. For example, it also means that someArray is never NULL.

Does that mean that if the function is called with [2] or NULL that it gives a compiler error/warning?

@@ -161,6 +161,8 @@ static inline enum bt_buf_type bt_buf_get_type(struct net_buf *buf)
->type;
}

struct net_buf *bt_buf_get_evt_but_better(uint8_t evt, uint8_t meta, k_timeout_t timeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we'd want to deprecate bt_buf_get_evt() as well? That said, you'd need to update all upstream HCI drivers first, I suppose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I made this to allow a soft-migration. Now that I think of it, I should update bt_buf_can_steal_ext_adv_report_buf to take buf and double-check the pool too. Because it won't work if the driver uses a different pool.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Found (and fixed) a bug in the process, yay.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I made this to allow a soft-migration.

I assume that it's planned that @alwa-nordic will take over and finalize this?

If not, we should create a GH issue for it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwa-nordic will take over, that is the plan.

* - There are buffers in the `free` LIFO
* - There are un-initialized buffers
*/
return (!k_fifo_is_empty(&sync_adv_pool.free)) || (sync_adv_pool.uninit_count != 0);
Copy link
Member

@jhedberg jhedberg Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost begs for a proper net_buf_pool API to test for "is it empty?". That said, it's inherently racy in premptible contexts, so we might not want to give people such a foot gun.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. Ideally net_buf would init all buffers on boot and place them in the free LIFO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, though that's a bit of a separate issue. Even with everything in the LIFO it's still racy to first check and then try to allocate (or make any other decisions based on the pool status)

@jori-nordic jori-nordic force-pushed the blocking-is-bad-mmkay branch 2 times, most recently from 529cfd3 to 76d9699 Compare September 27, 2024 12:11
@jori-nordic
Copy link
Collaborator Author

@jhedberg @Thalley
There is another issue, due to a missing half-bit in the spec:

Given that sequence:

[ Ai | Ai | Bi | Bc | Ac ]

Where i is incomplete-more-data and c is complete. A and B are two different peers.

What callbacks and data should we expect as a user? The current behavior (as tested by host_long_adv_recv) is that we will receive the full data for B. BUT we also receive A, with the beginning truncated.

Do we ask for a new spec? with a new status START using up the last half-bit

Comment on lines 1039 to 1041
uintptr_t data = POINTER_TO_UINT(_data);
uintptr_t start = POINTER_TO_UINT(buf->__buf);
uintptr_t end = POINTER_TO_UINT(buf->__buf + buf->len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right. buf->len describes the amount of valid data behind buf->data. buf->size OTOH is the maximum theoretical amount of data that the net_buf can hold, i.e. buf->size and buf->__buf go together, and buf->data and buf->len go together.

@hermabe
Copy link
Member

hermabe commented Sep 27, 2024

@jhedberg @Thalley There is another issue, due to a missing half-bit in the spec:

Given that sequence:

[ Ai | Ai | Bi | Bc | Ac ]

Where i is incomplete-more-data and c is complete. A and B are two different peers.

What callbacks and data should we expect as a user? The current behavior (as tested by host_long_adv_recv) is that we will receive the full data for B. BUT we also receive A, with the beginning truncated.

Do we ask for a new spec? with a new status START using up the last half-bit

Throwback: #39352 (comment)

Yes, the spec is broken. There is no way to know if you lost the beginning of the adv report. We have to rely on the controller not interleaving fragmented reports. We tried raising an errata last time and it didnt go anywhere.

@hermabe
Copy link
Member

hermabe commented Sep 27, 2024

https://bluetooth.atlassian.net/jira/software/c/projects/ES/issues/ES-17839

The suggested solution is to keep a list of advertiser address and data status. Completely ignoring that this data needs to be stored for an unbounded number of advertisers, and that one advertiser may have multiple sets and anonymous advertising exists.

return bt_buf_get_evt(h4->rx.evt.evt, h4->rx.discardable, timeout);
case BT_HCI_H4_EVT: {
uint8_t evt = h4->rx.evt.evt;
uint8_t meta = h4->rx.hdr[3];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any define for the offset instead of the magic number 3?

drivers/bluetooth/hci/ipc.c Show resolved Hide resolved
@@ -161,6 +161,8 @@ static inline enum bt_buf_type bt_buf_get_type(struct net_buf *buf)
->type;
}

struct net_buf *bt_buf_get_evt_but_better(uint8_t evt, uint8_t meta, k_timeout_t timeout);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I made this to allow a soft-migration.

I assume that it's planned that @alwa-nordic will take over and finalize this?

If not, we should create a GH issue for it

Comment on lines +436 to +454
#if defined(CONFIG_BT_OBSERVER) && defined(CONFIG_BT_EXT_ADV)
switch (node_rx->hdr.type) {
case NODE_RX_TYPE_EXT_1M_REPORT:
case NODE_RX_TYPE_EXT_2M_REPORT:
case NODE_RX_TYPE_EXT_CODED_REPORT:
*evt = BT_HCI_EVT_LE_META_EVENT;
*meta = BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT;
return true;
case NODE_RX_TYPE_EXT_SCAN_TERMINATE:
*evt = BT_HCI_EVT_LE_META_EVENT;
*meta = BT_HCI_EVT_LE_SCAN_TIMEOUT;
return true;
default:
return false;
}

#else
return false;
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if defined(CONFIG_BT_OBSERVER) && defined(CONFIG_BT_EXT_ADV)
switch (node_rx->hdr.type) {
case NODE_RX_TYPE_EXT_1M_REPORT:
case NODE_RX_TYPE_EXT_2M_REPORT:
case NODE_RX_TYPE_EXT_CODED_REPORT:
*evt = BT_HCI_EVT_LE_META_EVENT;
*meta = BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT;
return true;
case NODE_RX_TYPE_EXT_SCAN_TERMINATE:
*evt = BT_HCI_EVT_LE_META_EVENT;
*meta = BT_HCI_EVT_LE_SCAN_TIMEOUT;
return true;
default:
return false;
}
#else
return false;
#endif
if IS_ENABLED((CONFIG_BT_OBSERVER) && IS_ENABLED(CONFIG_BT_EXT_ADV)) {
switch (node_rx->hdr.type) {
case NODE_RX_TYPE_EXT_1M_REPORT:
case NODE_RX_TYPE_EXT_2M_REPORT:
case NODE_RX_TYPE_EXT_CODED_REPORT:
*evt = BT_HCI_EVT_LE_META_EVENT;
*meta = BT_HCI_EVT_LE_EXT_ADVERTISING_REPORT;
return true;
case NODE_RX_TYPE_EXT_SCAN_TERMINATE:
*evt = BT_HCI_EVT_LE_META_EVENT;
*meta = BT_HCI_EVT_LE_SCAN_TIMEOUT;
return true;
default:
return false;
}
else {
return false;
}

subsys/bluetooth/host/buf.c Show resolved Hide resolved
subsys/bluetooth/host/scan.c Outdated Show resolved Hide resolved
subsys/bluetooth/host/scan.c Outdated Show resolved Hide resolved
subsys/bluetooth/host/scan.c Outdated Show resolved Hide resolved
subsys/bluetooth/host/scan.c Show resolved Hide resolved
subsys/bluetooth/host/scan.c Outdated Show resolved Hide resolved
@Thalley
Copy link
Collaborator

Thalley commented Sep 27, 2024

https://bluetooth.atlassian.net/jira/software/c/projects/ES/issues/ES-17839

The suggested solution is to keep a list of advertiser address and data status. Completely ignoring that this data needs to be stored for an unbounded number of advertisers, and that one advertiser may have multiple sets and anonymous advertising exists.

Indeed, broken spec. Nothing much we can do about that.

We can only do best effort and keep incomplete advertiser data for a specific amount of time before discarding it

@narvalotech narvalotech force-pushed the blocking-is-bad-mmkay branch 3 times, most recently from 1e4ebd9 to b16609f Compare September 30, 2024 09:56
jori-nordic and others added 2 commits September 30, 2024 13:45
Re-assemble the extended advertising reports into a dedicated buffer
pool and notify the application asynchronously.

To achieve this we:
- decompose the re-assembly function into fragmented and non-frag parts
- for reassembled reports: push any other info onto the reassembled buffer
- for non-frag reports: pass the raw event buffer directly
- put that buffer on a FIFO
- trigger a (sys)work item that pulls and calls the cb

The optimization for 80% of cases is kept: Processing non-fragmented
extended advertising reports will not double-copy: We add a buffer pool
similar to `sync_evt_pool` and apply the following algo:
- if report is non-frag & we have at least 1 buf available in pool
  -> ref() that buffer and put it on the FIFO directly
- else:
  -> allocate from re-assembly pool and copy into it

Note that we can only apply this optimization when we are processing the
last report in the event buffer.

Note to reviewers: The diff of this commit is garbage, you're probably
better of comparing with the previous implementation manually.

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
Co-authored-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
Sorry we don't have color support yet..

On the non-joke side, this option will print the number of dropped
extended advertising reports every second.

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
Co-authored-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
Comment on lines +945 to +973
static char *status_to_str(uint8_t status)
{
switch (status) {
case COMPLETE:
return "COMPLETE";
case INCOMPLETE_MORE_DATA_TO_COME:
return "INCOMPLETE-MD";
case INCOMPLETE_DATA_TRUNCATED_NO_MORE_TO_COME:
return "INCOMPLETE-NM";
case STATUS_RFU:
return "RFU";
default:
__ASSERT(0, "Unknown data status %x", status);
return "";
}
}

is_new_advertiser = reassembling_advertiser.state == FRAG_ADV_INACTIVE ||
!fragmented_advertisers_equal(&reassembling_advertiser,
&evt->addr, evt->sid);
static bool process_fragmented_report(struct bt_hci_evt_le_ext_advertising_info *evt)
{
/* This function parses the event and re-assembles fragmented reports
* into synthetic events.
*
* Returns true if the event was processed. I.e. will return false for
* reports that fit a single event and are complete.
*/
struct fragmented_advertiser *r = &reassembling_advertiser;

if (is_new_advertiser && is_report_complete) {
/* Only advertising report from this advertiser.
* Create event immediately.
*/
create_ext_adv_info(evt, &scan_info);
le_adv_recv(&evt->addr, &scan_info, &buf->b, evt->length);
goto cont;
}
uint16_t evt_type = sys_le16_to_cpu(evt->evt_type);
uint16_t data_status = BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS(evt_type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status_to_str is called with data_status which has the return value of BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS.

However the values returned by BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS are not the same values as status_to_str matches on.

BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS should return one of the following:

BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_COMPLETE
BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_PARTIAL
BT_HCI_LE_ADV_EVT_TYPE_DATA_STATUS_INCOMPLETE

Comment on lines +1024 to +1025
/* Core Spec v5.4. 4.E.7.7.65.13 says controller uses latest
* RSSI. Host wants to be like controller when grow up.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Host wants to be like controller when grow up.

Can you elaborate on what that means?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'll comment with my personal account :)

@alwa-nordic should rephrase it. Basically, it means that the host should not have a surprising behavior: if the spec has a guideline for the controller, the host should also (unless otherwise stated) try to follow it.

Here the guideline for the LL is that it should use the RSSI from the latest AUX packet received for enqueuing the HCI ext-adv-report event. In the host we translate that to "use the RSSI value from the last ext-adv-report fragment in the chain when you notify the application".
I hope that makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

@kruithofa kruithofa removed their request for review October 18, 2024 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bluetooth: Host: Extended advertising reports may block the host
8 participants