Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: Fix various deadlock issues when running low on buffers #16870

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions drivers/bluetooth/hci/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,20 @@ config BT_UART_ON_DEV_NAME
This option specifies the name of UART device to be used
for Bluetooth.

if BT_H4

config BT_H4_DISCARD_RX_WAIT
int "How many milliseconds to initially wait for RX buffers"
range -1 6000000
default 100
help
This option specifies how many milliseconds the H4 HCI driver
will initially wait for RX buffers to become available before
attempting to discard discardable buffers from the RX queue. Set
to -1 (K_FOREVER) to disable the feature.

endif # BT_H4

if BT_SPI

config BT_BLUENRG_ACI
Expand Down
75 changes: 72 additions & 3 deletions drivers/bluetooth/hci/h4.c
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@
#define H4_SCO 0x03
#define H4_EVT 0x04

#define BUF_DISCARDABLE BIT(7)

static K_THREAD_STACK_DEFINE(rx_thread_stack, CONFIG_BT_RX_STACK_SIZE);
static struct k_thread rx_thread_data;

Expand Down Expand Up @@ -159,15 +161,82 @@ static void reset_rx(void)

static struct net_buf *get_rx(int timeout)
{
struct net_buf *buf;

BT_DBG("type 0x%02x, evt 0x%02x", rx.type, rx.evt.evt);

if (rx.type == H4_EVT) {
return bt_buf_get_evt(rx.evt.evt, timeout);
buf = bt_buf_get_evt(rx.evt.evt, timeout);
} else {
buf = bt_buf_get_rx(BT_BUF_ACL_IN, timeout);
}

if (buf) {
if (rx.discardable) {
buf->flags |= BUF_DISCARDABLE;
} else {
buf->flags &= ~BUF_DISCARDABLE;
}
}

return bt_buf_get_rx(BT_BUF_ACL_IN, timeout);
return buf;
}

#if (CONFIG_BT_H4_DISCARD_RX_WAIT >= 0)
static struct net_buf *get_rx_blocking(void)
{
sys_slist_t list = SYS_SLIST_STATIC_INIT(&list);
bool discarded = false;
struct net_buf *buf;

/* If we have ACL flow control enabled then ACL buffers come from a
* dedicated pool, and we cannot benefit from trying to discard
* events from the RX queue.
*/
if (IS_ENABLED(CONFIG_BT_HCI_ACL_FLOW_CONTROL) && rx.type == H4_ACL) {
return get_rx(K_FOREVER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the nature of such patches is to make sure that buffer-getting functions are never called with K_FOREVER.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario here is that we're in the RX thread with the UART RX interrupt disabled, i.e. there's nothing else that can be done than to sit and wait for a buffer. With ACL flow control this should be guaranteed to succeed since otherwise the controller is sending us an ACL packet when we haven't given it permission (credits) to do so. Another option would be K_NO_WAIT + ASSERT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. there's nothing else that can be done than to sit and wait for a buffer.

There's always something to do, e.g. to report a warning after some period of time, that this extended wait happens.

That said, that's just a random comment from someone not familiar with the BT subsys. (In IP networking, I believe we decided that we don't want any K_FOREVER's, and it was even implemented IIRC).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging purposes something that might be useful, but it should be made for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option, which when enabled causes any threads using K_FOREVER sleep to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return, rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging purposes something that might be useful, but it should be made for the entire stack. This actually gave me a nice debug feature idea for the entire kernel: have a Kconfig option, which when enabled causes any threads using K_FOREVER sleep to wake up periodically after a certain amount of time (e.g. every minute) and announce to the system log that they're still sleeping. The kernel APIs wouldn't return, rather just go back to waiting, i.e. the semantics of the APIs being guaranteed to return non-NULL with K_FOREVER wouldn't change.

Interrestingm I was thinking something similar sometime ago, though if we just wake up the threads just to make them wait again that perhaps would change the order in the waiters list so we it might have to be done without touching the waiters list.

}

buf = get_rx(CONFIG_BT_H4_DISCARD_RX_WAIT);
if (buf) {
return buf;
}

BT_WARN("Attempting to discard packets from RX queue");

while ((buf = net_buf_get(&rx.fifo, K_NO_WAIT))) {
if (!discarded && (buf->flags & BUF_DISCARDABLE)) {
Copy link
Contributor

@Vudentz Vudentz Jun 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this seems to discard only one buffer at time? If that is the correct behavior perhaps you can check if there have been anything added to the list, if there isn't anything that means the order is preserved and we can return interrupting the while loop since you don't have to call k_fifo_put_slist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. Even if we don't find anything to discard we'll have emptied rx.fifo through the net_buf_get calls. So we anyway have to do the put_slist(). And this code is discarding at most one buffer to minimise packet loss for something like mesh that's waiting for advertising reports.

net_buf_unref(buf);
discarded = true;
} else {
/* Since we use k_fifo_put_slist() instead of
* net_buf APIs to insert back to the FIFO we
* cannot support fragmented buffers. This should
* never happen but better to "document" it in the
* form of an assert here.
*/
__ASSERT(!(buf->flags & NET_BUF_FRAGS),
"Fragmented buffers not supported");
sys_slist_append(&list, &buf->node);
}
}

if (!discarded) {
BT_WARN("Unable to discard anything from the RX queue");
}

/* Insert non-discarded packets back to the RX queue */
k_fifo_put_slist(&rx.fifo, &list);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross asking you since you seem to have touched the k_queue code most recently. The above code lines, starting with line 205, are iterating the contents of a k_fifo with the aim to remove one element from there. In this place of the code we have a guarantee that no other thread or ISR will attempt to access the k_fifo while we do this, however in the absence of any more suitable public APIs I'm having to remove each element one-by one and then re-insert the non-discarded elements using k_fifo_put_slist(). What I'd really want to do is the equivalent of SYS_SFLIST_FOR_EACH_NODE() and sys_sflist_remove() and stop iterating once a matching element is found and removed, however currently that requires accessing what I assume is private data of k_fifo (and k_queue). Would that still be ok, or any cleaner way to solve this?


return get_rx(K_FOREVER);
}
#else
static inline struct net_buf *get_rx_blocking(void)
{
return get_rx(K_FOREVER);
}
#endif

static void rx_thread(void *p1, void *p2, void *p3)
{
struct net_buf *buf;
Expand All @@ -186,7 +255,7 @@ static void rx_thread(void *p1, void *p2, void *p3)
* original command buffer (if available).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment need updating now that there is a separate pool for num complete?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. To be honest, I don't fully understand how the host stack works.

*/
if (rx.have_hdr && !rx.buf) {
rx.buf = get_rx(K_FOREVER);
rx.buf = get_rx_blocking();
BT_DBG("Got rx.buf %p", rx.buf);
if (rx.remaining > net_buf_tailroom(rx.buf)) {
BT_ERR("Not enough space in buffer");
Expand Down