Bluetooth: conn: Don't wait for buf allocation in syswq #71666

Merged

jori-nordic
Contributor

We can't just keep blocking the syswq, people be mad. Override the timeout value so we always have K_NO_WAIT. Also print a message to whomever's listening.
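A minimal sketch of the override described above, written as plain C so it can be compiled outside Zephyr. Everything here is a hypothetical stand-in: `timeout_t`, `NO_WAIT`, `FOREVER`, `current_thread`, `in_isr` and `effective_timeout()` only mimic the roles of `k_timeout_t`, `K_NO_WAIT`, `K_FOREVER`, `k_current_get()`, `k_is_in_isr()` and the check inside `bt_conn_create_pdu_timeout()`. The warning text is also made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for Zephyr's opaque k_timeout_t. */
typedef struct { long ticks; } timeout_t;
static const timeout_t NO_WAIT = { 0 };   /* mimics K_NO_WAIT */
static const timeout_t FOREVER = { -1 };  /* mimics K_FOREVER */

/* Hypothetical thread identities; the real code compares thread pointers. */
enum thread_id { SYSWQ_THREAD, OTHER_THREAD };
static enum thread_id current_thread = OTHER_THREAD;
static bool in_isr = false;

static bool timeout_eq(timeout_t a, timeout_t b)
{
	return a.ticks == b.ticks;
}

/* Return the timeout the allocation should actually use. */
static timeout_t effective_timeout(timeout_t timeout)
{
	/* Allocation must not happen from an ISR. */
	assert(!in_isr);

	if (!timeout_eq(timeout, NO_WAIT) && current_thread == SYSWQ_THREAD) {
		/* The override is silent to the caller, so say something. */
		fprintf(stderr, "timeout discarded: no blocking in syswq\n");
		return NO_WAIT;
	}

	return timeout;
}
```

With `current_thread` set to `SYSWQ_THREAD`, a call such as `effective_timeout(FOREVER)` yields `NO_WAIT`. From any other thread the requested timeout is passed through unchanged.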

theob-pro
theob-pro previously approved these changes Apr 18, 2024
@@ -1471,6 +1471,12 @@ struct net_buf *bt_conn_create_pdu_timeout(struct net_buf_pool *pool,
	 */
	__ASSERT_NO_MSG(!k_is_in_isr());

	if (!K_TIMEOUT_EQ(timeout, K_NO_WAIT) &&
	    k_current_get() == k_work_queue_thread_get(&k_sys_work_q)) {
		LOG_DBG("Blocking the SYSWQ is bad, mmkay?");
Contributor

Mr. Mackey says that we should have proper debug logs

Suggested change
LOG_DBG("Blocking the SYSWQ is bad, mmkay?");

IMO this does not require a log statement at all

Contributor Author

I think it does, since it's a "silent" override. I, at least, would be confused if my supposedly blocking alloc turned out to be non-blocking.
I tried to look into dumping the timeout value in the warning below, but it's opaque and has no getters in Zephyr :/
I'll reword it ofc

Contributor

That is a fair point. I think we should add a similar LOG statement in bt_att_req_alloc as well then

Contributor Author

will do 👍

We can't just keep blocking the syswq, people be mad.
Override the timeout value so we always have K_NO_WAIT.
Also print a message to whomever's listening.

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>

jori-nordic commented Apr 19, 2024

@Thalley @alwa-nordic I'm debugging a deadlock rn caused by an audio sample blocking in the same way in the syswq.
What do you think about patching net_buf_alloc_len() directly instead?

edit: A whole lot of audio samples do the badness, so we'll probably have to add error handling there.
edit2: #71697


Thalley commented Apr 19, 2024

> edit: A whole lot of audio samples do the badness, so we'll probably have to add error handling there.

@jori-nordic can you elaborate?

@jori-nordic
Contributor Author

@Thalley they do a net_buf_alloc(K_FOREVER) from a work item scheduled on the syswq. That's a no-no.
The preferred solution would be for the samples to have their own workqueue.
Alternatively, they could just retry when they don't get a buf, or we could expose the long_wq as public API.
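The "retry when they don't get a buf" option could look roughly like the sketch below. It is plain C with hypothetical stand-ins (`pool`, `buf_alloc_no_wait()`, `buf_free()`, `send_work_handler()`), not the samples' actual code or the real `net_buf` API. The point is only that the handler never blocks: when the pool is empty it reports that it wants to run again later. In a real Zephyr work item that would probably mean resubmitting with a delay (for example via `k_work_reschedule()`) instead of returning a status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size pool; stands in for a net_buf pool. */
#define POOL_SIZE 2
struct buf { bool in_use; };
static struct buf pool[POOL_SIZE];

/* Non-blocking allocation: returns NULL at once if the pool is empty. */
static struct buf *buf_alloc_no_wait(void)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = true;
			return &pool[i];
		}
	}
	return NULL;
}

/* Hand a buffer back to the pool. */
static void buf_free(struct buf *b)
{
	assert(b != NULL);
	b->in_use = false;
}

enum work_result { WORK_DONE, WORK_RETRY_LATER };

/*
 * Work handler that never blocks. On allocation failure it leaves *out
 * untouched and asks the caller to run it again later.
 */
static enum work_result send_work_handler(struct buf **out)
{
	struct buf *b = buf_alloc_no_wait();

	if (b == NULL) {
		return WORK_RETRY_LATER;
	}

	*out = b;
	return WORK_DONE;
}
```

The workqueue thread stays free to run whatever work item eventually releases a buffer, which is what a `K_FOREVER` wait inside the handler would prevent.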

Labels
area: Bluetooth Host (Bluetooth Host, excluding BR/EDR), area: Bluetooth

7 participants