Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: net_buf: Generates lots of warnings during Bluetooth initialization #80167

Open
HoZHel opened this issue Oct 21, 2024 · 7 comments
Open
Assignees
Labels
area: Bluetooth Enhancement Changes/Updates/Additions to existing features

Comments

@HoZHel
Copy link
Contributor

HoZHel commented Oct 21, 2024

Describe the bug

Several warnings are generated on the console during Bluetooth initialization with the message as follow:
<wrn> net_buf: Timeout discarded. No blocking in syswq.

I discovered this using the platform nrf52840dk/nrf52840 with peripheral_hr, even if the problem is not specific to this application.

Apparently this has been introduced by 88ee2be

To Reproduce

  1. west build -b nrf52840dk/nrf52840 samples/bluetooth/peripheral_hr
  2. west flash

Expected behavior
Normally, I would expect no warning for a regular Bluetooth initialization.

Impact
Annoyance and feeling that something is not right.

Logs and console output

*** Booting Zephyr OS build v3.7.0-4797-g98e0e65069cf ***
[00:00:00.250,518] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.250,640] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.250,762] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.250,915] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,068] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,190] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,312] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,464] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,586] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,708] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,861] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.251,983] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.252,105] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.252,136] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.252,136] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.7 Build 99
[00:00:00.252,197] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.252,349] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.252,471] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.252,624] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.252,746] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.252,929] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.253,112] <inf> bt_hci_core: Identity: F4:EC:64:8F:85:C8 (random)
[00:00:00.253,143] <inf> bt_hci_core: HCI: version 5.4 (0x0d) revision 0x0000, manufacturer 0x05f1
[00:00:00.253,173] <inf> bt_hci_core: LMP: version 5.4 (0x0d) subver 0xffff
Bluetooth initialized
Starting Legacy Advertising (connectable and scannable)
[00:00:00.253,295] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.253,448] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.253,601] <wrn> net_buf: Timeout discarded. No blocking in syswq
[00:00:00.253,845] <wrn> net_buf: Timeout discarded. No blocking in syswq
Advertising successfully started
Checking LED device...done.
Configuring GPIO pin...done.
Start blinking LED...

Environment:

  • OS: Windows
  • Toolchain: Zephyr SDK
  • Commit SHA: 98e0e65
  • Version used: 3.7.99
@HoZHel HoZHel added the bug The issue is a bug, or the PR is fixing a bug label Oct 21, 2024
@henrikbrixandersen
Copy link
Member

CC: @JordanYates

@jhedberg jhedberg added Enhancement Changes/Updates/Additions to existing features and removed bug The issue is a bug, or the PR is fixing a bug labels Oct 21, 2024
@JordanYates
Copy link
Collaborator

More accurately this was introduced by #71697, and any such usage should have been cleaned up as part of that PR.

@msmttchr
Copy link
Contributor

@JordanYates, I think the change of the message level from debug to warning, triggered all these messages on the console.

@JordanYates
Copy link
Collaborator

JordanYates commented Oct 22, 2024

@JordanYates, I think the change of the message level from debug to warning, triggered all these messages on the console.

Yes, but it's the usage of blocking APIs on the system workqueue that is the problem, not the log level of the message.

@msmttchr
Copy link
Contributor

@JordanYates, I think the change of the message level from debug to warning, triggered all these messages on the console.

Yes, but it's the usage of blocking APIs on the system workqueue that is the problem, not the log level of the message.

If I understood correctly, @jori-nordic used "defensive" programming, by changing the timeout to K_NO_WAIT when sys work queue was used. So problem was "patched" and user can have surprising results since timeout is changed, in certain situation, behind the scene.

I think enabling debug level for the module could help then to identify, at runtime, all the situation where the timeout is changed and so fix the problem properly.

This second phase can be very time consuming and so I guess can be addressed in a future PR, but changing the log message level from debug to warning would imply to have these fixes before next release, which I believe is unpractical.

So, my proposal, is to switch back to debug level the message. Does this make sense to you?

@JordanYates
Copy link
Collaborator

I think enabling debug level for the module could help then to identify, at runtime, all the situation where the timeout is changed and so fix the problem properly.

The problem with the original change is that it breaks a pretty fundamental assumption that net_buf_alloc(K_FOREVER) will only return when a buffer has been claimed. Determining that is not the case is a frustrating exercise. I am extremely skeptical that anyone is going to turn up CONFIG_NET_BUF_LOG_LEVEL when encountering this issue, because if they knew the problem was in the net buffer allocation code they would already know the issue and solution.

This second phase can be very time consuming and so I guess can be addressed in a future PR, but changing the log message level from debug to warning would imply to have these fixes before next release, which I believe is unpractical.

This is not a problem with the log level, If net_buf_alloc(K_FOREVER) is being called from the system workqueue, it NEEDS to be fixed before the release.

So, my proposal, is to switch back to debug level the message. Does this make sense to you?

No, that only hides the problem. I would agree to only printing the level at LOG_LEVEL_WRN if the timeout parameter is K_FOREVER, as some other timeout implies the caller is aware the function may return NULL.

@msmttchr
Copy link
Contributor

The problem with the original change is that it breaks a pretty fundamental assumption that net_buf_alloc(K_FOREVER) will only return when a buffer has been claimed. Determining that is not the case

Indeed, this is quite frustrating, and I submitted this point a while ago (#72699).

To my surprise, what you call a "fundamental assumption" was not respected also prior to @jori-nordic changes. See note on net_buf_alloc documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Enhancement Changes/Updates/Additions to existing features
Projects
None yet
Development

No branches or pull requests

6 participants