Bluetooth: host: Remove cancel sync from database hash commit #35403
Conversation
pabigot
left a comment
The settings API's use of "load", "set", and "commit" seems non-intuitive. I don't understand what this code does.
But that's irrelevant. This appears to eliminate the deadlock. But it may violate user expectations that settings_commit() will block until the settings have actually been committed (whatever that means). The effect of the commit will be delayed until the work item is completed, no?
From the perspective of eliminating the deadlock this seems fine. I can't speak to whether the resulting behavior is acceptable.
I do believe there's a data race involving stored_hash, but that wouldn't be new.
subsys/bluetooth/host/gatt.c
Outdated
Do the semantics of settings_commit allow this? The commit will not have completed by the time the call returns, because it doesn't wait. Unless setting the LOAD flag ensures that any operation that depends on the commit will detect the incomplete operation and complete it before the work handler does.
If that makes any sense and is also correct perhaps a comment could be added.
The settings API's use of "load", "set", and "commit" seem to be non-intuitive. I don't understand what this code does.
You get a "set" for each value stored, and then a "commit" once all values have been loaded, i.e. no more "set" callbacks.
The value has been "committed" once we have the stored_hash value. The next thing we wish to do is compare it against the calculated hash value, and store the new value if they are different. That doesn't have to be part of the commit here.
So we can push this to the db_hash work, since that work was already responsible for the calculation.
As long as peers that read the hash characteristic value receive the correct value then this works as expected.
Can you elaborate on the data race on stored_hash?
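The set-then-commit ordering described above can be sketched as a plain-C stand-in (these names and the fake loader are illustrative, not the real Zephyr settings API): every stored value triggers a "set" callback, and "commit" fires exactly once after all values have been delivered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static unsigned char stored_hash[16];
static bool have_stored_hash;
static bool committed;

/* "set": invoked once per stored key/value pair during settings load. */
static void db_hash_set(const unsigned char *val, size_t len)
{
	size_t n = len < sizeof(stored_hash) ? len : sizeof(stored_hash);

	memcpy(stored_hash, val, n);
	have_stored_hash = true;
}

/* "commit": invoked once, after the final "set"; stored_hash is now
 * complete and may safely be compared against the calculated hash. */
static void db_hash_commit(void)
{
	committed = true;
}

/* Fake loader driving the sequence: all "set"s, then one "commit". */
static void settings_load_sim(void)
{
	unsigned char persisted[16] = { 0xAA };

	db_hash_set(persisted, sizeof(persisted));
	db_hash_commit();
}
```

The point of the ordering is that by the time "commit" runs, the handler can rely on every persisted value having been delivered.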
Can you elaborate on the data race on stored_hash?
I don't see anything that prevents one thread from accessing it via db_hash_set() while the work thread is accessing it via db_hash_process(). That's only a problem when preemptible threads are involved or SMP is present.
They should be strictly sequential: db_hash_set is only called before db_hash_commit, and db_hash_work only reads stored_hash after it has been submitted from db_hash_commit with the flag set.
It's still a data race if a second set/commit occurs in a thread that preempts the work thread. Unlikely, but not impossible, unless all set/commits come from something that runs on the same work queue.
Settings load should only happen once.
subsys/bluetooth/host/gatt.c
Outdated
Can the cancel fail? If so what happens?
This would now be called from a work handler, so the cancel shouldn't fail, right?
EDIT:
If it did fail, we would send out a service changed indication to all connected peers. That would be unnecessary, as they would have to rediscover a database that hasn't changed. But having peers connected this early in the startup phase is very unlikely.
If it can fail, and we need to do something, we should check the return value and do whatever.
If you don't think we need to do anything, I think we should cast the return value to (void) to make it explicit.
Yes, preemption of the work thread and SMP could both create a situation where the cancel could fail.
https://docs.zephyrproject.org/latest/reference/kernel/threads/workqueue.html#check-return-values shows recommended documentary practices. This comment provides more background on the recommendation and why a cast to (void) is not useful by itself.
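To make the recommendation concrete, here is a plain-C stand-in (the struct and function names are hypothetical, not the actual k_work API) in which a cancel can report failure and the caller checks the result instead of silently discarding it:

```c
#include <stdbool.h>

/* Stand-in for a cancellable delayed work item; in Zephyr the real
 * call would be k_work_cancel_delayable(), whose return value tells
 * the caller whether cancellation actually took effect. */
struct fake_dwork {
	bool pending;
	bool running;
};

/* Returns true when the cancel fully took effect. */
static bool cancel_dwork(struct fake_dwork *w)
{
	if (w->running) {
		/* Handler already executing: cannot be stopped, so the
		 * caller must be prepared for the work to complete. */
		return false;
	}
	w->pending = false;
	return true;
}

/* Caller inspecting the result rather than ignoring it. */
static bool cancel_or_fallback(struct fake_dwork *w)
{
	if (!cancel_dwork(w)) {
		/* Fallback path, e.g. tolerate the (unnecessary)
		 * service changed indication discussed above. */
		return false;
	}
	return true;
}
```

If the result genuinely needs no handling, a `(void)` cast plus a comment explaining why ignoring it is safe is the explicit alternative.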
Fix deadlock when db_hash_commit has to wait for the delayed work to finish. This creates a deadlock if the delayed work for database hash calculation needs to store the hash, since the settings API is locked when calling the commit callback.

Remove the call to k_work_cancel_delayable_sync from db_hash_commit in order to avoid the deadlock. Instead, move the comparison of the stored hash into the delayed work and reschedule the work with no wait.

Signed-off-by: Joakim Andersson <joakim.andersson@nordicsemi.no>
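The approach in the commit message can be sketched in plain C. This is a simplified stand-in, not the actual gatt.c code: the booleans model the atomic flag, k_work_reschedule(..., K_NO_WAIT), and settings_save_one() respectively.

```c
#include <stdbool.h>
#include <string.h>

static unsigned char stored_hash[16]; /* loaded from settings */
static unsigned char calc_hash[16];   /* computed over the database */
static bool load_flag;      /* "compare against stored hash" request */
static bool work_submitted; /* models k_work_reschedule(..., K_NO_WAIT) */
static bool hash_stored;    /* models settings_save_one() */

/* Commit callback: no blocking cancel, just flag and reschedule, so it
 * returns immediately while the settings lock is still held. */
static void db_hash_commit_sim(void)
{
	load_flag = true;
	work_submitted = true;
}

/* Delayed-work handler: now owns the compare-and-store step, which
 * runs outside the settings lock and therefore cannot deadlock. */
static void db_hash_process_sim(void)
{
	if (load_flag) {
		load_flag = false;
		if (memcmp(stored_hash, calc_hash, sizeof(calc_hash)) != 0) {
			memcpy(stored_hash, calc_hash, sizeof(calc_hash));
			hash_stored = true;
		}
	}
}
```

The trade-off discussed above is visible here: db_hash_commit_sim returns before the compare-and-store has actually happened.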
uint8_t hash[16];
#if defined(CONFIG_BT_SETTINGS)
uint8_t stored_hash[16];
We don't have a GATT_DB_HASH_SIZE or similar macro?
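A macro along the lines the reviewer suggests could look like this (the name comes from the comment but is otherwise hypothetical; the value is 16 because the GATT database hash is a 128-bit AES-CMAC):

```c
#include <stdint.h>

/* Hypothetical named constant replacing the repeated literal 16. */
#define GATT_DB_HASH_SIZE 16

static uint8_t hash[GATT_DB_HASH_SIZE];
static uint8_t stored_hash[GATT_DB_HASH_SIZE];
```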
@mniestroj could you please check if this solves the issue?
@carlescufi FYI: We ran into the same issue here as well eventually, and @MarekPieta has confirmed that this fixed the issue for us.
mniestroj
left a comment
I confirm it fixes #35058.
Fixes: #35058