Bluetooth: host: Remove cancel sync from database hash commit #35403

joerchan · 2021-05-18T14:57:22Z

Fix deadlock when db_hash_commit has to wait for the delayed work to
finish. This creates a deadlock if the delayed work for database hash
calculation needs to store the hash since the settings API is locked
when calling the commit callback.
Remove call to k_work_cancel_delayable_sync from db_hash_commit in order
to avoid the deadlock. Instead move comparing of the stored hash to the
delayed work and reschedule the work with no wait.
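
The lock ordering described above can be modelled in plain C. This is a minimal single-threaded sketch, not the code in gatt.c: every `model_*` name is hypothetical, the settings lock is a boolean, and a save attempted while the lock is held returns -1 at the point where the real call would block forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these names are Zephyr APIs. */
static bool settings_locked;        /* models the settings lock */
static void (*pending_work)(void);  /* models one queued work item */
static int last_save_rc;            /* result of the most recent save attempt */
static int save_calls;              /* number of saves that went through */

/* Models storing the hash: fails where the real call would block forever. */
static int model_settings_save(void)
{
	if (settings_locked) {
		return -1;
	}
	save_calls++;
	return 0;
}

/* Models the delayed work handler, which may need to store a new hash. */
static void model_db_hash_work(void)
{
	last_save_rc = model_settings_save();
}

/* Old behaviour: waiting for the handler means it must finish under the lock. */
static int model_commit_sync(void)
{
	model_db_hash_work();
	return last_save_rc;
}

/* New behaviour: only queue the work and return immediately. */
static int model_commit_resched(void)
{
	pending_work = model_db_hash_work;
	return 0;
}

/* Models settings load: the commit callback runs with the lock held. */
static int model_settings_load(int (*commit)(void))
{
	assert(!settings_locked);
	settings_locked = true;
	int rc = commit();
	settings_locked = false;

	if (pending_work != NULL) {
		void (*work)(void) = pending_work;

		pending_work = NULL;
		work(); /* the work queue runs after the lock is released */
		rc = last_save_rc;
	}
	return rc;
}
```

Calling `model_settings_load(model_commit_sync)` reports the blocked save, while `model_settings_load(model_commit_resched)` lets the save run once the lock has been dropped.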

Fixes: #35058

pabigot

The settings API's use of "load", "set", and "commit" seems to be non-intuitive. I don't understand what this code does.

But that's irrelevant. This appears to eliminate the deadlock. But it may violate user expectations that settings_commit() will block until the settings have actually been committed (whatever that means). The effect of the commit will be delayed until the work item is completed, no?

From the perspective of eliminating the deadlock this seems fine. I can't speak to whether the resulting behavior is acceptable.

I do believe there's a data race involving stored_hash but that wouldn't be new.

pabigot · 2021-05-19T10:45:37Z

subsys/bluetooth/host/gatt.c

Do the semantics of settings_commit allow this? The commit will not have completed by the time the call returns because it doesn't wait. Unless setting the LOAD flag ensures that any operation that depends on the commit will detect the incomplete operation and complete it before the work handler does.

If that makes any sense and is also correct perhaps a comment could be added.

The settings API's use of "load", "set", and "commit" seems to be non-intuitive. I don't understand what this code does.

You get a "set" for each value stored, and then "commit" once all values have been loaded, i.e. no more "set" callbacks.

The value has been "committed" as we have the stored_hash value. The next thing we wish to do is to compare it against the calculated hash value, and store the new value if they are different. That doesn't have to be part of the commit here.

So we can push this to the db_hash work, since this was already the work responsible for the calculation.
As long as peers that read the hash characteristic value receive the correct value then this works as expected.
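
A rough sketch of that split, assuming a toy XOR fold in place of the real hash calculation. All `sketch_*` names, `SKETCH_HASH_SIZE`, and the `stores` counter are hypothetical and only illustrate the control flow: "set" records the stored value, "commit" marks it as loaded, and the work handler does the comparison and the store.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_SIZE 16 /* hypothetical name, not a macro from gatt.c */

static uint8_t hash[SKETCH_HASH_SIZE];        /* hash calculated at runtime */
static uint8_t stored_hash[SKETCH_HASH_SIZE]; /* hash loaded from settings */
static bool load_pending;                     /* set by commit, consumed by the work */
static int stores;                            /* counts writes back to storage */

/* "set": called once per stored value while settings are being loaded. */
static int sketch_hash_set(const uint8_t *val, size_t len)
{
	if (len != sizeof(stored_hash)) {
		return -1;
	}
	memcpy(stored_hash, val, len);
	return 0;
}

/* "commit": all values are loaded; defer the comparison to the work item. */
static void sketch_hash_commit(void)
{
	load_pending = true;
	/* The real handler would reschedule the delayed work with no wait here. */
}

/* Toy stand-in for the real hash calculation: XOR-fold the database bytes. */
static void sketch_hash_gen(const uint8_t *db, size_t len)
{
	memset(hash, 0, sizeof(hash));
	for (size_t i = 0; i < len; i++) {
		hash[i % SKETCH_HASH_SIZE] ^= db[i];
	}
}

/* Work handler: calculate, compare, store only on mismatch.
 * Returns true when the hash had to be stored.
 */
static bool sketch_hash_work(const uint8_t *db, size_t len)
{
	sketch_hash_gen(db, len);
	if (load_pending) {
		load_pending = false;
		if (memcmp(stored_hash, hash, sizeof(hash)) == 0) {
			return false; /* database unchanged since the last store */
		}
	}
	memcpy(stored_hash, hash, sizeof(hash));
	stores++;
	return true;
}
```

In this model the commit step never touches storage, so nothing in it needs the settings lock.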

Can you elaborate on the data race on stored_hash?

Can you elaborate on the data race on stored_hash?

I don't see anything that prevents one thread from accessing it via db_hash_set() while the work thread is accessing it via db_hash_process(). Only a problem when preemptable threads are involved or SMP is present.

They should be strictly sequential, db_hash_set is only called before db_hash_commit, and db_hash_work would only read the stored_hash when it has been submitted from db_hash_commit due to the flag set.

It's still a data race if a second set/commit occurs in a thread that preempts the work thread. Unlikely, but not impossible, unless all set/commits come from something that runs on the same work queue.

Settings load should only happen once.
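
The "strictly sequential" argument amounts to a flag handoff. A minimal sketch with C11 atomics, assuming hypothetical `sketch_*` names (the Zephyr code uses its own atomic bit helpers rather than `<stdatomic.h>`): the writer fills the buffer and then sets the flag with release ordering, and the reader touches the buffer only after it has claimed the flag with acquire ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_SIZE 16 /* hypothetical name */

static uint8_t stored_hash[SKETCH_HASH_SIZE];
static atomic_bool load_flag; /* zero-initialised, so it starts out false */

/* Writer side (set followed by commit): fill the buffer, then publish it. */
static void sketch_publish(const uint8_t *val)
{
	memcpy(stored_hash, val, SKETCH_HASH_SIZE);
	atomic_store_explicit(&load_flag, true, memory_order_release);
}

/* Reader side (work handler): read the buffer only if the flag was set. */
static bool sketch_consume(uint8_t *out)
{
	if (!atomic_exchange_explicit(&load_flag, false, memory_order_acquire)) {
		return false;
	}
	memcpy(out, stored_hash, SKETCH_HASH_SIZE);
	return true;
}
```

This orders a single publish before the read. It does not cover the case raised above, where a second `sketch_publish()` preempts the reader during its `memcpy()`; that case is excluded only by the assumption that settings load happens once.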

pabigot · 2021-05-19T11:18:58Z

subsys/bluetooth/host/gatt.c

Can the cancel fail? If so what happens?

This would now be from a work handler so shouldn't fail in being canceled, right?

EDIT:
If it would fail then we would send out a service changed indication to all connected peers. That would be unnecessary, as they would have to rediscover a database that hasn't changed. But having peers connected this early in the startup phase is very unlikely.

If it can fail, and we need to do something, we should check the return value and do whatever.

If you don't think we need to do anything, I think we should cast the return value to (void) to make it explicit.

Yes, preemption of the work thread and SMP could both create a situation where the cancel could fail.

https://docs.zephyrproject.org/latest/reference/kernel/threads/workqueue.html#check-return-values shows recommended documentary practices. This comment provides more background on the recommendation and why a cast to (void) is not useful by itself.
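
One way to make the handling of the return value explicit, sketched in plain C. `sketch_cancel()` is a hypothetical stand-in for a cancel call that reports a nonzero busy state when the handler is already running; it is not the Zephyr function, and the counter exists only so the outcome can be observed.

```c
#include <stdbool.h>

static int spurious_indications; /* counts indications that could not be stopped */

/* Hypothetical stand-in: returns nonzero if the work item is still busy. */
static int sketch_cancel(bool handler_running)
{
	return handler_running ? 1 : 0;
}

/* Called when the stored hash matches the calculated one. */
static void sketch_on_hash_match(bool handler_running)
{
	int busy = sketch_cancel(handler_running);

	if (busy != 0) {
		/* The cancel lost the race with the handler. The indication
		 * still goes out and peers rediscover an unchanged database:
		 * redundant but harmless, so it is only recorded here.
		 */
		spurious_indications++;
	}
}
```

Capturing the result and commenting on the failure branch records why it is safe to carry on, which a bare `(void)` cast does not.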

Fix deadlock when db_hash_commit has to wait for the delayed work to finish. This creates a deadlock if the delayed work for database hash calculation needs to store the hash since the settings API is locked when calling the commit callback.

Remove call to k_work_cancel_delayable_sync from db_hash_commit in order to avoid the deadlock. Instead move comparing of the stored hash to the delayed work and reschedule the work with no wait.

Signed-off-by: Joakim Andersson <joakim.andersson@nordicsemi.no>

Thalley · 2021-05-21T11:06:10Z

subsys/bluetooth/host/gatt.c

 	uint8_t hash[16];
+#if defined(CONFIG_BT_SETTINGS)
+	 uint8_t stored_hash[16];


We don't have a GATT_DB_HASH_SIZE or similar macro?

carlescufi · 2021-05-21T12:31:01Z

@mniestroj could you please check if this solves the issue?

joerchan · 2021-05-21T13:33:33Z

mniestroj could you please check if this solves the issue?

@carlescufi FYI: We ran into the same issue here as well eventually, and @MarekPieta has confirmed that this fixed the issue for us.

mniestroj

I confirm it fixes #35058.

github-actions bot added the area: Bluetooth and area: Bluetooth Host labels May 18, 2021

joerchan mentioned this pull request May 18, 2021

Bluetooth: deadlock when canceling db_hash.work from settings commit handler #35058

Closed

joerchan force-pushed the bt-fix-hash-settings-deadlock branch from 0707a37 to 927d0cb Compare May 19, 2021 07:22

joerchan marked this pull request as ready for review May 19, 2021 07:23

joerchan requested review from Vudentz and jhedberg as code owners May 19, 2021 07:23

joerchan requested a review from pabigot May 19, 2021 07:23

joerchan force-pushed the bt-fix-hash-settings-deadlock branch from 927d0cb to be16bbc Compare May 19, 2021 07:37

joerchan added the bug The issue is a bug, or the PR is fixing a bug label May 19, 2021

pabigot approved these changes May 19, 2021

zephyrbot requested review from Thalley and asbjornsabo May 20, 2021 12:18

zephyrbot assigned jhedberg May 20, 2021

joerchan force-pushed the bt-fix-hash-settings-deadlock branch from be16bbc to 1d498db Compare May 21, 2021 10:05

Thalley reviewed May 21, 2021

carlescufi approved these changes May 21, 2021

mniestroj approved these changes May 22, 2021

nashif merged commit 7986cae into zephyrproject-rtos:main May 25, 2021

joerchan deleted the bt-fix-hash-settings-deadlock branch May 25, 2021 07:21

joerchan mentioned this pull request May 26, 2021

Upmerge 12.05.2021 nrfconnect/sdk-nrf#4527

Merged
