[Darwin] MTRBaseSubscriptionCallback OnDone callback being called async can lead to crashes #22935

Closed
jtung-apple opened this issue Sep 28, 2022 · 0 comments · Fixed by #22978
jtung-apple commented Sep 28, 2022

Reproduction steps

1. ReadClient starts tearing down, for example when `OnLivenessTimeoutCallback` calls `Close`
2. While that is happening, the Darwin framework issues a read with an `MTRClusterStateCacheContainer` parameter on one of the `MTRBaseCluster` objects
3. The `MTRBaseCluster` does a `dispatch_sync` call to the Matter queue
4. ReadClient `Close` proceeds and calls `OnError` on the MTRBaseSubscriptionCallback object, which calls `ReportError`, which eventually does a `dispatch_async` to call the OnDone handler (the handler that cleans up the cache container's pointer to the C++ object). That handler is therefore scheduled after the BaseCluster's read.
5. ReadClient then calls MTRBaseSubscriptionCallback's OnDone, which deletes itself and returns
6. The BaseCluster then executes the read with the cache container, which still has a pointer to the now-defunct C++ cache, leading to a crash
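
The ordering in the steps above can be modelled with a small, self-contained C++ sketch. It is not the framework code: `SerialQueue`, `Cache`, `CacheContainer` and `ReadSeesStalePointer` are hypothetical names invented for illustration, and the Matter queue and the client queue are collapsed into a single FIFO list. The sketch only compares the pointer against a liveness check and never dereferences it, so it is safe to run.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a serial dispatch queue: jobs run in FIFO order.
struct SerialQueue {
    std::deque<std::function<void()>> jobs;
    void Async(std::function<void()> job) { jobs.push_back(std::move(job)); }
    void Drain() {
        while (!jobs.empty()) {
            auto job = std::move(jobs.front());
            jobs.pop_front();
            job();
        }
    }
};

// Hypothetical stand-in for the C++ cache owned by the subscription callback.
struct Cache { int value = 0; };

// Hypothetical stand-in for the container holding a non-owning cache pointer.
struct CacheContainer { Cache * cppCache; };

// Returns true if the queued read observes a non-null pointer to a cache that
// has already been destroyed (the crash condition described in step 6).
bool ReadSeesStalePointer(bool callOnDoneHandlerInline) {
    SerialQueue queue;
    auto cache = std::make_unique<Cache>();
    CacheContainer container;
    container.cppCache = cache.get();
    bool stale = false;

    // Steps 2-3: the read is queued before teardown reaches the OnDone handler.
    queue.Async([&] {
        stale = (container.cppCache != nullptr) && (cache == nullptr);
    });

    // Step 4: the OnDone handler clears the container's pointer, either
    // inline or queued behind the read.
    auto onDoneHandler = [&] { container.cppCache = nullptr; };
    if (callOnDoneHandlerInline) {
        onDoneHandler();
    } else {
        queue.Async(onDoneHandler);
    }

    // Step 5: the callback deletes itself, taking the cache with it.
    cache.reset();

    // Step 6: the queued work finally runs.
    queue.Drain();
    return stale;
}
```

In this model, `ReadSeesStalePointer(false)` reports the stale pointer because the cleanup is queued behind the read, while `ReadSeesStalePointer(true)` does not, because the pointer is cleared before the cache is destroyed.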

Platform

darwin

Platform Version(s)

No response

Type

Manually tested with SDK

(Optional) If manually tested please explain why this is only manually tested

No existing test covers this specific, probably timing-related scenario; Darwin platform crash tracking caught this crash.

Anything else?

The simple fix is to require (and document in the header) that SubscriptionCallback's OnDone be called inline. This works because the OnDone handler in MTRBaseDevice today does only one job: clearing the cache container's pointer to the C++ object.

I will file another issue to fix this in a more consistent way. One option is to change MTRBaseSubscriptionCallback to always call everything inline, and leave the "call client callback on queue" step as an exercise for the MTRBaseDevice / MTRDevice objects.
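
As a rough illustration of that option, the following C++ sketch shows a base callback that always invokes its handler inline, with the consumer doing its synchronous cleanup first and only then queueing the client notification. All names here (`ClientQueue`, `InlineSubscriptionCallback`, `RunConsumerTeardown`) are hypothetical and do not correspond to the real Darwin framework classes; the client queue is modelled as a plain FIFO list.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the client's dispatch queue.
struct ClientQueue {
    std::deque<std::function<void()>> jobs;
    void Async(std::function<void()> job) { jobs.push_back(std::move(job)); }
    void Drain() {
        while (!jobs.empty()) {
            auto job = std::move(jobs.front());
            jobs.pop_front();
            job();
        }
    }
};

// Hypothetical base callback: it never queues, it calls the handler inline.
class InlineSubscriptionCallback {
public:
    explicit InlineSubscriptionCallback(std::function<void()> onDone)
        : mOnDone(std::move(onDone)) {}
    void OnDone() {
        if (mOnDone) {
            mOnDone();
        }
    }

private:
    std::function<void()> mOnDone;
};

// Hypothetical consumer: does its sync cleanup inside the handler, then
// decides for itself what to hand off to the client queue.
std::vector<std::string> RunConsumerTeardown() {
    std::vector<std::string> log;
    ClientQueue clientQueue;

    InlineSubscriptionCallback callback([&] {
        log.push_back("sync: clear cache pointer");
        clientQueue.Async([&] { log.push_back("async: notify client"); });
    });

    callback.OnDone();                 // runs the handler inline
    log.push_back("callback deleted"); // teardown continues afterwards
    clientQueue.Drain();               // client notification runs later
    return log;
}
```

The design point is that the cleanup is ordered before the callback's deletion by construction, while the client-facing notification remains asynchronous.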

@jtung-apple jtung-apple self-assigned this Sep 28, 2022
@jtung-apple jtung-apple changed the title from "[Darwin] SubscriptionCallback OnDone callback being called async can lead to crashes" to "[Darwin] MTRBaseSubscriptionCallback OnDone callback being called async can lead to crashes" Sep 28, 2022
jtung-apple added a commit to jtung-apple/connectedhomeip that referenced this issue Sep 30, 2022
…callback being called async can lead to crashes
jtung-apple added a commit that referenced this issue Oct 3, 2022
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Oct 7, 2022
project-chip#22978 accidentally reintroduced the crash that
project-chip#22324 had fixed. To avoid more issues along these lines:

1) Add unit tests that reproduce the crashes described in
   project-chip#22320 (with the changes from project-chip#22978) and
   project-chip#22935 (without those changes).
2) Change MTRBaseSubscriptionCallback to always invoke its callbacks
   synchronously, on the Matter queue, so that we can clean up the
   MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it
   gets deleted on the Matter queue.
3) Move the queueing of callbacks to the client queue into the consumers of
   MTRBaseSubscriptionCallback, so they can do whatever sync work they need
   (like the above cleanup) before going async.
4) Update documentation.
andy31415 pushed a commit that referenced this issue Oct 11, 2022
selissia pushed a commit to selissia/connectedhomeip that referenced this issue Oct 12, 2022
…hip#23076)
