Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure to call connection callbacks in synchronous failure #16509

Closed

Conversation

mspang
Copy link
Contributor

@mspang mspang commented Mar 21, 2022

Problem

If OperationalDeviceProxy::Connect fails synchronously it does not call
pending failure callbacks except for the argument to the current call.

This leads to unanswered connection requests as FindOrEstablishSession
immediately calls ReleaseSession() after such a failure, which deletes
the proxy.

Change overview

Make sure to call pending callbacks on synchronous failure.

Testing

Connect to an non-routable address, which fails synchronously in sendmsg()
in the resolve callback. Application is still notified of failure.

If OperationalDeviceProxy::Connect fails synchronously it does not call
pending failure callbacks except for the argument to the current call.

This leads to unanswered connection requests as FindOrEstablishSession
immediately calls ReleaseSession() after such a failure, which deletes
the proxy.
@github-actions
Copy link

github-actions bot commented Mar 22, 2022

PR #16509: Size comparison from a72ab9b to 1d0d4e0

Increases (16 builds for cyw30739, efr32, esp32, k32w, linux, nrfconnect, p6, telink)
platform target config section a72ab9b 1d0d4e0 change % change
cyw30739 ota-requestor-no-progress-logging cyw930739m2evb_01 (read/write) 571494 571518 24 0.0
.app_xip_area 470472 470496 24 0.0
efr32 lighting-app BRD4161A (read only) 922008 922024 16 0.0
.text 922000 922016 16 0.0
BRD4161A+rpc (read only) 950820 950836 16 0.0
.text 950812 950828 16 0.0
esp32 all-clusters-app c3devkit (read only) 962010 962034 24 0.0
.flash.text 962010 962034 24 0.0
m5stack (read only) 1018143 1018171 28 0.0
.flash.text 1012759 1012787 28 0.0
k32w light k32w061+release (read/write) 701092 701108 16 0.0
.text 615768 615784 16 0.0
linux all-clusters-app debug (read only) 2477281 2477345 64 0.0
.text 2101826 2101890 64 0.0
chip-tool debug (read only) 10145893 10145957 64 0.0
.text 8845477 8845541 64 0.0
chip-tool-ipv6only arm64 (read only) 9769404 9769468 64 0.0
.text 8224436 8224500 64 0.0
ota-requestor-app debug (read only) 1969001 1969065 64 0.0
.text 1654034 1654098 64 0.0
shell debug (read only) 2426625 2426689 64 0.0
.text 2060498 2060562 64 0.0
thermostat-no-ble arm64 (read only) 2262500 2262564 64 0.0
.text 1899152 1899216 64 0.0
tv-app debug (read only) 2677657 2677721 64 0.0
.text 2295394 2295458 64 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 (read/write) 1155923 1155939 16 0.0
text 788084 788108 24 0.0
p6 all-clusters-app default (read/write) 2492976 2493008 32 0.0
.text 1451240 1451272 32 0.0
telink lighting-app tlsr9518adk80d (read/write) 895830 895854 24 0.0
text 633404 633428 24 0.0
Full report (27 builds for cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
platform target config section a72ab9b 1d0d4e0 change % change
cyw30739 light cyw930739m2evb_01 (read/write) 603870 603870 0 0.0
.app_xip_area 510968 510968 0 0.0
.bss 75656 75656 0 0.0
.data 596 596 0 0.0
.rodata 0 0 0 0.0
.text 0 0 0 0.0
lock cyw930739m2evb_01 (read/write) 561658 561658 0 0.0
.app_xip_area 470284 470284 0 0.0
.bss 74160 74160 0 0.0
.data 560 560 0 0.0
.rodata 0 0 0 0.0
.text 0 0 0 0.0
ota-requestor-no-progress-logging cyw930739m2evb_01 (read/write) 571494 571518 24 0.0
.app_xip_area 470472 470496 24 0.0
.bss 83488 83488 0 0.0
.data 500 500 0 0.0
.rodata 0 0 0 0.0
.text 112 112 0 0.0
efr32 lighting-app BRD4161A (read only) 922008 922024 16 0.0
(read/write) 128760 128760 0 0.0
.bss 126768 126768 0 0.0
.data 1988 1988 0 0.0
.text 922000 922016 16 0.0
BRD4161A+rpc (read only) 950820 950836 16 0.0
(read/write) 144712 144712 0 0.0
.bss 142544 142544 0 0.0
.data 2168 2168 0 0.0
.text 950812 950828 16 0.0
window-app BRD4161A (read only) 852304 852304 0 0.0
(read/write) 126712 126712 0 0.0
.bss 124848 124848 0 0.0
.data 1864 1864 0 0.0
.text 852296 852296 0 0.0
esp32 all-clusters-app c3devkit (read only) 962010 962034 24 0.0
(read/write) 1393306 1393306 0 0.0
.dram0.bss 62048 62048 0 0.0
.dram0.data 14188 14188 0 0.0
.flash.rodata 198160 198160 0 0.0
.flash.text 962010 962034 24 0.0
.iram0.text 62016 62016 0 0.0
m5stack (read only) 1018143 1018171 28 0.0
(read/write) 461148 461148 0 0.0
.dram0.bss 67576 67576 0 0.0
.dram0.data 34016 34016 0 0.0
.flash.rodata 227720 227720 0 0.0
.flash.text 1012759 1012787 28 0.0
.iram0.text 123107 123107 0 0.0
k32w light k32w061+release (read/write) 701092 701108 16 0.0
.bss 77656 77656 0 0.0
.data 1868 1868 0 0.0
.text 615768 615784 16 0.0
lock k32w061+release (read/write) 700964 700964 0 0.0
.bss 77624 77624 0 0.0
.data 1908 1908 0 0.0
.text 615632 615632 0 0.0
linux all-clusters-app debug (read only) 2477281 2477345 64 0.0
(read/write) 143152 143152 0 0.0
.bss 57312 57312 0 0.0
.data 1264 1264 0 0.0
.data.rel.ro 78840 78840 0 0.0
.dynamic 592 592 0 0.0
.got 4168 4168 0 0.0
.init 27 27 0 0.0
.init_array 936 936 0 0.0
.rodata 212005 212005 0 0.0
.text 2101826 2101890 64 0.0
bridge-app debug+rpc (read only) 1752637 1752637 0 0.0
(read/write) 89424 89424 0 0.0
.bss 44456 44456 0 0.0
.data 1952 1952 0 0.0
.data.rel.ro 37944 37944 0 0.0
.dynamic 592 592 0 0.0
.got 3920 3920 0 0.0
.init 27 27 0 0.0
.init_array 544 544 0 0.0
.rodata 144684 144684 0 0.0
.text 1492741 1492741 0 0.0
chip-tool debug (read only) 10145893 10145957 64 0.0
(read/write) 354848 354848 0 0.0
.bss 22304 22304 0 0.0
.data 1072 1072 0 0.0
.data.rel.ro 325408 325408 0 0.0
.dynamic 608 608 0 0.0
.got 4784 4784 0 0.0
.init 27 27 0 0.0
.init_array 632 632 0 0.0
.rodata 517813 517813 0 0.0
.text 8845477 8845541 64 0.0
chip-tool-ipv6only arm64 (read only) 9769404 9769468 64 0.0
(read/write) 472689 472689 0 0.0
.bss 40609 40609 0 0.0
.data 1128 1128 0 0.0
.data.rel.ro 371824 371824 0 0.0
.dynamic 560 560 0 0.0
.got 55328 55328 0 0.0
.init 24 24 0 0.0
.init_array 192 192 0 0.0
.rodata 493604 493604 0 0.0
.text 8224436 8224500 64 0.0
door-lock-app debug (read only) 2004473 2004473 0 0.0
(read/write) 116832 116832 0 0.0
.bss 47584 47584 0 0.0
.data 992 992 0 0.0
.data.rel.ro 62856 62856 0 0.0
.dynamic 592 592 0 0.0
.got 4120 4120 0 0.0
.init 27 27 0 0.0
.init_array 664 664 0 0.0
.rodata 181180 181180 0 0.0
.text 1673634 1673634 0 0.0
lighting-app debug+rpc (read only) 2178137 2178137 0 0.0
(read/write) 123792 123792 0 0.0
.bss 48864 48864 0 0.0
.data 1472 1472 0 0.0
.data.rel.ro 67928 67928 0 0.0
.dynamic 608 608 0 0.0
.got 4168 4168 0 0.0
.init 27 27 0 0.0
.init_array 744 744 0 0.0
.rodata 175484 175484 0 0.0
.text 1843634 1843634 0 0.0
ota-provider-app debug (read only) 1938833 1938833 0 0.0
(read/write) 112624 112624 0 0.0
.bss 47488 47488 0 0.0
.data 1288 1288 0 0.0
.data.rel.ro 58200 58200 0 0.0
.dynamic 608 608 0 0.0
.got 4376 4376 0 0.0
.init 27 27 0 0.0
.init_array 616 616 0 0.0
.rodata 166571 166571 0 0.0
.text 1620498 1620498 0 0.0
ota-requestor-app debug (read only) 1969001 1969065 64 0.0
(read/write) 115592 115592 0 0.0
.bss 48480 48480 0 0.0
.data 1416 1416 0 0.0
.data.rel.ro 60264 60264 0 0.0
.dynamic 592 592 0 0.0
.got 4168 4168 0 0.0
.init 27 27 0 0.0
.init_array 640 640 0 0.0
.rodata 162980 162980 0 0.0
.text 1654034 1654098 64 0.0
shell debug (read only) 2426625 2426689 64 0.0
(read/write) 147208 147208 0 0.0
.bss 67240 67240 0 0.0
.data 784 784 0 0.0
.data.rel.ro 73472 73472 0 0.0
.dynamic 592 592 0 0.0
.got 4152 4152 0 0.0
.init 27 27 0 0.0
.init_array 920 920 0 0.0
.rodata 209426 209426 0 0.0
.text 2060498 2060562 64 0.0
thermostat-no-ble arm64 (read only) 2262500 2262564 64 0.0
(read/write) 148273 148273 0 0.0
.bss 62753 62753 0 0.0
.data 1024 1024 0 0.0
.data.rel.ro 77000 77000 0 0.0
.dynamic 560 560 0 0.0
.got 4480 4480 0 0.0
.init 24 24 0 0.0
.init_array 360 360 0 0.0
.rodata 140036 140036 0 0.0
.text 1899152 1899216 64 0.0
tv-app debug (read only) 2677657 2677721 64 0.0
(read/write) 247712 247712 0 0.0
.bss 164384 164384 0 0.0
.data 3104 3104 0 0.0
.data.rel.ro 74168 74168 0 0.0
.dynamic 592 592 0 0.0
.got 4552 4552 0 0.0
.init 27 27 0 0.0
.init_array 888 888 0 0.0
.rodata 207413 207413 0 0.0
.text 2295394 2295458 64 0.0
mbed lock-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2353452 2353452 0 0.0
.bss 184652 184652 0 0.0
.data 5752 5752 0 0.0
.text 1316052 1316052 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 (read/write) 1155923 1155939 16 0.0
bss 146716 146716 0 0.0
rodata 145956 145956 0 0.0
text 788084 788108 24 0.0
p6 all-clusters-app default (read/write) 2492976 2493008 32 0.0
.bss 118072 118072 0 0.0
.data 2632 2632 0 0.0
.text 1451240 1451272 32 0.0
light-app default (read/write) 2396376 2396376 0 0.0
.bss 111544 111544 0 0.0
.data 2488 2488 0 0.0
.text 1354640 1354640 0 0.0
lock-app default (read/write) 2359928 2359928 0 0.0
.bss 111288 111288 0 0.0
.data 2448 2448 0 0.0
.text 1318192 1318192 0 0.0
telink lighting-app tlsr9518adk80d (read/write) 895830 895854 24 0.0
bss 87424 87424 0 0.0
noinit 37160 37160 0 0.0
text 633404 633428 24 0.0

@@ -98,6 +98,10 @@ CHIP_ERROR OperationalDeviceProxy::Connect(Callback::Callback<OnDeviceConnected>
if (err != CHIP_NO_ERROR && onFailure != nullptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if onFailure is null we won't call the other callbacks?

@@ -98,6 +98,10 @@ CHIP_ERROR OperationalDeviceProxy::Connect(Callback::Callback<OnDeviceConnected>
if (err != CHIP_NO_ERROR && onFailure != nullptr)
{
onFailure->mCall(onFailure->mContext, mPeerId, err);

DequeueConnectionSuccessCallbacks(/* executeCallback */ false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So trying to understand the various cases here.

If we enter this function when state is State::Uninitialized will we have existing callbacks? I'd think not...

If we enter this function when state is State::NeedsAddress it actually looks like the existing logic (with or without this change) is buggy. We'll call LookupPeerAddress, enqueue the new callbacks, and then if the LookupPeerAddress call failed call the error callback but not dequeue it... which means we will later call it again.

If we enter this function when state is State::Initialized we will try to EstablishConnection... is this the part that was observed to sync-fail? But would we ever be in the Initialized state with pending callbacks? I guess we could get here if we had an earlier UpdateDeviceData call after DNS-SD resolve where EstablishConnection failed, OnSessionEstablishmentError was called but is a no-op (which it is), and we were left in the Initialized state with dangling callbacks? But then the right answer would seem to be to deque and call the callbacks at that point, so they're not left sitting there until someone else tries to connect and we sync-fail.

So overall, I think we should:

  1. Fix OperationalDeviceProxy::UpdateDeviceData to handle the sync-failure from EstablishConnection there correctly. Either by calling callbacks, or by calling Connect(nullptr, nullptr) after moving to the new state. And nix the bogus OnSessionEstablishmentError call (and probably the inheritance from SessionEstablishmentDelegate.
  2. Fix this code to be consistent in its error handling. Probably most simply by ensuring that all branches of the swich enqueue the callbacks and set err as needed and after the switch if err is failure we invoke all of our queued callbacks. That would still cover up bugs like the one item 1 is fixing, but hopefully we won't have any at that point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this needs more work, this was mostly intended as a fire extinguishing exercise as this particular case was reliably causing issues.

However other changes have mitigated that particular issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes proposed above were made in #17266.


DequeueConnectionSuccessCallbacks(/* executeCallback */ false);
DequeueConnectionFailureCallbacks(err, /* executeCallback */ true);
CloseCASESession();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really live inside OperationalDeviceProxy::EstablishConnection in the case when EstablishSession returns error, because there it's clear we need to clean up mCASEClient.... Putting it here is a little odd.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#17466 addresses this.

@mspang
Copy link
Contributor Author

mspang commented Mar 28, 2022

Closing this. Unfortunately, I don't have the cycles to own a proper cleanup pass of this code.

@mspang mspang closed this Mar 28, 2022
@mspang mspang deleted the for-chip/fix-connection-cb branch April 12, 2022 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants