Fix crash when receiving a message that wants an ack on a closed exchange that's still alive. #8068

bzbarsky-apple · 2021-07-01T16:48:23Z

The basic scenario, with all messages requesting acks, is:

1) A sends message to B.
2) B responds to A, piggybacks ack, closes exchange.
3) A responds to B, piggybacks ack.

When B receives the message, the exchange is still alive, waiting for
an ack. It gets the message handed to it, processes it, queues up an
ack for the message it just got. Then the stack unwinds, the
exchange's refcount drops to 0 (because it's no longer waiting for an
ack, so not referenced by the reliable message manager), and its
destructor tries to flush the pending ack, which fails assertions due
to the refcount being 0.

The fix is to always immediately send a standalone ack if we
have no delegate, because in that situation there won't be anything
app-level to piggyback on.

Problem

Crash on receiving a message. To reproduce easily:

scripts/examples/gn_build_example.sh examples/all-clusters-app/linux out/debug/standalone chip_config_network_layer_ble=false
scripts/examples/gn_build_example.sh examples/chip-tool out/debug/standalone

In terminal 1:

./out/debug/standalone/chip-all-clusters-app

In terminal 2:

./out/debug/standalone/chip-tool pairing onnetwork 0 20202021 3840 ::1 11097
./out/debug/standalone/chip-tool onoff report on-off 1 2 3

Change overview

Fix the crash by immediately flushing out the pending ack if we have no delegate (which includes when we're closed), so that we don't try to do it from our destructor.

Testing

Unit test added in the PR. Manually tested the steps above.


github-actions · 2021-07-01T19:06:10Z

Size increase report for "esp32-example-build" from 8551b50

File	Section	File size change (bytes)	VM size change (bytes)
chip-shell.elf	.flash.text	36	36
chip-all-clusters-app.elf	.flash.text	36	36
chip-temperature-measurement-app.elf	.flash.text	36	36
chip-lock-app.elf	.flash.text	36	36

Full report output

BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-pigweed-app.elf and ./pull_artifact/chip-pigweed-app.elf:

sections,vmsize,filesize

Comparing ./master_artifact/chip-shell.elf and ./pull_artifact/chip-shell.elf:

sections,vmsize,filesize
.debug_loc,0,166
.debug_line,0,102
.debug_info,0,62
.flash.text,36,36
.debug_ranges,0,16
.debug_str,0,2

Comparing ./master_artifact/chip-all-clusters-app.elf and ./pull_artifact/chip-all-clusters-app.elf:

sections,vmsize,filesize
.debug_loc,0,166
.debug_line,0,102
.debug_info,0,62
.flash.text,36,36
.debug_ranges,0,16
.debug_str,0,-2

Comparing ./master_artifact/chip-temperature-measurement-app.elf and ./pull_artifact/chip-temperature-measurement-app.elf:

sections,vmsize,filesize
.debug_line,0,84
.debug_info,0,66
.flash.text,36,36
.debug_loc,0,-46

Comparing ./master_artifact/chip-persistent-storage.elf and ./pull_artifact/chip-persistent-storage.elf:

sections,vmsize,filesize

Comparing ./master_artifact/chip-lock-app.elf and ./pull_artifact/chip-lock-app.elf:

sections,vmsize,filesize
.debug_loc,0,166
.debug_line,0,102
.debug_info,0,62
.flash.text,36,36
.debug_ranges,0,16
.debug_str,0,2

github-actions · 2021-07-01T19:07:02Z

Size increase report for "nrfconnect-example-build" from 8551b50

File	Section	File size change (bytes)	VM size change (bytes)
chip-shell.elf	text	36	36
chip-shell.elf	device_handles	-4	-4
chip-lock.elf	text	36	36
chip-lock.elf	device_handles	-4	-4

Full report output

BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-shell.elf and ./pull_artifact/chip-shell.elf:

sections,vmsize,filesize
.debug_info,0,60
.debug_loc,0,52
.debug_line,0,40
text,36,36
device_handles,-4,-4

Comparing ./master_artifact/chip-lock.elf and ./pull_artifact/chip-lock.elf:

sections,vmsize,filesize
.debug_info,0,60
.debug_loc,0,52
.debug_line,0,40
text,36,36
device_handles,-4,-4

github-actions · 2021-07-01T19:08:14Z

Size increase report for "gn_qpg6100-example-build" from 8551b50

File	Section	File size change (bytes)	VM size change (bytes)
chip-qpg6100-lighting-example.out	.text	32	32

Full report output

BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-qpg6100-lighting-example.out.map and ./pull_artifact/chip-qpg6100-lighting-example.out.map:

BLOAT EXECUTION FAILED WITH CODE 1:
bloaty: unknown file type for file './pull_artifact/chip-qpg6100-lighting-example.out.map'

Comparing ./master_artifact/chip-qpg6100-lighting-example.out and ./pull_artifact/chip-qpg6100-lighting-example.out:

sections,vmsize,filesize
.debug_info,0,60
.debug_loc,0,54
.debug_line,0,39
.text,32,32
.debug_str,0,-1
[Unmapped],0,-32

woody-apple · 2021-07-02T18:55:31Z

@mspang @LuDuda ?

woody-apple · 2021-07-02T18:55:36Z

@Damian-Nordic ?


bzbarsky-apple requested review from kghost and yufengwangca July 1, 2021 16:48

pullapprove bot requested review from andy31415, chrisdecenzo, Damian-Nordic, hawk248, jepenven-silabs, msandstedt and woody-apple July 1, 2021 16:48

pullapprove bot added the review - pending label Jul 1, 2021

yufengwangca approved these changes Jul 1, 2021


msandstedt approved these changes Jul 1, 2021


woody-apple approved these changes Jul 2, 2021


mspang approved these changes Jul 6, 2021


pullapprove bot added review - approved and removed review - pending labels Jul 6, 2021

mspang merged commit 40925bd into project-chip:master Jul 6, 2021

kghost mentioned this pull request Jul 6, 2021

Fix compile error, due to merge conflict #8127

Merged

bzbarsky-apple deleted the fix-exchange-crash branch July 7, 2021 16:31
