Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash when receiving a message that wants an ack on a closed exchange that's still alive. #8068

Merged
merged 1 commit into from
Jul 6, 2021

Conversation

bzbarsky-apple
Copy link
Contributor

The basic scenario, with all messages requesting acks, is:

  1. A sends message to B.
  2. B responds to A, piggybacks ack, closes exchange.
  3. A responds to B, piggybacks ack.

When B receives the message, the exchange is still alive, waiting for
an ack. It gets the message handed to it, processes it, queues up an
ack for the message it just got. Then the stack unwinds, the
exchange's refcount drops to 0 (because it's no longer waiting for an
ack, so not referenced by the reliable message manager), and its
destructor tries to flush the pending ack, which fails assertions due
to the refcount being 0.

The fix is to always immediately send a standalone ack if we
have no delegate, because in that situation there won't be anything
app-level to piggyback on.

Problem

Crash on receiving a message. To reproduce easily:

  1. scripts/examples/gn_build_example.sh examples/all-clusters-app/linux out/debug/standalone chip_config_network_layer_ble=false
  2. scripts/examples/gn_build_example.sh examples/chip-tool out/debug/standalone/
  3. In terminal 1: ./out/debug/standalone/chip-all-clusters-app
  4. In terminal 2:
    out/debug/standalone/chip-tool pairing onnetwork 0 20202021 3840 ::1 11097
    ./out/debug/standalone/chip-tool onoff report on-off 1 2 3
    

Change overview

Fix the crash by immediately flushing out the pending ack if we have no delegate (which includes when we're closed), so that we don't try to do it from our destructor.

Testing

Unit test added in the PR. Manually tested the steps above.

…ange that's still alive.

The basic scenario, with all messages requesting acks, is:

1) A sends message to B.
2) B responds to A, piggybacks ack, closes exchange.
3) A responds to B, piggybacks ack.

When B receives the message, the exchange is still alive, waiting for
an ack.  It gets the message handed to it, processes it, queues up an
ack for the message it just got.  Then the stack unwinds, the
exchange's refcount drops to 0 (because it's no longer waiting for an
ack, so not referenced by the reliable message manager), and its
destructor tries to flush the pending ack, which fails assertions due
to the refcount being 0.

The fix is to always immediately send a standalone ack if we
have no delegate, because in that situation there won't be anything
app-level to piggyback on.
@github-actions
Copy link

github-actions bot commented Jul 1, 2021

Size increase report for "esp32-example-build" from 8551b50

File Section File VM
chip-shell.elf .flash.text 36 36
chip-all-clusters-app.elf .flash.text 36 36
chip-temperature-measurement-app.elf .flash.text 36 36
chip-lock-app.elf .flash.text 36 36
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-pigweed-app.elf and ./pull_artifact/chip-pigweed-app.elf:

sections,vmsize,filesize

Comparing ./master_artifact/chip-shell.elf and ./pull_artifact/chip-shell.elf:

sections,vmsize,filesize
.debug_loc,0,166
.debug_line,0,102
.debug_info,0,62
.flash.text,36,36
.debug_ranges,0,16
.debug_str,0,2

Comparing ./master_artifact/chip-all-clusters-app.elf and ./pull_artifact/chip-all-clusters-app.elf:

sections,vmsize,filesize
.debug_loc,0,166
.debug_line,0,102
.debug_info,0,62
.flash.text,36,36
.debug_ranges,0,16
.debug_str,0,-2

Comparing ./master_artifact/chip-temperature-measurement-app.elf and ./pull_artifact/chip-temperature-measurement-app.elf:

sections,vmsize,filesize
.debug_line,0,84
.debug_info,0,66
.flash.text,36,36
.debug_loc,0,-46

Comparing ./master_artifact/chip-persistent-storage.elf and ./pull_artifact/chip-persistent-storage.elf:

sections,vmsize,filesize

Comparing ./master_artifact/chip-lock-app.elf and ./pull_artifact/chip-lock-app.elf:

sections,vmsize,filesize
.debug_loc,0,166
.debug_line,0,102
.debug_info,0,62
.flash.text,36,36
.debug_ranges,0,16
.debug_str,0,2


@github-actions
Copy link

github-actions bot commented Jul 1, 2021

Size increase report for "nrfconnect-example-build" from 8551b50

File Section File VM
chip-shell.elf text 36 36
chip-shell.elf device_handles -4 -4
chip-lock.elf text 36 36
chip-lock.elf device_handles -4 -4
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-shell.elf and ./pull_artifact/chip-shell.elf:

sections,vmsize,filesize
.debug_info,0,60
.debug_loc,0,52
.debug_line,0,40
text,36,36
device_handles,-4,-4

Comparing ./master_artifact/chip-lock.elf and ./pull_artifact/chip-lock.elf:

sections,vmsize,filesize
.debug_info,0,60
.debug_loc,0,52
.debug_line,0,40
text,36,36
device_handles,-4,-4


@github-actions
Copy link

github-actions bot commented Jul 1, 2021

Size increase report for "gn_qpg6100-example-build" from 8551b50

File Section File VM
chip-qpg6100-lighting-example.out .text 32 32
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-qpg6100-lighting-example.out.map and ./pull_artifact/chip-qpg6100-lighting-example.out.map:

BLOAT EXECUTION FAILED WITH CODE 1:
bloaty: unknown file type for file './pull_artifact/chip-qpg6100-lighting-example.out.map'

Comparing ./master_artifact/chip-qpg6100-lighting-example.out and ./pull_artifact/chip-qpg6100-lighting-example.out:

sections,vmsize,filesize
.debug_info,0,60
.debug_loc,0,54
.debug_line,0,39
.text,32,32
.debug_str,0,-1
[Unmapped],0,-32


@woody-apple
Copy link
Contributor

@mspang @LuDuda ?

@woody-apple
Copy link
Contributor

@Damian-Nordic ?

@mspang mspang merged commit 40925bd into project-chip:master Jul 6, 2021
@bzbarsky-apple bzbarsky-apple deleted the fix-exchange-crash branch July 7, 2021 16:31
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
…ange that's still alive. (project-chip#8068)

The basic scenario, with all messages requesting acks, is:

1) A sends message to B.
2) B responds to A, piggybacks ack, closes exchange.
3) A responds to B, piggybacks ack.

When B receives the message, the exchange is still alive, waiting for
an ack.  It gets the message handed to it, processes it, queues up an
ack for the message it just got.  Then the stack unwinds, the
exchange's refcount drops to 0 (because it's no longer waiting for an
ack, so not referenced by the reliable message manager), and its
destructor tries to flush the pending ack, which fails assertions due
to the refcount being 0.

The fix is to always immediately send a standalone ack if we
have no delegate, because in that situation there won't be anything
app-level to piggyback on.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants