Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/net/gnrc/sock: fix race condition #18949

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

maribu
Copy link
Member

@maribu maribu commented Nov 22, 2022

Contribution description

When in gnrc_sock_recv() a timeout triggers after a message came in but before the timeout timer was cancelled, a stale magic timeout message was left in the mbox message queue. This results in

  1. Passing a wild pointer into gnrc_pktbuf_release_error() if a gnrc_netreg_unregister() is called next on the socket (with the number _TIMEOUT_MAGIC interpreted as a pointer to a pktsnip)
  2. gnrc_sock_recv() failing with a timeout error on the next call right away (without actually waiting)

This also contains the required infrastructure needed to fix the bug.

Finally, the cleanup code in gnrc_netreg_unregister() is made more fault tolerant in that it ignores (and drops) unexpected messages in the mbox. This may not be ideal, but it is certainly better than interpreting the content of the message as a pointer to a gnrc_pktsnip_t that is in need of being released.

Testing procedure

tests/pkt_edhoc_c in combination with GCC >= 12.x reliably triggered the issue for me on both the nRF52830DK an the Nucleo-F767ZI.

Issues/PRs references

None

@github-actions github-actions bot added Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: network Area: Networking Area: sys Area: System Area: tests Area: tests and testing framework labels Nov 22, 2022
Comment on lines 177 to 185
if (entry->type == GNRC_NETREG_TYPE_MBOX) {
msg_t msg;
while (mbox_try_get(entry->target.mbox, &msg)) {
gnrc_pktbuf_release_error(msg.content.ptr, EBADF);
if ((msg.type == GNRC_NETAPI_MSG_TYPE_RCV)
|| (msg.type == GNRC_NETAPI_MSG_TYPE_SND)) {
gnrc_pktbuf_release_error(msg.content.ptr, EBADF);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again: I think it is cleaner, to move this whole block to (the yet to be created) gnrc_sock_close() (which also would call gnrc_netreg_unregister(). From our offline discussion yesterday, I think the content of the mbox is much more use case dependent than previously thought and only gnrc_sock really knows what it all put into it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But GNRC_NETAPI_MSG_TYPE_SND/RCV messages aren't exclusive to gnrc_sock, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but _TIMEOUT_MSG_TYPE and whatever gnrc_sock might throw in here (potentially with packet context) in here, is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alterntaively SND/RCV could be cleaned out here, but there should also be a centralized clean-out within gnrc_sock to properly handle the rest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's focus on the bug here. We can restructure code and clean up things as follow up.

This is already more complex than I have hoped for as it is, but we need to fix this release-blocking issue.

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, the forward -> backward iterating over cib is confusing, but only some nitpicks from my side.

core/mbox.c Show resolved Hide resolved
core/mbox.c Outdated Show resolved Hide resolved
core/include/mbox.h Outdated Show resolved Hide resolved
sys/net/gnrc/netreg/gnrc_netreg.c Outdated Show resolved Hide resolved
Comment on lines +173 to +174
msg_t garbage_bin;
mbox_try_get_with_type(&reg->mbox, &garbage_bin, _TIMEOUT_MSG_TYPE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still has potential for racey behavior. What if two threads are waiting with timeout? Then this clears out the timeout of the other thread. IMHO, the most ideal way to fix this to use the PID of the waiter as a value for the timeout message instead of some magic integer and check here, if the removed timeout message is actually the one for this thread. Since this requires to search for both type and value, I am not sure, if having this dedicated function makes that much sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it legal to have two threads waiting on the same socket? If so, whats the use case?

On mutli-core systems one could use that to distribute load on multiple cores. I don't see this happen on RIOT.

@kaspar030
Copy link
Contributor

To me shuffling the mbox queue feels like overkill and error prone. And there's now a feature gap between msg and mbox. And this solution is O(n) (even though n is really small here as it's a special case catching a race). And mbox is supposed to be multi-consumer, so multiple threads might be waiting with timeout (this must be broken already). And this is a ztimer quirk that's now fixed in gnrc. ...

There was a thread flags based solution for xtimer and msg_recv_timeout() (#13429). I wasn't following it because it added 200b of core_thread_flags dependency (and closed all xtimer PRs at some point), and there were multiple competing solutions (I don't think any got in).

There's also a similar issue with msg_recv_timeout() that ztimer inherited from xtimer.

@miri64 miri64 added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Nov 22, 2022
@maribu
Copy link
Member Author

maribu commented Nov 22, 2022

And this is a ztimer quirk that's now fixed in gnrc. ...

Not really. The race condition is in GNRC. The ztimer quirk just makes the condition likely to trigger.

I will also look into the ztimer quirk, but one bug at a time.

@maribu
Copy link
Member Author

maribu commented Nov 22, 2022

And this solution is O(n)

Note that it is reasonable to assume that the expected message type will be sitting in the front of the queue.

I think the fundamental issue here is that msg and mbox are a poor choice for the backbone of a network stack. And the consequence are an increased complexity to handle the lifecycle of network messages passing through the network stack, an increased chance to loosing IPC messages (or rather not being able to send without blocking) and having to drop network messages, and an increased number of corner cases and race conditions.

I see no way to rework the architecture of GNRC within a reasonable time frame. The alternative is to add the features needed to fix the race with the use of msg/mbox.

A simpler approach to fix the issue as it surfaces in tests/pkg_edhoc_c is a simple mbox_try_peek(). However, can we be sure that the next message in the mbox() is the stale timeout? What if another thread preempted the calling one just before ztimer_remove() is called and multiple network messages are received before the stale timer fires and sends a stale timeout message? I think we cannot be sure. So we really have to iterate through the whole mbox message queue and remove stale timeouts.

I am open to suggestions on how to fix the issue. But other than moving away from mbox/msg (which certainly is out of scope for fixing a bug in an overdue release), I have none.

I am also open to suggestions on how to optimize mbox_try_get_with_type().

  • E.g. we could just move the first element into the position of the found message. The cost would still be O(n) (due to the search) and we may add a footgun by changing the order.
  • We could just add a magic emtpy flag into messages and fragment the queue space. It would still be O(n) for the search and we may have a situation with a single message in the queue preventing new messages to be stored non-blockingly, regardless of the size of the queue.

I agree that this sucks. To me, the PR contains the option that sucks the least.

Add a non-blocking function that allows draining the mbox queue from
message of a certain type, while keeping the rest of the messages
in the mbox intact for later consumption.
`gnrc_sock_recv()` blocks waiting on an `mbox` when receiving. The
timeout implementation just sends a magic timeout message to that mbox
when the timeout timer fires. The implementation assumed that the head
of the mbox will always contain either the received message, or the
magic timeout message, but never both.

This assumption doesn't hold when the timeout fires after the message
was received but before the timer was stopped. This can result in the
subsequent call to `gnrc_sock_recv()` to incorrectly fail with a timeout
right away (without actually waiting), or to a wild pointer
(`_TIMEOUT_MAGIC`) being freed in `gnrc_netreg_unregister()` (whatever
function call comes next).

This fixes the issue by discarding a stale timeout message (if any) from
the mbox while keeping the mbox queue otherwise intact.
When freeing any stale pktsnips from stale messages in the mbox, make
sure that the messages actually contains a pktsnip before freeing.
@kaspar030
Copy link
Contributor

I am open to suggestions on how to fix the issue. But other than moving away from mbox/msg (which certainly is out of scope for fixing a bug in an overdue release), I have none.

I'm just realizing that gnrc_sock is hand-rolling sth like mbox_recv_timeout(). Would a non-leaking version of that fix the issue?

Anyhow,

  • not breaking on an unexpected message is fixable (and this PR adds that)
  • not timing out on a stale timeout message is fixable (with ztimer's msg_receive_timeout() that wouldn't happen as the return value for timeout or stale message is different)
  • and, a mbox_recv_timeout() that doesn't leak is doable (my preferred solution)

If this solution works and fixes the issue, can we just push it to the release branch, and find a proper solution for the future? Or, go with this now (as its already done) and let time solve this differently if pressing enough?

(maybe mark the added mbox function somehow so there'll not be users should we want drop it?)

@maribu
Copy link
Member Author

maribu commented Nov 22, 2022

I salvaged the uncontroversial parts. Also, the wild pointer freeing is giving me much more headache than the premature timeout error. After all, code that needs to communicate over network will have some fault tolerance and error handling in it, as in real life network communication things go wrong. A premature timeout due to a race condition is something that is not ideal, but should not break real world applications.

I agree that a ztimer_mbox_get_timeout() is the best presented solution to fix premature timeouts on subsequent calls.

@maribu
Copy link
Member Author

maribu commented Nov 22, 2022

Btw., the doc of ztimer_msg_receive_timeout() says:

RIOT/sys/include/ztimer.h

Lines 468 to 469 in 97bc825

* @note: This might function might leave a message with type MSG_ZTIMER in the
* thread's message queue, which must be handled (ignored).

This is not encouraging :-/

bors bot added a commit that referenced this pull request Jan 3, 2023
18632: tests/thread_float: do not overload slow MCUs with IRQs r=benpicco a=maribu

### Contribution description

If the regular context switches are triggered too fast, slow MCUs will be able to spent little time on actually progressing in the test. This will scale the IRQ rate with the CPU clock as a crude way too keep load within limits.

### Testing procedure

The unit test should now pass on the Microduino CoreRF

```
$ make BOARD=microduino-corerf AVRDUDE_PROGRAMMER=dragon_jtag -C tests/thread_float flash test
make: Entering directory '/home/maribu/Repos/software/RIOT/tests/thread_float'
Building application "tests_thread_float" for "microduino-corerf" with MCU "atmega128rfa1".
[...]
   text	  data	   bss	   dec	   hex	filename
  12834	   520	  3003	 16357	  3fe5	/home/maribu/Repos/software/RIOT/tests/thread_float/bin/microduino-corerf/tests_thread_float.elf
avrdude -c dragon_jtag -p m128rfa1  -U flash:w:/home/maribu/Repos/software/RIOT/tests/thread_float/bin/microduino-corerf/tests_thread_float.hex
[...]
Welcome to pyterm!
Type '/exit' to exit.
READY
s
START
main(): This is RIOT! (Version: 2022.10-devel-858-g18566-tests/thread_float)
THREADS CREATED

Context switch every 3125 µs
{ "threads": [{ "name": "idle", "stack_size": 192, "stack_used": 88 }]}
{ "threads": [{ "name": "main", "stack_size": 640, "stack_used": 220 }]}
THREAD t1 start
THREAD t2 start
THREAD t3 start
t1: 141.443770
t3: 141.466810
t1: 141.443770
t3: 141.466810
t1: 141.443770
t3: 141.466810
t1: 141.443770
t3: 141.466810
t1: 141.443770
t3: 141.466810
t1: 141.443770
t3: 141.466810
t1: 141.443770

make: Leaving directory '/home/maribu/Repos/software/RIOT/tests/thread_float'
```

(~~Note: The idle thread exiting is something that should never occur. I guess the culprit may be `cpu_switch_context_exit()` messing things up when the main thread exits. But that is not directly related to what this PR aims to fix. Adding a `thread_sleep()` at the end of `main()` does indeed prevent the idle thread from exiting.~~
Update: That's expected. The idle thread stats are printed on exit of the main thread, the idle thread does not actually exit.)

### Issues/PRs references

Fixes #16908 maybe?

18950: tests/unittests: add unit tests for core_mbox r=benpicco a=maribu

### Contribution description

As the title says

### Testing procedure

The test cases are run on `native` by Murdock anyway.

### Issues/PRs references

Split out of #18949

19030: tests/periph_timer_short_relative_set: improve test r=benpicco a=maribu

### Contribution description

Reduce the number lines to output by only testing for intervals 0..15 to speed up the test.

In addition, run each test case 128 repetitions (it is still faster than before) to give some confidence the short relative set actually succeeded.

### Testing procedure

The test application should consistently fail or succeed, rather than occasionally passing.

### Issues/PRs references

None

19085: makefiles/tests/tests.inc.mk: fix test/available target r=benpicco a=maribu

### Contribution description

`dist/tools/compile_and_test_for_board/compile_and_test_for_board.py` relies on `make test/available` to check if a test if available. However, this so far did not take `TEST_ON_CI_BLACKLIST` and `TEST_ON_CI_WHITELIST` into account, resulting in tests being executed for boards which they are not available. This should fix the issue.

### Testing procedure


#### Expected to fail

```
$ make BOARD=nrf52840dk -C tests/gcoap_fileserver test/available
$ make BOARD=microbit -C tests/log_color test/available
```

(On `master`, they succeed, but fail in this PR.)

#### Expected to succeed

```
$ make BOARD=native -C tests/gcoap_fileserver test/available
$ make BOARD=nrf52840dk -C tests/pkg_edhoc_c test/available
$ make BOARD=nrf52840dk -C tests/log_color test/available
```

(Succeed in both `master` and this PR.)

### Issues/PRs references

None

Co-authored-by: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
bors bot added a commit that referenced this pull request Jan 3, 2023
18950: tests/unittests: add unit tests for core_mbox r=benpicco a=maribu

### Contribution description

As the title says

### Testing procedure

The test cases are run on `native` by Murdock anyway.

### Issues/PRs references

Split out of #18949

Co-authored-by: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: network Area: Networking Area: sys Area: System Area: tests Area: tests and testing framework Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants