Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

app/netdev: application stops working after receiving frames with assertion or completely without error #8271

Open
roberthartung opened this issue Dec 18, 2017 · 17 comments
Assignees
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@roberthartung
Copy link
Member

Description

Application stops working (Assertion failed or no output) after random(?) time with assertion with my at86rf2xx, and no error on telosb (tmote sky).

Steps to reproduce the issue

Send packets from various nodes (broadcast) to other nodes. Also, corrupt packages should be in the air.

Expected results

Application correctly receives and frees all packets.

Actual results

2017-12-18 09:05:32,868 - INFO # sys/net/gnrc/pktbuf_static/gnrc_pktbuf_static.c:230 => n/a
2017-12-18 09:05:32,868 - INFO # *** RIOT kernel panic:
2017-12-18 09:05:32,886 - INFO # FAILED ASSERTION.
2017-12-18 09:05:32,887 - INFO #
2017-12-18 09:05:32,932 - INFO # pid | name | state Q | pri | stack ( used) | base addr | current
2017-12-18 09:05:32,980 - INFO # 1 | idle | pending Q | 15 | 768 ( 88) | 0x1eb7 | 0x2174
2017-12-18 09:05:33,028 - INFO # 2 | main | bl mutex _ | 7 | 768 ( 246) | 0x21b7 | 0x23c7
2017-12-18 09:05:33,076 - INFO # 3 | at86rf2xx | bl rx _ | 2 | 512 ( 242) | 0x3de3 | 0x3f44
2017-12-18 09:05:33,124 - INFO # 4 | dump_thread | running Q | 8 | 768 ( 766) | 0x1b49 | 0x1dea
2017-12-18 09:05:33,171 - INFO # 5 | send_thread | bl mutex _ | 9 | 768 ( 170) | 0x1809 | 0x1a6d
2017-12-18 09:05:33,204 - INFO # | SUM | | | 3584 ( 1512)
2017-12-18 09:05:33,205 - INFO #
2017-12-18 09:05:33,205 - INFO # *** halted.
2017-12-18 09:05:33,219 - INFO #

@miri64
Copy link
Member

miri64 commented Dec 19, 2017

From what branch are you building? When I try the application you provided on users, I get an error:

main.c:66:67: error: 'gnrc_netif_hdr_t {aka struct <anonymous>}' has no member named 'crc_valid'
                 printf("HDR: %d %d %d\n", hdr->rssi, hdr->lqi, hdr->crc_valid);
                                                                   ^~

@miri64 miri64 self-assigned this Dec 19, 2017
@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) GNRC labels Dec 19, 2017
@roberthartung
Copy link
Member Author

Working on https://github.com/roberthartung/RIOT/tree/wip-inga-crc , see also: #8276

@roberthartung
Copy link
Member Author

One of my telosb nodes stopped working this morning again:

2017-12-19 02:38:21,857 - INFO # send... OK
2017-12-19 02:38:22,399 - INFO # HDR: -68 107 1
2017-12-19 02:38:22,495 - INFO # HDR: -50 107 1
2017-12-19 02:38:22,527 - INFO # HDR: -59 95 1
2017-12-19 02:38:22,560 - INFO # HDR: -39 108 1
2017-12-19 02:38:22,590 - INFO # HDR: -52 107 1
2017-12-19 02:38:23,420 - INFO # HDR: -68 107 1
2017-12-19 02:38:23,500 - INFO # HDR: -50 108 1
2017-12-19 02:38:23,547 - INFO # HDR: -58 94 1
2017-12-19 02:38:23,564 - INFO # HDR: -39 108 1
2017-12-19 02:38:23,595 - INFO # HDR: -52 108 1
2017-12-19 02:38:23,675 - INFO # HDR: -54 108 1
2017-12-19 02:38:23,898 - INFO # send... OK
2017-12-19 02:38:24,456 - INFO # HDR: -68 108 1
2017-12-19 02:38:24,505 - INFO # HDR: -47 106 1
2017-12-19 02:38:24,569 - INFO # HDR: -58 89 1
2017-12-19 02:38:24,600 - INFO # HDR: -39 107 1
2017-12-19 02:38:24,601 - INFO # HDR: -52 108 1
2017-12-19 02:38:25,477 - INFO # HDR: -68 107 1
2017-12-19 02:38:25,509 - INFO # HDR: -47 107 1
2017-12-19 02:38:25,588 - INFO # HDR: -58 98 1
2017-12-19 02:38:25,604 - INFO # HDR: -50 107 1
2017-12-19 02:38:25,620 - INFO # HDR: -39 107 1
2017-12-19 02:38:25,699 - INFO # HDR: -54 107 1
2017-12-19 02:38:25,938 - INFO # send... OK
2017-12-19 02:38:26,498 - INFO # HDR: -68 106 1
2017-12-19 02:38:26,513 - INFO # HDR: -50 105 1
2017-12-19 02:38:26,608 - INFO # HDR: -58 96 1
2017-12-19 02:38:26,640 - INFO # HDR: -39 105 1
2017-12-19 02:38:26,832 - INFO # HDR: -96 29 0
2017-12-19 02:38:26,959 - INFO # Contents(100): EC 1C 01 23 22 FF FF 00 00 DE 09 34 36 29 FF 00 00 D6 09 80 00 00 00 00 00 00 00 CC 09 06 1C 08 00 FE FF 00 00 C2 09 D2 1B 15 00 00 00 BA 09 50 1C 18 00 80 FF BD FF 21 00 E3 6A FF FF 50 1C 0A 00 00 FF FF 6A E3 00 34 1C 0A 00 34 1C 0E 00 FF FF 50 1C 0E 00 80 32 FC FF 77 01 E3 6A FF FF 50 1C 0E 00 80 | ...#"......46).................................P.......!..j..P......j..4...4.....P....2..w..j..P....

Afterwards, shell is dead and sending/receiving stopped completely.

@miri64
Copy link
Member

miri64 commented Dec 19, 2017

Have you had a look if gnrc_netreg returns the correct results (in particular gnrc_netreg_num()) for your custom dumper? The dump you gave in OP suggests that a gnrc_pktbuf_release() to many is happening.

@roberthartung
Copy link
Member Author

@miri64 Can you give me a quick example of how to use gnrc_netreg_num()? It's 0 for me: printf("_dump: %d %d\n", gnrc_netreg_num(GNRC_NETTYPE_UNDEF, 0), gnrc_netreg_num(GNRC_NETTYPE_NETIF, 0));.

@roberthartung
Copy link
Member Author

roberthartung commented Dec 19, 2017

Another node stopped working now:

2017-12-19 09:50:29,846 - INFO # HDR: -91 255 1
2017-12-19 09:50:29,847 - INFO # send... OK
2017-12-19 09:50:30,853 - INFO # send... OK
2017-12-19 09:50:31,861 - INFO # send... OK
2017-12-19 09:50:32,852 - INFO # send... OK
2017-12-19 09:50:33,859 - INFO # send... OK
2017-12-19 09:50:34,850 - INFO # send... OK
2017-12-19 09:50:35,858 - INFO # send... OK
2017-12-19 09:50:36,849 - INFO # send... OK
2017-12-19 09:50:37,856 - INFO # send... OK
2017-12-19 09:50:38,848 - INFO # send... OK
2017-12-19 09:50:39,854 - INFO # send... OK
2017-12-19 09:50:40,846 - INFO # send... OK
2017-12-19 09:50:41,854 - INFO # send... OK
2017-12-19 09:50:42,845 - INFO # send... OK
2017-12-19 09:50:43,852 - INFO # send... OK
2017-12-19 09:50:44,859 - INFO # send... OK

Node is not receiving, but sending. shell still works :/

@roberthartung
Copy link
Member Author

New error now:

2017-12-19 10:03:35,485 - INFO # Contents(13): 20 02 03 00 D0 06 69 70 6E 3A 33 33 04 |  .....ipn:33.
2017-12-19 10:03:35,965 - INFO # HDR: -91 255 1
2017-12-19 10:03:36,093 - INFO # HDR: -91 255 1
2017-12-19 10:03:36,110 - INFO # send... core/thread.c:110 => n/a
2017-12-19 10:03:36,125 - INFO # *** RIOT kernel panic:
2017-12-19 10:03:36,141 - INFO # FAILED ASSERTION.
2017-12-19 10:03:36,142 - INFO # 
2017-12-19 10:03:36,189 - INFO # 	pid | name                 | state    Q | pri | stack  ( used) | base addr  | current     
2017-12-19 10:03:36,237 - INFO # 	  1 | idle                 | pending  Q |  15 |    128 (   88) |     0x1ab9 |     0x1af6 
2017-12-19 10:03:36,272 - INFO # 	  2 | main                 | bl mutex _ |   7 |    512 (  246) |     0x1b39 |     0x1c49 
2017-12-19 10:03:36,317 - INFO # 	  3 | at86rf2xx            | running  Q |   2 |    256 (  256) |     0x3665 |     0x36c6 
2017-12-19 10:03:36,365 - INFO # 	  4 | dump_thread          | bl rx    _ |   8 |    512 (  510) |     0x184b |     0x19eb 
2017-12-19 10:03:36,413 - INFO # 	  5 | send_thread          | pending  Q |   9 |    512 (  190) |     0x160b |     0x177e 
2017-12-19 10:03:36,445 - INFO # 	    | SUM                  |            |     |   1920 ( 1290)
2017-12-19 10:03:36,446 - INFO # 
2017-12-19 10:03:36,461 - INFO # *** halted.
2017-12-19 10:03:36,462 - INFO # 

@miri64
Copy link
Member

miri64 commented Dec 19, 2017

This looks more and more like a core related issue (maybe just core on inga?).

@miri64 miri64 added the Area: core Area: RIOT kernel. Handle PRs marked with this with care! label Dec 19, 2017
@roberthartung
Copy link
Member Author

Maybe. No idea what, though. Thread scheduling is taken from other atmega CPUs, as well as periph drivers. Maybe increasing stack size would help?

2017-12-19 11:37:01,888 - INFO # ps
2017-12-19 11:37:01,937 - INFO # 	pid | name                 | state    Q | pri | stack  ( used) | base addr  | current     
2017-12-19 11:37:01,984 - INFO # 	  1 | idle                 | pending  Q |  15 |    128 (   88) |     0x1ab9 |     0x1af4 
2017-12-19 11:37:02,017 - INFO # 	  2 | main                 | running  Q |   7 |    512 (  356) |     0x1b39 |     0x1c49 
2017-12-19 11:37:02,065 - INFO # 	  3 | at86rf2xx            | bl rx    _ |   2 |    256 (  256) |     0x3665 |     0x36c6 
2017-12-19 11:37:02,113 - INFO # 	  4 | dump_thread          | bl rx    _ |   8 |    512 (  510) |     0x184b |     0x19eb 
2017-12-19 11:37:02,160 - INFO # 	  5 | send_thread          | bl mutex _ |   9 |    512 (  190) |     0x160b |     0x176f 

@roberthartung
Copy link
Member Author

@miri64 INGA problem would only show up on INGA. But telosb nodes sometimes die as well.

@miri64
Copy link
Member

miri64 commented Dec 19, 2017

Maybe increasing stack size would help?

Have you tried that? Thread 3 and 4 definitely look like they need more (4 is close to the max).

@roberthartung
Copy link
Member Author

Yes tried that for now. The dump_thread is even at its limit, if I double the stack size. Any idea why?

The receive error also occurs on a TelosB mote.

2017-12-19 15:05:28,529 - INFO # HDR: -51 106 1
2017-12-19 15:05:28,546 - INFO # HDR: -51 106 1
2017-12-19 15:05:28,752 - INFO # send... OK
2017-12-19 15:05:32,116 - INFO # send... OK
2017-12-19 15:05:35,496 - INFO # send... OK
2017-12-19 15:05:38,860 - INFO # send... OK
2017-12-19 15:05:42,240 - INFO # send... OK
2017-12-19 15:05:45,604 - INFO # send... OK
2017-12-19 15:05:48,983 - INFO # send... OK
2017-12-19 15:05:52,363 - INFO # send... OK
2017-12-19 15:05:55,727 - INFO # send... OK
2017-12-19 15:05:59,107 - INFO # send... OK
2017-12-19 15:06:02,487 - INFO # send... OK
2017-12-19 15:06:05,867 - INFO # send... OK
2017-12-19 15:06:09,248 - INFO # send... OK

@miri64
Copy link
Member

miri64 commented Dec 19, 2017

Yes tried that for now. The dump_thread is even at its limit, if I double the stack size. Any idea why?

printf() uses a lot of stack-space, at least with newlib. Have you considered using sys/include/fmt.h instead?

@roberthartung
Copy link
Member Author

INGA / Samr21-xpro works fine right now. However, my telosb nodes are now dying all the time. Exactly when I try to send a packet and at the same time I receive a packet.

@roberthartung
Copy link
Member Author

Playing around with stack sizes solved the problem for now. Additionally some cleanup in the code. Thanks for now!

@roberthartung
Copy link
Member Author

However, telosb nodes are still stuck sometimes in RX state. "ifconfig set state idle" solves the problem...

@miri64 miri64 added Area: network Area: Networking and removed GNRC labels Sep 30, 2018
@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@stale stale bot closed this as completed Sep 10, 2019
@aabadie aabadie reopened this Sep 21, 2019
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Sep 21, 2019
@miri64 miri64 added this to the Release 2020.07 milestone Jul 1, 2020
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

4 participants