
gnrc_ipv6: crash on heavy network load on native #10875

Closed
gschorcht opened this issue Jan 26, 2019 · 16 comments

@gschorcht
Contributor

gschorcht commented Jan 26, 2019

Description

Bombarding native with pings of maximum size and an interval of 0 from multiple terminals leads to a crash. The following is the backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x5656cf83 in gnrc_netif_hdr_get_netif (hdr=0x1158) at sys/include/net/gnrc/netif/hdr.h:291
291	    return gnrc_netif_get_by_pid(hdr->if_pid);
(gdb) bt
#0  0x5656cf83 in gnrc_netif_hdr_get_netif (hdr=0x1158) at sys/include/net/gnrc/netif/hdr.h:291
#1  0x5656dbc1 in _send (pkt=0x5659d348 <_pktbuf+1704>, prep_hdr=true) at sys/net/gnrc/network_layer/ipv6/gnrc_ipv6.c:539
#2  0x5656d385 in _event_loop (args=0x0) at sys/net/gnrc/network_layer/ipv6/gnrc_ipv6.c:193
#3  0xf7e0dbdb in makecontext () from /lib/i386-linux-gnu/libc.so.6
#4  0x00000000 in ?? ()
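
For context, frame #0 is the one-line inline accessor quoted above. A minimal sketch with stand-in types (not the actual RIOT headers, just the shape implied by the backtrace) shows that its first action is to read hdr->if_pid, so a bogus netif header pointer such as 0x1158 faults right on that read:

#include <stdint.h>

typedef int16_t kernel_pid_t;                                        /* stand-in */
typedef struct { kernel_pid_t if_pid; /* ... */ } gnrc_netif_hdr_t;  /* stand-in */
typedef struct gnrc_netif gnrc_netif_t;                              /* stand-in */

gnrc_netif_t *gnrc_netif_get_by_pid(kernel_pid_t pid);               /* stand-in prototype */

/* shape of sys/include/net/gnrc/netif/hdr.h:291 as quoted in the backtrace */
static inline gnrc_netif_t *gnrc_netif_hdr_get_netif(const gnrc_netif_hdr_t *hdr)
{
    return gnrc_netif_get_by_pid(hdr->if_pid);   /* hdr == 0x1158 -> SIGSEGV on this read */
}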

Steps to reproduce the issue

Compile examples/gnrc_networking with the -g option:

gs@gunny8:~/src/RIOT-Xtensa-ESP.working$ CFLAGS="-g3" PORT=tap0 USEMODULE=gnrc_pktbuf_cmd make -C examples/gnrc_networking BOARD=native

Start gdb:

gdb examples/gnrc_networking/bin/native/gnrc_networking.elf

Run the RIOT instance in gdb:

run tap0

Ping from four terminals:

term1> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0
term2> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0
term3> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0
term4> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0

After a while, the RIOT instance should crash.

@gschorcht gschorcht added the Type: bug and Area: network labels Jan 26, 2019
@kaspar030 kaspar030 added the State: duplicate label Jan 26, 2019
@kaspar030
Contributor

kaspar030 commented Jan 26, 2019

The backtrace looks very similar to #6123, thus I'm closing this as a duplicate. Re-open if you disagree.

@gschorcht
Contributor Author

Hm, the crashes happen at different calls in the _send function. Sure, it might have the same cause, inconsistent memory, but maybe not. The crash described in this issue is reproducible and always happens at the same call.

IMHO it would be reasonable to let @miri64 have a short look before we close it.

@gschorcht gschorcht reopened this Jan 26, 2019
@gschorcht
Contributor Author

It very probably has the same cause. In both cases (#6123 and this issue), the reason seems to be an invalid pkt pointer. Even so, I would like to let @miri64 have a short look.

@miri64
Member

miri64 commented Jan 26, 2019

Hm, the crashes happen at different calls in the _send function. Sure, it might have the same cause, inconsistent memory, but maybe not. The crash described in this issue is reproducible and always happens at the same call.

The send function changed significantly since 2016, so I'm not sure it really is the same GDB dump after all.

@miri64
Member

miri64 commented Jan 26, 2019

So I think the version of master @kaspar030 reported on in #6123 was 8432d92. I determined this by running

git log --merges --before="2016-11-15 17:55"

Line 684 in #6123 seems to me to be the first access to a pointer in the provided pkt list:

static void _send(gnrc_pktsnip_t *pkt, bool prep_hdr)
{
    kernel_pid_t iface = KERNEL_PID_UNDEF;
    gnrc_pktsnip_t *ipv6, *payload;
    ipv6_addr_t *tmp;
    ipv6_hdr_t *hdr;
    /* get IPv6 snip and (if present) generic interface header */
    if (pkt->type == GNRC_NETTYPE_NETIF) {
        /* If there is already a netif header (routing protocols and
         * neighbor discovery might add them to preset sending interface) */
        iface = ((gnrc_netif_hdr_t *)pkt->data)->if_pid;
        /* seize payload as temporary variable */
        ipv6 = gnrc_pktbuf_start_write(pkt); /* write protect for later removal
                                              * in _send_unicast() */
        if (ipv6 == NULL) {
            DEBUG("ipv6: unable to get write access to netif header, dropping packet\n");
            gnrc_pktbuf_release(pkt);
            return;
        }
        pkt = ipv6; /* Reset pkt from temporary variable */
        ipv6 = pkt->next;
    }
    else {
        ipv6 = pkt;
    }
    /* seize payload as temporary variable */
    payload = gnrc_pktbuf_start_write(ipv6);

The same goes for line 539 in current master (6cd81db):

static void _send(gnrc_pktsnip_t *pkt, bool prep_hdr)
{
    gnrc_netif_t *netif = NULL;
    gnrc_pktsnip_t *tmp_pkt;
    ipv6_hdr_t *ipv6_hdr;
    uint8_t netif_hdr_flags = 0U;
    /* get IPv6 snip and (if present) generic interface header */
    if (pkt->type == GNRC_NETTYPE_NETIF) {
        /* If there is already a netif header (routing protocols and
         * neighbor discovery might add them to preset sending interface or
         * higher layers wants to provide flags to the interface ) */
        const gnrc_netif_hdr_t *netif_hdr = pkt->data;
        netif = gnrc_netif_hdr_get_netif(pkt->data);

I'd say it's inconclusive whether it is the same error, but in both cases the packet seems to get corrupted while sitting in gnrc_ipv6's message queue (possibly due to a too early release). All in all it seems to be at least in the same class of issue as #6123 and the way to reproduce is also the same, so I'd say we close this one as a duplicate, as @kaspar030 proposed. Any fix should be tested with the steps to reproduce anyway. The testing procedure is better outlined here though, so I will link this issue as a reference in #6123.
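
To make the suspected failure mode concrete, here is a deliberately simplified, hypothetical sketch (plain C with made-up names, not the GNRC pktbuf/msg API): a pointer is handed to a consumer via a queue, but the buffer behind it is released before the consumer runs, so the consumer dereferences stale memory, which matches what frame #0 above looks like.

#include <stdio.h>
#include <stdlib.h>

/* stand-in for a pkt snip carrying a netif header */
typedef struct {
    int if_pid;
} fake_netif_hdr_t;

/* stand-in for gnrc_ipv6's IPC queue: it stores only the pointer */
static fake_netif_hdr_t *queue[4];
static unsigned head, tail;

static void enqueue(fake_netif_hdr_t *hdr) { queue[tail++ % 4] = hdr; }
static fake_netif_hdr_t *dequeue(void) { return queue[head++ % 4]; }

int main(void)
{
    fake_netif_hdr_t *hdr = malloc(sizeof(*hdr));
    hdr->if_pid = 7;

    enqueue(hdr);   /* "sender" queues the packet for the IPv6 thread */
    free(hdr);      /* ...but the buffer is released too early */

    /* The consumer later picks up the stale pointer: reading it is
     * undefined behaviour; under heavy load the memory has typically
     * been reused by then, so if_pid is garbage (or the access faults). */
    fake_netif_hdr_t *stale = dequeue();
    printf("if_pid = %d\n", stale->if_pid);
    return 0;
}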

@miri64
Member

miri64 commented Jan 26, 2019

Start gdb:

gdb examples/gnrc_networking/bin/native/gnrc_networking.elf

Run the RIOT instance in gdb:

run tap0

We have make debug for that ;-).

@gschorcht
Contributor Author

All in all it seems to be at least in the same class of issue #6123 and the way to reproduce is also the same, so I'd say we close this one as a duplicate, as @kaspar030 proposed.

Thanks. Agreed.

@miri64
Member

miri64 commented Jan 26, 2019

Discussion below unrelated to issue at hand ;-)

@gschorcht Why -s1392 btw?

@gschorcht
Contributor Author

gschorcht commented Jan 26, 2019

I didn't try whether it also happens with data sizes smaller than the maximum. I just used the same command as for my stress tests of the esp8266 esp_wifi driver.

Probably also because I thought that the crash might be related to the buffer-full problem and requires the maximum data size to reproduce it.

BTW, I still have a packet buffer problem there, issue 4 in #10861. I ran into the problem described here when I was trying to reproduce it on native.

@miri64
Member

miri64 commented Jan 26, 2019

Probably also because I thought that the crash might be related to the buffer-full problem and requires the maximum data size to reproduce it.

Since both WiFi and Ethernet have an MTU of 1500, that would be -s1452 though ;-).

@gschorcht
Contributor Author

Yes, but if the router provides the IPv6 MTU option in its RA, as mine does, the MTU is downsized, to 1440 in my case 😉 Exactly this question also came up in PR #10792 and PR #10581. The interface starts with an MTU of 1500, but once the first RA is received and the interface gets its routing prefix, the MTU is downsized as well. This happens in the same way for Linux boxes.
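
For reference, the arithmetic behind the two -s values, written out as a hypothetical helper (assuming a plain ICMPv6 echo: 40-byte IPv6 header plus 8-byte ICMPv6 header per packet):

#include <stdio.h>

/* hypothetical helper: largest ICMPv6 echo payload that still fits one MTU */
static unsigned max_echo_payload(unsigned mtu)
{
    return mtu - 40 /* IPv6 header */ - 8 /* ICMPv6 echo header */;
}

int main(void)
{
    printf("MTU 1500 -> -s%u\n", max_echo_payload(1500));  /* 1452: native / plain Ethernet */
    printf("MTU 1440 -> -s%u\n", max_echo_payload(1440));  /* 1392: MTU from the RA option  */
    return 0;
}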

@miri64
Member

miri64 commented Jan 26, 2019

Ok sorry, I forgot about that. On native however, the MTU stays 1500.

@miri64
Member

miri64 commented Jan 26, 2019

BTW, I still have a packet buffer problem there, issue 4 in #10861. I ran into the problem described here when I was trying to reproduce it on native.

Were you able to?

@gschorcht
Contributor Author

gschorcht commented Jan 26, 2019

Were you able to?

No, I just saw the crash described here. In esp_wifi the buffer becomes full and communication stops working, but it doesn't crash.

@gschorcht
Contributor Author

gschorcht commented Jan 26, 2019

Ok sorry, I forgot about that. On native however, the MTU stays 1500.

Ok, I see. According to the description in #6123, the data size does not seem to matter.

@miri64
Member

miri64 commented Jan 26, 2019

Ok, I see. According to the description in #6123, the data size does not seem to matter.

True
