
gnrc_ipv6: crash on heavy network load on native #10875

Closed
gschorcht opened this issue Jan 26, 2019 · 16 comments

@gschorcht
Contributor

gschorcht commented Jan 26, 2019

Description

Bombarding native with pings of maximum size and an interval of 0 from multiple terminals leads to a crash. The following is the backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x5656cf83 in gnrc_netif_hdr_get_netif (hdr=0x1158) at sys/include/net/gnrc/netif/hdr.h:291
291	    return gnrc_netif_get_by_pid(hdr->if_pid);
(gdb) bt
#0  0x5656cf83 in gnrc_netif_hdr_get_netif (hdr=0x1158) at sys/include/net/gnrc/netif/hdr.h:291
#1  0x5656dbc1 in _send (pkt=0x5659d348 <_pktbuf+1704>, prep_hdr=true) at sys/net/gnrc/network_layer/ipv6/gnrc_ipv6.c:539
#2  0x5656d385 in _event_loop (args=0x0) at sys/net/gnrc/network_layer/ipv6/gnrc_ipv6.c:193
#3  0xf7e0dbdb in makecontext () from /lib/i386-linux-gnu/libc.so.6
#4  0x00000000 in ?? ()
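
For context, frame #0 is the one-line inline accessor quoted above. A minimal sketch with stand-in types (not the actual RIOT headers, just the shape implied by the backtrace) shows that its first action is to read hdr->if_pid, so a bogus netif header pointer such as 0x1158 faults right on that read:

#include <stdint.h>

typedef int16_t kernel_pid_t;                                        /* stand-in */
typedef struct { kernel_pid_t if_pid; /* ... */ } gnrc_netif_hdr_t;  /* stand-in */
typedef struct gnrc_netif gnrc_netif_t;                              /* stand-in */

gnrc_netif_t *gnrc_netif_get_by_pid(kernel_pid_t pid);               /* stand-in prototype */

/* shape of sys/include/net/gnrc/netif/hdr.h:291 as quoted in the backtrace */
static inline gnrc_netif_t *gnrc_netif_hdr_get_netif(const gnrc_netif_hdr_t *hdr)
{
    return gnrc_netif_get_by_pid(hdr->if_pid);   /* hdr == 0x1158 -> SIGSEGV on this read */
}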

Steps to reproduce the issue

Compile examples/gnrc_networking with the -g option:

gs@gunny8:~/src/RIOT-Xtensa-ESP.working$ CFLAGS="-g3" PORT=tap0 USEMODULE=gnrc_pktbuf_cmd make -C examples/gnrc_networking BOARD=native

Start gdb:

gdb examples/gnrc_networking/bin/native/gnrc_networking.elf

Run the RIOT instance in gdb:

run tap0

Ping from four terminals:

term1> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0
term2> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0
term3> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0
term4> sudo ping6 fe80::280d:21ff:fed1:c5ed -Itap0 -s1392 -i 0

After a while, the RIOT instance should crash.

@gschorcht gschorcht added the Type: bug and Area: network labels Jan 26, 2019
@kaspar030 kaspar030 added the State: duplicate label Jan 26, 2019
@kaspar030
Contributor

kaspar030 commented Jan 26, 2019

The backtrace looks very similar to #6123, thus I'm closing this as a duplicate. Re-open if you disagree.

@gschorcht
Contributor Author

Hm, the crashes happen at different calls in the _send function. Sure, it might have the same cause, inconsistent memory, but maybe not. The crash described in this issue is reproducible and always happens at the same call.

IMHO it would be reasonable to let @miri64 have a short look before we close it.

@gschorcht gschorcht reopened this Jan 26, 2019
@gschorcht
Contributor Author

It very probably has the same cause. In both cases (#6123 and this issue), the reason seems to be an invalid pkt pointer. Even so, I would like to let @miri64 have a short look.

@miri64
Member

miri64 commented Jan 26, 2019

Hm, the crashes happen at different calls in the _send function. Sure, it might have the same cause, inconsistent memory, but maybe not. The crash described in this issue is reproducible and always happens at the same call.

The send function changed significantly since 2016, so I'm not sure it really is the same GDB dump after all.

@miri64
Member

miri64 commented Jan 26, 2019

So I think the version of master @kaspar030 reported on in #6123 was 8432d92. I determined this by running

git log --merges --before="2016-11-15 17:55"

Line 684 in #6123 seems to me to be the first access to a pointer in the provided pkt list:

static void _send(gnrc_pktsnip_t *pkt, bool prep_hdr)
{
    kernel_pid_t iface = KERNEL_PID_UNDEF;
    gnrc_pktsnip_t *ipv6, *payload;
    ipv6_addr_t *tmp;
    ipv6_hdr_t *hdr;
    /* get IPv6 snip and (if present) generic interface header */
    if (pkt->type == GNRC_NETTYPE_NETIF) {
        /* If there is already a netif header (routing protocols and
         * neighbor discovery might add them to preset sending interface) */
        iface = ((gnrc_netif_hdr_t *)pkt->data)->if_pid;
        /* seize payload as temporary variable */
        ipv6 = gnrc_pktbuf_start_write(pkt); /* write protect for later removal
                                              * in _send_unicast() */
        if (ipv6 == NULL) {
            DEBUG("ipv6: unable to get write access to netif header, dropping packet\n");
            gnrc_pktbuf_release(pkt);
            return;
        }
        pkt = ipv6; /* Reset pkt from temporary variable */
        ipv6 = pkt->next;
    }
    else {
        ipv6 = pkt;
    }
    /* seize payload as temporary variable */
    payload = gnrc_pktbuf_start_write(ipv6);

The same goes for line 539 in current master (6cd81db):

static void _send(gnrc_pktsnip_t *pkt, bool prep_hdr)
{
    gnrc_netif_t *netif = NULL;
    gnrc_pktsnip_t *tmp_pkt;
    ipv6_hdr_t *ipv6_hdr;
    uint8_t netif_hdr_flags = 0U;
    /* get IPv6 snip and (if present) generic interface header */
    if (pkt->type == GNRC_NETTYPE_NETIF) {
        /* If there is already a netif header (routing protocols and
         * neighbor discovery might add them to preset sending interface or
         * higher layers wants to provide flags to the interface ) */
        const gnrc_netif_hdr_t *netif_hdr = pkt->data;
        netif = gnrc_netif_hdr_get_netif(pkt->data);

I'd say it's inconclusive whether it is the same error, but in both cases the packet seems to get corrupted while sitting in gnrc_ipv6's message queue (possibly due to a too early release). All in all it seems to be at least in the same class of issue as #6123 and the way to reproduce is also the same, so I'd say we close this one as a duplicate, as @kaspar030 proposed. Any fix should be tested with the steps to reproduce anyway. The testing procedure is better outlined here though, so I will link this issue as a reference in #6123.
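
To make the suspected failure mode concrete, here is a deliberately simplified, hypothetical sketch (plain C with made-up names, not the GNRC pktbuf/msg API): a pointer is handed to a consumer via a queue, but the buffer behind it is released before the consumer runs, so the consumer dereferences stale memory, which matches what frame #0 above looks like.

#include <stdio.h>
#include <stdlib.h>

/* stand-in for a pkt snip carrying a netif header */
typedef struct {
    int if_pid;
} fake_netif_hdr_t;

/* stand-in for gnrc_ipv6's IPC queue: it stores only the pointer */
static fake_netif_hdr_t *queue[4];
static unsigned head, tail;

static void enqueue(fake_netif_hdr_t *hdr) { queue[tail++ % 4] = hdr; }
static fake_netif_hdr_t *dequeue(void) { return queue[head++ % 4]; }

int main(void)
{
    fake_netif_hdr_t *hdr = malloc(sizeof(*hdr));
    hdr->if_pid = 7;

    enqueue(hdr);   /* "sender" queues the packet for the IPv6 thread */
    free(hdr);      /* ...but the buffer is released too early */

    /* The consumer later picks up the stale pointer: reading it is
     * undefined behaviour; under heavy load the memory has typically
     * been reused by then, so if_pid is garbage (or the access faults). */
    fake_netif_hdr_t *stale = dequeue();
    printf("if_pid = %d\n", stale->if_pid);
    return 0;
}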

@miri64
Member

miri64 commented Jan 26, 2019

Start gdb:

gdb examples/gnrc_networking/bin/native/gnrc_networking.elf

Run the RIOT instance in gdb:

run tap0

We have make debug for that ;-).

@gschorcht
Contributor Author

All in all it seems to be at least in the same class of issue #6123 and the way to reproduce is also the same, so I'd say we close this one as a duplicate, as @kaspar030 proposed.

Thanks. Agreed.

@miri64
Member

miri64 commented Jan 26, 2019

Discussion below unrelated to issue at hand ;-)

@gschorcht Why -s1392 btw?

@gschorcht
Contributor Author

gschorcht commented Jan 26, 2019

I didn't try whether it also happens with data sizes smaller than the maximum. I just used the same command as for my stress tests of the esp8266 esp_wifi driver.

Probably also because I thought that the crash might be related to the buffer-full problem and requires the maximum data size to reproduce it.

BTW, I still have a packet buffer problem there, issue 4 in #10861. I ran into the problem described here when I was trying to reproduce it on native.

@miri64
Member

miri64 commented Jan 26, 2019

Probably also because I thought that the crash might be related to the buffer-full problem and requires the maximum data size to reproduce it.

Since both WiFi and Ethernet have an MTU of 1500, that would be -s1452 though ;-).

@gschorcht
Contributor Author

Yes, but if the router provides the IPv6 MTU option in its RA, as mine does, the MTU is downsized, to 1440 in my case 😉 Exactly this question also came up in PR #10792 and PR #10581. The interface starts with an MTU of 1500, but once the first RA is received and the interface gets its routing prefix, the MTU is downsized as well. This happens in the same way for Linux boxes.
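
For reference, the arithmetic behind the two -s values, written out as a hypothetical helper (assuming a plain ICMPv6 echo: 40-byte IPv6 header plus 8-byte ICMPv6 header per packet):

#include <stdio.h>

/* hypothetical helper: largest ICMPv6 echo payload that still fits one MTU */
static unsigned max_echo_payload(unsigned mtu)
{
    return mtu - 40 /* IPv6 header */ - 8 /* ICMPv6 echo header */;
}

int main(void)
{
    printf("MTU 1500 -> -s%u\n", max_echo_payload(1500));  /* 1452: native / plain Ethernet */
    printf("MTU 1440 -> -s%u\n", max_echo_payload(1440));  /* 1392: MTU from the RA option  */
    return 0;
}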

@miri64
Member

miri64 commented Jan 26, 2019

Ok sorry, I forgot about that. On native however, the MTU stays 1500.

@miri64
Member

miri64 commented Jan 26, 2019

BTW, I still have a packet buffer problem there, issue 4 in #10861. I ran into the problem described here when I was trying to reproduce it on native.

Were you able to?

@gschorcht
Contributor Author

gschorcht commented Jan 26, 2019

Were you able to?

No, I just saw the crash described here. In esp_wifi the buffer becomes full and communication stops working, but it doesn't crash.

@gschorcht
Contributor Author

gschorcht commented Jan 26, 2019

Ok sorry, I forgot about that. On native however, the MTU stays 1500.

Ok, I see. According to the description in #6123, the data size does not seem to matter.

@miri64
Member

miri64 commented Jan 26, 2019

Ok, I see. According to the description in #6123, the data size does not seem to matter.

True
