babeld segmentation fault on exit node causes lost routes #31

Closed
@paidforby

Related to #21, the "new" exit node continues to have difficulties maintaining its babeld process.

On home nodes pointed toward the new exit node, I'm seeing this in the logs:

Tue Apr 24 19:07:43 2018 user.notice root: no mesh routes available yet via [l2tp0] on try [26]: checking again in [5]s...

On the "new" exit node, I noticed this in /var/log/messages:

Apr 24 12:17:55 exit0 kernel: [433698.664554] babeld[830]: segfault at 18 ip 00005567267473d0 sp 00007ffc6a239240 error 4 in babeld[55672673d000+16000]
Apr 24 17:27:26 exit0 kernel: [452270.052907] babeld[3596]: segfault at fffffff337dc3df9 ip 000055e3a510f112 sp 00007fffd89c3400 error 7 in babeld[55e3a5104000+16000]
Apr 24 18:10:33 exit0 kernel: [454857.088438] traps: babeld[3767] general protection ip:561947add3d0 sp:7fffc111a920 error:0
Apr 24 18:10:33 exit0 kernel: [454857.088445]  in babeld[561947ad3000+16000]
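
To compare these crashes, it may help to turn each instruction pointer into an offset inside the babeld binary (ip minus the mapping base shown in brackets), since the absolute addresses differ between runs. Below is a minimal sketch in Python, assuming the usual x86 Linux `segfault at ... ip ... sp ... error ... in name[base+size]` log format and the standard x86 page-fault error bits (bit 0: page present, bit 1: write, bit 2: user mode). The function name `parse_segfault` and the returned field names are made up for illustration.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error ERR in NAME[BASE+SIZE]" line.
# All numeric fields are hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<name>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return crash details for a kernel segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "binary": m["name"],
        "fault_address": int(m["addr"], 16),
        "offset": ip - base,             # position of the faulting instruction in the mapping
        "page_present": bool(err & 1),   # 0 means the page was not mapped at all
        "write": bool(err & 2),          # 0 means the access was a read
        "user_mode": bool(err & 4),      # 1 means the fault happened in user space
    }


if __name__ == "__main__":
    lines = [
        "Apr 24 12:17:55 exit0 kernel: [433698.664554] babeld[830]: segfault at 18 "
        "ip 00005567267473d0 sp 00007ffc6a239240 error 4 in babeld[55672673d000+16000]",
        "Apr 24 17:27:26 exit0 kernel: [452270.052907] babeld[3596]: segfault at fffffff337dc3df9 "
        "ip 000055e3a510f112 sp 00007fffd89c3400 error 7 in babeld[55e3a5104000+16000]",
    ]
    for line in lines:
        info = parse_segfault(line)
        print(info["binary"], hex(info["offset"]), info)
```

Run against the two segfault lines above, this gives offset 0xa3d0 for the 12:17:55 crash (a user-mode read of an unmapped page at address 0x18, which looks like a near-NULL pointer dereference) and 0xb112 for the 17:27:26 crash (a user-mode write). The sketch does not parse the two-line general protection entry, but doing the same subtraction by hand (561947add3d0 minus 561947ad3000) also gives 0xa3d0, so the first and third crashes may well be at the same instruction. If the binary has debug symbols, those offsets could be fed to a tool such as addr2line to find the source line.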

However, the babeld process appears to be alive and well, and restarting it does nothing:

● babeld.service - babeld
   Loaded: loaded (/etc/systemd/system/babeld.service; disabled; vendor preset: 
   Active: active (running) since Tue 2018-04-24 18:10:33 PDT; 59min ago
 Main PID: 5288 (babeld)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/babeld.service
           └─5288 /usr/local/bin/babeld -F -S /var/lib/babeld/state -c /etc/babe

Apr 24 18:22:23 exit0 babeld[5288]: removing: l2tp354-354
Apr 24 18:22:32 exit0 babeld[5288]: removing: l2tp355-355
Apr 24 18:24:24 exit0 babeld[5288]: removing: l2tp356-356
Apr 24 18:26:13 exit0 babeld[5288]: removing: l2tp357-357
Apr 24 18:26:15 exit0 babeld[5288]: send: Cannot assign requested address
Apr 24 18:28:16 exit0 babeld[5288]: removing: l2tp358-358
Apr 24 19:05:13 exit0 babeld[5288]: Warning: cannot restore old configuration fo
Apr 24 19:05:13 exit0 babeld[5288]: removing: l2tp350-350
Apr 24 19:07:05 exit0 babeld[5288]: removing: l2tp360-360
Apr 24 19:08:55 exit0 babeld[5288]: removing: l2tp361-361

Any ideas about the root cause? Perhaps it's related to #24, which haunts the old exit node? I'm tempted to just restart it and see if it comes back to life, but I don't think that will help us solve the problem. Some other node whisperers should jump into the exit node and see what they can figure out.
