zebra_apic threads not started after FRR service restart #16747

Open
@ToshikiRen

Description

Discussed in #16638

Originally posted by ToshikiRen August 23, 2024
The zebra_apic threads are sometimes not started after an FRR service restart. This happens after multiple restarts, but not every time; the occurrence is mostly random.

The symptom is that no routes are sent from the routing daemons (e.g., BGP) to the kernel.

Questions:

  1. As I understand it, the zebra_apic threads are responsible for communication between the FRR daemons and the Linux kernel. Is this assumption correct?
  2. What could cause the zebra_apic threads not to be started?
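To confirm whether the threads are actually missing after a given restart, the thread names of the running zebra process can be read from procfs. The following is a minimal sketch, assuming a Linux host and assuming the threads show up under the name `zebra_apic` used in this report; the helper names `thread_names` and `has_thread` are hypothetical, and the zebra PID has to be supplied by the caller.

```python
import os


def thread_names(pid, proc_root="/proc"):
    """Return the names of all threads of `pid`, read from <proc_root>/<pid>/task/*/comm."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    names = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as fh:
                names.append(fh.read().strip())
        except OSError:
            # The thread exited between listing the directory and opening the file.
            continue
    return names


def has_thread(pid, name="zebra_apic", proc_root="/proc"):
    """True if at least one thread of `pid` carries the given name (assumed name)."""
    return name in thread_names(pid, proc_root)
```

Calling `has_thread(zebra_pid)` after each restart gives a quick yes/no signal that can be logged alongside the restart count.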

I managed to reproduce the issue on a box with the following configuration, without configuring BGP peers:

# default to using syslog. /etc/rsyslog.d/45-frr.conf places the log in
# /var/log/frr/frr.log
#
# Note:
# FRR's configuration shell, vtysh, dynamically edits the live, in-memory
# configuration while FRR is running. When instructed, vtysh will persist the
# live configuration to this file, overwriting its contents. If you want to
# avoid this, you can edit this file manually before starting FRR, or instruct
# vtysh to write configuration to a different file.
log syslog informational
!
debug zebra events
debug zebra packet recv
debug zebra kernel
debug zebra rib
debug zebra nht
debug zebra dplane
debug zebra nexthop
debug zebra neigh
debug bgp nht
debug bgp zebra

The error from the logs when the issue occurs during restart:

bgpd[335825]: [VMFZK-56S5Y] bgp_zebra_label_manager_connect: failed connecting synchronous zclient!
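Because the failure is intermittent, it may help to scan the log after each restart for this message. The sketch below assumes log lines carry the usual `daemon[pid]:` prefix seen in the line above; the function name `find_zclient_failures` is hypothetical.

```python
import re

# Text taken from the error line quoted in this report.
ERROR_TEXT = "failed connecting synchronous zclient"

# Matches a "daemon[pid]:" prefix such as "bgpd[335825]:".
PREFIX_RE = re.compile(r"(\w+)\[(\d+)\]:")


def find_zclient_failures(lines):
    """Return (daemon, pid) for every log line that contains the error text."""
    hits = []
    for line in lines:
        if ERROR_TEXT not in line:
            continue
        match = PREFIX_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

It can be fed an open file object, for example the log at `/var/log/frr/frr.log` mentioned in the configuration comments, since iterating a file yields lines.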

FRR version: 9.1
Show version output:

FRRouting 9.1 (come-as4581) on Linux(6.6.32-dent).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux' '--host=aarch64-dent-linux' '--target=aarch64-dent-linux' '--prefix=/usr' '--exec_prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--libexecdir=/usr/libexec' '--datadir=/usr/share' '--sysconfdir=/etc' '--sharedstatedir=/com' '--localstatedir=/var' '--libdir=/usr/lib' '--includedir=/usr/include' '--oldincludedir=/usr/include' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--disable-silent-rules' '--disable-dependency-tracking' '--with-libtool-sysroot=' '--sbindir=/usr/libexec/frr' '--sysconfdir=/etc/frr' '--localstatedir=/var/run/frr' '--enable-vtysh' '--enable-multipath=64' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' '--disable-doc' '--with-clippy=/usr/lib/clippy' '--disable-capabilities' '--disable-cumulus' '--disable-datacenter' '--disable-fpm' '--disable-grpc' '--disable-ospfapi' '--disable-ospfclient' '--with-libpam' '--disable-protobuf' '--disable-snmp' '--disable-zeromq' 'build_alias=x86_64-linux' 'host_alias=aarch64-dent-linux' 'target_alias=aarch64-dent-linux' 'AR=aarch64-dent-linux-gcc-ar' 'LD=aarch64-dent-linux-ld --sysroot= ' 'OBJCOPY=aarch64-dent-linux-objcopy' 'OBJDUMP=aarch64-dent-linux-objdump' 'RANLIB=aarch64-dent-linux-gcc-ranlib' 'STRIP=aarch64-dent-linux-strip' 'PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/share/pkgconfig' 'PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig' 'CC=aarch64-dent-linux-gcc -mbranch-protection=standard -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=' 'CPPFLAGS=' 'CPP=aarch64-dent-linux-gcc -E --sysroot= -mbranch-protection=standard -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security' 'CXX=aarch64-dent-linux-g++ -mbranch-protection=standard -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=' 'PYTHON=/usr/bin/python3-native/python3'

When the issue occurs, the zebra zserv.api socket is owned by root instead of frr:

# ls -l /var/run/frr/zserv.api
srwx------ 1 root frr 0 Aug 28 06:20 /var/run/frr/zserv.api

Looking at the code, it seems the only case in which root would own this socket is when a TCP connection is used, but that is not the case in our configuration.
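The ownership check can be automated so that a bad restart is flagged immediately. This is a minimal sketch, assuming the socket path shown above and `frr` as the expected owner; the helper names `socket_owner` and `owned_by` are hypothetical.

```python
import os
import pwd
import stat


def socket_owner(path):
    """Return the user name owning the Unix socket at `path`."""
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError(f"{path} is not a socket")
    try:
        return pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this UID; fall back to the numeric ID.
        return str(st.st_uid)


def owned_by(path, expected_user="frr"):
    """True if the socket at `path` is owned by `expected_user`."""
    return socket_owner(path) == expected_user
```

For the state described here, `owned_by("/var/run/frr/zserv.api")` would return `False`, because the owner reported by `ls -l` is root.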

I have seen this issue on the latest FRR release (10.1) as well.
