Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: prevent multiple up notifications to zclients #14366

Conversation

pguibert6WIND
Copy link
Member

When using tcpdump over an interface, the interface raises, then unraises the promiscuous mode to that interface. The zebra daemon systematically considers this change as an up event, and all the zebra clients are triggered.

Fix this by avoiding sending multiple up notifications if at last notification, the status of the interface was already up.

When using tcpdump over an interface, the interface raises,
then unraises the promiscuous mode to that interface. The
zebra daemon systematically considers this change as an up
event, and all the zebra clients are triggered.

Fix this by avoiding sending multiple up notifications if
at last notification, the status of the interface was already
up.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't there many interface changes that are distributed in this path - are you certain that adding this restriction will continue to distribute all of the interface attribute changes promptly?

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: FAILED

Test incomplete. See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Incomplete

Topotests Ubuntu 18.04 arm8 part 7: Failed (click for details) Topotests Ubuntu 18.04 arm8 part 7: No useful log found
Topotests Ubuntu 18.04 arm8 part 8: Failed (click for details) Topotests Ubuntu 18.04 arm8 part 8: No useful log found
Topotests debian 10 amd64 part 9: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9DEB10AMD64-14042/test

Topology Tests failed for Topotests debian 10 amd64 part 9
see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9DEB10AMD64/TopotestLogs/log_topotests.txt
Topotests debian 10 amd64 part 9: Unknown Log
URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9DEB10AMD64/TopotestDetails/

Topotests Ubuntu 18.04 arm8 part 9: Failed (click for details) Topotests Ubuntu 18.04 arm8 part 9: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 9: No useful log found
Topotests Ubuntu 18.04 amd64 part 9: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-14042/test

Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9
see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9U18AMD64/TopotestLogs/log_topotests.txt
Topotests Ubuntu 18.04 amd64 part 9: Unknown Log
URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9U18AMD64/TopotestDetails/

Topotests debian 10 amd64 part 6: Incomplete (check logs for details)
Addresssanitizer topotests part 4: Incomplete (check logs for details)
Addresssanitizer topotests part 1: Failed (click for details)
***********************************************************************************
Address Sanitizer Error detected in bgp_auth.test_bgp_auth2/R3.asan.ospfd.9159

=================================================================
==9159==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 384 byte(s) in 3 object(s) allocated from:
    #0 0x7f75067aad28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
    #1 0x7f75061a60da in qcalloc lib/memory.c:105
    #2 0x55b548e77b97 in ospf_lsa_new ospfd/ospf_lsa.c:186
    #3 0x55b548e77ea1 in ospf_lsa_new_and_data ospfd/ospf_lsa.c:205
    #4 0x55b548eaf954 in ospf_ls_upd_list_lsa ospfd/ospf_packet.c:1800
    #5 0x55b548eaf954 in ospf_ls_upd ospfd/ospf_packet.c:1885
    #6 0x55b548eaf954 in ospf_read_helper ospfd/ospf_packet.c:3201
    #7 0x55b548eaf954 in ospf_read ospfd/ospf_packet.c:3232
    #8 0x7f7506244b99 in event_call lib/event.c:1979
    #9 0x7f7506189379 in frr_run lib/libfrr.c:1213
    #10 0x55b548e55ab4 in main ospfd/ospf_main.c:249
    #11 0x7f75057a1c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

Indirect leak of 116 byte(s) in 3 object(s) allocated from:
    #0 0x7f75067aad28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
    #1 0x7f75061a60da in qcalloc lib/memory.c:105
    #2 0x55b548e77e90 in ospf_lsa_data_new ospfd/ospf_lsa.c:296
    #3 0x55b548e77eac in ospf_lsa_new_and_data ospfd/ospf_lsa.c:206
    #4 0x55b548eaf954 in ospf_ls_upd_list_lsa ospfd/ospf_packet.c:1800
    #5 0x55b548eaf954 in ospf_ls_upd ospfd/ospf_packet.c:1885
    #6 0x55b548eaf954 in ospf_read_helper ospfd/ospf_packet.c:3201
    #7 0x55b548eaf954 in ospf_read ospfd/ospf_packet.c:3232
    #8 0x7f7506244b99 in event_call lib/event.c:1979
    #9 0x7f7506189379 in frr_run lib/libfrr.c:1213
    #10 0x55b548e55ab4 in main ospfd/ospf_main.c:249
    #11 0x7f75057a1c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

SUMMARY: AddressSanitizer: 500 byte(s) leaked in 6 allocation(s).
***********************************************************************************

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-ASAN1-14042/test

Topology Tests failed for Addresssanitizer topotests part 1
see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/ASAN1/TopotestLogs/log_topotests.txt
Addresssanitizer topotests part 1: Unknown Log
URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/ASAN1/TopotestDetails/

Topotests Ubuntu 18.04 i386 part 9: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-14042/test

Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9
see full log at https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9U18I386/TopotestLogs/log_topotests.txt
Topotests Ubuntu 18.04 i386 part 9: Unknown Log
URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-14042/artifact/TOPO9U18I386/TopotestDetails/

Successful on other platforms/tests
  • Addresssanitizer topotests part 9
  • Topotests Ubuntu 18.04 amd64 part 3
  • Topotests Ubuntu 18.04 i386 part 2
  • Topotests debian 10 amd64 part 5
  • Topotests Ubuntu 18.04 arm8 part 2
  • Topotests debian 10 amd64 part 0
  • Addresssanitizer topotests part 7
  • Topotests Ubuntu 18.04 i386 part 6
  • Topotests Ubuntu 18.04 i386 part 1
  • Topotests Ubuntu 18.04 i386 part 7
  • Topotests Ubuntu 18.04 amd64 part 5
  • Addresssanitizer topotests part 5
  • Topotests Ubuntu 18.04 i386 part 3
  • Topotests Ubuntu 18.04 amd64 part 2
  • Topotests Ubuntu 18.04 amd64 part 0
  • Topotests Ubuntu 18.04 amd64 part 7
  • Topotests Ubuntu 18.04 i386 part 5
  • Topotests Ubuntu 18.04 i386 part 8
  • Topotests Ubuntu 18.04 i386 part 0
  • Addresssanitizer topotests part 3
  • Topotests debian 10 amd64 part 8
  • Topotests Ubuntu 18.04 arm8 part 4
  • Addresssanitizer topotests part 2
  • Topotests Ubuntu 18.04 amd64 part 4
  • CentOS 7 rpm pkg check
  • Topotests debian 10 amd64 part 2
  • Topotests Ubuntu 18.04 amd64 part 8
  • Topotests debian 10 amd64 part 7
  • Topotests Ubuntu 18.04 arm8 part 0
  • Static analyzer (clang)
  • Addresssanitizer topotests part 8
  • Ubuntu 20.04 deb pkg check
  • Debian 10 deb pkg check
  • Topotests Ubuntu 18.04 arm8 part 5
  • Addresssanitizer topotests part 6
  • Topotests Ubuntu 18.04 amd64 part 1
  • Ubuntu 18.04 deb pkg check
  • Topotests debian 10 amd64 part 1
  • Topotests Ubuntu 18.04 arm8 part 1
  • Topotests Ubuntu 18.04 arm8 part 6
  • Topotests Ubuntu 18.04 amd64 part 6
  • Addresssanitizer topotests part 0
  • Topotests debian 10 amd64 part 4
  • Topotests debian 10 amd64 part 3
  • Debian 9 deb pkg check
  • Topotests Ubuntu 18.04 i386 part 4
  • Topotests Ubuntu 18.04 arm8 part 3

@pguibert6WIND
Copy link
Member Author

Aren't there many interface changes that are distributed in this path - are you certain that adding this restriction will continue to distribute all of the interface attribute changes promptly?

I made a test that shows that on static daemon for instance, you are not disturbed again and again, because of an operator that often does tcpdump.

This said, I am still not sure this is the good place to do the fix, this is why I created the pull request to see.

@mjstapp
Copy link
Contributor

mjstapp commented Sep 7, 2023

So do you know exactly what about 'tcpdump' is triggering these notifications to zebra (and then out to the daemons)? Is it an interface flag, like a "promiscuous" flag or something similar? Why is this a problem - is staticd actually reacting to that change in some way? I mean, a zapi message is a ... microsecond sort of event, unless the message actually triggers some response somewhere in the system.

If you could identify a specific pattern, a specific attribute change, you could make a configurable filter that would allow someone to drop those notifications, but it'd take some work to distinguish that specific change, since there are so many interface attributes.

If you could identify that staticd was reacting to a benign event (like a flag change) in an incorrect way, then that could be fixed.

Aren't there many interface changes that are distributed in this path - are you certain that adding this restriction will continue to distribute all of the interface attribute changes promptly?

I made a test that shows that on static daemon for instance, you are not disturbed again and again, because of an operator that often does tcpdump.

This said, I am still not sure this is the good place to do the fix, this is why I created the pull request to see.

Copy link
Contributor

@eqvinox eqvinox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAK — pimd actually returns the PROMISC flag from some show commands (including JSON output), so the up notification is needed to have pimd update its stored flags. (And returning the PROMISC flag can be relevant for debugging multicast issues with L2 filtering.)

@pguibert6WIND
Copy link
Member Author

Possibly, the promiscuous flag is the faulty one, and before David told about that, I was thinking none in FRR cares about those flags.

The dire consequence of an up event sent multiple times is that the calling daemon may do multiple times the same action, if it is not protected against multiple consecutive up events. For instance, suppose you receive an up event that triggers staticd to call ZAPI_ROUTE_ADD for a series of 10 static routes. Possibly, a second if up event will call again ZAPI_ROUTE_ADD 10 more times.
What I would like to address is to prevent from creating extra CPU for nothing.

So do you know exactly what about 'tcpdump' is triggering these notifications to zebra (and then out to the daemons)? Is it an interface flag, like a "promiscuous" flag or something similar? Why is this a problem - is staticd actually reacting to that change in some way? I mean, a zapi message is a ... microsecond sort of event, unless the message actually triggers some response somewhere in the system.

If you could identify a specific pattern, a specific attribute change, you could make a configurable filter that would allow someone to drop those notifications, but it'd take some work to distinguish that specific change, since there are so many interface attributes.

If you could identify that staticd was reacting to a benign event (like a flag change) in an incorrect way, then that could be fixed.

Aren't there many interface changes that are distributed in this path - are you certain that adding this restriction will continue to distribute all of the interface attribute changes promptly?

I made a test that shows that on static daemon for instance, you are not disturbed again and again, because of an operator that often does tcpdump.
This said, I am still not sure this is the good place to do the fix, this is why I created the pull request to see.

@mjstapp
Copy link
Contributor

mjstapp commented Sep 8, 2023

Ah - and this is exactly what I was trying to get at. If staticd does something bad, then staticd has a bug, and that is where the fix should be applied. The zapi message that zebra sends conveys many kinds of information, not just a binary up/down state change. The daemons should be able to see updates to an interface's properties without assuming that the interface has somehow silently gone down and returned.

The dire consequence of an up event sent multiple times is that the calling daemon may do multiple times the same action, if it is not protected against multiple consecutive up events. For instance, suppose you receive an up event that triggers staticd to call ZAPI_ROUTE_ADD for a series of 10 static routes. Possibly, a second if up event will call again ZAPI_ROUTE_ADD 10 more times. What I would like to address is to prevent from creating extra CPU for nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants