Improve and fix zebra GR #13145

donaldsharp · 2023-03-29T20:18:38Z

See individual commits but the high points:

a) Zebra was storing afi/safi data for GR but only acting on AFI/SAFI_UNICAST. Remove the redundancy
b) Fix GR code to not accidently stop iterating when a node goes away
c) Fix GR code to process the AFI when it receives the notification that the afi is done
d) Fix GR code to run after anything in Meta-Q since we have a race condition

The safi has no 0 value which it is set to as part of the initialization. Let's just set it to a sensible value. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

The info->do_delete variable was being set to true only when u.val was 1. The problem with this is that u.val is a union and the various ways that we can call this event causes different values to be written to the union value on the thread. This makes no sense. Just set the variable to what we want it to be when we need it to be true. Since it was only ever set during a thread_execute section. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

Indentation was deep and hard to understand in zebra_gr_delete_stale_route Signed-off-by: Donald Sharp <sharpd@nvidia.com>

When GR is running and attempting to clear up a node if the node that is currently saved and we are coming back to happens to be deleted during the time zebra suspends the GR code due to hitting the node limit then zebra GR code will just completely stop processing and potentially leave stale nodes around forever. Let's just remove this hole and process what we can. Can you imagine trying to debug this after the fact? If we remove a node then that counts toward the maximum to process of ZEBRA_MAX_STALE_ROUTE_COUNT. This should prevent any non-processing with a slightly larger cost of having to look at a few nodes repeatedly Signed-off-by: Donald Sharp <sharpd@nvidia.com>

By the time this function is called we have already ensured that the pointers are good several times. I like consistency but this is a bit much Signed-off-by: Donald Sharp <sharpd@nvidia.com>

We have code that tracks both afi and safi's, but we only ever operate on the afi's. So lets limit our work being done to something more sensible. I'm leaving the safi being broadcast through the zapi message, as that I am not sure what else should be ripped out at this point in time. Finally re-arrange the zread_client_capabilites function to stop the multiple levels of function calling that really serve no purpose. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

The zebra_gr code had 3 functions when effectively only 1 was needed. Cleans up some code weirdness around multiple switch statements for the same api->cap as well as consolidating down to only caring about SAFI_UNICAST, since that is all we care about at the moment. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

The GR code in FRR used to wait till all AFI's were complete before cleaning up the routes from the upper level protocol. This of course can lead to some weird situations where say ipv4 finishes and then v6 is stuck waiting for a peer to come up and never finishes. v4 when it finishes signals zebra that it is done but no action is taken at that moment. Modify the code to allow the zebra_gr.c code to handle a per afi removal, instead of doing it all at the end. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

After the restructure of the gr code to allow zebra_gr to have individual cleanups of afi, this is no longer necessary. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

siddanmk · 2023-03-29T20:28:07Z

zebra/zebra_rib.c

@@ -3727,6 +3755,12 @@ static void early_route_meta_queue_free(struct meta_queue *mq, struct list *l,
 	}
 }

+static void rib_meta_queue_gr_run_free(struct meta_queue *mq, struct list *l,
+				       struct zebra_vrf *zvrf)


Do we intend to free anything here?

it's overrated if you ask me. I've fixed it though

NetDEF-CI · 2023-03-29T23:11:16Z

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-10450/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings

Report for zebra_gr.c | 4 issues
===============================================
< WARNING: else is not generally useful after a break or return
< #403: FILE: /tmp/f1-996779/zebra_gr.c:403:
< WARNING: void function return statements are not generally useful
< #549: FILE: /tmp/f1-996779/zebra_gr.c:549:
Report for zebra_rib.c | 4 issues
===============================================
< WARNING: void function return statements are not generally useful
< #3116: FILE: /tmp/f1-996779/zebra_rib.c:3116:
< WARNING: void function return statements are not generally useful
< #3762: FILE: /tmp/f1-996779/zebra_rib.c:3762:

BGP signals to zebra that a afi has converged immediately after it has finished processing all routes for a given afi/safi. This generates events in zebra in this order a) Routes received from BGP, placed on early-rib Meta-Q b) Signal GR for the afi. Now imagine that zebra reads GR code and immediately processes routes that are in the actual rib and removes some routes. This generates a c) route deletion to the kernel for some number of routes that may be in the the early-rib Meta-Q d) Process the Meta-Q, and re-install the routes This is undesirable behavior in zebra. In that while we may end up in a correct state, there will be a blip for some number of routes that happen to be in the early rib Meta-Q. Modify the GR code to have it's own processing entry at the end of the Meta-Q. This will allow all routes to be processed and ready for handling by the Graceful Restart code. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

NetDEF-CI · 2023-03-30T03:09:33Z

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-10454/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

donaldsharp added 9 commits March 29, 2023 07:48

lib: Ensure the safi is set to a sensible value

fbdc605

The safi has no 0 value which it is set to as part of the initialization. Let's just set it to a sensible value. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

zebra: Cleanup indentation in function

559dbc2

Indentation was deep and hard to understand in zebra_gr_delete_stale_route Signed-off-by: Donald Sharp <sharpd@nvidia.com>

zebra: Remove redundant check for pointers being good

096abfb

By the time this function is called we have already ensured that the pointers are good several times. I like consistency but this is a bit much Signed-off-by: Donald Sharp <sharpd@nvidia.com>

zebra: remove current_afi as that it is no longer used

644a8d3

After the restructure of the gr code to allow zebra_gr to have individual cleanups of afi, this is no longer necessary. Signed-off-by: Donald Sharp <sharpd@nvidia.com>

github-actions bot added master size/L labels Mar 29, 2023

siddanmk reviewed Mar 29, 2023

View reviewed changes

donaldsharp force-pushed the do_delete branch from f3e2fc5 to 81322b9 Compare March 30, 2023 00:27

Jafaral self-requested a review April 4, 2023 15:52

Jafaral approved these changes Apr 5, 2023

View reviewed changes

Jafaral merged commit 92c4494 into FRRouting:master Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve and fix zebra GR #13145

Improve and fix zebra GR #13145

donaldsharp commented Mar 29, 2023

siddanmk Mar 29, 2023

donaldsharp Mar 30, 2023

NetDEF-CI commented Mar 29, 2023 •

edited

Loading

Continuous Integration Result: SUCCESSFUL

Warnings Generated during build:

NetDEF-CI commented Mar 30, 2023

Improve and fix zebra GR #13145

Improve and fix zebra GR #13145

Conversation

donaldsharp commented Mar 29, 2023

siddanmk Mar 29, 2023

Choose a reason for hiding this comment

donaldsharp Mar 30, 2023

Choose a reason for hiding this comment

NetDEF-CI commented Mar 29, 2023 • edited Loading

Continuous Integration Result: SUCCESSFUL

Warnings Generated during build:

NetDEF-CI commented Mar 30, 2023

Continuous Integration Result: SUCCESSFUL

NetDEF-CI commented Mar 29, 2023 •

edited

Loading