-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve and fix zebra GR #13145
Improve and fix zebra GR #13145
Conversation
The safi has no 0 value which it is set to as part of the initialization. Let's just set it to a sensible value. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The info->do_delete variable was being set to true only when u.val was 1. The problem with this is that u.val is a union and the various ways that we can call this event causes different values to be written to the union value on the thread. This makes no sense. Just set the variable to what we want it to be when we need it to be true. Since it was only ever set during a thread_execute section. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Indentation was deep and hard to understand in zebra_gr_delete_stale_route Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When GR is running and attempting to clear up a node if the node that is currently saved and we are coming back to happens to be deleted during the time zebra suspends the GR code due to hitting the node limit then zebra GR code will just completely stop processing and potentially leave stale nodes around forever. Let's just remove this hole and process what we can. Can you imagine trying to debug this after the fact? If we remove a node then that counts toward the maximum to process of ZEBRA_MAX_STALE_ROUTE_COUNT. This should prevent any non-processing with a slightly larger cost of having to look at a few nodes repeatedly Signed-off-by: Donald Sharp <sharpd@nvidia.com>
By the time this function is called we have already ensured that the pointers are good several times. I like consistency but this is a bit much Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We have code that tracks both afi and safi's, but we only ever operate on the afi's. So lets limit our work being done to something more sensible. I'm leaving the safi being broadcast through the zapi message, as that I am not sure what else should be ripped out at this point in time. Finally re-arrange the zread_client_capabilites function to stop the multiple levels of function calling that really serve no purpose. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The zebra_gr code had 3 functions when effectively only 1 was needed. Cleans up some code weirdness around multiple switch statements for the same api->cap as well as consolidating down to only caring about SAFI_UNICAST, since that is all we care about at the moment. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The GR code in FRR used to wait till all AFI's were complete before cleaning up the routes from the upper level protocol. This of course can lead to some weird situations where say ipv4 finishes and then v6 is stuck waiting for a peer to come up and never finishes. v4 when it finishes signals zebra that it is done but no action is taken at that moment. Modify the code to allow the zebra_gr.c code to handle a per afi removal, instead of doing it all at the end. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
After the restructure of the gr code to allow zebra_gr to have individual cleanups of afi, this is no longer necessary. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
@@ -3727,6 +3755,12 @@ static void early_route_meta_queue_free(struct meta_queue *mq, struct list *l, | |||
} | |||
} | |||
|
|||
static void rib_meta_queue_gr_run_free(struct meta_queue *mq, struct list *l, | |||
struct zebra_vrf *zvrf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intend to free anything here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's overrated if you ask me. I've fixed it though
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-10450/ This is a comment from an automated CI system. Warnings Generated during build:Checkout code: Successful with additional warnings
|
BGP signals to zebra that a afi has converged immediately after it has finished processing all routes for a given afi/safi. This generates events in zebra in this order a) Routes received from BGP, placed on early-rib Meta-Q b) Signal GR for the afi. Now imagine that zebra reads GR code and immediately processes routes that are in the actual rib and removes some routes. This generates a c) route deletion to the kernel for some number of routes that may be in the the early-rib Meta-Q d) Process the Meta-Q, and re-install the routes This is undesirable behavior in zebra. In that while we may end up in a correct state, there will be a blip for some number of routes that happen to be in the early rib Meta-Q. Modify the GR code to have it's own processing entry at the end of the Meta-Q. This will allow all routes to be processed and ready for handling by the Graceful Restart code. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-10454/ This is a comment from an automated CI system. |
See individual commits but the high points:
a) Zebra was storing afi/safi data for GR but only acting on AFI/SAFI_UNICAST. Remove the redundancy
b) Fix GR code to not accidently stop iterating when a node goes away
c) Fix GR code to process the AFI when it receives the notification that the afi is done
d) Fix GR code to run after anything in Meta-Q since we have a race condition