-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PIC support in the srv6 VPN scenario. #16879
base: master
Are you sure you want to change the base?
Add PIC support in the srv6 VPN scenario. #16879
Conversation
3c64ad3
to
ad254d2
Compare
ad254d2
to
669850c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments.
The commit should be broken into multiple commits (bgpd, lib, zebra, ...)
https://docs.frrouting.org/projects/dev-guide/en/latest/workflow.html#commit-guidelines
zebra/rt_netlink.c
Outdated
@@ -2289,6 +2289,9 @@ ssize_t netlink_route_multipath_msg_encode(int cmd, struct zebra_dplane_ctx *ctx | |||
|
|||
/* Table corresponding to this route. */ | |||
table_id = dplane_ctx_get_table(ctx); | |||
if (fpm) | |||
table_id = dplane_ctx_get_vrf(ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we overriding the table ID with the VRF ID here? Isn't this a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part uses table_id as VRF lookup in fpmsyncd. This should be placed in the FRR patch of SONiC instead of being modified directly here. I have fixed it.
zebra/main.c
Outdated
@@ -65,6 +65,8 @@ struct mgmt_be_client *mgmt_be_client; | |||
/* Route retain mode flag. */ | |||
int retain_mode = 0; | |||
|
|||
bool fpm_pic_nexthop = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this variable used for? It is set to true here and never changed. It seems that it is unnecessary and can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
zebra/zebra_dplane.c
Outdated
|
||
ctx->u.rinfo.nhe.id = nhe->id; | ||
ctx->u.rinfo.nhe.old_id = 0; | ||
if (nhe->pic_nhe) { | ||
/**/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the empty comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a topotest, and a documentation a bit?
bgpd/bgp_zebra.c
Outdated
@@ -1425,6 +1426,15 @@ static void bgp_zebra_announce_parse_nexthop( | |||
memcpy(&api_nh->seg6_segs[0], sid_tmp, | |||
sizeof(api_nh->seg6_segs[0])); | |||
|
|||
if (mpinfo->peer->connection->status == Established && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use peer_established()
?
bgpd/bgp_zebra.c
Outdated
@@ -1425,6 +1426,15 @@ static void bgp_zebra_announce_parse_nexthop( | |||
memcpy(&api_nh->seg6_segs[0], sid_tmp, | |||
sizeof(api_nh->seg6_segs[0])); | |||
|
|||
if (mpinfo->peer->connection->status == Established && | |||
mpinfo->peer->su_local->sa.sa_family == AF_INET6) { | |||
memcpy(&api_nh->seg6_src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPV6_ADDR_COPY()
lib/nexthop.c
Outdated
@@ -888,6 +891,7 @@ void nexthop_copy_no_recurse(struct nexthop *copy, | |||
memcpy(©->rmap_src, &nexthop->rmap_src, sizeof(nexthop->rmap_src)); | |||
memcpy(©->rmac, &nexthop->rmac, sizeof(nexthop->rmac)); | |||
copy->rparent = rparent; | |||
copy->nh_encap.vni = nexthop->nh_encap.vni; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a separate commit?
lib/zclient.c
Outdated
@@ -2273,6 +2286,11 @@ int zapi_nexthop_from_nexthop(struct zapi_nexthop *znh, | |||
&nh->nh_srv6->seg6_segs->seg[i], | |||
sizeof(struct in6_addr)); | |||
} | |||
if (!sid_zero_ipv6(&nh->nh_srv6->seg6_src)) { | |||
SET_FLAG(znh->flags, ZAPI_NEXTHOP_FLAG_SEG6_SRC); | |||
memcpy(&znh->seg6_src, &nh->nh_srv6->seg6_src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPV6_ADDR_COPY()
lib/nexthop.c
Outdated
@@ -642,6 +642,9 @@ void nexthop_add_srv6_seg6(struct nexthop *nexthop, const struct in6_addr *segs, | |||
for (i = 0; i < num_segs; i++) | |||
memcpy(&nexthop->nh_srv6->seg6_segs->seg[i], &segs[i], | |||
sizeof(struct in6_addr)); | |||
if (segs_src) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please drop {}
for a single statement (everywhere in the code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
zebra/zebra_nhg.c
Outdated
@@ -688,18 +689,26 @@ static void handle_recursive_depend(struct nhg_connected_tree_head *nhg_depends, | |||
depends_add(nhg_depends, depend); | |||
} | |||
|
|||
static bool zebra_need_to_create_pic(struct nexthop *nh) | |||
{ | |||
if (nh && nh->nh_srv6 && !sid_zero(nh->nh_srv6->seg6_segs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be better if the pic configuration was configurable.
by the way; does it make sense to apply this only for SRv6 ?
what about MPLS ? and what about regular IP routes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Previously, we only used the PIC design for the SRv6 scenario. In fact, the PIC design is also used for MPLS and VXLAN scenarios. I will modify this logic to be more generic later. Thank you.
documentation (for outputs) and test (to demonstrate the efficiency) are obviously welcome |
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Sorry for the late reply. I have been busy with another project recently. The documentation and topotests are being supplemented locally. In the next one to two weeks, I will add the documentation and topotests to this PR as a separate commit. Thank you. |
Why closing this and opening a new one? It's not acceptable, can we continue with this PR because it retains the history, comments, etc.? |
669850c
to
941fe35
Compare
|
Sure, because I switched the submission branch and removed some redundant code, and also did a rebase, the changes are quite significant. So, I have resubmitted the PR. I have now pushed the latest modifications to the current branch and reopened the PR. Please help review it. Thank you. |
360bedc
to
39d8687
Compare
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
zebra/zebra_rib.c
Outdated
switch (afi) { | ||
case AFI_IP: | ||
nh = nexthop_from_ipv4(&p->u.prefix4, NULL, zvrf_id(zvrf)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the prefix of a route node is a /24 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, it is indeed not possible to handle non-/32 routes. I have currently found a solution to handle all scenarios, which requires using RNH to find the actual next hop. I will be pushing this modification later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. keep me informed.
pic_nh_lookup.type = ZEBRA_ROUTE_NHG; | ||
pic_nh_lookup.vrf_id = nhe->vrf_id; | ||
SET_FLAG(pic_nh_lookup.flags, NEXTHOP_GROUP_PIC_NHT); | ||
/* the nhg.nexthop is sorted */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think having this flag will be kept in the future NHG, because the PIC NHG may be also used by an other route, and when calling nhe_find, we may stop at https://github.com/FRRouting/frr/blob/master/zebra/zebra_nhg.c#L705, before creating a PIC_NHT context.
If we want to benefit from PIC on current kernel, we have to somehow have this flag set on PIC NHE.
There are some CI issues ( I did not look yet at test results):
|
zlog_debug("%s: pic_nhe %ul become invaild during route %pRN deleted, update pic_nh dependents %ul", | ||
__func__, picnhe->id, rn, rb_node_dep->nhe->id); | ||
UNSET_FLAG(rb_node_dep->nhe->flags, NEXTHOP_GROUP_INSTALLED); | ||
zebra_nhg_install_kernel(rb_node_dep->nhe, ZEBRA_ROUTE_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To illustrate, if 172.31.0.3 is removed, the dependent NHGIDs that use 172.31.0.3 will be removed.
The NHGID will be attempted to be removed, but because there is an other nexthop, the NHE will be installed, but with only the remaining nexthop. Right?
B> 192.0.2.9/32 [200/0] (74) (pic_nh 77) via 192.0.2.5 (vrf default) (recursive), label 500, weight 1, 00:00:08
* via 172.31.0.3, r1-eth1 (vrf default), label 16005/500, weight 1, 00:00:08
* via 172.31.2.4, r1-eth2 (vrf default), label 16005/500, weight 1, 00:00:08
r1# show nexthop-group rib 77
ID: 77 (zebra)
RefCnt: 1
Uptime: 00:00:13
VRF: default(bad-value)
Valid
Depends: (68)
via 192.0.2.5 (vrf default) (recursive), weight 1
via 172.31.0.3, r1-eth1 (vrf default), weight 1
via 172.31.2.4, r1-eth2 (vrf default), weight 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be started from 172.31.0.3 withdraw event to zebra. PIC fix up could be kicked in to fix this NHG via removing the failed path. The NHG still belongs to 172.31.0.3 and 172.31.2.4.
Zebra will inform protocol process (bgp) who owns 192.0.2.9/32 to recalculate the paths. BGP will inform zebra for 192.0.2.9/32 via 172.31.2.4. This would create a new NHG for 172.31.2.4, rebind prefix 192.0.2.9/32 to this new NHG, and remove old NHG for 172.31.0.3 and 172.31.2.4 if no one uses it.
Currently, we use NH address as part of hash key for NHG, so we can't reuse NHG.
@zice312963205 , in reply to our discussion, I think a consistent solution should be found for linux kernel too. I think it is possible to handle the missing nexthop group hierarchy in the linux kernel, by doing a backwalk of all the NHE that use that pic_nhe in the |
04dca9e
to
594caa8
Compare
zebra/main.c
Outdated
{ "v6-rr-semantics", no_argument, NULL, OPTION_V6_RR_SEMANTICS }, | ||
{ "vrfwnetns", no_argument, NULL, 'n' }, | ||
{ "nl-bufsize", required_argument, NULL, 's' }, | ||
{ "v6-rr-semantics", no_argument, NULL, OPTION_V6_RR_SEMANTICS }, | ||
#endif /* HAVE_NETLINK */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it really necessary to change the indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
I didn't get your comments. NHG would be aware it is via v4 NH or via V6 NH. But it would not be aware what type of prefixes would via it. It could be VPNv6 or VPNv6 both via the same NHG, which is v4 NH plus vpn label. In your example, PIC NHG is NHG 46, a.k.a NHG without any context. label NHG would be NHG 45, which contains label context. The purpose for PIC NHG is to allow NH information be sharable. This way, we could have a quick fix up when an NH is lost. |
In the latest changes, I added a picnh_dependents structure. This structure can assist with reverse lookup in the zebra_update_pic_nhe() function to update all NHEs to the kernel. I will help complete this part of the development later. |
1b945ec
to
7140e12
Compare
512d8f1
to
b6d2a9f
Compare
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
b6d2a9f
to
9657333
Compare
This PR implements the separation of nh_srv6-related information, which was originally stored in the nexthop, in the srv6-vpn scenario. It generates a new pic nexthop that contains only the next-hop forwarding information. Meanwhile, the original nexthop containing nh_srv6 is used as pic context information and indexed separately. With this modification, we can converge the nexthop group and update the FIB before notifying other protocols (such as BGP) when a route is withdrawn. This can significantly reduce packet loss duration during link failures. Signed-off-by: hanyu.zly&freddy <hanyu.zly@alibaba-inc.com>
9657333
to
a97f2b3
Compare
8be626a
to
722404b
Compare
…latform by reverse looking up all associated NHEs through pic_nh, and then sending NHG updates to the kernel. Signed-off-by: hanyu.zly <hanyu.zly@alibaba-inc.com>
722404b
to
814d402
Compare
What I did
[HLD] BGP PIC HLD
Why I did it
This PR implements the separation of nh_srv6-related information, which was originally stored in the nexthop, in the srv6-vpn scenario. It generates a new pic nexthop that contains only the next-hop forwarding information. Meanwhile, the original nexthop containing nh_srv6 is used as pic context information and indexed separately.
This is used to achieve fast switching of nexthop group members by identifying the corresponding members in the nexthop group based on route changes, and then performing operations on the members.
This PR is temporarily for review purposes. Documentation and test cases will be added later, and a complete PR will be resubmitted.