Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PIC support in the srv6 VPN scenario. #16879

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

zice312963205
Copy link
Contributor

What I did
[HLD] BGP PIC HLD

Why I did it
This PR implements the separation of nh_srv6-related information, which was originally stored in the nexthop, in the srv6-vpn scenario. It generates a new pic nexthop that contains only the next-hop forwarding information. Meanwhile, the original nexthop containing nh_srv6 is used as pic context information and indexed separately.
This is used to achieve fast switching of nexthop group members by identifying the corresponding members in the nexthop group based on route changes, and then performing operations on the members.

This PR is temporarily for review purposes. Documentation and test cases will be added later, and a complete PR will be resubmitted.

@frrbot frrbot bot added the bgp label Sep 20, 2024
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from 3c64ad3 to ad254d2 Compare September 20, 2024 06:41
@zice312963205 zice312963205 changed the title Add PIC support in the srv6 VPN scenario. [preview]Add PIC support in the srv6 VPN scenario. Sep 20, 2024
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from ad254d2 to 669850c Compare September 20, 2024 09:16
Copy link
Contributor

@cscarpitta cscarpitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.

The commit should be broken into multiple commits (bgpd, lib, zebra, ...)
https://docs.frrouting.org/projects/dev-guide/en/latest/workflow.html#commit-guidelines

@@ -2289,6 +2289,9 @@ ssize_t netlink_route_multipath_msg_encode(int cmd, struct zebra_dplane_ctx *ctx

/* Table corresponding to this route. */
table_id = dplane_ctx_get_table(ctx);
if (fpm)
table_id = dplane_ctx_get_vrf(ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we overriding the table ID with the VRF ID here? Isn't this a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part uses table_id as VRF lookup in fpmsyncd. This should be placed in the FRR patch of SONiC instead of being modified directly here. I have fixed it.

zebra/main.c Outdated
@@ -65,6 +65,8 @@ struct mgmt_be_client *mgmt_be_client;
/* Route retain mode flag. */
int retain_mode = 0;

bool fpm_pic_nexthop = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this variable used for? It is set to true here and never changed. It seems that it is unnecessary and can be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


ctx->u.rinfo.nhe.id = nhe->id;
ctx->u.rinfo.nhe.old_id = 0;
if (nhe->pic_nhe) {
/**/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the empty comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a topotest, and a documentation a bit?

bgpd/bgp_zebra.c Outdated
@@ -1425,6 +1426,15 @@ static void bgp_zebra_announce_parse_nexthop(
memcpy(&api_nh->seg6_segs[0], sid_tmp,
sizeof(api_nh->seg6_segs[0]));

if (mpinfo->peer->connection->status == Established &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use peer_established()?

bgpd/bgp_zebra.c Outdated
@@ -1425,6 +1426,15 @@ static void bgp_zebra_announce_parse_nexthop(
memcpy(&api_nh->seg6_segs[0], sid_tmp,
sizeof(api_nh->seg6_segs[0]));

if (mpinfo->peer->connection->status == Established &&
mpinfo->peer->su_local->sa.sa_family == AF_INET6) {
memcpy(&api_nh->seg6_src,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPV6_ADDR_COPY()

lib/nexthop.c Outdated
@@ -888,6 +891,7 @@ void nexthop_copy_no_recurse(struct nexthop *copy,
memcpy(&copy->rmap_src, &nexthop->rmap_src, sizeof(nexthop->rmap_src));
memcpy(&copy->rmac, &nexthop->rmac, sizeof(nexthop->rmac));
copy->rparent = rparent;
copy->nh_encap.vni = nexthop->nh_encap.vni;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a separate commit?

lib/zclient.c Outdated
@@ -2273,6 +2286,11 @@ int zapi_nexthop_from_nexthop(struct zapi_nexthop *znh,
&nh->nh_srv6->seg6_segs->seg[i],
sizeof(struct in6_addr));
}
if (!sid_zero_ipv6(&nh->nh_srv6->seg6_src)) {
SET_FLAG(znh->flags, ZAPI_NEXTHOP_FLAG_SEG6_SRC);
memcpy(&znh->seg6_src, &nh->nh_srv6->seg6_src,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPV6_ADDR_COPY()

lib/nexthop.c Outdated
@@ -642,6 +642,9 @@ void nexthop_add_srv6_seg6(struct nexthop *nexthop, const struct in6_addr *segs,
for (i = 0; i < num_segs; i++)
memcpy(&nexthop->nh_srv6->seg6_segs->seg[i], &segs[i],
sizeof(struct in6_addr));
if (segs_src) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please drop {} for a single statement (everywhere in the code).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -688,18 +689,26 @@ static void handle_recursive_depend(struct nhg_connected_tree_head *nhg_depends,
depends_add(nhg_depends, depend);
}

static bool zebra_need_to_create_pic(struct nexthop *nh)
{
if (nh && nh->nh_srv6 && !sid_zero(nh->nh_srv6->seg6_segs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be better if the pic configuration was configurable.
by the way; does it make sense to apply this only for SRv6 ?
what about MPLS ? and what about regular IP routes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Previously, we only used the PIC design for the SRv6 scenario. In fact, the PIC design is also used for MPLS and VXLAN scenarios. I will modify this logic to be more generic later. Thank you.

@pguibert6WIND
Copy link
Member

documentation (for outputs) and test (to demonstrate the efficiency) are obviously welcome

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zice312963205
Copy link
Contributor Author

documentation (for outputs) and test (to demonstrate the efficiency) are obviously welcome

Sorry for the late reply. I have been busy with another project recently. The documentation and topotests are being supplemented locally. In the next one to two weeks, I will add the documentation and topotests to this PR as a separate commit. Thank you.

@zice312963205
Copy link
Contributor Author

#17316

@ton31337
Copy link
Member

Why closing this and opening a new one? It's not acceptable, can we continue with this PR because it retains the history, comments, etc.?

@zice312963205 zice312963205 reopened this Nov 1, 2024
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from 669850c to 941fe35 Compare November 1, 2024 01:59
@frrbot frrbot bot added the zebra label Nov 1, 2024
@github-actions github-actions bot removed the conflicts label Nov 1, 2024
Copy link

mergify bot commented Nov 1, 2024

⚠️ The sha of the head commit of this PR conflicts with #17316. Mergify cannot evaluate rules on this PR. ⚠️

@zice312963205 zice312963205 changed the title [preview]Add PIC support in the srv6 VPN scenario. Add PIC support in the srv6 VPN scenario. Nov 1, 2024
@zice312963205
Copy link
Contributor Author

Why closing this and opening a new one? It's not acceptable, can we continue with this PR because it retains the history, comments, etc.?

Sure, because I switched the submission branch and removed some redundant code, and also did a rebase, the changes are quite significant. So, I have resubmitted the PR. I have now pushed the latest modifications to the current branch and reopened the PR. Please help review it. Thank you.

@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch 4 times, most recently from 360bedc to 39d8687 Compare November 1, 2024 10:05
Copy link

github-actions bot commented Nov 4, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

switch (afi) {
case AFI_IP:
nh = nexthop_from_ipv4(&p->u.prefix4, NULL, zvrf_id(zvrf));

Copy link
Member

@pguibert6WIND pguibert6WIND Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the prefix of a route node is a /24 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR, it is indeed not possible to handle non-/32 routes. I have currently found a solution to handle all scenarios, which requires using RNH to find the actual next hop. I will be pushing this modification later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. keep me informed.

pic_nh_lookup.type = ZEBRA_ROUTE_NHG;
pic_nh_lookup.vrf_id = nhe->vrf_id;
SET_FLAG(pic_nh_lookup.flags, NEXTHOP_GROUP_PIC_NHT);
/* the nhg.nexthop is sorted */
Copy link
Member

@pguibert6WIND pguibert6WIND Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think having this flag will be kept in the future NHG, because the PIC NHG may be also used by an other route, and when calling nhe_find, we may stop at https://github.com/FRRouting/frr/blob/master/zebra/zebra_nhg.c#L705, before creating a PIC_NHT context.

If we want to benefit from PIC on current kernel, we have to somehow have this flag set on PIC NHE.

@pguibert6WIND
Copy link
Member

There are some CI issues ( I did not look yet at test results):

zlog_debug("%s: pic_nhe %ul become invaild during route %pRN deleted, update pic_nh dependents %ul",
__func__, picnhe->id, rn, rb_node_dep->nhe->id);
UNSET_FLAG(rb_node_dep->nhe->flags, NEXTHOP_GROUP_INSTALLED);
zebra_nhg_install_kernel(rb_node_dep->nhe, ZEBRA_ROUTE_MAX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To illustrate, if 172.31.0.3 is removed, the dependent NHGIDs that use 172.31.0.3 will be removed.
The NHGID will be attempted to be removed, but because there is an other nexthop, the NHE will be installed, but with only the remaining nexthop. Right?

B>  192.0.2.9/32 [200/0] (74) (pic_nh 77) via 192.0.2.5 (vrf default) (recursive), label 500, weight 1, 00:00:08
  *                                         via 172.31.0.3, r1-eth1 (vrf default), label 16005/500, weight 1, 00:00:08
  *                                         via 172.31.2.4, r1-eth2 (vrf default), label 16005/500, weight 1, 00:00:08
r1# show nexthop-group rib 77
ID: 77 (zebra)
     RefCnt: 1
     Uptime: 00:00:13
     VRF: default(bad-value)
     Valid
     Depends: (68)
        via 192.0.2.5 (vrf default) (recursive), weight 1
           via 172.31.0.3, r1-eth1 (vrf default), weight 1
           via 172.31.2.4, r1-eth2 (vrf default), weight 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be started from 172.31.0.3 withdraw event to zebra. PIC fix up could be kicked in to fix this NHG via removing the failed path. The NHG still belongs to 172.31.0.3 and 172.31.2.4.

Zebra will inform protocol process (bgp) who owns 192.0.2.9/32 to recalculate the paths. BGP will inform zebra for 192.0.2.9/32 via 172.31.2.4. This would create a new NHG for 172.31.2.4, rebind prefix 192.0.2.9/32 to this new NHG, and remove old NHG for 172.31.0.3 and 172.31.2.4 if no one uses it.

Currently, we use NH address as part of hash key for NHG, so we can't reuse NHG.

@pguibert6WIND
Copy link
Member

@zice312963205 , in reply to our discussion, I think a consistent solution should be found for linux kernel too. I think it is possible to handle the missing nexthop group hierarchy in the linux kernel, by doing a backwalk of all the NHE that use that pic_nhe in the zebra_update_pic_nhe() function.

@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from 04dca9e to 594caa8 Compare January 13, 2025 07:35
zebra/main.c Outdated
{ "v6-rr-semantics", no_argument, NULL, OPTION_V6_RR_SEMANTICS },
{ "vrfwnetns", no_argument, NULL, 'n' },
{ "nl-bufsize", required_argument, NULL, 's' },
{ "v6-rr-semantics", no_argument, NULL, OPTION_V6_RR_SEMANTICS },
#endif /* HAVE_NETLINK */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it really necessary to change the indentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@eddieruan-alibaba
Copy link

When PIC is used, it seems the PIC nexthop is not aware if it should be used for ipv4 or ipv6.
This is the case for NHG 46, but not for NHG45.

I didn't get your comments. NHG would be aware it is via v4 NH or via V6 NH. But it would not be aware what type of prefixes would via it. It could be VPNv6 or VPNv6 both via the same NHG, which is v4 NH plus vpn label.

In your example, PIC NHG is NHG 46, a.k.a NHG without any context. label NHG would be NHG 45, which contains label context. The purpose for PIC NHG is to allow NH information be sharable. This way, we could have a quick fix up when an NH is lost.

@zice312963205
Copy link
Contributor Author

@zice312963205 , in reply to our discussion, I think a consistent solution should be found for linux kernel too. I think it is possible to handle the missing nexthop group hierarchy in the linux kernel, by doing a backwalk of all the NHE that use that pic_nhe in the zebra_update_pic_nhe() function.

In the latest changes, I added a picnh_dependents structure. This structure can assist with reverse lookup in the zebra_update_pic_nhe() function to update all NHEs to the kernel. I will help complete this part of the development later.

@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch 2 times, most recently from 1b945ec to 7140e12 Compare January 16, 2025 14:41
@frrbot frrbot bot added libfrr tests Topotests, make check, etc labels Jan 16, 2025
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch 2 times, most recently from 512d8f1 to b6d2a9f Compare January 16, 2025 14:54
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from b6d2a9f to 9657333 Compare January 22, 2025 09:26
    This PR implements the separation of nh_srv6-related information, which was originally stored in the nexthop, in the srv6-vpn scenario. It generates a new pic nexthop that contains only the next-hop forwarding information. Meanwhile, the original nexthop containing nh_srv6 is used as pic context information and indexed separately.

    With this modification, we can converge the nexthop group and update the FIB before notifying other protocols (such as BGP) when a route is withdrawn. This can significantly reduce packet loss duration during link failures.

    Signed-off-by: hanyu.zly&freddy <hanyu.zly@alibaba-inc.com>
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from 9657333 to a97f2b3 Compare January 22, 2025 09:33
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch 3 times, most recently from 8be626a to 722404b Compare February 6, 2025 12:33
…latform by reverse looking up all associated NHEs through pic_nh, and then sending NHG updates to the kernel.

Signed-off-by: hanyu.zly <hanyu.zly@alibaba-inc.com>
@zice312963205 zice312963205 force-pushed the project-phoenixwing-freddy branch from 722404b to 814d402 Compare February 6, 2025 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bgp libfrr master rebase PR needs rebase size/XL tests Topotests, make check, etc zebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants