feat(fib): multipath-relax — group ECMP by AS_PATH length (ADR-0066) #262

Merged: lance0 merged 2 commits into main from feat/ecmp-multipath-relax on May 24, 2026

@lance0 lance0 commented May 24, 2026

ECMP follow-up 1/4 — multipath-relax (ADR-0066)

First of the four deferred ADR-0066 ECMP follow-ups. Adds the global
[global].multipath_relax knob (default false): unicast ECMP groups equal-cost
candidates by AS_PATH length instead of an exact AS_PATH match, so
equal-length paths through different ASes co-install as multipath. This is FRR's
bgp bestpath as-path multipath-relax.
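
As a rough illustration of the strict-versus-relaxed comparison described above, here is a minimal sketch. The helper name `as_path_matches` is hypothetical, and it assumes each AS_PATH is a flat AS_SEQUENCE of 4-byte ASNs, so "length" is just the element count; the real code may count AS_SET or confederation segments differently.

```rust
/// Hypothetical helper (not rustbgpd's actual function): decides whether two
/// AS_PATHs are close enough to share an ECMP group.
///
/// Assumption: each path is a flat AS_SEQUENCE, so its length is the number
/// of ASNs in the slice.
fn as_path_matches(a: &[u32], b: &[u32], relax: bool) -> bool {
    if relax {
        // multipath-relax: only the AS_PATH length has to agree.
        a.len() == b.len()
    } else {
        // Strict (default): the AS_PATHs must be identical.
        a == b
    }
}
```

With paths `[65002]` and `[65003]`, the strict comparison rejects the pair and the relaxed one accepts it; a path of a different length is rejected either way.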

Why global, not per-[[fib_tables]]

multipath-relax changes grouping, not just the per-table count. The FIB
install-candidate query runs once at the widest maximum_paths and the
result is re-capped per table — so a single grouping pass can't serve tables that
disagree on relax (re-capping reduces count, not grouping). It's therefore a
best-path-wide knob, which also matches FRR (multipath-relax is a bestpath
setting). multipath_equal is only called by the FIB install-candidate handler,
so the scope is contained.
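
The "query once, re-cap per table" argument can be sketched as follows. `FibTable`, `widest_maximum_paths`, and `recap` are hypothetical names used only for illustration, and next hops are modelled as plain labels. The point is that re-capping can only shorten a group that has already been formed, so a per-table relax setting would have nothing to act on.

```rust
/// Hypothetical stand-in for one [[fib_tables]] entry.
struct FibTable {
    maximum_paths: usize,
}

/// The install-candidate query runs once, sized for the table that wants the
/// most paths. With no tables configured this falls back to a single path.
fn widest_maximum_paths(tables: &[FibTable]) -> usize {
    tables.iter().map(|t| t.maximum_paths).max().unwrap_or(1)
}

/// Re-capping truncates the already-grouped next-hop list for one table.
/// It reduces the count; it cannot change which paths were grouped.
fn recap<'a>(group: &'a [&'static str], table: &FibTable) -> &'a [&'static str] {
    let n = table.maximum_paths.max(1).min(group.len());
    &group[..n]
}
```

Given a group of two next hops, a table with `maximum_paths = 1` sees only the first and a table with `maximum_paths = 4` sees both, but neither can ask for a different grouping rule.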

Change

  • GlobalConfig.multipath_relax → FibRuntimeConfig → QueryFibInstallCandidates { relax } → handle_query_fib_install_candidates → multipath_equal(best, other, relax).
  • In multipath_equal, the AS_PATH comparison becomes length-only when relaxed; eBGP/iBGP class homogeneity and all other tie conditions are unchanged.
  • Defaults off → byte-for-byte today's behavior; inert unless a table sets maximum_paths > 1.
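
To make the plumbing above concrete, here is a simplified sketch of how the flag might travel from config to the tie check. The type and function names follow the PR description, but every field, signature, and the `build_query` helper are assumptions for illustration, not the real rustbgpd definitions. Only LOCAL_PREF, eBGP/iBGP class, and AS_PATH are modelled; the real `multipath_equal` checks more tie conditions.

```rust
// All definitions below are hypothetical stand-ins for illustration.
#[derive(Clone, Copy)]
struct GlobalConfig {
    multipath_relax: bool, // [global].multipath_relax, default false
}

struct FibRuntimeConfig {
    multipath_relax: bool,
}

struct QueryFibInstallCandidates {
    max_paths: usize,
    relax: bool,
}

struct Path {
    next_hop: &'static str,
    local_pref: u32,
    as_path: Vec<u32>, // assumed flat AS_SEQUENCE
    is_ebgp: bool,
}

/// Simplified tie check: the AS_PATH comparison is length-only when relaxed;
/// LOCAL_PREF and eBGP/iBGP class must match in both modes.
fn multipath_equal(best: &Path, other: &Path, relax: bool) -> bool {
    let as_path_ok = if relax {
        best.as_path.len() == other.as_path.len()
    } else {
        best.as_path == other.as_path
    };
    best.local_pref == other.local_pref && best.is_ebgp == other.is_ebgp && as_path_ok
}

/// Hypothetical helper: carries the flag GlobalConfig -> FibRuntimeConfig -> query.
fn build_query(global: GlobalConfig, max_paths: usize) -> QueryFibInstallCandidates {
    let runtime = FibRuntimeConfig { multipath_relax: global.multipath_relax };
    QueryFibInstallCandidates { max_paths, relax: runtime.multipath_relax }
}

/// Returns the next hops to install: the best path plus every path that ties
/// with it, capped at max_paths. Assumes `paths[0]` is the selected best path.
fn handle_query_fib_install_candidates(
    paths: &[Path],
    q: &QueryFibInstallCandidates,
) -> Vec<&'static str> {
    let Some(best) = paths.first() else { return Vec::new() };
    paths
        .iter()
        .filter(|p| multipath_equal(best, p, q.relax))
        .take(q.max_paths.max(1))
        .map(|p| p.next_hop)
        .collect()
}
```

With two eBGP paths of equal length through AS 65002 and AS 65003 and `max_paths = 2`, this model yields one next hop in strict mode and two with relax set. Dropping the second path collapses the result back to the survivor.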

Tests

  • multipath_equal_exact_as_path_strict_vs_relax + relax assertions across the existing grouping tests (LOCAL_PREF / length / class boundaries still disqualify under relax).
  • fib_install_candidates_multipath_relax_groups_different_as_paths — handler-level: two different-AS, equal-length paths install 1 next-hop strict, 2 next-hops relaxed.
  • multipath_relax_parses_and_defaults_false (config round-trip), m52_interop_config_enables_multipath_relax_with_mixed_asns (config pin).
  • Interop M52 (tests/interop/m52-fib-ecmp-relax-frr.clab.yml): two FRR peers in different ASes (65002/65003), equal AS_PATH length; with relax rustbgpd installs a two-way kernel ECMP route, collapses to the survivor on withdraw, restores on re-advertise. Wired into kernel-dataplane CI.

Docs: CONFIGURATION.md, COMPARISON.md footnote, CHANGELOG.md, INTEROP.md, milestones.md, runner doc.

Next in the thread: per-class maximum_paths_ebgp/ibgp, Link Bandwidth ext-community parse, weighted/unequal-cost multipath (ADR).

New global `[global].multipath_relax` (default false) relaxes unicast ECMP
grouping from an exact AS_PATH match to AS_PATH-length equality, so equal-length
paths through different ASes co-install as multipath — FRR's
`bgp bestpath as-path multipath-relax`.

It is a best-path-wide knob, not per-table: the FIB install-candidate query
groups once at the widest maximum_paths, so a single relax setting keeps grouping
consistent (and it matches FRR, where multipath-relax is a bestpath setting).
The flag threads GlobalConfig → FibRuntimeConfig → QueryFibInstallCandidates →
multipath_equal, where the AS_PATH check becomes length-only when set. eBGP/iBGP
class homogeneity and every other best-path tie condition are unchanged; inert
unless a [[fib_tables]] sets maximum_paths > 1.

Tests: best_path strict-vs-relax grouping, handler relax (different-AS paths
group only with relax), config round-trip, and interop M52
(tests/interop/m52-fib-ecmp-relax-frr.clab.yml — two FRR peers in different ASes,
equal AS_PATH length; relax installs two-way kernel ECMP, collapses on withdraw,
restores on re-advertise) wired into kernel-dataplane CI. Docs: CONFIGURATION,
COMPARISON footnote, CHANGELOG, INTEROP, milestones, runner.
@lance0 lance0 requested a review from Copilot May 24, 2026 23:44
Copilot AI left a comment

Pull request overview

Adds ADR-0066 “multipath-relax” support to unicast ECMP: a new global [global].multipath_relax flag (default false) that relaxes ECMP grouping from exact AS_PATH equality to AS_PATH-length equality, enabling equal-length paths through different ASes to co-install as multipath.

Changes:

  • Thread multipath_relax from config (GlobalConfig) into FIB runtime and the RIB install-candidate query, and apply it in multipath_equal(..., relax).
  • Add unit/integration coverage for strict vs relaxed grouping and a new M52 FRR interop topology + CI job.
  • Update documentation and changelog to describe the new knob and interop milestone.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `crates/rib/src/best_path.rs` | Add relax parameter to multipath_equal and implement AS_PATH length-only equality when enabled; update tests. |
| `crates/rib/src/manager/mod.rs` | Plumb relax from RibUpdate::QueryFibInstallCandidates into multipath_equal. |
| `crates/rib/src/manager/tests.rs` | Add handler-level test verifying strict vs relaxed install-candidate grouping for different AS_PATHs of equal length. |
| `crates/rib/src/update.rs` | Extend QueryFibInstallCandidates with relax: bool. |
| `src/fib_runtime.rs` | Add multipath_relax to FibRuntimeConfig and pass it into the RIB install-candidate query; update tests/stubs. |
| `src/main.rs` | Populate FibRuntimeConfig.multipath_relax from config.global.multipath_relax; update config literals in tests/examples in-file. |
| `src/config/schema.rs` | Add [global].multipath_relax to the config schema with default false and inline docs. |
| `src/config/tests.rs` | Add parse/default test for multipath_relax and pin the new M52 interop fixture expectations. |
| `src/peer_manager/mod.rs` | Update Global config literal to include multipath_relax: false. |
| `src/peer_manager/tests.rs` | Update test config literal to include multipath_relax: false. |
| `tests/interop/m52-fib-ecmp-relax-frr.clab.yml` | New containerlab topology: rustbgpd peered with two FRR nodes in different ASes. |
| `tests/interop/scripts/test-m52-fib-ecmp-relax-frr.sh` | New interop driver asserting kernel ECMP behavior and gRPC next-hop counts under multipath-relax. |
| `tests/interop/configs/rustbgpd-m52-fib-ecmp-relax.toml` | New rustbgpd config fixture enabling multipath_relax and maximum_paths=2 with mixed remote ASNs. |
| `tests/interop/configs/frr-bgpd-m52-relax-1.conf` | New FRR config fixture (AS 65002) originating the test prefix. |
| `tests/interop/configs/frr-bgpd-m52-relax-2.conf` | New FRR config fixture (AS 65003) originating the test prefix. |
| `.github/workflows/kernel-dataplane.yml` | Add M52 job to kernel-dataplane CI workflow. |
| `docs/CONFIGURATION.md` | Document the new [global].multipath_relax knob. |
| `docs/COMPARISON.md` | Update multipath comparison footnote to include multipath-relax support and remaining follow-ups. |
| `docs/INTEROP.md` | Document M52 interop coverage in the interop matrix and kernel-dataplane list. |
| `docs/kernel-dataplane-runner.md` | Update runner docs to include M52 in the suite list. |
| `docs/milestones.md` | Add M52 milestone entry describing the new interop validation. |
| `CHANGELOG.md` | Add an Unreleased entry describing ADR-0066 multipath-relax. |

@lance0 lance0 had a problem deploying to kernel-dataplane-auto May 24, 2026 23:47 — with GitHub Actions Failure
The first M52 run proved the core behavior (relaxed two-way ECMP installs across
the two different-AS paths), but the failover step failed: `no network` on frr2
did not withdraw (its PfxSnt stayed 1), so the route never collapsed. Switch the
failover to `neighbor 10.0.1.1 shutdown` / `no ... shutdown` — a deterministic
session-level withdraw of frr2's contribution — and wait for frr2 to
re-establish on restore. Same FIB collapse/restore coverage, reliably triggered.
@lance0 lance0 deployed to kernel-dataplane-auto May 24, 2026 23:57 — with GitHub Actions Active
@lance0 lance0 merged commit fb09ff8 into main May 24, 2026
39 checks passed
@lance0 lance0 deleted the feat/ecmp-multipath-relax branch May 24, 2026 23:59