@rcgoodfellow rcgoodfellow commented Jan 19, 2024

This is a staging PR and should most likely be pulled into

Here we simply pass BFD commands through to the underlying mgd daemons on the switches. No attempt is made to add BFD to the database schema or to persist BFD information, as that would likely conflict with #4822.

The purpose of this PR is to set up the scaffolding and API interfaces for BFD to work end-to-end, and to do some interim testing without the benefit of persistence.

Depends on

rcgoodfellow commented Jan 19, 2024

I've discussed this PR with @internet-diglett. What we'd like to do is get this reviewed and merged into main in its partially complete form (without db persistence), and then he'll pick it up in #4822. This avoids building db persistence for BFD that would just have to be torn down and redone in #4822.

This PR now comes with database plumbing and an RPW that manages BFD on the rack switches.
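
For readers unfamiliar with the pattern, here is a minimal sketch of how a reconciling background task might decide what to change on a switch: compare the sessions the database says should exist against the sessions the switch reports, then add or remove the difference. The names `SessionKey`, `Plan`, `reconcile`, and `key` are hypothetical and used only for illustration; this is not the PR's actual code, and the real RPW may work differently.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

// Hypothetical key identifying one BFD session; not the PR's actual type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct SessionKey {
    switch: String,
    remote: IpAddr,
}

// The changes needed to make the switch match the desired state.
#[derive(Debug)]
struct Plan {
    to_add: Vec<SessionKey>,
    to_remove: Vec<SessionKey>,
}

// Sessions that are desired but absent get added; sessions that are
// present but no longer desired get removed.
fn reconcile(desired: &HashSet<SessionKey>, actual: &HashSet<SessionKey>) -> Plan {
    Plan {
        to_add: desired.difference(actual).cloned().collect(),
        to_remove: actual.difference(desired).cloned().collect(),
    }
}

// Illustrative helper for building keys from string literals.
fn key(switch: &str, remote: &str) -> SessionKey {
    SessionKey {
        switch: switch.to_string(),
        remote: remote.parse().expect("valid IP literal"),
    }
}

fn main() {
    let desired: HashSet<SessionKey> =
        vec![key("switch0", "198.51.101.1"), key("switch1", "198.51.101.5")]
            .into_iter()
            .collect();
    let actual: HashSet<SessionKey> =
        vec![key("switch0", "198.51.101.1"), key("switch0", "198.51.101.9")]
            .into_iter()
            .collect();

    let plan = reconcile(&desired, &actual);
    println!("add: {:?}", plan.to_add);
    println!("remove: {:?}", plan.to_remove);
}
```

Computing a diff rather than replaying every command keeps each pass idempotent: running it again after the switch has converged yields an empty plan.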

@rcgoodfellow

Testing notes. On a4x2 I am able to test this as follows.

Set up BFD via Omicron API

oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.1 --required-rx 1000000 --switch switch0
oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.9 --required-rx 1000000 --switch switch0
oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.13 --required-rx 1000000 --switch switch1
oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.5 --required-rx 1000000 --switch switch1
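
As a rough guide to what these parameters imply, the sketch below estimates how long a session takes to be declared down. It assumes `--required-rx` is expressed in microseconds (as intervals are in RFC 5880) and that the negotiated receive interval ends up equal to that value; in practice the interval is negotiated with the peer, so treat this as an approximation. The function name `detection_time` is made up for illustration.

```rust
use std::time::Duration;

// Approximate detection time: the number of consecutive missed packets
// tolerated (detection threshold) times the receive interval.
// Assumption: required_rx_us is in microseconds and is the interval
// actually in effect after negotiation.
fn detection_time(detection_threshold: u8, required_rx_us: u64) -> Duration {
    Duration::from_micros(required_rx_us * u64::from(detection_threshold))
}

fn main() {
    // Values from the commands above: threshold 3, required-rx 1000000.
    let t = detection_time(3, 1_000_000);
    println!("approximate detection time: {:?}", t);
}
```

Under those assumptions, a threshold of 3 with a 1,000,000 microsecond interval gives a detection time of about 3 seconds.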

Query BFD

oxide system networking bfd status
success
[
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.1,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.9,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.13,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.5,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
]

Testing BFD link detection

From the host machine running the falcon topology

pfexec dladm set-linkprop a4x2_g3_sn_vnic7 -p maxbw=0

Note that the BFD session to peer 198.51.101.5 on switch1 is now down.

oxide system networking bfd status
success
[
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.1,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.9,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.13,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.5,
        required_rx: 1000000,
        state: Down,
        switch: Name(
            "switch1",
        ),
    },
]
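
Spotting the one `Down` entry in output like this is easy to miss by eye. The following is a minimal sketch of filtering a status list for sessions that are not up. The `BfdStatus` and `BfdState` types here are simplified stand-ins modeled on the fields printed above (the state names follow RFC 5880), not the actual Omicron API types, and `sessions_not_up` and `sample_statuses` are hypothetical helpers.

```rust
use std::net::IpAddr;

// BFD session states as named in RFC 5880; a simplified stand-in type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BfdState {
    AdminDown,
    Down,
    Init,
    Up,
}

// A reduced version of the status record shown in the output above.
#[derive(Debug, Clone)]
struct BfdStatus {
    peer: IpAddr,
    state: BfdState,
    switch: String,
}

// Return every session whose state is anything other than Up.
fn sessions_not_up(statuses: &[BfdStatus]) -> Vec<&BfdStatus> {
    statuses.iter().filter(|s| s.state != BfdState::Up).collect()
}

// Sample data mirroring the four sessions in the output above.
fn sample_statuses() -> Vec<BfdStatus> {
    let entry = |peer: &str, state: BfdState, switch: &str| BfdStatus {
        peer: peer.parse().expect("valid IP literal"),
        state,
        switch: switch.to_string(),
    };
    vec![
        entry("198.51.101.1", BfdState::Up, "switch0"),
        entry("198.51.101.9", BfdState::Up, "switch0"),
        entry("198.51.101.13", BfdState::Up, "switch1"),
        entry("198.51.101.5", BfdState::Down, "switch1"),
    ]
}

fn main() {
    let statuses = sample_statuses();
    for s in sessions_not_up(&statuses) {
        println!("{} on {} is {:?}", s.peer, s.switch, s.state);
    }
}
```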

restore the link

pfexec dladm reset-linkprop a4x2_g3_sn_vnic7 -p maxbw

All four sessions should be back up.

oxide system networking bfd status
success
[
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.1,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.9,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.13,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.5,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
]

@internet-diglett internet-diglett left a comment

Looks good to me, verified functionality in a4x2. Really cool seeing all of this come together!

impl From<BfdSession> for BfdSessionKey {
    fn from(value: BfdSession) -> Self {
        Self {
            switch: value.switch.parse().unwrap(), //TODO unwrap

Are we handling this in a follow-up issue?
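
One possible shape for that follow-up, sketched under assumptions: replace the infallible `From` with `TryFrom`, so an unparseable switch name becomes an error for the caller to handle instead of a panic. Only the `switch` field and the type names `BfdSession` and `BfdSessionKey` come from the excerpt above; the `remote` field, the `SwitchLocation` enum, and the `String` error type are hypothetical simplifications, and the real types in the PR may differ.

```rust
use std::convert::TryFrom;
use std::net::IpAddr;
use std::str::FromStr;

// Hypothetical parsed form of the switch name.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

impl FromStr for SwitchLocation {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "switch0" => Ok(Self::Switch0),
            "switch1" => Ok(Self::Switch1),
            other => Err(format!("unknown switch: {}", other)),
        }
    }
}

// Simplified stand-in for the record type; `remote` is an assumed field.
struct BfdSession {
    switch: String,
    remote: IpAddr,
}

// Simplified stand-in for the key type.
#[derive(Debug, PartialEq, Eq, Hash)]
struct BfdSessionKey {
    switch: SwitchLocation,
    remote: IpAddr,
}

// Fallible conversion: a bad switch name is returned as an error
// rather than triggering a panic via unwrap().
impl TryFrom<BfdSession> for BfdSessionKey {
    type Error = String;

    fn try_from(value: BfdSession) -> Result<Self, Self::Error> {
        Ok(Self {
            switch: value.switch.parse()?,
            remote: value.remote,
        })
    }
}

fn main() {
    let good = BfdSession {
        switch: "switch0".to_string(),
        remote: "198.51.101.1".parse().expect("valid IP literal"),
    };
    println!("{:?}", BfdSessionKey::try_from(good));

    let bad = BfdSession {
        switch: "not-a-switch".to_string(),
        remote: "198.51.101.1".parse().expect("valid IP literal"),
    };
    println!("{:?}", BfdSessionKey::try_from(bad));
}
```

The trade-off is that every call site must now handle a `Result`, which is the point: the failure becomes visible where it can be logged or reported.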
