Add option to enable HA/CARP failover support to the os-netbird plugin by myah-mitchell · Pull Request #5067 · opnsense/plugins

myah-mitchell · 2025-12-04T17:34:47Z

This PR contains the changes required to fix the issue I reported in #5023.

The goal of this PR is to add automated support for CARP failover. When the OPNsense firewall is "MASTER", a CARP syshook runs netbird up, which brings up the NetBird interface and connects the peer to the NetBird network. When an OPNsense firewall is "BACKUP", the same CARP syshook runs netbird down, which disconnects the peer from the NetBird network. This resolves the problems reported in the original issue and allows NetBird to work in an HA OPNsense environment.
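A minimal sketch of the state-to-command mapping described above. It is not the script from this PR. It assumes the hook is invoked with the CARP subsystem as the first argument and the new state (MASTER, BACKUP or INIT) as the second; the function names and the `NETBIRD` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical CARP syshook sketch; not the script from this PR.
# Assumption: invoked as `<script> <vhid@interface> <MASTER|BACKUP|INIT>`.

# Overridable so the logic can be exercised without NetBird installed.
NETBIRD="${NETBIRD:-/usr/local/bin/netbird}"

carp_netbird_action() {
    # Map a CARP state to the netbird subcommand; print nothing for other states.
    case "$1" in
        MASTER) echo "up" ;;
        BACKUP) echo "down" ;;
    esac
}

handle_carp_event() {
    # $1 = subsystem (unused in this sketch), $2 = new CARP state
    action=$(carp_netbird_action "$2")
    [ -n "$action" ] || return 0
    # Discard output so the hook never waits on a reader.
    "$NETBIRD" "$action" >/dev/null 2>&1
}

# When installed as a hook, the script would end with:
# handle_carp_event "$@"
```

States other than MASTER and BACKUP are ignored, so an INIT event leaves NetBird untouched.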

bcmmbaga · 2026-02-12T10:26:30Z

@fichtner can we get this reviewed?

Monviech · 2026-02-12T10:57:14Z

Hello, let me help here.

I recently did the same thing:
#5108

What's also important is not only the transition, but also guarding the service so it cannot start if the current host is not master; otherwise each HA sync will activate it again, even after a transition. The simplest way for me was an rc script condition:

https://github.com/opnsense/ports/blob/bbb4f5e3ba959f762ba4e614d9d8e1880952c686/opnsense/ndp-proxy-go/files/ndp-proxy-go.in#L50-L53
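For illustration, a guard of this kind could look roughly like the sketch below. It is not the linked ndp-proxy-go condition. It assumes that FreeBSD `ifconfig` prints one line per VHID of the form `carp: BACKUP vhid 1 advbase 1 advskew 100`; the function names are hypothetical.

```shell
# Hypothetical helpers; not taken from the linked rc script.

carp_is_backup() {
    # Reads ifconfig-style text on stdin.
    # Succeeds (exit 0) if any CARP VHID is in a state other than MASTER.
    awk '$1 == "carp:" && $2 != "MASTER" { found = 1 } END { exit !found }'
}

netbird_carp_guard() {
    # Returns non-zero when this node should stay disconnected.
    # A host without any CARP VHID is treated as allowed to start.
    if ifconfig -a | carp_is_backup; then
        return 1
    fi
    return 0
}
```

An rc script could call `netbird_carp_guard` from its start path and skip connecting when it fails, so that an HA sync on the backup node does not bring the service back up.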

It's also very important that stdout is not blocking any start or stop here, as we recently had a bug here:
2cc2215

The scripts run serialized, so if one script blocks, it blocks all other scripts during failover for 2-3 minutes.
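As a rough sketch of what "not blocking" means in sh terms: a command started in the background can still hold the caller's stdout open, so a caller that reads that output until end-of-file keeps waiting. Detaching all three standard streams avoids this. The helper name `run_detached` is hypothetical.

```shell
run_detached() {
    # Start the command in the background with stdin, stdout and stderr
    # detached, so the caller waits neither for the command nor for a pipe
    # that the child would otherwise keep open.
    "$@" </dev/null >/dev/null 2>&1 &
}

# Example (hypothetical): run_detached /usr/local/bin/netbird down
```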

If that is all taken into consideration I will review this PR.

bcmmbaga · 2026-02-12T12:18:51Z

Thanks for the feedback! We’ll review and apply your suggestions

myah-mitchell · 2026-02-20T16:05:23Z

I've tested the above updates (minus d56d16c) on 25.10. I don't currently have a test firewall or any installs on 26.1 to test the mwexec to mwexecfm change (d56d16c), but as far as I understand, mwexecfm replaced mwexec in 26.1.

I ended up using a postcmd instead of a precmd because Netbird should be running on the secondary firewall; we are just ensuring that Netbird is in a down (netbird down) state. Let me know if there are any other concerns.

I'll see about getting some firewalls spun up on 26.1 soon if no one else has ones they can test this on.

myah-mitchell · 2026-02-20T23:22:47Z

After some more testing, I've determined that this still is not a full solution. Running netbird up/down actively creates/removes the wt0 interface. Without a "configctl filter reload" afterwards, firewall rules applied to the Netbird interface do not take effect and traffic is blocked. I'll continue working on this next week.

fichtner · 2026-02-24T14:58:59Z

@myah-mitchell thanks for the update, keep us posted :)

myah-mitchell · 2026-02-26T20:47:14Z

The updated NetBird syshook now uses lock files to ensure that netbird up or netbird down is only run once per CARP state change. So, whether there is just one CARP address or 100 CARP addresses, this code only runs once when changing from MASTER to BACKUP or BACKUP to MASTER.
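A minimal sketch of the run-once idea, not the code in the commit: it uses `mkdir` as the lock primitive, since creating a directory is atomic and only one of several concurrent callers succeeds. The lock path and function names are hypothetical.

```shell
LOCK_DIR="${LOCK_DIR:-/tmp/netbird_carp.lock}"   # hypothetical path

acquire_once() {
    # mkdir is atomic: exactly one concurrent caller creates the directory.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null || true
}

run_once() {
    # Run "$@" only if this caller wins the lock; other callers return at once.
    acquire_once || return 0
    status=0
    "$@" || status=$?
    release_lock
    return "$status"
}

# Example (hypothetical): run_once /usr/local/bin/netbird up
```

In this sketch the lock is held only while the command runs, so events that arrive after it finishes trigger it again; a real hook would also need to decide when a finished state change may be followed by the next one.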

This was needed specifically on the netbird up side as we also need to reload the packet filter after the wt0 interface is created. This requires waiting a second or two after calling netbird up before running configctl filter reload. In my testing if I let each CARP state change reload the filter, we ended up with multiple running at the same time and the filter ended up in a broken state until manually reloaded.

This means that the script does cause a blocking state for a few seconds (up to 10 seconds) once per group of CARP state changes. I did not think the alternative of something like the following was a better option.

                mwexecfb(
                    // Single-quote the sh -c payload so the calling shell does not expand $i early.
                    "/bin/sh -c '"
                    . '/usr/local/bin/netbird up;'
                    . ' i=0; while [ $i -lt 10 ]; do'
                    . '   if [ -e /dev/wt0 ]; then'
                    . '     /usr/local/sbin/configctl filter reload;'
                    . '     exit 0;'
                    . '   fi;'
                    . '   sleep 1; i=$((i+1));'
                    . ' done;'
                    . "'"
                );

myah-mitchell · 2026-02-26T20:49:50Z

I should note that commit ad8bee6 has been tested on my stack of OPNsense 25.10 units and will be rolled out to our test sites this afternoon/tomorrow. Commit d5e0b8f has not been tested, as I have still not installed 26.1 anywhere.

Monviech · 2026-02-26T20:52:49Z

Is "netbird up" not idempotent? I would like to prevent any locking in the carp syshooks.

If it should only be called once for some reason, here is a non-blocking trampoline example that was recently used:

opnsense/core@5423b72

Though not needing that would be preferred.

myah-mitchell · 2026-02-26T20:58:21Z

As far as I know, yes, it is idempotent. However, the first time netbird up is run, it also creates wt0. Until configctl filter reload is run, none of the firewall rules applied to wt0 take effect. When testing with something like "netbird up; sleep 3; configctl filter reload" the filter was not working correctly and would not pass new traffic until a reload of the filter. Limiting this reload to only once per CARP state change resolved the issue.

I'll take a look at what you linked.

myah-mitchell · 2026-02-26T21:22:23Z

If I'm following, your main concern is that waiting for wt0 and then reloading the filter happens within the carp syshook. So, move those items to their own script that the syshook can call? Is Python the preferred language to use in that case?

Monviech · 2026-02-27T05:29:30Z

Only the logic is important, less the language. You can also use sh or PHP; I don't mind.

And yes, my main concern is any blocking in carp. Please execute as fast as possible, since if all other scripts are blocked, people will come for us asking "why is my failover taking n seconds before my services are back up". :)

myah-mitchell · 2026-02-27T17:44:12Z

Alright, I've moved the blocking code to be in a separate script so that the syshook process is non-blocking.

To prevent delays in NetBird starting after a CARP failover, I've configured the script to run with the first CARP MASTER event instead of the last. On small firewalls with just a few CARP addresses this will save a second or two, but on larger units (we have some firewalls with nearly 100 CARP addresses) this should save much more time, resulting in shorter downtime.

The way I've set up the "debounce" is to trigger on the first event and then ignore any event that occurs within 10 seconds of the previous event, with each event resetting that window. Once 10 seconds have passed without an event, the next MASTER event will trigger the whole process again.
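The debounce can be sketched as follows. This is an illustration under assumptions, not the script in this PR: the timestamp file path, the `WINDOW` variable and the function name are hypothetical, and the read-then-write on the stamp file is not atomic, so a real hook would pair it with a lock.

```shell
STAMP_FILE="${STAMP_FILE:-/tmp/netbird_carp.last}"   # hypothetical path
WINDOW="${WINDOW:-10}"                               # quiet period in seconds

should_trigger() {
    # $1 = current time in epoch seconds (defaults to now).
    now="${1:-$(date +%s)}"
    last=0
    [ -r "$STAMP_FILE" ] && last=$(cat "$STAMP_FILE")
    last="${last:-0}"
    # Every event refreshes the stamp, so a burst keeps extending the window.
    echo "$now" > "$STAMP_FILE"
    # Succeed only if at least WINDOW seconds passed since the previous event.
    [ $((now - last)) -ge "$WINDOW" ]
}

# Example (hypothetical): should_trigger && run_the_background_script
```

With a 10-second window, events at t=1000, 1005 and 1012 trigger only at t=1000, because each event arrives within 10 seconds of the one before it; an event at t=1022 triggers again.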

Let me know if you see any other issues with this or would like any other changes.

Monviech · 2026-02-27T19:37:42Z

Thanks for taking care of this, just tell us when you are finished with testing in your environment.

You have to determine how fragile this is: try to swap between master and backup a few times in a short timeframe, since short flaps like this can happen.

If the service gets stuck in some way, you still have something to fix; if not, it's good (imho).

Add option to enable HA/CARP failover support to the os-netbird plugin

58d14aa

Monviech self-assigned this Feb 12, 2026

myah-mitchell added 2 commits February 20, 2026 09:44

Added carp status check to ensure carp is down on secondary after HA sync. Redirected stdout on carp hook to prevent any potential blocking.

1fe1bff

Changed mwexec to mwexecfm for compatibility with OPNsense 26.1

d56d16c

myah-mitchell mentioned this pull request Feb 20, 2026

os-netbird: Add netbird postcmd carp check for HA support opnsense/ports#259

Open

myah-mitchell mentioned this pull request Feb 20, 2026

os-netbird plugin not following OPNsense HA failover #5023

Open

3 tasks

Set execute (x) bit on scripts.

41260a8

myah-mitchell added 3 commits February 26, 2026 14:00

OPNsense 25.10 compatible syshook that supports lockfiles for NetBird up/down.

ad8bee6

Changed out mwexec with mwexecfm to maintain compatibility with 26.1+

d5e0b8f

Merge branch 'master' of https://github.com/INDIGEX/opnsense-plugins

eb0c211

myah-mitchell and others added 3 commits February 27, 2026 11:26

Updated NetBird plugins scripts folder path to emulate other plugin script folder paths. This change mirrors a commit in OPNsense/ports.

3158f2b

Moved all blocking code into its own script that can be started in the background so syshook runs non-blocking.

04cdc83

Set execute (x) bit on scripts.

17ca54e
