Wire up external API for attaching external subnets #9781

Open
bnaecker wants to merge 5 commits into main from wire-up-external-subnet-attach-api

@bnaecker bnaecker commented Feb 3, 2026

- Add APIs to the sled agent for attaching and detaching either a single
  subnet on an instance, or setting / clearing the entire set for an
  instance.
- Add list of attached subnets in the instance-creation request body,
  and fill that in from Nexus with the (currently-empty) set of attached
  subnets for the target instance.
- Plumb attachment requests all the way through the sled-agent internals
  to the new APIs in OPTE.
- Add mapping of attached subnets per-instance to the simulated sled
  agent for testing.
- Fixes #9702
- Adds a subnet attach and detach saga in Nexus, modeled after the
  existing floating IP attachment sagas.
- Update the common code for sagas to include passing or removing the
  attached subnets to Dendrite and / or OPTE. This is to pick up those
  changes during the existing instance sagas, e.g. instance update.
- Fixes #9685
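
The per-instance mapping and the attach / detach / set / clear operations described in the commit messages above can be pictured with a small in-memory model. This is only a sketch under assumptions: the type and method names (`AttachedSubnets`, `attach`, `detach`, `set_all`, `clear`, `list`) are hypothetical, instance IDs and subnets are simplified to `u32` and string slices, and treating a repeated attach as a no-op is an assumption rather than something the PR states.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins; real code would use typed UUIDs and IP network types.
type InstanceId = u32;
type Subnet = &'static str;

/// Hypothetical model of "attached subnets per instance".
#[derive(Default)]
struct AttachedSubnets {
    by_instance: BTreeMap<InstanceId, BTreeSet<Subnet>>,
}

impl AttachedSubnets {
    /// Attach one subnet; returns true if it was newly attached.
    fn attach(&mut self, instance: InstanceId, subnet: Subnet) -> bool {
        self.by_instance.entry(instance).or_default().insert(subnet)
    }

    /// Detach one subnet; returns false if it was not attached.
    fn detach(&mut self, instance: InstanceId, subnet: Subnet) -> bool {
        match self.by_instance.get_mut(&instance) {
            Some(set) => set.remove(&subnet),
            None => false,
        }
    }

    /// Replace the entire set of subnets for an instance.
    fn set_all(&mut self, instance: InstanceId, subnets: impl IntoIterator<Item = Subnet>) {
        self.by_instance.insert(instance, subnets.into_iter().collect());
    }

    /// Remove every subnet attached to an instance.
    fn clear(&mut self, instance: InstanceId) {
        self.by_instance.remove(&instance);
    }

    /// List the subnets attached to an instance, in sorted order.
    fn list(&self, instance: InstanceId) -> Vec<Subnet> {
        self.by_instance
            .get(&instance)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut subnets = AttachedSubnets::default();
    subnets.attach(1, "172.20.28.96/30");
    println!("instance 1: {:?}", subnets.list(1));
    println!("detach from unknown instance: {}", subnets.detach(2, "172.20.28.96/30"));
    subnets.clear(1);
    println!("after clear: {:?}", subnets.list(1));
}
```

Keeping single-subnet and whole-set operations on the same map mirrors the two API shapes the description mentions: incremental attach / detach, and a full replacement of the set.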

bnaecker commented Feb 3, 2026

Stacked on #9780

@bnaecker bnaecker added this to the 18 milestone Feb 3, 2026
- Add a background task to Nexus that periodically pushes all attached
  subnets to Dendrite and the sled-agents / OPTE.
- Add task output to `omdb`.
- Fixes #9581 and fixes #9582
@bnaecker bnaecker force-pushed the nexus-attached-subnet-bg-task branch from 2867c38 to 8e62a71 Compare February 3, 2026 06:35
- Plumb the existing app layer code to the new attach / detach sagas
- Add a bunch of integration tests confirming the new API behavior
- Closes #9453
@bnaecker bnaecker force-pushed the wire-up-external-subnet-attach-api branch from fe3ec22 to 7f2f485 Compare February 3, 2026 06:36

bnaecker commented Feb 3, 2026

Ok, I've taken this for a spin on madrid for another smoke test of the whole stack of PRs here. I'm going to write up testing notes as I go. I'm not 100% certain this works all the way through yet, but here's what I have so far.

I installed the TUF repo from 7f2f485 using rkadm on madrid. That went through the full installation and RSS using the stock config file at /opt/rackletteadm/configs/london/config-rkadm.toml.

I then used a combination of the CLI and console to:

  • create an IPv4 and IPv6 pool (the v6 isn't used; this just works around the API bugs I'm fixing elsewhere)
  • create a project attached-subnet-test
  • create an Alpine Linux image and disk, and boot an instance from it

I then created a subnet pool and member:

bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ./target/release/oxide --profile madrid subnet-pool list
WARNING: 644 permissions on "/Users/bnaecker/.config/oxide/credentials.toml" may allow other users to access your login credentials.
[
  {
    "description": "",
    "id": "3aa266d0-3663-4450-8cfd-9821d8eaa072",
    "ip_version": "v4",
    "name": "test-subnet-pool",
    "time_created": "2026-02-03T18:33:07.238989Z",
    "time_modified": "2026-02-03T18:41:37.963659Z"
  }
]

bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ./target/release/oxide --profile madrid subnet-pool member list --pool test-subnet-pool
WARNING: 644 permissions on "/Users/bnaecker/.config/oxide/credentials.toml" may allow other users to access your login credentials.
[
  {
    "id": "1ac00346-4751-4ef1-a9cc-2d1115e98e0f",
    "max_prefix_length": 32,
    "min_prefix_length": 28,
    "subnet": "172.20.28.96/28",
    "subnet_pool_id": "3aa266d0-3663-4450-8cfd-9821d8eaa072",
    "time_created": "2026-02-03T18:36:03.404879Z"
  }
]

And an external subnet out of that:

./target/release/oxide --profile madrid api -X POST /v1/external-subnets?project=attached-subnet-test --input - <<EOF
{ "name" : "sub0", "description" : "test subnet", "allocator" : { "type" : "auto", "prefix_len" : 30, "pool_selector" : { "type" : "auto" } } }
EOF

You can actually already use the CLI to do this with `oxide external-subnet create`; I just didn't know that at this point. Then I attached it to the instance:

bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ./target/release/oxide --profile madrid external-subnet attach --external-subnet sub0 --instance attached-subnet-test --project attached-subnet-test
WARNING: 644 permissions on "/Users/bnaecker/.config/oxide/credentials.toml" may allow other users to access your login credentials.
{
  "description": "test subnet",
  "id": "bcda3c3e-38d8-4123-b24f-beebe91cf14d",
  "instance_id": "571e525e-f5ef-4264-a08e-adc7ee19ce15",
  "name": "sub0",
  "project_id": "76ed4350-6384-411e-a1a1-b10d8c331e28",
  "subnet": "172.20.28.96/30",
  "subnet_pool_id": "3aa266d0-3663-4450-8cfd-9821d8eaa072",
  "subnet_pool_member_id": "1ac00346-4751-4ef1-a9cc-2d1115e98e0f",
  "time_created": "2026-02-03T18:41:37.961541Z",
  "time_modified": "2026-02-03T19:41:35.409393Z"
}
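
For readers following the numbers: the pool member is 172.20.28.96/28 with prefix lengths 28 through 32 allowed, the request asked for a /30, and the result was 172.20.28.96/30. A minimal sketch of that kind of carve-out is below. It is not the allocator Nexus uses; `Net`, `allocate`, and the first-fit strategy are assumptions made for illustration, and the member's base address is assumed to be aligned to its own prefix.

```rust
use std::net::Ipv4Addr;

/// Hypothetical IPv4 network type: base address plus prefix length.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Net {
    addr: Ipv4Addr,
    prefix: u8,
}

impl Net {
    /// Number of addresses covered by this network.
    fn size(&self) -> u64 {
        1u64 << (32 - u32::from(self.prefix))
    }

    /// Base address as an integer.
    fn start(&self) -> u64 {
        u64::from(u32::from(self.addr))
    }

    /// True if the two address ranges intersect.
    fn overlaps(&self, other: &Net) -> bool {
        self.start() < other.start() + other.size()
            && other.start() < self.start() + self.size()
    }
}

/// First-fit sketch: return the first aligned block of length `want` inside
/// `member` that does not overlap anything in `used`.
fn allocate(member: Net, min_prefix: u8, max_prefix: u8, want: u8, used: &[Net]) -> Option<Net> {
    // The requested length must be allowed by the member and fit inside it.
    if want < min_prefix || want > max_prefix || want < member.prefix || want > 32 {
        return None;
    }
    let step = 1u64 << (32 - u32::from(want));
    let end = member.start() + member.size();
    let mut start = member.start();
    while start < end {
        let candidate = Net { addr: Ipv4Addr::from(start as u32), prefix: want };
        if !used.iter().any(|u| u.overlaps(&candidate)) {
            return Some(candidate);
        }
        start += step;
    }
    None
}

fn main() {
    let member = Net { addr: Ipv4Addr::new(172, 20, 28, 96), prefix: 28 };
    let first = allocate(member, 28, 32, 30, &[]);
    println!("first /30: {:?}", first);
    let second = allocate(member, 28, 32, 30, &[first.unwrap()]);
    println!("next /30: {:?}", second);
}
```

With an empty `used` list, the first /30 in 172.20.28.96/28 is 172.20.28.96/30, which matches the `subnet` field in the output above.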

It shows up in the list of attached subnets for the instance now:

bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ./target/release/oxide --profile madrid instance external-subnet list --instance 571e525e-f5ef-4264-a08e-adc7ee19ce15
WARNING: 644 permissions on "/Users/bnaecker/.config/oxide/credentials.toml" may allow other users to access your login credentials.
{
  "items": [
    {
      "description": "test subnet",
      "id": "bcda3c3e-38d8-4123-b24f-beebe91cf14d",
      "instance_id": "571e525e-f5ef-4264-a08e-adc7ee19ce15",
      "name": "sub0",
      "project_id": "76ed4350-6384-411e-a1a1-b10d8c331e28",
      "subnet": "172.20.28.96/30",
      "subnet_pool_id": "3aa266d0-3663-4450-8cfd-9821d8eaa072",
      "subnet_pool_member_id": "1ac00346-4751-4ef1-a9cc-2d1115e98e0f",
      "time_created": "2026-02-03T18:41:37.961541Z",
      "time_modified": "2026-02-03T19:41:35.409393Z"
    }
  ]
}

The background task for this appears to be running, correctly propagating the subnet to Dendrite and OPTE:

ben@castle /data/local/env/madrid/ben-update-2026-02-03 $ pilot -r madrid tp login 0
The illumos Project     helios-2.0.23751        February 2026
root@oxz_switch0:~# omdb nexus background-tasks show attached_subnet_manager
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd29:f10f:c4de:104::5]:12232
task: "attached_subnet_manager"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 105, triggered by a periodic timer firing
    started at 2026-02-03T19:59:21.893Z (25s ago) and ran for 1508ms
   dendrite instance on switch switch0
     n_subnets_removed=0
     n_subnets_added=0
     n_subnets_total=1
   dendrite instance on switch switch1
     n_subnets_removed=0
     n_subnets_added=0
     n_subnets_total=1
   sled 4bac49a0-0537-4178-8445-e42bb081b806
     n_subnets=1

root@oxz_switch0:~# omdb db instance list
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd29:f10f:c4de:102::3]:32221,[fd29:f10f:c4de:103::3]:32221,[fd29:f10f:c4de:104::4]:32221,[fd29:f10f:c4de:101::3]:32221,[fd29:f10f:c4de:104::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (228.0.0)
ID                                   STATE   INTENT  PROPOLIS_ID                          SLED_ID                              HOST_SERIAL NAME
571e525e-f5ef-4264-a08e-adc7ee19ce15 running running 22ebd256-87c8-4d7b-a4a3-554087b84825 4bac49a0-0537-4178-8445-e42bb081b806 BRM42220081 attached-subnet-test
root@oxz_switch0:~# swadm attached-subnet list
Attached Subnet  Internal IP            Inner MAC          VNI
172.20.28.96/30  fd29:f10f:c4de:104::1  a8:40:25:f0:00:00  16029899
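
The `n_subnets_added` / `n_subnets_removed` / `n_subnets_total` counters in the task output read like the result of a set reconciliation: compare what the database says should be attached with what the switch currently has, then push the difference. The sketch below shows that pattern only; the `reconcile` function and `Counts` struct are hypothetical, subnets are plain strings, and nothing here calls Dendrite or OPTE.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-target counters, named after the fields in the omdb output.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    n_subnets_added: usize,
    n_subnets_removed: usize,
    n_subnets_total: usize,
}

/// Make `actual` match `desired` and report what changed.
fn reconcile(desired: &BTreeSet<String>, actual: &mut BTreeSet<String>) -> Counts {
    // Collect the differences first so `actual` is not borrowed while mutated.
    let to_add: Vec<String> = desired.difference(&*actual).cloned().collect();
    let to_remove: Vec<String> = actual.difference(desired).cloned().collect();

    for subnet in &to_remove {
        actual.remove(subnet);
    }
    for subnet in &to_add {
        actual.insert(subnet.clone());
    }

    Counts {
        n_subnets_added: to_add.len(),
        n_subnets_removed: to_remove.len(),
        n_subnets_total: actual.len(),
    }
}

fn main() {
    let desired: BTreeSet<String> = ["172.20.28.96/30".to_string()].into_iter().collect();
    let mut on_switch = BTreeSet::new();

    // First pass pushes the missing subnet; later passes are no-ops.
    println!("{:?}", reconcile(&desired, &mut on_switch));
    println!("{:?}", reconcile(&desired, &mut on_switch));
}
```

In this model a steady-state activation reports zero added, zero removed, and a total of one, the same shape as the counters shown for iteration 105.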

Next I'm going to see if we can actually transit packets through this thing.

@rcgoodfellow rcgoodfellow left a comment

LGTM

@mergeconflict mergeconflict left a comment

Woohoo!

const INSTANCE_NAME: &str = "test-instance";

#[nexus_test]
async fn test_instance_external_subnet_list_empty(

Contributor

It's a little hard for me to tell: is this old test subsumed by the new cannot_detach_subnet_that_is_not_attached?

Collaborator Author

I think so. The test test_external_subnet_attach has a few calls to this list endpoint for individual instances, e.g., L321 and L341.


bnaecker commented Feb 4, 2026

Last testing note, just to confirm that this actually works end-to-end.

I spun up a new debian instance on madrid and attached the subnet 172.20.28.96/30:

bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ./target/release/oxide --profile madrid external-subnet attach --external-subnet sub0 --instance att-test-2 --project attached-subnet-test
WARNING: 644 permissions on "/Users/bnaecker/.config/oxide/credentials.toml" may allow other users to access your login credentials.
{
  "description": "test subnet",
  "id": "bcda3c3e-38d8-4123-b24f-beebe91cf14d",
  "instance_id": "04bfee41-ad4b-4982-a8b6-40b5bce09cba",
  "name": "sub0",
  "project_id": "76ed4350-6384-411e-a1a1-b10d8c331e28",
  "subnet": "172.20.28.96/30",
  "subnet_pool_id": "3aa266d0-3663-4450-8cfd-9821d8eaa072",
  "subnet_pool_member_id": "1ac00346-4751-4ef1-a9cc-2d1115e98e0f",
  "time_created": "2026-02-03T18:41:37.961541Z",
  "time_modified": "2026-02-04T19:27:38.065776Z"
}
bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ./target/release/oxide --profile madrid instance external-subnet list --instance 04bfee41-ad4b-4982-a8b6-40b5bce09cba
WARNING: 644 permissions on "/Users/bnaecker/.config/oxide/credentials.toml" may allow other users to access your login credentials.
{
  "items": [
    {
      "description": "test subnet",
      "id": "bcda3c3e-38d8-4123-b24f-beebe91cf14d",
      "instance_id": "04bfee41-ad4b-4982-a8b6-40b5bce09cba",
      "name": "sub0",
      "project_id": "76ed4350-6384-411e-a1a1-b10d8c331e28",
      "subnet": "172.20.28.96/30",
      "subnet_pool_id": "3aa266d0-3663-4450-8cfd-9821d8eaa072",
      "subnet_pool_member_id": "1ac00346-4751-4ef1-a9cc-2d1115e98e0f",
      "time_created": "2026-02-03T18:41:37.961541Z",
      "time_modified": "2026-02-04T19:27:38.385430Z"
    }
  ]
}

Then I SSHd into the machine, and listened for pings while pinging from my dev laptop (which is on the VPN). From the laptop:

bnaecker@flint : ~/file-cabinet/oxide/oxide.rs $ ping -c 1 172.20.28.96 -p abbabbab
PATTERN: 0xabbabbab
PING 172.20.28.96 (172.20.28.96): 56 data bytes
64 bytes from 172.20.28.96: icmp_seq=0 ttl=61 time=47.936 ms

--- 172.20.28.96 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 47.936/47.936/47.936/nan ms

On the VM:

debian@attachme:~$ sudo tcpdump -X -i enp0s8 'icmp and dst 172.20.28.96'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:54:12.165514 IP 172.20.17.106 > attachme: ICMP echo request, id 21268, seq 0, length 64
	0x0000:  4500 0054 ffa7 0000 3e01 f70e ac14 116a  E..T....>......j
	0x0010:  ac14 1c60 0800 ffca 5314 0000 6983 a3e4  ...`....S...i...
	0x0020:  0001 beea abba bbab abba bbab abba bbab  ................
	0x0030:  abba bbab abba bbab abba bbab abba bbab  ................
	0x0040:  abba bbab abba bbab abba bbab abba bbab  ................
	0x0050:  abba bbab                                ....
^C
1 packet captured
1 packet received by filter
0 packets dropped by kernel
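
The first 20 bytes of the hex dump are the IPv4 header, and decoding them by hand confirms what tcpdump printed: an ICMP packet from 172.20.17.106 to 172.20.28.96. The sketch below does that decoding and also verifies the header checksum. The `HEADER` bytes are copied from the capture above; the `Ipv4Header` struct and the `parse` and `checksum_ok` helpers are illustrative names, not part of any tool used in this PR.

```rust
use std::net::Ipv4Addr;

// IPv4 header bytes copied from offsets 0x0000..0x0014 of the capture.
const HEADER: [u8; 20] = [
    0x45, 0x00, 0x00, 0x54, 0xff, 0xa7, 0x00, 0x00, 0x3e, 0x01,
    0xf7, 0x0e, 0xac, 0x14, 0x11, 0x6a, 0xac, 0x14, 0x1c, 0x60,
];

/// The subset of IPv4 header fields this sketch decodes.
#[derive(Debug)]
struct Ipv4Header {
    version: u8,
    header_len: usize,
    total_len: u16,
    ttl: u8,
    protocol: u8,
    src: Ipv4Addr,
    dst: Ipv4Addr,
}

/// Decode a fixed 20-byte IPv4 header (options are not handled).
fn parse(b: &[u8]) -> Option<Ipv4Header> {
    if b.len() < 20 {
        return None;
    }
    Some(Ipv4Header {
        version: b[0] >> 4,
        header_len: usize::from(b[0] & 0x0f) * 4,
        total_len: u16::from_be_bytes([b[2], b[3]]),
        ttl: b[8],
        protocol: b[9],
        src: Ipv4Addr::new(b[12], b[13], b[14], b[15]),
        dst: Ipv4Addr::new(b[16], b[17], b[18], b[19]),
    })
}

/// A valid header sums to 0xffff in 16-bit one's-complement arithmetic.
fn checksum_ok(header: &[u8]) -> bool {
    let mut sum: u32 = header
        .chunks_exact(2)
        .map(|w| u32::from(u16::from_be_bytes([w[0], w[1]])))
        .sum();
    while sum > 0xffff {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    sum == 0xffff
}

fn main() {
    let header = parse(&HEADER).expect("20-byte header");
    println!("{:?}", header);
    println!("checksum valid: {}", checksum_ok(&HEADER));
}
```

The decoded total length of 84 bytes is the 20-byte IP header plus the 64-byte ICMP message tcpdump reports, and protocol 1 is ICMP.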

- Add tests ensuring we can attach subnets to both running and stopped
  instances.
- Add test ensuring we can't delete subnet while it's attached
- Cleanup, use max attached subnet const
Base automatically changed from nexus-attached-subnet-bg-task to main February 5, 2026 07:36