Skip to content

Plumb firewall rules #1636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
Oct 14, 2022
Merged

Plumb firewall rules #1636

merged 42 commits into from
Oct 14, 2022

Conversation

plotnick
Copy link
Contributor

@plotnick plotnick commented Aug 22, 2022

Plumb VPC firewall rules through Nexus → sled-agent → OPTE. Builds on @bnaecker's work in the propagate-firewall-rules-to-sled branch, which in turn was based on work started in PR #465.

Outstanding issues:

  • Target interfaces are matched by MAC address only right now. Is this ok? Use (VNI, MAC) pair to match.
  • IP and network targets are not yet supported. The console doesn't support these yet either, but the API does.
  • Send default rules to sled agent on instance creation. Right now rules are only sent on VPC creation and rule updates. Fixes Send current firewall rules applying to a guest in the sled agent instance-creation request #1017.
  • Missing integration tests (issue Need VPC firewall rule integration tests #1792). I have tested this manually with multiple instances and a small variety of targets and host filters, but we definitely need automated tests.
  • Fix existing tests, which are now failing with cross-VPC firewall target unsupported

@plotnick plotnick requested a review from bnaecker August 22, 2022 22:06
@zephraph
Copy link
Contributor

Feel free to create an issue in the console repo to capture whatever is missing.

Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of great work, thanks for taking it on! I see your TODO-list on the PR comment itself. Is the plan to tackle those now as part of this PR? Or are those going to be separate issues?

{
*name = params.identity.name.clone()
}
_ => (),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking that the intent is to silently ignore targets other than VPCs, rather than, say, returning 400. Maybe when we add IP address and subnet targets, we should have a catchall that returns 400?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1dc02d3 now raises an error for these cases, which should never occur (if they do, it's because we made a mistake in specifying DEFAULT_FIREWALL_RULES).

) if name.as_str() == "default" => {
*name = params.identity.name.clone()
}
_ => (),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1dc02d3.

let no_networks: Vec<IpNetwork> = Vec::new();
let no_interfaces: Vec<NetworkInterface> = Vec::new();

let mut instance_interfaces: NicMap = HashMap::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're splitting up interfaces by instance, VPC, and VPC Subnet here. Below, in the compilation loop starting on L600, we then push these onto a single array, targets. What happens with duplicate interfaces? As long as they don't change between those calls to derive_{guests,vpc,subnet}_network_interface_info, is that just an optimization to avoid looking up or sending more information to the sled agents than necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent question; 1fa97ab does explicit de-duplication on the NICs. (If sled_agent_client::types::NetworkInterface were Hash, this could be simplified by accumulating directly into a HashSet; please LMK if you know how to do that.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can implement Hash manually, probably by hashing all the fields. Not sure if it's worth it, up to you at this point, since I don't want to further delay this PR unnecessarily.

)
.inner_join(vpc::table.on(vpc_subnet::vpc_id.eq(vpc::id)))
.order_by(network_interface::slot)
// TODO-cleanup: See DRY comment above.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably remove this, since it refers to a comment in the other guest-specific function now.


use db::schema::{network_interface, vpc, vpc_subnet};
let rows = network_interface::table
.filter(network_interface::subnet_id.eq(authz_subnet.id()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only line that differs between these three implementations, AFAICT. Could we either make this function generic, on the column we're filtering, or have a shared function that implements most of this and accepts a boxed query on the table? Let me know if you'd like to talk through the Diesel details here, it can get pretty hairy with generics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hairy indeed, but probably worth it. Refactoring of these three routines forthcoming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, factored using a boxed query in 902e6a8.

Comment on lines 728 to 729
entry.push(ipnetwork::IpNetwork::V4(subnet.ipv4_block.0 .0));
entry.push(ipnetwork::IpNetwork::V6(subnet.ipv6_block.0 .0));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're just interested in the name and IP subnets, we could select exactly those from the table rather than the whole record. E.g., .select((vpc_subnet::name, vpc_subnet::ipv4_block, vpc_subnet::ipv6_block)), I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 578a2d9, thanks.

VpcFirewallRuleStatus::Enabled => true,
})
.filter(|rule| {
// TODO: no targets means apply everywhere, right?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. We can generally think of each filter in a rule as restricting the set of objects to which the rule applies. Adding a protocol filter of TCP means it applies only TCP traffic, and no other protocol, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, that's what I thought.

.filter(|rule| {
// TODO: no targets means apply everywhere, right?
rule.targets.is_empty()
|| rule.targets.iter().any(|nic| nic.mac.0 == *port.mac())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is quite right. MAC addresses are unique within a VPC, not globally. Suppose we had two rules, each applying exclusively to two different VPCs, but otherwise identical. If those VPCs happened to each have a NIC with the same MAC, the rule would be applied to both of those.

If that sounds right to you as well, then I think we to check the identity of the VPC and the MAC of the Port. The vni field of the NetworkInterface sent in in the rules is bijective with the VPC identity, so that's the best thing to use at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1994a43, thanks.

@rzezeski
Copy link
Contributor

Heads up: connectivity inbound to a guest will fail when firewall rules are populated until oxidecomputer/opte#252 lands.

@rzezeski
Copy link
Contributor

Okay, oxidecomputer/opte#252 is fixed. So inbound connectivity should work with latest xde. If it doesn't please let me know.

Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies for the long delay. There's a lot of great work here, thank you! I've mostly got questions about improving names or comments, but the functionality itself looks great.

let no_networks: Vec<IpNetwork> = Vec::new();
let no_interfaces: Vec<NetworkInterface> = Vec::new();

let mut instance_interfaces: NicMap = HashMap::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can implement Hash manually, probably by hashing all the fields. Not sure if it's worth it, up to you at this point, since I don't want to further delay this PR unnecessarily.

@plotnick
Copy link
Contributor Author

plotnick commented Oct 5, 2022

@bnaecker: Thanks for the very helpful second review! The one remaining concern I have that I'd like to fix is that if you use a non-existent VPC name in a target or host filter, things currently fail in a rather opaque way. I'm going to try to fix that today (Wed), and then I think it'll be in ok shape (though see also issue #1756, which I think is serious but not blocking).

However, I'd like to maybe delay actually merging this until @smklein branches for the demo next week. Since this change involves touching many critical pieces (including instance creation), does not have integration tests, and is not necessary for that demo, I think it's a bit risky to land with so little time for testing. Other opinions welcome.

@bnaecker
Copy link
Collaborator

bnaecker commented Oct 6, 2022

That's all excellent, thanks for chasing down those threads @plotnick! I think your plan of merging after the demo branch is created sounds very wise. I'm approving this now, feel free to merge whenever that happens. And again, nice work. It's really excellent to see such a core networking primitive start to take shape!

@bnaecker bnaecker self-requested a review October 6, 2022 03:44
@plotnick plotnick merged commit 56a586d into main Oct 14, 2022
@plotnick plotnick deleted the firewall branch October 14, 2022 02:27
leftwo pushed a commit that referenced this pull request Mar 26, 2025
Crucible changes are:
Make `ImpactedBlocks` an absolute range (#1685)
update omicron (#1687)
Update Rust crate chrono to v0.4.40 (#1682)
Update Rust crate tempfile to v3.19.0 (#1676)
Log loops during crutest replay (#1674)
Refactor live-repair start and send (#1673)
Update Rust crate libc to v0.2.171 (#1684)
Update Rust crate clap to v4.5.32 (#1683)
Update Rust crate async-trait to 0.1.88 (#1680)
Update Rust crate bytes to v1.10.1 (#1681)
Update Rust crate anyhow to v1.0.97 (#1679)
Update Rust crate http to v1 (#1678)
Update Rust crate tokio-util to v0.7.14 (#1675)
Update Rust crate thiserror to v2 (#1657)
Update Rust crate omicron-zone-package to 0.12.0 (#1647)
Update Rust crate opentelemetry to 0.28.0 (#1543)
Update Rust crate itertools to 0.14.0 (#1646)
Update Rust crate clearscreen to v4 (#1656)
nightly test polish (#1665)
Update Rust crate strum_macros to 0.27 (#1654)
Update Rust crate strum to 0.27 (#1653)
Update Rust crate rusqlite to 0.34 (#1652)
Better return semantics from should_send
Trying to simplify enqueue
Remove unnecessary argument
Remove unused code from `ImpactedBlocks` (#1668)
Log when pantry/volume layers activate things (#1670)
Fix unused import (#1669)
Use `RangeSet` instead of tracking every complete job (#1663)
Update Rust crate dropshot to 0.16.0 (#1645)
Simplify pending work tracking (#1662)
Remove rusqlite dependency from crucible-common (#1659)
Update Rust crate reedline to 0.38.0 (#1651)
Update Rust crate proptest to 1.6.0 (#1648)
Update Rust crate bytes to v1.10.0 (#1644)
Update Rust crate tracing-subscriber to 0.3.19 (#1643)
Update Rust crate tracing to v0.1.41 (#1642)
Update Rust crate toml to v0.8.20 (#1641)
Update Rust crate thiserror to v1.0.69 (#1640)
Update Rust crate serde_json to v1.0.139 (#1639)
Update Rust crate serde to v1.0.218 (#1638)
Update Rust crate reqwest to v0.12.12 (#1637)
Update Rust crate libc to v0.2.170 (#1636)
Update Rust crate indicatif to 0.17.11 (#1635)
Update Rust crate httptest to 0.16.3 (#1634)
Update Rust crate clap to v4.5.31 (#1632)
Update Rust crate chrono to v0.4.39 (#1631)
Update Rust crate byte-unit to 5.1.6 (#1630)
Update Rust crate async-trait to 0.1.86 (#1629)
Update Rust crate csv to 1.3.1 (#1633)
Update Rust crate anyhow to v1.0.96 (#1628)
Fix a test flake (#1626)
Bitpack per-block slot data in memory (#1625)
Use `div_ceil` instead of `(x + 7) / 8` (#1624)
Add repair server dynamometer (#1618)

Propolis changes are:
Paper over minor() differences
Update crucible (#886)
Test oximeter metrics end to end (#855)
server: improve behavior of starting VMs that are waiting for Crucible activation (#873)
server: extend API request queue's memory of prior requests (#880)
Use production propolis-server flags in PHD CI builds (#878)

Added a new field to the Board struct that propolis requires.
leftwo added a commit that referenced this pull request Mar 27, 2025
Crucible changes are:
Make `ImpactedBlocks` an absolute range (#1685)
update omicron (#1687)
Update Rust crate chrono to v0.4.40 (#1682)
Update Rust crate tempfile to v3.19.0 (#1676)
Log loops during crutest replay (#1674)
Refactor live-repair start and send (#1673)
Update Rust crate libc to v0.2.171 (#1684)
Update Rust crate clap to v4.5.32 (#1683)
Update Rust crate async-trait to 0.1.88 (#1680)
Update Rust crate bytes to v1.10.1 (#1681)
Update Rust crate anyhow to v1.0.97 (#1679)
Update Rust crate http to v1 (#1678)
Update Rust crate tokio-util to v0.7.14 (#1675)
Update Rust crate thiserror to v2 (#1657)
Update Rust crate omicron-zone-package to 0.12.0 (#1647) Update Rust
crate opentelemetry to 0.28.0 (#1543)
Update Rust crate itertools to 0.14.0 (#1646)
Update Rust crate clearscreen to v4 (#1656)
nightly test polish (#1665)
Update Rust crate strum_macros to 0.27 (#1654)
Update Rust crate strum to 0.27 (#1653)
Update Rust crate rusqlite to 0.34 (#1652)
Better return semantics from should_send
Trying to simplify enqueue
Remove unnecessary argument
Remove unused code from `ImpactedBlocks` (#1668)
Log when pantry/volume layers activate things (#1670) Fix unused import
(#1669)
Use `RangeSet` instead of tracking every complete job (#1663) Update
Rust crate dropshot to 0.16.0 (#1645)
Simplify pending work tracking (#1662)
Remove rusqlite dependency from crucible-common (#1659) Update Rust
crate reedline to 0.38.0 (#1651)
Update Rust crate proptest to 1.6.0 (#1648)
Update Rust crate bytes to v1.10.0 (#1644)
Update Rust crate tracing-subscriber to 0.3.19 (#1643) Update Rust crate
tracing to v0.1.41 (#1642)
Update Rust crate toml to v0.8.20 (#1641)
Update Rust crate thiserror to v1.0.69 (#1640)
Update Rust crate serde_json to v1.0.139 (#1639)
Update Rust crate serde to v1.0.218 (#1638)
Update Rust crate reqwest to v0.12.12 (#1637)
Update Rust crate libc to v0.2.170 (#1636)
Update Rust crate indicatif to 0.17.11 (#1635)
Update Rust crate httptest to 0.16.3 (#1634)
Update Rust crate clap to v4.5.31 (#1632)
Update Rust crate chrono to v0.4.39 (#1631)
Update Rust crate byte-unit to 5.1.6 (#1630)
Update Rust crate async-trait to 0.1.86 (#1629)
Update Rust crate csv to 1.3.1 (#1633)
Update Rust crate anyhow to v1.0.96 (#1628)
Fix a test flake (#1626)
Bitpack per-block slot data in memory (#1625)
Use `div_ceil` instead of `(x + 7) / 8` (#1624)
Add repair server dynamometer (#1618)

Propolis changes are:
Paper over minor() differences
Update crucible (#886)
Test oximeter metrics end to end (#855)
server: improve behavior of starting VMs that are waiting for Crucible
activation (#873) server: extend API request queue's memory of prior
requests (#880) Use production propolis-server flags in PHD CI builds
(#878)

---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Send current firewall rules applying to a guest in the sled agent instance-creation request
4 participants