-
Notifications
You must be signed in to change notification settings - Fork 45
Plumb firewall rules #1636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plumb firewall rules #1636
Conversation
Needed to use Ipv4Net for firewall rule targets.
Feel free to create an issue in the console repo to capture whatever is missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of great work, thanks for taking it on! I see your TODO-list on the PR comment itself. Is the plan to tackle those now as part of this PR? Or are those going to be separate issues?
nexus/src/app/vpc.rs
Outdated
{ | ||
*name = params.identity.name.clone() | ||
} | ||
_ => (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking that the intent is to silently ignore targets other than VPCs, rather than, say, returning 400. Maybe when we add IP address and subnet targets, we should have a catchall that returns 400?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1dc02d3 now raises an error for these cases, which should never occur (if they do, it's because we made a mistake in specifying DEFAULT_FIREWALL_RULES
).
nexus/src/app/vpc.rs
Outdated
) if name.as_str() == "default" => { | ||
*name = params.identity.name.clone() | ||
} | ||
_ => (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 1dc02d3.
let no_networks: Vec<IpNetwork> = Vec::new(); | ||
let no_interfaces: Vec<NetworkInterface> = Vec::new(); | ||
|
||
let mut instance_interfaces: NicMap = HashMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're splitting up interfaces by instance, VPC, and VPC Subnet here. Below, in the compilation loop starting on L600, we then push these onto a single array, targets
. What happens with duplicate interfaces? As long as they don't change between those calls to derive_{guests,vpc,subnet}_network_interface_info
, is that just an optimization to avoid looking up or sending more information to the sled agents than necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent question; 1fa97ab does explicit de-duplication on the NICs. (If sled_agent_client::types::NetworkInterface
were Hash
, this could be simplified by accumulating directly into a HashSet
; please LMK if you know how to do that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can implement Hash
manually, probably by hashing all the fields. Not sure if it's worth it, up to you at this point, since I don't want to further delay this PR unnecessarily.
) | ||
.inner_join(vpc::table.on(vpc_subnet::vpc_id.eq(vpc::id))) | ||
.order_by(network_interface::slot) | ||
// TODO-cleanup: See DRY comment above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably remove this, since it refers to a comment in the other guest-specific function now.
|
||
use db::schema::{network_interface, vpc, vpc_subnet}; | ||
let rows = network_interface::table | ||
.filter(network_interface::subnet_id.eq(authz_subnet.id())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only line that differs between these three implementations, AFAICT. Could we either make this function generic, on the column we're filtering, or have a shared function that implements most of this and accepts a boxed query on the table? Let me know if you'd like to talk through the Diesel details here, it can get pretty hairy with generics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hairy indeed, but probably worth it. Refactoring of these three routines forthcoming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, factored using a boxed query in 902e6a8.
nexus/src/db/datastore/vpc.rs
Outdated
entry.push(ipnetwork::IpNetwork::V4(subnet.ipv4_block.0 .0)); | ||
entry.push(ipnetwork::IpNetwork::V6(subnet.ipv6_block.0 .0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're just interested in the name and IP subnets, we could select exactly those from the table rather than the whole record. E.g., .select((vpc_subnet::name, vpc_subnet::ipv4_block, vpc_subnet::ipv6_block))
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 578a2d9, thanks.
VpcFirewallRuleStatus::Enabled => true, | ||
}) | ||
.filter(|rule| { | ||
// TODO: no targets means apply everywhere, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. We can generally think of each filter in a rule as restricting the set of objects to which the rule applies. Adding a protocol filter of TCP
means it applies only TCP traffic, and no other protocol, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, that's what I thought.
.filter(|rule| { | ||
// TODO: no targets means apply everywhere, right? | ||
rule.targets.is_empty() | ||
|| rule.targets.iter().any(|nic| nic.mac.0 == *port.mac()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is quite right. MAC addresses are unique within a VPC, not globally. Suppose we had two rules, each applying exclusively to two different VPCs, but otherwise identical. If those VPCs happened to each have a NIC with the same MAC, the rule would be applied to both of those.
If that sounds right to you as well, then I think we to check the identity of the VPC and the MAC of the Port
. The vni
field of the NetworkInterface
sent in in the rules is bijective with the VPC identity, so that's the best thing to use at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 1994a43, thanks.
Also return an error on default rules we don't understand during the walk.
Heads up: connectivity inbound to a guest will fail when firewall rules are populated until oxidecomputer/opte#252 lands. |
Okay, oxidecomputer/opte#252 is fixed. So inbound connectivity should work with latest xde. If it doesn't please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My apologies for the long delay. There's a lot of great work here, thank you! I've mostly got questions about improving names or comments, but the functionality itself looks great.
let no_networks: Vec<IpNetwork> = Vec::new(); | ||
let no_interfaces: Vec<NetworkInterface> = Vec::new(); | ||
|
||
let mut instance_interfaces: NicMap = HashMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can implement Hash
manually, probably by hashing all the fields. Not sure if it's worth it, up to you at this point, since I don't want to further delay this PR unnecessarily.
@bnaecker: Thanks for the very helpful second review! The one remaining concern I have that I'd like to fix is that if you use a non-existent VPC name in a target or host filter, things currently fail in a rather opaque way. I'm going to try to fix that today (Wed), and then I think it'll be in ok shape (though see also issue #1756, which I think is serious but not blocking). However, I'd like to maybe delay actually merging this until @smklein branches for the demo next week. Since this change involves touching many critical pieces (including instance creation), does not have integration tests, and is not necessary for that demo, I think it's a bit risky to land with so little time for testing. Other opinions welcome. |
That's all excellent, thanks for chasing down those threads @plotnick! I think your plan of merging after the demo branch is created sounds very wise. I'm approving this now, feel free to merge whenever that happens. And again, nice work. It's really excellent to see such a core networking primitive start to take shape! |
Crucible changes are: Make `ImpactedBlocks` an absolute range (#1685) update omicron (#1687) Update Rust crate chrono to v0.4.40 (#1682) Update Rust crate tempfile to v3.19.0 (#1676) Log loops during crutest replay (#1674) Refactor live-repair start and send (#1673) Update Rust crate libc to v0.2.171 (#1684) Update Rust crate clap to v4.5.32 (#1683) Update Rust crate async-trait to 0.1.88 (#1680) Update Rust crate bytes to v1.10.1 (#1681) Update Rust crate anyhow to v1.0.97 (#1679) Update Rust crate http to v1 (#1678) Update Rust crate tokio-util to v0.7.14 (#1675) Update Rust crate thiserror to v2 (#1657) Update Rust crate omicron-zone-package to 0.12.0 (#1647) Update Rust crate opentelemetry to 0.28.0 (#1543) Update Rust crate itertools to 0.14.0 (#1646) Update Rust crate clearscreen to v4 (#1656) nightly test polish (#1665) Update Rust crate strum_macros to 0.27 (#1654) Update Rust crate strum to 0.27 (#1653) Update Rust crate rusqlite to 0.34 (#1652) Better return semantics from should_send Trying to simplify enqueue Remove unnecessary argument Remove unused code from `ImpactedBlocks` (#1668) Log when pantry/volume layers activate things (#1670) Fix unused import (#1669) Use `RangeSet` instead of tracking every complete job (#1663) Update Rust crate dropshot to 0.16.0 (#1645) Simplify pending work tracking (#1662) Remove rusqlite dependency from crucible-common (#1659) Update Rust crate reedline to 0.38.0 (#1651) Update Rust crate proptest to 1.6.0 (#1648) Update Rust crate bytes to v1.10.0 (#1644) Update Rust crate tracing-subscriber to 0.3.19 (#1643) Update Rust crate tracing to v0.1.41 (#1642) Update Rust crate toml to v0.8.20 (#1641) Update Rust crate thiserror to v1.0.69 (#1640) Update Rust crate serde_json to v1.0.139 (#1639) Update Rust crate serde to v1.0.218 (#1638) Update Rust crate reqwest to v0.12.12 (#1637) Update Rust crate libc to v0.2.170 (#1636) Update Rust crate indicatif to 0.17.11 (#1635) Update Rust crate httptest to 0.16.3 (#1634) Update Rust crate clap to v4.5.31 (#1632) Update Rust crate chrono to v0.4.39 (#1631) Update Rust crate byte-unit to 5.1.6 (#1630) Update Rust crate async-trait to 0.1.86 (#1629) Update Rust crate csv to 1.3.1 (#1633) Update Rust crate anyhow to v1.0.96 (#1628) Fix a test flake (#1626) Bitpack per-block slot data in memory (#1625) Use `div_ceil` instead of `(x + 7) / 8` (#1624) Add repair server dynamometer (#1618) Propolis changes are: Paper over minor() differences Update crucible (#886) Test oximeter metrics end to end (#855) server: improve behavior of starting VMs that are waiting for Crucible activation (#873) server: extend API request queue's memory of prior requests (#880) Use production propolis-server flags in PHD CI builds (#878) Added a new field to the Board struct that propolis requires.
Crucible changes are: Make `ImpactedBlocks` an absolute range (#1685) update omicron (#1687) Update Rust crate chrono to v0.4.40 (#1682) Update Rust crate tempfile to v3.19.0 (#1676) Log loops during crutest replay (#1674) Refactor live-repair start and send (#1673) Update Rust crate libc to v0.2.171 (#1684) Update Rust crate clap to v4.5.32 (#1683) Update Rust crate async-trait to 0.1.88 (#1680) Update Rust crate bytes to v1.10.1 (#1681) Update Rust crate anyhow to v1.0.97 (#1679) Update Rust crate http to v1 (#1678) Update Rust crate tokio-util to v0.7.14 (#1675) Update Rust crate thiserror to v2 (#1657) Update Rust crate omicron-zone-package to 0.12.0 (#1647) Update Rust crate opentelemetry to 0.28.0 (#1543) Update Rust crate itertools to 0.14.0 (#1646) Update Rust crate clearscreen to v4 (#1656) nightly test polish (#1665) Update Rust crate strum_macros to 0.27 (#1654) Update Rust crate strum to 0.27 (#1653) Update Rust crate rusqlite to 0.34 (#1652) Better return semantics from should_send Trying to simplify enqueue Remove unnecessary argument Remove unused code from `ImpactedBlocks` (#1668) Log when pantry/volume layers activate things (#1670) Fix unused import (#1669) Use `RangeSet` instead of tracking every complete job (#1663) Update Rust crate dropshot to 0.16.0 (#1645) Simplify pending work tracking (#1662) Remove rusqlite dependency from crucible-common (#1659) Update Rust crate reedline to 0.38.0 (#1651) Update Rust crate proptest to 1.6.0 (#1648) Update Rust crate bytes to v1.10.0 (#1644) Update Rust crate tracing-subscriber to 0.3.19 (#1643) Update Rust crate tracing to v0.1.41 (#1642) Update Rust crate toml to v0.8.20 (#1641) Update Rust crate thiserror to v1.0.69 (#1640) Update Rust crate serde_json to v1.0.139 (#1639) Update Rust crate serde to v1.0.218 (#1638) Update Rust crate reqwest to v0.12.12 (#1637) Update Rust crate libc to v0.2.170 (#1636) Update Rust crate indicatif to 0.17.11 (#1635) Update Rust crate httptest to 0.16.3 (#1634) Update Rust crate clap to v4.5.31 (#1632) Update Rust crate chrono to v0.4.39 (#1631) Update Rust crate byte-unit to 5.1.6 (#1630) Update Rust crate async-trait to 0.1.86 (#1629) Update Rust crate csv to 1.3.1 (#1633) Update Rust crate anyhow to v1.0.96 (#1628) Fix a test flake (#1626) Bitpack per-block slot data in memory (#1625) Use `div_ceil` instead of `(x + 7) / 8` (#1624) Add repair server dynamometer (#1618) Propolis changes are: Paper over minor() differences Update crucible (#886) Test oximeter metrics end to end (#855) server: improve behavior of starting VMs that are waiting for Crucible activation (#873) server: extend API request queue's memory of prior requests (#880) Use production propolis-server flags in PHD CI builds (#878) --------- Co-authored-by: Alan Hanson <alan@oxide.computer>
Plumb VPC firewall rules through Nexus → sled-agent → OPTE. Builds on @bnaecker's work in the propagate-firewall-rules-to-sled branch, which in turn was based on work started in PR #465.
Outstanding issues:
Target interfaces are matched by MAC address only right now. Is this ok?Use (VNI, MAC) pair to match.The console doesn't support these yet either, but the API does.cross-VPC firewall target unsupported