Skip to content

Fix Vxlan code + small additions #509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 23, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions net/src/headers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,11 @@ impl Headers {
Headers::default()
}

/// Add / Replace Ethernet header
pub fn set_eth(&mut self, eth: Eth) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this method is unsafe in the sense that you can set an ethernet header with an ethertype which doesn't match the lower layers of the frame. Will think about a good way to fix this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about it some more, I don't think we can fix this safety issue without requiring the user to go through a strongly typed builder API on every packet manipulation.

The actual fix here should be to make the API contract of the packet / headers types free form and then create a validated type wrapper which does not allow mutations without consuming the structure back into the mutable form.

That will prevent us from sending (or accepting) total nonsense while still giving us the latitude to actually manipulate packets.

That said, wherever possible, we should still forbid creating nonsense packets; moving errors closer to their actual cause point is just good practice anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's the caller's responsibility to set the ether-type. I mean, IMO, the net crate should provide the machinery to build packets, but can't be foreseeing all possible combinations and, if it ought to, it should probably be after the fact maybe when deparsing(), or via some validation method to be called explicitly?

self.eth = Some(eth);
}

/// Push a VLAN header to the top of the stack.
///
/// # Errors:
Expand Down
2 changes: 1 addition & 1 deletion net/src/ipv4/dscp.rs
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ impl Dscp {
///
/// Will return an [`InvalidDscpError`] if the supplied value for `raw` exceeds 6-bits.
#[allow(dead_code)]
fn new(raw: u8) -> Result<Dscp, InvalidDscpError> {
pub fn new(raw: u8) -> Result<Dscp, InvalidDscpError> {
Ok(Dscp(
IpDscp::try_new(raw).map_err(|e| InvalidDscpError::TooBig(e.actual))?,
))
Expand Down
47 changes: 47 additions & 0 deletions net/src/ipv4/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,16 @@ pub use contract::*;
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Ipv4(Ipv4Header);

/// Error describing illegal length in an IPv4 header
#[derive(Debug, thiserror::Error)]
#[error(
"Invalid IPv4 length requested: {requested}, max is {max} when considering all options and headers"
)]
pub struct Ipv4LengthError {
requested: usize,
max: usize,
}

impl Ipv4 {
/// The minimum length of an IPv4 header (i.e., a header with no options)
#[allow(clippy::unwrap_used)] // const-eval and trivially safe
Expand Down Expand Up @@ -191,6 +201,7 @@ impl Ipv4 {
return Err(TtlAlreadyZero);
}
self.0.time_to_live -= 1;
self.update_checksum();
Ok(())
}

Expand Down Expand Up @@ -257,6 +268,42 @@ impl Ipv4 {
self.0.protocol = next_header.0;
self
}

/// get the header checksum
#[must_use]
pub fn checksum(&self) -> u16 {
self.0.header_checksum
}

/// set the header checksum
pub fn set_checksum(&mut self, checksum: u16) -> &mut Self {
self.0.header_checksum = checksum;
self
}

/// update the header checksum
pub fn update_checksum(&mut self) -> &mut Self {
self.set_checksum(self.0.calc_header_checksum());
self
}

/// Set the length _of the payload_ of the ipv4 packet.
///
/// This method will adjust the total length of the header to account for options and the length
/// of this header.
///
/// This method _will not_ update the checksum of the header.
/// # Errors
/// This method returns [`Ipv4LengthError`] if the value is too big
pub fn set_payload_len(&mut self, payload_len: u16) -> Result<(), Ipv4LengthError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, technically an unsafe method. Invariants pushed to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is probably a distinct case. Here consistency matters a bit more. There's distinct ways we could enforce correctness. Either with validation or by "encapsulation"; i.e. somewhat emulate a stack by wrapping layers. But this requires a significant refactor and APIs? E.g. you could have something like packet.eth_encap( Params ), with packet being anything above L2 (ARP, IPv4, IPv6). This would get the packet, automatically determine ethertype and use params to set src/dst macs. Similarly you could have something like packet.ip_encap( Params ) (where packet would be assumed to contain L4 stuff maybe) and the params contain the fields that would make sense for the user to set. Protocol, header-length, packet length, and so on could be set automatically. So you could do things like:

udp_packet.ip_encap( l3-params ).eth_encap( l2-params )

match self.0.set_payload_len(payload_len as usize) {
Ok(()) => Ok(()),
Err(err) => Err(Ipv4LengthError {
requested: payload_len as usize + self.header_len(),
max: err.max_allowed,
}),
}
}
}

/// Error which is triggered when decrementing the TTL which is already zero.
Expand Down
15 changes: 13 additions & 2 deletions net/src/packet/display.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,15 @@
//! Display of Packets

use crate::eth::Eth;
use crate::headers::{Headers, Net, Transport};
use crate::icmp4::Icmp4;
use crate::icmp6::Icmp6;
use crate::ipv4::Ipv4;
use crate::ipv6::Ipv6;
use crate::tcp::Tcp;
use crate::udp::Udp;
use crate::udp::{Udp, UdpEncap};

use crate::buffer::PacketBufferMut;
use crate::headers::{Headers, Net, Transport};
use crate::packet::{BridgeDomain, DoneReason, InterfaceId, InvalidPacket, Packet, PacketMeta};
use std::fmt::{Display, Formatter};

Expand Down Expand Up @@ -161,6 +161,14 @@ impl Display for Transport {
}
}
}
impl Display for UdpEncap {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(f, " ENCAP:")?;
match self {
UdpEncap::Vxlan(vxlan) => writeln!(f, " vxlan, vni={}", vxlan.vni()),
}
}
}

/* ============= All headers ============ */
impl Display for Headers {
Expand All @@ -175,6 +183,9 @@ impl Display for Headers {
if let Some(transport) = &self.transport {
write!(f, "{transport}")?;
}
if let Some(udp_encap) = &self.udp_encap {
write!(f, "{udp_encap}")?;
}
Ok(())
}
}
Expand Down
28 changes: 27 additions & 1 deletion net/src/packet/hash.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,9 @@

//! Module to compute packet hashes

use crate::buffer::PacketBufferMut;
use crate::headers::{Net, Transport, TryHeaders, TryIp, TryTransport};
use crate::packet::Packet;
use crate::{buffer::PacketBufferMut, headers::TryEth};
use ahash::AHasher;
use std::hash::{Hash, Hasher};

Expand Down Expand Up @@ -44,13 +44,39 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
}
}

/// Computes a hash over a `Packet` including Ethernet header, vlans if present and IP invariant fields
pub fn hash_l2_frame<H: Hasher>(&self, state: &mut H) {
// ethernet
if let Some(eth) = self.headers().try_eth() {
eth.source().hash(state);
eth.destination().hash(state);
eth.ether_type().hash(state);
}
// vlan tags - we don't include PCP/DEI
for tag in &self.headers.vlan {
tag.vid().hash(state);
}
// Ip and transport
self.hash_ip(state);
}

#[allow(unused)]
/// Uses the ip hash `Packet` method to provide a value in the range [first, last].
pub fn packet_hash_ecmp(&self, first: u8, last: u8) -> u64 {
let mut hasher = AHasher::default();
self.hash_ip(&mut hasher);
hasher.finish() % u64::from(last - first + 1) + u64::from(first)
}

#[allow(unused)]
/// Uses the `hash_l2_frame` `Packet` method to provide a hash in the range [49152,65535] suitable
/// as UDP source port for vxlan-encapsulated packets, as recommended by RFC7348.
#[allow(clippy::cast_possible_truncation)]
pub fn packet_hash_vxlan(&self) -> u16 {
let mut hasher = AHasher::default();
self.hash_l2_frame(&mut hasher);
(hasher.finish() % (65535u64 - 49152 + 1) + 49152u64) as u16
}
}

#[cfg(test)]
Expand Down
2 changes: 2 additions & 0 deletions net/src/packet/meta.rs
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ pub enum DoneReason {
Filtered, /* The packet was administratively filtered */
Unhandled, /* there exists no support to handle this type of packet */
MissL2resolution, /* adjacency failure: we don't know mac of some ip next-hop */
Malformed, /* the packet does not conform / is malformed */
MissingEtherType, /* can't determine ethertype to use */
Delivered, /* the packet buffer was delivered by the NF - e.g. for xmit */
}

Expand Down
72 changes: 52 additions & 20 deletions net/src/packet/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,21 +11,21 @@ mod meta;
pub mod test_utils;

use crate::buffer::{Headroom, PacketBufferMut, Prepend, Tailroom, TrimFromStart};
use crate::eth::Eth;
use crate::eth::EthError;
use crate::headers::{
AbstractHeaders, AbstractHeadersMut, Headers, Net, TryHeaders, TryHeadersMut, TryIpMut,
TryUdpMut, TryVxlan,
AbstractHeaders, AbstractHeadersMut, Headers, Net, Transport, TryHeaders, TryHeadersMut,
TryIpMut, TryVxlan,
};
use crate::parse::{DeParse, Parse, ParseError};
use crate::udp::Udp;
use crate::vxlan::{Vxlan, VxlanEncap};

use crate::vxlan::{Vxlan, VxlanEncap};
#[allow(unused_imports)] // re-export
pub use hash::*;
#[allow(unused_imports)] // re-export
pub use meta::*;
use std::num::NonZero;
use tracing::debug;

mod utils;

Expand Down Expand Up @@ -61,7 +61,7 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
}
};
mbuf.trim_from_start(consumed.get())
.unwrap_or_else(|_| unreachable!());
.unwrap_or_else(|e| unreachable!("{:?}", e));
Ok(Packet {
headers,
payload: mbuf,
Expand All @@ -74,6 +74,11 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
&self.payload
}

/// Add / Replace Ethernet header
pub fn set_eth(&mut self, eth: Eth) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically unsafe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you want to proceed? You want me to annotate all those methods as unsafe?

self.headers.set_eth(eth);
}

/// Get the length of the packet's payload
///
/// # Note
Expand Down Expand Up @@ -133,10 +138,7 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
/// ```
pub fn vxlan_decap(&mut self) -> Option<Result<Vxlan, ParseError<EthError>>> {
match self.headers.try_vxlan() {
None => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we removing the log here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call vxlan_decap for every packet destined to the gateway, but the gateway may receive packets that are not vxlan (control-plane, icmp, etc..). So the prior log would show for every packet sent to it.

debug!("attempted to remove VXLAN header from non-vxlan packet");
None
}
None => None,
Some(vxlan) => {
match Headers::parse(self.payload.as_ref()) {
Ok((headers, consumed)) => {
Expand All @@ -161,7 +163,10 @@ impl<Buf: PacketBufferMut> Packet<Buf> {

/// Encapsulate the packet in the supplied [`Vxlan`] [`Headers`]
///
/// The supplied [`Headers`] will be validated to ensure they form a VXLAN header.
/// * The supplied [`Headers`] will be validated to ensure they form a VXLAN header.
/// * If the supplied headers describe an IPv4 encapsulation, then the IPv4 checksum will be
/// updated.
/// * The IPv4 / IPv6 headers will be updated to correctly describe the length of the packet.
///
/// # Errors
///
Expand All @@ -175,6 +180,7 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
/// This is extremely unlikely in that the maximum mbuf length is far less than that, and we
/// don't currently support multi-segment packets.
pub fn vxlan_encap(&mut self, params: &VxlanEncap) -> Result<(), <Buf as Prepend>::Error> {
//compute room required
let needed = self.headers.size().get();
let buf = self.payload.prepend(needed)?;
self.headers
Expand All @@ -187,9 +193,26 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
u16::try_from(len).is_ok(),
"encap would result in frame larger than 2^16 bytes"
);

// compute UDP entropy source port for UDP header
let udp_src_port = self
.packet_hash_vxlan()
.try_into()
.unwrap_or_else(|_| unreachable!());

// build UDP header for Vxlan, setting ports, length and checksum.
let mut udp = Udp::new(udp_src_port, Vxlan::PORT);
udp.set_checksum(0); /* optional in ipv4 (must be zero if unused) */

#[allow(clippy::cast_possible_truncation)] // checked
let udp_len = NonZero::new(len as u16).unwrap_or_else(|| unreachable!());
#[allow(unsafe_code)] // sound usage due to length check
unsafe {
udp.set_length(udp_len);
}

let mut headers = params.headers().clone();
headers.transport = Some(Transport::Udp(udp));
match headers.try_ip_mut() {
None => unreachable!(),
Some(Net::Ipv6(ipv6)) => {
Expand All @@ -199,18 +222,13 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
ipv6.set_payload_length(udp_len.get());
}
}
Some(Net::Ipv4(_)) => { /* nothing to do here */ }
}
match headers.try_udp_mut() {
None => {
unreachable!();
Some(Net::Ipv4(ipv4)) => {
ipv4.set_payload_len(udp_len.get())
.unwrap_or_else(|e| unreachable!("{:?}", e));
ipv4.update_checksum();
}
#[allow(unsafe_code)] // sound usage due to length check
Some(udp) => unsafe {
udp.set_length(udp_len);
udp.set_checksum(0);
},
}
self.headers = headers;
Ok(())
}

Expand Down Expand Up @@ -311,4 +329,18 @@ impl<Buf: PacketBufferMut> Packet<Buf> {
pub fn get_meta_mut(&mut self) -> &mut PacketMeta {
&mut self.meta
}

/// Wraps a packet in an `Option` depending on the metadata:
/// If [`Packet`] is to be dropped, returns `None`. Else, `Some`.
pub fn enforce(self) -> Option<Self> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the logic for the naming here. Why enforce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we iterate over a burst of packets, we pervasively use filter_map which requires returning Some(packet) or None to indicate if the packet should be returned or not. The routing stages (as currently defined) annotate in the packet metadata what to do next: 1) no annotation, 2) some drop condition (there are plenty) 3) whether the packet can be delivered etc. These annotations happen inside the pipeline functions. The enforce is just some sugar syntax to, based on those, determine if Some(packet) / None is to be returned. So, most stages end up calling packet.enforce(). This allows hiding all that logic to the stages and allowing to change it without needing to change them. That's the explanation. Given this purpose, do you have suggestions for the naming? You're the native speaker here!

#[cfg(test)]
if self.meta.keep {
// ignore the request to drop and keep the packet instead.
return Some(self);
}
match self.get_done() {
Some(DoneReason::Delivered) | None => Some(self),
Some(_) => None,
}
}
}
Loading
Loading