Skip to content

Conversation

@roypat
Copy link
Member

@roypat roypat commented Jul 18, 2025

Summary of the PR

During the review of #245 we identified various opportunities for simplifying the sock_ctrl_msg.rs code and tests. This PR first unifies some tests and fixes a bug where CMSG_TRUNC wasn't properly handled if the given slice of RawFds to receive was empty. Then, simplify code for parsing arbitrary cmsg(3) structures, as this is not actually needed (the library constructs the msghdr itself, and thus knows exactly the layout of the contained cmsghdr structs). Lastly, switch over to treating cmsghdr as fam.rs FamStructs, to remove a ton of unsafe code that was used to do an ad-hoc re-implementation of fam struct handling.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@roypat roypat force-pushed the sockctrl-refactor branch from 9337b2d to 042c82f Compare July 18, 2025 13:48
@roypat roypat force-pushed the sockctrl-refactor branch 2 times, most recently from 08cf044 to 0f76eb7 Compare July 21, 2025 12:13
@roypat roypat force-pushed the sockctrl-refactor branch 3 times, most recently from 3057eec to 009475c Compare July 30, 2025 15:12
Copy link
Member

@stefano-garzarella stefano-garzarella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this!

@roypat Have you had a chance to test it on macOS?

@roypat
Copy link
Member Author

roypat commented Jul 31, 2025

LGTM! Thanks for this!

@roypat Have you had a chance to test it on macOS?

I've only build-tested it via cargo check for the mac target. @ShadowCurse when you come back from PTO, can you give this a try locally on your Mac to make sure I didn't break anything? :o

@stefano-garzarella
Copy link
Member

LGTM! Thanks for this!
@roypat Have you had a chance to test it on macOS?

I've only build-tested it via cargo check for the mac target. @ShadowCurse when you come back from PTO, can you give this a try locally on your Mac to make sure I didn't break anything? :o

I can try it too on my wife mac (I used it to debug the issue). @uran0sH can you also give this a try?

@stefano-garzarella
Copy link
Member

unix::sock_ctrl_msg::tests::send_more_recv_less1 is failing :-(

On main:

$ git describe                                                               on git:main->origin/main|⚑1 o
v0.11.1-71-g02321df
$ cargo test --lib -- unix::sock_ctrl_msg                                    on git:main->origin/main|⚑1 o
warning: `/Volumes/Work/stefano/repos/vmm-sys-util/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running unittests src/lib.rs (target/debug/deps/vmm_sys_util-093378d3c831eda3)

running 5 tests
test unix::sock_ctrl_msg::tests::send_recv_no_fd ... ok
test unix::sock_ctrl_msg::tests::send_recv_only_fd ... ok
test unix::sock_ctrl_msg::tests::send_more_recv_less2 ... ok
test unix::sock_ctrl_msg::tests::send_more_recv_less1 ... ok
test unix::sock_ctrl_msg::tests::send_recv_with_fd ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 48 filtered out; finished in 0.00s

On this branch:

$ git describe                                                               on git:sockctrl-refactor|⚑1 o
v0.11.1-71-g009475c
$ cargo test --lib -- unix::sock_ctrl_msg                                    on git:sockctrl-refactor|⚑1 o
warning: `/Volumes/Work/stefano/repos/vmm-sys-util/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
   Compiling vmm-sys-util v0.14.0 (/Volumes/Work/stefano/repos/vmm-sys-util)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.99s
     Running unittests src/lib.rs (target/debug/deps/vmm_sys_util-77e0ef2e162c3722)

running 4 tests
test unix::sock_ctrl_msg::tests::send_recv_with_fd ... ok
test unix::sock_ctrl_msg::tests::send_recv_no_fd ... ok
test unix::sock_ctrl_msg::tests::send_recv_only_fd ... ok
test unix::sock_ctrl_msg::tests::send_more_recv_less1 ... FAILED

failures:

---- unix::sock_ctrl_msg::tests::send_more_recv_less1 stdout ----

thread 'unix::sock_ctrl_msg::tests::send_more_recv_less1' panicked at src/unix/sock_ctrl_msg.rs:535:68:
called `Result::unwrap_err()` on an `Ok` value: (1, 0)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    unix::sock_ctrl_msg::tests::send_more_recv_less1

test result: FAILED. 3 passed; 1 failed; 0 ignored; 0 measured; 43 filtered out; finished in 0.00s

I used --lib because a doc-test is failing (EventFd related, I'll send a PR), and I specified the test filter because others are failing on unix::tempdir that I need to investigate.

@roypat
Copy link
Member Author

roypat commented Jul 31, 2025

unix::sock_ctrl_msg::tests::send_more_recv_less1 is failing :-(

On main:

$ git describe                                                               on git:main->origin/main|⚑1 o
v0.11.1-71-g02321df
$ cargo test --lib -- unix::sock_ctrl_msg                                    on git:main->origin/main|⚑1 o
warning: `/Volumes/Work/stefano/repos/vmm-sys-util/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running unittests src/lib.rs (target/debug/deps/vmm_sys_util-093378d3c831eda3)

running 5 tests
test unix::sock_ctrl_msg::tests::send_recv_no_fd ... ok
test unix::sock_ctrl_msg::tests::send_recv_only_fd ... ok
test unix::sock_ctrl_msg::tests::send_more_recv_less2 ... ok
test unix::sock_ctrl_msg::tests::send_more_recv_less1 ... ok
test unix::sock_ctrl_msg::tests::send_recv_with_fd ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 48 filtered out; finished in 0.00s

On this branch:

$ git describe                                                               on git:sockctrl-refactor|⚑1 o
v0.11.1-71-g009475c
$ cargo test --lib -- unix::sock_ctrl_msg                                    on git:sockctrl-refactor|⚑1 o
warning: `/Volumes/Work/stefano/repos/vmm-sys-util/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
   Compiling vmm-sys-util v0.14.0 (/Volumes/Work/stefano/repos/vmm-sys-util)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.99s
     Running unittests src/lib.rs (target/debug/deps/vmm_sys_util-77e0ef2e162c3722)

running 4 tests
test unix::sock_ctrl_msg::tests::send_recv_with_fd ... ok
test unix::sock_ctrl_msg::tests::send_recv_no_fd ... ok
test unix::sock_ctrl_msg::tests::send_recv_only_fd ... ok
test unix::sock_ctrl_msg::tests::send_more_recv_less1 ... FAILED

failures:

---- unix::sock_ctrl_msg::tests::send_more_recv_less1 stdout ----

thread 'unix::sock_ctrl_msg::tests::send_more_recv_less1' panicked at src/unix/sock_ctrl_msg.rs:535:68:
called `Result::unwrap_err()` on an `Ok` value: (1, 0)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    unix::sock_ctrl_msg::tests::send_more_recv_less1

test result: FAILED. 3 passed; 1 failed; 0 ignored; 0 measured; 43 filtered out; finished in 0.00s

I used --lib because a doc-test is failing (EventFd related, I'll send a PR), and I specified the test filter because others are failing on unix::tempdir that I need to investigate.

Thanks for testing!

huh, I wonder if that's 1b8d717, e.g. Mac doesn't set CMSG_TRUNC if the buffer for control data is empty? Can you adjust the loop in the failing test to start at 1, to see if those cases work at least? If they still fail, I guess that means CMSG_TRUNC isn't set at all by Mac? 🤔 Maybe that's why it set cmsg->len to such a weird value (and the teardown_control_data |= fds.len() > in_fds.len(); I added which should handle that is just broken, so let me fix that up too)

@stefano-garzarella
Copy link
Member

@roypat yeah, with this change tests passes:

diff --git a/src/unix/sock_ctrl_msg.rs b/src/unix/sock_ctrl_msg.rs
index 291de71..6881bff 100644
--- a/src/unix/sock_ctrl_msg.rs
+++ b/src/unix/sock_ctrl_msg.rs
@@ -505,7 +505,7 @@ mod tests {
     // Exercise the code paths that activate the issue of receiving the all the ancillary data,
     // but missing to provide enough buffer space to store it.
     fn send_more_recv_less1() {
-        for too_small in 0..3 {
+        for too_small in 1..3 {
             let (s1, s2) = UnixDatagram::pair().expect("failed to create socket pair");

             let (_, evt_notifier1) = pipe().expect("failed to create pipe");

@roypat
Copy link
Member Author

roypat commented Jul 31, 2025

@roypat yeah, with this change tests passes:

diff --git a/src/unix/sock_ctrl_msg.rs b/src/unix/sock_ctrl_msg.rs
index 291de71..6881bff 100644
--- a/src/unix/sock_ctrl_msg.rs
+++ b/src/unix/sock_ctrl_msg.rs
@@ -505,7 +505,7 @@ mod tests {
     // Exercise the code paths that activate the issue of receiving the all the ancillary data,
     // but missing to provide enough buffer space to store it.
     fn send_more_recv_less1() {
-        for too_small in 0..3 {
+        for too_small in 1..3 {
             let (s1, s2) = UnixDatagram::pair().expect("failed to create socket pair");

             let (_, evt_notifier1) = pipe().expect("failed to create pipe");

ugh, okaaay, so that just works differently on Mac and Linux then. What about the below?

diff --git a/src/unix/sock_ctrl_msg.rs b/src/unix/sock_ctrl_msg.rs
index 291de71dc837..f102ec87ee84 100644
--- a/src/unix/sock_ctrl_msg.rs
+++ b/src/unix/sock_ctrl_msg.rs
@@ -202,13 +202,13 @@ unsafe fn raw_recvmsg(
             // the cmsg->len field to the untruncated length of the message sent,
             // even if the buffer is actually smaller. Compensate for this by reducing
             // the length to be at most the size of the memory we have allocated.
+            teardown_control_data |= cmsg.len() > in_fds.len();
             unsafe { cmsg.set_len(cmsg.len().min(cmsg_capacity)) }
             let fds = cmsg.as_slice();
             // It could be that while constructing the cmsg structures, alignment constraints made
             // our allocation big enough that it fits more than the in_fds.len() file descriptors
             // we intended to receive. Treat this the same way we would treat a truncated message,
             // because there is no way for us to communicate these extra FDs back to the caller.
-            teardown_control_data |= fds.len() > in_fds.len();
             if teardown_control_data {
                 for fd in fds {
                     libc::close(*fd);

(and having the test back to starting the loop at 0)

@stefano-garzarella
Copy link
Member

@roypat nope, same issue as #246 (comment)

We had two test cases that only different by the value of a single
constant. Merge these two cases into a single case that iterates over
all possible constants that will fulfill the test assertions (e.g. all
buffer sizes that are too small for 4 file descriptors).

Don't cover too_small = 0 for now, as due to a bug in raw_recvmsg(), we
dont actually handle CMSG_TRUNC if in_fds is empty, so this test case
would currently incorrectly pass.

While we're here, replace assert!((...).is_err()) with .unwrap_err() so
that in case the test fails we get a meaningful error message.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
@roypat
Copy link
Member Author

roypat commented Jul 31, 2025

@roypat nope, same issue as #246 (comment)

Alright, in that case, I think the only thing we can do is adjust the test to not do the "in buffer size == 0" thing, and just say it works differently on linux and mac :(

roypat added 3 commits July 31, 2025 11:34
Currently, if in_fds.len() == 0, but the kernel has some control data
to read, raw_recvmsg() returns Ok((_, 0)), despite CMSG_TRUNC being set.
Return ENOBUFS instead, as would be the case if in_fds.len() > 0 yet
smaller than the number of fds the kernel has for us.

Simply removing the early-return is correct, as the only way to
msg_controllen to be less than sizeof::<cmsghdr>() is when the kernel
has zeroed it out because no control data was available, or because it
was zero all along because in_fds.len() == 0. In both cases,
cmsg_ptr.is_null() will be true, we skip over the loop, and the final
return will end up being Ok((_, 0)).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
raw_recvmsg implemented a loop to be able to process arbitrary control
message chains. However, in this specific function, we know exactly what
the control message chain looks like, because we construct it ourselves.
It always consists of a single cmsghdr structure. So remove all the code
related to processing different layouts.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
cmsghdr is a fam struct with some additional alignment constraints for
the flexible array member, so we can use the machinery in fam.rs to deal
with it. This comes with the benefit of a _lot_ less unsafe code.

It removes an optimization where sometimes cmsghdr structures were
allocated on the stack if the number of fds to receive was small enough,
but raw_recvmsg was already allocating on the heap for the iovecs
anyway, so this optimization seems a bit pre-mature.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
@roypat roypat force-pushed the sockctrl-refactor branch from 88032cc to b73a9fd Compare July 31, 2025 10:35
Copy link
Member

@stefano-garzarella stefano-garzarella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passed on macOS!

@roypat roypat enabled auto-merge (rebase) August 5, 2025 09:38
@roypat roypat merged commit ee71906 into rust-vmm:main Aug 5, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants