Skip to content

Conversation

@asomers
Copy link
Member

@asomers asomers commented Oct 12, 2025

Recently a transitive dev-dependency published a new version that is incompatible with Nix's MSRV, causing CI to break. That's annoying, because there's really no reason why a crate's dev-dependencies ought to respect the MSRV. Fix Nix's CI by:

  • Running all tests with stable Rust (or nightly, for certain targets)
  • Adding an additional set of CI checks to ensure that the crate will compile with MSRV, on all of the major operating systems, but don't try to compile its tests.
  • Eliminate the now-redundant "rust-stable" CI task.

What does this PR do

Checklist:

  • I have read CONTRIBUTING.md
  • I have written necessary tests and rustdoc comments
  • A change log has been added if this PR modifies nix's API

Recently a transitive dev-dependency published a new version that is
incompatible with Nix's MSRV, causing CI to break.  That's annoying,
because there's really no reason why a crate's dev-dependencies ought to
respect the MSRV.  Fix Nix's CI by:

* Running all tests with stable Rust (or nightly, for certain targets)
* Adding an additional set of CI checks to ensure that the crate will
  compile with MSRV, on all of the major operating systems, but don't
  try to compile its tests.
* Eliminate the now-redundant "rust-stable" CI task.
I think these never showed up in CI before, because we've never run
clippy on Rust stable on apple until now.
Since changing the compiler version, this test has begun to occasionally
hang in CI.  And it's not just in QEMU.  It's probably due to an atomic
ordering problem.
@asomers
Copy link
Member Author

asomers commented Oct 19, 2025

The lio_listio doc tests are still hanging, sometimes, on aarch64 linux and i686-unknown-linux-musl . I guess to debug that I'll need to install a new Linux VM locally and try to duplicate the i686-unknown-linux-musl failure.

@SteveLauC
Copy link
Member

The lio_listio doc tests are still hanging, sometimes, on aarch64 linux and i686-unknown-linux-musl . I guess to debug that I'll need to install a new Linux VM locally and try to duplicate the i686-unknown-linux-musl failure.

It is easy for me to set up an aarch64 Linu VM, so I will work on this

@SteveLauC
Copy link
Member

SteveLauC commented Nov 4, 2025

Hi @asomers, I would like to know your thoughts on this comment, maybe we should test the MSRV with deps of minimal versions as well

@SteveLauC
Copy link
Member

Well, I cannot reproduce the issue locally 🥲:

steve@ubuntu:~/nix$ git status
On branch msrv-stable

# Run tests 30 times
steve@ubuntu:~/nix$ cat test.sh
count=30
for i in $(seq $count); do
    cargo test --doc --all-features sys::aio::lio_listio
done

steve@ubuntu:~/nix$ bash test.sh > /dev/null 2>/dev/null
steve@ubuntu:~/nix$ echo $?
0

I think it is possible that our signal handler set in the test can be
overwritten by integration tests (under dir 'test'), which means the
`SIGNALED` variable would never be set.

We have a lock to prevent this issue in integration tests, but since this
is a doc test, we cannot employ it here.

```rust
/// Any test that alters signal handling must grab this mutex.
pub static SIGNAL_MTX: Mutex<()> = Mutex::new(());
```

So let's run tests sequentially and see if it will work.
@SteveLauC
Copy link
Member

SteveLauC commented Nov 4, 2025

Interesting, I just noticed the failed CIs are using musl. And testing with --target aarch64-unknown-linux-musl can indeed reproduce the issue, let me debug the test

$ cargo test --all-features --doc  --target aarch64-unknown-linux-musl sys::aio::lio_listio

@SteveLauC
Copy link
Member

SteveLauC commented Nov 4, 2025

Created a repo for this: https://github.com/SteveLauC/musl_aarch64_memory_order

$ cd musl_aarch64_memory_order/
$ cargo r --target aarch64-unknown-linux-musl

Update:

  1. I added a write(2) to the signal handler, in the cases where the program hung, I didn't see any output in the stdout, which basically means the signal handler didn't get invoked at all
  2. I replaced AtomicBool with a pipe(), write(pipe_tx) in the signal handler and wait for I/O completion via read(pipe_rx). The issue still exists, so I guess it is not realted to memory ordering
  3. BTW, as long as I run the program with strace, it works

@asomers
Copy link
Member Author

asomers commented Nov 4, 2025

Hi @asomers, I would like to know your thoughts on this comment, maybe we should test the MSRV with deps of minimal versions as well

Yes, I've tried the Cargo.lock.msrv before. However, in this case I don't think it will help us, since our only problematic dependency is a dev dependency. And we can work around that just by doing "cargo check" or "cargo build" in CI on MSRV, without doing "cargo test".

@asomers
Copy link
Member Author

asomers commented Nov 4, 2025

I can now reproduce it locally too, using i686-unknown-linux-musl . But not x86_64-unknown-linux-musl. And it would be interesting to try building the tests for i686-unknown-linux-gnu and running them in an x86_64 VM. But I don't know what Debian packages I need to install for that. Do you?
It's possible that this is a musl bug.

@SteveLauC
Copy link
Member

And it would be interesting to try building the tests for i686-unknown-linux-gnu and running them in an x86_64 VM. But I don't know what Debian packages I need to install for that. Do you?

I am not sure about this

It's possible that this is a musl bug.

Yeah, I doubt it as well. But it is strange why we didn't encounter this issue in the past. I just tried running that test with our MSRV toolchain, the program still hung

@asomers
Copy link
Member Author

asomers commented Nov 4, 2025

And it would be interesting to try building the tests for i686-unknown-linux-gnu and running them in an x86_64 VM. But I don't know what Debian packages I need to install for that. Do you?

I am not sure about this

It's possible that this is a musl bug.

Yeah, I doubt it as well. But it is strange why we didn't encounter this issue in the past. I just tried running that test with our MSRV toolchain, the program still hung

It's not our job to debug musl problems. I think we should just disable this test for that environment and move on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants