
Conversation

@clementperon (Contributor) commented Jun 30, 2024

Hi @m3bm3b,

This adds a simple workflow to build and run the tests on multiple OS/architecture combinations.

It seems that the nsync tests fail on macOS, on both x86_64 and ARM64.

Pipeline status on my fork:
https://github.com/clementperon/nsync/actions/runs/9733627516/job/26860771134

@m3bm3b (Collaborator) commented Jul 2, 2024

I need to read more about GitHub's CI setup, but at first look it seems like a good idea.

However, I am confused by the test failures you reported.
I have re-run all the tests on both an x86_64 Mac Pro running MacOS 12.7.5,
and an aarch64 Mac Mini (M1) running MacOS 14.5, and the tests all pass.

What hardware was in use for the failures?

@clementperon (Contributor, Author) commented Jul 2, 2024

> I need to read more about GitHub's CI setup, but at first look it seems like a good idea.
>
> However, I am confused by the test failures you reported. I have re-run all the tests on both an x86_64 Mac Pro running MacOS 12.7.5, and an aarch64 Mac Mini (M1) running MacOS 14.5, and the tests all pass.
>
> What hardware was in use for the failures?

Reading the GitHub documentation, the runner is based on an M1, more precisely 3 vCPUs.
So maybe the virtualization layer introduces some issue?

The complete OS/software list is here: https://github.com/actions/runner-images/blob/main/images/macos/macos-14-arm64-Readme.md

@m3bm3b (Collaborator) commented Jul 2, 2024

> So maybe the virtualization layer introduces some issue?

The test that failed is testing that the timing-related parts of the API are working correctly.
These are things like "wait until this event occurs, but time out after X seconds".

It's hard to test things related to timing since scheduling can delay things arbitrarily.
But on the other hand, it would be bad not to test timing-related behaviour at all, since it's part of the specification.

I could relax the timing, to add more slop.

In that test (cv_wait_example_test.c), the strings "one", "two", "three" are added to a priority queue
by a worker thread, with a 0.5s delay after each add. The main thread waits 1.2s after initiating the
worker thread, then it reads 3 entries from the queue. It should see "one", "three", "two", because
a) after 1.2s, all three of those elements should be on the queue
(the worker should by then be in the 0.5s wait after adding "three"), and
b) "three" should come out before "two", because priority for this queue is based on lexicographical order, and "three" < "two".

So for this to fail in the way it did, two 0.5s waits (plus some thread scheduling) must have taken more than the 1.2s wait
to complete. Of course, that can happen, but I would not have expected it unless it was on a machine that was
quite heavily loaded: 0.2s is a pretty long time on a modern machine.

If a virtual machine has been given only a small fraction of a CPU to run on, it becomes more likely.
We could make the failure less likely by multiplying all the waits and timeouts by 2, say. The primary annoyance
is that the test would then take twice as long to run, because its timing is dominated by the waits and timeouts.
But it's only a few seconds.

If you wanted to test this conjecture about the cause on that platform, you could look for the ten calls to
nsync_time_ms() in example_cv_wait() in cv_wait_example_test.c.
If you double the argument to each of those ten calls (e.g., 1200 -> 2400 etc.), and you double the
values in the string "expected" near the top of that routine (e.g., 0.1s -> 0.2s etc. for the three values in the string),
then the test should take twice as long, still pass, and slow machines will have twice as long to react.
If that works, perhaps I should make that change permanent.
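
As a sketch only (the macro and helper names below are hypothetical, not identifiers in cv_wait_example_test.c), the change amounts to scaling every timing constant in the test by one factor:

```c
/* Illustrative only: scale every wait and timeout in the test by one
 * factor, so slow CI runners get proportionally more slack.  TIME_SCALE
 * and scaled_ms() are hypothetical names; the actual edit is to the
 * literal arguments of the ten nsync_time_ms() calls and to the three
 * times in the "expected" string. */
#define TIME_SCALE 2u                   /* 1 = original timings, 2 = doubled */

static unsigned scaled_ms (unsigned ms) {
        return (ms * TIME_SCALE);       /* e.g. 1200 -> 2400, 500 -> 1000 */
}

/* So a call such as   nsync_time_ms (1200)
 * would become        nsync_time_ms (scaled_ms (1200)),
 * and "0.1s" in the "expected" string would become "0.2s", and so on for
 * the other two values, keeping the expected output consistent with the
 * new timings. */
```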

@clementperon (Contributor, Author) commented:

> So maybe the virtualization layer introduces some issue?
>
> The test that failed is testing that the timing-related parts of the API are working correctly. These are things like "wait until this event occurs, but time out after X seconds".
>
> It's hard to test things related to timing since scheduling can delay things arbitrarily. But on the other hand, it would be bad not to test timing-related behaviour at all, since it's part of the specification.
>
> I could relax the timing, to add more slop.
>
> In that test (cv_wait_example_test.c), the strings "one", "two", "three" are added to a priority queue by a worker thread, with a 0.5s delay after each add. The main thread waits 1.2s after initiating the worker thread, then it reads 3 entries from the queue. It should see "one", "three", "two", because a) after 1.2s, all three of those elements should be on the queue (the worker should by then be in the 0.5s wait after adding "three"), and b) "three" should come out before "two", because priority for this queue is based on lexicographical order, and "three" < "two".
>
> So for this to fail in the way it did, two 0.5s waits (plus some thread scheduling) must have taken more than the 1.2s wait to complete. Of course, that can happen, but I would not have expected it unless it was on a machine that was quite heavily loaded: 0.2s is a pretty long time on a modern machine.
>
> If a virtual machine has been given only a small fraction of a CPU to run on, it becomes more likely. We could make the failure less likely by multiplying all the waits and timeouts by 2, say. The primary annoyance is that the test would then take twice as long to run, because its timing is dominated by the waits and timeouts. But it's only a few seconds.
>
> If you wanted to test this conjecture about the cause on that platform, you could look for the ten calls to nsync_time_ms() in example_cv_wait() in cv_wait_example_test.c. If you double the argument to each of those ten calls (e.g., 1200 -> 2400 etc.), and you double the values in the string "expected" near the top of that routine (e.g., 0.1s -> 0.2s etc. for the three values in the string), then the test should take twice as long, still pass, and slow machines will have twice as long to react. If that works, perhaps I should make that change permanent.

That makes sense, since the test sometimes passes and sometimes fails.

@m3bm3b (Collaborator) commented Jul 2, 2024

Perhaps I wasn't clear. You need to double all the arguments to all ten of the calls to nsync_time_ms(),
and you need to double the corresponding three times in the "expected" string.

The last push you did increases only the 1200 to 2400. That will probably change the error you get, but it will still fail.

@clementperon force-pushed the main branch 2 times, most recently from 4f6970c to 88daf00 on July 2, 2024 at 19:15
@clementperon (Contributor, Author) commented Jul 2, 2024

@m3bm3b sorry, I was in a hurry and read your comment a bit too quickly.

Should be good now.

The CI passes on my repo but doesn't show up properly here 🤔
https://github.com/clementperon/nsync/actions/runs/9766743957

Could be due to:

```yaml
  push:
    branches: [master]
  pull_request:
    branches: [master]
```

@clementperon (Contributor, Author) commented:

I will close this PR and reopen one properly from my master branch.
