CI failures: 20180927-20181002 #18
Windows fanned can now resume. RPI job disabled. |
@Joyee could you add a few lines before the match? Shows:
but a key line is 2 lines before:
@refack I tried to do that before but not every real cause of I could strip the |
After talking a bit with @addaleax, I did a bisect using a series of stress tests to determine what commit introduced the unreliability for Taking our "approving the PR means you are willing to take responsibility for the code" statement seriously, /ping @cjihrig @jasnell @mcollina @BridgeAR Ref: https://gist.github.com/Trott/d3c3b5e4419497fdc8651ac2d5f805c7
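For anyone repeating this kind of bisect: a flaky test cannot be judged from a single run, so each candidate commit has to be stress-tested many times before it is marked good or bad. A minimal sketch of that idea follows. The names `fails_under_stress`, `first_flaky_commit`, and `run_once` are hypothetical, and the sketch assumes the first commit in the range is reliable, the last one is flaky, and flakiness persists once introduced. It is not the script used for the bisect above.

```python
from typing import Callable, Sequence


def fails_under_stress(run_once: Callable[[str], bool], commit: str, runs: int) -> bool:
    """Return True if any of `runs` executions of the test fails at `commit`.

    `run_once(commit)` is a hypothetical callback that builds/checks out the
    commit, runs the test once, and returns True on pass, False on failure.
    """
    return any(not run_once(commit) for _ in range(runs))


def first_flaky_commit(commits: Sequence[str],
                       run_once: Callable[[str], bool],
                       runs: int = 100) -> str:
    """Binary search for the first commit at which the stress test fails.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if fails_under_stress(run_once, commits[mid], runs):
            bad = mid   # failure seen: the culprit is at mid or earlier
        else:
            good = mid  # no failure in `runs` attempts: treat as good
    return commits[bad]
```

The number of runs per commit is the weak point: if it is too low for the failure rate, a bad commit can be marked good and the search ends up on the wrong commit.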
Since it's a libuv bump, it might need further unpacking: https://github.com/libuv/libuv/releases/tag/v1.23.1:
Going through the changelog:
libuv/libuv@90891b4 – Wouldn’t be obvious why, but this seems like a potential culprit in that it affects kinda fundamental libuv behaviour cross-platform. Maybe.
libuv/libuv@1391a3d – I sure hope this isn’t it :)
libuv/libuv@ff45b0d – No.
libuv/libuv@ff45b0d...fa5c1d9 – This seems unlikely because it doesn’t affect Windows?
libuv/libuv@89a9ea6 – No.
libuv/libuv@c0c672e – No.
libuv/libuv@c0c672e...f43c663 – This doesn’t affect Windows.
libuv/libuv@153ea11 – This only affects Windows, making it very unlikely as well, if I understand correctly?
libuv/libuv@baa8146 – No.
libuv/libuv@baa621c – No.
libuv/libuv@abe9e01 – No.
libuv/libuv@8813dca – No.
libuv/libuv@57b3363 – Only affects UDP on Windows, doesn’t seem likely at all.
libuv/libuv@b721891 – No.
libuv/libuv@bb1a49e – No.
libuv/libuv@956bf6b – I’d go with No.
libuv/libuv@4049879 – Hm… I also wouldn’t understand why, but this is, again, a cross-platform change that affects a number of things. The Windows-specific part is very small, though – essentially one added call to Very soft Maybe.
Here’s a stress test for libuv/libuv@90891b4 being reverted:
Thanks @addaleax I was just starting to do the same 🥇 As a more systematic approach, I'll try to devise a CI stress test that we should use for future uv and V8 bumps.
I think the root cause might be that my patch prevents all of the 500 DNS requests per test from finishing under some circumstances. GC might be a red herring here; it might only be showing up in these tests because they are the only ones stress-testing the threadpool like that. Also, I think I screwed up the stress test because I rebuilt one of Rich’s ones, but that was for a commit where the tests were still in parallel/, not sequential/. New attempt: https://ci.nodejs.org/job/node-stress-single-test/2059/
The stress test is green on FreeBSD. That is probably not conclusive evidence, though, because most of Rich's I'm building on Windows to see if I can reproduce + try to address the issue there. My first guess would be a race condition on |
90891b4 introduced a race condition when accessing `slow_io_work_running` – it is being increased and later decreased as part of the worker thread loop, but was accessed with different mutexes during these operations. This fixes the race condition by making sure both accesses are protected through the global `mutex` of `threadpool.c`. This fixes a number of flaky Node.js tests. Refs: libuv#1845 Refs: nodejs/reliability#18 Refs: nodejs/node#23089 Refs: nodejs/node#23067 Refs: nodejs/node#23066 Refs: nodejs/node#23219
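To illustrate the locking pattern the commit message describes, here is a minimal sketch in Python rather than libuv's C. It is not the libuv code: `SlowWorkCounter`, `worker`, and `run` are hypothetical names, and the class only mirrors the idea that a counter which is increased and later decreased by worker threads must have both accesses guarded by the same mutex.

```python
import threading


class SlowWorkCounter:
    """Counts in-flight 'slow' work items; every access uses one mutex."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()  # the single lock guarding the counter
        self.running = 0
        self.peak = 0

    def enter(self) -> None:
        # Increment under the lock, as a worker picks up a slow work item.
        with self._mutex:
            self.running += 1
            self.peak = max(self.peak, self.running)

    def leave(self) -> None:
        # Decrement under the SAME lock; guarding this with a different
        # mutex would not exclude concurrent increments.
        with self._mutex:
            self.running -= 1


def worker(counter: SlowWorkCounter, iterations: int) -> None:
    for _ in range(iterations):
        counter.enter()
        counter.leave()


def run(threads: int = 4, iterations: int = 10000) -> SlowWorkCounter:
    counter = SlowWorkCounter()
    pool = [threading.Thread(target=worker, args=(counter, iterations))
            for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter
```

With both paths under one lock, the counter returns to zero once all workers finish, and its peak never exceeds the number of threads.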
The Windows stress test doesn’t have the characteristic failures that we’re looking for, although there are other failures in there:
I’m not sure how that can happen, but I believe it’s independent of the bug that is making the test flaky in non-stress-test situations. Maybe Windows is running out of ports in some way…? @nodejs/platform-windows |
Likely libuv fix is up @ libuv/libuv#2021 |
FreeBSD ended up being a canary: if it was red, something other than the test was wrong. Windows was definitely the problematic host; AIX was also problematic, but at 1/10 the rate that Windows was experiencing.
I guess that could be the case even for https://github.com/libuv/libuv/blob/v1.x/src/win/tcp.c#L283-L291 |
90891b4 introduced a race condition when accessing `slow_io_work_running` – it is being increased and later decreased as part of the worker thread loop, but was accessed with different mutexes during these operations. This fixes the race condition by making sure both accesses are protected through the global `mutex` of `threadpool.c`. This fixes a number of flaky Node.js tests. Refs: #1845 Refs: nodejs/reliability#18 Refs: nodejs/node#23089 Refs: nodejs/node#23067 Refs: nodejs/node#23066 Refs: nodejs/node#23219 PR-URL: #2021 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Failures in node-test-pull-request/17485 to node-test-pull-request/17581 that failed more than 2 PRs
Jenkins Failure
Build Failure
Failed in Propagate Binaries phase (git-nodesource-update-reference)
Example
sh: line 42: pgrep: command not found
Example
JSTest Failure
sequential/test-gc-http-client-timeout
Example
parallel/test-net-connect-options-port
Example
sequential/test-gc-http-client
Example
sequential/test-gc-http-client-onerror
Example
sequential/test-gc-net-timeout
Example
parallel/test-gc-http-client-connaborted
Example
sequential/test-gc-http-client-connaborted
Example
message/max_tick_depth
Example
parallel/test-crypto-pbkdf2
Example
parallel/test-crypto-scrypt
Example
parallel/test-gc-http-client
Example
parallel/test-gc-http-client-onerror
Example
parallel/test-gc-http-client-timeout
Example
parallel/test-gc-net-timeout
Example
parallel/test-http2-client-upload
Example
Git Failure
hudson.plugins.git.GitException: Command "git fetch --no-tags --progress git@github.com:nodejs/node.git +refs/heads/:refs/remotes/origin/" returned status code 143:
Example
hudson.plugins.git.GitException: Command "git fetch --tags --progress git@github.com:nodejs/node.git +refs/heads/:refs/remotes/origin/" returned status code 143:
Example
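The list above keeps only failures that showed up in more than 2 PRs. A minimal sketch of that kind of filter is below, assuming the CI results are available as (PR identifier, failure name) pairs. The function name `failures_in_more_than` and the input shape are hypothetical, and this is not the tooling that produced the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def failures_in_more_than(records: Iterable[Tuple[str, str]],
                          min_prs: int = 2) -> Dict[str, int]:
    """Map each failure name to the number of distinct PRs it failed in,
    keeping only failures seen in more than `min_prs` PRs."""
    prs_by_failure: Dict[str, Set[str]] = defaultdict(set)
    for pr, failure in records:
        prs_by_failure[failure].add(pr)  # a set ignores repeat runs of one PR
    return {name: len(prs)
            for name, prs in prs_by_failure.items()
            if len(prs) > min_prs}
```

Counting distinct PRs rather than raw failures keeps a single PR that was rerun many times from dominating the report.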