-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
investigate flaky internet/test-uv-threadpool-schedule #25305
Comments
@Trott - do you know how many CPUs this (failing) system has? IMO the most probable way this test case fail the way it had, is by real thread (CPU) congestion. I will see if I can replicate this locally; or else request for CI access, if it is very easily reproduced there. thanks! |
if the host type is a single one for this test, I can also calibrate the test for the host - meaning, baseline the ratio of pooled workers for fast I/O vs. slow I/O that suites the host based on its CPU characteristics. |
@gireeshpunathil It seems to be a 4-CPU machine: root@test-rackspace-ubuntu1604-x64-1:~# cat /proc/cpuinfo | grep processor | wc -l
4
root@test-rackspace-ubuntu1604-x64-1:~# |
Failed again today on node-daily-master: https://ci.nodejs.org/job/node-test-commit-custom-suites/811/default/console test-rackspace-ubuntu1604-x64-1 00:02:17 not ok 25 internet/test-uv-threadpool-schedule
00:02:17 ---
00:02:17 duration_ms: 1.316
00:02:17 severity: fail
00:02:17 exitcode: 1
00:02:17 stack: |-
00:02:17 assert.js:351
00:02:17 throw err;
00:02:17 ^
00:02:17
00:02:17 AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 16, expected: 10.08
00:02:17 at GetAddrInfoReqWrap.onResolve (/home/iojs/build/workspace/node-test-commit-custom-suites/default/test/internet/test-uv-threadpool-schedule.js:41:12)
00:02:17 at GetAddrInfoReqWrap.callback (/home/iojs/build/workspace/node-test-commit-custom-suites/default/test/common/index.js:376:15)
00:02:17 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
00:02:17 ... |
I am able to recreate the failure locally. in short, the test case is at fault, and needs to be fixed. I will investigate. |
ok, there is the picture:
intent: make sure dns threads (slow IO) do not throttle fs threads (fast I/O) all is good, except the placement of the time recording. As the 100 dns are initiated back-to-back, few of those finishes before the fs starts, and fires back the result into the event loop. So when the fs finishes, 2 things happen:
so;
wondering on these. /cc @nodejs/libuv @addaleax |
The fix (for the issue that this test is supposed to detect) is in 11.0 and newer. But with 11.6.0, running at the command line on a macOS laptop, the test fails for me: $ node test/internet/test-uv-threadpool-schedule.js
assert.js:351
throw err;
^
AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 9, expected: 5.03
at GetAddrInfoReqWrap.onResolve (/Users/trott/io.js/test/internet/test-uv-threadpool-schedule.js:41:12)
at GetAddrInfoReqWrap.callback (/Users/trott/io.js/test/common/index.js:376:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
$ Of course, results may depend on network, DNS, etc. It fails on master too: $ ./node test/internet/test-uv-threadpool-schedule.js
assert.js:351
throw err;
^
AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 10, expected: 4.65
at GetAddrInfoReqWrap.onResolve (/Users/trott/io.js/test/internet/test-uv-threadpool-schedule.js:41:12)
at GetAddrInfoReqWrap.callback (/Users/trott/io.js/test/common/index.js:376:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
$ Results don't look significantly different for 10.x: $ node test/internet/test-uv-threadpool-schedule.js
assert.js:350
throw err;
^
AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 9, expected: 4.9
at GetAddrInfoReqWrap.onResolve (/Users/trott/io.js/test/internet/test-uv-threadpool-schedule.js:41:12)
at GetAddrInfoReqWrap.callback (/Users/trott/io.js/test/common/index.js:376:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
$ Changing one line of the test: const slowIOmax = 100 * require('os').cpus().length; ...doesn't change the results other than increasing the times. The test still fails on master, 11.x, and 10.x for me locally. Is it possible that the test as written is just too dependent on environment? Perhaps we need a different approach, or maybe the test should be removed? (It failed for the third time in a row today on node-daily-master.) |
Oh, the fix is also in 10.12, so I have to run Node.js earlier than that. The test still fails but we do a significant difference! 10.12: $ node test/internet/test-uv-threadpool-schedule.js
assert.js:350
throw err;
^
AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 13, expected: 4.86
at GetAddrInfoReqWrap.onResolve (/Users/trott/io.js/test/internet/test-uv-threadpool-schedule.js:41:12)
at GetAddrInfoReqWrap.callback (/Users/trott/io.js/test/common/index.js:376:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
$ 10.11: $ node test/internet/test-uv-threadpool-schedule.js
assert.js:348
throw err;
^
AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 239, expected: 2.42
at GetAddrInfoReqWrap.onResolve (/Users/trott/io.js/test/internet/test-uv-threadpool-schedule.js:41:12)
at GetAddrInfoReqWrap.callback (/Users/trott/io.js/test/common/index.js:376:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
$ Maybe changing |
@Trott - thanks for running many tests, making inferences and the recommendation. I agree that being more conservative with the calculation is the most reasonable thing to do here. But let me wait to hear back from libuv folks once. The key question is: do we think just fair sharing of threads for 2 types of I/O is sufficient here, or do we need to make sure the work done reaches the consumer in a timely manner to ensure fairness.
this is the kind of problem that the test is supposed to catch. 239 milliseconds for a 1K file read! this is definitely thread starvation. I strongly believe this is either without the libuv fix, or with a fix that had issues (which got refixed later) |
Yes, exactly. That was on Node 10.11 (without the fix). The run above it is Node 10.12 where the fix first appeared. The result there is only 13 ms. |
Today's: 00:03:42 not ok 25 internet/test-uv-threadpool-schedule
00:03:42 ---
00:03:42 duration_ms: 0.712
00:03:42 severity: fail
00:03:42 exitcode: 1
00:03:42 stack: |-
00:03:42 assert.js:351
00:03:42 throw err;
00:03:42 ^
00:03:42
00:03:42 AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 17, expected: 4.03
00:03:42 at GetAddrInfoReqWrap.onResolve (/home/iojs/build/workspace/node-test-commit-custom-suites/default/test/internet/test-uv-threadpool-schedule.js:41:12)
00:03:42 at GetAddrInfoReqWrap.callback (/home/iojs/build/workspace/node-test-commit-custom-suites/default/test/common/index.js:376:15)
00:03:42 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
00:03:42 ... |
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: nodejs#25305
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: #25305 PR-URL: #25358 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: nodejs#25305 PR-URL: nodejs#25358 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: nodejs#25305 PR-URL: nodejs#25358 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: #25305 PR-URL: #25358 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: #25305 PR-URL: #25358 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
test-uv-threadpool-schedule has been failing consistently in node-daily-master CI. It also fails consistently on my personal laptop. These changes make it pass consistently with current master but fail consistently with Node.js 10.11 (which was the last release in the 10.x line before the fix that the test is for). It succeeds/fails as expected whether or not I am connected to the network. Fixes: #25305 PR-URL: #25358 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
internet/test-uv-threadpool-schedule failed on node-daily-master (the only place where internet tests are run) last night. Considering it was only added two days, that's probably cause for concern?
https://ci.nodejs.org/job/node-test-commit-custom-suites/810/default/console
test-rackspace-ubuntu1604-x64-1
@gireeshpunathil
The text was updated successfully, but these errors were encountered: