-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: add uv threadpool congestion regression test #23099
test: add uv threadpool congestion regression test #23099
Conversation
This test fails without an internet connection so it probably needs to go in Here's what I get if I disconnect from the network: $ tools/test.py test/sequential/test-uv-threadpool-schedule.js
=== release test-uv-threadpool-schedule ===
Path: sequential/test-uv-threadpool-schedule
assert.js:349
throw err;
^
AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 10, expected: 0.57
at GetAddrInfoReqWrap.dns.lookup.common.mustCall (/Users/trott/io.js/test/sequential/test-uv-threadpool-schedule.js:45:14)
at GetAddrInfoReqWrap.callback (/Users/trott/io.js/test/common/index.js:349:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:58:17)
Command: out/Release/node /Users/trott/io.js/test/sequential/test-uv-threadpool-schedule.js
[00:00|% 100|+ 0|- 1]: Done
$ |
This also seems like it will have the potential to be flakey when run on the CI. |
})); | ||
} | ||
|
||
fs.readFile(__filename, common.mustCall((e, d) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thefourtheye - removed, thanks.
// We need to refresh the domain string everytime, | ||
// otherwise the TCP stack that cache the previous lookup | ||
// returns result from memory, breaking all our Math. | ||
dns.lookup(`${randomDomain()}.com`, {}, common.mustCall((e, a, f) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw in an assert.ifError
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, didn't follow you - can you please elaborate? all these lookups are destined to fail, we are latching on the time they take for un-optimized full blown lookups (so as to maximize the threadpool worker thread's engagement)
for (let i = 0; i < slowIOmax; i++) { | ||
// We need to refresh the domain string everytime, | ||
// otherwise the TCP stack that cache the previous lookup | ||
// returns result from memory, breaking all our Math. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, using a base string and then keep incrementing an offset would guarantee that there will be no repetition in the domain names used, right? Because, the random numbers might repeat (even though odds of that happening is less)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion, followed your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thefourtheye - I just spotted a drawback of doing this: if you run the test back-to-back, it fails the second time onwards - for obvious reasons (the domain strings being static, they get resolved faster). I don't know if it affects our CI tests though.
function randomDomain() { | ||
const d = Buffer.alloc(10); | ||
for (let i = 0; i < 10; i++) | ||
d[i] = 92 + (Math.round(Math.random() * 13247)) % 26; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to use 97
('a'
) instead of 92
('\'
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the method altogether. (yes, I meant a
, don't know how I kept it at 92. ;) )
95fdbbe
to
cfb651e
Compare
@Trott - moved this to @cjihrig - I acknowledge that the test is sensitive to response times. For that matter I took these precautions:
If you think any specific construct / logic is prone to flake, please let me know, I am happy to improve on it. |
b114e0a
to
e2541ee
Compare
// We need to refresh the domain string everytime, | ||
// otherwise the TCP stack that cache the previous lookup | ||
// returns result from memory, breaking all our Math. | ||
dns.lookup(`${randomDomain}${i}.com`, {}, common.mustCall(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the test still work if we use one of the reserved TLDs (e.g. .test
)? https://tools.ietf.org/html/rfc2606
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- dns.lookup(`${randomDomain}${i}.com`, {}, common.mustCall(() => {
+ dns.lookup(`${randomDomain}${i}.test`, {}, common.mustCall(() => {
@richardlau - I tested with this change (if that is what you meant) and it still works. But what would be an advantage of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reserved TLDs are specifically reserved to avoid conflicts with actual registered domain names, e.g. nonexistent0.com
might exist in the future.
They can also avoid unnecessary load on the DNS servers, see https://tools.ietf.org/html/rfc6761 section 6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardlau : ok - let me clarify: the purpose of the test is to engage (half of) the libuv threadpool workers as long as possible while making sure at least one thread is free and available to serve the file I/Os. So in that context, what we want is a list of domain names that are unique, and were unresolved earlier (resolved names are cached so retrieved faster than we wanted them to), existent or non-existent is not a consideration for the test. If the name is confusing, I can change it to unique0.com
etc.
Am I missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If caching is an issue, we might need to generate domain names that are unique per-test-run, e.g. based on Date.now()
or Math.random()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes (my first commit 1d0fcefec8a42b71166e572de14b308e1dc97e58) had it based on Math.random(). I guess I will re-instate that, thanks.
I suppose pinging @nodejs/libuv to review may be appropriate here. |
thanks @Trott . Also cc @nodejs/testing ? |
re-instated the random domain function, else the test runs the risk of looking up previously resolved domain names and messing up with the time calculation. new CI: https://ci.nodejs.org/job/node-test-pull-request/17620/ |
|
'use strict'; | ||
|
||
// test to validate massive dns lookups do not block filesytem I/O | ||
// (or any fast I/O).Prior to https://github.com/libuv/libuv/pull/1845 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Space after the dot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @thefourtheye - done.
// We need to refresh the domain string everytime, | ||
// otherwise the TCP stack that cache the previous lookup | ||
// returns result from memory, breaking all our Math. | ||
dns.lookup(`${randomDomain()}.com`, {}, common.mustCall(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The callback function can be moved out and used with common.mustCall(..., slowIOmax)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the callback body out of the loop. However, issue with attaching expected call count to the common.mustCall
is: the loop executes 100 times, and each time when we invoke common.mustCall
with slowIOmax
, the total expectation becomes 10K. So I left it as is (default:1). Hope this is fine with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about something like
const onResolve = common.mustCall(..., slowIOmax);
...
dns.lookup(..., onResolve);
But your current change is intuitive. Let's go with that.
Full CI re-run post-fixup: https://ci.nodejs.org/job/node-test-pull-request/17627/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but we might need to be careful about landing this because the libuv patch may be at fault for a regression in our tests, see nodejs/reliability#18 (comment)
thanks @addaleax for the info. sure, let me hold this until the |
@addaleax - also if you can describe a higher level statement on the flakes that the current |
@gireeshpunathil So … as far as I can tell, the issue was that I think any test that would perform a lot of fast DNS requests (e.g. for |
@addaleax @gireeshpunathil Can this move forward? Or is it blocked on libuv for now? Or something else? |
I believe the test makes meaningful assertions only in the presence of libuv/libuv@daf04e8 , that is slated to land in node through libuv v1.24.x |
Resume Build CI: https://ci.nodejs.org/job/node-test-commit/24015/ |
not sure what to make out of the CI result. the graphical view says something failed, but going into the link I can't see any? |
to be sure, re-run CI: https://ci.nodejs.org/job/node-test-commit/24081/ |
Failures in both Linux and AIX are the same, Fresh CI: https://ci.nodejs.org/job/node-test-pull-request/19350/ |
after many runs of CI in expectation of a green sign giving up for now!
I will attempt to debug that first to see what is happening. |
all the known failures are either marked as flaky or resolved; so another round of CI - looks like I can't resume from the old build, it is expired: |
04786b3
to
05c3ae9
Compare
a test is timing out in |
with 2 separate runs on arm and windows https://ci.nodejs.org/job/node-test-commit-arm-fanned/5471/ now the CI gets a full green. |
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: nodejs#8436 PR-URL: nodejs#23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
05c3ae9
to
54fa59c
Compare
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: #8436 PR-URL: #23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
I'll open a separate issue, but this test failed on node-daily-master (the only place where internet tests are run) last night. Considering it was only added two days, that's probably cause for concern? https://ci.nodejs.org/job/node-test-commit-custom-suites/810/default/console test-rackspace-ubuntu1604-x64-1 00:02:11 not ok 25 internet/test-uv-threadpool-schedule
00:02:11 ---
00:02:11 duration_ms: 0.613
00:02:11 severity: fail
00:02:11 exitcode: 1
00:02:11 stack: |-
00:02:11 assert.js:351
00:02:11 throw err;
00:02:11 ^
00:02:11
00:02:11 AssertionError [ERR_ASSERTION]: fast I/O took longer to complete, actual: 17, expected: 3.63
00:02:11 at GetAddrInfoReqWrap.onResolve (/home/iojs/build/workspace/node-test-commit-custom-suites/default/test/internet/test-uv-threadpool-schedule.js:41:12)
00:02:11 at GetAddrInfoReqWrap.callback (/home/iojs/build/workspace/node-test-commit-custom-suites/default/test/common/index.js:376:15)
00:02:11 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:17)
00:02:11 ... |
Interesting. Looks like fast IO is really not 100 times faster than slow IO in our infra. @gireeshpunathil Would it be okay to change the test to ensure that the fast IO takes lesser time (not 1/100th but simply |
@thefourtheye - yes, and that is @Trott's #25358 |
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: nodejs#8436 PR-URL: nodejs#23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: #8436 PR-URL: #23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: #8436 PR-URL: #23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: #8436 PR-URL: #23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter). Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same. Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O. Refs: libuv/libuv#1845 Refs: #8436 PR-URL: #23099 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
Validate that massive dns lookups do not block filesytem I/O (or any fast I/O for that matter).
Prior to libuv/libuv#1845 few back-to-back dns lookup were sufficient to engage libuv threadpool workers in a blocking manner, throttling other work items that need the pool. this test acts as a regression test for the same.
Start slow and fast I/Os together, and make sure fast I/O can complete in at least in 1/100th of time for slow I/O.
Refs: libuv/libuv#1845
Refs: #8436
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes