test: fix flaky test-http-client-timeout-option-with-agent #22083
SGTM
Hi, @oyyd! Welcome and thanks for the pull request!
This basically increases the tolerance from 1 second to 2 seconds, right? Before increasing timing tolerance, we should move the test out of the parallel directory and into the sequential directory. That will probably be enough to solve the problem.
EDIT: Hmmm... I'm not sure the parallel->sequential suggestion I made is correct either. I'm having a hard time replicating the failure. It would be very useful to have some idea of how often it fails now and how often it fails with this change.
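For readers unfamiliar with this kind of test, here is a minimal sketch of a timing-tolerance check on an HTTP client timeout. It is not the code of the test under discussion: the helper names `withinTolerance` and `measureClientTimeout` are hypothetical, and the 1000 ms values only mirror the "1 second" tolerance mentioned above.

```javascript
'use strict';
const http = require('http');

// Hypothetical helper (not from the Node.js test suite): true when the
// observed delay overshoots the configured timeout by at most `toleranceMs`.
function withinTolerance(elapsedMs, timeoutMs, toleranceMs) {
  return elapsedMs <= timeoutMs + toleranceMs;
}

// Hypothetical helper: starts a local server that never responds, sends a
// request through an explicit agent with the `timeout` option set, and
// resolves with the milliseconds elapsed until the 'timeout' event fires.
function measureClientTimeout(timeoutMs) {
  return new Promise((resolve, reject) => {
    const server = http.createServer(() => { /* never respond */ });
    server.listen(0, '127.0.0.1', () => {
      const agent = new http.Agent();
      const start = Date.now();
      let timedOut = false;
      const req = http.get({
        host: '127.0.0.1',
        port: server.address().port,
        agent,
        timeout: timeoutMs,
      });
      req.on('timeout', () => {
        timedOut = true;
        const elapsed = Date.now() - start;
        req.destroy(); // may emit 'error' (socket hang up), handled below
        agent.destroy();
        server.close();
        resolve(elapsed);
      });
      req.on('error', (err) => {
        // Errors after the timeout come from our own destroy() call.
        if (!timedOut) {
          server.close();
          reject(err);
        }
      });
    });
  });
}

// Example use (assumed values):
// measureClientTimeout(1000).then((elapsed) => {
//   console.log(withinTolerance(elapsed, 1000, 1000));
// });
```

On a loaded machine or a debug build the measured delay can exceed the configured timeout by a wide margin, which is why a fixed tolerance of this kind tends to be flaky when many tests run at once.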
Here's a stress test on current master branch to hopefully get an idea of how often this fails: https://ci.nodejs.org/job/node-stress-single-test/1975/nodes=ubuntu1604_sharedlibs_debug_x64/console [EDIT: No failures.]
Trying on master again but increasing both the simultaneous number of test processes (from 4 to 8) and the number of times the test is run (from 800 to 3200). https://ci.nodejs.org/job/node-stress-single-test/1976/nodes=ubuntu1604_sharedlibs_debug_x64/console [EDIT: No failures.]
On master again, 16 simultaneous test processes running the test 16K times: https://ci.nodejs.org/job/node-stress-single-test/1977/nodes=ubuntu1604_sharedlibs_debug_x64/console [EDIT: No failures.]
Oh, I see, this fails in a debug build, which is certainly one way to induce timing issues... Will need to try something else for the stress test...
We'll see if it works, but here is my attempt to persuade the CI server to run the test on a debug build with 96 test processes running the test 192 times: https://ci.nodejs.org/job/node-test-commit-custom-suites/410/default/console [EDIT: Lots of failures. Awesome! We are able to replicate the problem! Now to compare the fix here with moving to |
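The stress runs in these comments were done on the project's CI jobs. As a rough local approximation only, the sketch below runs a test file repeatedly with a bounded number of concurrent child processes and counts failures. The helper names `spawnTest` and `runStress` are hypothetical, and the test path in the usage comment assumes the standard Node.js source tree layout.

```javascript
'use strict';
const { spawn } = require('child_process');

// Hypothetical helper: run one test file in a child Node.js process and
// resolve to true when it exits with code 0.
function spawnTest(testFile) {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, [testFile], { stdio: 'ignore' });
    child.on('error', () => resolve(false));
    child.on('exit', (code) => resolve(code === 0));
  });
}

// Hypothetical helper: call `task` `runs` times in total, keeping at most
// `concurrency` calls in flight, and tally passes and failures.
async function runStress(task, runs, concurrency) {
  const result = { passed: 0, failed: 0 };
  let started = 0;
  async function worker() {
    while (started < runs) {
      started++;
      if (await task()) result.passed++;
      else result.failed++;
    }
  }
  const workers = [];
  for (let i = 0; i < Math.min(concurrency, runs); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return result;
}

// Example use (assumed path and numbers, mirroring "96 processes, 192 runs"):
// runStress(
//   () => spawnTest('test/parallel/test-http-client-timeout-option-with-agent.js'),
//   192, 96
// ).then(console.log);
```

A local run like this is not equivalent to the CI configuration; in particular, the failures above only reproduced on a debug build under heavy load.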
@Trott As my increasing tolerance won't fix this fundamentally, moving to |
The timeout event cannot be precisely timed and the actual timeout may be longer in some situations. Here we move this test into the sequential folder to make the failure less likely.
Fixes: nodejs#22041
@Trott does this LGTY with the move to sequential?
Yes. There may be more to be done to make it reliable, but this is a fine way to go unless and until that can be done, so LGTM.
The CI did not start (infrastructure related), but this change does not seem to require a run. Moving a test from parallel to sequential should not have an impact on the test result.
Landed in f8fda89.
The timeout event cannot be precisely timed and the actual timeout may be longer in some situations. Here we move this test into the sequential folder to make the failure less likely.
PR-URL: #22083
Fixes: #22041
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: George Adams <george.adams@uk.ibm.com>
(closing this since it's been merged :))
The timeout event cannot be precisely timed and the actual timeout may be longer in some situations, so we need to loosen the assertion.
Fixes: #22041
make -j4 test (UNIX), or vcbuild test (Windows) passes