Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky test-http-client-timeout-option-with-agent #22083

Closed
wants to merge 1 commit into from

Conversation

oyyd
Copy link
Contributor

@oyyd oyyd commented Aug 2, 2018

The timeout event cannot be precisely timed and the actual timeout may be longer in some situations. So that we need to loose the asserting.

Fixes: #22041

  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Aug 2, 2018
Copy link
Member

@lpinca lpinca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@lpinca
Copy link
Member

lpinca commented Aug 2, 2018

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @oyyd! Welcome and thanks for the pull request!

This basically increases the tolerance from 1 second to 2 seconds, right? Before increasing timing tolerance, we should move the test out of the parallel directory and into the sequential directory. That will probably be enough to solve the problem.

EDIT: Hmmm...I'm not sure the parallel->sequential suggestion I make is correct either. I'm having a hard time replicating the failure. It would be very useful to have some idea of how often it fails now and how often it fails with this change.

@Trott
Copy link
Member

Trott commented Aug 3, 2018

Here's a stress test on current master branch to hopefully get an idea of how often this fails:

https://ci.nodejs.org/job/node-stress-single-test/1975/nodes=ubuntu1604_sharedlibs_debug_x64/console

[EDIT: No failures.]

@Trott
Copy link
Member

Trott commented Aug 3, 2018

Trying on master again but increasing both the simultaneous number of test processes (from 4 to 8) and the number of times the test is run (from 800 to 3200).

https://ci.nodejs.org/job/node-stress-single-test/1976/nodes=ubuntu1604_sharedlibs_debug_x64/console

[EDIT: No failures.]

@Trott
Copy link
Member

Trott commented Aug 3, 2018

On master again, 16 simultaneous test processes running the test 16K times:

https://ci.nodejs.org/job/node-stress-single-test/1977/nodes=ubuntu1604_sharedlibs_debug_x64/console

[EDIT: No failures.]

@Trott
Copy link
Member

Trott commented Aug 3, 2018

Oh, I see, this fails in a debug build, which is certainly one way to induce timing issues... Will need to try something else for the stress test....

@Trott
Copy link
Member

Trott commented Aug 3, 2018

We'll see if it works, but here is my attempt to persuade the CI server to run the test on a debug build with 96 test processes running the test 192 times: https://ci.nodejs.org/job/node-test-commit-custom-suites/410/default/console

[EDIT: Lots of failures. Awesome! We are able to replicate the problem! Now to compare the fix here with moving to sequential etc....]

@oyyd
Copy link
Contributor Author

oyyd commented Aug 4, 2018

@Trott As my increasing tolerance won't fix this fundamentally, moving to sequential is totally okay for me. Appreciate your investigation!

The timeout event cannot be precisely timed and the actual timeout
may be longer in some situations. Here we move this test into the
sequential folder to make it happens less likely.

Fixes: nodejs#22041
@maclover7
Copy link
Contributor

@Trott does this LGTY with the move to sequential?

@Trott
Copy link
Member

Trott commented Aug 13, 2018

@Trott does this LGTY with the move to sequential?

Yes. There may be more to be done to make it reliable, but this is a fine way to go unless and until that can be done, so LGTM.

@Trott
Copy link
Member

Trott commented Aug 13, 2018

@Trott Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Aug 13, 2018
@BridgeAR
Copy link
Member

The CI did not start (infrastructure related) but it does not seem to require one. Moving a test from parallel to sequential should not have impact on the test result.

@gdams
Copy link
Member

gdams commented Aug 13, 2018

landed in: f8fda89

gdams pushed a commit that referenced this pull request Aug 13, 2018
The timeout event cannot be precisely timed and the actual timeout
may be longer in some situations. Here we move this test into the
sequential folder to make it happens less likely.

PR-URL: #22083
Fixes: #22041
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: George Adams <george.adams@uk.ibm.com>
@maclover7
Copy link
Contributor

(closing this since it's been merged :))

@maclover7 maclover7 closed this Aug 13, 2018
rvagg pushed a commit that referenced this pull request Aug 15, 2018
The timeout event cannot be precisely timed and the actual timeout
may be longer in some situations. Here we move this test into the
sequential folder to make it happens less likely.

PR-URL: #22083
Fixes: #22041
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: George Adams <george.adams@uk.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants