
test: mark test-net-write-fully-async-buffer as flaky #52959

Open
aduh95 wants to merge 2 commits into nodejs:main from aduh95:test-net-write-fully-async-buffer-flaky

Conversation

@aduh95 (Contributor) commented May 12, 2024

Refs: #47428

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels May 12, 2024
@aduh95 aduh95 changed the title mark: test-net-write-fully-async-buffer as flaky test: mark test-net-write-fully-async-buffer as flaky May 12, 2024
@aduh95 aduh95 force-pushed the test-net-write-fully-async-buffer-flaky branch from 368473c to 9fd9192 Compare May 12, 2024 16:34
Co-authored-by: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
@lpinca (Member) commented May 13, 2024

Did we try to investigate? It is not flaky on my machines.

@lpinca (Member) commented May 13, 2024

On macOS, I actually get a 1‰ failure rate when running:

python3 tools/test.py -J --repeat=1000 test/parallel/test-net-write-fully-async-buffer.js

and a 16‰ failure rate when running

python3 tools/test.py -J --repeat=1000 test/parallel/test-net-write-fully-async-hex-string.js

@lpinca (Member) commented May 14, 2024

Something is fishy here. I get no failures if I remove the common module.

@aduh95 (Contributor, Author) commented May 15, 2024

Something is fishy here. I get no failures if I remove the common module.

#52964 (comment) also reported that removing the ../common import would remove the flakiness of a different test (i.e. the test no longer times out).
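
For context, parallel tests load the shared harness on their first lines, so "removing the common module" in the comments above means deleting or commenting out that require (a minimal illustration, not the full test source):

'use strict';
const common = require('../common'); // removing this line reportedly makes the failures disappear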

@lpinca (Member) commented May 16, 2024

Bisecting points to 1d29d81 as the first bad commit.

@lpinca (Member) commented May 16, 2024

I think there is a recent bug in V8 or Node.js behind the flakiness of some tests. This is one of those. I think we should not land this.

@richardlau (Member) commented

Something is fishy here. I get no failures if I remove the common module.

#52964 (comment) also reported that removing the ../common import would remove the flakiness of a different test (i.e. the test no longer times out).

Maybe something to do with the exit hooks common installs?
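
For readers unfamiliar with the harness, test/common installs per-process hooks so that expectations can be verified right before exit. A heavily simplified sketch of that kind of mechanism, purely illustrative and not the actual common implementation:

// Illustrative sketch of a mustCall-style exit hook; not the real test/common code.
const pending = [];

function mustCall(fn = () => {}, expected = 1) {
  const entry = { expected, actual: 0 };
  pending.push(entry);
  return function (...args) {
    entry.actual++;
    return fn.apply(this, args);
  };
}

// The harness checks the recorded expectations right before the process exits.
process.on('exit', () => {
  for (const { expected, actual } of pending) {
    if (actual !== expected) {
      throw new Error(`callback called ${actual} times, expected ${expected}`);
    }
  }
});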

@lpinca (Member) commented May 16, 2024

@richardlau I tried commenting out almost everything in common and failures still occur. Note that, as per git bisect, no failures occur at 07f481c, which is the parent commit of 1d29d81.

@richardlau (Member) commented

I meant that it's possible that the V8 update has caused something to change during shutdown. common registers JavaScript code to be run prior to Node.js exiting. If you've commented out the registration of those hooks, then it's probably something else.

@lpinca (Member) commented May 16, 2024

If you are referring to process.on('exit', handler), then yes, I've commented them out.

@lpinca (Member) commented May 16, 2024

The behavior described in #52964 (comment) also occurs for this test. The test correctly finishes, the server emits the 'close' event, but the process does not exit.
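
For context, the shape of the test is roughly the following (an illustrative sketch with made-up sizes, not the actual test source); once the server has closed and no handles remain active, the process is expected to exit on its own, which is exactly the step that stops happening here:

'use strict';
// Illustrative sketch; the real test is test/parallel/test-net-write-fully-async-buffer.js.
const common = require('../common');
const net = require('net');

const data = Buffer.alloc(100000, 'x'); // size chosen for illustration only

const server = net.createServer(common.mustCall((socket) => {
  socket.resume();
  socket.on('end', common.mustCall(() => {
    socket.end();
    server.close();
  }));
}));

server.listen(0, common.mustCall(() => {
  const client = net.connect(server.address().port, common.mustCall(() => {
    // A write large enough that it completes fully asynchronously.
    client.write(data, common.mustCall(() => client.end()));
  }));
}));

server.on('close', common.mustCall());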

@lpinca (Member) commented May 16, 2024

Adding the --jitless flag fixes the issue.
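
For anyone trying to reproduce this locally, one way to apply the flag is to run the test directly with it (illustrative):

node --jitless test/parallel/test-net-write-fully-async-buffer.js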

@H4ad H4ad added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Jun 8, 2024
@nodejs-github-bot nodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Jun 8, 2024
@nodejs-github-bot (Collaborator) commented

Commit Queue failed
- Loading data for nodejs/node/pull/52959
✔  Done loading data for nodejs/node/pull/52959
----------------------------------- PR info ------------------------------------
Title      test: mark `test-net-write-fully-async-buffer` as flaky (#52959)
   ⚠  Could not retrieve the email or name of the PR author's from user's GitHub profile!
Branch     aduh95:test-net-write-fully-async-buffer-flaky -> nodejs:main
Labels     test, needs-ci, commit-queue-squash
Commits    2
 - test: mark `test-net-write-fully-async-buffer` as flaky
 - Update test/parallel/parallel.status
Committers 2
 - Antoine du Hamel 
 - GitHub 
PR-URL: https://github.com/nodejs/node/pull/52959
Refs: https://github.com/nodejs/node/issues/47428
Reviewed-By: Benjamin Gruenbaum 
Reviewed-By: Ulises Gascón 
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/52959
Refs: https://github.com/nodejs/node/issues/47428
Reviewed-By: Benjamin Gruenbaum 
Reviewed-By: Ulises Gascón 
--------------------------------------------------------------------------------
   ℹ  This PR was created on Sun, 12 May 2024 10:10:41 GMT
   ✔  Approvals: 2
   ✔  - Benjamin Gruenbaum (@benjamingr) (TSC): https://github.com/nodejs/node/pull/52959#pullrequestreview-2052433668
   ✔  - Ulises Gascón (@UlisesGascon): https://github.com/nodejs/node/pull/52959#pullrequestreview-2060894926
   ✔  Last GitHub CI successful
   ✘  No Jenkins CI runs detected
--------------------------------------------------------------------------------
   ✔  Aborted `git node land` session in /home/runner/work/node/node/.ncu
https://github.com/nodejs/node/actions/runs/9424968514

@RedYetiDev (Member) commented

Is this test still flaky? If so, is this landable?

@lpinca (Member) commented Aug 8, 2024

Yes, it is still flaky, and so is test-net-write-fully-async-hex-string. I would still prefer not to land this, so that we do not forget about the underlying issue, unless the failure rate becomes intolerable.

@jakecastelli (Member) commented

I have been investigating this issue as it has been failing a lot recently; here are my two quick findings:

  • I think this may not be a recent V8 bug, as the issue was first reported in April 2023.
  • 1000 runs may not be sufficient; I start to see reliable flakiness when I bump the runs to 10,000 (see the command below).
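
In other words, adapting the command from earlier in the thread (illustrative):

python3 tools/test.py -J --repeat=10000 test/parallel/test-net-write-fully-async-buffer.js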

@jasnell (Member) commented Sep 9, 2024

Testing locally I spotted an interesting result. Posted this originally over on #54809 ...

Running some local tests on test-net-write-fully-async-hex-string.js ... here are the results I'm getting locally

Changing the allocation size of the Buffer that is written changes the likelihood of hitting the timeout.

Allocation size    Average timeouts per 10000 repetitions    Runs    Timeout
65600              8                                          10      60s
65580              4                                          10      60s
65570              0                                          10      60s
65560              0                                          10      60s
65550              0                                          10      60s

So far this is the only variable that I've been able to identify that reliably alters the rate of timeout on this particular test.

The reason for choosing 65550 is that it is fairly close to (but still over) the default high water mark for the stream. Setting the highWaterMark to 65600 appears to reduce the rate of timeouts when using 65600 as the allocation size, but does not eliminate it.

This makes me think that the timeouts are due to some interaction between V8 GC and streams backpressure management; I'm unable to reproduce any timeouts for allocations smaller than the default high water mark size.
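
To make the backpressure relationship concrete, here is a hedged illustration of what the threshold in the table above corresponds to; the actual default high water mark of the socket is whatever the stream reports, nothing specific is asserted here:

const net = require('net');

const server = net.createServer((socket) => socket.resume());
server.listen(0, () => {
  const socket = net.connect(server.address().port, () => {
    console.log('writableHighWaterMark:', socket.writableHighWaterMark);

    // A write that leaves more data buffered than the high water mark
    // returns false, signalling backpressure; the writer is then expected
    // to wait for the 'drain' event before writing again.
    const backpressure = !socket.write(Buffer.alloc(65600)); // size taken from the table above
    console.log('backpressure for a 65600-byte write:', backpressure);

    socket.end(() => server.close());
  });
});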

/cc @mcollina @ronag @anonrig for visibility

@RedYetiDev RedYetiDev removed the commit-queue-failed An error occurred while landing this pull request using GitHub Actions. label Sep 26, 2024