Investigate test-worker-fshandles-open-close-on-termination and test-worker-fshandles-error-on-termination crash #43499
Comments
We decide whether to end `fs` promises by checking `can_call_into_js()`, whereas the `FSReqPromise` destructor uses the `is_stopping()` check. Although this may look semantically correct, it has a problem: both values are modified before termination in `Environment::ExitEnv()` and both are atomic, but they are not synchronized with each other. As a result, when the destructor is reached, `can_call_into_js` may already be `false` while `is_stopping` is still `false`, which causes the crash. Fix this by using the same checks everywhere.
Fixes: nodejs#43499
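The window described in the commit message can be replayed deterministically with a small model. This is only a sketch under stated assumptions: `EnvModel`, `ReqModel`, `MaybeFinish`, `OldCheckPasses`, `NewCheckPasses`, and `ReplayWindow` are hypothetical names invented for illustration and are not the actual Node.js classes or functions. The checks return a `bool` instead of aborting, and the order in which the two flags flip is assumed to match the case the message describes.

```cpp
#include <atomic>

// Hypothetical stand-in for the two independent shutdown flags.
// Each flag is atomic on its own, but nothing updates them together.
struct EnvModel {
  std::atomic<bool> can_call_into_js{true};
  std::atomic<bool> is_stopping{false};
};

// Hypothetical stand-in for a pending fs promise request.
struct ReqModel {
  bool finished = false;

  // Resolve path: skipped once JS can no longer be called,
  // so `finished` stays false in that case.
  void MaybeFinish(const EnvModel& env) {
    if (!env.can_call_into_js.load()) return;
    finished = true;
  }

  // Destructor-time check as described before the fix:
  // it consults a different flag than the resolve path.
  bool OldCheckPasses(const EnvModel& env) const {
    return finished || env.is_stopping.load();
  }

  // Destructor-time check as described after the fix:
  // it consults the same flag as the resolve path.
  bool NewCheckPasses(const EnvModel& env) const {
    return finished || !env.can_call_into_js.load();
  }
};

// Replays the problematic interleaving without threads:
// the first flag has flipped, the second has not yet.
bool ReplayWindow(bool use_new_check) {
  EnvModel env;
  ReqModel req;
  env.can_call_into_js.store(false);  // first flag flipped
  // env.is_stopping is still false here: this is the window.
  req.MaybeFinish(env);               // resolve is skipped
  return use_new_check ? req.NewCheckPasses(env)
                       : req.OldCheckPasses(env);
}
```

In this model `ReplayWindow(false)` returns `false`, which corresponds to the failed assertion, while `ReplayWindow(true)` returns `true` because the skip decision and the destructor check read the same flag.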
It seems #43533 has not fixed these flaky tests; they are still failing:
There seems to be something wrong with the stack trace in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/15541/RUN_SUBSET=3,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_fshandles_open_close_on_termination:
01:38:28 not ok 775 parallel/test-worker-fshandles-open-close-on-termination
01:38:28 ---
01:38:28 duration_ms: 3.528
01:38:28 severity: fail
01:38:28 exitcode: 134
01:38:28 stack: |-
01:38:28 C:\Windows\SYSTEM32\cmd.exe[8708]: C:\workspace\node-compile-windows\node\src\node_file-inl.h:162: Assertion `finished_' failed.
01:38:28 1: 00007FF65494964F node_api_throw_syntax_error+176223
01:38:28 2: 00007FF6548D74E6 SSL_get_quiet_shutdown+67398
01:38:28 3: 00007FF6548D78B2 SSL_get_quiet_shutdown+68370
01:38:28 4: 00007FF6548C03ED v8::base::CPU::has_fpu+38813
01:38:28 5: 00007FF6548C19A9 v8::base::CPU::has_fpu+44377
01:38:28 6: 00007FF6549AB4A7 uv_timer_stop+1207
01:38:28 7: 00007FF6549A7A4B uv_async_send+331
01:38:28 8: 00007FF6549A71DC uv_loop_init+1292
01:38:28 9: 00007FF6549A737A uv_run+202
01:38:28 10: 00007FF654976225 node::SpinEventLoop+309
01:38:28 11: 00007FF65480CCC0 v8::internal::interpreter::BytecodeLabel::bind+35904
01:38:28 12: 00007FF6548083E8 v8::internal::interpreter::BytecodeLabel::bind+17256
01:38:28 13: 00007FF654997A4D uv_poll_stop+557
01:38:28 14: 00007FF655951D70 v8::internal::compiler::ToString+145936
01:38:28 15: 00007FFB74404ED0 BaseThreadInitThunk+16
01:38:28 16: 00007FFB75FCE39B RtlUserThreadStart+43
01:38:28 ...
Not sure how
We decide whether to end `fs` promises by checking `can_call_into_js()`, whereas the `FSReqPromise` destructor uses the `is_stopping()` check. Although this may look semantically correct, it has a problem: both values are modified before termination in `Environment::ExitEnv()` and both are atomic, but they are not synchronized with each other. As a result, when the destructor is reached, `can_call_into_js` may already be `false` while `is_stopping` is still `false`, which causes the crash. Fix this by also checking `can_call_into_js()` in the destructor.
Fixes: #43499
PR-URL: #43533
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
These tests seem to time out quite often. I don't know why, but one possible reason is that they start a lot of threads. Tests in `test/parallel` appear to be assumed to start only one thread each, so having 11 threads running at a time feels like a lot.
It also seems that these tests fail in a correlated fashion: take a look at [this reliability report][]. The failures all occur on the same build machines on the same PRs, which suggests some sort of CPU contention to me.
[this reliability report]: nodejs/reliability#334
On my Linux machine, decreasing the parallelism and iterations here reduces the `user` time from ~11.5 seconds to ~2 seconds, depending on the test. I have seen these tests take 30-60 seconds on CI (Alpine in particular).
I went back to the diffs that introduced these changes and verified that the tests still failed at least 90% of the time with the reduced iteration count, which feels sufficient.
Refs: #43499
Refs: #43084
PR-URL: #44090
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@aduh95 I noticed that you reopened this issue. Did you find any recent failures? :)
Ping @aduh95
Test
test-worker-fshandles-open-close-on-termination
test-worker-fshandles-error-on-termination
Platform
test-worker-fshandles-open-close-on-termination on MacOS
test-worker-fshandles-error-on-termination on Windows
Console output
test-worker-fshandles-open-close-on-termination:
test-worker-fshandles-error-on-termination:
Build links
test-worker-fshandles-open-close-on-termination:
https://ci.nodejs.org/job/node-test-commit-osx/nodes=osx11-x64/45661/testReport/junit/(root)/test/parallel_test_worker_fshandles_open_close_on_termination/
test-worker-fshandles-error-on-termination:
https://ci.nodejs.org/job/node-test-binary-windows-js-suites/15222/RUN_SUBSET=3,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_fshandles_error_on_termination/
Additional information
Related pr: #42910
This PR landed on 2022-07-18. It modified the `src/node_file-inl.h` file and added `test-worker-fshandles-open-close-on-termination` and `test-worker-fshandles-error-on-termination`.
cc @santigimeno @aduh95 @RaisinTen