-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate flaky parallel/test-child-process-fork-regr-gh-2847 #8950
Comments
cc/ me, @gireeshpunathil |
Can reproduce it on |
This has a long history, and to put it short: the second send() call is vulnerable due to race condition in the send() function with the server close. Depending on the timing, we will see failure in establishing a connection (line 32), or failure in sending data (line33). The former seem to have been taken care(line 57), while the failure at send is not. @santigimeno , what are your thoughts? |
@gireeshpunathil I've seen a couple of different errors:
I'll add a fixup for the first case to that PR. |
OK, I missed to see your PR. Yes - your point 1 is what I think has happened in this case. If you are accommodating in the PR great, thanks! |
It's not guaranteed that the first socket that tries to connect is the first that succeeds so the rest of assumptions made in the test are not correct. Fix it by making sure the second socket does not try to connect until the first has succeeded. The IPC channel can already be closed when sending the second socket. It should be allowed. Also, don't start sending messages until the worker is online. Fixes: nodejs#8950 PR-URL: nodejs#8954 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Brian White <mscdex@mscdex.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
It's not guaranteed that the first socket that tries to connect is the first that succeeds so the rest of assumptions made in the test are not correct. Fix it by making sure the second socket does not try to connect until the first has succeeded. The IPC channel can already be closed when sending the second socket. It should be allowed. Also, don't start sending messages until the worker is online. Fixes: #8950 PR-URL: #8954 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Brian White <mscdex@mscdex.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
It's not guaranteed that the first socket that tries to connect is the first that succeeds so the rest of assumptions made in the test are not correct. Fix it by making sure the second socket does not try to connect until the first has succeeded. The IPC channel can already be closed when sending the second socket. It should be allowed. Also, don't start sending messages until the worker is online. Fixes: #8950 PR-URL: #8954 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Brian White <mscdex@mscdex.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
It's not guaranteed that the first socket that tries to connect is the first that succeeds so the rest of assumptions made in the test are not correct. Fix it by making sure the second socket does not try to connect until the first has succeeded. The IPC channel can already be closed when sending the second socket. It should be allowed. Also, don't start sending messages until the worker is online. Fixes: #8950 PR-URL: #8954 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Brian White <mscdex@mscdex.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
This test failure recently occurred on AIX: https://ci.nodejs.org/job/node-test-commit-aix/1317/nodes=aix61-ppc64/console
Output:
/cc @mhdawson ?
The text was updated successfully, but these errors were encountered: