-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
test: deflake async-hooks/test-improper-order on AIX #58567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
The first commit is from a different account. Is that intentional?
Instead of making the fixed issue a part of the title, we typically add it to the end of the description like so:
Fixes: https://github.com/nodejs/node/issues/58562
It would be good if you're able to update the commit message accordingly, otherwise a maintainer can adjust it during landing.
I’m using two different GitHub accounts, so that’s probably the reason for the confusion. Both accounts belong to me. GitHub might have gotten a bit confused :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #58567 +/- ##
==========================================
- Coverage 90.22% 90.14% -0.09%
==========================================
Files 635 636 +1
Lines 187462 187891 +429
Branches 36824 36877 +53
==========================================
+ Hits 169133 169365 +232
- Misses 11105 11285 +180
- Partials 7224 7241 +17 🚀 New features to boost your workflow:
|
This comment was marked as outdated.
This comment was marked as outdated.
assert.strictEqual(code, 1); | ||
child.on('close', common.mustCall((code, signal) => { | ||
if (signal) { | ||
console.log(`Child closed with signal: ${signal}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why the signal can be truthy? And if so, what signal is it? Normally this should not happen unless the child process get killed, but this test does not actively kill the child process - it expects it to exit normally with exit code 1 - so it seems to be something outside the test killing the process, and it's AIX only. If the workaround applies universally, technically one can break this test on all platforms by introducing a crash in the async hooks and making the child process abort (SIGSEGV) and this test would not be able to catch the change of behavior any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joyeecheung You're absolutely right about preserving test integrity - we shouldn't mask real crashes.
@gulbaki Building on joyeecheung's feedback, maybe we can make this more targeted? Instead of skipping all signals, we could handle only expected system signals (SIGKILL/SIGTERM) while still catching crashes (SIGSEGV/SIGABRT).
Could you check what specific signal you're seeing on AIX? If it's just system cleanup during test teardown, we can handle that case specifically. But if it's something unexpected, we'd want the test to still catch it.
This way we fix the flakiness without losing the test's ability to detect real async-hooks issues. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right on AIX I’ve actually seen the signal come back as SIGABRT when close fires before exit. If we instead do:
child.once('exit', common.mustCall((code, signal) => {
assert.strictEqual(code, 1);
assert.strictEqual(signal, null);
}));
We avoid the race (since exit always has the real code/signal).
A real crash (e.g. SIGSEGV/SIGABRT) will still make signal !== null and fail the test.
Does this look like a safe way to fix the AIX flake while preserving the test’s crash detection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, can you send it as a commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If signal is SIGABRT
then I think this would still fail at assert.strictEqual(signal, null)
, I am not sure where the race is coming from because this is synchronous JS code, if signal
is SIGABRT
by the time of the first assertion, it can't be set by something else to null before the second assertion. If it's SIGABRT
on AIX specifically it may make more sense to just do
if (common.isAIX && signal == 'SIGABRT') {
// XXX: The child process could be aborted on AIX due to unknown reasons. Work around it.
} else {
assert. strictEqual(code, 1);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @nodejs/platform-aix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was some parallel discussion in #58478 (comment), so I have opened #58601
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good work thanks so much for taking care of this 🚀
Co-authored-by: Darshan Sen <raisinten@gmail.com>
Landed in 58e1cba |
Congrats! Thanks for the contribution 🎉 |
PR-URL: nodejs#58567 Fixes: nodejs#58562 Reviewed-By: Darshan Sen <raisinten@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
When the spawned child process gets closed with a signal, the exit code
is not set and that is why the exit code assertion was failing. This
change adjusts the test to check the signal and if it is truthy, it
doesn't assert the exit code and instead logs the signal and continues
the rest of the assertions.
#58562
Fixes: #58562
baki gul bakigul@outlook.com