Skip to content

test: deflake async-hooks/test-improper-order on AIX #58567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gulbaki
Copy link

@gulbaki gulbaki commented Jun 3, 2025

When the spawned child process gets closed with a signal, the exit code
is not set and that is why the exit code assertion was failing. This
change adjusts the test to check the signal and if it is truthy, it
doesn't assert the exit code and instead logs the signal and continues
the rest of the assertions.
#58562
Fixes: #58562
baki gul bakigul@outlook.com

@nodejs-github-bot nodejs-github-bot added async_hooks Issues and PRs related to the async hooks subsystem. needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Jun 3, 2025
Copy link
Member

@RaisinTen RaisinTen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

The first commit is from a different account. Is that intentional?

Instead of making the fixed issue a part of the title, we typically add it to the end of the description like so:

Fixes: https://github.com/nodejs/node/issues/58562

It would be good if you're able to update the commit message accordingly, otherwise a maintainer can adjust it during landing.

@gulbaki gulbaki changed the title test: deflake async-hooks/test-improper-order on AIX #58562 test: deflake async-hooks/test-improper-order on AIX Jun 3, 2025
@gulbaki
Copy link
Author

gulbaki commented Jun 3, 2025

I’m using two different GitHub accounts, so that’s probably the reason for the confusion. Both accounts belong to me. GitHub might have gotten a bit confused :)

@gulbaki gulbaki requested a review from RaisinTen June 3, 2025 11:52
Copy link
Member

@RaisinTen RaisinTen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nodejs-github-bot

This comment was marked as outdated.

@RaisinTen RaisinTen added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Jun 3, 2025
Copy link
Member

@mertcanaltin mertcanaltin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

codecov bot commented Jun 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.14%. Comparing base (3d22cdb) to head (e9f8488).
Report is 53 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #58567      +/-   ##
==========================================
- Coverage   90.22%   90.14%   -0.09%     
==========================================
  Files         635      636       +1     
  Lines      187462   187891     +429     
  Branches    36824    36877      +53     
==========================================
+ Hits       169133   169365     +232     
- Misses      11105    11285     +180     
- Partials     7224     7241      +17     

see 67 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@nodejs-github-bot

This comment was marked as outdated.

assert.strictEqual(code, 1);
child.on('close', common.mustCall((code, signal) => {
if (signal) {
console.log(`Child closed with signal: ${signal}`);
Copy link
Member

@joyeecheung joyeecheung Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why the signal can be truthy? And if so, what signal is it? Normally this should not happen unless the child process get killed, but this test does not actively kill the child process - it expects it to exit normally with exit code 1 - so it seems to be something outside the test killing the process, and it's AIX only. If the workaround applies universally, technically one can break this test on all platforms by introducing a crash in the async hooks and making the child process abort (SIGSEGV) and this test would not be able to catch the change of behavior any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joyeecheung You're absolutely right about preserving test integrity - we shouldn't mask real crashes.

@gulbaki Building on joyeecheung's feedback, maybe we can make this more targeted? Instead of skipping all signals, we could handle only expected system signals (SIGKILL/SIGTERM) while still catching crashes (SIGSEGV/SIGABRT).

Could you check what specific signal you're seeing on AIX? If it's just system cleanup during test teardown, we can handle that case specifically. But if it's something unexpected, we'd want the test to still catch it.

This way we fix the flakiness without losing the test's ability to detect real async-hooks issues. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right on AIX I’ve actually seen the signal come back as SIGABRT when close fires before exit. If we instead do:

child.once('exit', common.mustCall((code, signal) => {
  assert.strictEqual(code, 1);
  assert.strictEqual(signal, null);
}));

We avoid the race (since exit always has the real code/signal).
A real crash (e.g. SIGSEGV/SIGABRT) will still make signal !== null and fail the test.
Does this look like a safe way to fix the AIX flake while preserving the test’s crash detection?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, can you send it as a commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If signal is SIGABRT then I think this would still fail at assert.strictEqual(signal, null), I am not sure where the race is coming from because this is synchronous JS code, if signal is SIGABRT by the time of the first assertion, it can't be set by something else to null before the second assertion. If it's SIGABRT on AIX specifically it may make more sense to just do

if (common.isAIX && signal == 'SIGABRT') {
  // XXX: The child process could be aborted on AIX due to unknown reasons. Work around it.
} else {
  assert. strictEqual(code, 1);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @nodejs/platform-aix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some parallel discussion in #58478 (comment), so I have opened #58601

@RaisinTen RaisinTen removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jun 6, 2025
Copy link
Member

@mertcanaltin mertcanaltin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work thanks so much for taking care of this 🚀

@gulbaki gulbaki requested a review from RaisinTen June 6, 2025 15:56
Co-authored-by: Darshan Sen <raisinten@gmail.com>
@gulbaki gulbaki requested a review from RaisinTen June 8, 2025 19:54
@gulbaki gulbaki requested review from mertcanaltin and RaisinTen June 9, 2025 16:21
@nodejs-github-bot
Copy link
Collaborator

@RaisinTen RaisinTen added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. and removed commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Jun 10, 2025
RaisinTen pushed a commit that referenced this pull request Jun 10, 2025
PR-URL: #58567
Fixes: #58562
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
@RaisinTen
Copy link
Member

Landed in 58e1cba

@RaisinTen RaisinTen closed this Jun 10, 2025
@RaisinTen
Copy link
Member

Congrats! Thanks for the contribution 🎉

seriousme pushed a commit to seriousme/node that referenced this pull request Jun 10, 2025
PR-URL: nodejs#58567
Fixes: nodejs#58562
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
targos pushed a commit that referenced this pull request Jun 16, 2025
PR-URL: #58567
Fixes: #58562
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Zeyu "Alex" Yang <himself65@outlook.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. flaky-test Issues and PRs related to the tests with unstable failures on the CI. needs-ci PRs that need a full CI run. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate flaky async-hooks/test-improper-order
8 participants