Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEA tests run into SIGSEGV during SEA execution #50740

Closed
joyeecheung opened this issue Nov 15, 2023 · 5 comments · Fixed by #52000 or #54120
Closed

SEA tests run into SIGSEGV during SEA execution #50740

joyeecheung opened this issue Nov 15, 2023 · 5 comments · Fixed by #52000 or #54120
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. single-executable Issues and PRs related to single-executable applications

Comments

@joyeecheung
Copy link
Member

joyeecheung commented Nov 15, 2023

From the latest reliability report, it seems many of the SEA tests are running into SIGSEGV on Windows/macOS/RHEL when executing the generated SEA.

nodejs/reliability#718

Some ideas about investigating into the flakes:

  1. Use spawnSyncAndExitWithoutError() for better output
  2. Use NODE_DEBUG_NATIVE=SEA to log the deserialization
  3. Log onto the machines to check out what's going on

cc @nodejs/single-executable

@joyeecheung joyeecheung added single-executable Issues and PRs related to single-executable applications flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Nov 15, 2023
@richardlau
Copy link
Member

richardlau commented Nov 16, 2023

Crashed on GHA too on main: https://github.com/nodejs/node/actions/runs/6892307239

test-linux: test/sequential/test-single-executable-application-snapshot-and-code-cache.js#L64
--- stderr ---
[process 207047]: --- stderr ---

[process 207047]: --- stdout ---

[process 207047]: status = null, signal = SIGSEGV
/home/runner/work/node/node/test/common/child_process.js:86
    throw new Error(`${failures.join('\n')}`);
    ^

Error: - process terminated with status null, expected 0
- process terminated with signal SIGSEGV, expected null
    at logAndThrow (/home/runner/work/node/node/test/common/child_process.js:86:11)
    at expectSyncExit (/home/runner/work/node/node/test/common/child_process.js:91:5)
    at spawnSyncAndExitWithoutError (/home/runner/work/node/node/test/common/child_process.js:125:10)
    at Object.<anonymous> (/home/runner/work/node/node/test/sequential/test-single-executable-application-snapshot-and-code-cache.js:64:3)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1435:10)
    at Module.load (node:internal/modules/cjs/loader:1207:32)
    at Module._load (node:internal/modules/cjs/loader:1023:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:135:12)
    at node:internal/main/run_main_module:28:49

Node.js v22.0.0-pre
Command: out/Release/node --test-reporter=spec --test-reporter-destination=stdout --test-reporter=./tools/github_reporter/index.js --test-reporter-destination=stdout /home/runner/work/node/node/test/sequential/test-single-executable-application-snapshot-and-code-cache.js

@joyeecheung
Copy link
Member Author

Not sure if this is the same as the "all failed on PPC" that we are seeing. I'll open another PR to at least log some stuff.

Seperately it would be great if we can install a built-in SIGSEGV handler to dump the stack trace..

mhdawson pushed a commit that referenced this issue Nov 17, 2023
PR-URL: #50750
Refs: #50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
targos pushed a commit that referenced this issue Nov 21, 2023
PR-URL: #50750
Refs: #50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
targos pushed a commit that referenced this issue Nov 23, 2023
PR-URL: #50750
Refs: #50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
martenrichter pushed a commit to martenrichter/node that referenced this issue Nov 26, 2023
PR-URL: nodejs#50750
Refs: nodejs#50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
lucshi pushed a commit to lucshi/node that referenced this issue Nov 27, 2023
PR-URL: nodejs#50750
Refs: nodejs#50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
nodejs-github-bot pushed a commit that referenced this issue Nov 28, 2023
PR-URL: #50759
Refs: #50740
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
nodejs-github-bot pushed a commit that referenced this issue Nov 28, 2023
- Use spawnSyncAndExitWithoutError to log more information on error.
- Use NODE_DEBUG_NATIVE to log internals
- Skip the test when available disk space < 120MB

PR-URL: #50759
Refs: #50740
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
RafaelGSS pushed a commit that referenced this issue Nov 29, 2023
PR-URL: #50750
Refs: #50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
RafaelGSS pushed a commit that referenced this issue Nov 30, 2023
PR-URL: #50750
Refs: #50740
Refs: nodejs/reliability#718
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
targos pushed a commit that referenced this issue Dec 4, 2023
PR-URL: #50759
Refs: #50740
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
targos pushed a commit that referenced this issue Dec 4, 2023
- Use spawnSyncAndExitWithoutError to log more information on error.
- Use NODE_DEBUG_NATIVE to log internals
- Skip the test when available disk space < 120MB

PR-URL: #50759
Refs: #50740
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
nodejs-github-bot pushed a commit that referenced this issue Jan 11, 2024
In test status files, `$system` will be the OS and not the arch (which
would be `$arch`).

Add missing single-executable-application test to the list of tests
marked flaky on Linux ppc64le.

PR-URL: #51422
Refs: #50828
Refs: #50740
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
RafaelGSS pushed a commit that referenced this issue Jan 11, 2024
In test status files, `$system` will be the OS and not the arch (which
would be `$arch`).

Add missing single-executable-application test to the list of tests
marked flaky on Linux ppc64le.

PR-URL: #51422
Refs: #50828
Refs: #50740
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
marco-ippolito pushed a commit to marco-ippolito/node that referenced this issue Jan 12, 2024
In test status files, `$system` will be the OS and not the arch (which
would be `$arch`).

Add missing single-executable-application test to the list of tests
marked flaky on Linux ppc64le.

PR-URL: nodejs#51422
Refs: nodejs#50828
Refs: nodejs#50740
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Medhansh404 pushed a commit to Medhansh404/node that referenced this issue Jan 19, 2024
In test status files, `$system` will be the OS and not the arch (which
would be `$arch`).

Add missing single-executable-application test to the list of tests
marked flaky on Linux ppc64le.

PR-URL: nodejs#51422
Refs: nodejs#50828
Refs: nodejs#50740
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@joyeecheung
Copy link
Member Author

One day I was looking at the CI logs, and I think I know what's going on..#52000

nodejs-github-bot pushed a commit that referenced this issue Mar 10, 2024
The string writing/reading was intended for debugging info
in snapshot, which had a CHECK_GT(length, 0) check, it then
got repurposed for SEA resource writing/reading and turned
into a helper for string views, but was not updated to handle
empty views, causing occasional crash in the CI when the
read is protected. This patch fixes it.

PR-URL: #52000
Fixes: #50740
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@targos targos reopened this Mar 14, 2024
@targos
Copy link
Member

targos commented Mar 14, 2024

Still happening: https://ci.nodejs.org/job/node-test-commit-linux-containered/42065/nodes=ubuntu2204_sharedlibs_withoutintl_x64/console

12:42:11 not ok 4088 sequential/test-single-executable-application-snapshot-and-code-cache
12:42:11   ---
12:42:11   duration_ms: 8854.83400
12:42:11   severity: fail
12:42:11   exitcode: 1
12:42:11   stack: |-
12:42:11     Copied /home/iojs/build/workspace/node-test-commit-linux-containered/out/Release/node to /home/iojs/node-tmp/.tmp.4087/sea
12:42:11     Injected /home/iojs/node-tmp/.tmp.4087/sea-prep.blob into /home/iojs/node-tmp/.tmp.4087/sea
12:42:11     [process 1363650]: --- stderr ---
12:42:11     Found SEA blob 0x556134220080, size=4308861
12:42:11     Found SEA resource 0x556134220080, size=4308861
12:42:11     Read<uint32_t>()(4-byte), count=1: { 21223968 }, read 4 bytes
12:42:11     Read SEA magic 143da20
12:42:11     Read<uint32_t>()(4-byte), count=1: { 6 }, read 4 bytes
12:42:11     Read SEA flags 6
12:42:11     Read<uint64_t>()(8-byte), count=1: { 11 }, read 8 bytes
12:42:11     ReadStringView(), length=11: 0x556134220090, read 11 bytes, content: snapshot.js
12:42:11     Read SEA code path 0x556134220090, size=11
12:42:11     Read<uint64_t>()(8-byte), count=1: { 4308826 }, read 8 bytes
12:42:11     ReadStringView(), length=4308826: 0x5561342200a3, read 4308826 bytes
12:42:11     Read SEA resource snapshot 0x5561342200a3, size=4308826
12:42:11     Read<uint64_t>()(8-byte), count=1: 
12:42:11     [process 1363650]: --- stdout ---
12:42:11     
12:42:11     [process 1363650]: status = null, signal = SIGSEGV
12:42:11     /home/iojs/build/workspace/node-test-commit-linux-containered/test/common/child_process.js:86
12:42:11         throw new Error(`${failures.join('\n')}`);
12:42:11         ^
12:42:11     
12:42:11     Error: - process terminated with status null, expected 0
12:42:11     - process terminated with signal SIGSEGV, expected null
12:42:11         at logAndThrow (/home/iojs/build/workspace/node-test-commit-linux-containered/test/common/child_process.js:86:11)
12:42:11         at expectSyncExit (/home/iojs/build/workspace/node-test-commit-linux-containered/test/common/child_process.js:91:5)
12:42:11         at spawnSyncAndExitWithoutError (/home/iojs/build/workspace/node-test-commit-linux-containered/test/common/child_process.js:125:10)
12:42:11         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux-containered/test/sequential/test-single-executable-application-snapshot-and-code-cache.js:67:3)
12:42:11         at Module._compile (node:internal/modules/cjs/loader:1357:14)
12:42:11         at Module._extensions..js (node:internal/modules/cjs/loader:1415:10)
12:42:11         at Module.load (node:internal/modules/cjs/loader:1207:32)
12:42:11         at Module._load (node:internal/modules/cjs/loader:1023:12)
12:42:11         at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:177:12)
12:42:11         at node:internal/main/run_main_module:28:49
12:42:11     
12:42:11     Node.js v22.0.0-pre

richardlau pushed a commit that referenced this issue Mar 25, 2024
PR-URL: #50759
Refs: #50740
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
richardlau pushed a commit that referenced this issue Mar 25, 2024
- Use spawnSyncAndExitWithoutError to log more information on error.
- Use NODE_DEBUG_NATIVE to log internals
- Skip the test when available disk space < 120MB

PR-URL: #50759
Refs: #50740
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
richardlau added a commit that referenced this issue Mar 25, 2024
In test status files, `$system` will be the OS and not the arch (which
would be `$arch`).

Add missing single-executable-application test to the list of tests
marked flaky on Linux ppc64le.

PR-URL: #51422
Refs: #50828
Refs: #50740
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
richardlau added a commit that referenced this issue Mar 25, 2024
In test status files, `$system` will be the OS and not the arch (which
would be `$arch`).

Add missing single-executable-application test to the list of tests
marked flaky on Linux ppc64le.

PR-URL: #51422
Refs: #50828
Refs: #50740
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: Yagiz Nizipli <yagiz.nizipli@sentry.io>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
rdw-msft pushed a commit to rdw-msft/node that referenced this issue Mar 26, 2024
The string writing/reading was intended for debugging info
in snapshot, which had a CHECK_GT(length, 0) check, it then
got repurposed for SEA resource writing/reading and turned
into a helper for string views, but was not updated to handle
empty views, causing occasional crash in the CI when the
read is protected. This patch fixes it.

PR-URL: nodejs#52000
Fixes: nodejs#50740
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
marco-ippolito pushed a commit that referenced this issue May 2, 2024
The string writing/reading was intended for debugging info
in snapshot, which had a CHECK_GT(length, 0) check, it then
got repurposed for SEA resource writing/reading and turned
into a helper for string views, but was not updated to handle
empty views, causing occasional crash in the CI when the
read is protected. This patch fixes it.

PR-URL: #52000
Fixes: #50740
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
jcbhmr pushed a commit to jcbhmr/node that referenced this issue May 15, 2024
The string writing/reading was intended for debugging info
in snapshot, which had a CHECK_GT(length, 0) check, it then
got repurposed for SEA resource writing/reading and turned
into a helper for string views, but was not updated to handle
empty views, causing occasional crash in the CI when the
read is protected. This patch fixes it.

PR-URL: nodejs#52000
Fixes: nodejs#50740
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@joyeecheung
Copy link
Member Author

I figured out the cause of the crash and opened #54120

nodejs-github-bot pushed a commit that referenced this issue Aug 5, 2024
When both useCodeCache and useSnapshot are set, we generate the
snapshot and skip the generation of the code cache since the
snapshot already includes the code cache. But we previously still
persist the code cache setting in the flags that got serialized
into the SEA, so the resulting executable would still try to read
the code cache even if it's not added to the SEA, leading to a flaky
crash caused by OOB on some platforms.

This patch fixes the crash by ignoring the code cache setting when
generating the flag if both snapshot and code cache is configured.

PR-URL: #54120
Fixes: #50740
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
targos pushed a commit that referenced this issue Aug 14, 2024
When both useCodeCache and useSnapshot are set, we generate the
snapshot and skip the generation of the code cache since the
snapshot already includes the code cache. But we previously still
persist the code cache setting in the flags that got serialized
into the SEA, so the resulting executable would still try to read
the code cache even if it's not added to the SEA, leading to a flaky
crash caused by OOB on some platforms.

This patch fixes the crash by ignoring the code cache setting when
generating the flag if both snapshot and code cache is configured.

PR-URL: #54120
Fixes: #50740
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
targos pushed a commit that referenced this issue Sep 21, 2024
When both useCodeCache and useSnapshot are set, we generate the
snapshot and skip the generation of the code cache since the
snapshot already includes the code cache. But we previously still
persist the code cache setting in the flags that got serialized
into the SEA, so the resulting executable would still try to read
the code cache even if it's not added to the SEA, leading to a flaky
crash caused by OOB on some platforms.

This patch fixes the crash by ignoring the code cache setting when
generating the flag if both snapshot and code cache is configured.

PR-URL: #54120
Fixes: #50740
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
targos pushed a commit that referenced this issue Oct 2, 2024
When both useCodeCache and useSnapshot are set, we generate the
snapshot and skip the generation of the code cache since the
snapshot already includes the code cache. But we previously still
persist the code cache setting in the flags that got serialized
into the SEA, so the resulting executable would still try to read
the code cache even if it's not added to the SEA, leading to a flaky
crash caused by OOB on some platforms.

This patch fixes the crash by ignoring the code cache setting when
generating the flag if both snapshot and code cache is configured.

PR-URL: #54120
Fixes: #50740
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Richard Lau <rlau@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. single-executable Issues and PRs related to single-executable applications
Projects
None yet
3 participants