@StekPerepolnen (Collaborator) commented Apr 29, 2025

Changelog entry

Fix segfault that could happen while retrying Whiteboard requests #18145

Changelog category

  • Bugfix

Description for reviewers

A segfault occurs when iterating over nodeSystemState.poolstats().

Here, nodeSystemState is retrieved from MergedNodeSystemState, which stores raw pointers (TSystemStateInfo*).
These pointers refer to objects inside NodeSystemState, which is intended to keep the pointed-to objects alive.

However, if an assignment is made to the same key in NodeSystemState, the old TSystemStateInfo object is destroyed,
leaving the pointer in MergedNodeSystemState dangling.
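
The lifetime problem described above can be reproduced in miniature. The following is a minimal sketch, not the actual YDB code: `TSystemStateInfoSketch`, `OverwriteWhileViewed`, and the choice of `std::unique_ptr` inside `std::unordered_map` are assumptions made for illustration. Only the roles of the two maps follow the description.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for TSystemStateInfo (the real type is not reproduced here).
struct TSystemStateInfoSketch {
    std::vector<std::string> PoolStats;
    // Lets the demo observe destruction without touching freed memory.
    int* DestroyedCounter = nullptr;
    ~TSystemStateInfoSketch() {
        if (DestroyedCounter) {
            ++*DestroyedCounter;
        }
    }
};

using TNodeId = unsigned;

struct TOverwriteResult {
    int DestroyedObjects;   // how many info objects the overwrite destroyed
    bool ViewStillPresent;  // whether the non-owning map still has an entry for the node
};

TOverwriteResult OverwriteWhileViewed() {
    int destroyed = 0;
    // Owning map: plays the role of NodeSystemState (assumed layout).
    std::unordered_map<TNodeId, std::unique_ptr<TSystemStateInfoSketch>> owner;
    // Non-owning map of raw pointers: plays the role of MergedNodeSystemState.
    std::unordered_map<TNodeId, const TSystemStateInfoSketch*> view;

    owner[1] = std::make_unique<TSystemStateInfoSketch>();
    owner[1]->PoolStats = {"System", "User"};
    owner[1]->DestroyedCounter = &destroyed;
    view[1] = owner[1].get();

    // Assigning to the same key destroys the object that view[1] points to.
    owner[1] = std::make_unique<TSystemStateInfoSketch>();

    // view[1] is now dangling: the entry exists, but it must not be dereferenced.
    return {destroyed, view.count(1) == 1};
}
```

In this model the overwrite destroys exactly one object while the raw-pointer entry remains in place, which is the state in which iterating over the pointed-to pool stats would be undefined behavior.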

Possible scenario

  • HealthCheck starts sending Whiteboard requests for the static group.
  • TEvInterconnect::TEvNodeDisconnected is received from one of the nodes; HealthCheck schedules a TEvRetryNodeWhiteboard without waiting for the first SystemInfo response.
  • The first SystemInfo response arrives.
  • TEvRetryNodeWhiteboard is processed, overwriting the value in NodeSystemState, leaving a dangling pointer in MergedNodeSystemState.
  • The second SystemInfo response does not arrive in time; HealthCheck response formation begins.
  • A segfault occurs during iteration over nodeSystemState.poolstats().
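
The sequence above can be replayed in a toy model. The sketch below is not the change made in this PR. It assumes, purely for illustration, that the merged map co-owns each response through `std::shared_ptr`, so the overwrite performed by the retry cannot free an object that is still referenced. `THealthCheckSketch`, `TSystemInfoSketch`, and all method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the payload of a SystemInfo response.
struct TSystemInfoSketch {
    std::vector<std::string> PoolStats;
};

using TNodeId = unsigned;

// Toy model of the actor state; map names mirror the description, layout is assumed.
class THealthCheckSketch {
public:
    // Initial request and retry: (re)issuing a request resets the per-node slot.
    void SendWhiteboardRequest(TNodeId node) {
        NodeSystemState[node] = nullptr;
    }

    // A response is stored and also exposed through the merged map.
    void OnSystemInfoResponse(TNodeId node, std::vector<std::string> pools) {
        auto info = std::make_shared<TSystemInfoSketch>();
        info->PoolStats = std::move(pools);
        NodeSystemState[node] = info;
        MergedNodeSystemState[node] = info;  // co-owns the object instead of borrowing it
    }

    // Response formation: iterate over the pool stats seen through the merged map.
    std::size_t CountPoolStats(TNodeId node) const {
        auto it = MergedNodeSystemState.find(node);
        if (it == MergedNodeSystemState.end() || !it->second) {
            return 0;
        }
        std::size_t count = 0;
        for (const auto& pool : it->second->PoolStats) {
            (void)pool;
            ++count;
        }
        return count;
    }

private:
    std::unordered_map<TNodeId, std::shared_ptr<TSystemInfoSketch>> NodeSystemState;
    std::unordered_map<TNodeId, std::shared_ptr<const TSystemInfoSketch>> MergedNodeSystemState;
};

std::size_t ReplayScenario() {
    THealthCheckSketch hc;
    hc.SendWhiteboardRequest(1);                           // request is sent
    // The node disconnects here and a retry is scheduled.
    hc.OnSystemInfoResponse(1, {"System", "User", "IO"});  // first response arrives
    hc.SendWhiteboardRequest(1);                           // retry overwrites the slot
    // The second response never arrives; response formation begins.
    return hc.CountPoolStats(1);
}
```

With shared ownership, the retry only drops one of two references, so the iteration still sees the three pool entries from the first response. Whether stale data is acceptable at that point is a separate design question that this sketch does not address.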

github-actions bot commented Apr 29, 2025

🟢 2025-05-08 12:38:05 UTC The validation of the Pull Request description is successful.

github-actions bot commented Apr 29, 2025

2025-04-29 01:25:48 UTC Pre-commit check linux-x86_64-release-asan for 7def863 has started.
2025-04-29 01:25:55 UTC Artifacts will be uploaded here
2025-04-29 01:28:51 UTC ya make is running...
🔴 2025-04-29 01:32:20 UTC Build failed, see the logs. Also see fail summary

github-actions bot commented Apr 29, 2025

2025-04-29 02:44:46 UTC Pre-commit check linux-x86_64-relwithdebinfo for 7def863 has started.
2025-04-29 02:45:15 UTC Artifacts will be uploaded here
2025-04-29 02:48:46 UTC ya make is running...
🔴 2025-04-29 02:57:47 UTC Build failed, see the logs. Also see fail summary

@StekPerepolnen force-pushed the healthcheck-segfault2 branch 3 times, most recently from 7aafc8c to b0d6515, April 29, 2025 03:26

github-actions bot commented Apr 29, 2025

2025-04-29 03:33:34 UTC Pre-commit check linux-x86_64-release-asan for 4b6a238 has started.
2025-04-29 03:33:49 UTC Artifacts will be uploaded here
2025-04-29 03:36:32 UTC ya make is running...
🟡 2025-04-29 04:58:47 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet. Going to retry failed tests...

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 12833 | 12669 | 0 | 96 | 44 | 24 |

2025-04-29 04:59:53 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-04-29 05:29:08 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 790 (only retried tests) | 698 | 0 | 46 | 24 | 22 |

2025-04-29 05:29:20 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-04-29 05:56:28 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 714 (only retried tests) | 620 | 0 | 43 | 29 | 22 |

🟢 2025-04-29 06:06:13 UTC Build successful.
🟢 2025-04-29 06:06:46 UTC ydbd size 3.8 GiB changed* by +880 Bytes, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: 86255fd | merge: 4b6a238 | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 4 121 948 056 Bytes | 4 121 948 936 Bytes | +880 Bytes | +0.000% |
| ydbd stripped size | 1 428 821 016 Bytes | 1 428 821 208 Bytes | +192 Bytes | +0.000% |

*Please be aware that the difference is based on comparing your commit with the last completed build from the post-commit; check the comparison.

github-actions bot commented Apr 29, 2025

2025-04-29 03:51:59 UTC Pre-commit check linux-x86_64-relwithdebinfo for 4b6a238 has started.
2025-04-29 03:52:06 UTC Artifacts will be uploaded here
2025-04-29 03:54:58 UTC ya make is running...
🟡 2025-04-29 05:14:38 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 20595 | 19223 | 0 | 5 | 1327 | 40 |

2025-04-29 05:16:20 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-04-29 05:58:04 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 563 (only retried tests) | 518 | 0 | 2 | 10 | 33 |

2025-04-29 06:06:18 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-04-29 06:18:16 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 106 (only retried tests) | 61 | 0 | 2 | 14 | 29 |

🟢 2025-04-29 06:18:23 UTC Build successful.
🟢 2025-04-29 06:18:47 UTC ydbd size 2.2 GiB changed* by +272 Bytes, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: 86255fd | merge: 4b6a238 | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 2 343 122 584 Bytes | 2 343 122 856 Bytes | +272 Bytes | +0.000% |
| ydbd stripped size | 492 681 984 Bytes | 492 682 048 Bytes | +64 Bytes | +0.000% |

*Please be aware that the difference is based on comparing your commit with the last completed build from the post-commit; check the comparison.

@StekPerepolnen marked this pull request as ready for review April 29, 2025 03:56
@StekPerepolnen requested a review from a team as a code owner April 29, 2025 03:56
@StekPerepolnen requested a review from pixcc April 29, 2025 03:56
@StekPerepolnen changed the title from "healthcheck segfault2" to "healthcheck segfault while retrying Whiteboard" Apr 29, 2025
@github-actions bot added bugfix and removed bugfix labels Apr 29, 2025
@github-actions bot added bugfix and removed bugfix labels May 8, 2025
@StekPerepolnen force-pushed the healthcheck-segfault2 branch from b0d6515 to 20f249e, May 8, 2025 12:43

github-actions bot commented May 8, 2025

2025-05-08 12:44:55 UTC Pre-commit check linux-x86_64-release-asan for 9dfe080 has started.
2025-05-08 12:45:23 UTC Artifacts will be uploaded here
2025-05-08 12:48:40 UTC ya make is running...
🟡 2025-05-08 14:09:28 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet. Going to retry failed tests...

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 12925 | 12741 | 0 | 117 | 40 | 27 |

2025-05-08 14:10:36 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-05-08 14:41:43 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 1281 (only retried tests) | 1167 | 0 | 59 | 29 | 26 |

2025-05-08 14:41:59 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-05-08 15:14:46 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 1413 (only retried tests) | 1274 | 0 | 69 | 46 | 24 |

🟢 2025-05-08 15:15:01 UTC Build successful.
🟢 2025-05-08 15:15:30 UTC ydbd size 3.8 GiB changed* by +944 Bytes, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: 52058e4 | merge: 9dfe080 | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 4 126 364 088 Bytes | 4 126 365 032 Bytes | +944 Bytes | +0.000% |
| ydbd stripped size | 1 432 733 728 Bytes | 1 432 733 984 Bytes | +256 Bytes | +0.000% |

*Please be aware that the difference is based on comparing your commit with the last completed build from the post-commit; check the comparison.

github-actions bot commented May 8, 2025

2025-05-08 12:46:50 UTC Pre-commit check linux-x86_64-relwithdebinfo for 9dfe080 has started.
2025-05-08 12:47:02 UTC Artifacts will be uploaded here
2025-05-08 12:49:58 UTC ya make is running...
🟡 2025-05-08 13:59:10 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 20746 | 19383 | 0 | 4 | 1323 | 36 |

2025-05-08 14:00:50 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-05-08 14:25:30 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 595 (only retried tests) | 561 | 0 | 1 | 0 | 33 |

2025-05-08 14:25:40 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-05-08 14:50:17 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
| --- | --- | --- | --- | --- | --- |
| 119 (only retried tests) | 88 | 0 | 0 | 0 | 31 |

🟢 2025-05-08 14:50:24 UTC Build successful.
🟢 2025-05-08 14:50:45 UTC ydbd size 2.2 GiB changed* by +272 Bytes, which is < 100.0 KiB vs main: OK

| ydbd size dash | main: 52058e4 | merge: 9dfe080 | diff | diff % |
| --- | --- | --- | --- | --- |
| ydbd size | 2 346 283 760 Bytes | 2 346 284 032 Bytes | +272 Bytes | +0.000% |
| ydbd stripped size | 493 372 752 Bytes | 493 372 816 Bytes | +64 Bytes | +0.000% |

*Please be aware that the difference is based on comparing your commit with the last completed build from the post-commit; check the comparison.

@StekPerepolnen merged commit c64966f into main May 8, 2025
14 checks passed
@StekPerepolnen deleted the healthcheck-segfault2 branch May 8, 2025 17:45
StekPerepolnen added a commit that referenced this pull request May 13, 2025
…heck-segfault

healthcheck segfault while retrying Whiteboard  (#17836) - merge stable-25-1
StekPerepolnen added a commit that referenced this pull request May 13, 2025
…heck-segfault

healthcheck segfault while retrying Whiteboard  (#17836) - merge stable-24-4
vporyadke pushed a commit to vporyadke/ydb that referenced this pull request May 27, 2025
@liruoko added the changelog/f25-3 (PR is included in the changelog) label Nov 19, 2025
Labels

bugfix, changelog/f25-3 (PR is included in the changelog)

5 participants