Skip to content

Only collect async ID when it's safe to do so #197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 19, 2025
Merged

Conversation

szegedi
Copy link

@szegedi szegedi commented Feb 17, 2025

What does this PR do?:
Reenables async ID collection with guards. Prevents collecting async ID when either GC is ongoing (captures the ID at the GC prologue) or when the isolate has no current context.

Motivation:
We want to safely collect async ID.

Additional Notes:
Jira: PROF-11265

@szegedi szegedi changed the title Guard against collecting async ID while performing GC Only collect async ID when it's safe to do so Feb 17, 2025
Copy link

github-actions bot commented Feb 17, 2025

Overall package size

Self size: 9.57 MB
Deduped: 9.94 MB
No deduping: 9.94 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.4 | 226 kB | 226 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | p-limit | 3.1.0 | 7.75 kB | 13.78 kB | | delay | 5.0.0 | 11.17 kB | 11.17 kB | | node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@pr-commenter
Copy link

pr-commenter bot commented Feb 17, 2025

Benchmarks

Benchmark execution time: 2025-03-06 17:10:57

Comparing candidate commit 79992a0 in PR branch szegedi/async-id-2 with baseline commit 51951b7 in branch main.

Found 3 performance improvements and 0 performance regressions! Performance is the same for 90 metrics, 27 unstable metrics.

scenario:profiler-idle-with-wall-profiler-20

  • 🟩 cpu_user_time [-7.193ms; -1.554ms] or [-11.979%; -2.587%]

scenario:profiler-light-load-no-wall-profiler-16

  • 🟩 cpu_user_time [-7.421ms; -2.057ms] or [-7.705%; -2.136%]

scenario:profiler-light-load-no-wall-profiler-18

  • 🟩 cpu_user_time [-8.856ms; -2.354ms] or [-9.567%; -2.543%]

@szegedi szegedi marked this pull request as ready for review February 26, 2025 13:00
@szegedi szegedi requested a review from nsavoire as a code owner February 26, 2025 13:00
@szegedi szegedi changed the base branch from main to szegedi/keep-idle February 26, 2025 13:00
@szegedi szegedi added the semver-patch Bug or security fixes, mainly label Feb 26, 2025
@@ -291,6 +291,9 @@ void SignalHandler::HandleProfilerSignal(int sig,
return;
}
auto isolate = Isolate::GetCurrent();
if (!isolate || isolate->IsDead()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases does GetCurrent returns null ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know – I'm just trying to be pedantically defensive and not unconditionally dereference a pointer. I can envision a race condition between signal delivery and an isolate going away? I know the logic is different, but even the V8 sampler manager checks for isolate on a sampler not being null.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be more lenient and still invoke old_handler?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok as this.

Base automatically changed from szegedi/keep-idle to main February 27, 2025 10:12
- don't sample a dead isolate
- only try to retrieve an async ID when there's a context
Use relaxed load/store with appropriate signal fences for cheaper reading/writing of gcCount.
@szegedi szegedi force-pushed the szegedi/async-id-2 branch from 5be2a03 to a694680 Compare March 3, 2025 14:44
@szegedi szegedi requested a review from nsavoire March 4, 2025 10:19
@@ -66,6 +66,7 @@ export interface TimeProfilerOptions {
withContexts?: boolean;
workaroundV8Bug?: boolean;
collectCpuTime?: boolean;
collectAsyncId?: boolean;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Prop name (collectAsyncId) doesn't match rule (is|has) (...read more)

Enforces a consistent naming pattern for boolean props.

The pattern is: "^(is|has)[A-Z]([A-Za-z0-9]?)+" to enforce is and has prefixes.

View in Datadog  Leave us feedback  Documentation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's new. I'm inclined to ignore it, seeing how existing boolean properties also don't follow this format.

@szegedi szegedi force-pushed the szegedi/async-id-2 branch 3 times, most recently from 3c86526 to 79992a0 Compare March 6, 2025 17:05
}
}

#define DD_WALL_PROFILER_GET_BOOLEAN_CONFIG(name) \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No a big fan of macros, but here this nicely simplifies the code.

@nsavoire
Copy link

nsavoire commented Mar 7, 2025

Overall this looks good !
Are you confident that GC callbacks + null/dead/incontext isolate checks are enough to avoid crashes ?

@szegedi
Copy link
Author

szegedi commented Mar 7, 2025

Are you confident that GC callbacks + null/dead/incontext isolate checks are enough to avoid crashes ?

I'd say I'm reasonably certain. In any case, I don't have an idea for other kinds of safeguards right now. All crashtracker reports (about 165) we have for the past month from people stuck on the unsafe version of the profiler were in GC, as well as 5 reports where the crashtracker failed to unwind so we can't tell.

@szegedi szegedi requested a review from nsavoire March 12, 2025 15:08
@szegedi szegedi merged commit d213a01 into main Mar 19, 2025
121 checks passed
@szegedi szegedi deleted the szegedi/async-id-2 branch March 19, 2025 15:27
szegedi added a commit that referenced this pull request Mar 19, 2025
* Guard against collecting async ID while performing GC

* Additional sanity checks:
- don't sample a dead isolate
- only try to retrieve an async ID when there's a context

* Move GetAsyncId and GC callbacks out of the header file.
Use relaxed load/store with appropriate signal fences for cheaper reading/writing of gcCount.

* Have a config option for collecting async IDs
@szegedi szegedi mentioned this pull request Mar 19, 2025
szegedi added a commit that referenced this pull request Mar 26, 2025
* Guard against collecting async ID while performing GC

* Additional sanity checks:
- don't sample a dead isolate
- only try to retrieve an async ID when there's a context

* Move GetAsyncId and GC callbacks out of the header file.
Use relaxed load/store with appropriate signal fences for cheaper reading/writing of gcCount.

* Have a config option for collecting async IDs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-patch Bug or security fixes, mainly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants