Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: rewrite AsyncLocalStorage without async_hooks #48528

Merged
merged 1 commit into from
Aug 2, 2024

Conversation

Qard
Copy link
Member

@Qard Qard commented Jun 22, 2023

I'm working on rewriting AsyncLocalStorage to use v8::SetContinuationPreservedEmbedderData(...) in a similar model to Cloudflare and in anticipation of the upcoming AsyncContext proposal.

This is not done yet but is mostly working as-is. I have not got to benchmarking or writing tests for it yet, but it seems to be passing most of the existing AsyncLocalStorage. 😄

cc @nodejs/diagnostics @nodejs/performance

@Qard Qard added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. performance Issues and PRs related to the performance of Node.js. async_hooks Issues and PRs related to the async hooks subsystem. diag-agenda Issues and PRs to discuss during the meetings of the diagnostics working group. labels Jun 22, 2023
@Qard Qard requested review from jasnell and legendecas June 22, 2023 23:56
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Jun 22, 2023
@Qard
Copy link
Member Author

Qard commented Jun 23, 2023

Also worth considering that as this no longer depends on async_hooks in any way it might make sense to give it a separate module name at some point too and just have async_hooks re-export it.

@Qard Qard force-pushed the async-local-storage-rewrite branch 2 times, most recently from b35d35c to 8b30200 Compare June 23, 2023 01:07
@Qard
Copy link
Member Author

Qard commented Jun 23, 2023

Some additional notes:

  • This will have to be behind a build flag because of a conflict with Chromium using the same single-tenant API in Electron.
  • I have not yet wired this up to AsyncWrap. It will need some linkage there to actually work across real async boundaries.
    • current() and exchange() are exposed on the C++ side to make this straightforward but may do some scope helpers for that. We'll see if it's necessary.
  • I also need to make a PR to V8 to make it propagate embedder data through thenables.
  • There are probably still lots of bugs and the existing tests don't have great coverage. I'll improve those while I'm working on this as I want to be sure we're not breaking any edge cases.

@jasnell
Copy link
Member

jasnell commented Jun 23, 2023

Woo! Thanks for getting this started. The need for the flag is unfortunate but definitely required given how chromium is using the API and the conflict that causes with electron. But otherwise this should be a major improvement over the current state. Will do a thorough review this weekend.

@Qard
Copy link
Member Author

Qard commented Jun 23, 2023

Another note:

This currently uses a model of building a linked-list of AsyncContextFrames which each contain only the key and value they were constructed with. The alternative which is what was originally proposed was to use a map which is cloned from the parent frame at each frame construction and then the single additional entry for that frame is added but it is considered otherwise immutable.

I went with the current model mostly because it was easier and made debugging simpler but I plan to spend some time benchmarking the two approaches to identify the trade-offs and figure out which approach would be generally better. The one definite downside to the current model is it holds past values for the same store from further up the call graph alive longer which may be undesirable but may only really apply to top-level scopes that will need to be kept alive anyway.

Further analysis should provide some better insight into the behaviour.

@mmarchini
Copy link
Contributor

This is awesome!

Also worth considering that as this no longer depends on async_hooks in any way it might make sense to give it a separate module name at some point too and just have async_hooks re-export it.

That'd be a welcome change. We've been trying to eliminate async_hooks usage in our apps (as we've detected performance issues when async_hooks are misused), and having to import AsyncLocalStorage from the async_hooks module messes with how we detect if an app is using async_hooks or not.

@Qard
Copy link
Member Author

Qard commented Jul 9, 2023

Thenables propagation fix is here: https://chromium-review.googlesource.com/c/v8/v8/+/4674242

@mcollina
Copy link
Member

mcollina commented Jul 9, 2023

I propose tagging this as semver-major.

@Qard
Copy link
Member Author

Qard commented Jul 10, 2023

@mcollina Could do that. Though the way in which we land it is still not entirely decided. Due to the Electron conflict it needs build-flagging anyway, so this could be landed as a minor if it's by default turned off. It could land off by default and then have a follow-up to turn it on by default as a major. What do you think?

@mcollina
Copy link
Member

That works

@Qard
Copy link
Member Author

Qard commented Jul 20, 2023

I've got all but four two of the existing tests passing at this point:

  • thenables I have a fix for this here.
  • http.Agent, which I haven't yet figured out why my existing AsyncResource patches don't cover it.
  • storage.disable(), which currently only blocks propagation after the call but also needs to block for the rest of the tick before the call too in order to pass the existing tests.
  • unhandledRejection, which does not match the context of the promise. James said he'll point me at how Cloudflare did it.

@theanarkh
Copy link
Contributor

theanarkh commented Jul 21, 2023

Hi, @Qard, Can you explain how the API v8::SetContinuationPreservedEmbedderData(...) work ? I have read some docs and code, but I don't understand it very well, thank you ! It looks a bit like SetEmbedderData/GetEmbedderData API.

@Qard
Copy link
Member Author

Qard commented Jul 24, 2023

The SetContinuationPreservedEmbedderData API stores a v8::Value in the v8::Context. Whenever a promise reaction job is created (meaning when it resolves or rejects) it captures the value held at that time and stores it on that reaction job. Whenever the reaction job runs (meaning a then/catch/finally continuation runs) it restores that value as the current stored value and then reverts to the previous value when the job is done.

This is essentially what async_hooks does currently with executionAsyncResource but exclusively for promises. However we can also manipulate that current value externally to capture and restore the value around AsyncWrap, AsyncResource, and any other scope we need to manipulate to match our execution flow.

@Qard Qard force-pushed the async-local-storage-rewrite branch from e009382 to 6d26f70 Compare July 28, 2023 03:20
@Qard Qard marked this pull request as ready for review July 28, 2023 03:20
@Qard Qard force-pushed the async-local-storage-rewrite branch from 6d26f70 to 5a7b995 Compare July 28, 2023 03:23
@Qard Qard added the request-ci Add this label to start a Jenkins CI on a PR. label Jul 29, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jul 29, 2024
@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot
Copy link
Collaborator


> Stability: 1 - Experimental

Enables the use of AsyncLocalStorage backed by AsyncContextFrame rather than
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Enables the use of AsyncLocalStorage backed by AsyncContextFrame rather than
Enables the use of `AsyncLocalStorage` backed by `AsyncContextFrame` rather than

maybe even convert it to links.

lib/internal/async_local_storage/native.js Show resolved Hide resolved
}

disable() {
AsyncContextFrame.disable(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This disable variant seems quite different to the other one as this version effects only the current frame (which might be even undefined) whereas the other variant disables the ALS instance globally

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not quite accurate as run/enterWith will enable it again automatically. The existing disable also doesn't actually change the value stored on the associated resource so, for example, you could call disable() and then follow it with a run() and the previously entered context before the disable would be restored after the run() exits, which is kind of a bug. I'm intentionally leaving the existing ALS behaviour as-is, despite having several bugs. Those incorrect behaviours can be fixed in follow-up PRs.

The API is also considered experimental, so can be changed at any time. Support for the existing experimental APIs is somewhat best-effort and I would prefer to deprecate these APIs entirely unless some specific effort goes into fixing and stabilizing these APIs.

I don't think exact bug-for-bug reproduction of the existing ALS should be a goal here. The stabilized parts should be expected to behave the same, but the rest is best effort and likely to deviate a bit as the new model allows fixing some issues in the design of the prior model.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, run/enterWith reenable it but e.g. a timers or task_queue items do not use these APIs.

Anyhow, I agree that disable/exit APIs are strange (e.g. exit isn't the pendent of enter,...) already now. I also think they have no real use case besides testing maybe. So fine for me if we have behavior changes or even remove them later on.

Just want to make sure that these differences are not slipping in unintended and people complain. Maybe worth to mention this in the initial PR comment and/or commit message to avoid people say this was a silent change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to do some follow-up improvements, like there's currently a bunch of enabled checks in hot-paths that I want to restructure somewhat to have already made that check ahead of time.

I do think we should actually keep the exit behaviour at least, but I want to work on making the behaviour more coherent and consistent.

I'm not quite so convinced we should be keeping disable as it not being scoped doesn't match up evenly with the rest of the API and so the behaviour is a bit confusing given that it gets re-enabled automatically in certain cases. If it required an explicit enable then it might be more reasonable, but with the implicit re-enable it functions essentially the same as exit but in a much more edge-case-y way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit() without the internal disable() is mostly an enter(undefined).
I would prefer a single set() instead but I guess it's a bit too late :o)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not, and I do plan on doing that eventually, but some more work is needed to make that happen. Basically if we split the scoping part of run() from the mutating part we can raise the scoping part to provide implicit scopes in many places (like around every callback) and eliminate the need for all the closures. 😅

I'm working on an RFC that proves the correctness of the model which I should hopefully be able to share soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if native.js is a good name.

@@ -41,6 +42,10 @@ const {

const { AsyncResource } = require('async_hooks');

const AsyncContextFrame = require('internal/async_context_frame');

const async_context_frame = Symbol('asyncContextFrame');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const async_context_frame = Symbol('asyncContextFrame');
const async_context_frame = Symbol('kAsyncContextFrame');

Not sure but I think there was some naming guideline somewhere. But maybe I'm wrong so feel free to ignore.

if you go for the change there are a few similar places.

}

AsyncResource::~AsyncResource() {
CHECK_NOT_NULL(env_);
env_->RemoveAsyncResourceContextFrame(reinterpret_cast<std::uintptr_t>(this));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the sequence here?
In most places you enter context frame before asyncId handling and exit/restore after. But here it's the other way around.
I guess this has impact to anyone using a destroy hook for whatever reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timing consistency between ALS and async_hooks was never deterministic. Promises also produce a similar timing inconsistency. I can move that around if you feel strongly about it, but I don't think it really matters much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a fast test and it seems destroy hook is not running with the the store as of now. So actually this sequence seems to be less breaking one.

In general I think that changing the implementation details of ALS should not result in too many side effects to other node APIs.
I know AsyncHooks API is experimental and their use for ALS will/should go away. But there are other async hooks users which might also rely on ALS therefore the result of als.getStore() seen within init/before/after/destroy hooks should stay the same if possible.
I expect there are no tests for this because till now async hooks were fully bound to ALS and quite a lot testing happens on async hooks level as of now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so my long-term plan is to:

  • migrate ALS off async_hooks
  • create some new API for resource tracking that doesn't conflate consumable and non-consumable resource, plus probably have some filtering mechanism to subscribe to specific resource types and skip the barrier hop altogether when not interested.
  • eventually deprecate async_hooks

This will be a very long-term thing as I only have so much free time to work on this stuff, but that's my general aim with this. 🙂

src/async_context_frame.cc Show resolved Hide resolved
namespace node {
namespace async_context_frame {

class Scope {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we disallow copy/assignment? Or is it anyway implicit because of the members?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Any @nodejs/cpp-reviewers have opinions on this? I don't have strong opinions either way.

@Flarna
Copy link
Member

Flarna commented Jul 30, 2024

I think my older comments regarding ABI stability are all solved by moving the relevant parts into env.async_resource_context_frames_.

But this comment seems to be still relevant as the new/old behavior is likely different.

As this is behind a flag it should be fine.

@Qard
Copy link
Member Author

Qard commented Jul 30, 2024

The scope thing should cover that. I've tested against some native modules already and it seems to be working fine, as far as I can tell. My plan was just to land this behind a flag, get some customers to try it out, and shake out any inconsistencies we haven't already found. I've done my best making things as consistent as I can, but there were a few things that needed to change a bit to fit the different model. As far as I can tell though, none of the changes should ever have user-visible impact.

@Qard
Copy link
Member Author

Qard commented Aug 2, 2024

I'm going to just land this and start a follow-up PR with further improvements. What's here works, and I'd like to get it into users' hands sooner rather than later to start gathering feedback. The remaining suggestions are largely cosmetic, so I will just defer those to the follow-up rather than dealing with yet another round of CI flakiness and endless restarts. 😅

@Qard Qard added the request-ci Add this label to start a Jenkins CI on a PR. label Aug 2, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Aug 2, 2024
@nodejs-github-bot
Copy link
Collaborator

@Qard Qard added the commit-queue Add this label to land a pull request using GitHub Actions. label Aug 2, 2024
@Qard
Copy link
Member Author

Qard commented Aug 2, 2024

I accidentally added the request-ci label rather than the commit-queue label so now I need to wait for the CI again. Hopefully I get lucky and don't get bit by flakes. 🙈

@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Aug 2, 2024
@nodejs-github-bot nodejs-github-bot merged commit d1229ee into nodejs:main Aug 2, 2024
59 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in d1229ee

@Qard Qard deleted the async-local-storage-rewrite branch August 2, 2024 19:52
@kibertoad
Copy link
Contributor

@Qard Hey! Did you ever get to benchmarking the change?

@Qard
Copy link
Member Author

Qard commented Aug 5, 2024

Yep. See: #48528 (comment)

@kibertoad
Copy link
Contributor

@Qard thanks, this looks great!

targos pushed a commit that referenced this pull request Aug 14, 2024
PR-URL: #48528
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
RafaelGSS added a commit that referenced this pull request Aug 19, 2024
Notable changes:

buffer:
  * use fast API for writing one-byte strings (Robert Nagy) #54311
  * optimize createFromString (Robert Nagy) #54324
  * use native copy impl (Robert Nagy) #54087
inspector:
  * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246
lib:
  * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528
module:
  * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619

PR-URL: TODO
RafaelGSS added a commit that referenced this pull request Aug 19, 2024
Notable changes:

buffer:
  * use fast API for writing one-byte strings (Robert Nagy) #54311
  * optimize createFromString (Robert Nagy) #54324
  * use native copy impl (Robert Nagy) #54087
inspector:
  * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246
lib:
  * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528
module:
  * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619

PR-URL: #54452
@RafaelGSS RafaelGSS mentioned this pull request Aug 19, 2024
RafaelGSS added a commit that referenced this pull request Aug 20, 2024
Notable changes:

buffer:
  * use fast API for writing one-byte strings (Robert Nagy) #54311
  * optimize createFromString (Robert Nagy) #54324
  * use native copy impl (Robert Nagy) #54087
inspector:
  * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246
lib:
  * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528
module:
  * (SEMVER-MINOR) add --experimental-transform-types flag (Marco Ippolito) #54283
  * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619

PR-URL: #54452
RafaelGSS added a commit that referenced this pull request Aug 21, 2024
Notable changes:

buffer:
  * use fast API for writing one-byte strings (Robert Nagy) #54311
  * optimize createFromString (Robert Nagy) #54324
  * use native copy impl (Robert Nagy) #54087
inspector:
  * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246
lib:
  * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528
module:
  * (SEMVER-MINOR) add --experimental-transform-types flag (Marco Ippolito) #54283
  * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619

PR-URL: #54452
RafaelGSS added a commit that referenced this pull request Aug 22, 2024
Notable changes:

buffer:
  * use fast API for writing one-byte strings (Robert Nagy) #54311
  * optimize createFromString (Robert Nagy) #54324
  * use native copy impl (Robert Nagy) #54087
inspector:
  * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246
lib:
  * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528
module:
  * (SEMVER-MINOR) add --experimental-transform-types flag (Marco Ippolito) #54283
  * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619

PR-URL: #54452
@targos targos added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Sep 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. blocked PRs that are blocked by other issues or PRs. c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js. semver-minor PRs that contain new features and should be released in the next minor version. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.