Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node-api: Add API to check safety of calling JS #43805

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

srriddle
Copy link

This PR seeks to resolve an race condition issue while shutting down node with neon bindings. Due to the race, it is possible to call into JavaScript during shutdown while it is not allowed. The added API cal exposes the can_call_into_js so that we are able to check and catch the condition where we are no longer allowed to call into JavaScript.

neon-bindings/neon#913

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/node-api

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Jul 12, 2022
@kjvalencik
Copy link

It is not currently possible with APIs to distinguish between a pending exception and not being able to call into js for some other reason. This can be worked around by abusing NAPI_PREAMBLE in some existing Node-API method, but ideally there would be a method that can be more cleanly called.

@daeyeon daeyeon added the node-api Issues and PRs related to the Node-API. label Jul 13, 2022
@mhdawson
Copy link
Member

mhdawson commented Aug 5, 2022

We discussed in the Node-API team meeting today. Could you expand on the scenario on when you are runing into this problem so that we can better understand? For example what do you do in the case where it will return false?

@kjvalencik
Copy link

kjvalencik commented Aug 5, 2022

@mhdawson The most common use case is to stop calling JavaScript instead of attempting to handle a JavaScript exception.

Consider the first call in a threadsafe function. It fails with napi_exception. Did it throw an exception or is it not possible to call into JS at all? If it's not possible to call at all, most of the time, I just want to exit early.

Neon currently uses a side effect of napi_throw with a bad argument to detect this state. Since it's internal to Neon, it knows it's not possible to be in a throwing state yet.

https://github.com/neon-bindings/neon/blob/c6f75a774a5320412d373f46e381287c3a8e701e/crates/neon/src/sys/no_panic.rs#L124-L135

@legendecas
Copy link
Member

legendecas commented Aug 29, 2022

For finalizers called at environment shutdown, I don't think it is encouraged to call into JavaScript in the finalizers, and we are working to eagerly invoke the finalizers at a time that the JavaScript execution is disallowed: #42651 (comment).

For napi_threadsafe_function, I think we should stop draining the queue of the threadsafe function at the termination of an environment, and wait for its finalizer to be invoked and user's data to be cleaned up, similarly, with JavaScript execution disallowed in the finalizers.

However, even with the work of #42651, we can not get rid of what is already carved in the stone as node-api is declared to be abi-stable. I believe it is worth landing this new API as experimental and re-initiate the effort on the issue #37446. As a requirement for landing a new node-api, I'll bring this topic up in this week's node-api working group meeting.

Thank you for working on this!

@@ -566,6 +566,11 @@ NAPI_EXTERN napi_status NAPI_CDECL napi_object_seal(napi_env env,
napi_value object);
#endif // NAPI_VERSION >= 8

#ifdef NAPI_EXPERIMENTAL
NAPI_EXTERN napi_status NAPI_CDECL napi_can_call_into_js(napi_env env,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From dicussion in Node-api team meeting today new methods should use node_api -> node_api_can_call_into_js in this case.

Overall there was agreement that adding a method makes sense.

The other suggestion was that we need a unit test.

> Stability: 1 - Experimental

```c
NAPI_EXTERN napi_status napi_can_call_into_js(napi_env env);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing the bool* result parameter.

NAPI_EXTERN napi_status napi_can_call_into_js(napi_env env);
```

* `[in] env`: The environment that the API is invoked under.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@gabrielschulhof
Copy link
Contributor

gabrielschulhof commented Apr 8, 2023

There's another problem. @addaleax correct me if I'm wrong, but AFAICT can_call_into_js() can become false at any time, because, on a worker, turning off the ability to call into JS is a matter of the main thread setting an atomic boolean in the worker's node::Environment. https://github.com/gabrielschulhof/tsfn-test/blob/main/README.md shows an example of can_call_into_js() becoming false while the worker thread is already inside napi_call_function(). So, having a separate check doesn't really prevent such race conditions. I think the lesson is that any Node-API can fail with napi_pending_exception at any time.

One way I found to distinguish an actual exception from an env teardown is that in the case of a teardown, the call returns napi_pending_exception but a call to napi_is_exception_pending() returns napi_ok and false. After #45715 lands we can add the napi_cannot_call_into_js status to make it clearer why the call has failed.

Copy link
Contributor

@gabrielschulhof gabrielschulhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking the PR until we can resolve the discussion about the race condition.

@gabrielschulhof
Copy link
Contributor

@addaleax would it make sense to protect the boolean flag with a mutex instead of making it atomic? This would prevent the main thread from setting it while there's a node api call going on. So, like,

  1. CallIntoModule try-locks the mutex. If it succeeds, it calls into the module and releases the lock, otherwise it returns.
  2. On the main thread, the native binding for terminate() locks the mutex, sets the boolean, releases the mutex, and returns.

This would not address calls into module that do not execute JS. Providing APIs that allow one to attach these "pure" callbacks is a separate discussion though.

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. node-api Issues and PRs related to the Node-API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants