
Entrypoint Hooks (carry over discussion from Austin Collab Summit) #43408

@jasnell


Originally had this as a discussion in https://github.com/nodejs/node/discussions/43384


At the Austin Collaborator Summit, there was significant discussion around the need for a more well-defined startup lifecycle, with a clearer boundary between the preload phase and the loading and evaluation of the user entry point. The use cases include more reliable handling for APMs, dynamic transpilers, diagnostic tooling, and more. I took on the task of drafting an initial proposal. Here is that proposal:


Entrypoint Hooks

Currently, the Node.js startup process consists of a single bootstrap phase, in which the Node.js core internal mechanisms and environment are set up, followed by the loading and instantiation of the user-provided entry point script.

stateDiagram-v2
  state "Node.js Startup" as A
  state "Preloads (sync eval)" as B
  state "User entry point script (sync eval)" as C
  state "Start event loop" as D
  state "Process preload and entry point async tasks" as E
  state "Run event loop" as F
  [*] --> A
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> [*]

Here, the User Entry Point is the script passed as the argument to the node binary (e.g., in node foo.js, foo.js is the User Entry Point).

Historically, there have always been scenarios in Node.js where it is desirable to load and run code before the User Entry Point performs any actions. This can be accomplished in several ways:

  • Require-first: By strategically positioning require and import statements at the beginning of the User Entry Point so that they are loaded and evaluated before anything else. This is the mechanism typically used by many Node.js APMs.
  • Wrapper/Loader: By using an alternative user entry point that ensures certain code is loaded and evaluated before the actual User Entry Point is loaded. This is the mechanism typically used by certain test frameworks and by serverless environments such as AWS Lambda.
  • Preloads: By using the Node.js -r (--require) command-line argument, Node.js can be instructed to load and evaluate one or more CommonJS scripts synchronously before loading and evaluating the user entry point script. This is used, for instance, by tools like Node.js Clinic to preload diagnostic tooling into the Node.js process.
  • Module Loaders: By providing an alternative module loader implementation using the still experimental loader API, it is possible to execute startup code the first time a module is loaded – including the user entry point module. This is the mechanism used by tools such as ts-node, for instance.

While each of these has historically been effective, they all suffer from a number of limitations, not the least of which is the lack of a clear separation between the execution of the preload code and the User Entry Point. Take, for instance, the following example:

Imagine a preload script with a single line of code:

// preload.js
setImmediate(() => console.log('preload'));

And a User Entry Point script with the following:

// entry.js
console.log('entrypoint');

Now run the node binary as:

node -r ./preload.js entry.js

The order of the statements printed will be:

entrypoint
preload

This happens because the preload script does run before the entry point script, but it only schedules asynchronous work. That work is not invoked until the event loop starts, and the event loop starts only after the entry point script has been evaluated. By the time the preload's asynchronous work completes, a lot of user code may already have run.

In other words, while there is a clear boundary at which preload can begin, there is no such boundary for when preload completes.
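The ordering above can be reproduced with a small script. The sketch below writes the two example files to a throwaway temporary directory. It then runs node -r preload.js entry.js as a child process using child_process.execFileSync and captures what the child prints. The temporary-directory layout is only a convenience for the demo.

```javascript
'use strict';
// Reproduces the preload ordering problem with today's -r (--require) flag.
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'preload-order-'));
const preload = path.join(dir, 'preload.js');
const entry = path.join(dir, 'entry.js');

// The preload only *schedules* work; nothing is printed while it is evaluated.
fs.writeFileSync(preload, "setImmediate(() => console.log('preload'));\n");
// The entry point prints synchronously during its own evaluation.
fs.writeFileSync(entry, "console.log('entrypoint');\n");

// Equivalent to running: node -r ./preload.js entry.js
const output = execFileSync(process.execPath, ['-r', preload, entry], {
  encoding: 'utf8',
});

console.log(output); // "entrypoint" first, then "preload"
fs.rmSync(dir, { recursive: true, force: true });
```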

This is a proposal for establishing a clearer lifecycle boundary.

Proposal

In the proposed new model, a new Entrypoint Hook phase is introduced into the Node.js startup following the completion of the bootstrap. During the Entrypoint Hook phase, one or more preload scripts can be loaded and evaluated in a user-defined order. They are loaded in precisely the same way that preload scripts (using the -r argument) are loaded today, with one very important distinction. Immediately after these preload scripts are loaded and evaluated, the Node.js event loop is started so that any asynchronous operations they initiated can run to completion. When that first run of the event loop has no further async tasks to complete, the Entrypoint Hook phase of the bootstrap is considered complete. The event loop is then reset, and the user entry point is loaded and evaluated, continuing the Node.js startup just as it does today. If there are no preload scripts to run, this entire new phase is skipped.
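Until something like this exists in core, the ordering described above can be roughly approximated in user land. The sketch below is an illustration under stated assumptions, not the proposed implementation. A hypothetical runner.js requires each hook, then waits for the process 'beforeExit' event before requiring the user entry point. Node.js emits 'beforeExit' once the event loop has no more work scheduled. The demo runs the runner against the preload.js and entry.js example from earlier.

```javascript
'use strict';
// Hypothetical user-land approximation of the proposed Entrypoint Hook phase.
// It only mimics the ordering; it is not how Node.js would implement it.
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Source of a hypothetical runner, invoked as:
//   node runner.js <hook.js>... -- <entry.js>
const runnerSource = `'use strict';
const path = require('node:path');
const args = process.argv.slice(2);
const sep = args.indexOf('--');
if (sep === -1 || sep === args.length - 1) {
  console.error('usage: node runner.js <hook.js>... -- <entry.js>');
  process.exit(1);
}
// Hook phase: evaluate every hook synchronously, in the order given.
for (const hook of args.slice(0, sep)) require(path.resolve(hook));
// 'beforeExit' fires once the event loop has drained, i.e. after the async
// work started by the hooks has finished. Only then load the entry point.
process.once('beforeExit', () => require(path.resolve(args[sep + 1])));
`;

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'entry-hooks-'));
const write = (name, src) => {
  const file = path.join(dir, name);
  fs.writeFileSync(file, src);
  return file;
};
const runner = write('runner.js', runnerSource);
const preload = write('preload.js', "setImmediate(() => console.log('preload'));\n");
const entry = write('entry.js', "console.log('entrypoint');\n");

const output = execFileSync(process.execPath, [runner, preload, '--', entry], {
  encoding: 'utf8',
});
console.log(output); // "preload" first, then "entrypoint"
fs.rmSync(dir, { recursive: true, force: true });
```

With the runner, 'preload' is printed before 'entrypoint'. The approximation differs from the proposal in several ways:

  • The event loop is not reset between phases.
  • 'beforeExit' is never emitted if a hook calls process.exit() or throws.
  • If a hook leaves a referenced handle (such as a listening server) active, the loop never drains and the entry point never loads. The proposal, by contrast, allows such handles to persist.
  • The entry point is loaded with require(), so require.main points at the runner, and an ES module entry point would need import() instead.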

stateDiagram-v2
  state "Node.js Startup" as A
  state "Preloads (sync eval)" as B
  state "Start event loop" as C
  state "Process preload async tasks" as D
  state "Stop event loop" as E
  state "User entry point script (sync eval)" as F
  state "Start event loop" as G
  state "Process entry point async tasks" as H
  state "Run event loop" as I
  state "Entry point hook phase" as J
  state "User entry point run phase" as K
  [*] --> A
  A --> B
  state J {
    B --> C
    C --> D
    D --> E
    E --> F
  }
  state K {
    F --> G
    G --> H
    H --> I
    I --> [*]
  }

With this approach, the preload scripts run during the Entrypoint Hook phase are permitted to fully complete and can alter the user entry point before it begins.

Importantly, at the end of the Entrypoint Hook phase, there are no pending async tasks of any kind carrying over into the evaluation of the user entry point script. The entry point hooks may allocate handles that persist across the boundary between phases (e.g., network handles and file descriptors), but those will have no pending I/O by the end of the phase.

Use Case: Serverless

In the serverless use case, a serverless host environment can use the Entrypoint Hook phase to load any supporting framework code and run any initialization it needs before the actual user entry point script is loaded.

Use Case: APMs/Diagnostic Tools

In the APM use case, diagnostic tools can use the Entrypoint Hook phase to load and prepare any diagnostic instrumentation they need, even if that tooling is initialized asynchronously (e.g., to query the file system or network for license or configuration data).
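For the APM case, a hook's asynchronous setup might look like the sketch below. The names initApm, apm.json and globalThis.__apmAgent are hypothetical, chosen for illustration only. The configuration is read with fs.promises.readFile, so the agent is not ready when the script's synchronous evaluation finishes. That is exactly the state a user entry point would observe if this file were loaded with -r today.

```javascript
'use strict';
// Sketch of an APM-style hook whose setup is asynchronous.
// initApm and globalThis.__apmAgent are hypothetical names.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

async function initApm(configPath) {
  // Asynchronous I/O: under today's -r, this finishes only after the
  // user entry point has already been evaluated.
  const raw = await fs.promises.readFile(configPath, 'utf8');
  const { serviceName, sampleRate = 1 } = JSON.parse(raw);
  globalThis.__apmAgent = { serviceName, sampleRate, started: true };
  return globalThis.__apmAgent;
}

// Demo: write a throwaway config file and initialize the hypothetical agent.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'apm-hook-'));
const configPath = path.join(dir, 'apm.json');
fs.writeFileSync(configPath, JSON.stringify({ serviceName: 'checkout' }));

const ready = initApm(configPath);
// initApm is suspended at its first await, so the agent is not set yet.
console.log('ready during sync eval:', Boolean(globalThis.__apmAgent)); // false
ready
  .then((agent) => console.log('ready after async init:', agent.started))
  .finally(() => fs.rmSync(dir, { recursive: true, force: true }));
```

Under the proposed Entrypoint Hook phase, the pending read would complete before the user entry point is evaluated. User code would therefore always see the initialized agent.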

Use Case: Dynamic Transpilers

Because entry point hooks are guaranteed to run to completion before the user entry point starts, they can be used to implement dynamic transpilation of the user entry point before it is loaded. For instance, a TypeScript entry point hook could transpile a TypeScript file passed in as the user entry point and direct Node.js to load and execute the compiled JavaScript output rather than attempting to run the TypeScript file that was provided.

What about startup time? Cold starts?

Entrypoint Hook scripts will have an impact on Node.js startup time when used. Fortunately, there are mechanisms for mitigating such costs. For instance, it would be possible to capture a snapshot of the preloads so that their loading and initial evaluation cost is reduced. This would work in exactly the same way that we have created snapshots of the Node.js bootstrap and are working to create snapshots of the user entry point. Preloads, however, are not necessarily trivial, and effort will need to be made to keep the performance cost minimal.

What is the relationship to Loaders?

Pluggable loaders are invoked as a result of require() or import (static or dynamic). Entrypoint hooks run exactly once, immediately at the startup of the Node.js process or worker thread, and never again. As such, the two serve entirely different purposes.
