Commit 34b6765

Explain the current state of our SEA work
This document is a bit like a blog post, in that it documents the state at this point in time and will likely be out of date at some point. For this reason, I put this document inside a `blog/` directory, prefixed with its date.

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
1 parent 2ed9e73 commit 34b6765

5 files changed, +296 -0 lines changed

README.md

Lines changed: 5 additions & 0 deletions
@@ -9,3 +9,8 @@ Table of Contents
 -----------------
 
 - [Existing SEA Solutions](./docs/existing-solutions.md)
+
+Blog
+----
+
+- (2022/08/05) [An Overview of the Current State](./blog/2022-08-05-an-overview-of-the-current-state.md)

assets/SEA.sketch

105 KB
Binary file not shown.

assets/architecture.png

84 KB

assets/ingredients.png

229 KB

blog/2022-08-05-an-overview-of-the-current-state.md

Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
An Overview of the Current State
================================

Authors:
[@dsanders11](https://github.com/dsanders11),
[@jviotti](https://github.com/jviotti),
[@guest271314](https://github.com/guest271314),
[@RaisinTen](https://github.com/RaisinTen),
[@robertgzr](https://github.com/robertgzr),
[@saswatds](https://github.com/saswatds),
[@Trott](https://github.com/Trott).

> This document aims to provide a short introduction to the problem and our
> current understanding of the desired solution. The objective is to set the
> minimal ground for further discussion rather than providing extensive
> details. Reach out through [GitHub
> Discussions](https://github.com/nodejs/single-executable/discussions) if you
> have any questions!

Introduction
------------

A Node.js Single Executable Application (SEA) is a full Node.js program
distributed with the Node.js runtime as a single standalone binary. The
problem of bundling app code with the Node.js platform runtime has been
explored by the Node.js community for many years, as documented in [Existing SEA
Solutions](../docs/existing-solutions.md). What is even more interesting is
that variants of at least one facet of the same problem have also been explored
by the general open-source community in projects such as
[Deno](https://deno.land/manual/tools/compiler),
[AppImage](https://appimage.github.io),
[Electron](https://github.com/electron/asar) and
[Redbean](https://redbean.dev).

While there are many good solutions to this problem in the Node.js ecosystem,
none of them has proven to be strictly superior to the rest. Many of these
tools implement similar architectures, often end up solving the same problems
and, as a consequence, tend to face the same challenges.

We believe that joining forces will not only result in a superior solution in
the context of Node.js, but also introduce foundational blocks to solve related
problems in the open-source world at large.

How do SEAs work?
-----------------

The problem of combining application resources with a Node.js executable may
seem daunting at first, but it's not! The process typically looks something
like this:

![The classic architecture of SEAs](../assets/architecture.png)

First, we start with the assets that make up a Node.js application. These
typically consist of JavaScript files from the application code and the
`node_modules` directory, any native add-ons built by either the top-level
application or any of its dependencies, JSON files and more. Code that is
transpiled to JavaScript (e.g. from TypeScript) or compiled to WebAssembly is
assumed to have already been processed at this point.

Node.js programs are often composed of more than one file. To embed these
multiple files into the executable, we need to "bundle" them together.
Application code may rely on file-system characteristics such as each file's
location relative to the root of the project, the permissions set on the
application files (e.g. the executable bit) or a symlink-based directory
structure. To make sure these characteristics are preserved, application files
are often bundled as a Virtual File System (VFS).

Native executable programs, objects, shared libraries and static libraries are
represented using a binary format understood by the operating system. For
example, macOS and Windows make use of the
[Mach-O](https://en.wikipedia.org/wiki/Mach-O) and [Portable Executable
(PE)](https://en.wikipedia.org/wiki/Portable_Executable) binary formats,
respectively. These binary formats typically organize their data as a set of
sections. For example, one section may include the main executable code of the
program while another section may include every statically initialized
variable.

Therefore, the Virtual File System obtained by bundling the application files
can be "appended" to the Node.js binary as a new section. The Node.js C++
native initialization logic can inspect which sections the binary contains and,
if the VFS section is available, jump execution to the program bundled in it.

83+
Problems with current SEAs
84+
--------------------------
85+
86+
In our experience, existing Node.js single-executable implementations have at
87+
least one of the following problems or limitations:
88+
89+
- **Require maintaining custom Node.js patches.** These patches typically touch
90+
on the initialization logic of Node.js and are different across Node.js
91+
versions. These patches result in high maintenance burden and make it
92+
difficult for SEA implementations to support new Node.js versions as soon as
93+
they are released. The ideal SEA tool must be able to create a standalone
94+
executable application using any upstream Node.js build published to the
95+
official website.
96+
- **Require building Node.js from source.** Managing custom Node.js patches
97+
also means that SEA implementations need to provide its own custom builds of
98+
Node.js. Operating custom builds is complex, error-prone and potentially
99+
expensive resources-wise. In other cases, SEA projects embed the application
100+
resources at build time, pushing the need of building Node.js from source to
101+
their users.
102+
- **Require monkey-patching Node.js internal modules.** Often, SEA
103+
implementations need to intercept I/O related Node.js functionality ranging
104+
from the `fs` and `child_process` module, to the inner workings of
105+
`require()`. Node.js does not provide facilities to aid in this problem,
106+
resulting in complex and error-prone monkey patching.
107+
- **Limited interoperability with code signing.** In many cases, the
108+
application resources are embedded at the tail of the binary, outside of the
109+
boundaries of binary formats such as Mach-O and PE. Given that code-signing
110+
operates at the binary format level, these approaches can result in binaries
111+
that cannot be code-signed, that cannot be executed, or signed binaries whose
112+
application resources are actually not protected by the signature.
113+
- **No support for non-JavaScript assets.** Some SEA implementations
114+
concatenate JavaScript assets and inject the resulting "single" file into the
115+
executable. While this approach works for simple programs, many Node.js
116+
applications require application resources that are not typically
117+
concatenated as part of the JavaScript code, such as Node.js native add-ons,
118+
text files, executable scripts and more. These assets are often dynamically
119+
consumed using `fs` and `child_process`.
120+
- **Partial coverage of the platforms supported by Node.js.** Node.js provides
121+
official support, of varying degrees, for a [wide
122+
range](https://github.com/nodejs/node/blob/08d6a82f62962015b03ae7076487ba209cfd2ab5/BUILDING.md#supported-platforms)
123+
of operating systems and architectures. In comparison, many SEA
124+
implementations limit their support to a small subset of these.
125+
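To make the code-signing limitation from the list above more concrete, here is
a toy sketch. It assumes a made-up container format whose 4-byte header
declares the length of its contents, and a made-up `sign` function (a plain
FNV-1a checksum, not a real signature) that only covers the bytes the format
declares. Every name here is hypothetical, and no real Mach-O or PE behavior is
modeled.

```typescript
// Toy container: a 4-byte big-endian length header followed by that many bytes.
// This is a hypothetical stand-in for a real format such as Mach-O or PE.
function makeBinary(contents: Uint8Array): Uint8Array {
  const out = new Uint8Array(4 + contents.length);
  new DataView(out.buffer).setUint32(0, contents.length);
  out.set(contents, 4);
  return out;
}

// Returns only the bytes that the format itself declares.
function declaredBytes(binary: Uint8Array): Uint8Array {
  const view = new DataView(binary.buffer, binary.byteOffset, binary.byteLength);
  return binary.subarray(0, 4 + view.getUint32(0));
}

// Stand-in for a signature: an FNV-1a checksum over the declared bytes only.
function sign(binary: Uint8Array): number {
  const bytes = declaredBytes(binary);
  let hash = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Tail-appending puts the payload after everything the format knows about.
function appendToTail(binary: Uint8Array, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(binary.length + payload.length);
  out.set(binary, 0);
  out.set(payload, binary.length);
  return out;
}

const encoder = new TextEncoder();
const runtime = makeBinary(encoder.encode("runtime code"));
const appA = appendToTail(runtime, encoder.encode("console.log('original app')"));
const appB = appendToTail(runtime, encoder.encode("console.log('tampered app')"));

// The two values are equal: the "signature" never saw the appended payload.
console.log(sign(appA) === sign(appB));
```
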
The Architecture of SEAs
------------------------

The problem of supporting Node.js SEAs can be broken down into three
complementary and orthogonal ingredients: the **Resource Injection**, the
**Virtual File System** and the **Bootstrapper**. Interestingly enough, these
components are generic enough that they may each open up use cases beyond SEAs
too.

![The 3 orthogonal ingredients of Node.js SEAs](../assets/ingredients.png)

### Resource Injection

This ingredient is concerned with providing the ability to inject arbitrary
data into a pre-compiled binary in a binary-format-friendly manner. After the
data injection takes place, the program must be able to detect the offset and
length of the injected data at runtime. Implementing a cross-platform resource
injector and supporting the capability for runtime introspection requires
knowledge of various binary formats.

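As a rough illustration of the inject and introspect pair described above, the
following sketch models a binary as a section table plus raw bytes. The
`ToyBinary` model and the `injectSection` and `findSection` functions are
assumptions made for this example only. A real injector has to rewrite actual
binary structures, which is considerably more involved.

```typescript
// Hypothetical in-memory model of a binary: a section table plus raw bytes.
interface SectionEntry {
  name: string;
  offset: number; // where the section starts inside `data`
  length: number; // how many bytes it spans
}

interface ToyBinary {
  sections: SectionEntry[];
  data: Uint8Array;
}

// Build-time step: add the payload as a new named section and register it in
// the section table, so that it lives inside the boundaries of the format.
function injectSection(binary: ToyBinary, name: string, payload: Uint8Array): ToyBinary {
  if (binary.sections.some((section) => section.name === name)) {
    throw new Error(`Section already exists: ${name}`);
  }
  const data = new Uint8Array(binary.data.length + payload.length);
  data.set(binary.data, 0);
  data.set(payload, binary.data.length);
  const entry: SectionEntry = { name, offset: binary.data.length, length: payload.length };
  return { sections: [...binary.sections, entry], data };
}

// Runtime step: report the offset and length of the injected data, if any.
function findSection(binary: ToyBinary, name: string): SectionEntry | undefined {
  return binary.sections.find((section) => section.name === name);
}

const node: ToyBinary = {
  sections: [{ name: "text", offset: 0, length: 4 }],
  data: new Uint8Array([0xde, 0xad, 0xbe, 0xef]),
};

const sea = injectSection(node, "vfs", new TextEncoder().encode("bundled app"));
const found = findSection(sea, "vfs");
if (found !== undefined) {
  const bytes = sea.data.subarray(found.offset, found.offset + found.length);
  console.log(found.offset, found.length, new TextDecoder().decode(bytes));
}
```

Returning a new `ToyBinary` instead of mutating the input mirrors the idea of
taking an unmodified upstream build and producing a separate output file.
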
The initial set of requirements for this ingredient is:

| # | Requirement                                                                                 |
|---|---------------------------------------------------------------------------------------------|
| 1 | Inject data within the boundaries of the binary format, not outside of it                   |
| 2 | Support every binary format adopted by the platforms supported by Node.js                   |
| 3 | Support the injection of any arbitrary data, regardless of its format and contents          |
| 4 | Provide a complementary cross-platform native API for runtime reflection                    |
| 5 | The injector implementation must not require native add-ons                                 |
| 6 | Allow injection into any supported binary format from any platform (e.g. macOS to Windows)  |

The purpose of requirement #5 is to prevent developers making use of the SEA
technology from needing to have a full native compiler toolchain on their local
environments to build a native add-on that enables injection. This setup is
often complicated (e.g. on Windows) and would likely become a common source of
issues asking for help.

Requirement #6 is a convenience feature primarily used in Continuous
Integration. With it, developers do not need to perform data injection using a
host operating system that matches the target. For example, a developer would
be able to perform resource injection into a Windows binary on Linux.

While the immediate use case for this tool is to inject Node.js resource files
into the main executable, many native programs need to inject arbitrary data
files into the program for a wide range of reasons. Features such as
[`#embed`](https://thephd.dev/finally-embed-in-c23) have landed in the C23
programming language for this reason.

### Virtual File System

The VFS is the read-only format in which the application data is bundled before
getting injected into the Node.js executable. The presence of a VFS is
essential for supporting runtime logic that relies on file-system metadata such
as directory structure and file permissions. This type of VFS format is
typically simple: a concatenation of the files along with some structure that
annotates each file with file-system-related metadata.

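A minimal sketch of such a format could look as follows. The layout (a 4-byte
index length, a JSON index, then the concatenated file contents) and the names
`pack` and `openVfs` are assumptions made for illustration. This is not the
ASAR format or any other existing one.

```typescript
// Hypothetical layout: [4-byte index length][JSON index][file contents...]
type FileEntry = { kind: "file"; offset: number; size: number; mode: number };
type LinkEntry = { kind: "symlink"; target: string };
type VfsIndex = Record<string, FileEntry | LinkEntry>;
type Input = { kind: "file"; contents: Uint8Array; mode: number } | LinkEntry;

// Concatenates the files and records per-file metadata in the index.
function pack(inputs: Record<string, Input>): Uint8Array {
  const index: VfsIndex = {};
  const chunks: Uint8Array[] = [];
  let size = 0;
  for (const [path, input] of Object.entries(inputs)) {
    if (input.kind === "symlink") {
      index[path] = { kind: "symlink", target: input.target };
    } else {
      index[path] = { kind: "file", offset: size, size: input.contents.length, mode: input.mode };
      chunks.push(input.contents);
      size += input.contents.length;
    }
  }
  const header = new TextEncoder().encode(JSON.stringify(index));
  const archive = new Uint8Array(4 + header.length + size);
  new DataView(archive.buffer).setUint32(0, header.length);
  archive.set(header, 4);
  let cursor = 4 + header.length;
  for (const chunk of chunks) {
    archive.set(chunk, cursor);
    cursor += chunk.length;
  }
  return archive;
}

// Follows symbolic links, with a hop limit to guard against cycles.
function lookup(index: VfsIndex, path: string, hops = 0): FileEntry {
  const entry: FileEntry | LinkEntry | undefined = index[path];
  if (entry === undefined) throw new Error(`No such file: ${path}`);
  if (entry.kind === "file") return entry;
  if (hops >= 8) throw new Error(`Too many symbolic links: ${path}`);
  return lookup(index, entry.target, hops + 1);
}

// Parses only the index up front; each read is then a slice of the archive.
function openVfs(archive: Uint8Array) {
  const view = new DataView(archive.buffer, archive.byteOffset, archive.byteLength);
  const indexLength = view.getUint32(0);
  const indexText = new TextDecoder().decode(archive.subarray(4, 4 + indexLength));
  const index = JSON.parse(indexText) as VfsIndex;
  const dataStart = 4 + indexLength;
  return {
    stat: (path: string): FileEntry => lookup(index, path),
    readFile(path: string): Uint8Array {
      const entry = lookup(index, path);
      const start = dataStart + entry.offset;
      return archive.subarray(start, start + entry.size);
    },
  };
}

const encoder = new TextEncoder();
const vfs = openVfs(pack({
  "/app/index.js": { kind: "file", contents: encoder.encode("console.log('hi')"), mode: 0o644 },
  "/app/bin/tool": { kind: "file", contents: encoder.encode("#!/bin/sh"), mode: 0o755 },
  "/app/main.js": { kind: "symlink", target: "/app/index.js" },
}));

console.log(new TextDecoder().decode(vfs.readFile("/app/main.js")));
console.log((vfs.stat("/app/bin/tool").mode & 0o111) !== 0);
```

Keeping the offsets in an index is what allows a single file to be read
without scanning the rest of the archive. Compression and locality of related
files are left out of this sketch.
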
The initial set of requirements for this ingredient is:

| # | Requirement                                                             |
|---|-------------------------------------------------------------------------|
| 1 | Support random access reads for performance reasons                     |
| 2 | Support the concept of symbolic links                                   |
| 3 | Support preserving file permissions, at least the executable bit        |
| 4 | Support general purpose data compression for space optimization reasons |
| 5 | Preserve file-hierarchy information                                     |
| 6 | Increase locality of related files for performance reasons              |
| 7 | No interference with valid paths in the file system                     |

A virtual file system has wide applicability beyond being embedded into an
existing executable. Use cases range from virtual machines and data
transmission, to even [package
managers](https://github.com/yarnpkg/berry/tree/b6273b3f393f1485b810ee09a66acd5b2af564dd/packages/yarnpkg-fslib).

### Bootstrapper

This is the ingredient that ties it all together. The bootstrapper makes use of
the *Resource Injection* runtime introspection capability to detect the
presence of the embedded *Virtual File System*. It implements the logic for
jumping execution into the program bundled in the virtual file system and knows
how to intercept I/O to the virtual file system to provide seamless execution.

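The startup decision described above can be sketched roughly as follows. The
`EmbeddedApp` shape, the `bootstrap` function and its callbacks are
hypothetical names for this example and not Node.js or Postject APIs. The
sketch also assumes that the embedded program should see the same
`[executable, script, ...arguments]` layout that `process.argv` has in a
regular Node.js run.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical handle to a program found inside an embedded VFS.
interface EmbeddedApp {
  entryPoint: string; // path of the main script inside the VFS
  run(argv: string[], env: Env): number; // returns an exit code
}

function bootstrap(
  detectEmbeddedApp: () => EmbeddedApp | undefined,
  runAsPlainNode: (argv: string[], env: Env) => number,
  argv: string[],
  env: Env,
): number {
  const app = detectEmbeddedApp();
  if (app === undefined) {
    // No injected resources: behave like a regular Node.js binary.
    return runAsPlainNode(argv, env);
  }
  // Forward every command-line argument and the whole environment to the
  // embedded program, inserting its entry point where the script path goes.
  const [executable, ...args] = argv;
  return app.run([executable, app.entryPoint, ...args], env);
}

const seen: string[][] = [];
const demoApp: EmbeddedApp = {
  entryPoint: "/vfs/index.js",
  run(argv, env) {
    seen.push(argv);
    return env["FAIL"] === "1" ? 1 : 0;
  },
};

const exitCode = bootstrap(
  () => demoApp,
  () => 0,
  ["/usr/bin/my-app", "--port", "8080"],
  { HOME: "/home/user" },
);
console.log(exitCode, seen[0]);
```

Passing the detection step in as a callback keeps the bootstrapper independent
of how the resources were injected.
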
The initial set of requirements for this ingredient is:

| # | Requirement                                                       |
|---|-------------------------------------------------------------------|
| 1 | Support loading Node.js native add-ons                            |
| 2 | Proxy Node.js API function calls that involve I/O back to the VFS |
| 3 | Support running executable programs from within the VFS           |
| 4 | Proxy command-line arguments to the program embedded in the VFS   |
| 5 | Proxy environment variables to the program embedded in the VFS    |

In the context of Node.js, the community has historically been forced to
monkey-patch modules such as `fs` for advanced I/O use cases. For example,
Electron
[monkey-patches](https://github.com/electron/electron/blob/06a00b74e817a61f20e2734d50d8eb7bc9b099f6/lib/asar/fs-wrapper.ts)
several Node.js modules to power their ASAR integration. Vercel's PKG
[monkey-patches](https://github.com/vercel/pkg/blob/f0c4e8cd113e761958ab387f4b0237f4d8797335/prelude/bootstrap.js#L596)
a wide range of functions in a similar way. This approach is not only
error-prone, but might not be possible in the future if Node.js prevents
runtime modification to its internal modules (e.g. for security reasons).

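To give a feel for what this kind of monkey-patching involves, here is a
simplified sketch. `FsLike` is a stand-in for a tiny slice of the `fs` module,
and `patchReadFileSync` and the `/vfs` mount point are made up for this
example. It is not how Electron or PKG implement their patches. Note that a
real implementation would need a wrapper like this for every I/O function it
wants to intercept.

```typescript
// Stand-in for the small slice of the `fs` module used in this sketch.
interface FsLike {
  readFileSync(path: string): string;
}

// Replaces `readFileSync` so that paths under `mountPoint` are served from the
// in-memory `vfs`, while everything else falls through to the original.
// Returns a function that undoes the patch.
function patchReadFileSync(
  fs: FsLike,
  mountPoint: string,
  vfs: ReadonlyMap<string, string>,
): () => void {
  const original = fs.readFileSync;
  fs.readFileSync = (path: string): string => {
    if (!path.startsWith(`${mountPoint}/`)) {
      return original.call(fs, path);
    }
    const contents = vfs.get(path.slice(mountPoint.length));
    if (contents === undefined) {
      throw new Error(`ENOENT: no such file or directory, open '${path}'`);
    }
    return contents;
  };
  return () => {
    fs.readFileSync = original;
  };
}

// A fake "real" file system, so that the sketch does not touch the disk.
const realFs: FsLike = {
  readFileSync: (path) => `<contents of ${path} on disk>`,
};

const restore = patchReadFileSync(
  realFs,
  "/vfs",
  new Map([["/index.js", "console.log('from the VFS')"]]),
);

console.log(realFs.readFileSync("/vfs/index.js")); // served from the VFS
console.log(realFs.readFileSync("/etc/hosts")); // falls through to the original
restore();
```
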
What do we have so far?
-----------------------

At Postman, [@dsanders11](https://github.com/dsanders11),
[@RaisinTen](https://github.com/raisinten) and
[@robertgzr](https://github.com/robertgzr) have made outstanding progress
experimenting with the above architecture. Much of this progress builds on the
interesting previous work and discussions that took place in
[#42334](https://github.com/nodejs/node/pull/42334) and
[#43432](https://github.com/nodejs/node/issues/43432).

In the area of *Resource Injection*, we open-sourced a tool called
[Postject](https://github.com/postmanlabs/postject). Postject enables the
injection of arbitrary data as Mach-O, PE and ELF sections, and ships with a
cross-platform C/C++ header file providing runtime introspection APIs. It is
currently implemented in a mixture of Python and C++, and it builds on the
foundations provided by the [LIEF](https://github.com/lief-project/LIEF)
project.

In the area of *Virtual File Systems*, we are currently basing our
proof-of-concept on the [ASAR](https://github.com/electron/asar) archive format
designed and battle-tested by Electron. The ASAR format is extensible by
design, allowing us to add arbitrary new metadata and functionality to our
custom implementation.

In the area of the *Bootstrapper*, we are maintaining a custom Node.js patch
that makes use of the Postject reflection APIs, provides a basic read-only ASAR
implementation, and takes over the entry point of Node.js to jump execution to
the embedded app.

However, we are far from done! Our goal is to rethink and continue evolving
these components with the help of the community, and push for a better approach
to SEAs that everybody from the Node.js ecosystem and beyond can benefit from.

Future work
-----------

SEAs open interesting possibilities for innovation. For example, we can explore
creating SEAs that make use of V8 snapshots to speed up application startup
time, or ways in which we can trim down Node.js to optimize for
space efficiency in the context of SEAs.

Ideas are very welcome!

Taking it from here
-------------------

There are lots of ways to help! If you made it this far, you probably have some
questions, observations and feedback. If so, do write them down as [GitHub
Discussions](https://github.com/nodejs/single-executable/discussions) or
[GitHub Issues](https://github.com/nodejs/single-executable/issues). Other than
that:

- We need to collect proper requirements for each of the components we will be
  building
- We need to continue researching what's out there, and what we can learn from
  it
- We need to align implementers of SEA tooling to join forces and create a
  top-notch technology
- We need to sort out philosophical questions about how much of the SEA work
  will be a proper part of the core Node.js project, and how much will be
  community tooling

Last but not least, we have to actually write some code and get this done!
