An Overview of the Current State
================================

Authors:
[@dsanders11](https://github.com/dsanders11),
[@jviotti](https://github.com/jviotti),
[@guest271314](https://github.com/guest271314),
[@RaisinTen](https://github.com/RaisinTen),
[@robertgzr](https://github.com/robertgzr),
[@saswatds](https://github.com/saswatds),
[@Trott](https://github.com/Trott).

> This document aims to provide a short introduction to the problem and our
> current understanding of the desired solution. The objective is to lay the
> minimal groundwork for further discussion rather than to provide extensive
> details. Reach out through [GitHub
> Discussions](https://github.com/nodejs/single-executable/discussions) if you
> have any questions!

Introduction
------------

A Node.js Single Executable Application (SEA) is a full Node.js program
distributed together with the Node.js runtime as a single standalone binary.
The Node.js community has explored the problem of bundling application code
with the Node.js runtime for many years, as documented in [Existing SEA
Solutions](../docs/existing-solutions.md). Even more interesting is that
variants of at least one facet of the same problem have also been explored by
the wider open-source community in projects such as
[Deno](https://deno.land/manual/tools/compiler),
[AppImage](https://appimage.github.io),
[Electron](https://github.com/electron/asar) and
[Redbean](https://redbean.dev).

While there are many good solutions to this problem in the Node.js ecosystem,
none of them has proven to be strictly superior to the rest. Many of these
tools implement similar architectures, often end up solving the same problems
and, as a consequence, tend to face the same challenges.

We believe that joining forces will not only result in a superior solution in
the context of Node.js, but also introduce foundational building blocks for
related problems in the open-source world at large.

How do SEAs work?
-----------------

The problem of combining application resources with a Node.js executable may
seem daunting at first, but it's not! The process typically consists of the
following steps.

First, we start with the assets that make up a Node.js application. These
typically consist of JavaScript files from the application code and its
`node_modules` directory, any native add-ons built by either the top-level
application or any of its dependencies, JSON files, and more. Code that needs a
build step, such as TypeScript that is transpiled to JavaScript or code
compiled to WebAssembly, is assumed to have already been processed at this
point.

Node.js programs are often composed of more than one file. To embed these
multiple files into the executable, we need to "bundle" them together.
Application code may rely on file-system characteristics such as the location
of files relative to the root of the project, their permissions (e.g. the
executable bit) or a symlink-based directory structure. To preserve these
characteristics, application files are often bundled as a Virtual File System
(VFS).
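
To make the bundling step concrete, here is a minimal Python sketch (Python is used for brevity; this is not how any particular SEA tool is implemented) of the metadata a bundler might capture while walking a project directory. The `Entry` record and the `collect_entries` function are hypothetical names. The key assumption is that the walk must not follow symlinks, so the links themselves can be reproduced inside the VFS.

```python
import os
import stat
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    path: str                          # POSIX-style path relative to the project root
    kind: str                          # "file", "dir" or "symlink"
    executable: bool = False           # owner execute bit, tracked for regular files only
    size: int = 0                      # byte size, tracked for regular files only
    link_target: Optional[str] = None  # raw link target, tracked for symlinks only


def collect_entries(root: str) -> List[Entry]:
    """Walk `root` without following symlinks and record the metadata a VFS must keep."""
    entries: List[Entry] = []
    # followlinks=False (the default): symlinked directories are listed but not entered.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            st = os.lstat(full)  # lstat describes the link itself, not its target
            if stat.S_ISLNK(st.st_mode):
                entries.append(Entry(rel, "symlink", link_target=os.readlink(full)))
            elif stat.S_ISDIR(st.st_mode):
                entries.append(Entry(rel, "dir"))
            elif stat.S_ISREG(st.st_mode):
                entries.append(Entry(rel, "file",
                                     executable=bool(st.st_mode & stat.S_IXUSR),
                                     size=st.st_size))
            # Other node types (sockets, FIFOs, devices) are skipped in this sketch.
    return sorted(entries, key=lambda e: e.path)
```

Recording the raw link target instead of the linked content is what allows a symlink-based `node_modules` layout to be reconstructed faithfully later on.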

Native executable programs, objects, shared libraries and static libraries are
represented using a binary format understood by the operating system. For
example, macOS and Windows use the
[Mach-O](https://en.wikipedia.org/wiki/Mach-O) and [Portable Executable
(PE)](https://en.wikipedia.org/wiki/Portable_Executable) binary formats,
respectively, while Linux uses ELF. These binary formats typically organize
their data as a set of sections. For example, one section may contain the main
executable code of the program while another section may contain its
statically initialized variables.

The Virtual File System obtained by bundling the application files can
therefore be "appended" to the Node.js binary as a new section. At startup, the
native C++ initialization logic of Node.js can introspect the sections the
binary contains and, if a VFS section is present, jump execution to the
application inside it.
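
The following Python sketch models this idea with a deliberately simplified, hypothetical container format (`TOYB`), not a real Mach-O, PE or ELF layout: a header table declares every section's name, offset and length, injection adds one more declared section, and the runtime side finds the injected data by name. Real formats need far more bookkeeping (load commands, alignment and other header fix-ups), which is the kind of work a binary-manipulation library such as LIEF takes care of.

```python
import struct
from typing import Iterator, List, Tuple

# Hypothetical toy container (NOT Mach-O, PE or ELF):
#   b"TOYB" | u32 section count | count x (16-byte name, u32 offset, u32 length) | data
MAGIC = b"TOYB"
COUNT = struct.Struct("<I")
ENTRY = struct.Struct("<16sII")
HEADER_START = len(MAGIC) + COUNT.size


def build(sections: List[Tuple[str, bytes]]) -> bytes:
    """Lay out named sections and record each one's offset and length in a table."""
    offset = HEADER_START + ENTRY.size * len(sections)
    table, data = bytearray(), bytearray()
    for name, payload in sections:
        table += ENTRY.pack(name.encode(), offset, len(payload))
        data += payload
        offset += len(payload)
    return MAGIC + COUNT.pack(len(sections)) + bytes(table) + bytes(data)


def section_table(image: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, offset, length) for every section declared in the header."""
    if image[:len(MAGIC)] != MAGIC:
        raise ValueError("not a toy binary")
    (count,) = COUNT.unpack_from(image, len(MAGIC))
    for i in range(count):
        raw, offset, length = ENTRY.unpack_from(image, HEADER_START + i * ENTRY.size)
        yield raw.rstrip(b"\0").decode(), offset, length


def inject(image: bytes, name: str, payload: bytes) -> bytes:
    """Build time: add `payload` as a new declared section (offsets are recomputed)."""
    sections = [(n, image[o:o + size]) for n, o, size in section_table(image)]
    return build(sections + [(name, payload)])


def find_section(image: bytes, name: str) -> Tuple[int, int]:
    """Run time: the lookup a bootstrapper performs to locate the injected data."""
    for n, offset, length in section_table(image):
        if n == name:
            return offset, length
    raise KeyError(name)
```

Because the injected bytes live inside a section that the header declares, tools that understand the format (including code signers) see them as a legitimate part of the binary.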

Problems with current SEAs
--------------------------

In our experience, existing Node.js single-executable implementations have at
least one of the following problems or limitations:

- **Require maintaining custom Node.js patches.** These patches typically touch
  the initialization logic of Node.js and differ across Node.js versions. They
  result in a high maintenance burden and make it difficult for SEA
  implementations to support new Node.js versions as soon as they are released.
  The ideal SEA tool must be able to create a standalone executable application
  using any upstream Node.js build published on the official website.
- **Require building Node.js from source.** Managing custom Node.js patches
  also means that SEA implementations need to provide their own custom builds
  of Node.js. Operating custom builds is complex, error-prone and potentially
  expensive in terms of resources. In other cases, SEA projects embed the
  application resources at build time, pushing the need to build Node.js from
  source onto their users.
- **Require monkey-patching Node.js internal modules.** Often, SEA
  implementations need to intercept I/O-related Node.js functionality, ranging
  from the `fs` and `child_process` modules to the inner workings of
  `require()`. Node.js does not provide facilities to help with this, resulting
  in complex and error-prone monkey-patching.
- **Limited interoperability with code signing.** In many cases, the
  application resources are embedded at the tail of the binary, outside of the
  boundaries of binary formats such as Mach-O and PE. Given that code signing
  operates at the binary format level, these approaches can result in binaries
  that cannot be code-signed, binaries that cannot be executed, or signed
  binaries whose application resources are not actually protected by the
  signature.
- **No support for non-JavaScript assets.** Some SEA implementations
  concatenate JavaScript assets and inject the resulting "single" file into the
  executable. While this approach works for simple programs, many Node.js
  applications require resources that are not typically concatenated as part
  of the JavaScript code, such as Node.js native add-ons, text files,
  executable scripts and more. These assets are often consumed dynamically
  through `fs` and `child_process`.
- **Partial coverage of the platforms supported by Node.js.** Node.js provides
  official support, of varying degrees, for a [wide
  range](https://github.com/nodejs/node/blob/08d6a82f62962015b03ae7076487ba209cfd2ab5/BUILDING.md#supported-platforms)
  of operating systems and architectures. In comparison, many SEA
  implementations limit their support to a small subset of these.
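
To illustrate the code-signing problem, this hypothetical Python sketch shows the tail-appending approach: the payload and a small trailer (an assumed `SEATAIL1` magic plus a length) are written after the end of the binary, and the runtime finds them by reading the last bytes of the file. The `digest_of_declared_image` function is only a stand-in for a signer that covers the bytes the binary format describes; real signing schemes are considerably more involved.

```python
import hashlib
import struct

# Hypothetical trailer layout: payload | u64 payload length | 8-byte magic
TRAILER_MAGIC = b"SEATAIL1"
TRAILER = struct.Struct("<Q8s")


def append_payload(binary: bytes, payload: bytes) -> bytes:
    """Append data after the end of the binary, outside any structure the format declares."""
    return binary + payload + TRAILER.pack(len(payload), TRAILER_MAGIC)


def read_payload(blob: bytes) -> bytes:
    """Locate the payload at run time by reading the fixed-size trailer at the very end."""
    length, magic = TRAILER.unpack_from(blob, len(blob) - TRAILER.size)
    if magic != TRAILER_MAGIC:
        raise ValueError("no appended payload")
    end = len(blob) - TRAILER.size
    return blob[end - length:end]


def digest_of_declared_image(blob: bytes, declared_size: int) -> str:
    """Stand-in for a signer that only hashes the bytes the binary format describes."""
    return hashlib.sha256(blob[:declared_size]).hexdigest()
```

Since the appended bytes are not part of any declared structure, replacing the payload leaves that digest unchanged, which is the "resources not protected by the signature" failure mode in miniature.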

The Architecture of SEAs
------------------------

The problem of supporting Node.js SEAs can be broken down into three
complementary and orthogonal ingredients: **Resource Injection**, the
**Virtual File System** and the **Bootstrapper**. Interestingly, each of these
components is generic enough that it may open up use cases beyond SEAs too.

### Resource Injection

This ingredient is concerned with providing the ability to inject arbitrary
data into a pre-compiled binary in a binary-format-friendly manner. After the
injection takes place, the program must be able to detect the offset and
length of the injected data at runtime. Implementing a cross-platform resource
injector and supporting runtime introspection requires knowledge of various
binary formats.

The initial set of requirements for this ingredient is:

| # | Requirement                                                                                       |
|---|---------------------------------------------------------------------------------------------------|
| 1 | Inject data within the boundaries of the binary format, not outside of it                         |
| 2 | Support every binary format adopted by the platforms supported by Node.js                         |
| 3 | Support the injection of any arbitrary data, regardless of its format and contents                |
| 4 | Provide a complementary cross-platform native API for runtime reflection                          |
| 5 | The injector implementation must not require native add-ons                                       |
| 6 | Allow injection into any supported binary format from any platform (e.g. from macOS into Windows) |

The purpose of requirement #5 is to spare developers using SEA tooling from
needing a full native compiler toolchain in their local environments just to
build a native add-on that enables injection. Such a setup is often
complicated, for example on Windows, and would likely become a common source of
support requests.

Requirement #6 is a convenience feature primarily used in Continuous
Integration. With it, developers do not need to perform data injection on a
host operating system that matches the target. For example, a developer would
be able to inject resources into a Windows binary on Linux.
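
Since injection only manipulates bytes, an injector can pick the format-specific logic from the file contents alone, regardless of the host operating system. The hypothetical `detect_format` helper below, sketched in Python, recognizes the three formats from their well-known magic numbers: `\x7fELF` for ELF; `0xFEEDFACE`/`0xFEEDFACF` (in either byte order) and `0xCAFEBABE` for Mach-O and universal Mach-O files; and the `MZ` DOS header whose `e_lfanew` field, at offset `0x3C`, points at a `PE\0\0` signature for PE.

```python
import struct
from typing import Optional

# Mach-O magics read as a big-endian u32, so both byte orders are covered.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE}
FAT_MAGIC = 0xCAFEBABE  # universal binary; Java class files share this magic


def detect_format(head: bytes) -> Optional[str]:
    """Classify an executable from its leading bytes; the host OS plays no role."""
    if head[:4] == b"\x7fELF":
        return "ELF"
    if len(head) >= 4:
        (magic,) = struct.unpack(">I", head[:4])
        if magic in MACHO_MAGICS:
            return "Mach-O"
        if magic == FAT_MAGIC:
            return "Mach-O (universal)"
    if head[:2] == b"MZ" and len(head) >= 0x40:
        # e_lfanew (offset 0x3C of the DOS header) points at the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", head, 0x3C)
        if head[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "PE"
    return None
```

An injector could use the result to dispatch to a per-format writer. Because `0xCAFEBABE` is ambiguous, a real tool would need to validate more of the header before trusting that classification.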

While the immediate use case for this tool is to inject Node.js resource files
into the main executable, many native programs need to embed arbitrary data
files for a wide range of reasons. Features such as
[`#embed`](https://thephd.dev/finally-embed-in-c23) have landed in the C23
revision of the C language for this reason.

### Virtual File System

The VFS is the read-only format in which the application data is bundled before
being injected into the Node.js executable. A VFS is essential for supporting
runtime logic that relies on file-system metadata such as directory structure
and file permissions. This type of VFS format is typically simple: a
concatenation of the files along with a structure that annotates each file
with file-system-related metadata.
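
A minimal sketch of such a format in Python, assuming a hypothetical layout loosely in the spirit of archives with a JSON header such as ASAR (this is not the actual ASAR format): a length-prefixed JSON header maps every path to its offset, size and mode within a data area made of the concatenated file contents. Reading a file is then a single slice, without scanning the other files.

```python
import json
import struct
from typing import Dict, Optional

LEN = struct.Struct("<I")  # hypothetical u32 prefix holding the JSON header length


def pack(files: Dict[str, bytes], modes: Optional[Dict[str, int]] = None) -> bytes:
    """Concatenate file contents and describe each file in a JSON header."""
    modes = modes or {}
    header, data = {}, bytearray()
    # Sorting by path keeps files under the same directory close together.
    for path in sorted(files):
        content = files[path]
        header[path] = {"offset": len(data), "size": len(content),
                        "mode": modes.get(path, 0o644)}
        data += content
    raw = json.dumps(header, sort_keys=True).encode("utf-8")
    return LEN.pack(len(raw)) + raw + bytes(data)


class Archive:
    """Read-only view over a packed blob; file contents are not copied up front."""

    def __init__(self, blob: bytes):
        (header_len,) = LEN.unpack_from(blob, 0)
        self.header = json.loads(blob[LEN.size:LEN.size + header_len])
        self.data_start = LEN.size + header_len
        self.blob = memoryview(blob)

    def read(self, path: str) -> bytes:
        """Random access: slice out one file without scanning the others."""
        meta = self.header[path]
        start = self.data_start + meta["offset"]
        return bytes(self.blob[start:start + meta["size"]])

    def mode(self, path: str) -> int:
        return self.header[path]["mode"]
```

This flat header does not model directories or symlinks; a fuller design would store a tree and could compress each file individually, which keeps reads random access at file granularity.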

The initial set of requirements for this ingredient is:

| # | Requirement                                                             |
|---|-------------------------------------------------------------------------|
| 1 | Support random access reads for performance reasons                     |
| 2 | Support the concept of symbolic links                                   |
| 3 | Support preserving file permissions, at least the executable bit        |
| 4 | Support general-purpose data compression for space optimization reasons |
| 5 | Preserve file-hierarchy information                                     |
| 6 | Increase locality of related files for performance reasons              |
| 7 | No interference with valid paths in the file system                     |
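
Supporting symbolic links means that path lookups inside the VFS must resolve links without touching the real file system. The sketch below, in Python with a hypothetical flat `index` that maps each path to its kind and optional link target, resolves a path one component at a time: relative link targets are spliced in place of the link, absolute targets are rejected so lookups cannot escape the VFS, and the number of links followed is capped to detect cycles.

```python
from typing import Dict, Optional, Tuple

# Hypothetical VFS index: path -> ("file" | "dir", None) or ("symlink", target)
Index = Dict[str, Tuple[str, Optional[str]]]

MAX_LINKS = 40  # similar in spirit to the limit Linux applies before failing with ELOOP


def resolve(index: Index, path: str) -> str:
    """Resolve `path` (relative to the VFS root) component by component, following links."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    resolved = []
    hops = 0
    while parts:
        part = parts.pop(0)
        if part == "..":
            if resolved:  # ".." at the VFS root stays at the root
                resolved.pop()
            continue
        candidate = "/".join(resolved + [part])
        if candidate not in index:
            raise FileNotFoundError(candidate)
        kind, target = index[candidate]
        if kind == "symlink":
            hops += 1
            if hops > MAX_LINKS:
                raise OSError("too many levels of symbolic links: " + path)
            if target.startswith("/"):
                raise OSError("absolute link target escapes the VFS: " + target)
            # The target is relative to the link's directory, i.e. `resolved`.
            parts = [p for p in target.split("/") if p not in ("", ".")] + parts
        else:
            resolved.append(part)
    return "/".join(resolved)
```

Keeping resolution purely inside the index is also what lets a VFS honor "no interference with valid paths": nothing outside the embedded data is ever consulted.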

A virtual file system has wide applicability beyond being embedded into an
existing executable. Use cases range from virtual machines and data
transmission to even [package
managers](https://github.com/yarnpkg/berry/tree/b6273b3f393f1485b810ee09a66acd5b2af564dd/packages/yarnpkg-fslib).

### Bootstrapper

This is the ingredient that ties it all together. The bootstrapper makes use of
the *Resource Injection* runtime introspection capability to detect the
presence of the embedded *Virtual File System*. It implements the logic for
jumping execution into the program bundled in the virtual file system and knows
how to intercept I/O to the virtual file system to provide seamless execution.
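
Intercepting I/O boils down to a routing decision made for every path an API receives: paths under the location where the VFS is mounted are served from the embedded data, and everything else falls through to the real file system. A minimal Python sketch, assuming a hypothetical `/__sea__/app` mount point (picked to be unlikely to collide with real paths) and POSIX-style paths; `VfsRouter` is an illustrative name, not an existing API.

```python
import os
from typing import Dict, Optional


class VfsRouter:
    def __init__(self, mount: str, files: Dict[str, bytes]):
        self.mount = os.path.normpath(mount)
        self.files = files  # VFS-relative POSIX path -> file contents

    def to_vfs_path(self, path: str) -> Optional[str]:
        """Return the path inside the VFS, or None if `path` lives on the real disk."""
        full = os.path.normpath(os.path.abspath(path))
        if full == self.mount or full.startswith(self.mount + os.sep):
            return os.path.relpath(full, self.mount).replace(os.sep, "/")
        return None

    def read_file(self, path: str) -> bytes:
        vfs_path = self.to_vfs_path(path)
        if vfs_path is None:
            with open(path, "rb") as f:  # not ours: defer to the real file system
                return f.read()
        if vfs_path not in self.files:
            raise FileNotFoundError(path)
        return self.files[vfs_path]
```

In a Node.js bootstrapper, an equivalent check would have to sit in front of functions such as `fs.readFileSync`, which is precisely the part that existing tools currently reach through monkey-patching.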

The initial set of requirements for this ingredient is:

| # | Requirement                                                       |
|---|-------------------------------------------------------------------|
| 1 | Support loading Node.js native add-ons                            |
| 2 | Proxy Node.js API function calls that involve I/O back to the VFS |
| 3 | Support running executable programs from within the VFS           |
| 4 | Proxy command-line arguments to the program embedded in the VFS   |
| 5 | Proxy environment variables to the program embedded in the VFS    |
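
Native add-ons and executables are loaded by the operating system (for example through `dlopen()` or `exec`), which generally expects a real file on disk. One common workaround, assumed here rather than prescribed, is to extract the entry to a temporary location with the right permissions and hand that path to the loader. The hypothetical `materialize` helper below, sketched in Python, does this in a content-addressed cache directory (the `sea-extract` name is made up) and writes each file atomically via a rename.

```python
import hashlib
import os
import tempfile
from typing import Optional


def materialize(vfs_path: str, content: bytes, executable: bool,
                cache_dir: Optional[str] = None) -> str:
    """Copy one VFS entry to a real file so the OS loader can dlopen() or exec() it."""
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "sea-extract")
    digest = hashlib.sha256(content).hexdigest()[:16]
    # Content-addressed directory: re-runs reuse the file, different versions never clash.
    target_dir = os.path.join(cache_dir, digest)
    target = os.path.join(target_dir, os.path.basename(vfs_path))
    if not os.path.exists(target):
        os.makedirs(target_dir, exist_ok=True)
        # Write under a temporary name, then rename, so a concurrent process never
        # observes a half-written file.
        fd, tmp = tempfile.mkstemp(dir=target_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        os.chmod(tmp, 0o755 if executable else 0o644)  # restore the executable bit
        os.replace(tmp, target)
    return target
```

Cleaning up stale extractions and verifying existing files against their hash are left out of this sketch.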

In the context of Node.js, the community has historically been forced to
monkey-patch modules such as `fs` for advanced I/O use cases. For example,
Electron
[monkey-patches](https://github.com/electron/electron/blob/06a00b74e817a61f20e2734d50d8eb7bc9b099f6/lib/asar/fs-wrapper.ts)
several Node.js modules to power its ASAR integration. Vercel's PKG
[monkey-patches](https://github.com/vercel/pkg/blob/f0c4e8cd113e761958ab387f4b0237f4d8797335/prelude/bootstrap.js#L596)
a wide range of functions in a similar way. This approach is not only
error-prone, but might also stop working in the future if Node.js prevents
runtime modification of its internal modules (e.g. for security reasons).

What do we have so far?
-----------------------

At Postman, [@dsanders11](https://github.com/dsanders11),
[@raisinten](https://github.com/raisinten) and
[@robertgzr](https://github.com/robertgzr) made outstanding progress
experimenting with the above architecture. Much of this progress builds on the
interesting previous work and discussions that took place in
[#42334](https://github.com/nodejs/node/pull/42334) and
[#43432](https://github.com/nodejs/node/issues/43432).

In the area of *Resource Injection*, we open-sourced a tool called
[Postject](https://github.com/postmanlabs/postject). Postject enables arbitrary
injection of data as Mach-O, PE and ELF sections, and ships with a
cross-platform C/C++ header file providing runtime introspection APIs. It is
currently implemented in a mixture of Python and C++, and it builds on the
foundations provided by the [LIEF](https://github.com/lief-project/LIEF)
project.

In the area of *Virtual File Systems*, we are currently basing our
proof-of-concept on the [ASAR](https://github.com/electron/asar) archive format
designed and battle-tested by Electron. The ASAR format is extensible by
design, allowing us to add arbitrary new metadata and functionality to our
custom implementation.

In the area of the *Bootstrapper*, we are maintaining a custom Node.js patch
that makes use of the Postject reflection APIs, provides a basic read-only ASAR
implementation, and takes over the entry point of Node.js to jump execution to
the embedded app.

However, we are far from done! Our goal is to rethink and continue evolving
these components with the help of the community, and push for a better approach
to SEAs that everybody from the Node.js ecosystem and beyond can benefit from.

Future work
-----------

SEAs open up interesting possibilities for innovation. For example, we can
explore creating SEAs that make use of V8 snapshots to speed up application
startup time, or ways in which we can trim down Node.js to optimize for space
efficiency in the context of SEAs.

Ideas are very welcome!

Taking it from here
-------------------

There are lots of ways to help! If you made it this far, you probably have some
questions, observations and feedback. If so, please share them in [GitHub
Discussions](https://github.com/nodejs/single-executable/discussions) or
[GitHub Issues](https://github.com/nodejs/single-executable/issues). Beyond
that:

- We need to collect proper requirements for each of the components we will be
  building
- We need to continue researching what's out there, and what we can learn from
  it
- We need to align implementers of SEA tooling to join forces and create a
  top-notch technology
- We need to sort out philosophical questions about how much of the SEA work
  will be a proper part of the core Node.js project, and how much will be
  community tooling

Last but not least, we have to actually write some code and get this done!