Commit 1724671

bootstrap: implement --snapshot-blob and --build-snapshot
This patch introduces `--build-snapshot` and `--snapshot-blob` options for creating and using user land snapshots.

For the initial iteration, user land CJS modules and ESM are not yet supported in the snapshot, so only one single file can be snapshotted (users can bundle their applications into a single script with their bundler of choice to build a snapshot though).

A subset of builtins should already work, and support for more builtins is being added. This PR includes tests checking that the TypeScript compiler and the marked markdown renderer (and the builtins they use) can be snapshotted and deserialized.

To generate a snapshot using `snapshot.js` as entry point and write the snapshot blob to `snapshot.blob`:

```
$ echo "globalThis.foo = 'I am from the snapshot'" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
```

To restore application state from `snapshot.blob`, with `index.js` as the entry point script for the deserialized application:

```
$ echo "console.log(globalThis.foo)" > index.js
$ node --snapshot-blob snapshot.blob index.js
I am from the snapshot
```

Users can also use the `v8.startupSnapshot` API to specify an entry point at snapshot building time, thus avoiding the need for an additional entry script at deserialization time:

```
$ echo "require('v8').startupSnapshot.setDeserializeMainFunction(() => console.log('I am from the snapshot'))" > snapshot.js
$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
$ node --snapshot-blob snapshot.blob
I am from the snapshot
```

Note that this patch only adds functionality to the `node` executable for building run-time user-land snapshots; the generated snapshot is stored in a separate file on disk. Building a single binary with both Node.js and an embedded snapshot has already been possible with the `--node-snapshot-main` option to the `configure` script if the user compiles Node.js from source. It would be a different task to enable the `node` executable to produce a single binary that contains both Node.js and an embedded snapshot without building Node.js from source, which should be layered on top of the SEA (Single Executable Apps) initiative.

Known limitations/bugs that are being fixed in the upstream:

- V8 hits a DCHECK when deserializing certain mutated globals, e.g. `Error.stackTraceLimit` (it should work fine in the release build, however): https://chromium-review.googlesource.com/c/v8/v8/+/3319481
- Layout of V8's read-only heap can be inconsistent after deserialization, resulting in memory corruption: https://bugs.chromium.org/p/v8/issues/detail?id=12921

PR-URL: nodejs#38905
Refs: nodejs#35711
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
1 parent cb955e0 commit 1724671

17 files changed, +1408 -52 lines changed

doc/api/cli.md

+76
@@ -100,6 +100,62 @@ If this flag is passed, the behavior can still be set to not abort through
 [`process.setUncaughtExceptionCaptureCallback()`][] (and through usage of the
 `node:domain` module that uses it).

+### `--build-snapshot`
+
+<!-- YAML
+added: REPLACEME
+-->
+
+> Stability: 1 - Experimental
+
+Generates a snapshot blob when the process exits and writes it to
+disk, which can be loaded later with `--snapshot-blob`.
+
+When building the snapshot, if `--snapshot-blob` is not specified,
+the generated blob will be written, by default, to `snapshot.blob`
+in the current working directory. Otherwise it will be written to
+the path specified by `--snapshot-blob`.
+
+```console
+$ echo "globalThis.foo = 'I am from the snapshot'" > snapshot.js
+
+# Run snapshot.js to initialize the application and snapshot the
+# state of it into snapshot.blob.
+$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
+
+$ echo "console.log(globalThis.foo)" > index.js
+
+# Load the generated snapshot and start the application from index.js.
+$ node --snapshot-blob snapshot.blob index.js
+I am from the snapshot
+```
+
+The [`v8.startupSnapshot` API][] can be used to specify an entry point at
+snapshot building time, thus avoiding the need of an additional entry
+script at deserialization time:
+
+```console
+$ echo "require('v8').startupSnapshot.setDeserializeMainFunction(() => console.log('I am from the snapshot'))" > snapshot.js
+$ node --snapshot-blob snapshot.blob --build-snapshot snapshot.js
+$ node --snapshot-blob snapshot.blob
+I am from the snapshot
+```
+
+For more information, check out the [`v8.startupSnapshot` API][] documentation.
+
+Currently the support for run-time snapshot is experimental in that:
+
+1. User-land modules are not yet supported in the snapshot, so only
+   one single file can be snapshotted. Users can bundle their applications
+   into a single script with their bundler of choice before building
+   a snapshot, however.
+2. Only a subset of the built-in modules work in the snapshot, though the
+   Node.js core test suite checks that a few fairly complex applications
+   can be snapshotted. Support for more modules is being added. If any
+   crashes or buggy behaviors occur when building a snapshot, please file
+   a report in the [Node.js issue tracker][] and link to it in the
+   [tracking issue for user-land snapshots][].
+
 ### `--completion-bash`

 <!-- YAML
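The default-path rule documented for `--build-snapshot` in the hunk above can be sketched in a few lines of C++. This is only an illustration, not the code from this commit: `ResolveBlobPath` and `WriteBlobFile` are hypothetical helper names, and the byte vector stands in for a real serialized snapshot.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: an empty --snapshot-blob value falls back to
// "snapshot.blob" (resolved against the current working directory by fopen).
std::string ResolveBlobPath(const std::string& snapshot_blob_option) {
  return snapshot_blob_option.empty() ? std::string("snapshot.blob")
                                      : snapshot_blob_option;
}

// Hypothetical helper: writes `bytes` to the resolved path and returns a
// process-style exit code (0 on success, 1 on any I/O failure).
int WriteBlobFile(const std::string& snapshot_blob_option,
                  const std::vector<char>& bytes) {
  const std::string path = ResolveBlobPath(snapshot_blob_option);
  FILE* fp = std::fopen(path.c_str(), "wb");
  if (fp == nullptr) {
    std::fprintf(stderr,
                 "Cannot open %s for writing a snapshot.\n",
                 path.c_str());
    return 1;
  }
  const std::size_t written =
      bytes.empty() ? 0 : std::fwrite(bytes.data(), 1, bytes.size(), fp);
  const bool closed = std::fclose(fp) == 0;
  return (written == bytes.size() && closed) ? 0 : 1;
}
```

Keeping the path decision in its own function makes the documented default easy to check without touching the file system.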
@@ -1121,6 +1177,22 @@ minimum allocation from the secure heap. The minimum value is `2`.
 The maximum value is the lesser of `--secure-heap` or `2147483647`.
 The value given must be a power of two.

+### `--snapshot-blob=path`
+
+<!-- YAML
+added: REPLACEME
+-->
+
+> Stability: 1 - Experimental
+
+When used with `--build-snapshot`, `--snapshot-blob` specifies the path
+where the generated snapshot blob will be written to. If not specified,
+the generated blob will be written, by default, to `snapshot.blob`
+in the current working directory.
+
+When used without `--build-snapshot`, `--snapshot-blob` specifies the
+path to the blob that will be used to restore the application state.
+
 ### `--test`

 <!-- YAML
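The two roles of `--snapshot-blob` described above amount to a small decision table. The sketch below is an assumption-laden illustration: `SnapshotOptions`, `SnapshotMode`, and `SelectMode` are hypothetical names, and the fallback to the embedded snapshot (skipped under `--no-node-snapshot`) follows the `LoadSnapshotDataAndRun` logic added to `src/node.cc` in this same commit.

```cpp
#include <string>

// Hypothetical mirror of the relevant CLI options.
struct SnapshotOptions {
  bool build_snapshot = false;  // --build-snapshot
  std::string snapshot_blob;    // --snapshot-blob=path (empty if not given)
  bool node_snapshot = true;    // false when --no-node-snapshot is passed
};

enum class SnapshotMode {
  kBuild,         // run the entry script and write a blob
  kLoadFromBlob,  // restore application state from the given blob
  kUseEmbedded,   // use the snapshot embedded in the binary, if any
  kNoSnapshot     // start without any snapshot
};

// Hypothetical helper: decides what the flags ask for.
SnapshotMode SelectMode(const SnapshotOptions& options) {
  if (options.build_snapshot) return SnapshotMode::kBuild;
  if (!options.snapshot_blob.empty()) return SnapshotMode::kLoadFromBlob;
  if (options.node_snapshot) return SnapshotMode::kUseEmbedded;
  return SnapshotMode::kNoSnapshot;
}
```

In build mode the blob path is an output; in every other mode it is an input, which is why `--build-snapshot` is checked first.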
@@ -1735,6 +1807,7 @@ Node.js options that are allowed are:
 * `--require`, `-r`
 * `--secure-heap-min`
 * `--secure-heap`
+* `--snapshot-blob`
 * `--test-only`
 * `--throw-deprecation`
 * `--title`
@@ -2109,6 +2182,7 @@ done
 [ECMAScript module loader]: esm.md#loaders
 [Fetch API]: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API
 [Modules loaders]: packages.md#modules-loaders
+[Node.js issue tracker]: https://github.com/nodejs/node/issues
 [OSSL_PROVIDER-legacy]: https://www.openssl.org/docs/man3.0/man7/OSSL_PROVIDER-legacy.html
 [REPL]: repl.md
 [ScriptCoverage]: https://chromedevtools.github.io/devtools-protocol/tot/Profiler#type-ScriptCoverage
@@ -2141,6 +2215,7 @@ done
 [`tls.DEFAULT_MAX_VERSION`]: tls.md#tlsdefault_max_version
 [`tls.DEFAULT_MIN_VERSION`]: tls.md#tlsdefault_min_version
 [`unhandledRejection`]: process.md#event-unhandledrejection
+[`v8.startupSnapshot` API]: v8.md#startup-snapshot-api
 [`worker_threads.threadId`]: worker_threads.md#workerthreadid
 [conditional exports]: packages.md#conditional-exports
 [context-aware]: addons.md#context-aware-addons
@@ -2156,4 +2231,5 @@ done
 [security warning]: #warning-binding-inspector-to-a-public-ipport-combination-is-insecure
 [semi-space]: https://www.memorymanagement.org/glossary/s.html#semi.space
 [timezone IDs]: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
+[tracking issue for user-land snapshots]: https://github.com/nodejs/node/issues/44014
 [ways that `TZ` is handled in other environments]: https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html

src/env.cc

+21 -15
@@ -248,17 +248,6 @@ std::ostream& operator<<(std::ostream& output,
   return output;
 }

-std::ostream& operator<<(std::ostream& output,
-                         const std::vector<PropInfo>& vec) {
-  output << "{\n";
-  for (const auto& info : vec) {
-    output << " { \"" << info.name << "\", " << std::to_string(info.id) << ", "
-           << std::to_string(info.index) << " },\n";
-  }
-  output << "}";
-  return output;
-}
-
 std::ostream& operator<<(std::ostream& output,
                          const IsolateDataSerializeInfo& i) {
   output << "{\n"
@@ -298,7 +287,7 @@ IsolateDataSerializeInfo IsolateData::Serialize(SnapshotCreator* creator) {
   for (size_t i = 0; i < AsyncWrap::PROVIDERS_LENGTH; i++)
     info.primitive_values.push_back(creator->AddData(async_wrap_provider(i)));

-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     Local<TypeName> field = PropertyName(); \
@@ -352,7 +341,7 @@ void IsolateData::DeserializeProperties(const IsolateDataSerializeInfo* info) {

   const std::vector<PropInfo>& values = info->template_values;
   i = 0; // index to the array
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     if (values.size() > i && id == values[i].id) { \
@@ -1485,6 +1474,7 @@ std::ostream& operator<<(std::ostream& output,
 AsyncHooks::SerializeInfo AsyncHooks::Serialize(Local<Context> context,
                                                 SnapshotCreator* creator) {
   SerializeInfo info;
+  // TODO(joyeecheung): some of these probably don't need to be serialized.
   info.async_ids_stack = async_ids_stack_.Serialize(context, creator);
   info.fields = fields_.Serialize(context, creator);
   info.async_id_fields = async_id_fields_.Serialize(context, creator);
@@ -1679,7 +1669,7 @@ EnvSerializeInfo Environment::Serialize(SnapshotCreator* creator) {
   info.should_abort_on_uncaught_toggle =
       should_abort_on_uncaught_toggle_.Serialize(ctx, creator);

-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     Local<TypeName> field = PropertyName(); \
@@ -1696,6 +1686,22 @@ EnvSerializeInfo Environment::Serialize(SnapshotCreator* creator) {
   return info;
 }

+std::ostream& operator<<(std::ostream& output,
+                         const std::vector<PropInfo>& vec) {
+  output << "{\n";
+  for (const auto& info : vec) {
+    output << " " << info << ",\n";
+  }
+  output << "}";
+  return output;
+}
+
+std::ostream& operator<<(std::ostream& output, const PropInfo& info) {
+  output << "{ \"" << info.name << "\", " << std::to_string(info.id) << ", "
+         << std::to_string(info.index) << " }";
+  return output;
+}
+
 std::ostream& operator<<(std::ostream& output,
                          const std::vector<std::string>& vec) {
   output << "{\n";
@@ -1777,7 +1783,7 @@ void Environment::DeserializeProperties(const EnvSerializeInfo* info) {

   const std::vector<PropInfo>& values = info->persistent_values;
   size_t i = 0; // index to the array
-  size_t id = 0;
+  uint32_t id = 0;
 #define V(PropertyName, TypeName) \
   do { \
     if (values.size() > i && id == values[i].id) { \

src/env.h

+8 -3
@@ -580,7 +580,7 @@ typedef size_t SnapshotIndex;

 struct PropInfo {
   std::string name; // name for debugging
-  size_t id; // In the list - in case there are any empty entries
+  uint32_t id; // In the list - in case there are any empty entries
   SnapshotIndex index; // In the snapshot
 };

@@ -987,8 +987,9 @@ struct EnvSerializeInfo {
 struct SnapshotData {
   enum class DataOwnership { kOwned, kNotOwned };

-  static const size_t kNodeBaseContextIndex = 0;
-  static const size_t kNodeMainContextIndex = kNodeBaseContextIndex + 1;
+  static const uint32_t kMagic = 0x143da19;
+  static const SnapshotIndex kNodeBaseContextIndex = 0;
+  static const SnapshotIndex kNodeMainContextIndex = kNodeBaseContextIndex + 1;

   DataOwnership data_ownership = DataOwnership::kOwned;

@@ -1000,12 +1001,16 @@ struct SnapshotData {
   // TODO(joyeecheung): there should be a vector of env_info once we snapshot
   // the worker environments.
   EnvSerializeInfo env_info;
+
   // A vector of built-in ids and v8::ScriptCompiler::CachedData, this can be
   // shared across Node.js instances because they are supposed to share the
   // read only space. We use native_module::CodeCacheInfo because
   // v8::ScriptCompiler::CachedData is not copyable.
   std::vector<native_module::CodeCacheInfo> code_cache;

+  void ToBlob(FILE* out) const;
+  static void FromBlob(SnapshotData* out, FILE* in);
+
   ~SnapshotData();

   SnapshotData(const SnapshotData&) = delete;
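The header above adds a `kMagic` constant next to `ToBlob` and `FromBlob`, but the blob layout itself is not part of this excerpt. The sketch below shows one common way such a constant is used, under the assumption that it is written first and checked on load. `WriteBlob`, `ReadBlob`, `kMaxPayload`, and the size-prefixed payload are all hypothetical and do not describe Node.js's actual format.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Value taken from the diff; its position in the real blob is an assumption.
constexpr std::uint32_t kMagic = 0x143da19;
// Hypothetical sanity limit so a corrupt size field cannot trigger a huge
// allocation.
constexpr std::uint64_t kMaxPayload = 1ull << 30;

// Writes magic, payload size, then payload bytes. Returns false on I/O error.
bool WriteBlob(FILE* out, const std::vector<std::uint8_t>& payload) {
  const std::uint64_t size = payload.size();
  if (std::fwrite(&kMagic, sizeof(kMagic), 1, out) != 1) return false;
  if (std::fwrite(&size, sizeof(size), 1, out) != 1) return false;
  return size == 0 ||
         std::fwrite(payload.data(), 1, payload.size(), out) == payload.size();
}

// Reads a blob written by WriteBlob. Rejects files whose first four bytes do
// not match kMagic, which catches most "this is not a snapshot" mistakes.
bool ReadBlob(FILE* in, std::vector<std::uint8_t>* payload) {
  std::uint32_t magic = 0;
  std::uint64_t size = 0;
  if (std::fread(&magic, sizeof(magic), 1, in) != 1 || magic != kMagic) {
    return false;
  }
  if (std::fread(&size, sizeof(size), 1, in) != 1 || size > kMaxPayload) {
    return false;
  }
  payload->resize(static_cast<std::size_t>(size));
  return size == 0 ||
         std::fread(payload->data(), 1, payload->size(), in) == payload->size();
}
```

Integers are written in host byte order here, so a blob produced this way is only readable on a machine with the same endianness.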

src/node.cc

+111 -22
@@ -1161,38 +1161,127 @@ void TearDownOncePerProcess() {
   per_process::v8_platform.Dispose();
 }

+int GenerateAndWriteSnapshotData(const SnapshotData** snapshot_data_ptr,
+                                 InitializationResult* result) {
+  // nullptr indicates there's no snapshot data.
+  DCHECK_NULL(*snapshot_data_ptr);
+
+  // node:embedded_snapshot_main indicates that we are using the
+  // embedded snapshot and we are not supposed to clean it up.
+  if (result->args[1] == "node:embedded_snapshot_main") {
+    *snapshot_data_ptr = SnapshotBuilder::GetEmbeddedSnapshotData();
+    if (*snapshot_data_ptr == nullptr) {
+      // The Node.js binary is built without embedded snapshot
+      fprintf(stderr,
+              "node:embedded_snapshot_main was specified as snapshot "
+              "entry point but Node.js was built without embedded "
+              "snapshot.\n");
+      result->exit_code = 1;
+      return result->exit_code;
+    }
+  } else {
+    // Otherwise, load and run the specified main script.
+    std::unique_ptr<SnapshotData> generated_data =
+        std::make_unique<SnapshotData>();
+    result->exit_code = node::SnapshotBuilder::Generate(
+        generated_data.get(), result->args, result->exec_args);
+    if (result->exit_code == 0) {
+      *snapshot_data_ptr = generated_data.release();
+    } else {
+      return result->exit_code;
+    }
+  }
+
+  // Get the path to write the snapshot blob to.
+  std::string snapshot_blob_path;
+  if (!per_process::cli_options->snapshot_blob.empty()) {
+    snapshot_blob_path = per_process::cli_options->snapshot_blob;
+  } else {
+    // Defaults to snapshot.blob in the current working directory.
+    snapshot_blob_path = std::string("snapshot.blob");
+  }
+
+  FILE* fp = fopen(snapshot_blob_path.c_str(), "wb");
+  if (fp != nullptr) {
+    (*snapshot_data_ptr)->ToBlob(fp);
+    fclose(fp);
+  } else {
+    fprintf(stderr,
+            "Cannot open %s for writing a snapshot.\n",
+            snapshot_blob_path.c_str());
+    result->exit_code = 1;
+  }
+  return result->exit_code;
+}
+
+int LoadSnapshotDataAndRun(const SnapshotData** snapshot_data_ptr,
+                           InitializationResult* result) {
+  // nullptr indicates there's no snapshot data.
+  DCHECK_NULL(*snapshot_data_ptr);
+  // --snapshot-blob indicates that we are reading a customized snapshot.
+  if (!per_process::cli_options->snapshot_blob.empty()) {
+    std::string filename = per_process::cli_options->snapshot_blob;
+    FILE* fp = fopen(filename.c_str(), "rb");
+    if (fp == nullptr) {
+      fprintf(stderr, "Cannot open %s", filename.c_str());
+      result->exit_code = 1;
+      return result->exit_code;
+    }
+    std::unique_ptr<SnapshotData> read_data = std::make_unique<SnapshotData>();
+    SnapshotData::FromBlob(read_data.get(), fp);
+    *snapshot_data_ptr = read_data.release();
+    fclose(fp);
+  } else if (per_process::cli_options->node_snapshot) {
+    // If --snapshot-blob is not specified, we are reading the embedded
+    // snapshot, but we will skip it if --no-node-snapshot is specified.
+    *snapshot_data_ptr = SnapshotBuilder::GetEmbeddedSnapshotData();
+  }
+
+  if ((*snapshot_data_ptr) != nullptr) {
+    NativeModuleLoader::RefreshCodeCache((*snapshot_data_ptr)->code_cache);
+  }
+  NodeMainInstance main_instance(*snapshot_data_ptr,
+                                 uv_default_loop(),
+                                 per_process::v8_platform.Platform(),
+                                 result->args,
+                                 result->exec_args);
+  result->exit_code = main_instance.Run();
+  return result->exit_code;
+}
+
 int Start(int argc, char** argv) {
   InitializationResult result = InitializeOncePerProcess(argc, argv);
   if (result.early_return) {
     return result.exit_code;
   }

-  if (per_process::cli_options->build_snapshot) {
-    fprintf(stderr,
-            "--build-snapshot is not yet supported in the node binary\n");
-    return 1;
-  }
+  DCHECK_EQ(result.exit_code, 0);
+  const SnapshotData* snapshot_data = nullptr;

-  {
-    bool use_node_snapshot = per_process::cli_options->node_snapshot;
-    const SnapshotData* snapshot_data =
-        use_node_snapshot ? SnapshotBuilder::GetEmbeddedSnapshotData()
-                          : nullptr;
-    uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
-
-    if (snapshot_data != nullptr) {
-      NativeModuleLoader::RefreshCodeCache(snapshot_data->code_cache);
+  auto cleanup_process = OnScopeLeave([&]() {
+    TearDownOncePerProcess();
+
+    if (snapshot_data != nullptr &&
+        snapshot_data->data_ownership == SnapshotData::DataOwnership::kOwned) {
+      delete snapshot_data;
+    }
+  });
+
+  uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
+
+  // --build-snapshot indicates that we are in snapshot building mode.
+  if (per_process::cli_options->build_snapshot) {
+    if (result.args.size() < 2) {
+      fprintf(stderr,
+              "--build-snapshot must be used with an entry point script.\n"
+              "Usage: node --build-snapshot /path/to/entry.js\n");
+      return 9;
     }
-    NodeMainInstance main_instance(snapshot_data,
-                                   uv_default_loop(),
-                                   per_process::v8_platform.Platform(),
-                                   result.args,
-                                   result.exec_args);
-    result.exit_code = main_instance.Run();
+    return GenerateAndWriteSnapshotData(&snapshot_data, &result);
   }

-  TearDownOncePerProcess();
-  return result.exit_code;
+  // Without --build-snapshot, we are in snapshot loading mode.
+  return LoadSnapshotDataAndRun(&snapshot_data, &result);
 }

 int Stop(Environment* env) {
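The rewritten `Start()` relies on a scope guard so that teardown runs on every return path, and it frees the snapshot only when the data is marked as owned. The standalone sketch below illustrates that pattern. `ScopeGuard`, `OnScopeExit`, `Blob`, and `Run` are hypothetical stand-ins rather than Node.js's `OnScopeLeave` and `SnapshotData`, and the log vector exists only to make the cleanup order observable.

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard: runs `fn` when the guard goes out of scope.
template <typename Fn>
class ScopeGuard {
 public:
  explicit ScopeGuard(Fn fn) : fn_(std::move(fn)) {}
  ScopeGuard(ScopeGuard&& other)
      : fn_(std::move(other.fn_)), active_(other.active_) {
    other.active_ = false;  // the moved-from guard must not fire
  }
  ~ScopeGuard() {
    if (active_) fn_();
  }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  Fn fn_;
  bool active_ = true;
};

template <typename Fn>
ScopeGuard<Fn> OnScopeExit(Fn fn) {
  return ScopeGuard<Fn>(std::move(fn));
}

// Stand-in for snapshot data that is either heap-allocated (owned) or
// embedded in the binary (not owned).
struct Blob {
  enum class Ownership { kOwned, kNotOwned };
  Ownership ownership;
};

// Returns an exit code. Cleanup runs on every path, including early returns,
// and deletes the blob only if it is owned.
int Run(bool use_embedded, bool fail_early, std::vector<std::string>* log) {
  static const Blob embedded{Blob::Ownership::kNotOwned};
  const Blob* blob = nullptr;
  auto cleanup = OnScopeExit([&]() {
    log->push_back("teardown");
    if (blob != nullptr && blob->ownership == Blob::Ownership::kOwned) {
      delete blob;
      log->push_back("deleted");
    }
  });
  if (fail_early) return 9;  // the guard still fires here
  blob = use_embedded ? &embedded : new Blob{Blob::Ownership::kOwned};
  return 0;
}
```

Declaring the pointer before the guard matters: the lambda captures it by reference, so whatever value it holds at scope exit is the one that gets checked and freed.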
