Commit 4d56e46

Add comments & cosmetic changes
1 parent cd8d7b6 commit 4d56e46

2 files changed: +125 -109 lines changed

src/staticdata.c

Lines changed: 76 additions & 109 deletions
@@ -3,21 +3,12 @@
33
/*
44
saving and restoring system images
55
6-
This performs serialization and deserialization of in-memory data. The dump.c file is similar, but has less complete coverage:
7-
dump.c has no knowledge of native code (and simply discards it), whereas this supports native code caching in .o files.
8-
Duplication is avoided by elevating the .o-serialized versions of global variables and native-compiled functions to become
9-
the authoritative source for such entities in the system image, with references to these objects appropriately inserted into
10-
the (de)serialized version of Julia's internal data.
11-
12-
Another difference is that while dump.c defines a serialization format that it writes item-by-item, this serializer creates and
13-
saves a compact binary blob. This makes deserialization simple and fast: we only need to deal with pointer relocation,
14-
registering with the garbage collector, and making note of special internal types. During serialization, we also need to
15-
pay special attention to things like builtin functions, C-implemented types (those in jltypes.c), the metadata for documentation,
16-
optimal layouts, integration with native system image generation, and preparing other preprocessing directives.
17-
18-
dump.c has capabilities missing from this serializer, most notably the ability to manage (and merge) method tables.
19-
This is not needed for system images as they are self-contained. However, it would be needed to support incremental
20-
compilation of packages.
6+
This performs serialization and deserialization of system and package images. It creates and saves a compact binary
7+
blob, making deserialization "simple" and fast: we "only" need to deal with uniquing, pointer relocation,
8+
method root insertion, registering with the garbage collector, making note of special internal types, and
9+
backedges/invalidation. Special objects include things like builtin functions, C-implemented types (those in jltypes.c),
10+
the metadata for documentation, optimal layouts, integration with native system image generation, and preparing other
11+
preprocessing directives.
2112
2213
During serialization, the flow has several steps:
2314
@@ -47,20 +38,24 @@
4738
details of this encoding can be found in the pair of functions `get_reloc_for_item` and `get_item_for_reloc`.
4839
4940
`uniquing_list` also holds the serialized location of external DataTypes, MethodInstances, and singletons
50-
in the serialized blob (i.e., new-at-the-time-of-serialization specializations). The target item must
51-
be checked against the running system to see whether such an object already exists (i.e., whether some other
52-
previously-loaded package or workload has created such types/MethodInstances previously). If so,
53-
then the pointer at `location` must be updated to the one in the running system.
54-
`uniquing_target` is a hash table for which `uniquing_target[targetpos] -> chosen_target`.
41+
in the serialized blob (i.e., new-at-the-time-of-serialization specializations).
5542
5643
Most of step 2 is handled by `jl_write_values`, followed by special handling of the dedicated parallel streams.
5744
5845
- step 3 combines the different sections (fields of `jl_serializer_state`) into one
5946
6047
- step 4 writes the values of the hard-coded tagged items and `ccallable_list`
6148
62-
The tables written to the serializer stream make deserialization fairly straightforward. Much of the "real work" is
63-
done by `get_item_for_reloc`.
49+
Much of the "real work" during deserialization is done by `get_item_for_reloc`. But a few items require specific
50+
attention:
51+
- uniquing: during deserialization, the target item (an "external" type or MethodInstance) must be checked against
52+
the running system to see whether such an object already exists (i.e., whether some other previously-loaded package
53+
or workload has created such types/MethodInstances previously) or whether it needs to be created de-novo.
54+
In either case, all references at `location` must be updated to the one in the running system.
55+
`uniquing_target` is a hash table for which `uniquing_target[targetpos] -> chosen_target`.
56+
- method root insertion: when new specializations generate new roots, these roots must be inserted into
57+
method root tables
58+
- backedges & invalidation: external edges have to be checked against the running system and any invalidations executed.
6459
6560
Encoding of a pointer:
6661
- in the location of the pointer, we initially write zero padding
@@ -583,6 +578,7 @@ static uintptr_t jl_fptr_id(void *fptr)
583578
return *(uintptr_t*)pbp;
584579
}
585580

581+
// `jl_queue_for_serialization` adds items to `serialization_order`
586582
#define jl_queue_for_serialization(s, v) jl_queue_for_serialization_((s), (jl_value_t*)(v), 1, 0)
587583
static void jl_queue_for_serialization_(jl_serializer_state *s, jl_value_t *v, int recursive, int immediate);
588584

@@ -616,21 +612,22 @@ static void jl_queue_module_for_serialization(jl_serializer_state *s, jl_module_
616612
}
617613
}
618614

615+
// Anything that requires uniquing or fixing during deserialization needs to be "toplevel"
616+
// in serialization (i.e., have its own entry in `serialization_order`). Consequently,
617+
// objects that act as containers for other potentially-"problematic" objects must add such "children"
618+
// to the queue.
619+
// Most objects use preorder traversal. But things that need uniquing require postorder:
620+
// you want to handle uniquing of `Dict{String,Float64}` before you tackle `Vector{Dict{String,Float64}}`.
621+
// Uniquing is done in `serialization_order`, so the very first mention of such an object must
622+
// be the "source" rather than merely a cross-reference.
619623
static void jl_insert_into_serialization_queue(jl_serializer_state *s, jl_value_t *v, int recursive)
620624
{
621-
assert(!jl_is_symbol(v));
622-
623-
// some values have special representations
624625
jl_datatype_t *t = (jl_datatype_t*)jl_typeof(v);
625626
jl_queue_for_serialization(s, t);
626627

627628
if (!recursive)
628629
goto done_fields;
629630

630-
// TODO: make some of this code conditional on s->incremental flag?
631-
632-
// Because of recaching, we visit types in a depth-first postorder, so that the dependencies
633-
// are recached before the objects that wrap them.
634631
if (jl_is_datatype(v)) {
635632
jl_datatype_t *dt = (jl_datatype_t*)v;
636633
jl_svec_t *tt = dt->parameters;
@@ -646,7 +643,6 @@ static void jl_insert_into_serialization_queue(jl_serializer_state *s, jl_value_
646643
}
647644
}
648645
else if (jl_is_typename(v)) {
649-
// XXX: typename might require really complicating uniquing to handle kwfunc
650646
jl_typename_t *tn = (jl_typename_t*)v;
651647
if (s->incremental) {
652648
assert(!jl_object_in_image((jl_value_t*)tn->module));
@@ -733,8 +729,8 @@ static void jl_queue_for_serialization_(jl_serializer_state *s, jl_value_t *v, i
733729
return;
734730
*bp = (void*)(uintptr_t)-1;
735731

736-
// visit some child field before the use of this value
737-
// TODO: make some of this code conditional on s->incremental flag?
732+
// Items that require postorder traversal must visit their children prior to insertion into
733+
// the worklist/serialization_order
738734
if (!immediate) {
739735
if (jl_is_uniontype(v))
740736
immediate = 1;
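
Review note: the ordering rule described in the new comments (preorder for most objects, postorder for anything that needs uniquing) can be illustrated with a minimal sketch. This is not the code from `staticdata.c`; `Node`, `Queue`, `queue_node`, and the `postorder` flag are hypothetical names chosen for illustration, and the fixed-size arrays stand in for the real worklist.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_QUEUE 16

/* Hypothetical stand-in for an object and the objects it references. */
typedef struct Node {
    const char *name;
    int postorder;                 /* 1 if the object needs uniquing */
    int queued;                    /* 1 once the object has been visited */
    size_t nchildren;
    struct Node *children[MAX_CHILDREN];
} Node;

/* Hypothetical stand-in for an order-preserving worklist. */
typedef struct {
    Node *items[MAX_QUEUE];
    size_t len;
} Queue;

void queue_node(Queue *q, Node *n)
{
    if (n->queued)
        return;                    /* each object gets exactly one entry */
    n->queued = 1;                 /* mark before recursing so cycles terminate */
    if (n->postorder) {
        /* children first, so dependencies precede the object that wraps them */
        for (size_t i = 0; i < n->nchildren; i++)
            queue_node(q, n->children[i]);
        assert(q->len < MAX_QUEUE);
        q->items[q->len++] = n;
    }
    else {
        /* preorder: the object itself first, then whatever it references */
        assert(q->len < MAX_QUEUE);
        q->items[q->len++] = n;
        for (size_t i = 0; i < n->nchildren; i++)
            queue_node(q, n->children[i]);
    }
}
```

With a postorder node for `Vector{Dict{String,Float64}}` whose only child is a postorder node for `Dict{String,Float64}`, the child lands in the queue first, which is the property the comment asks for.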
@@ -1033,6 +1029,8 @@ static void record_gvars(jl_serializer_state *s, arraylist_t *globals) JL_NOTSAF
10331029

10341030
jl_value_t *jl_find_ptr = NULL;
10351031
// The main function for serializing all the items queued in `serialization_order`
1032+
// (They are also stored in `serialization_queue` which is order-preserving, unlike the hash table used
1033+
// for `serialization_order`).
10361034
static void jl_write_values(jl_serializer_state *s) JL_GC_DISABLED
10371035
{
10381036
size_t l = serialization_queue.len;
@@ -1041,15 +1039,6 @@ static void jl_write_values(jl_serializer_state *s) JL_GC_DISABLED
10411039
arraylist_grow(&layout_table, l * 2);
10421040
memset(layout_table.items, 0, l * 2 * sizeof(void*));
10431041

1044-
// if (serializer_worklist)
1045-
// for (i = 0; i < objects_list.len; i+= 2) {
1046-
// size_t id = (uintptr_t)objects_list.items[i+1];
1047-
// jl_value_t *v = objects_list.items[i];
1048-
// char *linkcode = externally_linked(v) ? "*" : "";
1049-
// jl_printf(JL_STDOUT, "%ld%s: ", id, linkcode);
1050-
// jl_(v);
1051-
// }
1052-
10531042
// Serialize all entries
10541043
for (size_t item = 0; item < l; item++) {
10551044
jl_value_t *v = (jl_value_t*)serialization_queue.items[item]; // the object
@@ -1584,17 +1573,14 @@ static inline uintptr_t get_item_for_reloc(jl_serializer_state *s, uintptr_t bas
15841573
assert(0 <= *link_index && *link_index < jl_array_len(link_ids));
15851574
uint64_t build_id = link_id_data[*link_index];
15861575
*link_index += 1;
1587-
// jl_printf(JL_STDOUT, "Relocating external link %d to buildid %ld with offset 0x%lx\n", link_index, build_id, offset);
15881576
size_t i = 0, nids = jl_array_len(jl_build_ids);
15891577
while (i < nids) {
15901578
if (build_id == build_id_data[i])
15911579
break;
15921580
i++;
15931581
}
1594-
// jl_printf(JL_STDOUT, "i = %ld\n", i);
15951582
assert(i < nids);
15961583
assert(2*i < jl_linkage_blobs.len);
1597-
// jl_printf(JL_STDOUT, "Restoring link %ld with offset 0x%lx and base position %p\n", link_index, offset, jl_linkage_blobs.items[2*i]);
15981584
return (uintptr_t)jl_linkage_blobs.items[2*i] + offset*sizeof(void*);
15991585
}
16001586
abort();
@@ -1725,6 +1711,9 @@ void gc_sweep_sysimg(void)
17251711
}
17261712
}
17271713

1714+
// jl_write_value and jl_read_value are used for storing Julia objects that are adjuncts to
1715+
// the image proper. For example, new methods added to external callables require
1716+
// insertion into the appropriate method table.
17281717
#define jl_write_value(s, v) _jl_write_value((s), (jl_value_t*)(v))
17291718
static void _jl_write_value(jl_serializer_state *s, jl_value_t *v)
17301719
{
@@ -1737,7 +1726,25 @@ static void _jl_write_value(jl_serializer_state *s, jl_value_t *v)
17371726
write_reloc_t(s->s, reloc);
17381727
}
17391728

1729+
static jl_value_t *jl_read_value(jl_serializer_state *s)
1730+
{
1731+
uintptr_t base = (uintptr_t)&s->s->buf[0];
1732+
size_t size = s->s->size;
1733+
uintptr_t offset = *(reloc_t*)(base + (uintptr_t)s->s->bpos);
1734+
s->s->bpos += sizeof(reloc_t);
1735+
if (offset == 0)
1736+
return NULL;
1737+
return (jl_value_t*)get_item_for_reloc(s, base, size, offset, NULL, NULL);
1738+
}
17401739

1740+
// The next two, `jl_read_offset` and `jl_delayed_reloc`, are essentially a split version
1741+
// of `jl_read_value` that allows usage of the relocation data rather than passing NULL
1742+
// to `get_item_for_reloc`.
1743+
// This works around what would otherwise be an order-dependency conundrum: objects
1744+
// that may require relocation data have to be inserted into `serialization_order`,
1745+
// and that may include some of the adjunct data that gets serialized via
1746+
// `jl_write_value`. But we can't interpret them properly until we read the relocation
1747+
// data, and that happens after we pull items out of the serialization stream.
17411748
static uintptr_t jl_read_offset(jl_serializer_state *s)
17421749
{
17431750
uintptr_t base = (uintptr_t)&s->s->buf[0];
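
Review note: the "split" described in the comment above (read the encoded reference now, resolve it later once relocation data is available) can be sketched as two small functions. This is a simplified illustration, not the real encoding: `Stream`, `reloc_sketch_t`, `read_offset`, and `delayed_resolve` are hypothetical names, and the sketch assumes the nonzero offset is a 1-based index into a lookup table, whereas the actual decoding is done by `get_item_for_reloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t reloc_sketch_t;   /* hypothetical fixed-width encoded reference */

/* Hypothetical stand-in for the serializer's byte stream. */
typedef struct {
    const unsigned char *buf;
    size_t size;
    size_t pos;
} Stream;

/* Phase 1: pull the encoded reference out of the stream and advance the cursor. */
uintptr_t read_offset(Stream *s)
{
    reloc_sketch_t raw;
    assert(s->pos + sizeof(raw) <= s->size);
    memcpy(&raw, s->buf + s->pos, sizeof(raw));   /* memcpy avoids unaligned loads */
    s->pos += sizeof(raw);
    return (uintptr_t)raw;
}

/* Phase 2: resolve the reference later, once the lookup data has been read.
 * As in `jl_read_value`, an offset of 0 means "no object". */
void *delayed_resolve(void *const *table, size_t ntable, uintptr_t offset)
{
    if (offset == 0)
        return NULL;
    assert(offset <= ntable);      /* assumption: offsets are 1-based table indices */
    return table[offset - 1];
}
```

Keeping the two phases separate lets a caller drain the stream in order while postponing interpretation until everything the interpretation depends on has been loaded.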
@@ -1758,16 +1765,6 @@ static jl_value_t *jl_delayed_reloc(jl_serializer_state *s, uintptr_t offset)
17581765
return ret;
17591766
}
17601767

1761-
static jl_value_t *jl_read_value(jl_serializer_state *s)
1762-
{
1763-
uintptr_t base = (uintptr_t)&s->s->buf[0];
1764-
size_t size = s->s->size;
1765-
uintptr_t offset = *(reloc_t*)(base + (uintptr_t)s->s->bpos);
1766-
s->s->bpos += sizeof(reloc_t);
1767-
if (offset == 0)
1768-
return NULL;
1769-
return (jl_value_t*)get_item_for_reloc(s, base, size, offset, NULL, NULL);
1770-
}
17711768

17721769
static void jl_update_all_fptrs(jl_serializer_state *s)
17731770
{
@@ -1863,55 +1860,6 @@ static void jl_update_all_gvars(jl_serializer_state *s)
18631860
assert(!s->link_ids_gvars || link_index == jl_array_len(s->link_ids_gvars));
18641861
}
18651862

1866-
// New roots for external methods
1867-
static void jl_collect_methods(htable_t *mset, jl_array_t *new_specializations)
1868-
{
1869-
size_t i, l = new_specializations ? jl_array_len(new_specializations) : 0;
1870-
jl_value_t *v;
1871-
jl_method_t *m;
1872-
for (i = 0; i < l; i++) {
1873-
v = jl_array_ptr_ref(new_specializations, i);
1874-
assert(jl_is_code_instance(v));
1875-
m = ((jl_code_instance_t*)v)->def->def.method;
1876-
assert(jl_is_method(m));
1877-
ptrhash_put(mset, (void*)m, (void*)m);
1878-
}
1879-
}
1880-
1881-
static void jl_collect_new_roots(jl_array_t *roots, htable_t *mset, uint64_t key)
1882-
{
1883-
size_t i, sz = mset->size;
1884-
int nwithkey;
1885-
jl_method_t *m;
1886-
void **table = mset->table;
1887-
jl_array_t *newroots = NULL;
1888-
JL_GC_PUSH1(&newroots);
1889-
for (i = 0; i < sz; i += 2) {
1890-
if (table[i+1] != HT_NOTFOUND) {
1891-
m = (jl_method_t*)table[i];
1892-
assert(jl_is_method(m));
1893-
nwithkey = nroots_with_key(m, key);
1894-
if (nwithkey) {
1895-
jl_array_ptr_1d_push(roots, (jl_value_t*)m);
1896-
newroots = jl_alloc_vec_any(nwithkey);
1897-
jl_array_ptr_1d_push(roots, (jl_value_t*)newroots);
1898-
rle_iter_state rootiter = rle_iter_init(0);
1899-
uint64_t *rletable = NULL;
1900-
size_t nblocks2 = 0, nroots = jl_array_len(m->roots), k = 0;
1901-
if (m->root_blocks) {
1902-
rletable = (uint64_t*)jl_array_data(m->root_blocks);
1903-
nblocks2 = jl_array_len(m->root_blocks);
1904-
}
1905-
while (rle_iter_increment(&rootiter, nroots, rletable, nblocks2))
1906-
if (rootiter.key == key)
1907-
jl_array_ptr_set(newroots, k++, jl_array_ptr_ref(m->roots, rootiter.i));
1908-
assert(k == nwithkey);
1909-
}
1910-
}
1911-
}
1912-
JL_GC_POP();
1913-
}
1914-
19151863

19161864
static void jl_compile_extern(jl_method_t *m, void *sysimg_handle) JL_GC_DISABLED
19171865
{
@@ -2092,6 +2040,7 @@ static void jl_strip_all_codeinfos(void)
20922040
// triggering non-relocatability of compressed CodeInfos.
20932041
// Set the number of such roots in each method when the sysimg is
20942042
// serialized.
2043+
// TODO: move this to `jl_write_values`
20952044
static int set_nroots_sysimg__(jl_typemap_entry_t *def, void *_env)
20962045
{
20972046
jl_method_t *m = def->func.method;
@@ -2473,7 +2422,7 @@ static void jl_save_system_image_to_stream(ios_t *f,
24732422
jl_gc_enable(en);
24742423
}
24752424

2476-
static int64_t jl_incremental_header_stuff(ios_t *f, jl_array_t *worklist, jl_array_t **mod_array, jl_array_t **udeps)
2425+
static int64_t jl_write_header_for_incremental(ios_t *f, jl_array_t *worklist, jl_array_t **mod_array, jl_array_t **udeps)
24772426
{
24782427
*mod_array = jl_get_loaded_modules(); // __toplevel__ modules loaded in this session (from Base.loaded_modules_array)
24792428
assert(jl_precompile_toplevel_module == NULL);
@@ -2508,7 +2457,7 @@ JL_DLLEXPORT ios_t *jl_create_system_image(void *_native_data, jl_array_t *workl
25082457
JL_GC_PUSH7(&mod_array, &udeps, &extext_methods, &new_specializations, &method_roots_list, &ext_targets, &edges);
25092458
int64_t srctextpos = 0;
25102459
if (worklist) {
2511-
srctextpos = jl_incremental_header_stuff(f, worklist, &mod_array, &udeps);
2460+
srctextpos = jl_write_header_for_incremental(f, worklist, &mod_array, &udeps);
25122461
jl_gc_enable_finalizers(ct, 0); // make sure we don't run any Julia code concurrently after this point
25132462
jl_prepare_serialization_data(mod_array, newly_inferred, jl_worklist_key(worklist), &extext_methods, &new_specializations, &method_roots_list, &ext_targets, &edges);
25142463
write_padding(f, LLT_ALIGN(ios_pos(f), JL_CACHE_BYTE_ALIGNMENT) - ios_pos(f));
@@ -2758,6 +2707,16 @@ static void jl_restore_system_image_from_stream_(ios_t *f, jl_array_t *depmods,
27582707
jl_read_relocations(&s, s.link_ids_relocs, 0);
27592708
jl_read_arraylist(s.relocs, &s.uniquing_list);
27602709
jl_read_arraylist(s.relocs, &s.fixup_list);
2710+
// Perform the uniquing of objects that we don't "own" and consequently can't promise
2711+
// weren't created by some other package before this one got loaded:
2712+
// - iterate through all objects that need to be uniqued. The first encounter has to be the
2713+
// "reconstructable blob". We either look up the object (if something has created it previously)
2714+
// or construct it for the first time, crucially outside the pointer range of any pkgimage.
2715+
// This ensures it stays unique-worthy.
2716+
// - after we've stored the address of the "real" object (which for convenience we do among the data
2717+
// written to allow lookup/reconstruction), then we have to update references to that "reconstructable blob":
2718+
// instead of performing the relocation within the package image, we instead (re)direct all references
2719+
// to the external object.
27612720
for (size_t i = 0; i < s.uniquing_list.len; i++) {
27622721
uintptr_t item = (uintptr_t)s.uniquing_list.items[i];
27632722
int tag = (item & 1) == 1;
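
Review note: the two-stage uniquing procedure described in the comment above (look up or construct the canonical object outside the image, then redirect every reference to it) can be sketched as follows. This is a toy model rather than the loop in this function: `Canonical`, `Registry`, `lookup_or_create`, and `redirect_references` are hypothetical names, a string key stands in for the "reconstructable blob", and a linear scan stands in for the running system's caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KNOWN 8

/* Hypothetical canonical object, identified here by a string key. */
typedef struct {
    char key[64];
} Canonical;

/* Hypothetical stand-in for "objects the running system already knows". */
typedef struct {
    Canonical *items[MAX_KNOWN];
    size_t len;
} Registry;

/* Stage 1: return the existing object if some earlier load created it,
 * otherwise create it on the heap, i.e. outside any image's address range. */
Canonical *lookup_or_create(Registry *r, const char *key)
{
    for (size_t i = 0; i < r->len; i++)
        if (strcmp(r->items[i]->key, key) == 0)
            return r->items[i];
    assert(r->len < MAX_KNOWN);
    Canonical *c = malloc(sizeof *c);
    assert(c != NULL);
    assert(strlen(key) < sizeof c->key);
    strcpy(c->key, key);
    r->items[r->len++] = c;
    return c;
}

/* Stage 2: point every recorded reference slot at the chosen object
 * instead of at the copy embedded in the image. */
void redirect_references(Canonical **slots[], size_t nslots, Canonical *chosen)
{
    for (size_t i = 0; i < nslots; i++)
        *slots[i] = chosen;
}
```

Because the canonical object never lives inside an image, a second image that needs the same object finds it through the registry and the two images end up sharing one instance.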
@@ -2823,6 +2782,8 @@ static void jl_restore_system_image_from_stream_(ios_t *f, jl_array_t *depmods,
28232782
assert(!(image_base < (char*)newobj && (char*)newobj <= image_base + sizeof_sysimg + sizeof(uintptr_t)));
28242783
assert(jl_typeis(obj, otyp));
28252784
}
2785+
// Write junk in place of the source data we used during uniquing, to catch inadvertent references to
2786+
// it from elsewhere.
28262787
for (size_t i = 0; i < s.uniquing_list.len; i++) {
28272788
void *item = s.uniquing_list.items[i];
28282789
jl_taggedvalue_t *o = jl_astaggedvalue(item);
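
Review note: the "write junk" step is a conventional poisoning technique. A minimal sketch, assuming only the `0xba` fill byte that the diff itself uses; `poison` and `looks_poisoned` are hypothetical helper names, not functions from `staticdata.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xba           /* same fill byte as the memset calls in the diff */

/* Overwrite data that must no longer be referenced. */
void poison(void *p, size_t nbytes)
{
    memset(p, POISON_BYTE, nbytes);
}

/* Debug helper: returns 1 if every byte in the range still holds the poison pattern. */
int looks_poisoned(const void *p, size_t nbytes)
{
    const unsigned char *bytes = (const unsigned char *)p;
    for (size_t i = 0; i < nbytes; i++)
        if (bytes[i] != POISON_BYTE)
            return 0;
    return 1;
}
```

A stale pointer that still targets the poisoned range then reads an implausible value such as `0xbababa...`, which tends to fail quickly and recognizably instead of silently using outdated source data.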
@@ -2833,6 +2794,7 @@ static void jl_restore_system_image_from_stream_(ios_t *f, jl_array_t *depmods,
28332794
else
28342795
memset(o, 0xba, sizeof(jl_value_t*));
28352796
}
2797+
// Perform fixups: things like updating world ages, inserting methods & specializations, etc.
28362798
size_t world = jl_atomic_load_acquire(&jl_world_counter);
28372799
for (size_t i = 0; i < s.fixup_list.len; i++) {
28382800
uintptr_t item = (uintptr_t)s.fixup_list.items[i];
@@ -2970,7 +2932,7 @@ static void jl_restore_system_image_from_stream_(ios_t *f, jl_array_t *depmods,
29702932
jl_gc_enable(en);
29712933
}
29722934

2973-
static jl_value_t *jl_restore_package_image_from_stream(ios_t *f, jl_array_t *depmods)
2935+
static jl_value_t *jl_validate_cache_file(ios_t *f, jl_array_t *depmods)
29742936
{
29752937
if (ios_eof(f) || !jl_read_verify_header(f)) {
29762938
return jl_get_exceptionf(jl_errorexception_type,
@@ -2987,10 +2949,15 @@ static jl_value_t *jl_restore_package_image_from_stream(ios_t *f, jl_array_t *de
29872949
}
29882950

29892951
// verify that the system state is valid
2990-
jl_value_t *verify_fail = read_verify_mod_list(f, depmods);
2991-
if (verify_fail) {
2952+
return read_verify_mod_list(f, depmods);
2953+
}
2954+
2955+
// TODO?: refactor to make it easier to create the "package inspector"
2956+
static jl_value_t *jl_restore_package_image_from_stream(ios_t *f, jl_array_t *depmods)
2957+
{
2958+
jl_value_t *verify_fail = jl_validate_cache_file(f, depmods);
2959+
if (verify_fail)
29922960
return verify_fail;
2993-
}
29942961

29952962
jl_value_t *restored = NULL;
29962963
jl_array_t *init_order = NULL, *extext_methods = NULL, *new_specializations = NULL, *method_roots_list = NULL, *ext_targets = NULL, *edges = NULL;
