@xal-0 commented Aug 19, 2025

We already save some memory here by deleting the jl_native_code_desc_t after
we're done serializing the combined module, but some data in the module's
LLVM::Context lives on until the end of the scope in jl_dump_native_impl.
Once we're done with the module, move the lock and ThreadSafeContext so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3 GiB is
in use. Deleting the jl_native_code_desc_t (as before) saves about 600 MiB,
and cleaning up the context saves an additional ~500 MiB.
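
To illustrate the mechanism described above, here is a minimal sketch, not the actual aotcompile code. `Context` and `compile_then_check` are hypothetical stand-ins for the LLVM context and for `compile`; a `std::shared_ptr` plays the role of the reference-counted ThreadSafeContext. Moving the last reference into a local inside the callback destroys the context when the callback returns, instead of at the end of the enclosing scope.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the heavy state owned by an LLVM context.
struct Context {
    explicit Context(bool &alive) : alive_(alive) { alive_ = true; }
    ~Context() { alive_ = false; }
    bool &alive_;
};

// Hypothetical stand-in for compile(): invokes the callback once the module
// is no longer needed, then reports whether the context is still alive while
// the remaining work would run.
template <typename Callback>
bool compile_then_check(bool &alive, Callback &&done_with_module) {
    done_with_module();
    return alive;
}

// Returns true if the context was destroyed inside the callback,
// before the enclosing scope ended.
bool context_freed_early() {
    bool alive = false;
    std::shared_ptr<Context> ctx = std::make_shared<Context>(alive);
    assert(alive);
    bool alive_after_callback = compile_then_check(alive, [&ctx] {
        // Moving into a local transfers the only reference. The local is
        // destroyed when the lambda returns, so the count drops to zero here.
        auto ctx2 = std::move(ctx);
    });
    return !alive_after_callback && ctx == nullptr;
}
```

Calling `context_freed_early()` returns `true`: the outer `ctx` is left empty and the `Context` destructor has already run by the time the callback returns.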

@xal-0 added the performance (Must go faster) and compiler:llvm (For issues that relate to LLVM) labels Aug 19, 2025
@gbaraldi added the backport 1.11 (Change should be backported to release-1.11) and backport 1.12 (Change should be backported to release-1.12) labels Aug 19, 2025
Comment on lines +2274 to +2281
data_outputs = compile(*dataM, "text", threads, [data, &lock, &TSCtx](Module &) {
    // Delete data when add_output thinks it's done with it
    // Saves memory for use when multithreading
    auto lock2 = std::move(lock);
    delete data;
    // Drop last reference to shared LLVM::Context
    auto TSCtx2 = std::move(TSCtx);
});

We might be able to use modern C++ features (lambda init-captures) for this, though it might require replacing std::function with llvm::unique_function, which hasn't yet been standardized, so that it compiles and does the right thing.

Suggested change
- data_outputs = compile(*dataM, "text", threads, [data, &lock, &TSCtx](Module &) {
-     // Delete data when add_output thinks it's done with it
-     // Saves memory for use when multithreading
-     auto lock2 = std::move(lock);
-     delete data;
-     // Drop last reference to shared LLVM::Context
-     auto TSCtx2 = std::move(TSCtx);
- });
+ data_outputs = compile(*dataM, "text", threads, [data, lock = std::move(lock), TSCtx = std::move(TSCtx)](Module &) {
+     // Delete data when add_output thinks it's done with it
+     // Saves memory for use when multithreading
+     delete data;
+ });


I think we'd need to explicitly invoke the destructor for the lambda: godbolt link
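
The point about the lambda's destructor can be sketched as follows. This is an illustration under assumptions, not the linked godbolt example: `Tracker` and `capture_lifetime` are hypothetical names. A resource moved into a lambda via an init-capture belongs to the closure object, so it is released only when the `std::function` holding that closure is destroyed or reset, not when the call returns.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical resource that counts how many instances are alive.
struct Tracker {
    explicit Tracker(int &live) : live_(live) { ++live_; }
    ~Tracker() { --live_; }
    Tracker(const Tracker &) = delete;
    int &live_;
};

struct LifetimeResult {
    bool alive_after_call;
    bool alive_after_reset;
};

LifetimeResult capture_lifetime() {
    int live = 0;
    auto res = std::make_shared<Tracker>(live);
    // shared_ptr is copyable, so std::function accepts this closure.
    std::function<void()> cb = [res = std::move(res)] { (void)res; };
    cb();
    // The call has returned, but the closure still owns the resource.
    bool after_call = (live == 1);
    // Resetting the std::function destroys the closure and its captures.
    cb = nullptr;
    bool after_reset = (live == 1);
    return {after_call, after_reset};
}
```

Here `capture_lifetime()` reports the resource as alive after the call and gone after the reset, which is why the merged version moves the lock and context into locals inside the lambda body instead.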


I think that is the design mistake llvm::unique_function may fix (over std::function)
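
One concrete difference between the two wrappers can be shown without LLVM headers. `std::function` requires its target to be copy-constructible, and a closure that captures a move-only object by move is not. The sketch below uses `std::unique_lock` as a stand-in, on the assumption that the lock in the suggested change is similarly move-only; the function names are hypothetical. `llvm::unique_function` is documented as accepting move-only callables, though whether it also changes when captures are destroyed is not something this sketch demonstrates.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <type_traits>
#include <utility>

// A closure that owns a move-only capture cannot be copied, so it does not
// meet std::function's CopyConstructible requirement.
bool move_only_capture_is_copyable() {
    std::mutex m;
    std::unique_lock<std::mutex> lock(m);
    auto cb = [lock = std::move(lock)] { (void)lock; };
    assert(!lock.owns_lock()); // ownership now lives inside the closure
    return std::is_copy_constructible_v<decltype(cb)>;
}

// A closure that owns only copyable captures (here a shared_ptr) can be
// copied, so std::function can store it.
bool shared_capture_is_copyable() {
    auto p = std::make_shared<int>(0);
    auto cb = [p = std::move(p)] { (void)p; };
    return std::is_copy_constructible_v<decltype(cb)>;
}
```

`move_only_capture_is_copyable()` returns `false` and `shared_capture_is_copyable()` returns `true`, which matches the reviewer's note that the init-capture version may need a different callable wrapper to compile.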

This was referenced Aug 19, 2025
@xal-0 merged commit ceeb661 into JuliaLang:master Aug 22, 2025
12 checks passed
KristofferC pushed a commit that referenced this pull request Sep 1, 2025
…59329)

We already save some memory here by deleting the `jl_native_code_desc_t` after
we're done serializing the combined module, but some data in the module's
`LLVM::Context` lives on until the end of the scope in `jl_dump_native_impl`.
Once we're done with the module, move the lock and `ThreadSafeContext` so the
reference count drops to zero.

A crude measurement shows that when compiling the Base sysimage, about 3 GiB is
in use. Deleting the `jl_native_code_desc_t` (as before) saves about 600 MiB,
and cleaning up the context saves an additional ~500 MiB.

(cherry picked from commit ceeb661)
KristofferC pushed a commit that referenced this pull request Sep 2, 2025
…59329)

DilumAluthge added a commit that referenced this pull request Sep 5, 2025
Backported PRs:
- [x] #54840 <!-- Add boundscheck in speccache_eq to avoid OOB access
due to data race -->
- [x] #42080 <!-- recommend explicit `using Foo: Foo, ...` in package
code (was: "using considered harmful") -->
- [x] #58127 <!-- [DOC] Update installation docs: /downloads/ =>
/install/ -->
- [x] #58202 <!-- [release-1.11] malloc: use jl_get_current_task to fix
null check -->
- [x] #58584 <!-- Make `Ptr` values static-show w/ type-information -->
- [x] #58637 <!-- Make late gc lower handle insertelement of alloca use.
-->
- [x] #58837 <!-- fix null comparisons for non-standard address spaces
-->
- [x] #57826 <!-- Add a `similar` method for `Type{<:CodeUnits}` -->
- [x] #58293 <!-- fix trailing indices stackoverflow in reinterpreted
array -->
- [x] #58887 <!-- Pkg: Allow configuring can_fancyprint(io::IO) using
IOContext -->
- [x] #58937 <!-- Fix nthreadpools size in JLOptions -->
- [x] #58978 <!-- Fix precompilepkgs warn loaded setting -->
- [x] #58998 <!-- Bugfix: Use Base.aligned_sizeof instead of sizeof in
Mmap.mmap -->
- [x] #59120 <!-- Fix memory order typo in "src/julia_atomics.h" -->
- [x] #59170 <!-- Clarify and enhance confusing precompile test -->

Need manual backport:
- [ ] #56329 <!-- loading: clean up more concurrency issues -->
- [ ] #56956 <!-- Add "mea culpa" to foreign module assignment error.
-->
- [ ] #57035 <!-- linux: workaround to avoid deadlock inside
dl_iterate_phdr in glibc -->
- [ ] #57089 <!-- Block thread from receiving profile signal with
stackwalk lock -->
- [ ] #57249 <!-- restore non-freebsd-unix fix for profiling -->
- [ ] #58011 <!-- Remove try-finally scope from `@time_imports`
`@trace_compile` `@trace_dispatch` -->
- [ ] #58062 <!-- remove unnecessary edge from `exp_impl` to `pow` -->
- [ ] #58157 <!-- add showing a string to REPL precompile workload -->
- [ ] #58209 <!-- Specialize `one` for the `SizedArray` test helper -->
- [ ] #58108 <!-- Base.get_extension & Dates.format made public -->
- [ ] #58356 <!-- codegen: remove readonly from abstract type calling
convention -->
- [ ] #58415 <!-- [REPL] more reliable extension loading -->
- [ ] #58510 <!-- Don't filter `Core` methods from newly-inferred list
-->
- [ ] #58110 <!-- relax dispatch for the `IteratorSize` method for
`Generator` -->
- [ ] #58965 <!-- Fix `hygienic-scope`s in inner macro expansions -->
- [ ] #58971 <!-- Fix alignment of failed precompile jobs on CI -->
- [ ] #59066 <!-- build: Also pass -fno-strict-aliasing for C++ -->

Contains multiple commits, manual intervention needed:
- [ ] #55877 <!-- fix FileWatching designs and add workaround for a stat
bug on Apple -->
- [ ] #56755 <!-- docs: fix scope type of a `struct` to hard -->
- [ ] #57809 <!-- Fix fptrunc Float64 -> Float16 rounding through
Float32 -->
- [ ] #57398 <!-- Make remaining float intrinsics require float
arguments -->
- [ ] #56351 <!-- Fix `--project=@script` when outside script directory
-->
- [ ] #57129 <!-- clarify that time_ns is monotonic -->
- [ ] #58134 <!-- Note annotated string API is experimental in Julia
1.11 in HISTORY.md -->
- [ ] #58401 <!-- check that hashing of types does not foreigncall
(`jl_type_hash` is concrete evaluated) -->
- [ ] #58435 <!-- Fix layout flags for types that have oddly sized
primitive type fields -->
- [ ] #58483 <!-- Fix tbaa usage when storing into heap allocated
immutable structs -->
- [ ] #58512 <!-- Make more types jl_static_show readably -->
- [ ] #58012 <!-- Re-enable tab completion of kwargs for large method
tables -->
- [ ] #58683 <!-- Add 0 predecessor to entry basic block and handle it
in inlining -->
- [ ] #59112 <!-- Add builtin function name to add methods error -->

Non-merged PRs with backport label:
- [ ] #59329 <!-- aotcompile: destroy LLVM context after serializing
combined module -->
- [ ] #58848 <!-- Set array size only when safe to do so -->
- [ ] #58535 <!-- gf.c: include const-return methods in
`--trace-compile` -->
- [ ] #58038 <!-- strings/cstring: `transcode`: prevent Windows sysimage
invalidation -->
- [ ] #57604 <!-- `@nospecialize` for `string_index_err` -->
- [ ] #57366 <!-- Use ptrdiff_t sized offsets for gvars_offsets to allow
large sysimages -->
- [ ] #56890 <!-- Enable getting non-boxed LLVM type from Julia Type -->
- [ ] #56823 <!-- Make version of opaque closure constructor in world
-->
- [ ] #55958 <!-- also redirect JL_STDERR etc. when redirecting to
devnull -->
- [ ] #55956 <!-- Make threadcall gc safe -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->

---------

Co-authored-by: Kiran Pamnany <kpamnany@users.noreply.github.com>
Co-authored-by: adienes <51664769+adienes@users.noreply.github.com>
Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
Co-authored-by: Keno Fischer <keno@juliacomputing.com>
Co-authored-by: Simeon David Schaub <simeon@schaub.rocks>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Fons van der Plas <fonsvdplas@gmail.com>
Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
Co-authored-by: JonasIsensee <jonas.isensee@web.de>
Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Co-authored-by: DilumAluthgeBot <43731525+DilumAluthgeBot@users.noreply.github.com>
Co-authored-by: DilumAluthge <5619885+DilumAluthge@users.noreply.github.com>
DilumAluthge pushed a commit that referenced this pull request Sep 9, 2025
…59329)

@DilumAluthge mentioned this pull request Sep 9, 2025
59 tasks
@DilumAluthge removed the backport 1.11 (Change should be backported to release-1.11) label Sep 9, 2025
@KristofferC removed the backport 1.12 (Change should be backported to release-1.12) label Sep 11, 2025
@DilumAluthge added the backport 1.11 (Change should be backported to release-1.11) label Sep 11, 2025
KristofferC pushed a commit that referenced this pull request Sep 15, 2025
…59329)


Labels

backport 1.11 Change should be backported to release-1.11 compiler:llvm For issues that relate to LLVM performance Must go faster


5 participants