Experiment with compressing sysimgs with zstd #48244
Closed
Caching it to /tmp temporarily for fast startup/load/COW.
This seems like a good idea to get in - perhaps more useful on Windows than Linux, where the system image / exe size is restricted.
xal-0 added a commit that referenced this pull request on Aug 26, 2025
Revived version of #48244, with a slightly different approach. This version looks for a function pointer called `jl_image_unpack` inside compiled system images and invokes it to get the `jl_image_buf_t` struct. Two implementations, `jl_image_unpack_zstd` and `jl_image_unpack_uncomp`, are provided (for comparison). The zstd compression is applied only to the heap image, and not the compiled code, since that can be shared across Julia processes.

TODO: test a few different compression settings and enable by default.

Example data from un-trimmed juliac "hello world":

```
156M hello-uncomp
43M hello-zstd
48M hello-zstd-1
45M hello-zstd-5
43M hello-zstd-15
39M hello-zstd-22

$ hyperfine -w3 ./hello-uncomp
Benchmark 1: ./hello-uncomp
  Time (mean ± σ): 74.4 ms ± 0.8 ms [User: 51.9 ms, System: 19.0 ms]
  Range (min … max): 73.0 ms … 76.6 ms 39 runs

$ hyperfine -w3 ./hello-zstd-1
Benchmark 1: ./hello-zstd-1
  Time (mean ± σ): 152.4 ms ± 0.5 ms [User: 138.2 ms, System: 12.0 ms]
  Range (min … max): 151.4 ms … 153.2 ms 19 runs

$ hyperfine -w3 ./hello-zstd-5
Benchmark 1: ./hello-zstd-5
  Time (mean ± σ): 154.3 ms ± 0.5 ms [User: 139.6 ms, System: 12.4 ms]
  Range (min … max): 153.5 ms … 155.2 ms 19 runs

$ hyperfine -w3 ./hello-zstd-15
Benchmark 1: ./hello-zstd-15
  Time (mean ± σ): 135.9 ms ± 0.5 ms [User: 121.6 ms, System: 12.0 ms]
  Range (min … max): 135.1 ms … 136.5 ms 21 runs

$ hyperfine -w3 ./hello-zstd-22
Benchmark 1: ./hello-zstd-22
  Time (mean ± σ): 149.0 ms ± 0.6 ms [User: 134.7 ms, System: 12.1 ms]
  Range (min … max): 147.7 ms … 150.4 ms 19 runs
```

---------

Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
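To make the size/startup trade-off in the figures above easier to read, here is a small Python sketch that derives the size ratio and the added startup time relative to `hello-uncomp`. The numbers are copied from the commit message; the helper name `summarize` is made up for illustration, and `hello-zstd` is left out because no timing was reported for it.

```python
# Figures copied from the commit message: (binary, size in MB, mean startup in ms).
BASELINE = ("hello-uncomp", 156, 74.4)
RUNS = [
    ("hello-zstd-1", 48, 152.4),
    ("hello-zstd-5", 45, 154.3),
    ("hello-zstd-15", 43, 135.9),
    ("hello-zstd-22", 39, 149.0),
]


def summarize(baseline, runs):
    """Return (name, size ratio vs. baseline, extra startup ms) for each run."""
    _, base_size, base_ms = baseline
    rows = []
    for name, size, ms in runs:
        rows.append((name, round(size / base_size, 3), round(ms - base_ms, 1)))
    return rows


if __name__ == "__main__":
    for name, ratio, extra_ms in summarize(BASELINE, RUNS):
        print(f"{name:14s} size x{ratio:.3f}  startup +{extra_ms:.1f} ms")
```

Read this way, every compressed variant lands between roughly a quarter and a third of the uncompressed size while adding tens of milliseconds to startup on this one machine; a single benchmark host is not enough to pick a default setting.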
topolarity pushed a commit that referenced this pull request on Sep 30, 2025
gbaraldi added a commit that referenced this pull request on Oct 1, 2025
While studying why the sysimgs are so large, I decided to experiment with running zstd over the .data section, which made them about a quarter of their current size, which is cool.
Since I had to put the whole sysimg in memory to decompress it, we got a 70MB increase in memory usage on a clean session. To combat that, as a further experiment, I added caching for the decompressed files in `/tmp` (`/tmp` is probably a bad place to put them :/). With caching, the memory usage was about the same, and so were the load/startup times.
One interesting side effect of this is that, for a session in which some code has been run, the memory usage was significantly lower.
I ran the README of OrdinaryDiffEq in both master (julia-2) and this PR (julia), and for some reason we use a lot less memory.
The implementation is Linux-only currently, though it might work on macOS without too many changes. For Windows I would need a bit of help.
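The decompress-once-then-cache idea described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the code in this PR: it uses the standard-library `zlib` as a stand-in for zstd, the function name `load_cached_image` and the cache file naming are hypothetical, and the cache directory is passed in rather than hard-coded to `/tmp`. The decompressed image is written to a file once and then mapped copy-on-write, so the pages are backed by the file instead of anonymous memory.

```python
import hashlib
import mmap
import os
import tempfile
import zlib


def load_cached_image(compressed: bytes, cache_dir: str) -> mmap.mmap:
    """Decompress once into cache_dir, then map the cached file copy-on-write.

    Assumes a non-empty image. zlib stands in for zstd here.
    """
    # Key the cache entry by the compressed contents (hypothetical naming scheme).
    key = hashlib.sha256(compressed).hexdigest()
    path = os.path.join(cache_dir, f"sysimg-{key}.bin")

    if not os.path.exists(path):
        # Write to a temporary file and rename it into place, so a concurrent
        # reader never observes a partially written cache entry.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(zlib.decompress(compressed))
        os.replace(tmp_path, path)

    with open(path, "rb") as f:
        # ACCESS_COPY gives a private copy-on-write mapping: writes stay in
        # this process and are never written back to the cached file.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
```

A second process calling `load_cached_image` with the same compressed bytes skips decompression and maps the same file, which is where the startup and memory behaviour reported above would come from under these assumptions.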