@gbaraldi commented Jan 11, 2023

While studying why the sysimgs are so large, I experimented with running zstd over the .data section. That shrinks them to about a quarter of their current size, which is cool.

Since I had to put the whole sysimg in memory to decompress it, we got a 70MB increase in memory usage on a clean session. To combat that, as a further experiment, I added caching for the decompressed files in /tmp (/tmp is probably a bad place to put them :/). With caching, the memory usage was about the same, and so were the load/startup times.
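To make the caching idea concrete, here is a minimal Python sketch of "decompress once, then map the cached file". It is not the code in this PR: `load_cached`, the cache file naming, and the content-hash key are assumptions made for illustration, and `zlib` stands in for zstd so the example runs with the standard library alone.

```python
import hashlib
import mmap
import os
import tempfile
import zlib  # stand-in for zstd, which is not assumed to be installed


def load_cached(compressed: bytes, cache_dir: str) -> mmap.mmap:
    """Decompress `compressed` once, then serve later loads from a file mapping."""
    # Key the cache entry by content hash so a rebuilt image never hits a stale file.
    key = hashlib.sha256(compressed).hexdigest()
    path = os.path.join(cache_dir, key + ".img")

    if not os.path.exists(path):
        data = zlib.decompress(compressed)
        # Write to a temporary file and rename it into place, so a concurrent
        # reader never observes a partially written cache entry.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)

    # Map the cached file read-only. The pages are file-backed, so the OS can
    # load them lazily instead of holding a private decompressed copy per process.
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

The first call pays for decompression and one file write; later calls only map the existing file, which would be consistent with memory usage and startup time returning to roughly their uncompressed levels.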

One interesting side effect is that, for a session where some code has been run, the memory usage was significantly lower.

I ran the README example of OrdinaryDiffEq on both master (julia-2) and this PR (julia), and for some reason this PR uses a lot less memory.

[image: memory usage of master (julia-2) vs. this PR (julia)]

The implementation is currently Linux-only, though it might work on macOS without too many changes. For Windows I would need a bit of help.

@brenhinkeller added the `performance` (Must go faster) label Aug 6, 2023

@ViralBShah

This seems like a good idea to get in. It is perhaps more useful on Windows than on Linux, since on Windows the system image / exe size is restricted.

@vtjnash closed this Aug 19, 2025
xal-0 added a commit that referenced this pull request Aug 26, 2025
Revived version of #48244, with a slightly different approach. This
version looks for a function pointer called `jl_image_unpack` inside
compiled system images and invokes it to get the `jl_image_buf_t`
struct. Two implementations, `jl_image_unpack_zstd` and
`jl_image_unpack_uncomp`, are provided (for comparison). The zstd
compression is applied only to the heap image, and not the compiled
code, since the compiled code can be shared across Julia processes.
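
The hook-based design described above can be illustrated with a small Python model. This is only a sketch of the dispatch pattern, not Julia's loader: the `symbols` dictionary plays the role of the image's symbol table, `ImageBuf` is a hypothetical analogue of `jl_image_buf_t`, the `heap_image` key is invented for the example, and `zlib` stands in for zstd.

```python
import zlib  # stand-in for zstd in this sketch
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ImageBuf:
    """Hypothetical analogue of `jl_image_buf_t`: the unpacked heap image."""
    data: bytes


def unpack_uncomp(stored: bytes) -> ImageBuf:
    # The heap image is stored as-is, so hand it back without decoding.
    return ImageBuf(stored)


def unpack_compressed(stored: bytes) -> ImageBuf:
    # The heap image is stored compressed, so decompress it into memory.
    return ImageBuf(zlib.decompress(stored))


def load_image(symbols: Dict[str, Any]) -> ImageBuf:
    """Look up the image's own unpack hook and call it.

    The loader does not need to know how the image was packed: each image
    carries the function that knows how to unpack it.
    """
    unpack = symbols["jl_image_unpack"]
    return unpack(symbols["heap_image"])
```

Because the image itself names its unpacker, compressed and uncompressed images can be loaded through the same code path, which is what makes the side-by-side comparison straightforward.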

TODO: test a few different compression settings and enable by default.

Example data from un-trimmed juliac "hello world":
```
156M  hello-uncomp
 43M  hello-zstd
 48M  hello-zstd-1
 45M  hello-zstd-5
 43M  hello-zstd-15
 39M  hello-zstd-22

$ hyperfine -w3 ./hello-uncomp 
Benchmark 1: ./hello-uncomp
  Time (mean ± σ):      74.4 ms ±   0.8 ms    [User: 51.9 ms, System: 19.0 ms]
  Range (min … max):    73.0 ms …  76.6 ms    39 runs

$ hyperfine -w3 ./hello-zstd-1
Benchmark 1: ./hello-zstd-1
  Time (mean ± σ):     152.4 ms ±   0.5 ms    [User: 138.2 ms, System: 12.0 ms]
  Range (min … max):   151.4 ms … 153.2 ms    19 runs
 
$ hyperfine -w3 ./hello-zstd-5 
Benchmark 1: ./hello-zstd-5
  Time (mean ± σ):     154.3 ms ±   0.5 ms    [User: 139.6 ms, System: 12.4 ms]
  Range (min … max):   153.5 ms … 155.2 ms    19 runs

$ hyperfine -w3 ./hello-zstd-15
Benchmark 1: ./hello-zstd-15
  Time (mean ± σ):     135.9 ms ±   0.5 ms    [User: 121.6 ms, System: 12.0 ms]
  Range (min … max):   135.1 ms … 136.5 ms    21 runs
 
$ hyperfine -w3 ./hello-zstd-22
Benchmark 1: ./hello-zstd-22
  Time (mean ± σ):     149.0 ms ±   0.6 ms    [User: 134.7 ms, System: 12.1 ms]
  Range (min … max):   147.7 ms … 150.4 ms    19 runs
```
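
To read the trade-off off these numbers, the following Python sketch computes each compressed build's size relative to `hello-uncomp` and its added mean startup time. The figures are copied from the listing above; the `summarize` helper is an illustrative name, and `hello-zstd` is left out because no timing is given for it.

```python
# File sizes in MB and mean startup times in ms, copied from the listing above.
sizes_mb = {
    "hello-uncomp": 156,
    "hello-zstd-1": 48,
    "hello-zstd-5": 45,
    "hello-zstd-15": 43,
    "hello-zstd-22": 39,
}
mean_ms = {
    "hello-uncomp": 74.4,
    "hello-zstd-1": 152.4,
    "hello-zstd-5": 154.3,
    "hello-zstd-15": 135.9,
    "hello-zstd-22": 149.0,
}


def summarize(baseline: str = "hello-uncomp") -> dict:
    """Return {name: (size relative to baseline, added startup time in ms)}."""
    rows = {}
    for name, size in sizes_mb.items():
        if name == baseline:
            continue
        ratio = size / sizes_mb[baseline]
        overhead = mean_ms[name] - mean_ms[baseline]
        rows[name] = (ratio, overhead)
    return rows


for name, (ratio, overhead) in summarize().items():
    print(f"{name}: {ratio:.0%} of uncompressed size, +{overhead:.1f} ms startup")
```

On this data, every compression level lands between 25% and 31% of the uncompressed size and adds between 61.5 ms and 79.9 ms to mean startup time.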

---------

Co-authored-by: Gabriel Baraldi <baraldigabriel@gmail.com>
topolarity pushed a commit that referenced this pull request Sep 30, 2025
gbaraldi added a commit that referenced this pull request Oct 1, 2025