Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: use faster allocators #2874

Merged
merged 4 commits into from
Jan 14, 2025
Merged

Conversation

baszalmstra
Copy link
Contributor

@baszalmstra baszalmstra commented Jan 10, 2025

I stole this approach from uv. Apparently using specialized allocators can have a significant performance benefit. However, it also adds additional compilation overhead that we dont really need when compiling locally therefor we use this clever trick with features to enable the allocators conditionally.

You can build a "performance" build with cargo build --release --features performance. This should allow you to benchmark this locally.

@ruben-arts
Copy link
Contributor

I've tested this on M4 PRO, whith the holoviews repo and the following command

PIXI_CACHE_DIR=/tmp/bench_pixi hyperfine -p "pixi clean && rm -f pixi.lock" "main-pixi install -vv" "fast-pixi install" -w 2 -r 5 --show-output

It was inconclusive. running it twice give both as fastest.

We should test it on a linux musl build.

@Hofer-Julian
Copy link
Contributor

@ruben-arts and I stumbled over #2878 while testing this PR

@ruben-arts
Copy link
Contributor

Huge improvement after building correctly:

❯ hyperfine -p "rm pixi.lock -f" "pixi-perf-musl list --no-install" "pixi list --no-install" -r 3 -w 1
# THIS PR WITH `--feature performance`
Benchmark 1: pixi-perf-musl list --no-install
  Time (mean ± σ):     10.475 s ±  0.296 s    [User: 121.102 s, System: 10.084 s]
  Range (min … max):   10.260 s … 10.813 s    3 runs
# THIS PR WITHOUT THE `--feature performance`
Benchmark 2: pixi list --no-install
  Time (mean ± σ):     132.517 s ±  3.027 s    [User: 517.171 s, System: 1443.620 s]
  Range (min … max):   129.974 s … 135.865 s    3 runs
 
Summary
  pixi-perf-musl list --no-install ran
   12.65 ± 0.46 times faster than pixi list --no-install

@baszalmstra
Copy link
Contributor Author

Lucky find I guess!

@Hofer-Julian
Copy link
Contributor

A few things that would be good to figure out:

  • Does it actually increase compile time, if not we can do it without the "performance" feature gate
    • Check Windows
    • Check macOS
  • Does it increase the binary size, if yes, we should only enable it on platforms where it actually speeds things up
    • Check windows
    • Check macOS

@baszalmstra
Copy link
Contributor Author

I took the approach from uv where the explicitly gate this to reduce the compile time. It introduces a number of heavy dependencies.

With my simple benchmark it speeds things up on Windows, but just marginally. Unless the binary size increases a lot I say we leave it as is.

@wolfv
Copy link
Member

wolfv commented Jan 13, 2025

I compared artifacts produced by different CI pipelines and they are pretty similar in size. But I am not sure if the artifact produces is using the feature.

@baszalmstra
Copy link
Contributor Author

Only the artifact produced by dist is using the performance features atm.

@baszalmstra
Copy link
Contributor Author

baszalmstra commented Jan 13, 2025

Just ran the same benchmark on my windows machine and compared it also to a few older versions. The results are actually quite insane..

❯ hyperfine --shell nu --prepare 'rm pixi.lock -f' 'pixi-0.39.4 list --no-install' 'pixi-0.39.5 list --no-install' 'pixi-0.40.0 list --no-install' 'pixi-0.40.0-performance list --no-install' -r 3 -w 1
Benchmark 1: pixi-0.39.4 list --no-install
  Time (mean ± σ):     13.292 s ±  0.282 s    [User: 112.317 s, System: 17.124 s]
  Range (min … max):   13.051 s … 13.602 s    3 runs

Benchmark 2: pixi-0.39.5 list --no-install
  Time (mean ± σ):     10.560 s ±  0.380 s    [User: 80.650 s, System: 7.936 s]
  Range (min … max):   10.168 s … 10.926 s    3 runs

Benchmark 3: pixi-0.40.0 list --no-install
  Time (mean ± σ):     10.481 s ±  0.257 s    [User: 80.286 s, System: 7.691 s]
  Range (min … max):   10.202 s … 10.707 s    3 runs

Benchmark 4: pixi-0.40.0-performance list --no-install
  Time (mean ± σ):      6.330 s ±  0.236 s    [User: 62.807 s, System: 7.415 s]
  Range (min … max):    6.188 s …  6.602 s    3 runs

Summary
  pixi-0.40.0-performance list --no-install ran
    1.66 ± 0.07 times faster than pixi-0.40.0 list --no-install
    1.67 ± 0.09 times faster than pixi-0.39.5 list --no-install
    2.10 ± 0.09 times faster than pixi-0.39.4 list --no-install

I ran this on the holoviews repository.

Copy link
Contributor

@Hofer-Julian Hofer-Julian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be interesting if macOS also improves.

Independently of that, I think we can bring this in.

Let's remember to compare binary sizes after the next release.

@ruben-arts
Copy link
Contributor

With the caveat that I got a crazy outlier on the run before this it doesn't seem to do much on macos

hyperfine -p "rm -f pixi.lock" "pixi-perf list --no-install" "pixi-0.40.0 list --no-install" -w 1 -r 3 
Benchmark 1: pixi-perf list --no-install
  Time (mean ± σ):      9.825 s ±  0.424 s    [User: 38.959 s, System: 8.087 s]
  Range (min … max):    9.368 s … 10.206 s    3 runs
 
Benchmark 2: pixi-0.40.0 list --no-install
  Time (mean ± σ):      9.905 s ±  1.550 s    [User: 42.261 s, System: 4.414 s]
  Range (min … max):    8.146 s … 11.072 s    3 runs
 
Summary
  pixi-perf list --no-install ran
    1.01 ± 0.16 times faster than pixi-0.40.0 list --no-install

@ruben-arts ruben-arts enabled auto-merge (squash) January 14, 2025 09:02
@ruben-arts ruben-arts merged commit 9e371c4 into prefix-dev:main Jan 14, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants