Test windows runners speed #26366


Closed
wants to merge 6 commits into from

Conversation

notpeter
Member

Hypothesis: Caching on Windows GitHub hosted runners is slower than without.

Release Notes:

  • N/A

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Mar 10, 2025
@JunkuiZhang
Contributor

I have a theory. I suspect the slowdown might be related to the fact that our Windows runner never runs cargo clean. If I recall correctly, when the initial size of the target folder was around 30GB, the Windows tests ran quite fast.

Unlike the Linux and macOS runners, which run the clear-target-dir-if-larger-than script daily when publishing the nightly version to control the size of the target folder, the Windows runner doesn't have this mechanism. As a result, its target folder keeps growing, and in this PR, it's almost 50GB.
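For context, that daily cleanup mechanism could look roughly like the sketch below (hypothetical: the script name comes from the comment above, but the threshold, paths, and exact behavior here are assumptions):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a clear-target-dir-if-larger-than step:
# remove the cargo target directory when it grows past a size limit,
# so the runner's cached state never grows unbounded.
clear_target_dir_if_larger_than() {
  local dir="$1" max_gb="$2" size_gb
  # du --block-size=1G rounds up to whole gigabytes (GNU coreutils)
  size_gb=$(du -s --block-size=1G "$dir" 2>/dev/null | cut -f1)
  size_gb="${size_gb:-0}"
  if [ "$size_gb" -gt "$max_gb" ]; then
    echo "$dir is ${size_gb}GB (limit ${max_gb}GB); removing"
    rm -rf "$dir"
  else
    echo "$dir is ${size_gb}GB (limit ${max_gb}GB); keeping"
  fi
}
```

Run daily from the nightly publish job, as the Linux/macOS runners do, this would bound the growth described above.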

[screenshot: target folder size approaching 50GB]

@notpeter
Member Author

notpeter commented Mar 10, 2025

I don't think that's all of what's happening.
Currently the cache is 7GB for tests and 2GB for clippy.
[screenshots: Actions cache sizes for the test and clippy caches]
GitHub limits our total cache to 10GB per repo which is barely enough to fit this.

Even if there is some statefulness across runs, each runner invocation can land in any of at least 7 regions, and the speeds to restore the cache vary greatly:

  • clippy cache - ~2GB (2047MB) - 1m4s to 2m27s
    • load swatinem/rust-cache (16s - 19s)
    • download 2GB from cache 12s to 90s
    • extract 2GB tar archive (~35-40s)
  • test cache - ~7GB (7169MB) - 2m41s to 5m30s
    • load swatinem/rust-cache (16s - 19s)
    • tests download 7GB from cache (1m25s to 3m3s)
    • tests extract 7GB from cache (~1m30s - 1m40s)

And creating the tarball to write to the cache is also dog-slow. In this 13m0s example:

  • 17s prep/cleanup
  • 12m10s to generate the tarball
  • 24s to transfer

I was surprised by the times to compress/decompress.
Perhaps we are accidentally caching many small files?
These commands look fine.

# decompress
"C:\Program Files\Git\usr\bin\tar.exe" -xf C:/a/_temp/62ab0759-0d6a-4a3f-a9fe-5ba1fdde25fe/cache.tzst -P -C C:/a/zed/zed --force-local --use-compress-program "zstd -d"

# compress
"C:\Program Files\Git\usr\bin\tar.exe" --posix -cf cache.tzst --exclude cache.tzst -P -C C:/a/zed/zed --files-from manifest.txt --force-local --use-compress-program "zstd -T0"

Sadly, despite all this, caching still outperforms no caching even in the worst examples, for both clippy (3m56s vs 5m51s) and test+build (12m31s vs 16m40s).

[screenshot: CI run durations with and without caching]

I'm going to blow away the runner caches for windows-test and windows-clippy and see if that helps at all.

@osiewicz
Contributor

Are we by any chance caching the target directory? If so, it likely has tons of small object files.
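One quick way to test that theory would be to count files per cached path (a sketch; the directories below are assumptions based on what rust-cache typically caches, not confirmed from the workflow):

```shell
# Print file count and total size for each directory passed in;
# tens of thousands of tiny object files would explain why taring
# and zstd-compressing the cache is so slow.
cache_stats() {
  local d
  for d in "$@"; do
    [ -d "$d" ] || continue
    printf '%s: %s files, %s\n' "$d" \
      "$(find "$d" -type f | wc -l)" \
      "$(du -sh "$d" | cut -f1)"
  done
}

# Assumed cached paths; adjust to whatever the workflow actually caches.
cache_stats target "$HOME/.cargo/registry"
```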

@notpeter
Member Author

@osiewicz Not sure what all is being cached. It's moments like these that I really miss CircleCI's "rerun with SSH shell", which lets you inspect the contents of a runner live while tests are running and after they've completed.

New, smaller caches unsurprisingly lead to improved cache load speeds (setup/transfer/decompress).

Generally I'm quite unhappy with our inability to get quick build times even at inordinate expense. Running tests+clippy on the biggest Windows runners available (64 vCPU, 256GB RAM, $0.512/min) takes in excess of 10 minutes ($5.63/build)...of which 1m49s is caching (setup/transfer/decompress: 18s/31s/60s, $0.93).

My gut is that if we were using self-hosted stateful runners with a local cargo cache, we could get sub-10-minute runs on 32 or 48 vCPU instances. I don't know if the effort of managing Windows instances is worth it, nor do I have hard data to support my hypothesis, but ideally sometime in Q2 I would like to see (a) Windows tests running on every commit and (b) each run taking less than 10 minutes. Our stateful macOS runners with 10-14 cores and 32GB of RAM cost ~$10-$13/day. It really bugs me that Windows builds with 4x the RAM, 3x the CPU, at ~10x the cost still take 2x as long (~14 mins vs ~7 mins).
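For reference, the dollar figures above fall straight out of the per-minute rate (a sketch; the 11-minute total is my rounding of "in excess of 10 minutes", and 18/31/60 are the three caching step durations in seconds):

```shell
# Reconstruct the quoted per-build costs from the $0.512/min rate
# for the 64 vCPU Windows runner.
rate=0.512
awk -v r="$rate" 'BEGIN {
  printf "build:   $%.2f\n", 11 * r          # ~11 min total -> $5.63
  printf "caching: $%.2f\n", (109 / 60) * r  # 18s+31s+60s = 1m49s -> $0.93
}'
```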

@notpeter notpeter self-assigned this Mar 10, 2025
@SomeoneToIgnore
Contributor

We should be caching target/ due to

- name: Cache dependencies
  uses: swatinem/rust-cache@f0deed1e0edfc6a9be95417288c0e1099b1eeec3 # v2
  with:
    save-if: ${{ github.ref == 'refs/heads/main' }}
    cache-provider: "buildjet"

  • IIRC it also caches the crates index and other .cargo contents.

See https://github.com/Swatinem/rust-cache for more details.
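If target/ churn turns out to be the bottleneck, rust-cache has a knob to skip it: per its README, `cache-targets: false` leaves the target directory out and caches only the ~/.cargo pieces. A sketch of that change (an assumption for tuning, not something this PR actually tried):

```yaml
- name: Cache dependencies
  uses: swatinem/rust-cache@f0deed1e0edfc6a9be95417288c0e1099b1eeec3 # v2
  with:
    save-if: ${{ github.ref == 'refs/heads/main' }}
    cache-provider: "buildjet"
    # Assumption: skip the churn-heavy target/ dir, cache only .cargo
    cache-targets: "false"
```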

@osiewicz
Contributor

@notpeter if we had a Windows machine to run the builds on, I would be down to look at rustc profiles with you. My gut instinct is that rustc might not be too great about filesystem access (I had one such PR in rust-lang/rust#134866, which was impactful even on macOS).
Dev builds do quite a bit of reading from the filesystem due to the incremental build cache. Maybe it'd be worthwhile to test with incremental compilation disabled to see if that's a good lead?

@notpeter
Member Author

@osiewicz Doesn't this env mean we aren't using incremental anywhere in ci.yml?

env:
CARGO_TERM_COLOR: always
CARGO_INCREMENTAL: 0
RUST_BACKTRACE: 1

@osiewicz
Contributor

Oh, yeah, that should do it.

@notpeter notpeter closed this Mar 12, 2025
Labels: cla-signed, windows
4 participants