Test windows runners speed #26366


Closed
wants to merge 6 commits into from

Conversation

notpeter
Member

Hypothesis: Caching on Windows GitHub hosted runners is slower than without.

Release Notes:

  • N/A

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Mar 10, 2025
@JunkuiZhang
Contributor

I have a theory. I suspect the slowdown might be related to the fact that our Windows runner never runs cargo clean. If I recall correctly, when the initial size of the target folder was around 30GB, the Windows tests ran quite fast.

Unlike the Linux and macOS runners, which run the clear-target-dir-if-larger-than script daily when publishing the nightly version to control the size of the target folder, the Windows runner doesn't have this mechanism. As a result, its target folder keeps growing, and in this PR, it's almost 50GB.
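For context, that daily cleanup mechanism could look roughly like the sketch below (hypothetical: the script name comes from the comment above, but the threshold, paths, and exact behavior here are assumptions):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a clear-target-dir-if-larger-than step:
# remove the cargo target directory when it grows past a size limit,
# so the runner's cached state never grows unbounded.
clear_target_dir_if_larger_than() {
  local dir="$1" max_gb="$2" size_gb
  # du --block-size=1G rounds up to whole gigabytes (GNU coreutils)
  size_gb=$(du -s --block-size=1G "$dir" 2>/dev/null | cut -f1)
  size_gb="${size_gb:-0}"
  if [ "$size_gb" -gt "$max_gb" ]; then
    echo "$dir is ${size_gb}GB (limit ${max_gb}GB); removing"
    rm -rf "$dir"
  else
    echo "$dir is ${size_gb}GB (limit ${max_gb}GB); keeping"
  fi
}
```

Run daily from the nightly publish job, as the Linux/macOS runners do, this would bound the growth described above.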

[screenshot: target folder size approaching 50GB]

@notpeter
Member Author

notpeter commented Mar 10, 2025

I don't think that's all of what's happening.
Currently the cache is 7GB for tests and 2GB for clippy.
[screenshots: Actions cache sizes for the test and clippy caches]
GitHub limits our total cache to 10GB per repo which is barely enough to fit this.

Even if there is some statefulness across runs, each runner invocation can land in any of at least 7 regions, and the speeds to restore the cache vary greatly:

  • clippy cache - ~2GB (2047MB) - 1m4s to 2m27s
    • load swatinem/rust-cache (16s - 19s)
    • download 2GB from cache 12s to 90s
    • extract 2GB tar archive (~35-40s)
  • test cache - ~7GB (7169MB) - 2m41s to 5m30s
    • load swatinem/rust-cache (16s - 19s)
    • tests download 7GB from cache (1m25s to 3m3s)
    • tests extract 7GB from cache (~1m30s - 1m40s)

And creating the tarball to write to the cache is also dog-slow. In this 13m0s example:

  • 17s prep/cleanup
  • 12m10s to generate the tarball
  • 24s to transfer

I was surprised by the times to compress/decompress.
Perhaps we are accidentally caching many small files?
These commands look fine.

# decompress
"C:\Program Files\Git\usr\bin\tar.exe" -xf C:/a/_temp/62ab0759-0d6a-4a3f-a9fe-5ba1fdde25fe/cache.tzst -P -C C:/a/zed/zed --force-local --use-compress-program "zstd -d"

# compress
"C:\Program Files\Git\usr\bin\tar.exe" --posix -cf cache.tzst --exclude cache.tzst -P -C C:/a/zed/zed --files-from manifest.txt --force-local --use-compress-program "zstd -T0"

Sadly, despite all this, caching still outperforms no caching even in the worst examples, for both clippy (3m56s vs 5m51s) and test+build (12m31s vs 16m40s).

[screenshot: CI run durations with and without caching]

I'm going to blow away the runner caches for windows-test and windows-clippy and see if that helps at all.

@osiewicz
Contributor

Are we by any chance caching the target directory? If so, it likely has tons of small object files.
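One quick way to test that theory would be to count files per cached path (a sketch; the directories below are assumptions based on what rust-cache typically caches, not confirmed from the workflow):

```shell
# Print file count and total size for each directory passed in;
# tens of thousands of tiny object files would explain why taring
# and zstd-compressing the cache is so slow.
cache_stats() {
  local d
  for d in "$@"; do
    [ -d "$d" ] || continue
    printf '%s: %s files, %s\n' "$d" \
      "$(find "$d" -type f | wc -l)" \
      "$(du -sh "$d" | cut -f1)"
  done
}

# Assumed cached paths; adjust to whatever the workflow actually caches.
cache_stats target "$HOME/.cargo/registry"
```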

@notpeter
Member Author

@osiewicz Not sure what all is being cached. It's moments like these that I really miss CircleCI's "rerun with SSH shell", which lets you inspect the contents of a runner live while tests are running and after they've completed.

New, smaller caches unsurprisingly lead to improved cache load speeds (setup/transfer/decompress).

Generally I'm quite unhappy with our inability to get quick build times even at inordinate expense. Running tests+clippy on the biggest Windows runners available (64 vCPU, 256GB RAM, $0.512/min) takes in excess of 10 minutes ($5.63/build)...of which 1m49s is caching (setup/transfer/decompress: 18s/31s/60s, $0.93).

My gut is that if we were using self-hosted stateful runners with a local cargo cache, we could get sub-10-minute runs on 32 or 48 vCPU instances. I don't know if the effort of managing Windows instances is worth it, nor do I have hard data to support my hypothesis, but ideally sometime in Q2 I would like to see (a) Windows tests running on every commit and (b) each run taking less than 10 minutes. Our stateful macOS runners with 10-14 cores and 32GB of RAM cost ~$10-$13/day. It really bugs me that Windows builds with 4x the RAM, 3x the CPU, at ~10x the cost still take 2x as long (~14 mins vs ~7 mins).
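For reference, the dollar figures above fall straight out of the per-minute rate (a sketch; the 11-minute total is my rounding of "in excess of 10 minutes", and 18/31/60 are the three caching step durations in seconds):

```shell
# Reconstruct the quoted per-build costs from the $0.512/min rate
# for the 64 vCPU Windows runner.
rate=0.512
awk -v r="$rate" 'BEGIN {
  printf "build:   $%.2f\n", 11 * r          # ~11 min total -> $5.63
  printf "caching: $%.2f\n", (109 / 60) * r  # 18s+31s+60s = 1m49s -> $0.93
}'
```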

@notpeter notpeter self-assigned this Mar 10, 2025
@SomeoneToIgnore
Contributor

We should be caching target/ due to

- name: Cache dependencies
  uses: swatinem/rust-cache@f0deed1e0edfc6a9be95417288c0e1099b1eeec3 # v2
  with:
    save-if: ${{ github.ref == 'refs/heads/main' }}
    cache-provider: "buildjet"

  • IIRC it also caches the crates index and other .cargo contents.

See https://github.com/Swatinem/rust-cache for more details.
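If target/ churn turns out to be the bottleneck, rust-cache has a knob to skip it: per its README, `cache-targets: false` leaves the target directory out and caches only the ~/.cargo pieces. A sketch of that change (an assumption for tuning, not something this PR actually tried):

```yaml
- name: Cache dependencies
  uses: swatinem/rust-cache@f0deed1e0edfc6a9be95417288c0e1099b1eeec3 # v2
  with:
    save-if: ${{ github.ref == 'refs/heads/main' }}
    cache-provider: "buildjet"
    # Assumption: skip the churn-heavy target/ dir, cache only .cargo
    cache-targets: "false"
```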

@osiewicz
Contributor

@notpeter if we had a Windows machine to run the builds on, I would be down to look at rustc profiles with you. My gut instinct is that rustc might not be too great about filesystem access (I had one such PR in rust-lang/rust#134866, which was impactful even on macOS).
Dev builds do quite a bit of reading from the filesystem due to the incremental build cache. Maybe it'd be worthwhile to test with incremental compilation disabled to see if that's a good lead?

@notpeter
Member Author

@osiewicz Doesn't this env mean we aren't using incremental anywhere in ci.yml?

env:
CARGO_TERM_COLOR: always
CARGO_INCREMENTAL: 0
RUST_BACKTRACE: 1

@osiewicz
Contributor

Oh, yeah, that should do it.

@notpeter notpeter closed this Mar 12, 2025
Labels: cla-signed, windows
4 participants