Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: switch 7 linux jobs to free runners #132409

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

MarcoIeni
Copy link
Member

@MarcoIeni MarcoIeni commented Oct 31, 2024

ubuntu-20.04-4core-16gb is a large runner and we only use it because of it's larger disk space.
Let's see if we can remove it by cleaning more disk space.

Additionally, at the moment every ubuntu image runs the jlumbroso/free-disk-space action. This is a waste of time for large runners that don't need to clean space. This PR changes this behavior to only run this action on free runners which don't have enough space to run the CI.

  • test that actions without the free_disk flag work
Jobs to test Jobs that have the 4 core large runner on `master` (to test to check if we can move them to free runners):
  • dist-aarch64-linux
  • dist-loongarch64-linux
  • dist-loongarch64-musl
  • dist-powerpc-linux
  • dist-powerpc64-linux
  • dist-powerpc64le-linux
  • dist-s390x-linux
  • x86_64-gnu-debug

Jobs that have the free runner on master (to test to check if cleaning large packages takes too much time):

  • arm-android
  • armhf-gnu
  • dist-android
  • dist-armhf-linux
  • dist-armv7-linux
  • dist-i586-gnu-i586-i686-musl
  • dist-i686-linux
  • dist-ohos
  • dist-riscv64-linux
  • dist-various-1
  • dist-various-2
  • dist-x86_64-freebsd
  • dist-x86_64-illumos
  • dist-x86_64-netbsd
  • mingw-check
  • test-various
  • x86_64-rust-for-linux
  • x86_64-gnu
  • x86_64-gnu-stable
  • x86_64-gnu-aux
  • x86_64-gnu-nopt
  • x86_64-gnu-tools

try-job: x86_64-gnu-aux
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-tools

@rustbot
Copy link
Collaborator

rustbot commented Oct 31, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Oct 31, 2024
@MarcoIeni
Copy link
Member Author

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 31, 2024
…r=<try>

CI: remove linux 4 core large runner

try-job: mingw-check
try-job: dist-loongarch64-musl
@bors
Copy link
Contributor

bors commented Oct 31, 2024

⌛ Trying commit 38dbcf6 with merge a28d902...

@MarcoIeni
Copy link
Member Author

The two jobs correctly free up disk space based on the boolean

image

image

since the difference between the two is only 1 minute and 20 sec I wonder if we can simplify this PR by only having one free_disk boolean and running the action with the default.

@bors
Copy link
Contributor

bors commented Oct 31, 2024

☀️ Try build successful - checks-actions
Build commit: a28d902 (a28d902d64142a74635c6bde1bbfe99ba7da4b66)

@MarcoIeni
Copy link
Member Author

@Kobzol the build succeded so probably we can remove some large runners 👍

What do you think about my previous message:

since the difference between the two is only 1 minute and 20 sec I wonder if we can simplify this PR by only having one free_disk boolean and running the action with the default.

Should we keep the distinction between free_some_disk and free_disk (like this PR), or should I edit this PR to only keep free_disk and always free as much space as possible?

I lean towards the second option to simplify the CI, but I'm curious about your opinion 👍

@Kobzol
Copy link
Contributor

Kobzol commented Nov 1, 2024

Additionally, at the moment every ubuntu image runs the jlumbroso/free-disk-space action. This is a waste of time for large runners that don't need to clean space. This PR changes this behavior to only run this action on free runners which don't have enough space to run the CI.

I took a look at the disk cleaning timing and it is sometimes weirdly slow. On auto - dist-armv7-linux, it takes 20s, but on auto - dist-armhf-linux it takes more than 3 minutes?! What the heck. I think that with large-packages: true, the difference between cleaning and not cleaning will be thus much larger than in the case you have shown here, for some runners.

@MarcoIeni
Copy link
Member Author

for some runners.

we are only doing this for the free ubuntu runner, right? 🤔
This PR removes the "clean disk" step from the large runner to save time as large runners have already enough space 👍

Anyway next week I can test the try job for all the jobs involved just to make sure that large-packages: true doesn't take too long 👍

@Kobzol
Copy link
Contributor

Kobzol commented Nov 2, 2024

we are only doing this for the free ubuntu runner, right? 🤔
This PR removes the "clean disk" step from the large runner to save time as large runners have already enough space 👍

Currently, we free disk for all Ubuntu runners. I was expecting that this PR would only do that for the free runners, but it looks like it currently also cleans the disk for the large runner. I suppose that you somehow combined the disk space removal + some experiment to see if we can switch from large to free runners on some jobs? (My previous experiments did not hint at this, the jobs that switched to the largedisk runner were quite close to the threshold, and I don't think that large_packages: true will help so much).

@MarcoIeni
Copy link
Member Author

MarcoIeni commented Nov 4, 2024

I was expecting that this PR would only do that for the free runners, but it looks like it currently also cleans the disk for the large runner.

My understanding is that this PR only clears the disk of free runners (as I switched job-linux-4c-largedisk to free runners).
That's because I changed the condition from contains(matrix.os, 'ubuntu') to if: contains(matrix.free_*disk, true). Only the free runners have free_some_disk or free_disk set to true.

I suppose that you somehow combined the disk space removal + some experiment to see if we can switch from large to free runners on some jobs?

Yes, I switched some large runners to free runners and cleared more space.
So this PR is an experiment, but we could also merge it as it is (given that bors try works on all jobs of course).
For example, the try build indicates that by cleaning more space, dist-loongarch64-musl works with a free runner 👍

However, before running bors try and marking this PR as ready, I wanted to ask you if you think the complexity of having both free_disk and free_some_disk is worth it.
Imo we could just have free_disk, so basically turn job-linux-4c-largedisk into job-linux-4c and removing job-linux-4c-largedisk.
In this way, we would run large-packages: true for all ubuntu free runners 👍

I hope I'm explaining myself. If this seems difficult to understand I'm available to do a quick sync chat. 👍

@Kobzol
Copy link
Contributor

Kobzol commented Nov 4, 2024

as I switched job-linux-4c-largedisk to free runners

Ah-ha, that's the critical piece of context that I missed, now it makes sense. Yeah, I'm pretty sure that this won't work, clearing the large packages isn't enough to make all the jobs that require more disk work (or it would be borderline on the edge).

@MarcoIeni
Copy link
Member Author

dist-loongarch64-musl worked, this is its disk usage:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        73G   68G  5.2G  93% /
devtmpfs        7.9G     0  7.9G   0% /dev
tmpfs           7.9G  4.0K  7.9G   1% /dev/shm
tmpfs           1.6G  1.2M  1.6G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/loop0       64M   64M     0 100% /snap/core20/2379
/dev/loop1       92M   92M     0 100% /snap/lxd/29619
/dev/sda15      105M  6.1M   99M   6% /boot/efi
/dev/loop2       39M   39M     0 100% /snap/snapd/21759
/dev/sdb1        74G   [28](https://github.com/rust-lang-ci/rust/actions/runs/11615668881/job/32346809282#step:30:29)K   70G   1% /mnt
tmpfs           1.6G     0  1.6G   0% /run/user/1001

Do you think we could move it to free runners?
If not, what's the accepted threshold?

@Kobzol
Copy link
Contributor

Kobzol commented Nov 4, 2024

Right, so to clarify, some of them might work, but I don't expect that all of them will work, so I think that we still need to keep the 4-core large runners around. But it doesn't hurt to try, of course, I might be wrong. It's true that I probably did not attempt to run all the jobs with large_packages: true.

5 GiB left after CI runs seems fine, so we could move this one.

We should evaluate the time needed to run the disk cleanup action though. Right now, without cleaning the large packages, it seems to take from ~20s to ~4 minutes, depending on the job (which is a bit weird, but it is what it is). We should evaluate the longest time this takes with large packages. In a way, we don't need to care, because time on the free runners is... free :) But we should be aware of it, to make sure that it's not increasing the bottleneck of our CI's longest jobs.

Also, your boolean is a good idea for another reason - we should only do the cleanup on the free runners, not on the large (8core+ ones). Especially since it seems that my initial assumption that the cleanup only takes a few seconds (with large_packages: false) does not always hold.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
@MarcoIeni
Copy link
Member Author

@bors try

@bors
Copy link
Contributor

bors commented Nov 4, 2024

⌛ Trying commit b9a27fc with merge e55be56...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 4, 2024
…r=<try>

CI: remove linux 4 core large runner

try-job: mingw-check
@MarcoIeni
Copy link
Member Author

Ok, then I will:

  • run large-packages: true for all ubuntu free runners and see how it impacts times
  • check which 4 cores linux large runners we can replace with free runners

After that, I'll mark this PR as ready for review 👍

@bors
Copy link
Contributor

bors commented Nov 4, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 4, 2024
@MarcoIeni
Copy link
Member Author

I canceled the job manually to avoid wasting compute, but the proposed syntax worked
image

I just need to verify that it keeps working for jobs that don't have that field 👍

@Kobzol
Copy link
Contributor

Kobzol commented Nov 4, 2024

Oh, I think I remember now why I removed large_packages: true. It was actually breaking some of the CI jobs, because it removed packages that were used by them 😅 This is something that we might run into.

@MarcoIeni
Copy link
Member Author

@bors try

@bors
Copy link
Contributor

bors commented Nov 4, 2024

⌛ Trying commit 69c4c84 with merge d851376...

@bors
Copy link
Contributor

bors commented Nov 4, 2024

⌛ Trying commit 4be804e with merge 5636d90...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 4, 2024
…r=<try>

CI: remove linux 4 core large runner

try-job: x86_64-gnu-aux
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-tools
@mati865
Copy link
Contributor

mati865 commented Nov 4, 2024

Sorry for the offtopic but out of curiosity how many free runners are assigned to Rust?

@Kobzol
Copy link
Contributor

Kobzol commented Nov 4, 2024

r? @Kobzol

@rustbot rustbot assigned Kobzol and unassigned Mark-Simulacrum Nov 4, 2024
@bors
Copy link
Contributor

bors commented Nov 4, 2024

☀️ Try build successful - checks-actions
Build commit: 5636d90 (5636d900d812c08a7fc4bc4278001c61f4e7fbe6)

@MarcoIeni MarcoIeni marked this pull request as ready for review November 5, 2024 07:39
@MarcoIeni
Copy link
Member Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 5, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Nov 5, 2024

Thank you!

@bors r+

@bors
Copy link
Contributor

bors commented Nov 5, 2024

📌 Commit 4be804e has been approved by Kobzol

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 5, 2024
@MarcoIeni
Copy link
Member Author

Sorry for the offtopic but out of curiosity how many free runners are assigned to Rust?

GitHub generously offers us the Enterprise plan, so we have 500 concurrent free runners 🚀

https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#usage-limits

@klensy
Copy link
Contributor

klensy commented Nov 5, 2024

remove linux 4 core large runner

looking at diff, not 4?

@MarcoIeni MarcoIeni changed the title CI: remove linux 4 core large runner CI: switch 4 linux jobs to free runners Nov 5, 2024
@MarcoIeni MarcoIeni changed the title CI: switch 4 linux jobs to free runners CI: switch 7 linux jobs to free runners Nov 5, 2024
@MarcoIeni MarcoIeni force-pushed the ci-remove-linux-4c-large branch from 4be804e to 02b3415 Compare November 5, 2024 15:56
@MarcoIeni
Copy link
Member Author

remove linux 4 core large runner

looking at diff, not 4?

Thanks! I changed the PR title and force pushed for a clean commit history 👍

@klensy
Copy link
Contributor

klensy commented Nov 5, 2024

need r- r+ again, so it won't picked for merge now.

@Kobzol
Copy link
Contributor

Kobzol commented Nov 5, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Nov 5, 2024

📌 Commit 02b3415 has been approved by Kobzol

It is now in the queue for this repository.

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 5, 2024
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#132259 (rustc_codegen_llvm: Add a new 'pc' option to branch-protection)
 - rust-lang#132409 (CI: switch 7 linux jobs to free runners)
 - rust-lang#132498 (Suggest fixing typos and let bindings at the same time)
 - rust-lang#132524 (chore(style): sync submodule exclusion list between tidy and rustfmt)
 - rust-lang#132567 (Properly suggest `E::assoc` when we encounter `E::Variant::assoc`)
 - rust-lang#132571 (add const_eval_select macro to reduce redundancy)
 - rust-lang#132637 (Do not filter empty lint passes & re-do CTFE pass)
 - rust-lang#132642 (Add documentation on `ast::Attribute`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 0078e64 into rust-lang:master Nov 6, 2024
6 checks passed
@rustbot rustbot added this to the 1.84.0 milestone Nov 6, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 6, 2024
Rollup merge of rust-lang#132409 - MarcoIeni:ci-remove-linux-4c-large, r=Kobzol

CI: switch 7 linux jobs to free runners

try-job: x86_64-gnu-aux
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-tools
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants