CI: Build layers independently to speed up GitHub Actions tests #1787

kinkie · 2024-04-18T15:45:33Z

Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests

rousskov

Overall, I am OK with this PR. I am not excited about it (due to its drawbacks), but if others support this change, it should go in. Thank you for suggesting this optimization.

I have adjusted PR title to use a common "CI:" prefix and to be more specific about which GitHub tests we are optimizing here.

I discuss some pros and cons below and in change requests, but this review does not request any changes.

Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

Current master build-test (all layers) took 40 minutes: https://github.com/squid-cache/squid/actions/runs/8708500767/usage

The longest single-layer test in this PR took 13 minutes: https://github.com/squid-cache/squid/actions/runs/8740887918/usage?pr=1787

This particular comparison shows a 68% improvement (of build-test time) so the quoted claim feels valid (for build-test).

The total test time is going to be dominated by other tests (e.g., Code QL test took about 37 minutes and Jenkins took 106 minutes), but there is value in speeding up build-test results.

This PR starts using more GitHub Actions resources per PR, which may slow down an average PR (or raise red flags with GitHub) when/if we start testing more PRs concurrently, but I doubt that is a valid concern for today, and we can easily remove this CI optimization later if needed.

The increased build-tests configuration complexity is a concern, but the current configuration tolerates that increase well. And, again, we can easily remove this CI optimization later if needed.

The increased PR checks noise (on the PR page that humans read) and increased GitHub notifications load (primarily on bots) are the biggest drawbacks I can think of, but they are not showstoppers IMO.

.github/workflows/default.yaml

kinkie · 2024-04-25T09:22:00Z

The total test time is going to be dominated by other tests (e.g., Code QL test took about 37 minutes and Jenkins took 106 minutes), but there is value in speeding up build-test results.

There is no reason why we can't apply the same concept to jenkins, but there is a key difference:
our jenkins setup is resource-constrained, so while parallelization might help with stragglers, the
total amount of CPU cycles we have is bound so I don't expect total time to execute to change much.

On github we have virtually unlimited executors, so wall clock is dominated by how much we can
parallelize. The situation will improve as we move more and more environment to the github build matrix,
and we can dedicate jenkins to more complex and slow tests

kinkie · 2024-04-29T15:33:57Z

@yadij , what is your opinion on this change?

yadij · 2024-05-03T13:22:14Z

If this works, then great.

kinkie · 2024-05-04T08:43:50Z

If this works, then great.

As an experiment, I have converted the "6-matrix" Jenkins job to this model. Even in a setup where we are constrained by CPU cycles, doing that has cut the time to do a full build on a warm ccache by half, from 11 to 6 hours, I expect mostly as a reduced effect of stragglers on lower-powered workers and better resource utilization.

I have in mind a way to implement a similar benefit to the staging builds, while decoupling from Anubis' configuration to avoid having to chase changes

kinkie · 2024-05-10T14:52:43Z

Clearing for merge in absence of explicit approvals, anubis will take care of slow-burner timeout.
Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

rousskov · 2024-05-10T16:35:28Z

Clearing for merge in absence of explicit approvals, anubis will take care of slow-burner timeout.

@kinkie, this PR cleared all timeouts more than a week ago. It is currently "waiting for requested reviews", as shown in the PR Approval status check (and as reflected in its labels).

rousskov · 2024-05-10T18:16:04Z

Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment)

kinkie · 2024-05-10T19:16:13Z

Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment)

My plan is to wait for Anubis to try and merge; once it fails (or succeeds too early) adjust the number of successful checks for following PRs

rousskov · 2024-05-10T21:19:19Z

Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment)

My plan is to wait for Anubis to try and merge; once it fails (or succeeds too early) adjust the number of successful checks for following PRs

The above plan cannot work "as is" because Anubis does not try to merge a PR that has not passed required PR checks -- doing so would violate some of the key invariants Anubis is meant to protect. I would not have wasted my time suggesting a longer procedure (that has a chance of working well) if the above plan would work well.

Look for "Some checks haven’t completed yet" phrase on this PR page. That phrase will not change without our action. The corresponding "checks" box/list shows a yellow required status check that GitHub and Anubis are waiting for.

kinkie · 2024-05-10T21:28:00Z

Look for "Some checks haven’t completed yet" phrase on this PR page. That phrase will not change without our action. The corresponding "checks" box/list shows a yellow required status check that GitHub and Anubis are waiting for.

You are right, we are increasing the number of checks, not decreasing.
I will follow the procedure you sugest tomorrow morning.

rousskov · 2024-05-10T22:23:56Z

we are increasing the number of checks, not decreasing

It is not (just) the total number of checks -- that number affects staged PRs. We are (also) changing which checks are required (because we need to remove a build-tests (ubuntu-22.04) check from the list of the required checks). Anubis should not stage a PR that did not pass a required PR check.

Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

kinkie · 2024-05-11T21:05:33Z

Now done. Hopefully I got the right number of checks in Anubis. We will know soon

…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

Github: parallelize build testing by layer

4e8c338

kinkie mentioned this pull request Apr 18, 2024

Add MacOS to Github Actions CI tests #1785

Closed

rousskov changed the title ~~Github: parallelize build testing by layer~~ CI: Build layers independently to speed up GitHub Actions tests Apr 18, 2024

rousskov reviewed Apr 18, 2024

View reviewed changes

.github/workflows/default.yaml Show resolved Hide resolved

.github/workflows/default.yaml Show resolved Hide resolved

kinkie added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Apr 23, 2024

rousskov removed the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Apr 26, 2024

kinkie requested a review from yadij April 29, 2024 15:33

rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Apr 29, 2024

Merge branch 'master' into github-parallel-layers-testing

e308dec

kinkie added the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 4, 2024

rousskov removed the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 4, 2024

kinkie added the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 6, 2024

rousskov removed the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 8, 2024

kinkie added the M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels label May 10, 2024

kinkie removed the request for review from yadij May 10, 2024 17:49

rousskov removed the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label May 10, 2024

rousskov mentioned this pull request May 10, 2024

CI: Add clang to GitHub Actions build-tests matrix #1812

Closed

squid-anubis pushed a commit that referenced this pull request May 11, 2024

CI: Build layers independently to speed up GitHub Actions tests (#1787)

45eabfa

Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label May 11, 2024

squid-anubis closed this May 11, 2024

kinkie added a commit to kinkie/squid that referenced this pull request Jun 9, 2024

CI: Build layers independently to speed up GitHub Actions tests (squi…

2f76f1e

…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

kinkie mentioned this pull request Jun 9, 2024

coverity cid 1461163 #1838

Closed

kinkie added a commit to kinkie/squid that referenced this pull request Jun 22, 2024

CI: Build layers independently to speed up GitHub Actions tests (squi…

f409314

…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

kinkie added a commit to kinkie/squid that referenced this pull request Oct 12, 2024

CI: Build layers independently to speed up GitHub Actions tests (squi…

1ccf90a

…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI: Build layers independently to speed up GitHub Actions tests #1787

CI: Build layers independently to speed up GitHub Actions tests #1787

kinkie commented Apr 18, 2024

rousskov left a comment

kinkie commented Apr 25, 2024

kinkie commented Apr 29, 2024

yadij commented May 3, 2024

kinkie commented May 4, 2024

kinkie commented May 10, 2024

rousskov commented May 10, 2024

rousskov commented May 10, 2024

kinkie commented May 10, 2024

rousskov commented May 10, 2024 •

edited

Loading

kinkie commented May 10, 2024

rousskov commented May 10, 2024

kinkie commented May 11, 2024

CI: Build layers independently to speed up GitHub Actions tests #1787

CI: Build layers independently to speed up GitHub Actions tests #1787

Conversation

kinkie commented Apr 18, 2024

rousskov left a comment

Choose a reason for hiding this comment

kinkie commented Apr 25, 2024

kinkie commented Apr 29, 2024

yadij commented May 3, 2024

kinkie commented May 4, 2024

kinkie commented May 10, 2024

rousskov commented May 10, 2024

rousskov commented May 10, 2024

kinkie commented May 10, 2024

rousskov commented May 10, 2024 • edited Loading

kinkie commented May 10, 2024

rousskov commented May 10, 2024

kinkie commented May 11, 2024

rousskov commented May 10, 2024 •

edited

Loading