Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Build layers independently to speed up GitHub Actions tests #1787

Closed

Conversation

kinkie
Copy link
Contributor

@kinkie kinkie commented Apr 18, 2024

Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests

@rousskov rousskov changed the title Github: parallelize build testing by layer CI: Build layers independently to speed up GitHub Actions tests Apr 18, 2024
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I am OK with this PR. I am not excited about it (due to its drawbacks), but if others support this change, it should go in. Thank you for suggesting this optimization.

I have adjusted PR title to use a common "CI:" prefix and to be more specific about which GitHub tests we are optimizing here.

I discuss some pros and cons below and in change requests, but this review does not request any changes.


Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests

Current master build-test (all layers) took 40 minutes: https://github.com/squid-cache/squid/actions/runs/8708500767/usage

The longest single-layer test in this PR took 13 minutes: https://github.com/squid-cache/squid/actions/runs/8740887918/usage?pr=1787

This particular comparison shows a 68% improvement (of build-test time) so the quoted claim feels valid (for build-test).

The total test time is going to be dominated by other tests (e.g., Code QL test took about 37 minutes and Jenkins took 106 minutes), but there is value in speeding up build-test results.

This PR starts using more GitHub Actions resources per PR, which may slow down an average PR (or raise red flags with GitHub) when/if we start testing more PRs concurrently, but I doubt that is a valid concern for today, and we can easily remove this CI optimization later if needed.

The increased build-tests configuration complexity is a concern, but the current configuration tolerates that increase well. And, again, we can easily remove this CI optimization later if needed.

The increased PR checks noise (on the PR page that humans read) and increased GitHub notifications load (primarily on bots) are the biggest drawbacks I can think of, but they are not showstoppers IMO.

.github/workflows/default.yaml Show resolved Hide resolved
.github/workflows/default.yaml Show resolved Hide resolved
@kinkie kinkie added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Apr 23, 2024
@kinkie
Copy link
Contributor Author

kinkie commented Apr 25, 2024

The total test time is going to be dominated by other tests (e.g., Code QL test took about 37 minutes and Jenkins took 106 minutes), but there is value in speeding up build-test results.

There is no reason why we can't apply the same concept to jenkins, but there is a key difference:
our jenkins setup is resource-constrained, so while parallelization might help with stragglers, the
total amount of CPU cycles we have is bound so I don't expect total time to execute to change much.

On github we have virtually unlimited executors, so wall clock is dominated by how much we can
parallelize. The situation will improve as we move more and more environment to the github build matrix,
and we can dedicate jenkins to more complex and slow tests

@rousskov rousskov removed the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Apr 26, 2024
@kinkie kinkie requested a review from yadij April 29, 2024 15:33
@kinkie
Copy link
Contributor Author

kinkie commented Apr 29, 2024

@yadij , what is your opinion on this change?

@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Apr 29, 2024
@yadij
Copy link
Contributor

yadij commented May 3, 2024

If this works, then great.

@kinkie
Copy link
Contributor Author

kinkie commented May 4, 2024

If this works, then great.

As an experiment, I have converted the "6-matrix" Jenkins job to this model. Even in a setup where we are constrained by CPU cycles, doing that has cut the time to do a full build on a warm ccache by half, from 11 to 6 hours, I expect mostly as a reduced effect of stragglers on lower-powered workers and better resource utilization.

I have in mind a way to implement a similar benefit to the staging builds, while decoupling from Anubis' configuration to avoid having to chase changes

@kinkie kinkie added the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 4, 2024
@rousskov rousskov removed the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 4, 2024
@kinkie kinkie added the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 6, 2024
@rousskov rousskov removed the S-could-use-an-approval An approval may speed this PR merger (but is not required) label May 8, 2024
@kinkie
Copy link
Contributor Author

kinkie commented May 10, 2024

Clearing for merge in absence of explicit approvals, anubis will take care of slow-burner timeout.
Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

@kinkie kinkie added the M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels label May 10, 2024
@rousskov
Copy link
Contributor

Clearing for merge in absence of explicit approvals, anubis will take care of slow-burner timeout.

@kinkie, this PR cleared all timeouts more than a week ago. It is currently "waiting for requested reviews", as shown in the PR Approval status check (and as reflected in its labels).

@kinkie kinkie removed the request for review from yadij May 10, 2024 17:49
@rousskov rousskov removed the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label May 10, 2024
@rousskov
Copy link
Contributor

Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment)

@kinkie
Copy link
Contributor Author

kinkie commented May 10, 2024

Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment)

My plan is to wait for Anubis to try and merge; once it fails (or succeeds too early) adjust the number of successful checks for following PRs

@rousskov
Copy link
Contributor

rousskov commented May 10, 2024

Landing will fail as Anubis has the wrong number of successful jobs for landing. I'll fix that when the time comes

That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment)

My plan is to wait for Anubis to try and merge; once it fails (or succeeds too early) adjust the number of successful checks for following PRs

The above plan cannot work "as is" because Anubis does not try to merge a PR that has not passed required PR checks -- doing so would violate some of the key invariants Anubis is meant to protect. I would not have wasted my time suggesting a longer procedure (that has a chance of working well) if the above plan would work well.

Look for "Some checks haven’t completed yet" phrase on this PR page. That phrase will not change without our action. The corresponding "checks" box/list shows a yellow required status check that GitHub and Anubis are waiting for.

@kinkie
Copy link
Contributor Author

kinkie commented May 10, 2024

Look for "Some checks haven’t completed yet" phrase on this PR page. That phrase will not change without our action. The corresponding "checks" box/list shows a yellow required status check that GitHub and Anubis are waiting for.

You are right, we are increasing the number of checks, not decreasing.
I will follow the procedure you sugest tomorrow morning.

@rousskov
Copy link
Contributor

we are increasing the number of checks, not decreasing

It is not (just) the total number of checks -- that number affects staged PRs. We are (also) changing which checks are required (because we need to remove a build-tests (ubuntu-22.04) check from the list of the required checks). Anubis should not stage a PR that did not pass a required PR check.

squid-anubis pushed a commit that referenced this pull request May 11, 2024
Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label May 11, 2024
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels May 11, 2024
@kinkie
Copy link
Contributor Author

kinkie commented May 11, 2024

Now done. Hopefully I got the right number of checks in Anubis. We will know soon

kinkie added a commit to kinkie/squid that referenced this pull request Jun 9, 2024
…d-cache#1787)

Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests
@kinkie kinkie mentioned this pull request Jun 9, 2024
kinkie added a commit to kinkie/squid that referenced this pull request Jun 22, 2024
…d-cache#1787)

Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests
kinkie added a commit to kinkie/squid that referenced this pull request Oct 12, 2024
…d-cache#1787)

Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants