-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: Build layers independently to speed up GitHub Actions tests #1787
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I am OK with this PR. I am not excited about it (due to its drawbacks), but if others support this change, it should go in. Thank you for suggesting this optimization.
I have adjusted PR title to use a common "CI:" prefix and to be more specific about which GitHub tests we are optimizing here.
I discuss some pros and cons below and in change requests, but this review does not request any changes.
Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests
Current master build-test (all layers) took 40 minutes: https://github.com/squid-cache/squid/actions/runs/8708500767/usage
The longest single-layer test in this PR took 13 minutes: https://github.com/squid-cache/squid/actions/runs/8740887918/usage?pr=1787
This particular comparison shows a 68% improvement (of build-test time) so the quoted claim feels valid (for build-test).
The total test time is going to be dominated by other tests (e.g., Code QL test took about 37 minutes and Jenkins took 106 minutes), but there is value in speeding up build-test results.
This PR starts using more GitHub Actions resources per PR, which may slow down an average PR (or raise red flags with GitHub) when/if we start testing more PRs concurrently, but I doubt that is a valid concern for today, and we can easily remove this CI optimization later if needed.
The increased build-tests configuration complexity is a concern, but the current configuration tolerates that increase well. And, again, we can easily remove this CI optimization later if needed.
The increased PR checks noise (on the PR page that humans read) and increased GitHub notifications load (primarily on bots) are the biggest drawbacks I can think of, but they are not showstoppers IMO.
There is no reason why we can't apply the same concept to jenkins, but there is a key difference: On github we have virtually unlimited executors, so wall clock is dominated by how much we can |
@yadij , what is your opinion on this change? |
If this works, then great. |
As an experiment, I have converted the "6-matrix" Jenkins job to this model. Even in a setup where we are constrained by CPU cycles, doing that has cut the time to do a full build on a warm ccache by half, from 11 to 6 hours, I expect mostly as a reduced effect of stragglers on lower-powered workers and better resource utilization. I have in mind a way to implement a similar benefit to the staging builds, while decoupling from Anubis' configuration to avoid having to chase changes |
Clearing for merge in absence of explicit approvals, anubis will take care of slow-burner timeout. |
@kinkie, this PR cleared all timeouts more than a week ago. It is currently "waiting for requested reviews", as shown in the PR Approval status check (and as reflected in its labels). |
That time is here AFAICT. Please see the following comment for some related suggestions: #1785 (comment) |
My plan is to wait for Anubis to try and merge; once it fails (or succeeds too early) adjust the number of successful checks for following PRs |
The above plan cannot work "as is" because Anubis does not try to merge a PR that has not passed required PR checks -- doing so would violate some of the key invariants Anubis is meant to protect. I would not have wasted my time suggesting a longer procedure (that has a chance of working well) if the above plan would work well. Look for "Some checks haven’t completed yet" phrase on this PR page. That phrase will not change without our action. The corresponding "checks" box/list shows a yellow required status check that GitHub and Anubis are waiting for. |
You are right, we are increasing the number of checks, not decreasing. |
It is not (just) the total number of checks -- that number affects staged PRs. We are (also) changing which checks are required (because we need to remove a |
Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests
Now done. Hopefully I got the right number of checks in Anubis. We will know soon |
…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests
…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests
…d-cache#1787) Adding a second matrix test dimension can decrease testing wall time by about 75% by parallelizing tests
Adding a second matrix test dimension can decrease
testing wall time by about 75% by parallelizing tests