Infrastructure for Azure pipelines underdimensioned #33980

Open
frivoal opened this issue May 7, 2022 · 7 comments

@frivoal
Contributor

frivoal commented May 7, 2022

I don't know precisely how this is provisioned, but it seems like the infrastructure that runs "Azure Pipelines" in the continuous integration tests is underdimensioned. Once a job runs, it's pretty fast, but it can stay queued for extended periods of time.

For instance, #33940 was blocked for about 2 hours waiting for the Azure Pipelines to be run. In the grand scheme of things, 2h may not be that much, but it's very different from 10 minutes, and changes a task that you can do in one sitting into something you have to handle in multiple work sessions, which is unwelcome overhead. If possible, it'd be nice to reduce that delay.

Thanks!

@frivoal frivoal added the infra label May 7, 2022
foolip added a commit to foolip/wpt that referenced this issue May 10, 2022
We currently trigger 5*8=40 jobs daily, and 3*8=24 of those trigger
every 3 hours, while we only have 20 parallel jobs.

We don't show the Edge Canary results on wpt.fyi by default, so reduce
them to once a week to reduce load.

Helps with web-platform-tests#33980.
@foolip
Member

foolip commented May 10, 2022

We have a maximum of 20 parallel jobs on Azure Pipelines, but after #33755 + #33861 we can trigger up to 40 jobs at the same time, each of which is expected to take ~2 hours.

I've sent #34015 so that only 16 jobs get triggered every 3 hours, but we will still have a backlog every day, and the same 40 jobs as currently once a week, leading to delays.
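
For a rough sense of scale, here is a minimal sketch of the queueing arithmetic behind these numbers. It assumes (this is not confirmed anywhere in this thread) that Azure Pipelines starts queued jobs in FIFO order and that every job takes exactly the ~2 hours estimated above. The function names are hypothetical and only serve the illustration.

```python
import math


def drain_hours(triggered_jobs: int, parallel_slots: int, job_hours: float) -> float:
    """Hours until a batch of simultaneously triggered jobs has finished.

    Assumption: all jobs take the same time and run in full "waves"
    of at most `parallel_slots` jobs.
    """
    waves = math.ceil(triggered_jobs / parallel_slots)
    return waves * job_hours


def wait_hours_behind_batch(triggered_jobs: int, parallel_slots: int, job_hours: float) -> float:
    """Queue time for one extra job (e.g. a PR check) submitted right after the batch.

    Assumption: strict FIFO scheduling, so the extra job sits at
    zero-based position `triggered_jobs` in the queue and starts
    with the wave that position falls into.
    """
    waves_ahead = triggered_jobs // parallel_slots
    return waves_ahead * job_hours


if __name__ == "__main__":
    SLOTS = 20       # parallel job limit mentioned in this thread
    JOB_HOURS = 2.0  # approximate per-job duration mentioned in this thread
    for jobs in (16, 24, 40):
        print(
            f"{jobs} jobs triggered: batch drains in "
            f"{drain_hours(jobs, SLOTS, JOB_HOURS)} h, a PR job queued behind it "
            f"waits {wait_hours_behind_batch(jobs, SLOTS, JOB_HOURS)} h"
        )
```

Under these assumptions, a batch of 16 jobs leaves free slots and a PR job starts immediately, while batches of 24 or 40 jobs push a PR job back by one or two full waves. Real queue times will differ, since job durations vary and the scheduler's actual ordering is not known here.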

@mustjab do you think there's anything we could do about the quota? Or other ways to solve this?

@mustjab
Contributor

mustjab commented May 10, 2022

I think we can stop Edge Dev runs and just do Canary runs for now. Also, for the weekly run, can we schedule it to run on the weekend when we have fewer runs?

@foolip Do you remember who you worked with to increase the parallel job limit before? I can also try to reach out to them and see if we can increase that a bit more.

@foolip
Member

foolip commented May 10, 2022

@mustjab I don't know for certain, but I think it was @thejohnjansen who asked someone on the Azure Pipelines team to increase the limit internally. The mechanism for doing that wasn't visible to me; I could only see the increased parallelism take effect.

Regarding Edge Canary, note that because of web-platform-tests/wpt.fyi#1635 we don't show those runs on wpt.fyi. However, with that issue fixed we could start using the Edge Canary runs instead. In any event, I think we should run either Edge Canary or Edge Dev, not both.

@mustjab
Contributor

mustjab commented May 10, 2022

Let's stop Edge Canary runs and keep only Edge Dev channel runs. We can switch these runs to daily instead of every 3 hours to help reduce the load. Does that work?

@foolip foolip changed the title from "Infrastructure for Azure pipelines underdimetioned?" to "Infrastructure for Azure pipelines underdimensioned?" May 10, 2022
@foolip foolip changed the title from "Infrastructure for Azure pipelines underdimensioned?" to "Infrastructure for Azure pipelines underdimensioned" May 10, 2022
@foolip
Member

foolip commented May 10, 2022

@mustjab #34015 was merged which will run Chrome Canary only once a day. We could remove it entirely, if you like. However, running Edge Dev less frequently wouldn't be great because it would mean that the wpt.fyi front page gets new aligned runs less often. And it would take longer to recover from any infra issue on any browser.

Also note that peak usage does not decrease at all unless we find a mechanism to spread out runs over time, since currently epochs/three_hourly and epochs/daily are both updated at the same time once a day. Getting backlogged once a day is better than it happening every 3 hours of course, but it would still affect wpt contributors.

@mustjab
Contributor

mustjab commented May 10, 2022

Thanks for merging that. Let's see if that helps with the load. If we still see issues, we can stop these runs until we figure out a way to increase the limit.

For Edge Dev channel, is there a different cadence that we can do other than every 3 hours? Maybe every 6 hours, to reduce the load? That should still keep the wpt.fyi front page results fresh enough.

@TalbotG
Contributor

TalbotG commented May 11, 2023

> I don't know precisely how this is provisioned, but it seems like the infrastructure that runs "Azure Pipelines" in the continuous integration tests is underdimensioned. Once a job runs, it's pretty fast, but it can stay queued for extended periods of time.

I fully agree with you. This happened to me on May 10th, 2023 (see #39947) and presumably also in #34926.
