
Add utility slave #13383

Merged
marcoabreu merged 1 commit into apache:master on Nov 23, 2018

Conversation

@marcoabreu (Contributor) commented Nov 23, 2018

This PR adds utility slaves.

The idea of utility slaves is to offload miscellaneous lifecycle tasks that are not actually part of the job itself but are needed to keep the pipeline alive. We should not process these tasks on the same slaves as the actual workload, since doing so has multiple implications:

  1. mxnetlinux-cpu slaves are very powerful and only have 3 executors. We don't need 72 CPU cores to load a Groovy script.
  2. Lifecycle tasks like publishing the GitHub status to PRs share the same queue as the regular workload. During high load, this might result in a PR not being notified that a job has been queued (or finished) until an mxnetlinux-cpu worker is free. This might confuse the user.
  3. Since a lifecycle task might only be executed when it reaches the top of the queue, the follow-up jobs are only added to the queue after the lifecycle job has been processed. Only at this point are the real jobs enqueued. This means the workload has to wait for its turn in the queue twice. Especially since our auto scaling is only triggered every 5 minutes, the real workload could miss multiple scaling cycles because the lifecycle job hasn't passed yet and thus the "real" jobs haven't been scheduled.

Basically, what this replaces is the concept of a high-priority queue. Since Jenkins doesn't offer such a system (and we don't want to use expensive slaves for that either), we can instead create these slaves with a lot of executors and a special label.

These slaves are NOT intended to run ANY custom workload but only to execute Jenkins operations that require a node context. Usually, such tasks would run on the Jenkins master, but I disabled that for security reasons.
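For illustration, a minimal scripted-pipeline sketch of this routing, assuming these slaves carry a label named `utility`; the label name, file path, and shell command are placeholders for illustration, not taken from this PR's diff:

```groovy
// Lightweight lifecycle step: merely needs a node context, not 72 cores.
node('utility') {
    stage('Lifecycle') {
        checkout scm
        // e.g. load a shared Groovy script or publish a GitHub status
        def utils = load 'ci/Jenkinsfile_utils.groovy'  // hypothetical path
    }
}

// The real workload still runs on the regular, powerful slaves.
node('mxnetlinux-cpu') {
    stage('Build') {
        sh './build.sh'  // placeholder for the actual build command
    }
}
```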

The slave would be a c5.large with 2 vCPUs.

@marcoabreu requested a review from szha as a code owner on November 23, 2018 13:56
@KellenSunderland (Contributor) left a comment

LGTM, but I'd recommend we hold off on large changes in the CI for a few days to ensure things are stable for the release process. What do you think?

@marcoabreu (Contributor, Author)

This patch exists precisely to improve stability. We have to merge these changes before the release, or we won't be able to support a patch release.

@KellenSunderland (Contributor)

The intent of all software patching is to improve things in one dimension or another. Large changes at a critical time rarely achieve solely their intent, as no one writes perfect code without testing (at least I don't). Did we do any shadow testing of this change in the dev account? I'm ok with approving this as long as you're highly confident that it won't cause issues.

@KellenSunderland (Contributor) left a comment

Code looks ok to me.

@marcoabreu (Contributor, Author)

Yes, this PR (and the backend changes) has been tested on the dev account :)

By "patch" I'm not talking about 1.3.1 or 1.4 but 1.4.1. We would have no CI coverage if we don't apply the pipeline changes before the branch is cut :/

@marcoabreu (Contributor, Author)

Thanks for your review!

@marcoabreu merged commit aee0953 into apache:master on Nov 23, 2018