Skip to content

Sort trigger-arch build queue to push previously failed jobs to the end #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 16, 2024

Conversation

yosifkit
Copy link
Member

@yosifkit yosifkit commented Feb 16, 2024

This will give unbuilt items a chance to build before others get a retry.

(did a few successful run tests via Jenkins trigger-arm32v6 job)

@yosifkit yosifkit requested a review from tianon as a code owner February 16, 2024 19:50
Copy link

@whalelines whalelines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few questions

jq '.' <<<"$newFailedJobsJson" | tee pastFailedJobs.json
'''
archiveArtifacts(
artifacts: '*.json',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all .json files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 we know the directory is definitely 100% empty (deleteDir() above), so I'm not too worried about it either way

archiveArtifacts(
artifacts: '*.json',
fingerprint: true,
onlyIfSuccessful: true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean something other than es.result == 'SUCCESS'? I'm wondering why the files are only archived on success when it seems these changes are focused on failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confusingly refers to the success of this job, not the success of all the jobs this job is triggering (nor all the stages of this job, which each have their own status). This is a nice way to give us a safety against losing our data since we're doing a very poor-man's linked list here and one hiccup means we lose our intermediate data. 😄

(Hence pulling the previous version of this file from the lastSuccessful instead of just the most recent job.)

sh '''#!/usr/bin/env bash
set -Eeuo pipefail -x

jq '.' <<<"$newFailedJobsJson" | tee pastFailedJobs.json

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the string pastFailedJobs.json be abstracted into a string since it is used several places?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that could help cut down on typos, but it does get a little annoying to inject properly (we'd probably need to do something like env.PAST_FAILED_JOBS_FILENAME = ... or something so that it can be accessed from all the shell invocations without dealing with quoting hell).

Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few really minor nits (none of which actually change the functionality so shouldn't need rigorous re-testing), but overall really solid and seems like it'll work just fine 👍

@@ -13,6 +13,10 @@ env.BASHBREW_ARCH = env.JOB_NAME.split('/')[-1].minus('trigger-') // "windows-am
def queue = []
def breakEarly = false // thanks Jenkins...

// this Json includes the number of attempts per failing buildId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// this Json includes the number of attempts per failing buildId
// this JSON includes the number of attempts per failing buildId

or even

Suggested change
// this Json includes the number of attempts per failing buildId
// this includes the number of attempts per failing buildId

Comment on lines 60 to 61
(env.pastFailedJobsJson | fromjson) as $pastFailedJobs |
[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(env.pastFailedJobsJson | fromjson) as $pastFailedJobs |
[
(env.pastFailedJobsJson | fromjson) as $pastFailedJobs
| [

]
' builds.json
''').trim()
] | sort_by($pastFailedJobs[.buildId].count // 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure:

$ jq -n '{} | .foo.bar // "baz"'
"baz"

❤️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but maybe we should include some bit of my old TODO comment here, to remind us what this is doing / for?

Suggested change
] | sort_by($pastFailedJobs[.buildId].count // 0)
]
# this Jenkins job exports a JSON file that includes the number of attempts so far per failing buildId so that this can sort by attempts which means failing builds always live at the bottom of the queue (sorted by the number of times they have failed, so the most failing is always last)
| sort_by($pastFailedJobs[.buildId].count // 0)

count: c,
identifier: identifier,
url: res.absoluteUrl,
endtime: (res.startTimeInMillis + res.duration) / 1000.0, // convert to seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
endtime: (res.startTimeInMillis + res.duration) / 1000.0, // convert to seconds
endTime: (res.startTimeInMillis + res.duration) / 1000.0, // convert to seconds

if (pastFailedJobs[buildObj.buildId]) {
c += pastFailedJobs[buildObj.buildId].count
}
// TODO maybe implement some amount of backoff? keep first url/endtime?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO maybe implement some amount of backoff? keep first url/endtime?
// TODO maybe implement some amount of backoff? keep first url/endTime?

sh '''#!/usr/bin/env bash
set -Eeuo pipefail -x

jq '.' <<<"$newFailedJobsJson" | tee pastFailedJobs.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jq '.' <<<"$newFailedJobsJson" | tee pastFailedJobs.json
jq <<<"$newFailedJobsJson" '.' | tee pastFailedJobs.json

echo(res.result)
if (res.result != 'SUCCESS') {
def c = 1
if (pastFailedJobs[buildObj.buildId]) {
c += pastFailedJobs[buildObj.buildId].count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about in the future / next round: if count isn't set for some reason (but the object exists), this will be null and we'll get an execption from java.lang.Integer#plus (shouldn't matter for this because we're the only ones creating this file and it should always be well-formed, but this will be a good reminder for if/when something goes wrong that this line might need to be more defensive against the inputs).

@tianon
Copy link
Member

tianon commented Feb 16, 2024

CI failure is #25, fixed in #26, and this file isn't tested by CI anyhow 👍

@tianon tianon merged commit 88a9541 into docker-library:main Feb 16, 2024
@tianon tianon deleted the lower-retry branch February 16, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants