chore(e2e): naively parallelize CI jobs by chunking alphabetically #2520
Conversation
Force-pushed from 3d7f26d to b6c047a.
This was tested on my private fork, see https://github.com/estroz/operator-lifecycle-manager/actions/runs/1565786458
Force-pushed from b6c047a to 5175e23.
This is really, really cool work, man (albeit a bit hacky) - this will infinitely improve the current state of long, frustrating e2e feedback. I just had a couple of questions, and I'd like to land anything that substantially improves the e2e feedback, but we'll need to update the branch protection for our tide configuration (or remove tide entirely, which I think is my preference).
@timflannagan the main things I want to add before this merges are tests for the script, and some deduplication by prefix (if both …)
Force-pushed from 64f79c0 to e77482c.
I think the overall changes are reasonable to me looking back at this, but my brain is pretty much mush at this point. I'd like to land these changes ASAP, but we'd also need to land a tide configuration update as well. I should be back to full capacity by tomorrow.
// wordTrie is a trie of word nodes, instead of individual characters.
type wordTrie map[string]*wordTrieNode

type wordTrieNode struct {
	word     string
	children map[string]*wordTrieNode
}
Any particular reason you settled on a trie implementation vs. a more primitive implementation of sorting test specs and chunking arbitrarily?
Primarily to prevent one spec being run by two different chunks. Imagine running two specs defined by the phrases "Foo" and "Foo bar" naively in two chunks: the first chunk (^Foo.+) would run both specs, while the second chunk (^Foo bar.+) would run only the latter spec. By using a word trie, the phrases are reduced down to just the word "Foo", which will run both specs in one chunk. At first the downside appears to be a less even distribution of chunk runtimes, since "Foo" and "Foo bar" will always be run together; however, this downside also affects naive chunking because spec phrase prefixes are still shared.
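To make the dedup concrete, here is a minimal, self-contained Go sketch (assumed, not taken from this PR's script) of how a word trie can collapse overlapping spec phrases into non-overlapping focus prefixes; the insert and collapse helpers are hypothetical names.

package main

import (
	"fmt"
	"sort"
	"strings"
)

// wordTrie is a trie keyed on whole words rather than individual characters.
type wordTrie map[string]*wordTrieNode

type wordTrieNode struct {
	word     string
	children map[string]*wordTrieNode
}

// insert adds a spec phrase to the trie word by word (hypothetical helper).
func (t wordTrie) insert(phrase string) {
	words := strings.Fields(phrase)
	if len(words) == 0 {
		return
	}
	node, ok := t[words[0]]
	if !ok {
		node = &wordTrieNode{word: words[0], children: map[string]*wordTrieNode{}}
		t[words[0]] = node
	}
	for _, w := range words[1:] {
		child, ok := node.children[w]
		if !ok {
			child = &wordTrieNode{word: w, children: map[string]*wordTrieNode{}}
			node.children[w] = child
		}
		node = child
	}
}

// collapse reduces the trie to one focus prefix per top-level word, so
// "Foo" and "Foo bar" both fall under the single prefix "Foo". The real
// script may descend further before splitting; this sketch stops at the
// first word for brevity.
func (t wordTrie) collapse() []string {
	prefixes := make([]string, 0, len(t))
	for word := range t {
		prefixes = append(prefixes, word)
	}
	sort.Strings(prefixes)
	return prefixes
}

func main() {
	trie := wordTrie{}
	for _, phrase := range []string{"Foo", "Foo bar", "Baz qux"} {
		trie.insert(phrase)
	}
	// Prints [Baz Foo]: "Foo" and "Foo bar" end up in the same chunk.
	fmt.Println(trie.collapse())
}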
This is probably a really naive question, but why not just use Ginkgo's parallelization, e.g. ginkgo -procs=N?
-procs gives you core parallelism; the Actions matrix + chunks gives you core and runner parallelism. Runner parallelism is important here because we cannot choose machine size.
It's also important to note that the way our CI environment is set up, we create a kind cluster every time we spawn a new e2e process, so this PR simply builds on top of that existing setup: it chunks test specs and feeds those chunks as focus regex groups that an individual kind cluster will then run. There's likely a cleaner solution going forward, but we'd have to shift around some of our current setup to get it working correctly.
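For illustration only (this is an assumed sketch, not the exact script in this PR), a chunk's collapsed prefixes might be joined into a single focus regex along these lines, with each matrix job pointing the resulting expression at its own kind cluster:

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// focusRegexp joins one chunk's spec prefixes into a single anchored
// focus expression, e.g. ["Foo", "Baz qux"] -> "^(Foo|Baz qux).*".
// Hypothetical helper; the PR's script may build its regexes differently.
func focusRegexp(prefixes []string) string {
	quoted := make([]string, len(prefixes))
	for i, p := range prefixes {
		quoted[i] = regexp.QuoteMeta(p)
	}
	return "^(" + strings.Join(quoted, "|") + ").*"
}

func main() {
	chunk := []string{"Foo", "Baz qux"}
	// A matrix job would then run something along the lines of
	//   ginkgo -focus '^(Foo|Baz qux).*' ./test/e2e/...
	// against its own freshly created kind cluster.
	fmt.Println(focusRegexp(chunk))
}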
Force-pushed from e77482c to 2e300ee.
Force-pushed from 2e300ee to 2d4506c.
I think I'm comfortable enough with the implementation to avoid holding up this PR with any further comments, given it improves QoL quite a bit. I would like to see us open an issue around aggregating the debug/junit/etc. artifacts into a single source, but that can be done asynchronously anyway. /lgtm
Interesting, the extra info-level logging is showing up as errors in the run output:
Force-pushed from 2d4506c to abde1b0.
Force-pushed from 20e7adb to 5121b9e.
Signed-off-by: Eric Stroczynski <ericstroczynski@gmail.com>
Signed-off-by: Eric Stroczynski <ericstroczynski@gmail.com>
Signed-off-by: timflannagan <timflannagan@gmail.com>
Force-pushed from 5121b9e to baf0cad.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: estroz, timflannagan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Description of the change: parallelized e2e tests (for upstream CI only)
Motivation for the change: current sequential runs take ~1h10m. The longest parallel run is ~24 min, and it could be lower if we increase the matrix size, though we will hit Amdahl's law quickly due to per-job test setup. It's a little hacky, but it adjusts automatically when specs are added or removed.
I'll add some tests next week.
Reviewer Checklist
/doc