chore(e2e): naively parallelize CI jobs by chunking alphabetically #2520

estroz · 2021-12-10T23:42:47Z

Description of the change: parallelized e2e tests (for upstream CI only)

Motivation for the change: current sequential runs take ~1h10m. The longest parallel run takes ~24 min, and could be lower if we increase the matrix size, though you will hit Amdahl's law quickly because of test setup. It's a little hacky, but it adjusts automatically when new specs are added or removed.

I'll add some tests next week

Reviewer Checklist

Implementation matches the proposed design, or proposal is updated to match implementation
Sufficient unit test coverage
Sufficient end-to-end test coverage
Docs updated or added to /doc
Commit messages sensible and descriptive

estroz · 2021-12-11T00:51:45Z

This was tested on my private fork, see https://github.com/estroz/operator-lifecycle-manager/actions/runs/1565786458

timflannagan · 2021-12-12T23:18:41Z

This is really, really cool (albeit a bit hacky) work man - this will hugely improve the current state of long, frustrating e2e feedback. I just had a couple of questions, and I'd like to land anything that substantially improves the e2e feedback, but we'll need to update the branch protection for our tide configuration (or remove tide entirely, which I think is my preference).

estroz · 2021-12-14T17:01:19Z

@timflannagan the main things I want to add before this merges are tests for the script, and some deduplication by prefix (if both Subscriptions and Subscripts blah blah exist, the latter will be run twice if they are in different chunks). I'll be working on that this week so this can get in asap.

timflannagan · 2021-12-15T01:13:36Z

I think the overall changes look reasonable to me looking back at this, but my brain is pretty much mush at this point. I'd like to land these changes ASAP, but we'd also need to land a tide configuration update. I should be back to full capacity by tomorrow.

timflannagan · 2021-12-22T17:06:25Z

test/e2e/split/main.go

+// wordTrie is a trie of word nodes, instead of individual characters.
+type wordTrie map[string]*wordTrieNode
+
+type wordTrieNode struct {
+	word     string
+	children map[string]*wordTrieNode
+}


Any particular reason you settled on a trie implementation vs. a more primitive implementation of sorting test specs and chunking arbitrarily?

Primarily to prevent one spec being run by two different chunks. Imagine running two specs defined by phrases "Foo" and "Foo bar" naively in two chunks: the first chunk (^Foo.+) would run both specs, while the second chunk (^Foo bar.+) would run only the latter spec. By using a word trie, the phrases are reduced down to just the word "Foo", which will run both specs in one chunk. At first the downside appears to be a less even distribution of chunk runtimes, since "Foo" and "Foo bar" will always be run together; however this downside also affects naive chunking because spec phrase prefixes are still shared.

This is probably a really naive question, but, why not just use Ginkgo's parallelization: e.g. ginkgo -procs=N?

-procs gives you core parallelism, the actions matrix + chunks gives you core and runner parallelism. Runner parallelism is important here because we cannot choose machine size.

It's also important to note that, the way our CI environment is set up, we create a kind cluster every time we spawn a new e2e process. This PR simply builds on top of that existing setup: it chunks the test specs and feeds each chunk as a focus regex group that an individual kind cluster will then run. There's likely a cleaner solution going forward, but we'd have to shift around some of our current setup to get it working correctly.

timflannagan · 2022-01-05T22:18:35Z

I think I'm comfortable enough with the implementation to avoid holding up this PR with any further comments given it helps improve QoL quite a bit. I would like to see us open an issue around aggregating the debug/junit/etc. artifacts into a single source, but that can be done asynchronously anyways.

/lgtm

timflannagan · 2022-01-05T22:49:22Z

Interesting, the extra info-level logging is showing up as errors in the run output:

Run make e2e-local E2E_TEST_CHUNK=1 E2E_TEST_NUM_CHUNKS=4 E2E_NODES=2 ARTIFACTS_DIR=./artifacts/
Error: 2022/01/05 20:31:52 test/e2e/e2e_test.go: found no top level describes, skipping
Error: 2022/01/05 20:31:52 test/e2e/like_metric_matcher_test.go: found no top level describes, skipping
Error: 2022/01/05 20:31:52 test/e2e/setup_bare_test.go: found no top level describes, skipping
...

Signed-off-by: Eric Stroczynski <ericstroczynski@gmail.com>

Signed-off-by: timflannagan <timflannagan@gmail.com>

timflannagan · 2022-01-18T18:55:15Z

/approve
/lgtm

openshift-ci · 2022-01-18T18:55:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: estroz, timflannagan

Needs approval from an approver in each of these files:

~~OWNERS~~ [timflannagan]

openshift-ci bot requested review from ankitathomas and exdx December 10, 2021 23:42

estroz force-pushed the chore/parallelize-ci-e2e branch from 3d7f26d to b6c047a Compare December 11, 2021 00:48

estroz force-pushed the chore/parallelize-ci-e2e branch from b6c047a to 5175e23 Compare December 11, 2021 01:02

timflannagan reviewed Dec 11, 2021
