Improve performance of CI system

We have crippled our CI system's performance by introducing support for arm64-based images. A key reason is that emulating arm64 on the amd64-based runners GitHub provides is far slower than building natively. On top of that, we now build base-notebook and minimal-notebook for arm64 in sequence alongside the other images.

I'm not fully sure how we should optimize this in the long run. Below is an overview of a highly optimized system, under the assumption that we will have high-performance self-hosted arm64-based GitHub Actions runners that can work in parallel with the amd64 runners. Several parts of it can be done separately.

1. __Nightly builds__
   We have nightly builds tagged `nightly-amd64` and `nightly-arm64`.
2. __amd64 / arm64 in parallel__
   All tests for amd64 and arm64 run in parallel, relying on `nightly-amd64` and `nightly-arm64` caches
3. __Images in parallel where possible__
   All tests for individual images are run in a dedicated job that `needs` its base image job to complete.

   Some images can run in parallel. In the list below, images separated by `|` on the same line can run in parallel with each other:
   - base
   - minimal
   - scipy | r
   - tensorflow | datascience | pyspark
   - all-spark
4. __Avoid rebuilds when merging__
   Tests finish by pushing the built images to a GitHub container registry associated with the PR. Our publishing job on merge to master can then opt to reuse the images as they were built during tests, if they are considered fresh enough.
5. __Parallel manifest creation__
   Merge to default branch triggers manifest creation jobs on both amd64 and arm64. If we opt to not optimize using step 4 then this could also build fresh images using nightly cache first.
6. __Combine manifests into one before pushing to official registry__
   Merge to default branch triggers a job that pulls both the amd64 image and arm64 image and defines a combined docker manifest which is then pushed to our official container registry. I think this could be done with something like `docker manifest create <name of combined image> <amd64 only image> <arm64 only image>` but @manics knows more and I lack experience with this.
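As a rough illustration of step 6, here is a minimal Python sketch that assembles the `docker manifest create` and `docker manifest push` invocations for one image. The function names and the per-architecture tag names in the usage note are hypothetical, and the sketch assumes both single-architecture images have already been pushed to a registry. It defaults to a dry run that only prints the commands.

```python
import subprocess


def manifest_commands(combined, amd64_image, arm64_image):
    """Return the argv lists that create and push a combined manifest.

    All three image references are supplied by the caller; nothing here
    is taken from the actual CI configuration.
    """
    return [
        ["docker", "manifest", "create", combined, amd64_image, arm64_image],
        ["docker", "manifest", "push", combined],
    ]


def publish(combined, amd64_image, arm64_image, dry_run=True):
    """Print the commands (dry run) or execute them with the docker CLI."""
    for cmd in manifest_commands(combined, amd64_image, arm64_image):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if docker exits non-zero.
            subprocess.run(cmd, check=True)
```

For example, `publish("jupyter/base-notebook:latest", "jupyter/base-notebook:amd64-latest", "jupyter/base-notebook:arm64-latest")` would print the two commands without running them. The `amd64-latest` and `arm64-latest` tags are placeholders, not tags the project is known to publish.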
   
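The grouping in step 3 can be derived mechanically from which image each one builds on. The sketch below is only illustrative: the `BASE_OF` mapping is an assumption inferred from the ordering of the list above (for example, that `all-spark` builds on `pyspark`), and `parallel_stages` is a hypothetical helper, not part of the existing CI.

```python
from collections import defaultdict

# Assumed parent of each image, inferred from the list in step 3.
# None marks the root image, which has no parent in this set.
BASE_OF = {
    "base": None,
    "minimal": "base",
    "scipy": "minimal",
    "r": "minimal",
    "tensorflow": "scipy",
    "datascience": "scipy",
    "pyspark": "scipy",
    "all-spark": "pyspark",
}


def parallel_stages(base_of):
    """Group images by their depth in the dependency tree.

    Images at the same depth do not depend on each other, so their jobs
    can run in parallel once the previous stage has finished.
    """
    depth = {}

    def depth_of(image):
        if image not in depth:
            parent = base_of[image]
            depth[image] = 0 if parent is None else depth_of(parent) + 1
        return depth[image]

    stages = defaultdict(list)
    for image in base_of:
        stages[depth_of(image)].append(image)
    return [sorted(stages[d]) for d in sorted(stages)]


if __name__ == "__main__":
    for number, images in enumerate(parallel_stages(BASE_OF)):
        print(number, " | ".join(images))
```

In a GitHub Actions workflow, each stage would roughly correspond to a set of jobs whose `needs` points at the job for their base image.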
### Standalone performance issue

This standalone issue will go away if we adopt better strategies like those above, and I'd say it isn't critical to fix. Currently, though, we rebuild minimal-notebook without using the cache during `push-multi`, assuming `push-multi` for `base-notebook` has already run. I think this is because we re-tag `jupyter/base-notebook:latest`.

Improve performance of CI system #1407
