
[CI] Epic: possible enhancements to current Dockerfile #4021

@apostasie

Description

What is the problem you're trying to solve

When building the Dockerfile, we want to:

  • reduce network traffic
  • reduce cache size
  • speed up builds, both locally and on the CI

Describe the solution you'd like

While previous PRs proposing a full rewrite of the Dockerfile have not been accepted, we could still try to address some of the issues incrementally.

Here is a list of the issues I gathered from my own rewrite so far (at https://github.com/farcloser/lepton/blob/main/Dockerfile).

Feel free to comment on which ones do matter to nerdctl and which should not be considered.

  1. We are retrieving a very large amount of data from GitHub just by cloning. The reasons for that are twofold: we are doing full clones where shallow clones would suffice (eg: https://github.com/farcloser/lepton/blob/main/Dockerfile#L288) - for containerd alone, a full clone is 180MB - and we are cloning once per architecture, while we could clone once for all platforms instead (see the first sketch after this list). ([CI] Dockerfile enhancements: cloning #4022)

  2. When building multi-architecture, many stages are duplicated across architectures.
    This is true for the clone stages as mentioned, but for others as well (the first sketch after this list covers this too).

  3. Retrieving binary releases also generates a large amount of traffic, and makes for inconsistent release tarballs, where different binaries have been compiled with different (outdated and/or vulnerable) versions of Go.
    We could instead just compile everything we need from source, with the same Go version.

  4. CGO binaries are not consistently hardened (bind-now, read-only relocations, PIE). Compiling from source would also enable that (see the hardening sketch after this list).

  5. The rootless SSH shenanigans are probably not necessary (linger + `systemd-run --system --scope su --login` should be enough) - removing the need to install and set up SSH.

  6. Go modules for nerdctl should be vendored once, at clone time, instead of being re-downloaded for every target (see the vendoring sketch after this list).

  7. The Go module cache could live in a cache mount layer, reducing both network traffic and cache size across all projects (also covered by the vendoring sketch after this list).

  8. The full README mentions libseccomp (and its license) being statically linked, but not zlib, libcap, glib, nor libslirp. Maybe we should mention those as well?

  9. The usefulness of xx vs. the Docker Hub dependency / additional third-party tool it brings could be reviewed, especially if we are considering other items in this list (eg: hardening, cache reduction). For our use case, xx does not do much.

  10. COPYing the entire nerdctl local context busts the cache for every stage on every fresh checkout, further forcing repeat operations downstream. More careful / selective copying could significantly increase cacheability (eg: the only case where we need the entire context is when retrieving the commit SHA, which could/should be pushed towards the end of any pipeline). Furthermore, copying is unnecessary in some cases where a bind mount would be enough and would reduce useless cache size (see the bind-mount sketch after this list).

  11. BuildKit systemd service files are now provided in their repo - the current copy / search-and-replace of the containerd unit could be dropped in favor of those.

  12. The Windows CNI install is outdated and not currently parameterizable, as we depend on an old containerd install script. It could be owned by nerdctl instead, with a proper version + revision, and built like the rest.

  13. We are still using two different golang images, seemingly for no reason, and could use just one instead.

  14. We could have a "sanity" stage for the full release that verifies that the binaries run, are built for the right architecture, match their shasums, and (for CGO binaries) have been hardened (see the sanity sketch after this list).

  15. The way we build the test-integration stage is very baroque. If we are to start testing on the host instead of inside the container, we need a simple, sound way to set up the environment for testing.
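
For items 1 and 2, a minimal sketch of what shallow, single-platform cloning could look like - stage names and versions here are hypothetical, not the actual nerdctl ones:

```dockerfile
# Pinning the stage to $BUILDPLATFORM means it runs (and is cached) once,
# no matter how many target architectures are being built.
FROM --platform=$BUILDPLATFORM alpine:3.21 AS containerd-src
ARG CONTAINERD_VERSION=v2.0.2
RUN apk add --no-cache git
# --depth 1 skips history: a full containerd clone is ~180MB, a shallow one is a fraction of that
RUN git clone --quiet --depth 1 --branch "${CONTAINERD_VERSION}" \
      https://github.com/containerd/containerd.git /src/containerd
```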
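
For item 4, one possible way to get PIE + full RELRO out of CGO builds; the linker flags are the standard toolchain ones, everything else (stage names, versions, target binary) is illustrative:

```dockerfile
FROM golang:1.23 AS build-hardened
ENV CGO_ENABLED=1
# Position-independent executable, reproducible paths
ENV GOFLAGS="-buildmode=pie -trimpath"
# Read-only relocations + immediate binding (full RELRO)
ENV CGO_LDFLAGS="-Wl,-z,relro -Wl,-z,now"
COPY --from=containerd-src /src/containerd /src/containerd
WORKDIR /src/containerd
RUN mkdir -p /out && go build -o /out/ctr ./cmd/ctr
```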
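
For items 6 and 7, a sketch of vendoring once at clone time, with the module cache living in a BuildKit cache mount (the mount id is made up) so nothing gets re-downloaded across stages or builds:

```dockerfile
FROM --platform=$BUILDPLATFORM golang:1.23 AS nerdctl-src
WORKDIR /src/nerdctl
COPY . .
# The cache mount persists across builds and can be shared (via its id)
# by every Go project in the Dockerfile; `go mod vendor` then freezes
# the dependencies into the source tree once and for all.
RUN --mount=type=cache,id=gomod,target=/go/pkg/mod \
    go mod vendor
```

Downstream stages get the vendored tree for free: Go builds with `-mod=vendor` automatically whenever a vendor/ directory is present.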
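
For item 10, building from a bind mount instead of COPYing the source into the build stage - the source tree then never lands in that stage's layers:

```dockerfile
FROM golang:1.23 AS build-nerdctl
# Since nerdctl-src is vendored (see above), this build needs no network.
RUN --mount=type=bind,from=nerdctl-src,source=/src/nerdctl,target=/src/nerdctl \
    mkdir -p /out && \
    cd /src/nerdctl && \
    go build -o /out/nerdctl ./cmd/nerdctl
```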
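
For item 14, a rough idea of a sanity stage; the readelf checks only make sense for dynamically linked CGO binaries (adjust to the binaries actually shipped), and the names are again hypothetical:

```dockerfile
FROM debian:bookworm-slim AS sanity
RUN apt-get update && apt-get install -y --no-install-recommends binutils \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build-nerdctl /out/nerdctl /usr/local/bin/
# Does it run at all? In a multi-platform build this executes per platform
# (through emulation on foreign architectures), which also exercises
# "built for the right architecture".
RUN nerdctl --version
# Hardening checks for a CGO binary: PIE (ET_DYN) and immediate binding.
RUN readelf -h /usr/local/bin/nerdctl | grep -q 'DYN' && \
    readelf -d /usr/local/bin/nerdctl | grep -q 'BIND_NOW'
```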

I know we disagree on the items below, so I am keeping them separate here for reference as not-desirable (I will also move more items down from above as discussions go and some items are deemed undesirable):

N1. golang should be retrieved and installed from the Go download servers instead of from Hub (reduction in traffic with Hub, removal of all the version-scheme conversion monkeying, no delay in availability, no unexpected changes under the same image tag) - see the sketch below.
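
For N1, a sketch of what that could look like - the version and digest values are placeholders to pin, not real ones, and the arch is hardcoded for brevity (TARGETARCH could parameterize it):

```dockerfile
FROM debian:bookworm-slim AS golang-base
ARG GO_VERSION=1.23.4
ARG GO_SHA256=put-the-pinned-digest-for-this-tarball-here
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl \
    && curl -fsSL -o /tmp/go.tgz "https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz" \
    # Verify against the digest published on go.dev before unpacking
    && echo "${GO_SHA256}  /tmp/go.tgz" | sha256sum -c - \
    && tar -C /usr/local -xzf /tmp/go.tgz \
    && rm /tmp/go.tgz
ENV PATH=/usr/local/go/bin:$PATH
```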

Finally, I will update the list if new suggestions come up.
