Fix handling of image digests in the minikube cache #10075
Problems seen in #10060 |
"Id" is not available in all container runtimes, and might not even be static within a single runtime. "Digest" is only available in the online registries, and is not preserved when saving and loading. So we might need to revisit the cache code, and add some more metadata (besides just the tarball). The end goal is to be able to support things like ":latest", that is, to compare two images with the same repo:tag. |
The new version also has support for progress bars, so maybe we can finally fix #7012. It is super-annoying now: we don't even show the name, only that tractor emoji. The request for showing the transfer of the individual cache images is here: #7175. Note that there are actually two things involved: the transfer, and the load / uncompress. |
So when we pull an image, we get two things: the Id and the Digest. Both are normally sha256 digests, but of two different things. The file that the Id is computed from is stored in the actual tarball, under Config.
But the Digest is related to the URL it was pulled from.
The same image (id) can have multiple digests, depending on the URL.
$ docker pull ubuntu:20.04
20.04: Pulling from library/ubuntu
Digest: sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c
Status: Image is up to date for ubuntu:20.04
docker.io/library/ubuntu:20.04
$ docker pull localhost:5000/ubuntu:20.04
20.04: Pulling from ubuntu
Digest: sha256:4e4bc990609ed865e07afc8427c30ffdddca5153fd4e82c20d8f0783a291e241
Status: Image is up to date for localhost:5000/ubuntu:20.04
localhost:5000/ubuntu:20.04
$ docker images ubuntu:20.04
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 20.04 f643c72bc252 5 weeks ago 72.9MB
$ docker images localhost:5000/ubuntu:20.04
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost:5000/ubuntu 20.04 f643c72bc252 5 weeks ago 72.9MB
The digest can also be totally missing, for locally created images.
Recreating the
$ docker save ubuntu:20.04 | tar xfO - manifest.json
[{"Config":"f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1.json","RepoTags":["ubuntu:20.04"],"Layers":["96b555d8387efc8adb493c5a3d5c453f44874ccaddf93ac35027f26608d83108/layer.tar","d7bdd7d291799b8991768e87181710917acf79a1639b22968785aee614bb0ed3/layer.tar","db5e5b730d6b10f78842614d63c9ff75096fc5a55a20bd704a5f7205a0a82218/layer.tar"]}]
$ docker save ubuntu:20.04 | tar xfO - f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1.json | sha256sum
f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1 -
This operation can take a while to seek, and the filenames used vary a bit too.
Recreating the
$ docker save busybox > busybox.tar
$ docker rmi busybox:latest
Untagged: busybox:latest
Untagged: busybox@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71
$ docker load < busybox.tar
Loaded image: busybox:latest
$ docker inspect busybox | head
[
{
"Id": "sha256:a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291",
"RepoTags": [
"busybox:latest"
],
"RepoDigests": [],
"Parent": "",
"Comment": "",
"Created": "2020-12-30T01:19:53.606880456Z",
$ docker pull busybox:latest
latest: Pulling from library/busybox
Digest: sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71
Status: Image is up to date for busybox:latest
docker.io/library/busybox:latest
$ docker inspect busybox | head
[
{
"Id": "sha256:a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291",
"RepoTags": [
"busybox:latest"
],
"RepoDigests": [
"busybox@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71"
],
"Parent": "",
So we need to make both pieces of metadata accessible, and quickly in the case of Id. This is so that we can compare the cache contents with the machine storage. Currently we only store the repo and tag, and implicitly at that (in the filename...). As noted elsewhere (#9593), we also need to store the platform (i.e. os and arch). |
Note that it also needs to "resolve" short names into long names: busybox -> busybox:latest -> docker.io/busybox:latest -> docker.io/library/busybox:latest. Some engines use alias lookups, and some only use the long names. See https://www.redhat.com/sysadmin/container-image-short-names |
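The expansion chain above can be sketched in a few lines of Python. This follows the Docker convention only (the first path component counts as a registry if it contains "." or ":" or is "localhost"); engines that use alias lookups would need a lookup table instead. The function name `normalize` and its default arguments are made up for this example.

```python
def normalize(name, default_registry="docker.io", default_tag="latest"):
    """Expand a short image name into a fully qualified one (Docker-style rules)."""
    digest = ""
    if "@" in name:
        name, digest = name.split("@", 1)

    # Decide whether the first path component is a registry host.
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = default_registry, name

    # A tag is a ":" in the last path component (not the port of the registry).
    last = path.rsplit("/", 1)[-1]
    if ":" in last:
        path, tag = path.rsplit(":", 1)
    else:
        # Only default to ":latest" when no digest pins the image already.
        tag = "" if digest else default_tag

    # Official images on Docker Hub live under "library/".
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path

    result = registry + "/" + path
    if tag:
        result += ":" + tag
    if digest:
        result += "@" + digest
    return result


if __name__ == "__main__":
    for short in ("busybox", "ubuntu:20.04", "localhost:5000/ubuntu:20.04"):
        print(short, "->", normalize(short))
```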
This isn't based on the URL. The reason they have different digests is that (presumably) you have run
Note that the top-level manifest list for
but you've pushed the linux/amd64 variant |
That is true, I did that. 🤭 This is also making cross-compiling images a lot harder, especially the ones that depend on parent images... I even had to deploy a local registry, just to make it work. Eventually we forked (copied) the upstream instead. Even worse, but "easier" |
But I'm still not sure I see the benefit of using digest, especially if limited to the old docker tools that mangle it ? i.e.
If it would have been content-based like git (or camlistore), then there would for sure be a lot of value added... Maybe I just need to use |
Seems like this checksum is a fight that needs to be fought every couple of years or so... There was a struggle when it came to validating software downloads, with Zero Install. Maybe we need both of them, to protect against archiver and compression attacks (tarbombs) |
But I dunno. I was ready to go back to * See #7012 |
It is content based, just biased towards the content you deal with during distribution instead of at runtime.
Right, we do have both available to us. The non-compressed bits have content-addressable identifiers that are referenced by the image manifest, so in theory you can get everything you need, it's just a little more cumbersome than a flat key space. I'm not entirely sure how the minikube cache works, but everything is content-addressable, so doing what you want is definitely possible, it's just a matter of which identifiers are available and mapping between them. If you want one blessed identifier, you should use the Image ID because that's available everywhere and remains constant regardless of compression and annotations, but minikube could be clever about mapping between digests and image IDs and take advantage of a registry as a cache.
I don't quite understand what you're saying here.
I think I'm missing something. The distribution API does use HTTP.
I believe so, yes, when writing to a tarball. |
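The "mapping between digests and image IDs" suggested above could look roughly like this in Python. It is only a sketch: the class `ImageIndex` and its methods are hypothetical, and it assumes a single platform per cache, so that a manifest-list digest resolves to exactly one Id. The example values are the ubuntu:20.04 Id and the two digests from the pull transcript earlier in the thread.

```python
from collections import defaultdict


class ImageIndex:
    """Two-way lookup between image Ids and the digests they were pulled by."""

    def __init__(self):
        self._digests_by_id = defaultdict(set)
        self._id_by_digest = {}

    def record(self, image_id, digest):
        """Remember that pulling `digest` produced the image `image_id`."""
        self._digests_by_id[image_id].add(digest)
        self._id_by_digest[digest] = image_id

    def id_for(self, digest):
        """Return the Id recorded for a digest, or None if it is unknown."""
        return self._id_by_digest.get(digest)

    def digests_for(self, image_id):
        """Return a copy of all digests recorded for an Id."""
        return set(self._digests_by_id.get(image_id, ()))

    def same_image(self, digest_a, digest_b):
        """True if both digests are known and resolve to the same Id."""
        id_a = self.id_for(digest_a)
        return id_a is not None and id_a == self.id_for(digest_b)


if __name__ == "__main__":
    ubuntu_id = "sha256:f643c72bc25212974c16f3348b3a898b1ec1eb13ec1539e10a103e6e217eb2f1"
    hub = "sha256:c95a8e48bf88e9849f3e0f723d9f49fa12c5a00cfc6e60d2bc99d87555295e4c"
    local = "sha256:4e4bc990609ed865e07afc8427c30ffdddca5153fd4e82c20d8f0783a291e241"
    index = ImageIndex()
    index.record(ubuntu_id, hub)
    index.record(ubuntu_id, local)
    print(index.same_image(hub, local))
```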
It was regarding the docker push versus crane push conversation. Seems that the digests can be made to work, just stop using docker ?
It was just a rant of crawling back into the cave: FTP the tarball only, and load it locally instead ? Partly triggered by the corporate firewall blocking the access to the corporate registry, that day.
The workaround was using go-getter to download the tarball, and then go-containerregistry to load. We considered using it for other purposes, such as offline usage or when gcr.io is being blocked. |
Something really really weird is going on, with github comments. Should move "discussions" elsewhere. Tried to restore the mangled content, sorry about fatfingering the reply button (and making an edit instead) |
Roughly, but of course that's easier said than done :)
🤦
Another workaround would be to supply a custom transport to go-containerregistry that can tee updates back to something else to display progress while the layer gets downloaded.
This is reasonable. I'll point you towards some work that @deitch did recently for caching things in linuxkit using go-containerregistry that might be similar to your needs. This is pretty close to what I would design from scratch, FWIW: linuxkit/linuxkit#3573 |
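The "tee" workaround for progress display mentioned above is easiest to show without any registry library. The Python sketch below is a language-neutral illustration, not a go-containerregistry transport: it wraps a readable stream and reports the byte count to a callback while the data passes through. `ProgressReader` and `copy_with_progress` are hypothetical names.

```python
import io


class ProgressReader:
    """Wrap a binary stream and report cumulative bytes read to a callback."""

    def __init__(self, raw, total, on_progress):
        self._raw = raw
        self._total = total
        self._seen = 0
        self._on_progress = on_progress

    def read(self, size=-1):
        chunk = self._raw.read(size)
        if chunk:
            # Report only when data actually arrived, not on end of stream.
            self._seen += len(chunk)
            self._on_progress(self._seen, self._total)
        return chunk


def copy_with_progress(src, dst, total, on_progress, chunk_size=64 * 1024):
    """Copy src to dst in chunks, calling on_progress(seen, total) per chunk."""
    reader = ProgressReader(src, total, on_progress)
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


if __name__ == "__main__":
    payload = b"x" * 200_000
    copy_with_progress(
        io.BytesIO(payload),
        io.BytesIO(),
        len(payload),
        lambda seen, total: print(f"{seen}/{total} bytes"),
    )
```

The same wrapper could sit around either of the two steps mentioned earlier in the thread (the transfer, and the load / uncompress), each with its own callback.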
Currently we carry a patch to save the image tags for digests in the tarball, but it's not working OK.
docker.io/library/busybox:latest@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71
https://github.com/google/go-containerregistry/blob/v0.3.0/pkg/name/digest.go#L29
The code still checks the name only (since the preload), and the docker code seems to be broken.
https://github.com/kubernetes/minikube/blob/v1.16.0/pkg/minikube/image/image.go#L51
https://github.com/kubernetes/minikube/blob/v1.16.0/pkg/minikube/image/image.go#L65
For instance, podman doesn't support it - so the current workaround is to just silently drop it.
Error: invalid image reference "docker.io/library/busybox:latest@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71": Docker references with both a tag and digest are currently not supported
We don't really want it anyway, when trying to get away from docker default and towards using CRI.
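To make the tag-plus-digest problem concrete, here is a small Python sketch that splits such a reference and reduces it to a form with only one of the two. It is an illustration, not the patched go-containerregistry code; `split_reference` and `simplify` are hypothetical names, and since the text above does not say which part the workaround drops, the `keep` parameter covers both choices.

```python
def split_reference(ref):
    """Split "name[:tag][@digest]" into (name, tag, digest); missing parts are ""."""
    name, _, digest = ref.partition("@")
    # Look for the tag only in the last path component, so that a registry
    # port such as "localhost:5000" is not mistaken for a tag.
    last = name.rsplit("/", 1)[-1]
    tag = ""
    if ":" in last:
        name, tag = name.rsplit(":", 1)
    return name, tag, digest


def simplify(ref, keep="tag"):
    """Reduce a reference carrying both tag and digest to just one of them."""
    name, tag, digest = split_reference(ref)
    if not (tag and digest):
        return ref  # nothing to drop
    if keep == "tag":
        return f"{name}:{tag}"
    if keep == "digest":
        return f"{name}@{digest}"
    raise ValueError("keep must be 'tag' or 'digest'")


if __name__ == "__main__":
    ref = (
        "docker.io/library/busybox:latest@sha256:"
        "49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71"
    )
    print(simplify(ref, keep="tag"))
    print(simplify(ref, keep="digest"))
```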
So it would be good to work with go-containerregistry upstream, and try to drop the current patch...
google/go-containerregistry#703
afbjorklund/go-containerregistry@fbad78e
Minikube is currently using version 0.1.0
https://github.com/google/go-containerregistry/releases/tag/v0.1.0
Upstream has since released version 0.3.0
https://github.com/google/go-containerregistry/releases/tag/v0.3.0