Docker images contain 20MB of deleted /var/lib/apt/lists/ files #90
Description
Hi! Before I dive in, I just want to say thank you for maintaining these images :-)
So I happened to notice this section present in the Dockerfiles (generated here):
```dockerfile
# delete all the apt list files since they're big and get stale quickly
RUN rm -rf /var/lib/apt/lists/*
# this forces "apt-get update" in dependent images, which is also good
```
I agree it's a good idea to remove these files to force a later "apt-get update". However, the comment about saving space is not correct: deleting files in a later layer doesn't free up any space, because the files still exist in the earlier layer that added them. The comment seems to have been copy-pasted from this script (which runs everything in a single layer, so there it actually does save space).
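To make the layer point concrete, here's a rough sketch that models two image layers as plain tar archives. It assumes GNU tar and coreutils, and that the deletion is recorded using the ".wh.<name>" whiteout convention from the Docker/OCI image layer format; the file name example_Packages and the 1MiB size are made up purely for illustration.

```bash
#!/bin/bash
# Rough model of two image layers as plain tar archives. Assumes GNU tar and
# the Docker/OCI whiteout convention (a ".wh.<name>" file marks a deletion).
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Layer 1: a step that writes ~1MiB of (random, incompressible) "apt lists"
mkdir -p layer1/var/lib/apt/lists
head -c 1048576 /dev/urandom > layer1/var/lib/apt/lists/example_Packages
tar -C layer1 -cf layer1.tar .

# Layer 2: a later "RUN rm -rf /var/lib/apt/lists/*" records the deletion
# as a whiteout marker; it cannot remove bytes from layer 1
mkdir -p layer2/var/lib/apt/lists
touch layer2/var/lib/apt/lists/.wh.example_Packages
tar -C layer2 -cf layer2.tar .

# Both layers are shipped, so the "deleted" bytes still count towards image size
layer1_bytes=$(wc -c < layer1.tar)
layer2_bytes=$(wc -c < layer2.tar)
echo "layer1: ${layer1_bytes} bytes, layer2: ${layer2_bytes} bytes"
```

The total is the sum of both archives, which is why the rm only helps when it runs in the same RUN step that created the files.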
Rather than just correcting the comment, it would be best to avoid wasting the 20MB in the first place.
The files in /var/lib/apt/lists/ come from Canonical's base image archive, which is extracted directly using the ADD command's tar file support. This can't be switched to the curl/untar/delete pattern used in downstream images, since until the base archive is extracted there are no binaries in the image to run. As such, the removal of /var/lib/apt/lists/ needs to happen before the Docker build process.
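To see how much of the base archive this accounts for before building anything, something like the following could sum the sizes of the matching tar members. It's only a sketch: apt_lists_bytes is a hypothetical helper, and it assumes GNU tar's verbose listing layout (size in the third column, member name in the sixth).

```bash
#!/bin/bash
# Hypothetical helper: total bytes of archive members under var/lib/apt/lists/.
# Assumes GNU tar's verbose listing layout:
#   perms owner/group size date time name
set -euo pipefail

apt_lists_bytes() {
  local archive=$1
  # Match names with or without a leading "./"; "+ 0" prints 0 if nothing matched
  tar -tvzf "$archive" |
    awk '$6 ~ /^(\.\/)?var\/lib\/apt\/lists\// { total += $3 } END { print total + 0 }'
}
```

For example, I'd expect apt_lists_bytes ubuntu-xenial-core-cloudimg-amd64-root.tar.gz to report roughly the 20MB discussed here, and the same call on rootfs-minimised.tar.gz to report close to zero.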
This example shows the Ubuntu 16.04 image shrinking from 118MB to 97.6MB by doing exactly that:
```bash
#!/bin/bash
set -euo pipefail

# Fetch base archive and Dockerfile used for the existing Ubuntu 16.04 image
curl -fLO https://partner-images.canonical.com/core/xenial/current/ubuntu-xenial-core-cloudimg-amd64-root.tar.gz
curl -fLO https://raw.githubusercontent.com/tianon/docker-brew-ubuntu-core/dist-amd64/xenial/Dockerfile

# Prepare a slimmed-down version: strip the APT lists from the rootfs tarball
# (needs GNU tar for --delete/--wildcards), then point a copy of the Dockerfile at it
gzip -dc ubuntu-xenial-core-cloudimg-amd64-root.tar.gz | tar --delete --wildcards 'var/lib/apt/lists/*' | gzip > rootfs-minimised.tar.gz
sed 's/ubuntu-xenial-core-cloudimg-amd64-root\.tar\.gz/rootfs-minimised.tar.gz/' Dockerfile > Dockerfile-new

# Compare the before/after
docker build -t ubuntu-16.04-test:before .
docker build -t ubuntu-16.04-test:after -f Dockerfile-new .
docker images ubuntu-16.04-test
```
Output:
```
REPOSITORY          TAG      IMAGE ID       CREATED                  SIZE
ubuntu-16.04-test   after    7b258205a6b1   Less than a second ago   97.6MB
ubuntu-16.04-test   before   65cb86c05710   13 seconds ago           118MB
```
I guess the question then is whether to store both the original base archive and the processed one in this repo (so people can still check the original against the upstream hashes and compare the two), or just the processed one.
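If both archives are stored, it might help if anyone holding the upstream tarball can regenerate the processed one and get the same hash. Here's a sketch, assuming GNU tar and GNU gzip; minimise_rootfs is a hypothetical helper name, and the only intended differences from the script above are an explicit -f - for tar and passing -n to gzip, so the gzip header carries no file name or timestamp.

```bash
#!/bin/bash
# Sketch: regenerate the slimmed archive so its SHA-256 can be compared against
# a checked-in value. Assumes GNU tar (for --delete on a pipe) and GNU gzip.
set -euo pipefail

# Hypothetical helper name; same pipeline as the script above, plus gzip -n.
minimise_rootfs() {
  local src=$1 dest=$2
  # -n keeps the original file name and timestamp out of the gzip header, so
  # repeated runs (with the same gzip version and level) give identical output.
  gzip -dc "$src" \
    | tar --delete --wildcards -f - 'var/lib/apt/lists/*' \
    | gzip -n > "$dest"
}
```

For example, running minimise_rootfs ubuntu-xenial-core-cloudimg-amd64-root.tar.gz rootfs-minimised.tar.gz and then sha256sum on both files would give a pair of hashes that could be saved to a checksum file in the repo and re-checked later with sha256sum -c.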
Also, I think it's worth pushing the upstream maintainers of these base images to remove the APT lists from them, which would avoid all of this busywork. Perhaps this size-reduction use case is more compelling for them than the one outlined here:
https://bugs.launchpad.net/cloud-images/+bug/1685399
Many thanks!