How to clean GCR cache? #1402
Comments
I think the following is true: setting aside HOW tagged images might be cleaned up (the next step), it should be safe to enumerate every image in the cache that was created more than some time period ago and delete it, ignoring error responses indicating that an image could not be deleted because it is "in use". This would remove any dangling cache images (i.e. incomplete builds) as well as cache images that are no longer referenced by a tagged image. I meant to add: if Kaniko could do this piece, we would not have to copy the cache configuration to other tools or keep the two in sync.
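The age-based cleanup described above could be sketched roughly as follows. This is only an illustration, not something Kaniko provides: the function names `cache_cutoff` and `prune_cache`, the repository path, and the retention period are all assumptions, and it relies on GNU `date` plus the `gcloud container images` commands. It passes `--force-delete-tags` on the assumption that cache entries carry tags; drop that flag if tagged entries should survive.

```shell
#!/usr/bin/env bash
# Sketch only: function names, repository path, and retention are placeholders.

# cache_cutoff <days>: print the UTC timestamp <days> days ago (GNU date).
cache_cutoff() {
  date -u -d "@$(( $(date +%s) - $1 * 86400 ))" +%Y-%m-%dT%H:%M:%SZ
}

# prune_cache <cache-repo> <days>: delete cache images older than <days> days.
prune_cache() {
  local repo=$1 cutoff digest
  cutoff=$(cache_cutoff "$2")
  # get(digest) prints one full "sha256:..." digest per line.
  gcloud container images list-tags "$repo" \
      --filter="timestamp.datetime < '${cutoff}'" \
      --format='get(digest)' |
  while read -r digest; do
    [[ -n "$digest" ]] || continue
    # A failed delete (for example, an image that cannot be removed yet)
    # is reported and skipped rather than aborting the whole run.
    gcloud container images delete "${repo}@${digest}" \
        --quiet --force-delete-tags < /dev/null \
      || echo "skipped ${repo}@${digest}" >&2
  done
}
```

A call such as `prune_cache gcr.io/my-project/my-image/cache 14` (a hypothetical path) would then remove cache entries older than two weeks.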
This was also brought up in GoogleContainerTools/skaffold#3487.
@raijinsetsu do you have a script or a program that does that?
I have something... but there's a bug in the tagged-image deletion logic, so just ignore that piece. I cannot send the entire script because it contains proprietary code, but here is the part of it that handles cache maintenance.
# transformListGcrImageTags
# Reads "<digest> <tag/timestamp> [timestamp]" from stdin
# and outputs "<digest> <timestamp> [tag]"
function transformListGcrImageTags() {
  local digest tag timestamp
  while read -r digest tag timestamp ; do
    if [[ "$digest" == "DIGEST" ]]; then
      # ignore the header line
      continue
    fi
    if [[ -z "$timestamp" ]]; then
      # untagged image: the second field is actually the timestamp
      echo "$digest ${tag}Z"
    else
      echo "$digest ${timestamp}Z $tag"
    fi
  done
}
# listGcrImageTags <image> [filter]
# lists the image's digests, optionally restricted by a gcloud --filter expression
function listGcrImageTags() {
  if [[ -n "${2-}" ]]; then
    gcloud container images list-tags --filter "$2" "$1" | transformListGcrImageTags
  else
    gcloud container images list-tags "$1" | transformListGcrImageTags
  fi
}
# filterBranchImages <image> <git origin>
# filters the images: if it has a remote branch, remove the tag
# receives input from STDIN
# example:
# listGcrImageTags ${gcr_root}/rest-server/dev | filterBranchImages ${gcr_root}/rest-server/dev ${remote_origin}
function filterBranchImages() {
  local digest tag timestamp
  # read one line into fields
  while read -r digest timestamp tag ; do
    if [[ -n "$tag" ]]; then
      # tagged image
      # strip off the trailing commit hash from the tag to get the branch
      local b=${tag%-*}
      # remoteBranchExists is defined elsewhere in the script (not shown)
      if ! (remoteBranchExists "branch" "$b" "$2" || remoteBranchExists "branch" "feat/$b" "$2" || remoteBranchExists "branch" "fix/$b" "$2" || remoteBranchExists "branch" "chore/$b" "$2" ) ; then
        # the remote branch does not exist
        echo "${1}:${tag}"
      fi
    else
      # untagged image - cannot determine remote branch
      echo "${1}@sha256:${digest}"
    fi
  done
}
# filterCacheImages <image>
# turns "<digest> <timestamp> [tag]" lines from STDIN into image references
function filterCacheImages() {
  local digest tag timestamp
  while read -r digest timestamp tag ; do
    if [[ -n "$tag" ]]; then
      echo "${1}:${tag}"
    else
      echo "${1}@sha256:${digest}"
    fi
  done
}
function deleteTags() {
  # use xargs to batch up multiple deletions
  # also suppress stdout
  xargs -r gcloud container images delete > /dev/null
}
function cleanupGcrImages() {
  # expire cache images older than 14 days (1209600 seconds);
  # "now" is expected to hold the current epoch time, so fall back to date if it is unset
  local expireTS=$(( ${now:-$(date +%s)} - 1209600 ))
  local expire=$( date -Iseconds -d "@$expireTS" )
  echo "Cache expiration: $expire"
  if [[ $TEST -eq 0 ]]; then
    # echo each image reference to stderr, then delete it
    listGcrImageTags "${gcr_root}/rest-server/dev" | filterBranchImages "${gcr_root}/rest-server/dev" "${remote_origin}" | tee >(cat >&2) | deleteTags
    listGcrImageTags "${gcr_root}/rest-server/dev/cache" "timestamp.datetime < $expire" | filterCacheImages "${gcr_root}/rest-server/dev/cache" "${remote_origin}" | tee >(cat >&2) | deleteTags
  else
    # test mode: only print what would be deleted
    listGcrImageTags "${gcr_root}/rest-server/dev" | filterBranchImages "${gcr_root}/rest-server/dev" "${remote_origin}"
    listGcrImageTags "${gcr_root}/rest-server/dev/cache" "timestamp.datetime < $expire" | filterCacheImages "${gcr_root}/rest-server/dev/cache" "${remote_origin}"
  fi
}
It's supposed to conditionally delete the tagged image based on whether the corresponding branch still exists in Git, but we ran into an edge case where that does not hold. I think it would be great if the cache cleanup piece were part of Kaniko, but I understand if this is TOO repository-specific.
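The script calls a `remoteBranchExists` helper that was not included in the thread. Purely as a guess at what such a check might look like, the sketch below derives the branch name from a tag the same way the script does (`${tag%-*}`) and asks the remote whether that branch exists via `git ls-remote`. The names `branch_from_tag` and `remote_branch_exists` and their argument order are hypothetical and do not match the original helper's signature.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical stand-ins, not the helper used in the script above.

# branch_from_tag <tag>: drop the trailing "-<commit hash>" suffix from a tag.
branch_from_tag() {
  printf '%s\n' "${1%-*}"
}

# remote_branch_exists <origin> <branch>: succeed if the branch exists on the remote.
# With --exit-code, git ls-remote exits non-zero when no matching ref is found.
remote_branch_exists() {
  git ls-remote --exit-code --heads "$1" "refs/heads/$2" > /dev/null 2>&1
}
```

Note that `${tag%-*}` removes only the last `-` segment, so a tag without a commit suffix would lose part of its branch name; that may be related to the kind of edge case mentioned above, though the thread does not say.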
Wanted to note that GCP's Artifact Registry has now added the concept of a "Cleanup Policy", which could help here.
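For orientation, a cleanup policy of this kind might be set up roughly as sketched below. This is an assumption-laden illustration rather than a recipe from the thread: the helper names, the policy name `delete-old-cache`, and the retention value are made up, and the policy file layout and `gcloud artifacts repositories set-cleanup-policies` flags should be checked against the current Artifact Registry documentation before use. It only applies to repositories hosted on Artifact Registry.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, policy name, and retention are placeholders.

# write_cleanup_policy <file> <days>: write a policy that deletes images older than <days> days.
write_cleanup_policy() {
  cat > "$1" <<EOF
[
  {
    "name": "delete-old-cache",
    "action": {"type": "Delete"},
    "condition": {"tagState": "any", "olderThan": "${2}d"}
  }
]
EOF
}

# apply_cleanup_policy <repository> <location> <file>
# --dry-run asks the service to evaluate the policy without deleting anything.
apply_cleanup_policy() {
  gcloud artifacts repositories set-cleanup-policies "$1" \
      --location="$2" --policy="$3" --dry-run
}
```

Starting with `--dry-run` keeps the first attempt non-destructive; switching to `--no-dry-run` would let the policy actually delete images.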
Possible dupe of #998
A kaniko user asked via email:
any suggestions on maintaining the cache in GCR? Storage is cheap, sure, but we make thousands of images a month for our CI/CD pipeline and that will add up quickly. Are there plans to include some sort of cleanup routines in the Kaniko image to manage the cache?