How to clean GCR cache? #1402

Open
tejal29 opened this issue Aug 28, 2020 · 6 comments
Labels
area/documentation, area/registry, kind/feature-request, kind/question, ok-to-close?, possible-dupe, registry/gcr

Comments

@tejal29
Member

tejal29 commented Aug 28, 2020

A kaniko user asked via email:

any suggestions on maintaining the cache in GCR? Storage is cheap, sure, but we make thousands of images a month for our CI/CD pipeline and that will add up quickly. Are there plans to include some sort of cleanup routines in the Kaniko image to manage the cache?

@tejal29 added the kind/question and kind/feature-request labels on Aug 28, 2020
@raijinsetsu

raijinsetsu commented Aug 28, 2020

I think the following is true:
Every tagged image references the layers/images in the cache in its manifest (i.e. the images we want to keep around). The repository prevents you from deleting those layers as they are "in use".

Ignoring HOW tagged images might be cleaned up (next step), I think it is safe to assume that we could enumerate all the images in the cache that were created more than some time period in the past and delete them, ignoring error responses that indicate an image could not be deleted because it is "in use". This would remove any dangling cache images (i.e. from incomplete builds) as well as cache images that are no longer referenced by a tagged image.

I meant to add: if Kaniko could do this piece, then we would not have to copy the cache configuration to other tools or keep the two in sync.
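
A minimal sketch of that idea, assuming a hypothetical cache path (`gcr.io/my-project/app/cache`) and a 14-day cutoff. It relies only on `gcloud container images list-tags` and `gcloud container images delete`; the function name `pruneCache` is made up for illustration, and whether the registry really refuses to delete an "in use" image is the assumption stated above, not something this sketch verifies. The date arithmetic uses GNU `date` syntax.

```shell
#!/usr/bin/env bash
# Sketch only: delete cache images older than a cutoff, tolerating failures.
# CACHE_REPO and MAX_AGE_DAYS are hypothetical values; adjust for your project.
CACHE_REPO="${CACHE_REPO:-gcr.io/my-project/app/cache}"
MAX_AGE_DAYS="${MAX_AGE_DAYS:-14}"

pruneCache() {
        local repo="$1" days="$2" cutoff digest failed=0

        # ISO 8601 cutoff in UTC (GNU date syntax)
        cutoff=$(date -u -d "-${days} days" +%Y-%m-%dT%H:%M:%S)

        # --format 'get(digest)' prints one full "sha256:..." digest per line
        while read -r digest; do
                [[ -z "$digest" ]] && continue
                # Delete one image at a time so a single refusal does not abort the rest
                if ! gcloud container images delete --quiet --force-delete-tags \
                        "${repo}@${digest}" < /dev/null > /dev/null 2>&1; then
                        echo "skipped ${repo}@${digest}" >&2
                        failed=$((failed + 1))
                fi
        done < <(gcloud container images list-tags "$repo" \
                --filter "timestamp.datetime < '${cutoff}'" \
                --format 'get(digest)')

        echo "done, ${failed} image(s) skipped"
}

# Usage (commented out so that sourcing this file has no side effects):
# pruneCache "$CACHE_REPO" "$MAX_AGE_DAYS"
```

Deleting images one by one is slower than batching, but it means a failed deletion only skips that image instead of the whole batch.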

@tejal29
Member Author

tejal29 commented Aug 28, 2020

This was also brought up in GoogleContainerTools/skaffold#3487

@tejal29
Member Author

tejal29 commented Aug 28, 2020

I think the following is true:
Every tagged image references the layers/images in the cache in its manifest (i.e. the images we want to keep around). The repository prevents you from deleting those layers as they are "in use".

Ignoring HOW tagged images might be cleaned up (next step), I think it is safe to assume that we could enumerate all the images in the cache that were created more than some time period in the past and delete them, ignoring error responses that indicate an image could not be deleted because it is "in use". This would remove any dangling cache images (i.e. from incomplete builds) as well as cache images that are no longer referenced by a tagged image.

@raijinsetsu do you have a script or a program that does that?

@raijinsetsu

I have something... but there's a bug in the tagged-image deletion logic. So, just ignore that piece. I cannot send the entire script due to proprietary code, but here is the part of it that handles cache maintenance.

# transformListGcrImageTags
#       Reads "<digest> <tag/timestamp> [timestamp]" from stdin
#               and outputs "<digest> <timestamp> [tag]"
function transformListGcrImageTags() {

        local digest tag timestamp

        while read -r digest tag timestamp ; do

                if [[ "$digest" == "DIGEST" ]]; then
                        # ignore the header line
                        continue
                fi

                if [[ -z "$timestamp" ]]; then
                        # untagged image: the second column is actually the timestamp
                        echo "$digest ${tag}Z"
                else
                        echo "$digest ${timestamp}Z $tag"
                fi
        done
}

function listGcrImageTags() {

        if [[ -n "${2-}" ]]; then
                gcloud container images list-tags --filter "$2" "$1" | transformListGcrImageTags
        else
                gcloud container images list-tags "$1" | transformListGcrImageTags
        fi
}

# filterBranchImages <image> <git origin>
#       reads "<digest> <timestamp> [tag]" lines from STDIN (the output of listGcrImageTags)
#       and prints the images to delete: tagged images whose branch no longer exists on
#       the remote, plus all untagged images
#       example:
#               listGcrImageTags ${gcr_root}/rest-server/dev | filterBranchImages ${gcr_root}/rest-server/dev ${remote_origin}
function filterBranchImages() {

        local digest tag timestamp

        # read one line into fields
        while read -r digest timestamp tag ; do
        
                if [[ -n "$tag" ]]; then
                        # tagged image

                        # strip off the trailing commit hash from the tag to get the branch
                        local b=${tag%-*}
                        if ! (remoteBranchExists "branch" "$b" "$2" || remoteBranchExists "branch" "feat/$b" "$2" || remoteBranchExists "branch" "fix/$b" "$2" || remoteBranchExists "branch" "chore/$b" "$2" ) ; then
                                # the remote branch does not exist
                                echo "${1}:${tag}"
                        fi
                else
                        # untagged image - cannot determine remote branch
                        echo "${1}@sha256:${digest}"
                fi
        done
}

function filterCacheImages() {

        local digest tag timestamp

        while read -r digest timestamp tag ; do
                if [[ -n "$tag" ]]; then
                        echo "${1}:${tag}"
                else
                        echo "${1}@sha256:${digest}"
                fi
        done
}

function deleteTags() {

        # use xargs to batch up multiple deletions
        # also suppress stdout
        xargs -r gcloud container images delete > /dev/null
}

function cleanupGcrImages() {

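        # 1209600 s = 14 days. "now" (epoch seconds) is set outside this excerpt;
        # falling back to the current time when it is unset is an assumption.
        local expireTS=$(( ${now:-$(date +%s)} - 1209600 ))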
        local expire=$( date -Iseconds -d \@$expireTS )
        echo "Cache expiration: $expire"

        if [[ $TEST -eq 0 ]]; then
                listGcrImageTags ${gcr_root}/rest-server/dev | filterBranchImages ${gcr_root}/rest-server/dev ${remote_origin} | tee >(cat >&2) | deleteTags
                listGcrImageTags ${gcr_root}/rest-server/dev/cache "timestamp.datetime < $expire" | filterCacheImages ${gcr_root}/rest-server/dev/cache ${remote_origin} | tee >(cat >&2) | deleteTags
        else
                listGcrImageTags ${gcr_root}/rest-server/dev | filterBranchImages ${gcr_root}/rest-server/dev ${remote_origin}
                listGcrImageTags ${gcr_root}/rest-server/dev/cache "timestamp.datetime < $expire" | filterCacheImages ${gcr_root}/rest-server/dev/cache ${remote_origin}
        fi
}
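
To make the data flow concrete, here is a small sketch that feeds made-up `list-tags`-style rows through a copy of `transformListGcrImageTags`. The digests, tags, and timestamps are invented sample data, not real registry output, and `sampleRows` is a hypothetical helper that exists only for this demonstration.

```shell
#!/usr/bin/env bash
# Copy of the function above (with quoting and read -r) so this runs standalone.
transformListGcrImageTags() {
        local digest tag timestamp
        while read -r digest tag timestamp; do
                [[ "$digest" == "DIGEST" ]] && continue   # skip the header row
                if [[ -z "$timestamp" ]]; then
                        # untagged row: the second column is the timestamp
                        echo "$digest ${tag}Z"
                else
                        echo "$digest ${timestamp}Z $tag"
                fi
        done
}

# Invented sample rows in a three-column DIGEST / TAGS / TIMESTAMP layout
sampleRows() {
        cat <<'EOF'
DIGEST        TAGS          TIMESTAMP
0123456789ab  main-a1b2c3d  2020-08-01T10:00:00
ba9876543210                2020-07-01T09:30:00
EOF
}

sampleRows | transformListGcrImageTags
# prints:
#   0123456789ab 2020-08-01T10:00:00Z main-a1b2c3d
#   ba9876543210 2020-07-01T09:30:00Z
```

The second row shows why the reordering is needed: for an untagged image the timestamp lands in the second field, so the later filters could not tell tags and timestamps apart without this step.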

It's supposed to conditionally delete the tagged image based on the presence of the corresponding branch in Git, but we ran into an edge case where that does not hold.
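
The script calls `remoteBranchExists`, which is not part of the excerpt. One hypothetical way to write it, guessing from the call sites that the three arguments are a ref kind, a ref name, and a remote name or URL, is with `git ls-remote --exit-code`, which exits with a non-zero status when no matching ref is found. This is a stand-in under those assumptions, not the author's implementation, and it does not address the unspecified edge case.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the omitted helper.
# remoteBranchExists <kind> <name> <remote>
#   kind:   "branch" or "tag" (assumed meaning of the first argument)
#   name:   ref name without the refs/... prefix
#   remote: remote name or URL understood by git
remoteBranchExists() {
        local kind="$1" name="$2" remote="$3" ref

        case "$kind" in
                branch) ref="refs/heads/${name}" ;;
                tag)    ref="refs/tags/${name}" ;;
                *)      echo "unknown ref kind: $kind" >&2; return 2 ;;
        esac

        # --exit-code makes git exit with status 2 when no ref matches
        git ls-remote --exit-code "$remote" "$ref" > /dev/null 2>&1
}

# The tag-to-branch step used by filterBranchImages: drop the last "-<suffix>"
tag="login-page-a1b2c3d"
echo "${tag%-*}"    # prints: login-page
```

Note that `${tag%-*}` removes only the final `-` segment, so a tag without a trailing commit hash would lose part of its branch name.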

I think it would be great if the cache cleanup piece were part of Kaniko, but I understand if this is too repository-specific.

@aaron-prindle added the area/registry, area/documentation, and registry/gcr labels on Jul 27, 2023
@aaron-prindle
Collaborator

Wanted to note that GCP's Artifact Registry has now added the concept of a "Cleanup Policy" which could help here:
https://cloud.google.com/artifact-registry/docs/repositories/cleanup-policy
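
As a rough sketch of what such a policy could look like, the snippet below writes a policy file that deletes any image version older than a given age and applies it in dry-run mode. The repository name `kaniko-cache`, the location, the policy name, and the 14-day age are placeholder assumptions, and the helper function names are made up. Cleanup policies apply to repositories hosted in Artifact Registry, so check the linked documentation for the exact fields and flags before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only: repository, location, policy name and age are placeholders.

# Write a single "Delete" policy that matches versions older than <age> (e.g. 14d)
writeCleanupPolicy() {
        local file="$1" age="$2"
        cat > "$file" <<EOF
[
  {
    "name": "delete-old-cache-layers",
    "action": {"type": "Delete"},
    "condition": {
      "tagState": "any",
      "olderThan": "${age}"
    }
  }
]
EOF
}

# Attach the policy file to an Artifact Registry repository
applyCleanupPolicy() {
        local repo="$1" location="$2" file="$3"
        # --dry-run only records what would be deleted;
        # switch to --no-dry-run once the results look right
        gcloud artifacts repositories set-cleanup-policies "$repo" \
                --location "$location" \
                --policy "$file" \
                --dry-run
}

# Usage:
# writeCleanupPolicy policy.json 14d
# applyCleanupPolicy kaniko-cache us-central1 policy.json
```

Unlike a script run from CI, a policy like this is evaluated by the registry itself, so the cache path does not have to be duplicated in a separate cleanup tool.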

@aaron-prindle
Collaborator

Possible dupe of #998
