Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add scripts for backups #377

Merged
merged 1 commit into from
Nov 2, 2019
Merged

add scripts for backups #377

merged 1 commit into from
Nov 2, 2019

Conversation

listx
Copy link
Contributor

@listx listx commented Sep 26, 2019

This is a rough cut of some tooling for basic backups from prod to a
backup.

/assign @thockin @justinsb

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2019
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 26, 2019
@listx
Copy link
Contributor Author

listx commented Oct 1, 2019

After this gets reviewed + merged, I plan on adding Prow jobs in k8s/test-infra repo to make use of these scripts at regular intervals (the more frequent, the better).

I could also port the "record_consistency.sh" logic into the promoter itself it seems like a conflict of interest, because the whole point of this PR is to add secondary checks to make sure the promoter is behaving properly.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2019
@listx listx changed the title WIP: add scripts for disaster recovery add scripts for backups Oct 21, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2019
@listx
Copy link
Contributor Author

listx commented Oct 21, 2019

@thockin This is much simpler than the earlier PR (1 script instead of 3). PTAL

@listx
Copy link
Contributor Author

listx commented Oct 23, 2019

This is blocked by #376

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be under infra/gcp ?

We have a turtle-at-the-bottom problem. We shift the focus to gcrane now - how can we build trust in that? I'd suggest we lock to a specific SHA256.

# Backup tools

The tools in this folder are designed to backup the
{asia,eu,us}.gcr.io/k8s-artifacts-prod GCRs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all 3 are supposed to be identical, do we need backups of all 3? Or is it just easier that way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only did it for symmetry (knee-jerk reaction).

Philosophically if I choose 1 region to back up out of the 3, then that 1 region becomes special and I don't like this.


# USAGE NOTES
#
# Backs up prod registries. This is a thin orchestrator as all the heavy lifting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment implies copy_with_date.sh and record_consistency.sh are separate files, but no such file exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, will fix.


# Copy each region to its backup.
for prod_repo in "${prod_repos[@]}"; do
copy_with_date "${prod_repo}" "${prod_repo}-backup" "${sa_key_path}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB I changed "-backup" to "-bak" in latest push of the ensure script (because we overflowed allowable project names :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

docker run \
-v "${sa_key_path}":/auth.json \
--env GOOGLE_APPLICATION_CREDENTIALS=/auth.json \
gcr.io/go-containerregistry/gcrane \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a particular tag? Assuming :latest is kind of scary...

What protects this backup tool from being compromised?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a particular tag? Assuming :latest is kind of scary...

Yes, I should have referenced a sha256 digest here. Will change to not use :latest.

What protects this backup tool from being compromised?

Good question. For now I think the sha256 is the only guarantee. If we need to update we would have to run some e2e test that ensures that the new version doesn't change behavior. Short of that, at least manual testing of some sort.

@thockin
Copy link
Member

thockin commented Oct 23, 2019

Also, if pinning to a SHA, comment why :)

@listx
Copy link
Contributor Author

listx commented Oct 23, 2019

I removed the unnecessary call to gcloud (activating creds) because they are only used inside the gcrane container.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

# folders more easily traversable from a human perspective.
timestamp="$(date -u +"%Y/%m/%d/%H")"

source_gcr_repo="${1}" # "us.gcr.io/k8s-artifacts-prod"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably check $# and abort if the args are the wrong number, ok for followup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2019
This is a rough cut of some tooling for basic backups from prod to a
backup. The `backup.sh` script will run from a periodic Prow job every
hour. This script expects the "k8s-artifacts-prod-backup" GCR to exist,
as well as a service account that has write access to this backup
location.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2019
@listx
Copy link
Contributor Author

listx commented Oct 30, 2019

/assign @cblecker


# Perform backup by copying all images recursively over.
docker run \
-v "${sa_key_path}":/auth.json \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it be a good idea to use buildkit here @listx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't heard of that. Care to elaborate? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ignore me. all looks good.

@cblecker
Copy link
Member

cblecker commented Nov 2, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 2, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 2, 2019
@k8s-ci-robot k8s-ci-robot merged commit 2d26f91 into kubernetes:master Nov 2, 2019
@listx listx deleted the dr branch June 22, 2020 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants