[DOCKER] Add python-github-backup #51

Merged
merged 8 commits into from
Jun 19, 2021
57 changes: 57 additions & 0 deletions .gitignore
@@ -0,0 +1,57 @@
.DS_Store
.env

# ==============================================================================
# TERRAFORM
# ==============================================================================
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log

# Ignore any .tfvars files that are generated automatically for each Terraform run. Most
# .tfvars files are managed as part of configuration and so should be included in
# version control.
#
# example.tfvars

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
#
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# ==============================================================================
# PYTHON
# ==============================================================================
*.py[oc]
__pycache__

# Temp files
*~
~*
.*~
\#*
.#*
*#
dist

# Build files
build
dist
pkg
*.egg
*.egg-info
15 changes: 8 additions & 7 deletions docker/README.md
@@ -8,10 +8,11 @@ All these images are pushed to [Github Container Registry](https://github.com/ma

## Images

| Image | Description | Pull | Status |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------- | --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| [ansible-worker](ansible-worker/Dockerfile) | Alpine with Ansible, OpenSSH, and sshpass preinstalled | `ghcr.io/marco-lancini/ansible-worker:latest` | ![[DOCKER IMAGE] Ansible Worker](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Ansible%20Worker/badge.svg) |
| [github-changelog-generator](github-changelog-generator/README.md) | Docker image for [github-changelog-generator](https://github.com/github-changelog-generator/github-changelog-generator) | `ghcr.io/marco-lancini/github-changelog-generator:latest` | ![[DOCKER IMAGE] Latex](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Github-ChangeLog-Generator/badge.svg)
| [latex](latex/README.md) | Alpine with texlive preinstalled | `ghcr.io/marco-lancini/latex:latest` | ![[DOCKER IMAGE] Latex](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Latex/badge.svg) |
| [markserv](marksev/README.md) | Image for [Markserv](https://github.com/markserv/markserv) | `ghcr.io/marco-lancini/markserv:latest` | ![[DOCKER IMAGE] Markserv](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Markserv/badge.svg) |
| [nomad](nomad/Dockerfile) | Image for HashiCorp Nomad | `ghcr.io/marco-lancini/nomad:latest` | ![[DOCKER IMAGE] Nomad](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Nomad/badge.svg) |
| Image | Description | Pull | Status |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| [ansible-worker](ansible-worker/Dockerfile) | Alpine with Ansible, OpenSSH, and sshpass preinstalled | `ghcr.io/marco-lancini/ansible-worker:latest` | ![[DOCKER IMAGE] Ansible Worker](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Ansible%20Worker/badge.svg) |
| [github-changelog-generator](github-changelog-generator/README.md) | Docker image for [github-changelog-generator](https://github.com/github-changelog-generator/github-changelog-generator) | `ghcr.io/marco-lancini/github-changelog-generator:latest` | ![[DOCKER IMAGE] Github-ChangeLog-Generator](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Github-ChangeLog-Generator/badge.svg) |
| [latex](latex/README.md) | Alpine with texlive preinstalled | `ghcr.io/marco-lancini/latex:latest` | ![[DOCKER IMAGE] Latex](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Latex/badge.svg) |
| [markserv](marksev/README.md) | Image for [Markserv](https://github.com/markserv/markserv) | `ghcr.io/marco-lancini/markserv:latest` | ![[DOCKER IMAGE] Markserv](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Markserv/badge.svg) |
| [nomad](nomad/Dockerfile) | Image for HashiCorp Nomad | `ghcr.io/marco-lancini/nomad:latest` | ![[DOCKER IMAGE] Nomad](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Nomad/badge.svg) |
| [python-github-backup](python-github-backup/README.md) | Image for a customised version of the [python-github-backup](https://github.com/josegonzalez/python-github-backup) repo, as described in [Github Backups with ECS and S3](https://www.marcolancini.it/2021/blog-github-backups-with-ecs/) | N/A | |
18 changes: 18 additions & 0 deletions docker/python-github-backup/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
FROM python:3.9.5-slim-buster

RUN addgroup --gid 11111 app
RUN adduser --shell /bin/false --no-create-home --uid 11111 --gid 11111 app

RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && apt-get purge -y --auto-remove \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /src
COPY docker/python-github-backup/python-github-backup /src
RUN pip install -e .

RUN chown -R app:app /src
USER app

ENTRYPOINT ["github-backup"]
7 changes: 7 additions & 0 deletions docker/python-github-backup/README.md
@@ -0,0 +1,7 @@
# Python Github Backup

Docker image for a customised version of the [python-github-backup](https://github.com/josegonzalez/python-github-backup) repo, as described in [Github Backups with ECS and S3](https://www.marcolancini.it/2021/blog-github-backups-with-ecs/).

In particular, the following changes have been made:
* The GitHub personal access token (PAT) and the target user are fetched from environment variables
* The data fetched from GitHub is zipped and uploaded to an S3 bucket
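
As a rough illustration of the first point, the sketch below reads a token and a target user from the environment and fails early if either is missing. The variable names `GITHUB_PAT` and `GITHUB_USER` and the helper `read_backup_settings` are hypothetical, chosen only for this example; the names the image actually expects are not stated here.

```python
import os


def read_backup_settings(environ=os.environ):
    """Return (token, user) taken from environment variables.

    GITHUB_PAT and GITHUB_USER are hypothetical names used for illustration;
    they are assumptions, not the names the customised package reads.
    """
    token = environ.get("GITHUB_PAT")
    user = environ.get("GITHUB_USER")

    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (("GITHUB_PAT", token), ("GITHUB_USER", user))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return token, user
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.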
119 changes: 119 additions & 0 deletions docker/python-github-backup/python-github-backup/README.md
@@ -0,0 +1,119 @@
# python-github-backup

> **Original Source:** https://github.com/josegonzalez/python-github-backup

The package can be used to back up an *entire* organization or repository, including issues and wikis, each in the most appropriate format (clones for wikis, JSON files for issues).


## Usage

```
github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN] [--as-app]
              [-o OUTPUT_DIRECTORY] [-i] [--starred] [--all-starred]
              [--watched] [--followers] [--following] [--all]
              [--issues] [--issue-comments] [--issue-events] [--pulls]
              [--pull-comments] [--pull-commits] [--pull-details]
              [--labels] [--hooks] [--milestones] [--repositories]
              [--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
              [--skip-existing] [-L [LANGUAGES [LANGUAGES ...]]]
              [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
              [-P] [-F] [--prefer-ssh] [-v]
              [--keychain-name OSX_KEYCHAIN_ITEM_NAME]
              [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
              [--releases] [--assets] [--throttle-limit THROTTLE_LIMIT]
              [--throttle-pause THROTTLE_PAUSE]
              USER

Backup a github account

positional arguments:
  USER                  github username

optional arguments:
  -h, --help            show this help message and exit
  -u USERNAME, --username USERNAME
                        username for basic auth
  -p PASSWORD, --password PASSWORD
                        password for basic auth. If a username is given but
                        not a password, the password will be prompted for.
  -t TOKEN, --token TOKEN
                        personal access, OAuth, or JSON Web token, or path to
                        token (file://...)
  --as-app              authenticate as github app instead of as a user.
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        directory at which to backup the repositories
  -i, --incremental     incremental backup
  --starred             include JSON output of starred repositories in backup
  --all-starred         include starred repositories in backup [*]
  --watched             include JSON output of watched repositories in backup
  --followers           include JSON output of followers in backup
  --following           include JSON output of following users in backup
  --all                 include everything in backup (not including [*])
  --issues              include issues in backup
  --issue-comments      include issue comments in backup
  --issue-events        include issue events in backup
  --pulls               include pull requests in backup
  --pull-comments       include pull request review comments in backup
  --pull-commits        include pull request commits in backup
  --pull-details        include more pull request details in backup [*]
  --labels              include labels in backup
  --hooks               include hooks in backup (works only when
                        authenticated)
  --milestones          include milestones in backup
  --repositories        include repository clone in backup
  --bare                clone bare repositories
  --lfs                 clone LFS repositories (requires Git LFS to be
                        installed, https://git-lfs.github.com) [*]
  --wikis               include wiki clone in backup
  --gists               include gists in backup [*]
  --starred-gists       include starred gists in backup [*]
  --skip-existing       skip project if a backup directory exists
  -L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
                        only allow these languages
  -N NAME_REGEX, --name-regex NAME_REGEX
                        python regex to match names against
  -H GITHUB_HOST, --github-host GITHUB_HOST
                        GitHub Enterprise hostname
  -O, --organization    whether or not this is an organization user
  -R REPOSITORY, --repository REPOSITORY
                        name of repository to limit backup to
  -P, --private         include private repositories [*]
  -F, --fork            include forked repositories [*]
  --prefer-ssh          Clone repositories using SSH instead of HTTPS
  -v, --version         show program's version number and exit
  --keychain-name OSX_KEYCHAIN_ITEM_NAME
                        OSX ONLY: name field of password item in OSX keychain
                        that holds the personal access or OAuth token
  --keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT
                        OSX ONLY: account field of password item in OSX
                        keychain that holds the personal access or OAuth token
  --releases            include release information, not including assets or
                        binaries
  --assets              include assets alongside release information; only
                        applies if including releases
  --throttle-limit THROTTLE_LIMIT
                        start throttling of GitHub API requests after this
                        amount of API requests remain
  --throttle-pause THROTTLE_PAUSE
                        wait this amount of seconds when API request
                        throttling is active (default: 30.0, requires
                        --throttle-limit to be set)
```
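
The interaction between `--throttle-limit` and `--throttle-pause` described above can be sketched as follows. This is a minimal reading of the help text, not the package's own code: the function names `throttle_delay` and `maybe_throttle` are hypothetical, and the assumption is that throttling starts once the number of remaining API requests drops to the limit or below.

```python
import time


def throttle_delay(remaining, throttle_limit, throttle_pause=30.0):
    """Return how many seconds to wait before the next API request.

    remaining:      API requests left in the current rate-limit window
    throttle_limit: value of --throttle-limit (None or 0 disables throttling)
    throttle_pause: value of --throttle-pause (30.0 is the documented default)
    """
    if not throttle_limit:
        # Without --throttle-limit the pause never applies.
        return 0.0
    if remaining <= throttle_limit:
        return float(throttle_pause)
    return 0.0


def maybe_throttle(remaining, throttle_limit, throttle_pause=30.0, sleep=time.sleep):
    """Sleep if throttling is active and return the delay that was applied."""
    delay = throttle_delay(remaining, throttle_limit, throttle_pause)
    if delay > 0:
        sleep(delay)
    return delay
```

The `sleep` parameter exists only so the behaviour can be checked without actually waiting.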


## Examples

Back up all repositories, including private ones:
```
export ACCESS_TOKEN=SOME-GITHUB-TOKEN
github-backup WhiteHouse --token $ACCESS_TOKEN --organization --output-directory /tmp/white-house --repositories --private
```

Back up a single organization repository together with everything else (wiki, pull requests, comments, issues, etc.):
```
export ACCESS_TOKEN=SOME-GITHUB-TOKEN
ORGANIZATION=docker
REPO=cli
# e.g. git@github.com:docker/cli.git
github-backup $ORGANIZATION -P -t $ACCESS_TOKEN -o . --all -O -R $REPO
```
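
When the backup is started from a script rather than a shell, the same invocations can be assembled as an argument list. The helper below is a hypothetical sketch (the name `build_backup_command` is not part of the package); it only uses flags listed in the usage text above and assumes `github-backup` is on the `PATH`.

```python
import os
import shlex


def build_backup_command(user, token, output_directory, *, organization=False,
                         private=False, repositories=False, include_all=False,
                         repository=None):
    """Build the argv list for a github-backup run (hypothetical helper)."""
    command = ["github-backup", user, "--token", token,
               "--output-directory", output_directory]
    if organization:
        command.append("--organization")
    if private:
        command.append("--private")
    if repositories:
        command.append("--repositories")
    if include_all:
        command.append("--all")
    if repository:
        # Limit the backup to a single repository.
        command += ["--repository", repository]
    return command


if __name__ == "__main__":
    # Mirrors the second example: one organization repository, everything included.
    token = os.environ.get("ACCESS_TOKEN", "SOME-GITHUB-TOKEN")
    command = build_backup_command("docker", token, ".", organization=True,
                                   private=True, include_all=True, repository="cli")
    print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting issues with the token.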
@@ -0,0 +1,89 @@
#!/usr/bin/env python

import logging
import os
import sys
from datetime import datetime

from github_backup.github_backup import (
    backup_account,
    backup_repositories,
    check_git_lfs_install,
    filter_repositories,
    get_authenticated_user,
    log_info,
    log_warning,
    mkdir_p,
    parse_args,
    retrieve_repositories,
)

from github_backup.zip import do_zip
from github_backup.s3upload import upload_file

logging.basicConfig(
    format='%(asctime)s.%(msecs)03d: %(message)s',
    datefmt='%Y-%m-%dT%H:%M:%S',
    level=logging.INFO
)

def main():
    #
    # Parse Arguments
    #
    args = parse_args()
    output_directory = os.path.realpath(args.output_directory)
    zip_directory = os.path.realpath(args.output_zip)
    output_s3 = args.output_s3

    #
    # Setup folders
    #
    if not os.path.isdir(output_directory):
        log_info('Create output directory {0}'.format(output_directory))
        mkdir_p(output_directory)

    if not os.path.isdir(zip_directory):
        log_info('Create ZIP directory {0}'.format(zip_directory))
        mkdir_p(zip_directory)

    if args.lfs_clone:
        check_git_lfs_install()

    if not args.as_app:
        log_info('Backing up user {0} to {1}'.format(args.user, output_directory))
        authenticated_user = get_authenticated_user(args)
    else:
        authenticated_user = {'login': None}

    #
    # Retrieve data
    #
    repositories = retrieve_repositories(args, authenticated_user)
    repositories = filter_repositories(args, repositories)
    backup_repositories(args, output_directory, repositories)
    backup_account(args, output_directory)
    log_info("[!] Ingestion complete")

    #
    # Zip content
    #
    today_date = datetime.today().strftime('%Y-%m-%d')
    zip_name = f"{today_date}_github_backup.zip"
    fname = os.path.join(zip_directory, zip_name)
    log_info(f"[!] Zipping output folder: {fname}")
    do_zip(output_directory, fname)

    #
    # Sync to S3
    #
    log_info(f"[!] Uploading ZIP to S3: {output_s3}/{zip_name}")
    upload_file(fname, output_s3, object_name=zip_name)

    log_info("[!] Completed!")


if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        log_warning(str(e))
        sys.exit(1)
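
The bodies of `do_zip` and `upload_file` are not shown in this view. As a rough sketch of what the zipping step could look like using only the standard library, the example below builds the same dated file name and archives a directory tree. The names `dated_zip_name` and `zip_directory_tree` are hypothetical and are not the functions in `github_backup.zip`; the upload to S3 is deliberately left out.

```python
import os
import zipfile
from datetime import datetime


def dated_zip_name(today=None):
    """Return a name such as 2021-06-19_github_backup.zip (same pattern as the script)."""
    today = today or datetime.today()
    return f"{today.strftime('%Y-%m-%d')}_github_backup.zip"


def zip_directory_tree(source_dir, zip_path):
    """Write every file under source_dir into zip_path, keeping relative paths.

    Assumption: zip_path lives outside source_dir, so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the backup root, not absolute paths.
                archive.write(full_path, arcname=os.path.relpath(full_path, source_dir))
    return zip_path
```

Keeping the ZIP directory separate from the output directory, as the script does with `output_zip` and `output_directory`, avoids archiving a partially written ZIP.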
@@ -0,0 +1 @@
__version__ = '0.39.0'