[DOCKER] Add python-github-backup (#51)
marco-lancini authored Jun 19, 2021
1 parent 1cb71fb commit e6ad7fb
Showing 12 changed files with 1,537 additions and 7 deletions.
57 changes: 57 additions & 0 deletions .gitignore
@@ -0,0 +1,57 @@
.DS_Store
.env

# ==============================================================================
# TERRAFORM
# ==============================================================================
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log

# Ignore any .tfvars files that are generated automatically for each Terraform run. Most
# .tfvars files are managed as part of configuration and so should be included in
# version control.
#
# example.tfvars

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
#
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# ==============================================================================
# PYTHON
# ==============================================================================
*.py[oc]
__pycache__

# Temp files
*~
~*
.*~
\#*
.#*
*#
dist

# Build files
build
dist
pkg
*.egg
*.egg-info
15 changes: 8 additions & 7 deletions docker/README.md
@@ -8,10 +8,11 @@ All these images are pushed to [Github Container Registry](https://github.com/ma

## Images

-| Image | Description | Pull | Status |
-| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------- | --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
-| [ansible-worker](ansible-worker/Dockerfile) | Alpine with Ansible, OpenSSH, and sshpass preinstalled | `ghcr.io/marco-lancini/ansible-worker:latest` | ![[DOCKER IMAGE] Ansible Worker](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Ansible%20Worker/badge.svg) |
-| [github-changelog-generator](github-changelog-generator/README.md) | Docker image for [github-changelog-generator](https://github.com/github-changelog-generator/github-changelog-generator) | `ghcr.io/marco-lancini/github-changelog-generator:latest` | ![[DOCKER IMAGE] Latex](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Github-ChangeLog-Generator/badge.svg)
-| [latex](latex/README.md) | Alpine with texlive preinstalled | `ghcr.io/marco-lancini/latex:latest` | ![[DOCKER IMAGE] Latex](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Latex/badge.svg) |
-| [markserv](marksev/README.md) | Image for [Markserv](https://github.com/markserv/markserv) | `ghcr.io/marco-lancini/markserv:latest` | ![[DOCKER IMAGE] Markserv](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Markserv/badge.svg) |
-| [nomad](nomad/Dockerfile) | Image for HashiCorp Nomad | `ghcr.io/marco-lancini/nomad:latest` | ![[DOCKER IMAGE] Nomad](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Nomad/badge.svg) |
+| Image | Description | Pull | Status |
+| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| [ansible-worker](ansible-worker/Dockerfile) | Alpine with Ansible, OpenSSH, and sshpass preinstalled | `ghcr.io/marco-lancini/ansible-worker:latest` | ![[DOCKER IMAGE] Ansible Worker](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Ansible%20Worker/badge.svg) |
+| [github-changelog-generator](github-changelog-generator/README.md) | Docker image for [github-changelog-generator](https://github.com/github-changelog-generator/github-changelog-generator) | `ghcr.io/marco-lancini/github-changelog-generator:latest` | ![[DOCKER IMAGE] Github-ChangeLog-Generator](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Github-ChangeLog-Generator/badge.svg) |
+| [latex](latex/README.md) | Alpine with texlive preinstalled | `ghcr.io/marco-lancini/latex:latest` | ![[DOCKER IMAGE] Latex](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Latex/badge.svg) |
+| [markserv](markserv/README.md) | Image for [Markserv](https://github.com/markserv/markserv) | `ghcr.io/marco-lancini/markserv:latest` | ![[DOCKER IMAGE] Markserv](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Markserv/badge.svg) |
+| [nomad](nomad/Dockerfile) | Image for HashiCorp Nomad | `ghcr.io/marco-lancini/nomad:latest` | ![[DOCKER IMAGE] Nomad](https://github.com/marco-lancini/utils/workflows/%5BDOCKER%20IMAGE%5D%20Nomad/badge.svg) |
+| [python-github-backup](python-github-backup/README.md) | Image for a customised version of the [python-github-backup](https://github.com/josegonzalez/python-github-backup) repo, as described in [Github Backups with ECS and S3](https://www.marcolancini.it/2021/blog-github-backups-with-ecs/) | N/A |  |
18 changes: 18 additions & 0 deletions docker/python-github-backup/Dockerfile
@@ -0,0 +1,18 @@
FROM python:3.9.5-slim-buster

RUN addgroup --gid 11111 app
RUN adduser --shell /bin/false --no-create-home --uid 11111 --gid 11111 app

RUN apt-get update \
&& apt-get install -y --no-install-recommends git \
&& apt-get purge -y --auto-remove \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /src
COPY docker/python-github-backup/python-github-backup /src
RUN pip install -e .

RUN chown -R app:app /src
USER app

ENTRYPOINT ["github-backup"]
7 changes: 7 additions & 0 deletions docker/python-github-backup/README.md
@@ -0,0 +1,7 @@
# Python Github Backup

Docker image for a customised version of the [python-github-backup](https://github.com/josegonzalez/python-github-backup) repo, as described in [Github Backups with ECS and S3](https://www.marcolancini.it/2021/blog-github-backups-with-ecs/).

In particular, it adds the following:

* The GitHub personal access token (PAT) and the target user are read from environment variables
* The data fetched from GitHub is zipped and uploaded to an S3 bucket
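The commit does not show how the customised `parse_args()` reads the token and the target user from the environment. The following is a minimal sketch of one way to map them onto the positional `USER` and `--token` arguments from the upstream usage text. It assumes the hypothetical variable names `GITHUB_TOKEN` and `GITHUB_USER`, and `argv_from_env` is an illustrative helper, not part of the package.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Hypothetical variable names: the commit does not show which environment
# variables the customised parse_args() actually reads.
TOKEN_VAR = "GITHUB_TOKEN"
USER_VAR = "GITHUB_USER"


def argv_from_env(extra: Optional[Sequence[str]] = None,
                  env: Mapping[str, str] = os.environ) -> List[str]:
    """Build a github-backup argument list from environment variables.

    A missing variable raises KeyError, so a misconfigured container
    fails fast instead of attempting an unauthenticated backup.
    """
    user = env[USER_VAR]
    token = env[TOKEN_VAR]
    # USER is the positional argument; -t/--token carries the PAT.
    return [user, "--token", token, *(extra or [])]
```

For example, `argv_from_env(["--organization", "--repositories"])` yields an argument list for an organization backup. A token placed in argv is visible in process listings such as `ps`, so reading it inside the process, as the customised image does, avoids that exposure.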
119 changes: 119 additions & 0 deletions docker/python-github-backup/python-github-backup/README.md
@@ -0,0 +1,119 @@
# python-github-backup

> **Original Source:** https://github.com/josegonzalez/python-github-backup

The package can be used to back up an *entire* organization or repository, including issues and wikis, in the most appropriate format (clones for wikis, JSON files for issues).


## Usage

```
github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN] [--as-app]
              [-o OUTPUT_DIRECTORY] [-i] [--starred] [--all-starred]
              [--watched] [--followers] [--following] [--all]
              [--issues] [--issue-comments] [--issue-events] [--pulls]
              [--pull-comments] [--pull-commits] [--pull-details]
              [--labels] [--hooks] [--milestones] [--repositories]
              [--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
              [--skip-existing] [-L [LANGUAGES [LANGUAGES ...]]]
              [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
              [-P] [-F] [--prefer-ssh] [-v]
              [--keychain-name OSX_KEYCHAIN_ITEM_NAME]
              [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
              [--releases] [--assets] [--throttle-limit THROTTLE_LIMIT]
              [--throttle-pause THROTTLE_PAUSE]
              USER

Backup a github account

positional arguments:
  USER                  github username

optional arguments:
  -h, --help            show this help message and exit
  -u USERNAME, --username USERNAME
                        username for basic auth
  -p PASSWORD, --password PASSWORD
                        password for basic auth. If a username is given but
                        not a password, the password will be prompted for.
  -t TOKEN, --token TOKEN
                        personal access, OAuth, or JSON Web token, or path to
                        token (file://...)
  --as-app              authenticate as github app instead of as a user.
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        directory at which to backup the repositories
  -i, --incremental     incremental backup
  --starred             include JSON output of starred repositories in backup
  --all-starred         include starred repositories in backup [*]
  --watched             include JSON output of watched repositories in backup
  --followers           include JSON output of followers in backup
  --following           include JSON output of following users in backup
  --all                 include everything in backup (not including [*])
  --issues              include issues in backup
  --issue-comments      include issue comments in backup
  --issue-events        include issue events in backup
  --pulls               include pull requests in backup
  --pull-comments       include pull request review comments in backup
  --pull-commits        include pull request commits in backup
  --pull-details        include more pull request details in backup [*]
  --labels              include labels in backup
  --hooks               include hooks in backup (works only when
                        authenticated)
  --milestones          include milestones in backup
  --repositories        include repository clone in backup
  --bare                clone bare repositories
  --lfs                 clone LFS repositories (requires Git LFS to be
                        installed, https://git-lfs.github.com) [*]
  --wikis               include wiki clone in backup
  --gists               include gists in backup [*]
  --starred-gists       include starred gists in backup [*]
  --skip-existing       skip project if a backup directory exists
  -L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
                        only allow these languages
  -N NAME_REGEX, --name-regex NAME_REGEX
                        python regex to match names against
  -H GITHUB_HOST, --github-host GITHUB_HOST
                        GitHub Enterprise hostname
  -O, --organization    whether or not this is an organization user
  -R REPOSITORY, --repository REPOSITORY
                        name of repository to limit backup to
  -P, --private         include private repositories [*]
  -F, --fork            include forked repositories [*]
  --prefer-ssh          Clone repositories using SSH instead of HTTPS
  -v, --version         show program's version number and exit
  --keychain-name OSX_KEYCHAIN_ITEM_NAME
                        OSX ONLY: name field of password item in OSX keychain
                        that holds the personal access or OAuth token
  --keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT
                        OSX ONLY: account field of password item in OSX
                        keychain that holds the personal access or OAuth token
  --releases            include release information, not including assets or
                        binaries
  --assets              include assets alongside release information; only
                        applies if including releases
  --throttle-limit THROTTLE_LIMIT
                        start throttling of GitHub API requests after this
                        amount of API requests remain
  --throttle-pause THROTTLE_PAUSE
                        wait this amount of seconds when API request
                        throttling is active (default: 30.0, requires
                        --throttle-limit to be set)
```
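The filter flags in the help text (`-P`, `-F`, `-N`, `-L`) act as a chain of independent checks on each repository. The following is a minimal sketch of that logic. It is not the package's actual `filter_repositories` implementation. It assumes a hypothetical trimmed-down `Repo` record, and it assumes that private repositories and forks are skipped unless `-P`/`-F` is passed, as the flag descriptions imply. Anchoring the regex with `re.match` and comparing languages case-insensitively are also assumptions.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass
class Repo:
    # Hypothetical, trimmed-down record; the real tool works on the JSON
    # objects returned by the GitHub API.
    name: str
    language: Optional[str]
    private: bool
    fork: bool


def filter_repos(repos: Iterable[Repo],
                 name_regex: Optional[str] = None,
                 languages: Optional[Sequence[str]] = None,
                 include_private: bool = False,
                 include_forks: bool = False) -> List[Repo]:
    """Keep only repositories that pass every enabled filter."""
    pattern = re.compile(name_regex) if name_regex else None
    wanted = {lang.lower() for lang in languages} if languages else None
    kept = []
    for repo in repos:
        if repo.private and not include_private:  # -P / --private
            continue
        if repo.fork and not include_forks:  # -F / --fork
            continue
        # -N / --name-regex: re.match anchors at the start of the name
        # (match vs. search is an assumption of this sketch).
        if pattern is not None and not pattern.match(repo.name):
            continue
        # -L / --languages: case-insensitive comparison is an assumption.
        if wanted is not None and (repo.language or "").lower() not in wanted:
            continue
        kept.append(repo)
    return kept
```

Each check short-circuits with `continue`, so adding another flag-driven filter is a matter of adding one more guarded condition.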


## Examples

Back up all repositories, including private ones:
```
export ACCESS_TOKEN=SOME-GITHUB-TOKEN
github-backup WhiteHouse --token $ACCESS_TOKEN --organization --output-directory /tmp/white-house --repositories --private
```

Back up a single organization repository with everything else (wiki, pull requests, comments, issues, etc.):
```
export ACCESS_TOKEN=SOME-GITHUB-TOKEN
ORGANIZATION=docker
REPO=cli
# e.g. git@github.com:docker/cli.git
github-backup $ORGANIZATION -P -t $ACCESS_TOKEN -o . --all -O -R $REPO
```
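When the tool is driven from Python rather than a shell, for example from a scheduler, the second example can be expressed as an argument list. This is an illustrative sketch: `single_repo_backup_cmd` and `run_backup` are hypothetical helpers. The long options are the documented equivalents of `-P`, `-t`, `-o`, `-O` and `-R` in the usage text.

```python
import subprocess
from typing import List


def single_repo_backup_cmd(organization: str, repo: str, token: str,
                           output_dir: str = ".") -> List[str]:
    """Long-option equivalent of:
    github-backup $ORGANIZATION -P -t $ACCESS_TOKEN -o . --all -O -R $REPO
    """
    return [
        "github-backup", organization,
        "--private",                       # -P
        "--token", token,                  # -t
        "--output-directory", output_dir,  # -o
        "--all",
        "--organization",                  # -O
        "--repository", repo,              # -R
    ]


def run_backup(cmd: List[str]) -> int:
    # A list argument (no shell=True) passes the token verbatim,
    # with no shell word-splitting or glob expansion.
    return subprocess.run(cmd, check=False).returncode
```

For instance, `run_backup(single_repo_backup_cmd("docker", "cli", token))` mirrors the shell example for `docker/cli`, assuming `github-backup` is on `PATH`.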
89 changes: 89 additions & 0 deletions docker/python-github-backup/python-github-backup/bin/github-backup
@@ -0,0 +1,89 @@
#!/usr/bin/env python

import os, sys, logging
from datetime import datetime

from github_backup.github_backup import (
    backup_account,
    backup_repositories,
    check_git_lfs_install,
    filter_repositories,
    get_authenticated_user,
    log_info,
    log_warning,
    mkdir_p,
    parse_args,
    retrieve_repositories,
)

from github_backup.zip import do_zip
from github_backup.s3upload import upload_file

logging.basicConfig(
    format='%(asctime)s.%(msecs)03d: %(message)s',
    datefmt='%Y-%m-%dT%H:%M:%S',
    level=logging.INFO
)

def main():
    #
    # Parse Arguments
    #
    args = parse_args()
    output_directory = os.path.realpath(args.output_directory)
    zip_directory = os.path.realpath(args.output_zip)
    output_s3 = args.output_s3

    #
    # Setup folders
    #
    if not os.path.isdir(output_directory):
        log_info('Create output directory {0}'.format(output_directory))
        mkdir_p(output_directory)

    if not os.path.isdir(zip_directory):
        log_info('Create ZIP directory {0}'.format(zip_directory))
        mkdir_p(zip_directory)

    if args.lfs_clone:
        check_git_lfs_install()

    if not args.as_app:
        log_info('Backing up user {0} to {1}'.format(args.user, output_directory))
        authenticated_user = get_authenticated_user(args)
    else:
        authenticated_user = {'login': None}

    #
    # Retrieve data
    #
    repositories = retrieve_repositories(args, authenticated_user)
    repositories = filter_repositories(args, repositories)
    backup_repositories(args, output_directory, repositories)
    backup_account(args, output_directory)
    log_info("[!] Ingestion complete")

    #
    # Zip content
    #
    today_date = datetime.today().strftime('%Y-%m-%d')
    zip_name = f"{today_date}_github_backup.zip"
    fname = os.path.join(zip_directory, zip_name)
    log_info(f"[!] Zipping output folder: {fname}")
    do_zip(output_directory, fname)

    #
    # Sync to S3
    #
    log_info(f"[!] Uploading ZIP to S3: {output_s3}/{zip_name}")
    upload_file(fname, output_s3, object_name=zip_name)

    log_info("[!] Completed!")


if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        log_warning(str(e))
        sys.exit(1)
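The entry point imports `do_zip` from `github_backup.zip` and `upload_file` from `github_backup.s3upload`, but neither module appears in the part of the diff shown here. The following is a minimal sketch of what those two steps might do. It uses the standard `zipfile` module and boto3's `S3.Client.upload_file(Filename, Bucket, Key)`. The helper names `zip_folder`, `zip_name_for`, `split_s3_target` and `upload_to_s3` are hypothetical. Treating `output_s3` as either a bare bucket name or an `s3://bucket/prefix` URI is an assumption.

```python
import os
import zipfile
from datetime import date
from typing import Optional, Tuple


def zip_folder(source_dir: str, zip_path: str) -> None:
    """Hypothetical stand-in for do_zip(): add every file under source_dir
    to zip_path, storing paths relative to source_dir."""
    target = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == target:
                    continue  # never add the archive to itself
                zf.write(full, arcname=os.path.relpath(full, source_dir))


def zip_name_for(day: date) -> str:
    # Same naming scheme as the script: YYYY-MM-DD_github_backup.zip
    return f"{day.strftime('%Y-%m-%d')}_github_backup.zip"


def split_s3_target(output_s3: str, object_name: str) -> Tuple[str, str]:
    """Map 'bucket' or 's3://bucket/prefix' plus an object name to (bucket, key)."""
    if output_s3.startswith("s3://"):
        output_s3 = output_s3[len("s3://"):]
    bucket, _, prefix = output_s3.partition("/")
    prefix = prefix.strip("/")
    key = f"{prefix}/{object_name}" if prefix else object_name
    return bucket, key


def upload_to_s3(fname: str, output_s3: str,
                 object_name: Optional[str] = None) -> None:
    """Hypothetical stand-in for upload_file(); needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    bucket, key = split_s3_target(output_s3, object_name or os.path.basename(fname))
    boto3.client("s3").upload_file(fname, bucket, key)
```

In the script these would be called as `zip_folder(output_directory, fname)` and `upload_to_s3(fname, output_s3, object_name=zip_name)`. Keeping the ZIP in a separate `zip_directory`, as the script does, means the archive is never walked while it is being written. The self-exclusion check is only a safeguard.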
@@ -0,0 +1 @@
__version__ = '0.39.0'