QuantEcon Repository Backup Workflow

A centralized workflow for backing up QuantEcon repositories to AWS S3.

Overview

This workflow automatically backs up GitHub repositories to AWS S3 for disaster recovery and compliance. It runs from this single repository and backs up every repository in the organization that matches the configured names or regex patterns.

Features

  • Pattern-based selection: Use regex patterns to select which repositories to back up
  • Mirror backups: Complete repository backups including all branches, tags, and history
  • S3 storage: Secure storage in AWS S3 with upload verification
  • Skip existing: Avoid redundant backups with automatic duplicate detection
  • Backup reporting: Generate reports on backup status and storage usage

Quick Start

1. Configure AWS IAM for OIDC Authentication (Recommended)

OIDC authentication is more secure than static credentials because there are no long-lived secrets to manage.

Create IAM Identity Provider

aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com

Create IAM Role with Trust Policy

Create a file trust-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:YOUR_ORG/workflow-backups:*"
        }
      }
    }
  ]
}
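
If you would rather generate trust-policy.json than edit the placeholders by hand, the following is a minimal sketch that fills in ACCOUNT_ID and YOUR_ORG. The function name build_trust_policy and the example values are illustrative assumptions and are not part of this repository.

```python
import json


def build_trust_policy(account_id: str, org: str, repo: str = "workflow-backups") -> dict:
    """Return the OIDC trust policy from this section with the placeholders filled in."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Only tokens issued for AWS STS are accepted.
                    "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                    # Only workflows running in this one repository may assume the role.
                    "StringLike": {f"{provider}:sub": f"repo:{org}/{repo}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example values only; substitute your own account ID and organization.
    print(json.dumps(build_trust_policy("123456789012", "your-org"), indent=2))
```

Redirecting the output to trust-policy.json gives a file you can pass to the create-role command.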

Create the role:

aws iam create-role \
  --role-name GitHubActionsBackupRole \
  --assume-role-policy-document file://trust-policy.json

Attach S3 Permissions

Create s3-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    }
  ]
}

Attach the policy to the role:

aws iam put-role-policy \
  --role-name GitHubActionsBackupRole \
  --policy-name S3BackupAccess \
  --policy-document file://s3-policy.json

2. Configure GitHub Repository

Add the following secret to your repository:

  • AWS_ROLE_ARN: The ARN of the IAM role (e.g., arn:aws:iam::123456789012:role/GitHubActionsBackupRole)

Optionally add a variable:

  • AWS_REGION: AWS region (default: us-east-1)

GitHub Token Permissions

Repository Type   Token Required                Notes
Public repos      GITHUB_TOKEN (default)        Works automatically, no setup needed
Private repos     Personal Access Token (PAT)   Requires repo scope

For public repositories only, the default GITHUB_TOKEN works out of the box.

To back up private repositories, create a PAT with repo scope (or a fine-grained PAT with read-only Contents permission) and add it as a secret:

  • REPO_BACKUP_TOKEN: PAT with repo scope for private repository access

The workflow automatically uses REPO_BACKUP_TOKEN if available, falling back to GITHUB_TOKEN.
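
The fallback order can be sketched in a few lines of Python. This is an illustration of the rule described above, not the repository's actual code; the function name resolve_github_token is an assumption, and it presumes both secrets are exposed to the process as environment variables.

```python
import os
from typing import Mapping, Optional


def resolve_github_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer REPO_BACKUP_TOKEN, fall back to GITHUB_TOKEN, return None if neither is set."""
    env = os.environ if env is None else env
    # `or` also skips empty strings, which is how an unset secret appears in Actions.
    return env.get("REPO_BACKUP_TOKEN") or env.get("GITHUB_TOKEN") or None
```

Passing a plain dict instead of os.environ makes the rule easy to check in a unit test.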

3. Create Configuration File

Create config.yml in your repository:

backup:
  enabled: true
  organization: "your-org"
  
  # Skip archived repositories (default: false)
  exclude_archived: true
  
  # Exact repository names (simple, no regex needed)
  repositories:
    - "my-important-repo"
    - "another.repo"
  
  # Regex patterns for matching multiple repos
  patterns:
    - "lecture-.*"      # Back up repos starting with "lecture-"
    - "quantecon-.*"    # Back up repos starting with "quantecon-"
  
  # Exclude specific repos by exact name
  exclude_repositories:
    - "testing-repo"
    - "codespaces-test"
  
  # Exclude repos matching patterns (applied after include matching)
  exclude_patterns:
    - ".*-test$"        # Exclude repos ending in -test
    - "test-.*"         # Exclude repos starting with test-
  
  # Metadata backup (GitHub issues, releases, etc.)
  # WARNING: Issues backup requires many API calls - see issue #3
  backup_metadata:
    issues: false       # Export issues with comments (disabled by default)
  
  s3:
    bucket: "your-backup-bucket"
    region: "us-east-1"
    prefix: "backups/"
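
To make the include/exclude order concrete, here is a minimal sketch of how the four lists could combine: a repository is selected if it matches an exact name or an include pattern, and is then dropped if it matches an exact exclusion or an exclude pattern. The function name select_repositories is hypothetical, and the use of re.fullmatch (the whole name must match) is an assumption; the actual matcher in src may anchor patterns differently.

```python
import re
from typing import Iterable, List


def select_repositories(
    names: Iterable[str],
    repositories: Iterable[str] = (),
    patterns: Iterable[str] = (),
    exclude_repositories: Iterable[str] = (),
    exclude_patterns: Iterable[str] = (),
) -> List[str]:
    """Apply include rules first, then exclude rules, preserving input order."""
    include_exact = set(repositories)
    exclude_exact = set(exclude_repositories)
    include_re = [re.compile(p) for p in patterns]
    exclude_re = [re.compile(p) for p in exclude_patterns]

    selected = []
    for name in names:
        # Exact names are compared literally, so "another.repo" needs no escaping.
        included = name in include_exact or any(r.fullmatch(name) for r in include_re)
        if not included:
            continue
        # Exclusions are checked only after a repository has been included.
        excluded = name in exclude_exact or any(r.fullmatch(name) for r in exclude_re)
        if not excluded:
            selected.append(name)
    return selected
```

With the example configuration above, a repository named lecture-python would be selected, while lecture-test would be included by "lecture-.*" and then removed by ".*-test$".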

4. Use the Workflow

The included workflow (.github/workflows/backup.yml) runs weekly and can be triggered manually:

name: Repository Backup
on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly on Sunday at 2 AM UTC
  workflow_dispatch:

jobs:
  backup:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ vars.AWS_REGION || 'us-east-1' }}
      - run: python -m src.main --config config.yml --task backup
        env:
          GITHUB_TOKEN: ${{ secrets.REPO_BACKUP_TOKEN || secrets.GITHUB_TOKEN }}

Local Development

# Clone repository
git clone https://github.com/quantecon/workflow-backups.git
cd workflow-backups

# Set up environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure AWS and GitHub credentials locally
export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export GITHUB_TOKEN="your_token"

# Run backup
python -m src.main --config config.yml --task backup

# Run tests
pytest tests/

CLI Options

python -m src.main --help

Options:
  --config PATH       Path to configuration file (default: config.yml)
  --task {backup,report}  Task to run (default: backup)
  --organization ORG  Override organization from config
  --force            Force backup even if today's backup exists
  --verbose          Enable debug logging
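
For readers extending the CLI, the options listed above map naturally onto argparse. The following is a sketch of an equivalent parser under that assumption; the actual definitions in src/main.py may differ, and build_parser is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="python -m src.main")
    parser.add_argument("--config", metavar="PATH", default="config.yml",
                        help="Path to configuration file (default: config.yml)")
    parser.add_argument("--task", choices=["backup", "report"], default="backup",
                        help="Task to run (default: backup)")
    parser.add_argument("--organization", metavar="ORG", default=None,
                        help="Override organization from config")
    parser.add_argument("--force", action="store_true",
                        help="Force backup even if today's backup exists")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable debug logging")
    return parser
```

For example, build_parser().parse_args(["--task", "report"]) yields a namespace with task set to "report" and every other option at its default.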

Backup Storage Structure

Backups are stored in S3 with the following structure:

s3://bucket-name/
├── repo-name/
│   ├── repo-name-20251127.tar.gz
│   ├── repo-name-20251120.tar.gz
│   ├── repo-name-issues-20251127.json   # If issues backup enabled
│   └── repo-name-20251113.tar.gz
└── another-repo/
    └── another-repo-20251127.tar.gz
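
The date-stamped names in this layout are what make duplicate detection and --force possible. As a sketch under that assumption, the key for a given day and the skip decision might look like the following. The helper names backup_key and should_backup are hypothetical, and the configured s3.prefix is deliberately left out because the tree above does not show one.

```python
from datetime import date
from typing import Collection


def backup_key(repo_name: str, day: date) -> str:
    """Build the object key for one repository's archive on a given day."""
    return f"{repo_name}/{repo_name}-{day:%Y%m%d}.tar.gz"


def should_backup(
    repo_name: str,
    day: date,
    existing_keys: Collection[str],
    force: bool = False,
) -> bool:
    """Back up when forced, or when no archive for this day exists yet."""
    return force or backup_key(repo_name, day) not in existing_keys
```

In practice existing_keys would come from listing the bucket, which is why the IAM policy includes s3:ListBucket.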

Each backup includes metadata:

  • Repository full name
  • Backup timestamp
  • Default branch
  • Archive size
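
This README does not say where the metadata is stored. One plausible option is S3 user-defined object metadata, which only accepts string values. The sketch below builds such a dictionary; the function name and the key names are illustrative assumptions, not the names this workflow uses.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def build_backup_metadata(
    full_name: str,
    default_branch: str,
    archive_size_bytes: int,
    timestamp: Optional[datetime] = None,
) -> Dict[str, str]:
    """Return the four documented fields as strings, ready for S3 object metadata."""
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "repository": full_name,                   # e.g. "your-org/repo-name"
        "backup-timestamp": ts.isoformat(),        # ISO 8601, UTC by default
        "default-branch": default_branch,
        "archive-size": str(archive_size_bytes),   # S3 metadata values must be strings
    }
```

With boto3, a dictionary like this can be attached at upload time through ExtraArgs={"Metadata": ...} on upload_file.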

Restoring a Backup

To restore a repository from a backup:

# 1. Download the backup from S3
aws s3 cp s3://bucket-name/repo-name/repo-name-20251127.tar.gz .

# 2. Extract the archive (contains a bare git mirror)
tar -xzf repo-name-20251127.tar.gz

# 3. Clone from the bare repo to create a working repository
git clone repo-name restored-repo

# 4. You now have a full working repository
cd restored-repo
git branch -a   # View all branches
git tag         # View all tags

The backup is a complete git mirror including all branches, tags, and full commit history.

Security

Read-Only Operations

This workflow is designed to be completely read-only with respect to the repositories it backs up:

  • No GitHub write operations on source repositories: The code only uses read operations (get_organization(), get_repos(), get_issues(), repository/issue property access)
  • No git push/commit: Only git clone --mirror is used (downloads only, never pushes)
  • No repository modifications: Source repositories are never modified in any way

The only write operations are:

  • S3 uploads: Backup archives are uploaded to your S3 bucket
  • Issue updates: Backup reports are posted to issues in this repository (workflow-backups)

Token Permissions

Token                     Required Scopes                             Purpose
GITHUB_TOKEN              contents: read                              Clone public repos
REPO_BACKUP_TOKEN (PAT)   Contents: Read-only                         Clone private repos
AWS Role                  s3:PutObject, s3:GetObject, s3:ListBucket   S3 backup storage

Best Practices

  • Use OIDC authentication for AWS (no static credentials)
  • Use fine-grained PATs with minimal scopes
  • Store all tokens/credentials as GitHub Secrets
  • Never commit credentials to the repository

Technology Stack

  • Language: Python 3.9+
  • Cloud Storage: AWS S3
  • GitHub API: PyGithub
  • AWS SDK: boto3
  • Testing: pytest (88% code coverage)

License

MIT License - see LICENSE for details.

Support

  • Open an issue in this repository
  • Contact the QuantEcon development team
