This runbook provides detailed operational procedures for managing backups in the transcript-create system. It covers routine operations, troubleshooting, and maintenance tasks.
- Daily Operations
- Backup Scripts Usage
- Monitoring
- Troubleshooting
- Maintenance Tasks
- Cloud Storage Operations
The backup service runs automatically via cron on the following schedule:
| Task | Schedule | Script |
|---|---|---|
| Database Backup | Daily 2:00 AM UTC | backup_db.sh |
| Media Backup | Daily 3:00 AM UTC | backup_media.sh |
| Backup Verification | Weekly Sunday 4:00 AM UTC | verify_backup.sh |
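With this schedule, the newest daily dump on a healthy system should never be much more than 24 hours old. Below is a minimal freshness check. It assumes GNU `find` and `date` and the file naming convention described at the end of this runbook. `check_backup_fresh` is a hypothetical helper, not one of the shipped scripts. Its 26-hour default mirrors the BackupTooOld alert threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the shipped scripts): succeed only if the
# newest daily dump in DIR is younger than MAX_AGE_HOURS.
# Assumes GNU find (-printf) and the transcripts_daily_* naming convention.
check_backup_fresh() {
  local dir="${1:-/backups/daily}"
  local max_age_hours="${2:-26}"   # 24h schedule + 2h grace, like BackupTooOld
  local newest now age_hours
  # Newest modification time (epoch seconds) among plain and GPG-encrypted dumps
  newest=$(find "$dir" -maxdepth 1 -type f \
      \( -name 'transcripts_daily_*.sql.gz' -o -name 'transcripts_daily_*.sql.gz.gpg' \) \
      -printf '%T@\n' 2>/dev/null | sort -n | tail -1)
  if [[ -z "$newest" ]]; then
    echo "STALE: no daily backups found in $dir"
    return 1
  fi
  now=$(date +%s)
  # %T@ includes a fractional part; drop it before integer arithmetic
  age_hours=$(( (now - ${newest%.*}) / 3600 ))
  if (( age_hours >= max_age_hours )); then
    echo "STALE: newest daily backup is ${age_hours}h old (limit ${max_age_hours}h)"
    return 1
  fi
  echo "OK: newest daily backup is ${age_hours}h old"
}
```

For example, `check_backup_fresh /backups/daily 26 || echo "investigate the last backup run"` can run from cron shortly after the 2:00 AM UTC job.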
View recent backup logs:
# Database backup logs
tail -100 /backups/logs/backup_$(date +%Y%m%d).log
# Media backup logs (most recent log file)
tail -100 "$(ls -t /backups/media/logs/media_backup_*.log | head -1)"
# Cron logs
docker compose logs backup | tail -50
Check latest backup age:
# Database backups
ls -lht /backups/daily/ | head -5
# Media backups
ls -ld /backups/media/current
Verify backup sizes:
# Database backup size
du -sh /backups/daily/transcripts_daily_*.sql.gz | tail -1
# Total backup storage
du -sh /backups/
Location: scripts/backup_db.sh
Basic usage:
cd /path/to/transcript-create # Change this to your project root directory
export DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/transcripts"
./scripts/backup_db.sh
With encryption:
export BACKUP_ENCRYPT=true
export BACKUP_GPG_RECIPIENT="backup@example.com"
./scripts/backup_db.sh --encrypt
Verify existing backups:
./scripts/backup_db.sh --verify-only
Manual backup (outside schedule):
# Run inside Docker container
docker compose exec backup bash -c "cd /scripts && ./backup_db.sh"
# Or from host with Docker environment
docker compose run --rm backup bash -c "cd /scripts && ./backup_db.sh"
Environment variables:
DATABASE_URL # PostgreSQL connection string (required)
BACKUP_DIR # Backup directory (default: /backups)
BACKUP_ENCRYPT # Enable encryption (default: false)
BACKUP_GPG_RECIPIENT # GPG recipient email
BACKUP_RETENTION_DAILY # Days to keep daily backups (default: 7)
BACKUP_RETENTION_WEEKLY # Weeks to keep weekly backups (default: 4)
BACKUP_RETENTION_MONTHLY # Months to keep monthly backups (default: 12)
BACKUP_S3_BUCKET # S3 bucket for remote storage
BACKUP_GCS_BUCKET # GCS bucket for remote storage
BACKUP_AZURE_CONTAINER # Azure container for remote storage
Location: scripts/backup_media.sh
Basic usage:
export MEDIA_SOURCE_DIR=/data
export MEDIA_BACKUP_DIR=/backups/media
./scripts/backup_media.sh
Full backup (no incremental):
./scripts/backup_media.sh --full
Verify media backups:
./scripts/backup_media.sh --verify-only
Environment variables:
MEDIA_SOURCE_DIR # Source directory (default: /data)
MEDIA_BACKUP_DIR # Backup directory (default: /backups/media)
MEDIA_RETENTION_DAYS # Retention in days (default: 30)
BACKUP_S3_BUCKET # S3 bucket for remote storage
BACKUP_GCS_BUCKET # GCS bucket for remote storage
BACKUP_AZURE_CONTAINER # Azure container for remote storage
Location: scripts/restore_db.sh
List available backups:
./scripts/restore_db.sh --list-backups
Restore from specific backup:
export DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/transcripts"
./scripts/restore_db.sh --backup-file /backups/daily/transcripts_daily_20241026_020000.sql.gz
Force restore (skip confirmation):
./scripts/restore_db.sh --backup-file /backups/daily/transcripts_daily_20241026_020000.sql.gz --force
Point-in-time recovery guidance:
./scripts/restore_db.sh --pitr --target-time '2024-10-26 01:30:00'
Location: scripts/verify_backup.sh
Basic verification:
./scripts/verify_backup.sh
With test restore:
export DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/transcripts"
./scripts/verify_backup.sh --test-restore
Custom backup age threshold:
./scripts/verify_backup.sh --max-age-hours 48
Exit codes:
- 0: All verifications passed
- 1: Verification failures detected
- 2: Backups too old or missing
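The exit codes separate failed checks from stale or missing backups, so a cron wrapper can turn them into a one-line status for mail or chat notifications. This is a sketch with hypothetical function names. `run_verification` assumes it is called from the project root.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: map verify_backup.sh exit codes (documented above)
# to a short human-readable status line.
describe_verify_exit() {
  case "$1" in
    0) echo "OK: all verifications passed" ;;
    1) echo "FAIL: verification failures detected" ;;
    2) echo "FAIL: backups too old or missing" ;;
    *) echo "UNKNOWN: unexpected exit code $1" ;;
  esac
}

# Run the verifier, print the translated status, and preserve its exit code
run_verification() {
  local rc=0
  ./scripts/verify_backup.sh "$@" || rc=$?
  describe_verify_exit "$rc"
  return "$rc"
}
```

For example, `run_verification --max-age-hours 48` prints the status line and still exits non-zero on failure, so cron or an alerting hook can react to it.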
Export backup metrics for monitoring:
Create a script to export metrics: scripts/export_backup_metrics.sh
#!/bin/bash
# Export backup metrics to file for node_exporter textfile collector
METRICS_FILE="/var/lib/node_exporter/textfile_collector/backups.prom"
# Database backup metrics
# [[ -f pattern ]] does not expand globs, so pick the newest dump with ls instead
latest_backup=$(ls -t /backups/daily/transcripts_daily_*.sql.gz 2>/dev/null | head -1)
if [[ -n "$latest_backup" ]]; then
  backup_time=$(stat -c %Y "$latest_backup")
  backup_size=$(stat -c %s "$latest_backup")
  # Write to a temporary file and rename it so node_exporter never reads a
  # half-written file (the collector only reads files ending in .prom)
  tmp_file="${METRICS_FILE}.$$"
  # backup_last_status is always written as 0 here; feed in backup_db.sh's
  # real exit code if failures should be reported
  cat > "$tmp_file" <<EOF
# HELP backup_last_success_timestamp Unix timestamp of last successful backup
# TYPE backup_last_success_timestamp gauge
backup_last_success_timestamp{type="database"} $backup_time
# HELP backup_size_bytes Size of latest backup in bytes
# TYPE backup_size_bytes gauge
backup_size_bytes{type="database"} $backup_size
# HELP backup_last_status Exit status of last backup (0=success)
# TYPE backup_last_status gauge
backup_last_status{type="database"} 0
EOF
  mv "$tmp_file" "$METRICS_FILE"
fi
Key panels to include:
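- Backup Age (hours): (time() - backup_last_success_timestamp{type="database"}) / 3600
- Backup Size Trend: backup_size_bytes{type="database"}
- Backup Success Rate (share of the last 24h in which the last status was 0; rate() applies only to counters, not to this gauge): avg_over_time((backup_last_status{type="database"} == bool 0)[24h:5m])
- Storage Available: node_filesystem_avail_bytes{mountpoint="/backups"}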
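Before building the panels, it helps to confirm that the textfile holds the expected samples. Below is a minimal sketch. `read_metric` is a hypothetical helper that only handles samples written as `name{type="..."} value`, the form the export script above produces.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the shipped scripts): print the value of one
# sample from a Prometheus textfile; fails if the sample is absent.
read_metric() {
  local file="$1" name="$2" type="$3"
  # Exact series key, e.g. backup_size_bytes{type="database"}
  local key="${name}{type=\"${type}\"}"
  # '#' comment lines never equal the key in field 1, so they are skipped
  awk -v key="$key" '$1 == key { print $2; found = 1 } END { exit !found }' "$file"
}
```

To reproduce the Backup Age panel by hand, you could run `echo $(( ($(date +%s) - $(read_metric /var/lib/node_exporter/textfile_collector/backups.prom backup_last_success_timestamp database)) / 3600 ))`.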
Example Prometheus alerts:
groups:
  - name: backup_alerts
    rules:
      - alert: BackupTooOld
        expr: time() - backup_last_success_timestamp > 93600 # 26 hours
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Backup is too old"
          description: "Last backup was {{ $value | humanizeDuration }} ago"
      - alert: BackupSizeIncrease
        expr: |
          (backup_size_bytes - backup_size_bytes offset 7d) / backup_size_bytes offset 7d > 0.5
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Backup size increased significantly"
          description: "Backup size increased by {{ $value | humanizePercentage }} in the last week"
      - alert: BackupStorageLow
        expr: |
          node_filesystem_avail_bytes{mountpoint="/backups"} / node_filesystem_size_bytes{mountpoint="/backups"} < 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Backup storage running low"
          description: "Only {{ $value | humanizePercentage }} storage available"
Symptoms:
- No recent backup files
- Error messages in logs
- Backup script exits with non-zero status
Diagnosis:
# Check recent logs
tail -100 /backups/logs/backup_$(date +%Y%m%d).log
# Check backup service status
docker compose ps backup
# Check database connectivity
docker compose exec backup pg_isready -h db -U postgres
Common causes and solutions:
- Database connection failed:
  # Verify DATABASE_URL is correct
  docker compose exec backup printenv DATABASE_URL
  # Test connection (psql does not read DATABASE_URL, so pass host, user and database explicitly)
  docker compose exec backup psql -h db -U postgres -d transcripts -c "SELECT version();"
- Insufficient disk space:
  # Check available space
  df -h /backups
  # Clean up old backups manually if needed (list first, then delete)
  find /backups/daily -name "*.sql.gz" -mtime +7 -print
  find /backups/daily -name "*.sql.gz" -mtime +7 -delete
- Permission issues:
  # Check backup directory permissions
  ls -ld /backups/
  # Fix permissions
  chmod 755 /backups
  chown -R postgres:postgres /backups
- pg_dump errors:
  # Test pg_dump directly (-T disables the pseudo-TTY so the redirected dump is not corrupted)
  docker compose exec -T db pg_dump -U postgres -d transcripts > /tmp/test.sql
  # Check PostgreSQL logs
  docker compose logs db | grep -i error
Symptoms:
- Restore script exits with error
- Database not restored correctly
- Missing tables or data
Diagnosis:
# Verify backup file integrity
gzip -t /backups/daily/transcripts_daily_*.sql.gz
# Check backup contents (inspect one file at a time)
gunzip -c /backups/daily/transcripts_daily_20241026_020000.sql.gz | head -100
# Verify checksum
sha256sum -c /backups/daily/transcripts_daily_*.sql.gz.sha256
Solutions:
- Corrupted backup file:
  # Use an older backup
  ./scripts/restore_db.sh --list-backups
  ./scripts/restore_db.sh --backup-file /backups/daily/older_backup.sql.gz
- Database connection issues:
  # Verify database is running
  docker compose ps db
  # Verify connection
  docker compose exec db psql -U postgres -c "SELECT 1;"
- Insufficient disk space:
  # Check database volume space
  docker system df -v
  # Clean up if needed (caution: prune deletes unused volumes, which can include
  # data volumes of stopped containers; review docker volume ls -f dangling=true first)
  docker volume prune
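When the latest dump is corrupted, the next step is to find the newest file that still passes the integrity checks from the diagnosis above. Below is a minimal sketch covering unencrypted `*.sql.gz` files only. `newest_valid_backup` is hypothetical, and it assumes each `.sha256` file resolves when checked from inside the backup directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest backup in DIR that passes gzip -t and,
# when a .sha256 file exists next to it, the checksum check.
newest_valid_backup() {
  local dir="${1:-/backups/daily}" f
  # ls -t lists newest first
  while IFS= read -r f; do
    gzip -t "$f" 2>/dev/null || continue
    if [[ -f "$f.sha256" ]]; then
      # Check from inside DIR so checksum files with bare filenames resolve
      ( cd "$dir" && sha256sum -c --status "$(basename "$f").sha256" ) || continue
    fi
    echo "$f"
    return 0
  done < <(ls -t "$dir"/transcripts_*.sql.gz 2>/dev/null)
  return 1
}
```

You can pass the result straight to the restore script: `./scripts/restore_db.sh --backup-file "$(newest_valid_backup /backups/daily)"`.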
Symptoms:
- Backup size significantly larger or smaller than usual
- Disk space filling up quickly
Diagnosis:
# Compare backup sizes
ls -lh /backups/daily/ | tail -10
# Check database size
docker compose exec db psql -U postgres -c "
SELECT pg_size_pretty(pg_database_size('transcripts'));
"
# Check table sizes
docker compose exec db psql -U postgres -d transcripts -c "
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
"
Actions:
- Review data growth patterns
- Check for runaway queries creating temporary data
- Consider archiving old data
- Review backup compression settings
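To put a number on "significantly larger or smaller", you can compare the two newest daily dumps. This sketch reuses the 50% ratio of the BackupSizeIncrease alert but applies it in both directions. `backup_size_change` is a hypothetical helper that uses GNU `stat`.

```bash
#!/usr/bin/env bash
# Hypothetical check: report the size change between the two newest daily
# dumps and fail when it exceeds THRESHOLD percent (default 50) either way.
backup_size_change() {
  local dir="${1:-/backups/daily}" threshold="${2:-50}"
  local files s1 s0 pct
  mapfile -t files < <(ls -t "$dir"/transcripts_daily_*.sql.gz 2>/dev/null | head -2)
  if (( ${#files[@]} < 2 )); then
    echo "need at least two backups"; return 2
  fi
  s1=$(stat -c %s "${files[0]}")   # newest
  s0=$(stat -c %s "${files[1]}")   # previous
  if (( s0 == 0 )); then echo "previous backup is empty"; return 1; fi
  pct=$(( (s1 - s0) * 100 / s0 ))  # integer percent, truncated toward zero
  echo "size change: ${pct}%"
  if (( pct > threshold || pct < -threshold )); then
    return 1
  fi
  return 0
}
```

A non-zero exit is a cue to run the table-size query above and see which tables grew.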
Symptoms:
- Backups not appearing in cloud storage
- Upload errors in logs
Diagnosis:
# Check cloud CLI availability
docker compose exec backup which aws
docker compose exec backup which gsutil
docker compose exec backup which az
# Test cloud credentials
docker compose exec backup aws s3 ls
docker compose exec backup gsutil ls
docker compose exec backup az storage account list
Solutions:
- Missing credentials:
  # Configure AWS credentials
  docker compose exec backup aws configure
  # Or set environment variables (for the backup service, not only the host shell)
  export AWS_ACCESS_KEY_ID=...
  export AWS_SECRET_ACCESS_KEY=...
- Network issues:
  # Test connectivity
  docker compose exec backup curl -I https://s3.amazonaws.com
  # Check proxy settings if behind firewall (set these for the backup service as well)
  export HTTP_PROXY=...
  export HTTPS_PROXY=...
- Bucket permissions:
  # Verify bucket access from the container that performs uploads
  docker compose exec backup aws s3api head-bucket --bucket your-backup-bucket
  # Check IAM permissions for the user/role
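For transient network failures, retrying the upload with backoff is often enough. This generic `retry` function is a hypothetical sketch, not part of the shipped scripts. It doubles the delay after each failed attempt.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command up to MAX times, doubling DELAY
# (seconds) between attempts; returns 1 if every attempt fails.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example: `retry 4 5 aws s3 cp /backups/daily/transcripts_daily_20241026_020000.sql.gz s3://your-bucket/backups/postgres/`. Retrying will not fix credential or permission errors, so check those first.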
- Review backup logs:
  # Check for errors or warnings
  grep -i "error\|warning\|failed" /backups/logs/backup_*.log | tail -50
- Verify backup integrity:
  ./scripts/verify_backup.sh
- Check storage usage:
  du -sh /backups/*
  df -h /backups
- Test restore in staging:
  # Copy latest backup to staging
  latest=$(ls -t /backups/daily/transcripts_daily_*.sql.gz | head -1)
  scp "$latest" staging:/tmp/latest.sql.gz
  # Restore on staging
  ssh staging "cd /app && ./scripts/restore_db.sh --backup-file /tmp/latest.sql.gz --force"
  # Verify staging functionality
- Review retention policy:
  # Count backups by type
  echo "Daily: $(ls /backups/daily/*.sql.gz | wc -l)"
  echo "Weekly: $(ls /backups/weekly/*.sql.gz | wc -l)"
  echo "Monthly: $(ls /backups/monthly/*.sql.gz | wc -l)"
- Audit cloud storage costs:
  # AWS S3
  aws s3 ls s3://your-bucket/backups/ --recursive --summarize --human-readable
  # GCS
  gsutil du -sh gs://your-bucket/backups/
- Disaster recovery drill (see Disaster Recovery Plan)
- Review and update documentation
- Security audit:
  - Review backup access logs
  - Rotate encryption keys
  - Update cloud credentials
  - Review IAM policies
- Capacity planning:
  - Analyze backup growth trends
  - Project storage needs for next quarter
  - Adjust retention policies if needed
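The retention review can be partly automated by comparing file counts per tier with the retention variables documented for backup_db.sh (defaults: 7 daily, 4 weekly, 12 monthly). The sketch below assumes one dump per period and allows one extra file of slack for timing around the prune run. `check_retention` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical audit: count dumps per tier and fail if any tier holds clearly
# more files than its retention setting allows (suggesting pruning is broken).
check_retention() {
  local base="${1:-/backups}" tier limit count status=0
  for tier in daily weekly monthly; do
    case "$tier" in
      daily)   limit="${BACKUP_RETENTION_DAILY:-7}" ;;
      weekly)  limit="${BACKUP_RETENTION_WEEKLY:-4}" ;;
      monthly) limit="${BACKUP_RETENTION_MONTHLY:-12}" ;;
    esac
    # Count plain and .gpg dumps, ignoring .sha256 checksum files
    count=$(find "$base/$tier" -maxdepth 1 -type f -name "transcripts_${tier}_*.sql.gz*" \
      ! -name '*.sha256' 2>/dev/null | wc -l)
    echo "$tier: $count (limit $limit)"
    if (( count > limit + 1 )); then
      echo "  $tier exceeds retention; check that pruning runs" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_retention /backups` with the same BACKUP_RETENTION_* values the backup service uses, so the limits match.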
Upload backup manually (S3):
aws s3 cp /backups/daily/transcripts_daily_20241026_020000.sql.gz \
s3://your-bucket/backups/postgres/ \
--storage-class GLACIER
List backups:
aws s3 ls s3://your-bucket/backups/postgres/ --recursive --human-readable
Download backup (objects in the GLACIER storage class must first be restored with aws s3api restore-object before they can be downloaded):
aws s3 cp s3://your-bucket/backups/postgres/transcripts_daily_20241026_020000.sql.gz \
/tmp/restore.sql.gz
Configure lifecycle policy:
aws s3api put-bucket-lifecycle-configuration \
--bucket your-bucket \
--lifecycle-configuration file://s3-lifecycle.json
Upload backup (GCS):
gsutil cp /backups/daily/transcripts_daily_20241026_020000.sql.gz \
gs://your-bucket/backups/postgres/
List backups:
gsutil ls -lh gs://your-bucket/backups/postgres/
Download backup:
gsutil cp gs://your-bucket/backups/postgres/transcripts_daily_20241026_020000.sql.gz \
/tmp/restore.sql.gz
Set storage class:
gsutil rewrite -s NEARLINE \
gs://your-bucket/backups/postgres/transcripts_daily_20241026_020000.sql.gz
Upload backup (Azure Blob Storage):
az storage blob upload \
--account-name youraccount \
--container-name backups \
--name postgres/transcripts_daily_20241026_020000.sql.gz \
--file /backups/daily/transcripts_daily_20241026_020000.sql.gz \
--tier Cool
List backups:
az storage blob list \
--account-name youraccount \
--container-name backups \
--prefix postgres/ \
--output table
Download backup:
az storage blob download \
--account-name youraccount \
--container-name backups \
--name postgres/transcripts_daily_20241026_020000.sql.gz \
--file /tmp/restore.sql.gz
Quick reference:
# Check last backup
ls -lht /backups/daily/ | head -1
# Manual database backup
docker compose exec backup bash -c "cd /scripts && ./backup_db.sh"
# Manual media backup
docker compose exec backup bash -c "cd /scripts && ./backup_media.sh"
# Verify backups
docker compose exec backup bash -c "cd /scripts && ./verify_backup.sh"
# List available backups
docker compose exec backup bash -c "cd /scripts && ./restore_db.sh --list-backups"
# Check backup service logs
docker compose logs -f backup
# Check storage space
df -h /backups
du -sh /backups/*
Backup file naming convention:
transcripts_{type}_{timestamp}.sql.gz[.gpg]
Where:
type = daily | weekly | monthly
timestamp = YYYYMMDD_HHMMSS
.gpg = optional encryption suffix
Examples:
transcripts_daily_20241026_020000.sql.gz
transcripts_weekly_20241020_020000.sql.gz.gpg
transcripts_monthly_20241001_020000.sql.gz
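Scripts that sort or prune backups can validate names against this convention before acting on them. Below is a minimal parser using bash regular-expression matching. `parse_backup_name` is a hypothetical helper, not part of the shipped scripts.

```bash
#!/usr/bin/env bash
# Hypothetical parser for the naming convention above. Prints
# "type timestamp encrypted(yes|no)", or returns 1 for non-matching names.
parse_backup_name() {
  local name re
  name=$(basename "$1")
  re='^transcripts_(daily|weekly|monthly)_([0-9]{8}_[0-9]{6})\.sql\.gz(\.gpg)?$'
  if [[ "$name" =~ $re ]]; then
    local encrypted=no
    [[ -n "${BASH_REMATCH[3]}" ]] && encrypted=yes
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} $encrypted"
  else
    return 1
  fi
}
```

For example, `parse_backup_name /backups/weekly/transcripts_weekly_20241020_020000.sql.gz.gpg` prints `weekly 20241020_020000 yes`.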
Last Updated: 2024-10-26
Next Review: 2025-01-26