A powerful and flexible Google Drive downloader with pause/resume support, view-only file handling, and extensive testing infrastructure.
- Multiple Download Modes: Standard downloads, view-only file extraction, video downloads
- Pause/Resume System: Robust checkpoint system for large downloads
- View-Only Support: Download view-only PDFs and documents with advanced browser automation
- Video Downloads: Extract streaming videos from Google Drive
- OCR Support: Make PDFs searchable with Tesseract OCR (optional)
- Parallel Processing: Multi-threaded downloads for better performance
- Internationalization: Multi-language support (i18n)
- Comprehensive Testing: 90%+ test coverage with robust testing infrastructure
- Rich CLI Interface: Beautiful command-line interface with progress tracking
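To make the pause/resume idea concrete, here is a minimal sketch of how a checkpoint could work. It is an illustration only and not the code in checkpoint.py: the function names, the JSON layout, and the idea of tracking completed file IDs are all assumptions. The checkpoint is written to a temporary file and then renamed, so an interruption during a save leaves the previous checkpoint intact.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, completed_ids: set) -> None:
    """Atomically write the set of completed file IDs to `path` (hypothetical format)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"completed": sorted(completed_ids)}, handle)
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if the write or rename did not finish.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_checkpoint(path: Path) -> set:
    """Return the completed file IDs, or an empty set if no checkpoint exists."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as handle:
        return set(json.load(handle)["completed"])


def pending_files(all_ids: list, checkpoint_path: Path) -> list:
    """Drop files that a previous run already finished, preserving order."""
    done = load_checkpoint(checkpoint_path)
    return [file_id for file_id in all_ids if file_id not in done]
```

On resume, a downloader built this way would call pending_files() with the full file listing and only fetch what is left.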
- Python 3.8 or higher
- Google Drive API credentials (credentials.json)
- FFmpeg (for video downloads)
- Tesseract OCR (optional, for searchable PDFs)
- Playwright browsers (for view-only downloads)
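The prerequisites above can be checked before a first run. The following is a small sketch, not part of the project: the function name and the result keys are made up for illustration, and it assumes the FFmpeg and Tesseract executables are named ffmpeg and tesseract on PATH. It does not check Playwright browsers.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 8)


def check_prerequisites(project_root: Path = Path(".")) -> dict:
    """Return a mapping of prerequisite name -> whether it looks satisfied."""
    return {
        "python>=3.8": sys.version_info >= MIN_PYTHON,
        "credentials.json": (project_root / "credentials.json").is_file(),
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "tesseract (optional)": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```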
# Clone the repository
git clone https://github.com/yourusername/gd-downloader.git
cd gd-downloader
# Install with test dependencies
pip install -e ".[test]"
# Install optional dependencies
pip install ocrmypdf # For OCR support
pip install playwright # For view-only downloads
playwright install chromium # Install browser
# Or install directly with pip
pip install gd-downloader
- Google Drive API Setup:
  - Create Google Cloud Project
  - Enable Google Drive API
  - Create OAuth 2.0 credentials
  - Download credentials.json to project root
- FFmpeg Setup (for videos):
  # Windows
  choco install ffmpeg
  # macOS
  brew install ffmpeg
  # Linux
  sudo apt-get install ffmpeg
- Tesseract Setup (optional, for OCR):
  # Windows
  choco install tesseract
  # macOS
  brew install tesseract
  # Linux
  sudo apt-get install tesseract-ocr
# Download a single folder
python main.py "https://drive.google.com/drive/folders/YOUR_FOLDER_ID"
# Download to specific directory
python main.py "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" --output "/path/to/downloads"
# Download with progress tracking
python main.py "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" --progress
# Download only documents (skip videos)
python main.py "URL" --only-docs
# Download with OCR support
python main.py "URL" --ocr --ocr-lang "por+eng"
# Download with parallel processing
python main.py "URL" --workers 10
# Download view-only PDFs
python main.py "URL" --view-only
# Download with pause/resume support
python main.py "URL" --checkpoint-interval 10
# Download view-only PDFs with browser automation
python main.py "URL" --view-only --scroll-speed 50
# Download view-only with OCR
python main.py "URL" --view-only --ocr
# Download with custom browser settings
python main.py "URL" --view-only --user-agent "custom-agent-string"
# Download videos from Google Drive
python main.py "URL" --only-videos
# Download with GPU acceleration
python main.py "URL" --only-videos --gpu nvidia
# Download with custom quality
python main.py "URL" --only-videos --quality high
The project includes a comprehensive testing infrastructure designed to ensure reliability and maintainability. For complete testing instructions, see docs/TESTING_GUIDE.md.
# Run quick validation (recommended for development)
python scripts/quick_test.py
# Run all unit tests
python -m pytest tests/unit/ -v
# Run tests with coverage
python -m pytest tests/unit/ --cov=. --cov-report=html
# Run critical tests only (fast)
python -m pytest tests/unit/ -m "critical" -v
# Quick validation script
python scripts/quick_test.py
# Comprehensive functionality test
python scripts/test_functionality.py
# Full test suite with all categories
python run_tests.py --all --coverage
# Run specific test categories
python run_tests.py --unit --integration
python run_tests.py --e2e --performance
- Unit Tests: Individual component testing (tests/unit/)
- Integration Tests: Multi-component interaction testing (tests/integration/)
- End-to-End Tests: Full workflow testing (tests/e2e/)
- Performance Tests: Load and stress testing
- HTML report: htmlcov/index.html
- Terminal report: Use --cov-report=term-missing
- Minimum coverage: 85% for unit tests
gd-downloader/
├── main.py                   # Main application entry point
├── auth_drive.py             # Google Drive authentication
├── downloader.py             # Download logic and orchestration
├── config.py                 # Configuration constants and utilities
├── validators.py             # Input validation functions
├── errors.py                 # Custom exception classes
├── checkpoint.py             # Pause/resume system
├── i18n.py                   # Internationalization system
├── ui.py                     # Rich CLI interface
├── logger.py                 # Advanced logging system
├── requirements.txt          # Production dependencies
├── pyproject.toml            # Project configuration
├── pytest.ini                # Test configuration
├── .gitignore                # Git ignore file
├── README.md                 # This file
├── LICENSE                   # MIT License
│
├── src/                      # Source code
├── docs/                     # Documentation
│   ├── TESTING_GUIDE.md
│   ├── API_REFERENCE.md
│   └── EXAMPLES.md
├── scripts/                  # Utility scripts
│   ├── quick_test.py         # Quick validation script
│   ├── test_functionality.py # Comprehensive functionality test
│   └── cleanup.py            # Cleanup utilities
├── tests/                    # Complete test suite
│   ├── conftest.py           # Global test configuration and fixtures
│   ├── unit/                 # Unit tests for individual modules
│   │   ├── test_basic_validation.py
│   │   ├── test_checkpoint.py
│   │   ├── test_config.py
│   │   ├── test_errors.py
│   │   ├── test_i18n.py
│   │   ├── test_ui.py
│   │   └── test_validators.py
│   ├── integration/          # Integration tests for module interactions
│   ├── e2e/                  # End-to-end tests for complete workflows
│   ├── fixtures/             # Test data and mock factories
│   │   └── mock_data.py
│   └── utils/                # Test utilities and helpers
│       └── test_helpers.py
└── temp/                     # Temporary files
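The layout lists validators.py for input validation and errors.py for custom exceptions. As an illustration of what such a validator might do, the sketch below extracts the folder ID from a URL of the form used in the usage examples. The names extract_folder_id and InvalidURLError are hypothetical, and the allowed ID characters are an assumption; the project's real validators may differ.

```python
import re
from urllib.parse import urlparse


class InvalidURLError(ValueError):
    """Hypothetical error type; errors.py may define something different."""


# Assumption: folder IDs consist of letters, digits, hyphens and underscores.
# The optional "u/<n>/" segment covers URLs opened from a secondary account.
_FOLDER_PATH = re.compile(r"^/drive/(?:u/\d+/)?folders/([A-Za-z0-9_-]+)/?$")


def extract_folder_id(url: str) -> str:
    """Return the folder ID from a Google Drive folder URL, or raise."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.netloc != "drive.google.com":
        raise InvalidURLError(f"Not a Google Drive URL: {url!r}")
    # Query strings such as ?usp=sharing live in parsed.query, not parsed.path.
    match = _FOLDER_PATH.match(parsed.path)
    if match is None:
        raise InvalidURLError(f"Not a folder URL: {url!r}")
    return match.group(1)
```

Rejecting bad input early like this lets the CLI report a clear error before any authentication or network call is attempted.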
# Google Drive API
export GOOGLE_CLIENT_ID="your_client_id"
export GOOGLE_CLIENT_SECRET="your_client_secret"
# Download settings
export DEFAULT_WORKERS=5
export MAX_RETRY_ATTEMPTS=5
export DOWNLOAD_TIMEOUT=300
# OCR settings
export OCR_DEFAULT_LANG="por+eng"
export OCR_TESSERACT_PATH="/usr/bin/tesseract"
Create config_local.py for custom settings:
# Custom configuration
DEFAULT_WORKERS = 10
MAX_DOWNLOAD_SIZE = 5 * 1024 * 1024 * 1024 # 5GB
ENABLE_OCR = True
OCR_LANGUAGES = ["por", "eng", "spa"]
The project supports multiple languages. Current language files are in the lang/ directory.
- Create language file: lang/your_code.lang
- Add translations following the JSON format
- Update i18n.py to include the new language
- English (en) - Default
- Portuguese (por)
- Spanish (spa)
- French (fra) - Coming soon
- German (deu) - Coming soon
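Since language files are JSON stored as lang/<code>.lang, loading one with a fallback to the default language can be sketched as follows. This is not the code in i18n.py: the function names are hypothetical, and it assumes each file is a flat JSON object mapping message keys to strings.

```python
import json
from pathlib import Path


def load_translations(lang_dir: Path, code: str, default: str = "en") -> dict:
    """Load `<code>.lang`, layered over the default language's messages."""

    def read(lang_code: str) -> dict:
        path = lang_dir / f"{lang_code}.lang"
        if not path.is_file():
            return {}
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)

    messages = read(default)  # default-language strings act as the fallback
    if code != default:
        messages.update(read(code))  # override wherever a translation exists
    return messages


def translate(messages: dict, key: str) -> str:
    """Return the message for `key`, or the key itself if it is unknown."""
    return messages.get(key, key)
```

With this layering, a partially translated file still yields a complete message set, which suits languages that are listed as coming soon.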
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
# Clone the repository
git clone https://github.com/yourusername/gd-downloader.git
cd gd-downloader
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -e ".[test,dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
python scripts/quick_test.py
python -m pytest tests/unit/ -v
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
- Testing Guide - Comprehensive testing instructions
- API Reference - API documentation
- Examples - Usage examples
- Troubleshooting - Common issues and solutions
Error: Invalid credentials.json format
Solution: Ensure credentials.json is properly formatted JSON
Error: Permission denied
Solution: Check file permissions and disk space
Error: Browser automation failed
Solution: Install Playwright: pip install playwright && playwright install
Error: Tesseract not found
Solution: Install Tesseract OCR: choco install tesseract (Windows), brew install tesseract (macOS), or sudo apt-get install tesseract-ocr (Linux)
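For the credentials.json error in particular, a quick standalone check can narrow down what is wrong. The sketch below is not part of the project and the function name is hypothetical. It relies on the layout of OAuth client files downloaded from the Google Cloud console, which wrap client_id and client_secret in an "installed" or "web" section.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("client_id", "client_secret")


def check_credentials_file(path: Path) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    # Desktop-app credentials use "installed"; web-app credentials use "web".
    section = data.get("installed") or data.get("web")
    if not isinstance(section, dict):
        return ['missing "installed" or "web" section']
    return [f"missing field: {name}" for name in REQUIRED_FIELDS if not section.get(name)]
```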
- Check the Troubleshooting Guide
- Search Issues
- Read the FAQ
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Drive API for file access
- Playwright for browser automation
- Rich for CLI interface
- PyAutoGUI for scroll simulation
- OCRmyPDF for searchable PDFs
- Web interface (Flask/FastAPI)
- REST API for remote access
- Desktop application (Electron/Tkinter)
- Cloud storage integration (Dropbox, OneDrive)
- Torrent client integration
- Machine learning for file categorization
For support and questions:
- Create an Issue
- Check the Documentation
- Join our Discord community
Made with ❤️ for the Google Drive community