EgidioBezerra/GD-Downloader

GD-Downloader

A powerful and flexible Google Drive downloader with pause/resume support, view-only file handling, and extensive testing infrastructure.

🌟 Features

  • ✅ Multiple Download Modes: Standard downloads, view-only file extraction, video downloads
  • ✅ Pause/Resume System: Robust checkpoint system for large downloads
  • ✅ View-Only Support: Download view-only PDFs and documents with advanced browser automation
  • ✅ Video Downloads: Extract streaming videos from Google Drive
  • ✅ OCR Support: Make PDFs searchable with Tesseract OCR (optional)
  • ✅ Parallel Processing: Multi-threaded downloads for better performance
  • ✅ Internationalization: Multi-language support (i18n)
  • ✅ Comprehensive Testing: 90%+ test coverage with robust testing infrastructure
  • ✅ Rich CLI Interface: Beautiful command-line interface with progress tracking

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Google Drive API credentials (credentials.json)
  • FFmpeg (for video downloads)
  • Tesseract OCR (optional, for searchable PDFs)
  • Playwright browsers (for view-only downloads)

Installation

Option 1: Install from Source (Recommended)

# Clone the repository
git clone https://github.com/EgidioBezerra/GD-Downloader.git
cd GD-Downloader

# Install with test dependencies (the quotes stop shells such as zsh from expanding the brackets)
pip install -e ".[test]"

# Install optional dependencies
pip install ocrmypdf  # For OCR support
pip install playwright  # For view-only downloads
playwright install chromium  # Install browser

Option 2: Install Basic Version

pip install gd-downloader

Configuration

  1. Google Drive API Setup:

    • Create a Google Cloud project
    • Enable the Google Drive API for that project
    • Create OAuth 2.0 client credentials
    • Download the client file and save it as credentials.json in the project root
  2. FFmpeg Setup (for videos):

    # Windows
    choco install ffmpeg
    
    # macOS
    brew install ffmpeg
    
    # Linux (Debian/Ubuntu)
    sudo apt-get install ffmpeg
  3. Tesseract Setup (optional, for OCR):

    # Windows
    choco install tesseract
    
    # macOS
    brew install tesseract
    
    # Linux (Debian/Ubuntu)
    sudo apt-get install tesseract-ocr
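Before the first run it can help to verify these prerequisites in one go. The script below is a minimal sketch and is not part of GD-Downloader; the file name check_setup.py and both function names are hypothetical. It relies only on the standard layout of Google OAuth client files (an "installed" or "web" object holding client_id and client_secret) and on shutil.which to look for ffmpeg and tesseract on PATH. Tesseract is optional, so a warning about it only matters if you plan to use OCR.

```python
"""check_setup.py: hypothetical pre-flight check, not part of GD-Downloader."""
import json
import shutil
from pathlib import Path


def check_credentials(path="credentials.json"):
    """Return a list of problems with an OAuth client file (empty list means it looks OK)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} not found"]
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{p} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{p} must contain a JSON object"]
    # Google OAuth client files nest their fields under "installed" (desktop app)
    # or "web" (web application).
    client = data.get("installed") or data.get("web")
    if not isinstance(client, dict):
        return [f'{p} has no "installed" or "web" section']
    return [f"{p} is missing {key}" for key in ("client_id", "client_secret")
            if not client.get(key)]


def missing_tools(tools=("ffmpeg", "tesseract")):
    """Return the external programs from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    problems = check_credentials()
    problems += [f"{tool} not found on PATH" for tool in missing_tools()]
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("All prerequisites found.")
```

Running python check_setup.py from the project root prints one warning per missing or malformed prerequisite.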

📖 Usage

Basic Download

# Download a single folder
python main.py "https://drive.google.com/drive/folders/YOUR_FOLDER_ID"

# Download to specific directory
python main.py "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" --output "/path/to/downloads"

# Download with progress tracking
python main.py "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" --progress
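The tool takes a full Drive link rather than a bare ID. The helper below is a hypothetical sketch of how such a link can be reduced to its ID using only the standard library. It is not the project's validators.py, and it covers only the common link shapes listed in its patterns.

```python
import re
from urllib.parse import parse_qs, urlparse

# Drive IDs are URL-safe strings of letters, digits, "-" and "_".
_ID = r"[A-Za-z0-9_-]+"
_PATTERNS = [
    ("folder", re.compile(rf"/drive/(?:u/\d+/)?folders/({_ID})")),                 # folder links
    ("file", re.compile(rf"/file/d/({_ID})")),                                      # file links
    ("file", re.compile(rf"/(?:document|spreadsheets|presentation)/d/({_ID})")),   # Docs editors
]


def extract_drive_id(url):
    """Return (kind, id) for a Google Drive URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.netloc not in ("drive.google.com", "docs.google.com"):
        raise ValueError(f"not a Google Drive URL: {url}")
    for kind, pattern in _PATTERNS:
        match = pattern.search(parsed.path)
        if match:
            return kind, match.group(1)
    # Older links carry the ID in the query string, e.g. /open?id=...
    ids = parse_qs(parsed.query).get("id")
    if ids and re.fullmatch(_ID, ids[0]):
        return "unknown", ids[0]
    raise ValueError(f"no Drive ID found in: {url}")
```

For example, extract_drive_id("https://drive.google.com/drive/folders/1AbC_d-E") returns ("folder", "1AbC_d-E").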

Advanced Options

# Download only documents (skip videos)
python main.py "URL" --only-docs

# Download with OCR support
python main.py "URL" --ocr --ocr-lang "por+eng"

# Download with parallel processing
python main.py "URL" --workers 10

# Download view-only PDFs
python main.py "URL" --view-only

# Download with pause/resume support
python main.py "URL" --checkpoint-interval 10
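The --workers and --checkpoint-interval options combine two ideas. Several downloads run at once, and the set of finished files is saved periodically so an interrupted run can resume. The sketch below illustrates that pattern with concurrent.futures and an atomically replaced JSON file. The Checkpoint class, its JSON layout and download_all are hypothetical and are not the project's checkpoint.py. The sketch also assumes the interval counts completed files.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


class Checkpoint:
    """Remembers finished file IDs in a JSON file so a rerun can skip them."""

    def __init__(self, path, interval=10):
        self.path = path
        self.interval = interval      # save after this many completions (assumed unit: files)
        self.done = set()
        self._pending = 0
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.done = set(json.load(f)["completed"])

    def mark_done(self, file_id):
        self.done.add(file_id)
        self._pending += 1
        if self._pending >= self.interval:
            self.save()

    def save(self):
        # Write a temporary file first, then atomically replace the old checkpoint,
        # so an interruption never leaves a half-written file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"completed": sorted(self.done)}, f)
        os.replace(tmp, self.path)
        self._pending = 0


def download_all(file_ids, download_one, checkpoint, workers=5):
    """Run download_one(file_id) in parallel for every ID not yet in the checkpoint."""
    todo = [fid for fid in file_ids if fid not in checkpoint.done]
    failed = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_one, fid): fid for fid in todo}
        for future in as_completed(futures):
            fid = futures[future]
            try:
                future.result()
            except Exception as exc:        # record and continue; retried on the next run
                failed[fid] = exc
            else:
                checkpoint.mark_done(fid)   # called only from this (main) thread
    checkpoint.save()                       # flush anything not yet saved
    return failed
```

Because only the main thread touches the checkpoint, no lock is needed. A rerun with the same checkpoint file submits only the IDs that did not finish.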

View-Only Downloads

# Download view-only PDFs with browser automation
python main.py "URL" --view-only --scroll-speed 50

# Download view-only with OCR
python main.py "URL" --view-only --ocr

# Download with custom browser settings
python main.py "URL" --view-only --user-agent "custom-agent-string"
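The language string passed to --ocr-lang appears to follow Tesseract's convention of joining language codes with "+". As a hedged illustration of what the OCR step involves, the sketch below shells out to the ocrmypdf command-line tool. Its -l and --skip-text options are real OCRmyPDF flags. The helper names are hypothetical, and GD-Downloader may call OCRmyPDF differently, for example through its Python API.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, langs=("por", "eng")):
    """Build an OCRmyPDF command that adds a searchable text layer to src."""
    return [
        "ocrmypdf",
        "-l", "+".join(langs),   # Tesseract language codes joined with "+"
        "--skip-text",           # leave pages that already contain text untouched
        str(src),
        str(dst),
    ]


def make_searchable(src, dst, langs=("por", "eng")):
    """Run OCRmyPDF on src and write the searchable PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed (pip install ocrmypdf)")
    subprocess.run(build_ocr_command(src, dst, langs), check=True)
```

Keeping command construction separate from execution makes the exact arguments easy to inspect and test without Tesseract installed.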

Video Downloads

# Download videos from Google Drive
python main.py "URL" --only-videos

# Download with GPU acceleration
python main.py "URL" --only-videos --gpu nvidia

# Download with custom quality
python main.py "URL" --only-videos --quality high

🧪 Testing

The project includes a comprehensive testing infrastructure designed to ensure reliability and maintainability. For complete testing instructions, see docs/TESTING_GUIDE.md.

Quick Tests

# Run quick validation (recommended for development)
python scripts/quick_test.py

# Run all unit tests
python -m pytest tests/unit/ -v

# Run tests with coverage
python -m pytest tests/unit/ --cov=. --cov-report=html

# Run critical tests only (fast)
python -m pytest tests/unit/ -m "critical" -v

Test Scripts

# Quick validation script
python scripts/quick_test.py

# Comprehensive functionality test
python scripts/test_functionality.py

# Full test suite with all categories
python run_tests.py --all --coverage

# Run specific test categories
python run_tests.py --unit --integration
python run_tests.py --e2e --performance

Test Categories

  • Unit Tests: Individual component testing (tests/unit/)
  • Integration Tests: Multi-component interaction testing (tests/integration/)
  • End-to-End Tests: Full workflow testing (tests/e2e/)
  • Performance Tests: Load and stress testing

Coverage Reports

  • HTML report: htmlcov/index.html
  • Terminal report: Use --cov-report=term-missing
  • Minimum coverage: 85% for unit tests

📁 Project Structure

gd-downloader/
├── main.py                 # Main application entry point
├── auth_drive.py           # Google Drive authentication
├── downloader.py           # Download logic and orchestration
├── config.py               # Configuration constants and utilities
├── validators.py           # Input validation functions
├── errors.py               # Custom exception classes
├── checkpoint.py           # Pause/resume system
├── i18n.py                 # Internationalization system
├── ui.py                   # Rich CLI interface
├── logger.py               # Advanced logging system
├── requirements.txt        # Production dependencies
├── pyproject.toml          # Project configuration
├── pytest.ini              # Test configuration
├── .gitignore              # Git ignore file
├── README.md               # This file
├── LICENSE                 # MIT License
├── src/                    # Source code
├── docs/                   # Documentation
│   ├── TESTING_GUIDE.md
│   ├── API_REFERENCE.md
│   └── EXAMPLES.md
├── scripts/                # Utility scripts
│   ├── quick_test.py       # Quick validation script
│   ├── test_functionality.py # Comprehensive functionality test
│   └── cleanup.py          # Cleanup utilities
├── tests/                  # Complete test suite
│   ├── conftest.py         # Global test configuration and fixtures
│   ├── unit/               # Unit tests for individual modules
│   │   ├── test_basic_validation.py
│   │   ├── test_checkpoint.py
│   │   ├── test_config.py
│   │   ├── test_errors.py
│   │   ├── test_i18n.py
│   │   ├── test_ui.py
│   │   └── test_validators.py
│   ├── integration/        # Integration tests for module interactions
│   ├── e2e/                # End-to-end tests for complete workflows
│   ├── fixtures/           # Test data and mock factories
│   │   └── mock_data.py
│   └── utils/              # Test utilities and helpers
│       └── test_helpers.py
└── temp/                   # Temporary files

🔧 Configuration

Environment Variables

# Google Drive API
export GOOGLE_CLIENT_ID="your_client_id"
export GOOGLE_CLIENT_SECRET="your_client_secret"

# Download settings
export DEFAULT_WORKERS=5
export MAX_RETRY_ATTEMPTS=5
export DOWNLOAD_TIMEOUT=300

# OCR settings
export OCR_DEFAULT_LANG="por+eng"
export OCR_TESSERACT_PATH="/usr/bin/tesseract"
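Environment variables always arrive as strings, so whatever reads them has to convert and validate them. The sketch below shows a minimal version of that step. The fallback values mirror the example exports above and are assumptions, not confirmed project defaults. DOWNLOAD_TIMEOUT is also assumed to be in seconds. env_int and load_settings are hypothetical names.

```python
import os


def env_int(name, default, minimum=1):
    """Read an integer setting from the environment, or return `default` if unset."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be at least {minimum}, got {value}")
    return value


def load_settings():
    # Fallbacks mirror the example exports; they are assumptions,
    # not confirmed project defaults.
    return {
        "workers": env_int("DEFAULT_WORKERS", 5),
        "max_retries": env_int("MAX_RETRY_ATTEMPTS", 5),
        "timeout_seconds": env_int("DOWNLOAD_TIMEOUT", 300),     # unit assumed
        "ocr_lang": os.environ.get("OCR_DEFAULT_LANG", "por+eng"),
        "tesseract_path": os.environ.get("OCR_TESSERACT_PATH"),  # None: search PATH
    }
```

With this approach, a typo such as DEFAULT_WORKERS=five fails at startup with a clear message instead of partway through a download.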

Configuration File

Create config_local.py for custom settings:

# Custom configuration
DEFAULT_WORKERS = 10
MAX_DOWNLOAD_SIZE = 5 * 1024 * 1024 * 1024  # 5 GiB
ENABLE_OCR = True
OCR_LANGUAGES = ["por", "eng", "spa"]
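One common way to honor an optional config_local.py is to import it if it exists and copy its uppercase names over the defaults. The sketch below assumes that convention. apply_overrides is a hypothetical helper, and the project may instead use a plain `from config_local import *` at the end of config.py.

```python
import importlib


def apply_overrides(config, module_name="config_local"):
    """Copy UPPERCASE names from an optional override module onto `config`.

    Returns the overridden names (an empty list if the module does not exist).
    """
    try:
        local = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            raise        # the override file exists, but one of *its* imports failed
        return []        # no override file: keep the defaults
    applied = []
    for name in dir(local):
        if name.isupper():   # constants such as DEFAULT_WORKERS or OCR_LANGUAGES
            setattr(config, name, getattr(local, name))
            applied.append(name)
    return applied
```

If config.py called this as apply_overrides(sys.modules[__name__]), config_local.py could replace DEFAULT_WORKERS and the other constants without editing config.py. Lowercase helper names in the override file would be left alone.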

🌍 Internationalization

The project supports multiple languages. Current language files are in the lang/ directory.

Adding New Languages

  1. Create a language file: lang/your_code.lang
  2. Add the translations in the same JSON format as the existing files
  3. Register the new language in i18n.py
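The layout of the .lang files is not documented here beyond being JSON. The sketch below assumes a flat object that maps message keys to text with str.format placeholders, and shows a loader that falls back to English. Translator and the key names are hypothetical and are not the API of i18n.py.

```python
import json
from pathlib import Path


class Translator:
    """Looks up message keys in lang/<code>.lang JSON files, falling back to English."""

    def __init__(self, lang_dir="lang", code="en", fallback="en"):
        self.tables = [self._load(Path(lang_dir), code)]
        if code != fallback:
            self.tables.append(self._load(Path(lang_dir), fallback))

    @staticmethod
    def _load(lang_dir, code):
        path = lang_dir / f"{code}.lang"
        if not path.is_file():
            return {}            # unknown language: rely on the fallback table
        return json.loads(path.read_text(encoding="utf-8"))

    def t(self, key, **params):
        for table in self.tables:
            if key in table:
                return table[key].format(**params)
        return key               # missing everywhere: show the key so the gap is visible
```

With this fallback design, a partially translated file still produces a usable interface. Untranslated keys appear in English rather than causing an error.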

Supported Languages

  • English (en) - Default
  • Portuguese (por)
  • Spanish (spa)
  • French (fra) - Coming soon
  • German (deu) - Coming soon

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/EgidioBezerra/GD-Downloader.git
cd GD-Downloader

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -e ".[test,dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
python scripts/quick_test.py
python -m pytest tests/unit/ -v

Submitting Changes

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

📚 Documentation

🐛 Troubleshooting

Common Issues

1. Authentication Errors

Error: Invalid credentials.json format
Solution: Ensure credentials.json is properly formatted JSON

2. Download Failures

Error: Permission denied
Solution: Check file permissions and disk space
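Both causes can be checked before a large download starts. The sketch below is a hypothetical helper, not part of GD-Downloader. It uses shutil.disk_usage for free space and tests write permission by actually creating a throwaway temporary file in the output directory.

```python
import os
import shutil
import tempfile


def preflight(output_dir, required_bytes):
    """Return reasons a download into output_dir would fail (empty list means OK)."""
    try:
        os.makedirs(output_dir, exist_ok=True)
        # Creating a real file is the most direct test of write permission.
        with tempfile.TemporaryFile(dir=output_dir):
            pass
    except OSError as exc:
        return [f"cannot write to {output_dir}: {exc}"]
    free = shutil.disk_usage(output_dir).free
    if free < required_bytes:
        return [f"need {required_bytes} bytes but only {free} free in {output_dir}"]
    return []
```

The required size could come from the file sizes reported by the Drive API, though exported or view-only files may not report one.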

3. View-Only Issues

Error: Browser automation failed
Solution: Install Playwright: pip install playwright && playwright install

4. OCR Issues

Error: Tesseract not found
Solution: Install Tesseract OCR (Windows: choco install tesseract; macOS: brew install tesseract; Debian/Ubuntu: sudo apt-get install tesseract-ocr)

Getting Help

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Credits

  • Google Drive API for file access
  • Playwright for browser automation
  • Rich for CLI interface
  • PyAutoGUI for scroll simulation
  • OCRmyPDF for searchable PDFs

📈 Roadmap

  • Web interface (Flask/FastAPI)
  • REST API for remote access
  • Desktop application (Electron/Tkinter)
  • Cloud storage integration (Dropbox, OneDrive)
  • Torrent client integration
  • Machine learning for file categorization

📞 Support

For support and questions:


Made with ❤️ for the Google Drive community
