
p0sadas/Professional-Unsplash-Image-Scraper


πŸ–ΌοΈ Unsplash Image Scraper

Download free high-quality images from Unsplash with ease



English


✨ Features

  • 🔍 Smart Search - Search for any topic and download related images
  • 🚀 Automated Scrolling - Automatically loads more images to meet your requirements
  • 📦 Batch Download - Download multiple images in one go
  • ⚙️ Configurable - Easy to customize settings and parameters
  • 🎯 CLI Support - Both command-line and interactive modes
  • 📝 Comprehensive Logging - Track the scraping process with detailed logs
  • 🧹 Clean Code - Well-structured, documented, and following PEP 8 standards
  • 🔒 Error Handling - Robust error handling for network issues and timeouts
  • 🎨 Free License Only - Only downloads images with free licenses from Unsplash

🔧 Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.8 or higher (Download Python)
  • Google Chrome Browser (latest version recommended)
  • ChromeDriver - Will be automatically managed by Selenium

Note: Selenium WebDriver downloads and manages ChromeDriver automatically, so no manual driver installation is needed.

📥 Installation

  1. Clone the repository
git clone https://github.com/p0sadas/unsplash-image-scraper.git
cd unsplash-image-scraper
  2. Create a virtual environment (recommended)
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt

🚀 Usage

Interactive Mode

Simply run the main script without arguments:

python main.py

You'll be prompted to enter:

  • Search query (e.g., "mountains", "technology", "animals")
  • Number of images to download

Command-Line Mode

# Basic usage (runs in headless mode by default)
python main.py -q "cats" -n 10

# With custom output directory
python main.py -q "nature" -n 25 -o "my_images"

# Show browser window (disable headless mode)
python main.py -q "technology" -n 15 --no-headless

Available Arguments

| Argument | Short | Description | Required |
|---|---|---|---|
| --query | -q | Search query (e.g., 'cat', 'nature') | No* |
| --num-images | -n | Number of images to download | No* |
| --output | -o | Output directory (default: downloads) | No |
| --no-headless | - | Show browser window (headless is default) | No |
| --help | -h | Show help message | No |

*If not provided, interactive mode will be used.
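
The argument table above could be wired up roughly as follows. This is a minimal sketch, not the project's actual main.py: the function names `build_parser` and `resolve_inputs` and the prompt strings are assumptions made for illustration. Only the flags and the `downloads` default come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags."""
    parser = argparse.ArgumentParser(description="Download images from Unsplash")
    parser.add_argument("-q", "--query", help="Search query (e.g. 'cat', 'nature')")
    parser.add_argument("-n", "--num-images", type=int, help="Number of images to download")
    parser.add_argument("-o", "--output", default="downloads", help="Output directory")
    parser.add_argument("--no-headless", action="store_true", help="Show browser window")
    return parser


def resolve_inputs(args, ask=input):
    """Fall back to interactive prompts for values missing on the command line.

    `ask` is injectable so the prompt logic can be exercised without a terminal.
    """
    query = args.query or ask("Search query: ").strip()
    num_images = args.num_images
    if num_images is None:
        num_images = int(ask("Number of images: "))
    return query, num_images
```

With this layout, `build_parser().parse_args(["-q", "cats", "-n", "10"])` yields `query="cats"` and `num_images=10`, while an empty argument list leaves both as `None` and triggers the prompts.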

βš™οΈ Configuration

You can customize the scraper behavior by modifying src/config.py:

# Timeouts
WEBDRIVER_TIMEOUT = 20  # seconds
SCROLL_PAUSE_TIME = 0.3  # seconds between scrolls

# Output
DOWNLOAD_DIR = BASE_DIR / "downloads"
IMAGE_FORMAT = "jpg"

# Logging
LOG_LEVEL = "INFO"  # DEBUG, INFO, WARNING, ERROR
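
To show what SCROLL_PAUSE_TIME might control, here is a hedged sketch of a scroll loop. It is not the code in src/unsplash_scraper.py: `scroll_until`, `count_images`, and `max_scrolls` are hypothetical names. The only Selenium call used is `driver.execute_script`.

```python
import time

SCROLL_PAUSE_TIME = 0.3  # seconds between scrolls, same value as in src/config.py


def scroll_until(driver, count_images, target, pause=SCROLL_PAUSE_TIME, max_scrolls=50):
    """Scroll to the bottom of the page until `target` images are loaded.

    `driver` is any object exposing Selenium's `execute_script` method.
    `count_images` is a hypothetical callable returning the number of
    images currently present on the page. Returns the final image count.
    """
    found = count_images()
    for _ in range(max_scrolls):
        if found >= target:
            break
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded images time to appear
        found = count_images()
    return found
```

A larger pause gives slow connections more time to load each batch at the cost of a longer run. The `max_scrolls` cap keeps the loop from running forever when a query has fewer results than requested.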

πŸ“ Project Structure

unsplash-image-scraper/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ __init__.py           # Package initialization
β”‚   β”œβ”€β”€ config.py             # Configuration settings
β”‚   └── unsplash_scraper.py   # Main scraper class
β”œβ”€β”€ downloads/                # Downloaded images (created automatically)
β”œβ”€β”€ main.py                   # Entry point script
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ .gitignore               # Git ignore rules
β”œβ”€β”€ LICENSE                  # MIT License
└── README.md               # This file

💡 Examples

Example 1: Download Cat Images

python main.py -q "cats" -n 20

Output:

πŸ” Searching for 'cats'...
πŸ“Š Target: 20 images
πŸ“ Output: C:\path\to\downloads

βœ… Found 20 images
πŸ“₯ Downloading images...

✨ Successfully downloaded 20 images!
πŸ“‚ Images saved to: C:\path\to\downloads

Example 2: Using as a Python Module

from src.unsplash_scraper import UnsplashScraper
from pathlib import Path

# Create scraper instance
with UnsplashScraper(headless=True) as scraper:
    # Scrape image URLs
    urls = scraper.scrape_images("mountains", num_images=10)

    # Download images
    output = Path("my_mountains")
    scraper.download_images(urls, output_dir=output)

print(f"Downloaded {len(urls)} images!")
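
The `with` statement in the example above relies on the context-manager protocol, which lets the browser be closed even if scraping raises. The sketch below illustrates that pattern only. `ManagedScraper` and `driver_factory` are hypothetical stand-ins, and the real UnsplashScraper may be implemented differently.

```python
class ManagedScraper:
    """Hypothetical stand-in that illustrates the context-manager protocol."""

    def __init__(self, driver_factory):
        # driver_factory is any zero-argument callable returning a driver,
        # for example selenium.webdriver.Chrome in a real setup.
        self._driver_factory = driver_factory
        self.driver = None

    def __enter__(self):
        self.driver = self._driver_factory()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.driver is not None:
            self.driver.quit()  # quit() ends a Selenium browser session
            self.driver = None
        return False  # let any exception propagate to the caller
```

Returning `False` from `__exit__` means errors inside the block still surface, but only after the browser has been shut down.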

Example 3: Run with Browser Visible

# Show the browser window (useful for debugging)
python main.py -q "abstract art" -n 30 --no-headless

πŸ” Troubleshooting

Issue: "ChromeDriver not found"

Solution: Selenium 4.16+ automatically manages ChromeDriver. Ensure you have the latest version:

pip install --upgrade selenium

Issue: "TimeoutException"

Solution: This usually means the page took too long to load. Try:

  • Increasing WEBDRIVER_TIMEOUT in src/config.py
  • Checking your internet connection
  • Ensuring Unsplash is accessible in your region

Issue: "No images found"

Solution:

  • Try a different search query
  • Ensure you're searching for topics that exist on Unsplash
  • Check if Unsplash has changed their page structure (XPath selectors may need updating)

Issue: "Download fails for some images"

Solution: This is normal - some images may be temporarily unavailable. The scraper will log errors and continue with other images.
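
The "log and continue" behavior can be sketched as follows. This is an illustration using only the standard library, not the scraper's own download code: the names `fetch_bytes` and `download_all`, the logger name, and the `image_001.jpg` file naming are all assumptions.

```python
import logging
from pathlib import Path
from urllib.request import urlopen

logger = logging.getLogger("unsplash_scraper")  # hypothetical logger name


def fetch_bytes(url, timeout=20):
    """Fetch the raw bytes behind one URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, output_dir, fetch=fetch_bytes, image_format="jpg"):
    """Save each URL into output_dir, logging failures and moving on."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        try:
            data = fetch(url)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            logger.error("Skipping %s: %s", url, exc)
            continue
        path = output_dir / f"image_{index:03d}.{image_format}"
        path.write_bytes(data)
        saved.append(path)
    return saved
```

Because a failed image is skipped rather than retried, the returned list can be shorter than the input. Comparing the two lengths shows how many downloads failed.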

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Please ensure your code follows PEP 8 style guidelines and includes appropriate documentation.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Disclaimer

This tool is for educational purposes only. Please respect Unsplash's Terms of Service and API Guidelines. Always give credit to photographers when using their images.

πŸ™ Acknowledgments

  • Unsplash for providing free high-quality images
  • Selenium for web automation capabilities
  • The open-source community for inspiration and support

Made with ❤️ by Angel Posadas

⭐ Star this repo if you found it helpful!
