- Features
- Prerequisites
- Installation
- Usage
- Configuration
- Project Structure
- Examples
- Troubleshooting
- Contributing
- License
- Acknowledgments
- Smart Search - Search for any topic and download related images
- Automated Scrolling - Automatically loads more images to meet your requirements
- Batch Download - Download multiple images in one go
- Configurable - Easy to customize settings and parameters
- CLI Support - Both command-line and interactive modes
- Comprehensive Logging - Track the scraping process with detailed logs
- Clean Code - Well-structured, documented, and following PEP 8 standards
- Error Handling - Robust error handling for network issues and timeouts
- Free License Only - Only downloads images with free licenses from Unsplash
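To illustrate the automated scrolling feature, here is a minimal sketch of a scroll-until-enough loop. It is not taken from this project's source: `scroll_until`, `count_images`, and `max_scrolls` are hypothetical names, and the only Selenium call assumed is `execute_script`, which any WebDriver provides.

```python
import time
from typing import Callable, Protocol


class ScriptRunner(Protocol):
    """Anything exposing Selenium's execute_script method (e.g. a WebDriver)."""

    def execute_script(self, script: str, *args): ...


def scroll_until(
    driver: ScriptRunner,
    count_images: Callable[[], int],
    target: int,
    pause: float = 0.3,
    max_scrolls: int = 50,
) -> int:
    """Scroll to the page bottom until enough images are loaded.

    Returns the number of images counted when scrolling stopped.
    """
    found = count_images()
    scrolls = 0
    while found < target and scrolls < max_scrolls:
        # Jump to the bottom so the page lazy-loads the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new images time to appear
        found = count_images()
        scrolls += 1
    return found
```

The `max_scrolls` cap is a design choice in this sketch: it keeps the loop from running forever when a search has fewer results than requested.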
Before you begin, ensure you have the following installed:
- Python 3.8 or higher (Download Python)
- Google Chrome Browser (latest version recommended)
- ChromeDriver - Will be automatically managed by Selenium
Note: This scraper uses Selenium WebDriver which will automatically download and manage ChromeDriver for you.
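If you want to verify the prerequisites before installing, a small standalone check along these lines may help. It is only a sketch and not part of the project: the function names and the list of Chrome executable names are assumptions, and on Windows and macOS Chrome is often installed without being on `PATH`, so a "not found" result is not conclusive.

```python
import shutil
import sys
from typing import Optional, Sequence, Tuple

# Executable names to look for; these are guesses and vary by platform.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable", "chrome", "chromium")


def python_ok(version: Tuple[int, int] = sys.version_info[:2],
              minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the interpreter version meets the minimum."""
    return tuple(version) >= minimum


def find_chrome(candidates: Sequence[str] = CHROME_CANDIDATES) -> Optional[str]:
    """Return the path of the first Chrome-like executable found on PATH."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python 3.8+:", "yes" if python_ok() else "no")
    print("Chrome on PATH:", find_chrome() or "not found (it may still be installed)")
```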
- Clone the repository

git clone https://github.com/p0sadas/unsplash-image-scraper.git
cd unsplash-image-scraper

- Create a virtual environment (recommended)

# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate

- Install dependencies

pip install -r requirements.txt

For interactive mode, run the main script without arguments:

python main.py

You'll be prompted to enter:
- Search query (e.g., "mountains", "technology", "animals")
- Number of images to download
# Basic usage (runs in headless mode by default)
python main.py -q "cats" -n 10
# With custom output directory
python main.py -q "nature" -n 25 -o "my_images"
# Show browser window (disable headless mode)
python main.py -q "technology" -n 15 --no-headless

| Argument | Short | Description | Required |
|---|---|---|---|
| `--query` | `-q` | Search query (e.g., 'cat', 'nature') | No* |
| `--num-images` | `-n` | Number of images to download | No* |
| `--output` | `-o` | Output directory (default: downloads) | No |
| `--no-headless` | - | Show browser window (headless is default) | No |
| `--help` | `-h` | Show help message | No |

*If not provided, interactive mode will be used.
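For readers curious how a command line like this can be wired up, the following sketch mirrors the argument table with the standard `argparse` module. It is an illustration under assumptions, not the actual contents of `main.py`: `build_parser` and `wants_interactive` are hypothetical names, and the real script may decide when to fall back to interactive mode differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented arguments (-h is added by argparse)."""
    parser = argparse.ArgumentParser(description="Download images from Unsplash")
    parser.add_argument("-q", "--query", help="Search query (e.g., 'cat', 'nature')")
    parser.add_argument("-n", "--num-images", type=int,
                        help="Number of images to download")
    parser.add_argument("-o", "--output", default="downloads",
                        help="Output directory (default: downloads)")
    parser.add_argument("--no-headless", action="store_true",
                        help="Show browser window (headless is default)")
    return parser


def wants_interactive(args: argparse.Namespace) -> bool:
    """Assumed rule: prompt the user when the query or the count is missing."""
    return args.query is None or args.num_images is None


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print("interactive" if wants_interactive(parsed) else f"query={parsed.query!r}")
```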
You can customize the scraper behavior by modifying src/config.py:
# Timeouts
WEBDRIVER_TIMEOUT = 20 # seconds
SCROLL_PAUSE_TIME = 0.3 # seconds between scrolls
# Output
DOWNLOAD_DIR = BASE_DIR / "downloads"
IMAGE_FORMAT = "jpg"
# Logging
LOG_LEVEL = "INFO"  # DEBUG, INFO, WARNING, ERROR

unsplash-image-scraper/
├── src/
│   ├── __init__.py           # Package initialization
│   ├── config.py             # Configuration settings
│   └── unsplash_scraper.py   # Main scraper class
├── downloads/                # Downloaded images (created automatically)
├── main.py                   # Entry point script
├── requirements.txt          # Python dependencies
├── .gitignore                # Git ignore rules
├── LICENSE                   # MIT License
└── README.md                 # This file
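The tree notes that `downloads/` is created automatically. One way this could work together with the `IMAGE_FORMAT` setting is sketched below. The helper `image_path` and its `<query>_<NNN>.<ext>` naming scheme are hypothetical and may not match how the scraper actually names files.

```python
from pathlib import Path

IMAGE_FORMAT = "jpg"  # mirrors the value shown in src/config.py


def image_path(output_dir: Path, query: str, index: int,
               ext: str = IMAGE_FORMAT) -> Path:
    """Create output_dir if needed and return a numbered file path inside it."""
    # parents=True and exist_ok=True make this safe to call repeatedly.
    output_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: lower-cased query, spaces replaced by underscores.
    stem = query.strip().lower().replace(" ", "_")
    return output_dir / f"{stem}_{index:03d}.{ext}"
```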
python main.py -q "cats" -n 20

Output:

Searching for 'cats'...
Target: 20 images
Output: C:\path\to\downloads
Found 20 images
Downloading images...
Successfully downloaded 20 images!
Images saved to: C:\path\to\downloads
from pathlib import Path

from src.unsplash_scraper import UnsplashScraper

# Create scraper instance
with UnsplashScraper(headless=True) as scraper:
    # Scrape image URLs
    urls = scraper.scrape_images("mountains", num_images=10)

    # Download images
    output = Path("my_mountains")
    scraper.download_images(urls, output_dir=output)

    print(f"Downloaded {len(urls)} images!")

# Show the browser window (useful for debugging)
python main.py -q "abstract art" -n 30 --no-headless

Solution: Selenium 4.16+ automatically manages ChromeDriver. Ensure you have the latest version:

pip install --upgrade selenium

Solution: This usually means the page took too long to load. Try:

- Increasing `WEBDRIVER_TIMEOUT` in `src/config.py`
- Checking your internet connection
- Ensuring Unsplash is accessible in your region
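To rule out connectivity problems before adjusting timeouts, a quick reachability check such as the following might be useful. It is a standalone sketch using only the standard library and is not part of the scraper. The `is_reachable` helper, its `opener` parameter (there to make the function testable), and the browser-like User-Agent header are all assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_reachable(url: str = "https://unsplash.com", timeout: float = 10.0,
                 opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if the URL answers successfully within `timeout` seconds."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
        return False


if __name__ == "__main__":
    print("Unsplash reachable:", is_reachable())
```

A `False` result here points to a network or regional access issue rather than a timeout setting.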
Solution:
- Try a different search query
- Ensure you're searching for topics that exist on Unsplash
- Check if Unsplash has changed their page structure (XPath selectors may need updating)
Solution: This is normal - some images may be temporarily unavailable. The scraper will log errors and continue with other images.
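The log-and-continue behavior described above can be sketched as follows. This is an illustration of the general pattern, not the project's `download_images` method: `download_all`, `fetch_bytes`, and the `image_NNN.jpg` file names are hypothetical.

```python
import logging
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def fetch_bytes(url: str, timeout: float = 20.0) -> bytes:
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls: Iterable[str], output_dir: Path,
                 fetch: Callable[[str], bytes] = fetch_bytes) -> List[Path]:
    """Save each URL to output_dir, logging and skipping any that fail."""
    output_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for index, url in enumerate(urls, start=1):
        try:
            data = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            logger.error("Skipping %s: %s", url, exc)
            continue  # one bad image should not stop the batch
        path = output_dir / f"image_{index:03d}.jpg"
        path.write_bytes(data)
        saved.append(path)
    return saved
```

Comparing the length of the returned list with the number of URLs shows how many downloads were skipped.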
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Please ensure your code follows PEP 8 style guidelines and includes appropriate documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational purposes only. Please respect Unsplash's Terms of Service and API Guidelines. Always give credit to photographers when using their images.
- Unsplash for providing free high-quality images
- Selenium for web automation capabilities
- The open-source community for inspiration and support
Made with ❤️ by Angel Posadas
⭐ Star this repo if you found it helpful!
